Notes
Fig. 8B - Perfect reads
Fig. 8C - Targeting oligos
Fig. 8D - Essential targets.
8E. Example genes from each library
Fig. 8F - long TF deletions overview
Fig. 8G - Upstream and downstream read counts.
Fig 8H - Mutant abundance vs. oligo parameters
Assemble figure

Notes

This figure contains data from three ORBIT experiments where mutant libraries were created with mixtures of targeting oligos (from Twist Biosciences). Data figures were made in R notebooks and exported as pdfs. Cosmetic improvements were made in Adobe Illustrator. Note the diagram for Figures 8A was made in Adobe Illustrator.

See related code notebooks:

Supplemental figures

Twist oligo design

Mutant library characterization

Twist TO abundance

Setup packages and plotting for the notebook:

# Check packages
source("../tools/package_setup.R")
# Load packages
library(tidyverse)
library(cowplot)
library(patchwork)
library(kableExtra)
library(ggrastr)
# Code display options
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=FALSE, echo = TRUE, message=FALSE, warning=FALSE, fig.align="center", fig.retina = 2)
# Load plotting tools
source("../tools/plotting_tools.R")
#Modify the plot theme
theme_set(theme_notebook())

Fig. 8B - Perfect reads

df_read_nearest_down <- read_csv('../../data/seq_data/left_side_TO_libs/df_read_nearest_left.csv')
df_read_nearest_up <- read_csv('../../data/seq_data/right_side_TO_libs/df_read_nearest_right.csv')
df_read_nearest_up_down <- bind_rows(df_read_nearest_up %>% mutate(up_down = 'up'),df_read_nearest_down %>%  mutate(up_down = 'down')) %>% 
  mutate(off_target = ifelse(min_val >500, T, F)) %>% 
  mutate(perfect = ifelse(min_val == 0, T, F))

df_read_nearest_up_down

## # A tibble: 536,686 × 15
##    seqnames strand cigar qwidth   start    end width njunc   lib read_…¹ gene_id
##    <chr>    <chr>  <chr>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>  
##  1 U00096.3 -      79M2S     81 2978946 2.98e6    79     0     0 2979024 lysR   
##  2 U00096.3 -      6S47M     53 2095079 2.10e6    47     0     0 2095125 hisA   
##  3 U00096.3 +      81M       81 4277913 4.28e6    81     0     0 4277913 soxR   
##  4 U00096.3 -      76M       76 3535810 3.54e6    76     0     0 3535885 ompR   
##  5 U00096.3 +      2S78M     80  317133 3.17e5    78     0     0  317133 ykgA   
##  6 U00096.3 -      51M       51 3059708 3.06e6    51     0     0 3059758 argP   
##  7 U00096.3 -      2S30M     32 2095096 2.10e6    30     0     0 2095125 hisA   
##  8 U00096.3 -      81M       81 3535805 3.54e6    81     0     0 3535885 ompR   
##  9 U00096.3 +      75M       75 4099474 4.10e6    75     0     0 4099474 rhaR   
## 10 U00096.3 +      34M2S     36  417811 4.18e5    34     0     0  417811 phoB   
## # … with 536,676 more rows, 4 more variables: min_val <dbl>, up_down <chr>,
## #   off_target <lgl>, perfect <lgl>, and abbreviated variable name
## #   ¹read_start_pos

Let’s look at both up and down junctions and see how many of the reads (that passed filter) are perfect matches to the targeting oligo deletions we designed.

df_read_nearest_up_down_annot <- df_read_nearest_up_down %>% 
  filter(lib >0) %>% 
  mutate(status = paste0('off_',off_target, '_p_',perfect)) %>% 
  mutate(status = fct_relevel(status, 'off_TRUE_p_FALSE','off_FALSE_p_FALSE','off_FALSE_p_TRUE'))
  
plot_perfect_reads <-   ggplot(df_read_nearest_up_down_annot, aes(x = factor(lib), fill = status)) + 
    geom_bar(color = 'black',position = 'fill', linewidth = 0.25) +
    scale_x_discrete(breaks = c('1','2','3'),labels = c('#1 short TFs', '#2 long TFs', '#3 small RNAs'))+
    scale_y_continuous(labels = scales::label_percent())+
    scale_fill_viridis_d(labels = c('Not perfect >500 bp','Not Perfect <500 bp','Perfect match'), option = 'A')  + 
    labs(y = 'Reads', x = 'Library', fill = NULL) + theme(legend.position = 'bottom')

plot_perfect_reads

Let’s calculate the perfect rate (by read) for each library. Let’s separate by the up and downstream junctions to make sure there are no obvious issues.

df_read_nearest_up_down_annot %>% group_by(lib, perfect, up_down) %>% summarise(n=n()) %>% pivot_wider(names_from = 'perfect', values_from = 'n', names_prefix = 'perfect_') %>% 
  mutate(frac_perfect = perfect_TRUE / (perfect_TRUE + perfect_FALSE))

## # A tibble: 6 × 5
## # Groups:   lib [3]
##     lib up_down perfect_FALSE perfect_TRUE frac_perfect
##   <dbl> <chr>           <int>        <int>        <dbl>
## 1     1 down             1814        71727        0.975
## 2     1 up               2362       118939        0.981
## 3     2 down             2429        48276        0.952
## 4     2 up               3835        93068        0.960
## 5     3 down             2287        51286        0.957
## 6     3 up               2832        72820        0.963

Looks similar to what the plot tells us.

We can also look at the reads assigned to each target gene and ask what fraction of those reads are “perfect.”

df_read_nearest_up_down_annot %>% group_by(lib, gene_id, perfect) %>% summarise(n=n()) %>% pivot_wider(names_from = 'perfect', values_from = 'n', names_prefix = 'perfect_') %>% 
  mutate(frac_perfect = perfect_TRUE / (perfect_TRUE + perfect_FALSE))

## # A tibble: 387 × 5
## # Groups:   lib, gene_id [387]
##      lib gene_id perfect_FALSE perfect_TRUE frac_perfect
##    <dbl> <chr>           <int>        <int>        <dbl>
##  1     1 alpA               31         3528        0.991
##  2     1 argR                3          453        0.993
##  3     1 ariR               15          149        0.909
##  4     1 arsR               61          553        0.901
##  5     1 asnC              102          979        0.906
##  6     1 bolA               31         2714        0.989
##  7     1 caiF               96         4097        0.977
##  8     1 cedA               41          602        0.936
##  9     1 cspA               38          267        0.875
## 10     1 cspC               25         1476        0.983
## # … with 377 more rows

And let’s get a summary on that - the mean and median perfect rates (by gene) for each library:

df_read_nearest_up_down_annot %>% group_by(lib, gene_id, perfect) %>% summarise(n=n()) %>% pivot_wider(names_from = 'perfect', values_from = 'n', names_prefix = 'perfect_') %>% 
  mutate(frac_perfect = perfect_TRUE / (perfect_TRUE + perfect_FALSE)) %>% group_by(lib) %>% summarise(mean_perf = mean(frac_perfect, na.rm = T), med_perf = median(frac_perfect, na.rm = T))

## # A tibble: 3 × 3
##     lib mean_perf med_perf
##   <dbl>     <dbl>    <dbl>
## 1     1     0.914    0.983
## 2     2     0.928    0.985
## 3     3     0.872    0.978

Looks very good. Now let’s move on to the target centric analysis.

Fig. 8C - Targeting oligos

First, let’s find all of the targets that have at least 1 perfect read for both the upstream and downstream junctions. We will consider those targets as “found” (score = 2) and targets that do not have perfect reads will be considered “not found.” (score = 0)

df_TOs <- read_csv("../../../ecoli_ORBIT_paper_final/data/seq_data/right_side_TO_libs/df_TOs.csv")


#Must have at least 1 perfect read
df_read_nearest_all_summary_down <- df_read_nearest_down %>% 
  filter(min_val == 0) %>% 
  group_by(lib, gene_id) %>% 
  summarise( n=n(), min_val = min(min_val, na.rm = T))

df_read_nearest_all_summary_up <- df_read_nearest_up %>% 
  filter(min_val == 0) %>% 
  group_by(lib, gene_id) %>% 
  summarise( n=n(), min_val = min(min_val, na.rm = T))

df_TOs_found_down <- left_join(df_TOs, df_read_nearest_all_summary_down ,by = c('gene'='gene_id', 'lib')) %>% 
  mutate(found_down = ifelse(is.na(n) & is.na(min_val), F, T)) 

df_TOs_found_up_down <- left_join(df_TOs_found_down, df_read_nearest_all_summary_up, suffix = c('_down','_up') ,by = c('gene'='gene_id', 'lib'))%>% 
  mutate(found_up = ifelse(is.na(n_up) & is.na(min_val_up), F, T)) %>% 
  mutate(found_score = found_down + found_up)

df_TOs_found_up_down %>% group_by(lib, found_score) %>% summarise(n=n())

## # A tibble: 10 × 3
## # Groups:   lib [4]
##      lib found_score     n
##    <dbl>       <int> <int>
##  1     0           2    27
##  2     1           0     4
##  3     1           1     1
##  4     1           2    69
##  5     2           0    18
##  6     2           1     3
##  7     2           2   205
##  8     3           0     5
##  9     3           1     5
## 10     3           2    83

Let’s express that same data as a plot:

plot_tos <- df_TOs_found_up_down %>% 
  filter(lib >0) %>% 
  ggplot(aes(x = factor(lib), fill = factor(found_score))) + geom_bar(position = 'stack', color = 'black', linewidth = 0.25)+
  scale_fill_viridis_d( option = 'A', labels = c('Absent','Present in Up or Dn','Present in both')) + 
  scale_x_discrete(breaks =c('0','1','2','3'),labels = c('#0 oPool','#1 short TFs','#2 long TFs','#3 small RNAs'))+
  labs(fill = 'Mutations found', x = "Mutant library", y = '# Designed targeting oligos') + theme(legend.position = 'bottom')

plot_tos

Fig. 8D - Essential targets.

Now let’s look at the absent target genes in more detail. We categorized these targets both programmatically (essential gene lists) and manually (inspecting loci) as essential, perturbing or nonessesnital. See upstream_downstream_combined_analysis.Rmd for details.

df_TOs_found_essential_updated <- read_csv("../../data/high_throughput_experiments/df_TOs_found_essential_updated.csv")

df_TOs_found_essential_updated %>% select(lib, gene, found_down, found_up, found_score, essential, essential_score)

## # A tibble: 420 × 7
##      lib gene  found_down found_up found_score essential                 essen…¹
##    <dbl> <chr> <lgl>      <lgl>          <dbl> <chr>                       <dbl>
##  1     1 alpA  TRUE       TRUE               2 not essential                   0
##  2     1 argR  TRUE       TRUE               2 probable essential / per…       1
##  3     1 ariR  TRUE       TRUE               2 not essential                   0
##  4     1 arsR  TRUE       TRUE               2 not essential                   0
##  5     1 asnC  TRUE       TRUE               2 not essential                   0
##  6     1 bolA  TRUE       TRUE               2 not essential                   0
##  7     1 caiF  TRUE       TRUE               2 not essential                   0
##  8     1 cedA  TRUE       TRUE               2 not essential                   0
##  9     1 cspA  TRUE       TRUE               2 not essential                   0
## 10     1 cspC  TRUE       TRUE               2 not essential                   0
## # … with 410 more rows, and abbreviated variable name ¹essential_score

Let’s plot this data simply, for the subset of genes that were totally absent from the ORBIT mutant libraries (found_score = 0).

plot_essential <- ggplot(df_TOs_found_essential_updated %>% filter(found_score == 0) %>% mutate(essential = factor(essential, levels = c('essential','probable essential / perturbing','not essential'))), 
       aes(x = essential)) + 
  geom_bar(fill = 'light gray')+
  labs(y = 'absent target genes')

plot_essential

8E. Example genes from each library

Let’s look at a specific target gene example from each of the three libraries: soxS, lysR, and mcaS.

First we need to do some housekeeping to setup the plot - read in gene positions, construct gene arrows, and organize reads by length.

df_genes <- read_delim("../../../ecoli_ORBIT_paper_final/data/seq_data/All_instances_of_Genes_in_Escherichia_coli_K-12_substr._MG1655.txt") %>% 
  mutate(left_pos = `Left-End-Position`, right_pos = `Right-End-Position`) %>% 
  select(-`Left-End-Position`, -`Right-End-Position`) %>% 
  mutate(fwd = ifelse(Direction =='+', T, F))

df_genes_lib <- left_join(df_TOs %>% select(lib, 'gene_id'='gene'), df_genes, by = c('gene_id'='Genes'))


df_poly <- df_genes_lib %>% 
 #filter(right_pos<20000) %>% 
 pivot_longer(c(left_pos, right_pos), values_to = 'x_pos', names_to = 'left_right') %>% 
 mutate( y_pos_bottom = 10,y_pos_top = 100, y_pos_center=31.623) %>% 
 pivot_longer(c(y_pos_bottom,y_pos_top, y_pos_center), values_to = 'y_pos', names_to = 'top_bottom') %>% 
 mutate(x_pos = ifelse(left_right == 'left_pos' & Direction == '-' & top_bottom !='y_pos_center', x_pos + 50, x_pos)) %>% 
 mutate(x_pos = ifelse(left_right == 'right_pos' & Direction == '+' & top_bottom !='y_pos_center', x_pos - 50, x_pos)) %>% 
 mutate(point_num = case_when(left_right == 'left_pos' & top_bottom == 'y_pos_bottom' ~ 1,
left_right == 'left_pos' & top_bottom == 'y_pos_center' ~ 2,
left_right == 'left_pos' & top_bottom == 'y_pos_top' ~ 3,
left_right == 'right_pos' & top_bottom == 'y_pos_top' ~ 4,
left_right == 'right_pos' & top_bottom == 'y_pos_center' ~ 5,
left_right == 'right_pos' & top_bottom == 'y_pos_bottom' ~ 6)) %>% 
 arrange(gene_id, point_num)



 kb_label <- function(x){
 # from s to ns
 lab <- paste(x / 1000)
 }

df_read_nearest_up_down_org <- df_read_nearest_up_down %>% 
  arrange(lib, gene_id, up_down, perfect, width ) %>% 
  group_by(lib, gene_id, up_down, perfect) %>% 
  mutate(read_num = row_number())

df_read_nearest_up_down_org

## # A tibble: 536,686 × 16
## # Groups:   lib, gene_id, up_down, perfect [1,485]
##    seqnames strand cigar qwidth  start    end width njunc   lib read_s…¹ gene_id
##    <chr>    <chr>  <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl> <chr>  
##  1 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  2 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  3 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  4 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  5 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  6 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  7 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  8 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
##  9 U00096.3 -      22M       22 568000 568021    22     0     0   568021 acrR   
## 10 U00096.3 -      23M       23 508201 508223    23     0     0   508223 acrR   
## # … with 536,676 more rows, 5 more variables: min_val <dbl>, up_down <chr>,
## #   off_target <lgl>, perfect <lgl>, read_num <int>, and abbreviated variable
## #   name ¹read_start_pos

Now let’s actually map the reads onto each gene diagram. We will “rasterise” the read data for avoid plotting thousands of individual objects.

gene_list <- c('soxS', 'lysR','mcaS')


plot_gene_examples <- ggplot()+
  geom_polygon(data = df_poly %>% filter(lib >0 & gene_id %in% gene_list)%>% mutate(gene_id = factor(gene_id, levels = gene_list)), aes(x = x_pos, y = y_pos, group = gene_id),fill = 'light gray')+
    rasterise(geom_segment(data = df_read_nearest_up_down_org %>% filter(lib >0 & gene_id %in% gene_list) %>% filter(min_val <500) %>% mutate(gene_id = factor(gene_id, levels = gene_list)),
             aes(x = start, xend = end, y = read_num, yend = read_num, color = perfect), alpha = 0.25, linewidth = 0.5), dpi = 2000, dev = "cairo")+
  labs(x = "Genomic position (Kb)", y = 'Read count')+
   scale_x_continuous(labels = kb_label)+
  scale_y_log10(limits =c(1, NA))+
  scale_color_manual(values = c("#440154FF","#21908CFF"))+
  guides(fill = 'none')+
  facet_wrap(~gene_id, scales = 'free', ncol = 1,strip.position = 'top') + guides(color = 'none')

plot_gene_examples

And let’s get the actual read statistics for this plot:

# need to exclude the lysR reads in lib#0

df_read_nearest_up_down %>% filter(gene_id %in% gene_list & lib >0) %>% 
  filter(min_val<=500) %>% 
  group_by(gene_id, perfect) %>% 
  summarise(n=n())%>% 
  pivot_wider(names_from = 'perfect', values_from = 'n', names_prefix = 'perfect_') %>% 
  mutate(n_total = perfect_TRUE + perfect_FALSE, frac_perfect = perfect_TRUE / (perfect_TRUE + perfect_FALSE))

## # A tibble: 3 × 5
## # Groups:   gene_id [3]
##   gene_id perfect_FALSE perfect_TRUE n_total frac_perfect
##   <chr>           <int>        <int>   <int>        <dbl>
## 1 lysR               27         3244    3271        0.992
## 2 mcaS               42         4063    4105        0.990
## 3 soxS               82         9345    9427        0.991

Fig. 8F - long TF deletions overview

For this genome wide overview, we will make three different plots as sort genome “tracks” and then stack them. First are the designed TOs and they are marked for whether mutants were found or not. Next we have all raw reads displayed (relatively transparent) by whether they are perfect or not. Finally, we have a binned histogram of that read data, making it easier to see the fraction of reads that are imperfect across the library.

plot_to_track <- ggplot(df_TOs_found_essential_updated %>% filter(lib == 2), aes(x = left_oligo_pos, y= found_up, shape = found_up))+
    geom_point(fill = NA, alpha = 0.5) + 
  scale_shape_manual(values = c(13,21))+
  scale_y_discrete(limits=c(T,F))+
  theme_nothing()

plot_hist <- ggplot(df_read_nearest_up %>% filter(lib==2) %>% mutate(perfect = ifelse(min_val ==0, T, F))) + 
  geom_histogram(aes(x =start, fill =perfect ),binwidth=100000,color = 'black', size = 0.25) +
  #geom_point(data = df_TOs_found_essential_updated %>% filter(found_up ==T)%>% filter(lib==2), aes(x = left_oligo_pos, y = -500), shape =21, fill = NA, alpha =0.5)+
  #geom_point(data = df_TOs_found_essential_updated %>% filter(found_up ==F)%>% filter(lib==2), aes(x = left_oligo_pos, y = -1000 ), shape =13, fill = NA, alpha = 0.5)+
  labs(x = "Genomic position (bp)", y = "Reads (in 100kb bin)", fill = 'Perfect match') +
  scale_fill_manual(values = c("#440154FF","#21908CFF"))+
  scale_x_continuous(labels = Mb_label)
  #ylim(-500,NA)+
  #theme(legend.position = 'bottom')

# plot_jitter <- ggplot(df_read_nearest_up %>% filter(lib==2) %>% mutate(perfect = ifelse(min_val ==0, T, F)), aes(x = start, y = perfect, color = perfect)) + 
#   rasterise(geom_jitter(alpha =0.02, shape = 3, size = 0.5, width = 0), dpi = 5000, dev = "cairo")+
#   scale_color_manual(values = c("#440154FF","#21908CFF"))+
#   #scale_x_continuous(labels = Mb_label)+
#   #scale_y_discrete(labels = c('Off target reads','Perfectly matched reads'))+
#   theme_nothing()

plot_jitter <-ggplot(df_read_nearest_up %>% filter(lib==2) %>% mutate(perfect = ifelse(min_val ==0, T, F)), aes(x = start, y = perfect, color = perfect)) + 
  rasterise(geom_text(label = '|', alpha =0.03, position = position_jitter(width = 0, height = 0.25), size = 2), dpi = 2000, dev = "cairo")+
  scale_color_manual(values = c("#440154FF","#21908CFF"))+
  #scale_x_continuous(labels = Mb_label)+
  #scale_y_discrete(labels = c('Off target reads','Perfectly matched reads'))+
  theme_nothing()

#plot_tos/plot_jitter / plot_hist

#plot_lib_2 <- plot_grid(plot_jitter, plot_hist, align = 'v', axis = 'lr', rel_heights = c(0.3, 1), ncol = 1)

plot_lib_2 <- plot_grid(plot_to_track, plot_jitter, plot_hist, align = 'v', axis = 'lr', rel_heights = c(0.2, 0.33, 1), ncol = 1)


plot_lib_2

Fig. 8G - Upstream and downstream read counts.

Now let’s go back and look at the total read counts (upstream + downstream) for all the mutants in the three libraries. We will also use this data for the correlation plots in 8H.

df_vars <- read_csv("../../code/supp_figs/df_vars.csv")

df_vars

## # A tibble: 420 × 28
##    lib       gene  left_o…¹ right…² oligo product go_term n_down min_v…³ found…⁴
##    <chr>     <chr>    <dbl>   <dbl> <chr> <chr>   <chr>    <dbl>   <dbl> <lgl>  
##  1 short TFs alpA   2758649 2758836 CTCT… CP4-57… DNA bi…   1622       0 TRUE   
##  2 short TFs argR   3384708 3385153 GCAC… DNA-bi… DNA bi…    170       0 TRUE   
##  3 short TFs ariR   1216374 1216615 AAAC… putati… protei…    115       0 TRUE   
##  4 short TFs arsR   3648533 3648861 TCGA… DNA-bi… transc…    131       0 TRUE   
##  5 short TFs asnC   3926565 3926998 TAAA… DNA-bi… protei…    550       0 TRUE   
##  6 short TFs bolA    454477  454769 GCGC… DNA-bi… protei…    969       0 TRUE   
##  7 short TFs caiF     34305   34675 ACGC… DNA-bi… DNA-bi…    789       0 TRUE   
##  8 short TFs cedA   1813441 1813658 GCAA… cell d… DNA bi…    195       0 TRUE   
##  9 short TFs cspA   3720054 3720241 TCGC… cold s… nuclei…     80       0 TRUE   
## 10 short TFs cspC   1907246 1907430 CACA… stress… nuclei…    339       0 TRUE   
## # … with 410 more rows, 18 more variables: n_up <dbl>, min_val_up <dbl>,
## #   found_up <lgl>, found_score <dbl>, down_total <dbl>, up_total <dbl>,
## #   down_norm <dbl>, up_norm <dbl>, to_abun <dbl>, lib_total <dbl>,
## #   to_abun_norm <dbl>, oligo_monomer <dbl>, oligo_homodimer <dbl>,
## #   arm_1_duplex <dbl>, arm_2_duplex <dbl>, arm_total_duplex <dbl>,
## #   ori_dist <dbl>, del_len <dbl>, and abbreviated variable names
## #   ¹left_oligo_pos, ²right_oligo_pos, ³min_val_down, ⁴found_down

Represent the data as boxplots and show individual genes as points.

plot_n_up_down <- ggplot(df_vars %>% filter(lib >0), aes(x = factor(lib), y = n_down + n_up)) + 
  geom_boxplot() + 
  geom_jitter(width = 0.2, height =0, alpha = 0.1, shape =21) + 
  scale_y_log10(limits = c(NA, NA))+
  scale_x_discrete(labels = c('short TFs','long TFs','small RNAs')) +
  labs(x = "Library", y = 'Perfect reads per target\n(Upstream & downstream junctions)')

plot_n_up_down

Fig 8H - Mutant abundance vs. oligo parameters

For all these correlations, only the upstream junction read data was used, since that library had higher counts.

First, let’s look at the Twist TO abundance.

df_vars %>% 
  filter(lib == 'long TFs') %>% 
  filter(n_up>0 & to_abun>10) %>% 
  mutate(n_up = log(n_up, 10), to_abun = log(to_abun, 10)) %>% 
  lm(n_up ~ to_abun, data =.) %>%
  summary()

## 
## Call:
## lm(formula = n_up ~ to_abun, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.92039 -0.41289  0.07131  0.45950  1.46334 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.9070     0.7855  -1.155     0.25    
## to_abun       1.1040     0.2727   4.048 7.32e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6259 on 204 degrees of freedom
## Multiple R-squared:  0.07437,    Adjusted R-squared:  0.06983 
## F-statistic: 16.39 on 1 and 204 DF,  p-value: 7.322e-05

plot_to_abun <- ggplot(df_vars %>% filter(to_abun >10 & lib=='long TFs'), aes(x = to_abun, y = n_up)) +
  geom_smooth(method = 'lm')+ 
  geom_point(shape = 21, alpha = 0.25) +
  #scale_x_log10() +
  scale_y_log10()+
  labs(x = 'Reads per oligo', y = 'Perfect reads per target', title = 'Library #2 Long TFs',subtitle = "Reads per target ~ oligo abundance (adj R2 = 0.07, p = 0.00007)")

plot_to_abun

Next we examine the homology arm free energy.

df_vars %>% 
  filter(lib == 'long TFs') %>% 
  filter(n_up>0) %>% 
  mutate(n_up = log(n_up, 10)) %>% 
  lm(n_up ~ arm_total_duplex, data =.) %>%
  summary()

## 
## Call:
## lm(formula = n_up ~ arm_total_duplex, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77576 -0.40572  0.08949  0.39535  1.43891 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.800815   0.571063  -1.402    0.162    
## arm_total_duplex -0.018644   0.003463  -5.384 1.98e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6084 on 205 degrees of freedom
## Multiple R-squared:  0.1239, Adjusted R-squared:  0.1196 
## F-statistic: 28.99 on 1 and 205 DF,  p-value: 1.981e-07

plot_hom_energy <- ggplot(df_vars %>% filter(lib == 'long TFs') , aes(x = arm_total_duplex, y = n_up)) +
  geom_smooth(method = 'lm')+ 
  geom_point(shape = 21, alpha = 0.25) +
  #scale_x_log10() +
  scale_y_log10()+
  labs(x = 'Homology arm free energy (kcal/mol)', y = 'Perfect reads per target', title = 'Library #2 Long TFs',subtitle = "Reads per target ~ homology arm free energy (adj R2 = 0.12, p = 1.98e-7)")

plot_hom_energy

Finally we look at the distance to the origin of replication. We will fit a linear model here just for consistency, but we are only displaying a loess smoothed trend line to show that the ori_dist relationship does not look linear.

df_vars %>% 
  filter(lib == 'long TFs') %>% 
  filter(n_up>0) %>% 
  mutate(n_up = log(n_up, 10)) %>% 
  lm(n_up ~ ori_dist, data =.) %>%
  summary()

## 
## Call:
## lm(formula = n_up ~ ori_dist, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.34970 -0.35669  0.09216  0.43716  1.29633 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.472e+00  8.664e-02  28.536  < 2e-16 ***
## ori_dist    -1.830e-07  6.585e-08  -2.779  0.00596 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6381 on 205 degrees of freedom
## Multiple R-squared:  0.03631,    Adjusted R-squared:  0.0316 
## F-statistic: 7.723 on 1 and 205 DF,  p-value: 0.005959

plot_ori_dist <- ggplot(df_vars %>% filter(lib == 'long TFs') , aes(x = ori_dist, y = n_up)) +
  geom_smooth()+ 
  geom_point(shape = 21, alpha = 0.25) +
  scale_x_continuous(labels = scales::label_comma()) +
  scale_y_log10()+
  labs(x = 'Distance to ori (bp)', y = 'Perfect reads per target', title = 'Library #2 Long TFs',subtitle = "smooth")

plot_ori_dist

Assemble figure

theme_set(theme_figure())

plot_top <- plot_grid( plot_perfect_reads, plot_tos,NULL, 
                      NULL, NULL, plot_essential, 
                      align = 'hv', axis = 'lr', rel_widths = c(2,2,1), rel_heights = c(1,0.4), ncol = 3, nrow =2)

plot_top

#save_plot("../../figures/r_pdf_figs/main_figs/fig_8_top.pdf", plot_top, base_width = 5, base_height = 3.25)

plot_lib_2 <- plot_grid(plot_to_track, plot_jitter, plot_hist, align = 'v', axis = 'lr', rel_heights = c(0.2, 0.33, 1), ncol = 1)

plot_middle_pdf <- plot_grid(plot_gene_examples, plot_lib_2, align = 'hv', axis = 'lr', rel_widths = c(1,3), ncol = 2)

plot_middle_pdf

#save_plot("../../figures/r_pdf_figs/main_figs/fig_8_middle.pdf", plot_middle_pdf, base_width = 9, base_height = 3)

plot_bottom <- plot_grid(plot_n_up_down, plot_to_abun, plot_hom_energy, plot_ori_dist, align = 'hv', axis = 'tblr', ncol = 4)

plot_bottom

#save_plot("../../figures/r_pdf_figs/main_figs/fig_8_bottom.pdf", plot_bottom, base_width = 9, base_height = 2)

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.2   ggrastr_1.0.1     kableExtra_1.3.4  cowplot_1.1.1    
##  [5] viridis_0.6.2     viridisLite_0.4.1 knitr_1.41        forcats_0.5.2    
##  [9] stringr_1.5.0     dplyr_1.1.0       purrr_0.3.5       readr_2.1.3      
## [13] tidyr_1.2.1       tibble_3.1.8      ggplot2_3.4.0     tidyverse_1.3.2  
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.4          sass_0.4.4          splines_4.2.0      
##  [4] bit64_4.0.5         vroom_1.6.0         jsonlite_1.8.3     
##  [7] modelr_0.1.10       bslib_0.4.1         assertthat_0.2.1   
## [10] highr_0.9           vipor_0.4.5         googlesheets4_1.0.1
## [13] cellranger_1.1.0    yaml_2.3.6          lattice_0.20-45    
## [16] pillar_1.8.1        backports_1.4.1     glue_1.6.2         
## [19] digest_0.6.30       rvest_1.0.3         colorspace_2.0-3   
## [22] Matrix_1.5-3        htmltools_0.5.4     pkgconfig_2.0.3    
## [25] broom_1.0.1         haven_2.5.1         scales_1.2.1       
## [28] webshot_0.5.4       svglite_2.1.0       tzdb_0.3.0         
## [31] timechange_0.1.1    googledrive_2.0.0   mgcv_1.8-41        
## [34] farver_2.1.1        generics_0.1.3      ellipsis_0.3.2     
## [37] cachem_1.0.6        withr_2.5.0         cli_3.4.1          
## [40] magrittr_2.0.3      crayon_1.5.2        readxl_1.4.1       
## [43] evaluate_0.18       fs_1.5.2            fansi_1.0.3        
## [46] nlme_3.1-160        xml2_1.3.3          beeswarm_0.4.0     
## [49] Cairo_1.6-0         tools_4.2.0         hms_1.1.2          
## [52] gargle_1.2.1        lifecycle_1.0.3     munsell_0.5.0      
## [55] reprex_2.0.2        compiler_4.2.0      jquerylib_0.1.4    
## [58] systemfonts_1.0.4   rlang_1.1.1         grid_4.2.0         
## [61] rstudioapi_0.14     labeling_0.4.2      rmarkdown_2.18     
## [64] gtable_0.3.1        DBI_1.1.3           R6_2.5.1           
## [67] gridExtra_2.3       lubridate_1.9.0     fastmap_1.1.0      
## [70] bit_4.0.5           utf8_1.2.2          stringi_1.7.8      
## [73] ggbeeswarm_0.7.1    parallel_4.2.0      vctrs_0.5.2        
## [76] dbplyr_2.2.1        tidyselect_1.2.0    xfun_0.35

Figure 8: TO twist libraries

E. coli ORBIT 2022

Scott H. Saunders