Read in data and figure out TO / read orientation
Match reads to putative TOs
How many of the intended TO mutations were recovered?

library(tidyverse)
library(GenomicAlignments)
library(kableExtra)
library(ggridges)

knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=FALSE, echo = TRUE, message=FALSE, warning=FALSE, fig.align="center", fig.retina = 2)

source("../tools/plotting_tools.R")
theme_set(theme_notebook())

This notebook performs the analysis on the mapped reads for the four ORBIT libraries that were constructed. As a reminder, these different deletion libraries were as follows:

Library #0: galK, hisA, metA, leuD (gold standards) + 23 TF genes (27 targets total)
Library #1: All TF genes < 575 bp (76 targets total)
Library #2: All TF genes > 575 bp (226 targets total)
Library #3: All small RNA genes (90 targets total)

Within each library the targeting oligo deletes a target gene, and pInt_kanR should integrate into the newly formed attB site on the genome. This should leave us with a library with the target genes deleted, replaced by the pInt_kanR backbone flanked by the attR and attL sites. To test the accuracy with which these libraries were constructed sequencing libraries were prepared to amplify these genome-pInt junctions. This was done by shearing genomic DNA, adding C tails to free DNA ends and then amplifying junctions with a polyG and pInt specific primer. This notebook takes the processed (filtered, trimmed, mapped) reads from the processing notebook and uses these mapped sequences to analyze how accurately the designed ORBIT mutants were created.

Read in data and figure out TO / read orientation

First, we need to read in the mapped reads for each left side library. These are stored as bam files. Here we combine all of the mapped reads into one dataframe that will go through analysis.

bam_path <- "../../data/seq_data/left_side_TO_libs/6_bam_files/"

df_bam_0_1 <- readGAlignments(file = paste0(bam_path,"lib_0_left_mapped.bam")) %>% as_tibble() %>% mutate(lib=0)

df_bam_1_1 <- readGAlignments(file = paste0(bam_path,"lib_1_left_mapped.bam")) %>% as_tibble() %>% mutate(lib=1)

df_bam_2_1 <- readGAlignments(file = paste0(bam_path,"lib_2_left_mapped.bam")) %>% as_tibble() %>% mutate(lib=2)

df_bam_3_1 <- readGAlignments(file = paste0(bam_path,"lib_3_left_mapped.bam")) %>% as_tibble() %>% mutate(lib=3)

#df_bam_0 %>% group_by(start) %>% summarise(n=n()) %>% arrange(desc(n))


df_data_all <- bind_rows(df_bam_0_1,df_bam_1_1,df_bam_2_1,df_bam_3_1)

df_data_all %>% head(10) %>% kable() %>% kable_styling()

seqnames	cigar	qwidth	start	end	width
U00096.3	55M6S	61	2979936	2979990	55
U00096.3	30M	30	2979936	2979965	30
U00096.3	48M	48	2979936	2979983	48
U00096.3	52M5S	57	2979936	2979987	52
U00096.3	39M	39	2979936	2979974	39
U00096.3	32M1D42M	74	2977610	2977684	75
U00096.3	57M	57	2095844	2095900	57
U00096.3	75M	75	4098577	4098651	75
U00096.3	45M	45	417103	417147	45
U00096.3	61M	61	417087	417147	61

This full data frame is ~330k lines, meaning that we are working with that many reads total. Next we need to read in the information for the targeting oligos that we designed. Each oligo was ordered to delete a certain region of the genome, as specified by a left and right genomic position, corresponding roughly to the start and end of the target genes. Here we read in the information for the different TO libraries.

df_TOs <- read_csv("../../../ecoli_ORBIT_paper_final/data/seq_data/left_side_TO_libs/df_TOs.csv")

df_TOs

## # A tibble: 420 × 7
##      lib gene  left_oligo_pos right_oligo_pos oligo              product go_term
##    <dbl> <chr>          <dbl>           <dbl> <chr>              <chr>   <chr>  
##  1     1 alpA         2758649         2758836 CTCTCTGCAACCAAAGT… CP4-57… DNA bi…
##  2     1 argR         3384708         3385153 GCACAATAATGTTGTAT… DNA-bi… DNA bi…
##  3     1 ariR         1216374         1216615 AAACTTATACTTAATAA… putati… protei…
##  4     1 arsR         3648533         3648861 TCGAAGAGAGACACTAC… DNA-bi… transc…
##  5     1 asnC         3926565         3926998 TAAAATAGAATGAATCA… DNA-bi… protei…
##  6     1 bolA          454477          454769 GCGCCGACAGTTGCAAA… DNA-bi… protei…
##  7     1 caiF           34305           34675 ACGCCCGGTATCAACGA… DNA-bi… DNA-bi…
##  8     1 cedA         1813441         1813658 GCAAAACCGGGAGGTAT… cell d… DNA bi…
##  9     1 cspA         3720054         3720241 TCGCCGAAAGGCACACT… cold s… nuclei…
## 10     1 cspC         1907246         1907430 CACACTTCAGATCAGTG… stress… nuclei…
## # … with 410 more rows

Figure out starting position of read for each TO. Remember that only the fwd sequence of attB was used in the actual oligo to prevent PCR issues. Normally the attB sequence would be directed the same way as the gene, but here the only thing that affects the directionality is the replichore. So for replichore 1 (>3.9M & <1.6M), the oligo sequence comes from the reverse strand, meaning that the attB also points “reverse”. Therefore when sequencing the “right side library” the read maps to the forward + strand and the “left side library” maps to the reverse - strand. For replichore 2 it is the opposite. The TO sequence comes from the + strand, meaning attB points fwd. Therefore the upstream (aka right) library maps to the - strand and the downstream (aka left) maps to the + strand.

We can see the evidence that this is actually how the ORBIT library sequences are oriented by looking at which strand the reads map to vs. the genomic position (replichore).

ggplot(df_data_all %>% mutate(strand = factor(strand, levels = c('-','+'))) , 
       aes(x = start, y = strand)) + 
  geom_vline(xintercept = c(1590250.5,3923882.5), color = 'red')+
  geom_jitter(height=0.25, shape =21, alpha = 0.01)+
  facet_wrap(~lib, ncol = 1, scales = 'free')

It is clear that most reads are coming from the + strand on replichore 1 for all libraries, and the - strand for replichore 2 as expected. Interestingly, the small number of reads that do not conform to this, may indicate an off target integration that wasn’t specified in our designed TO library.

This orientation issue is important to understand, because we need to be able to match the mapped reads to where we expect a read to start if the designed TO worked perfectly. Now that we know our orientation intuition seems right, we can look at the list of TOs and assign what should be the theoretical “read_start_position” if the mutant were made perfectly. Remember these oligos were specified by the region between two genomic positions (left_oligo_pos & right_oligo_pos), so the first genomic sequence outside of these insertions should either be the left or right specified position. The upstream junction (aka right) should start at the right_oligo_pos for replichore 1 targets, and the left_oligo_pos for replichore 2 targets based on their orientations.

Therefore, we will just assign the correct oligo pos to the TOs.

ori <-  3923882.5
ter <-  1590250.5

##*** CHANGING FOR RIGHT SIDE

df_TOs <- df_TOs %>% 
  mutate(gen_rep = ifelse(left_oligo_pos < ori & left_oligo_pos > ter, 2,1)) %>% 
  mutate(read_start_pos = ifelse(gen_rep==1, right_oligo_pos,left_oligo_pos)) %>% 
  mutate(read_start_pos = ifelse(gene == 'metA', left_oligo_pos, read_start_pos)) #metA is the only special case, because it uses the - direction attB site in lib 0.

df_TOs

## # A tibble: 420 × 9
##      lib gene  left_oligo_pos right_olig…¹ oligo product go_term gen_rep read_…²
##    <dbl> <chr>          <dbl>        <dbl> <chr> <chr>   <chr>     <dbl>   <dbl>
##  1     1 alpA         2758649      2758836 CTCT… CP4-57… DNA bi…       2 2758649
##  2     1 argR         3384708      3385153 GCAC… DNA-bi… DNA bi…       2 3384708
##  3     1 ariR         1216374      1216615 AAAC… putati… protei…       1 1216615
##  4     1 arsR         3648533      3648861 TCGA… DNA-bi… transc…       2 3648533
##  5     1 asnC         3926565      3926998 TAAA… DNA-bi… protei…       1 3926998
##  6     1 bolA          454477       454769 GCGC… DNA-bi… protei…       1  454769
##  7     1 caiF           34305        34675 ACGC… DNA-bi… DNA-bi…       1   34675
##  8     1 cedA         1813441      1813658 GCAA… cell d… DNA bi…       2 1813441
##  9     1 cspA         3720054      3720241 TCGC… cold s… nuclei…       2 3720054
## 10     1 cspC         1907246      1907430 CACA… stress… nuclei…       2 1907246
## # … with 410 more rows, and abbreviated variable names ¹right_oligo_pos,
## #   ²read_start_pos

There is one exception here, which is the metA oligo. This is because this oligo was only in the library 0 from IDT, which did not require PCR amplification, so the same oligo sequence was used as from the rest of the paper, which has attB facing the opposite orientation as the scheme described above (attB faces the same way as the gene).

Match reads to putative TOs

With all that sorted we can now iterate through every read in the dataset and try to find the closest TO match. Then we can evaluate how close the reads are to the expected start sites for their inferred TO.

This for loop achieves this very simply, by iterating through each read, then measuring the distance between the read start position and the start positions of every TO designed for that library. The nearest TO is assigned to the read as a ‘gene_id’ and that minimum distance is recorded with the read. This loop takes ~30 min to run on a desktop, but the csv output can be read in without running.

# Takes ~30 min to run

#Initialize dataframe to store reads
df_read_nearest <- tibble()

#Based on alignment, if read maps to + strand, then the illumina start of the read is the start of the alignment.
#if read maps to - strand then illumina read start is actually the end of the mapped read
df_test <- df_data_all %>% 
  mutate(read_start_pos = ifelse(strand =='+', start, end))

#Pre split the different TOs into dataframes by library to make loop run faster.
df_TO_0 <- df_TOs %>% filter(lib==0)
df_TO_1 <- df_TOs %>% filter(lib==1)
df_TO_2 <- df_TOs %>% filter(lib==2)
df_TO_3 <- df_TOs %>% filter(lib==3)

#Iterates through each mapped read
for( row in 1:nrow(df_test)){
  
  #picks the set of TOs that matches the library number
  if(df_test$lib[row]==0){
    df_to <- df_TO_0
    }else if(df_test$lib[row]==1){
    df_to <- df_TO_1
    }else if(df_test$lib[row]==2){
    df_to <- df_TO_2
    }else if(df_test$lib[row]==3){
    df_to <- df_TO_3
    }
  
  #Compares the `read start pos` of each designed TO in the library, to the `read start pos` of the read (vector).
  #Then finds the minimum difference min(abs()) of the distances between this read and all TOs
  min_val <- min(abs(df_to$read_start_pos - df_test$read_start_pos[row]))
  
  #Same as above, but returns the TO index of the min val, 
  #which is then used to get the actual TO that has the closest start pos to the read. 
  #The gene this TO deletes is assigned to the read as `gene_id` and the min_val is also stored
  min_index <- which.min(abs(df_to$read_start_pos - df_test$read_start_pos[row]))
  gene_id <- df_to$gene[min_index]
  df_reads <- tibble(df_test[row,],gene_id = gene_id, min_val = min_val) 
  
  #add this read to the growing dataframe of processed reads
  df_read_nearest <- bind_rows(df_read_nearest, df_reads)
  
} 

df_read_nearest

write_csv(df_read_nearest, '../../data/seq_data/left_side_TO_libs/df_read_nearest_left.csv')

Now that we have gone through and categorized each read and calculated the minimum distance to a TO target, we can further classify reads as perfect (min_val=0) or as off target if it falls beyond 500bp away (min_val>500). Then we can get summary statistics. What fraction of reads are off target?

df_read_nearest <- read_csv('../../data/seq_data/left_side_TO_libs/df_read_nearest_left.csv')

df_read_nearest <- df_read_nearest %>% 
  mutate(gene_id = ifelse(min_val >500, 'off_target',gene_id)) %>% 
  mutate(off_target = ifelse(gene_id=='off_target',T,F)) %>% 
  mutate(perfect = ifelse(min_val==0, T, F))

df_read_nearest %>% group_by(lib, off_target) %>% summarise(n=n()) %>% 
  pivot_wider(names_from = off_target, names_prefix = 'off_target_', values_from = n) %>% 
  mutate(total =off_target_TRUE + off_target_FALSE) %>% 
  mutate(frac_off_target = off_target_TRUE / (off_target_TRUE + off_target_FALSE))

## # A tibble: 4 × 5
## # Groups:   lib [4]
##     lib off_target_FALSE off_target_TRUE total frac_off_target
##   <dbl>            <int>           <int> <int>           <dbl>
## 1     0            24497            2517 27014          0.0932
## 2     1            72541            1000 73541          0.0136
## 3     2            48883            1822 50705          0.0359
## 4     3            52633             940 53573          0.0175

Less than 4% of reads are off target across all the twist libraries! About 9.3% of reads were off target for the IDT lib #0. And we can also look at how many reads were perfect:

df_read_nearest %>% group_by(lib, perfect) %>% summarise(n=n()) %>% 
  pivot_wider(names_from = perfect, names_prefix = 'perfect_', values_from = n) %>% 
  mutate(total = perfect_TRUE + perfect_FALSE) %>% 
  mutate(frac_perfect = perfect_TRUE / (perfect_TRUE + perfect_FALSE))

## # A tibble: 4 × 5
## # Groups:   lib [4]
##     lib perfect_FALSE perfect_TRUE total frac_perfect
##   <dbl>         <int>        <int> <int>        <dbl>
## 1     0          2859        24155 27014        0.894
## 2     1          1814        71727 73541        0.975
## 3     2          2429        48276 50705        0.952
## 4     3          2287        51286 53573        0.957

More than 95% of reads from all the twist libraries were perfect (>89% of the IDT lib0)! We can visualize the perfect and off target score graphically, which is shown as a summary metric in the main fig:

df_read_nearest %>% 
  mutate(status = paste0('off_',off_target, '_p_',perfect)) %>% 
  mutate(status = fct_relevel(status, 'off_TRUE_p_FALSE','off_FALSE_p_FALSE','off_FALSE_p_TRUE')) %>% 
  ggplot(aes(x = lib, fill = status)) + 
    geom_bar(color = 'black',position = 'fill') +
    scale_y_continuous(labels = scales::label_percent())+
    scale_fill_viridis_d(labels = c('Off target / Not Perfect','On Target / Not Perfect','On Target / Perfect'), option = 'A')  + 
    labs(y = 'Reads', x = 'Library', fill = 'Category')

How many of the intended TO mutations were recovered?

Instead of taking a “read-centric” view, we can instead take a “target-centric” view and ask how many of the TOs we designed actually yielded mutants in our libraries. First we will count up all the reads for each gene_id - this will also be useful later.

#Group by lib and gene and count up reads assigned to each gene
df_read_nearest_summary <- df_read_nearest %>% 
  group_by(lib, gene_id) %>% 
  summarise( n=n(), min_val = mean(min_val))

df_read_nearest_summary %>% head(10) %>% kable() %>% kable_styling()

gene_id	n	min_val
acrR	1079	0.0787766
alaS	1	0.0000000
argP	468	0.0064103
arsR	117	0.1025641
envR	51	0.1176471
fnr	121	0.0247934
fur	19	0.0000000
galK	2040	0.0470588
galR	1622	0.0197287
hisA	2476	0.0076737

And then we will see if there are any TOs that have no reads. From there we can look at the fraction of each TO library for which a mutant was not recovered (within the 500bp threshold).

#Merge the read counts 
df_TOs_found <- left_join(df_TOs, df_read_nearest_summary, by = c('gene'='gene_id', 'lib')) %>% 
  mutate(found = ifelse(is.na(n) & is.na(min_val), F, T))

df_TOs_found %>% 
  group_by(lib, found) %>% 
  summarise(n=n()) %>% 
  pivot_wider(names_from = found, names_prefix = 'found_', values_from = n, values_fill = 0) %>% 
  mutate(frac_found = found_TRUE / (found_TRUE + found_FALSE))

## # A tibble: 4 × 4
## # Groups:   lib [4]
##     lib found_TRUE found_FALSE frac_found
##   <dbl>      <int>       <int>      <dbl>
## 1     0         27           0      1    
## 2     1         70           4      0.946
## 3     2        206          20      0.912
## 4     3         88           5      0.946

So for all 4 libraries, at least 90% of the TOs were found with mutants represented. These are the mutants that were not found:

df_TO_not_found <- df_TOs_found %>% filter(found == F)

df_TO_not_found %>% select(-go_term,-oligo) %>% kable() %>% kable_styling()

lib	gene	left_oligo_pos	right_oligo_pos	product	gen_rep	read_start_pos	n	min_val	found
1	dicA	1647939	1648321	DNA-binding transcriptional dual regulator DicA	2	1647939	NA	NA	FALSE
1	hns	1292529	1292917	DNA-binding transcriptional dual regulator H-NS	1	1292917	NA	NA	FALSE
1	mqsA	3167871	3168241	antitoxin of the MqsRA toxin-antitoxin system / DNA-binding transcriptional repressor MqsA	2	3167871	NA	NA	FALSE
1	racR	1419785	1420236	Rac prophage; DNA-binding transcriptional repressor RacR	1	1420236	NA	NA	FALSE
2	ada	2309361	2310400	DNA-binding transcriptional dual regulator / DNA repair protein Ada	2	2309361	NA	NA	FALSE
2	alaS	2819401	2822006	alanine—tRNA ligase/DNA-binding transcriptional repressor	2	2819401	NA	NA	FALSE
2	arcA	4639610	4640301	ArcA-P<sup>asp54</sup>-phosphate // Phosphorylated DNA-binding transcriptional dual regulator ArcA // DNA-binding transcriptional dual regulator ArcA	1	4640301	NA	NA	FALSE
2	birA	4173087	4174027	DNA-binding transcriptional repressor/biotin-[acetyl-CoA-carboxylase] ligase BirA	1	4174027	NA	NA	FALSE
2	dcuR	4349335	4350029	DcuR // DNA-binding transcriptional activator DcuR	1	4350029	NA	NA	FALSE
2	dnaA	3882346	3883724	chromosomal replication initiator protein DnaA	2	3882346	NA	NA	FALSE
2	fnr	1398794	1399521	DNA-binding transcriptional dual regulator FNR	1	1399521	NA	NA	FALSE
2	gntR	3577751	3578721	DNA-binding transcriptional repressor GntR	2	3577751	NA	NA	FALSE
2	lexA	4257120	4257703	DNA-binding transcriptional repressor LexA	1	4257703	NA	NA	FALSE
2	metR	4011883	4012811	DNA-binding transcriptional dual regulator MetR	1	4012811	NA	NA	FALSE
2	nusA	3316059	3317521	transcription termination/antitermination protein NusA	2	3316059	NA	NA	FALSE
2	obgE	3330602	3331749	GTPase ObgE	2	3330602	NA	NA	FALSE
2	rpoD	3213052	3214868	RNA polymerase, sigma 70 (sigma D) factor	2	3213052	NA	NA	FALSE
2	rpoE	2709457	2710007	RNA polymerase sigma E factor	2	2709457	NA	NA	FALSE
2	rpoH	3599949	3600778	RNA polymerase, sigma 32 (sigma H) factor	2	3599949	NA	NA	FALSE
2	sgrR	75664	77294	DNA-binding transcriptional dual regulator SgrR	1	77294	NA	NA	FALSE
2	uvrY	1994723	1995354	Phosphorylated DNA-binding transcriptional activator UvrY // DNA-binding transcriptional activator UvrY	2	1994723	NA	NA	FALSE
2	yfiE	2714459	2715315	putative LysR-type DNA-binding transcriptional regulator YfiE	2	2714459	NA	NA	FALSE
2	yheO	3475738	3476435	DNA-binding transcriptional regulator YheO	2	3475738	NA	NA	FALSE
2	yidP	3863904	3864595	putative DNA-binding transcriptional regulator YidP	2	3863904	NA	NA	FALSE
3	ffs	476447	476562	signal recognition particle 4.5S RNA	1	476562	NA	NA	FALSE
3	ohsC	2700519	2700599	small regulatory RNA OhsC	2	2700519	NA	NA	FALSE
3	orzP	2204401	2204485	small RNA OrzP	2	2204401	NA	NA	FALSE
3	sroE	2640594	2640687	small RNA SroE	2	2640594	NA	NA	FALSE
3	tff	189711	189848	putative small RNA T44	1	189848	NA	NA	FALSE

There are 29 spread across the 3 twist libraries. We will investigate in further detail in combination with the left side library.

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] viridis_0.6.2               viridisLite_0.4.1          
##  [3] ggridges_0.5.4              kableExtra_1.3.4           
##  [5] GenomicAlignments_1.34.0    Rsamtools_2.14.0           
##  [7] Biostrings_2.66.0           XVector_0.38.0             
##  [9] SummarizedExperiment_1.28.0 Biobase_2.58.0             
## [11] MatrixGenerics_1.10.0       matrixStats_0.63.0         
## [13] GenomicRanges_1.50.1        GenomeInfoDb_1.34.4        
## [15] IRanges_2.32.0              S4Vectors_0.36.0           
## [17] BiocGenerics_0.44.0         forcats_0.5.2              
## [19] stringr_1.5.0               dplyr_1.0.10               
## [21] purrr_0.3.5                 readr_2.1.3                
## [23] tidyr_1.2.1                 tibble_3.1.8               
## [25] ggplot2_3.4.0               tidyverse_1.3.2            
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7           fs_1.5.2               bit64_4.0.5           
##  [4] lubridate_1.9.0        webshot_0.5.4          httr_1.4.4            
##  [7] tools_4.2.0            backports_1.4.1        bslib_0.4.1           
## [10] utf8_1.2.2             R6_2.5.1               DBI_1.1.3             
## [13] colorspace_2.0-3       withr_2.5.0            gridExtra_2.3         
## [16] tidyselect_1.2.0       bit_4.0.5              compiler_4.2.0        
## [19] cli_3.4.1              rvest_1.0.3            xml2_1.3.3            
## [22] DelayedArray_0.24.0    labeling_0.4.2         sass_0.4.4            
## [25] scales_1.2.1           systemfonts_1.0.4      digest_0.6.30         
## [28] svglite_2.1.0          rmarkdown_2.18         pkgconfig_2.0.3       
## [31] htmltools_0.5.3        highr_0.9              dbplyr_2.2.1          
## [34] fastmap_1.1.0          rlang_1.0.6            readxl_1.4.1          
## [37] rstudioapi_0.14        farver_2.1.1           jquerylib_0.1.4       
## [40] generics_0.1.3         jsonlite_1.8.3         vroom_1.6.0           
## [43] BiocParallel_1.32.4    googlesheets4_1.0.1    RCurl_1.98-1.9        
## [46] magrittr_2.0.3         GenomeInfoDbData_1.2.9 Matrix_1.5-3          
## [49] munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.3       
## [52] stringi_1.7.8          yaml_2.3.6             zlibbioc_1.44.0       
## [55] grid_4.2.0             parallel_4.2.0         crayon_1.5.2          
## [58] lattice_0.20-45        haven_2.5.1            hms_1.1.2             
## [61] knitr_1.41             pillar_1.8.1           codetools_0.2-18      
## [64] reprex_2.0.2           glue_1.6.2             evaluate_0.18         
## [67] modelr_0.1.10          vctrs_0.5.1            tzdb_0.3.0            
## [70] cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1      
## [73] cachem_1.0.6           xfun_0.35              broom_1.0.1           
## [76] googledrive_2.0.0      gargle_1.2.1           timechange_0.1.1      
## [79] ellipsis_0.3.2

Seq data analysis: Left side TO libs

E. coli ORBIT 2022

Scott H. Saunders

Read in data and figure out TO / read orientation

Match reads to putative TOs

How many of the intended TO mutations were recovered?