import pandas as pd
import Bio.Seq as Seq
import Bio.SeqIO
pd.options.display.max_colwidth = 200
The cloning scheme we will use to get single-stranded oligos with no PCR-handle overhangs from the oligo pool comes from this paper on 'MO-MAGE'. Note that the supplement contains detailed information.
Their strategy was to amplify subpools of oligos as usual, but with a few clever modifications. First, the reverse primer used for amplification carries a 5' phosphate group. The strand that incorporates this 5'-phosphorylated primer (the '-' strand) is then selectively degraded by lambda exonuclease (NEB). In addition, the forward primer used in that PCR appears to carry multiple phosphorothioate bonds at its 5' end, which selectively protect the '+' strand.
To cleave off the PCR handles, they placed a uracil in the forward primer to create a site for the USER enzyme, and they included a DpnII site in the reverse primer sequence. Annealing just the reverse primer creates a double-stranded template for DpnII, which then cleanly cleaves off the 3' end of the '+' strand. See the diagram below:
In this way, they got ssDNA oligos with no overhangs.
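To make the 3' trimming step concrete, here is a minimal string-level sketch that is not part of the original notebook. It assumes that DpnII/MboI cut only where the '+' strand is double-stranded (inside the stretch the reverse primer anneals to) and that both enzymes cut 5' of the G in GATC (^GATC). The function name `dpnii_trim_3prime` is hypothetical.

```python
# Sketch (assumption): DpnII/MboI only cut inside the annealed 3' handle,
# and both cut 5' of the G in GATC (^GATC).
SITE = "GATC"


def dpnii_trim_3prime(plus_strand: str, rev_handle: str) -> str:
    """Return the '+' strand after cutting at the first GATC inside the 3' handle.

    plus_strand: full single-stranded oligo, written 5'->3'
    rev_handle:  reverse complement of the reverse primer, i.e. the last
                 len(rev_handle) nt of plus_strand that the primer anneals to
    """
    plus = plus_strand.upper()
    handle = rev_handle.upper()
    if not plus.endswith(handle):
        raise ValueError("3' handle not found at the end of the oligo")
    handle_start = len(plus) - len(handle)
    # Only search the (assumed) double-stranded region; GATC sites in the
    # single-stranded part of the oligo are treated as uncut.
    hit = plus.find(SITE, handle_start)
    if hit == -1:
        raise ValueError("no GATC site in the annealed handle")
    return plus_strand[:hit]  # keep everything 5' of the G


# The lower-case middle segment shared by the ORBIT oligos in this notebook
# contains an internal GATC; it survives because it is not annealed.
middle = "ggcttgtcgacgacggcggtctccgtcgtcaggatcat"
handle = "gatcgttcgattctggttgg"  # rev_seq_comp of skpp-350-R
print(dpnii_trim_3prime(middle + handle, handle))
```

The internal GATC in `middle` is exactly why only the primer, not a full complementary strand, is annealed: a fully double-stranded oligo would also be cut there.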
I will modify their approach slightly, mainly because the USER enzyme is quite expensive and I would like to avoid it if possible. For my cloning scheme I will introduce two restriction sites: BtsI-v2 and DpnII/MboI. DpnII seemed to work well enough in their scheme for cleaving the 3' end, but it requires a special buffer, so I will actually start with MboI, which cuts at the exact same sequence (^GATC). Throughout this document I may refer to the DpnII site, but remember that downstream I will actually use MboI. I chose BtsI-v2 for the 5' end because it cuts just outside its recognition site (GCAGTGNN^ on the '+' strand), so it leaves no 3' overhang on the product, and I can still make the last N a T, which allows us to use the USER enzyme if this approach fails.
Further, both of these enzymes are quite cheap, and both work in CutSmart buffer at 37 °C. Many other restriction enzymes could probably work for these two sites; for example, NlaIII could work for the 5' site as well, but it would not let us cleanly include a 'T' as a backup plan. It is still good to keep in mind for the future. Note that REs that work at temperatures above 50 °C might cause the guide oligo to melt, leaving ssDNA, which would probably reduce the efficiency of the reaction.
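To put a rough number on the melting concern, here is a sketch using the basic GC-content formula Tm = 64.9 + 41·(nG + nC − 16.4)/N. This is a swapped-in, crude estimate and not something the original notebook uses: it ignores salt, Mg2+, oligo concentration, and nearest-neighbor stacking, so a nearest-neighbor model would be more appropriate for real planning.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm in deg C from the basic GC-content formula.

    Tm = 64.9 + 41 * (nG + nC - 16.4) / N
    Assumption: only a ballpark for oligos longer than ~13 nt; it ignores
    buffer composition, oligo concentration and nearest-neighbor effects.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(s)


guide = "CCAACCAGAATCGAACGATC"  # skpp-350-R
print(round(basic_tm(guide), 1))  # 51.8
```

For this 20-mer, the estimate is about 52 °C: above a 37 °C digest, but close to the 50 °C range where higher-temperature enzymes would start to melt the guide.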
For short recognition sequences, we can simply find orthogonal Kosuri primers that already contain the desired restriction site sequences. Longer handles would actually benefit this particular order, since they would bring these oligos closer in length to the Reg-Seq constructs. The only issue is that when we clean up this reaction we will be trying to purify a 128 nt oligo away from ~20 nt handle fragments, which may already be difficult, and the longer those flanking handles are, the harder that step may be. That said, these PCR handles should have no homology to the genome and hopefully would not affect anything even if they were electroporated directly into cells.
So, this notebook finds orthogonal primers that match our restriction sites (adding 'CT' as the final two bases of the BtsI-v2 site) and appends them to the ORBIT sequences of interest.
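A matching string-level sketch for the 5' end, again not from the original notebook. It assumes BtsI-v2 cuts the '+' strand 2 nt 3' of GCAGTG (GCAGTGNN^), and that only the 5' handle is double-stranded because a complementary guide is annealed there. The function name `btsi_trim_5prime` is hypothetical. With the handle built in this notebook (primer + 'TGCT'), the cut falls at the very end of the handle, so the product starts at the first base of the ORBIT oligo.

```python
# Sketch (assumption): BtsI-v2 recognizes GCAGTG and cuts the top strand
# 2 nt further 3' (GCAGTGNN^), only inside the annealed 5' handle.
BTSI_SITE = "GCAGTG"


def btsi_trim_5prime(plus_strand: str, fwd_handle: str) -> str:
    """Return the '+' strand after a BtsI-v2 cut inside the 5' handle.

    plus_strand: full single-stranded oligo, written 5'->3'
    fwd_handle:  first len(fwd_handle) nt of plus_strand, assumed to be
                 made double-stranded by an annealed guide oligo
    """
    plus = plus_strand.upper()
    handle = fwd_handle.upper()
    if not plus.startswith(handle):
        raise ValueError("5' handle not found at the start of the oligo")
    # Look for the site only inside the (assumed) double-stranded handle
    hit = plus.find(BTSI_SITE, 0, len(handle))
    if hit == -1:
        raise ValueError("no GCAGTG site in the annealed handle")
    cut = hit + len(BTSI_SITE) + 2  # top-strand cut is 2 nt past the site
    return plus_strand[cut:]


handle = "ccgtagataacacaacgcagtgct"  # skpp-470-F + 'TGCT'
print(btsi_trim_5prime(handle + "AATCTCTCTG", handle))  # AATCTCTCTG
```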
First, let's read in the forward and reverse orthogonal primers. We'll also reverse complement the reverse primers, since we need them in that orientation to append to the 3' end of the ORBIT sequences.
df_rev = pd.DataFrame()
i = 0
for record in Bio.SeqIO.parse("reverse_finalprimers.fasta", "fasta"):
    # Store the primer, its reverse complement (appended to the 3' end), and its name
    df_rev.loc[i, 'rev_seq'] = str(record.seq)
    df_rev.loc[i, 'rev_seq_comp'] = str(record.seq.reverse_complement())
    df_rev.loc[i, 'rev_primer_name'] = record.name
    i = i + 1
df_rev
| | rev_seq | rev_seq_comp | rev_primer_name |
---|---|---|---|
0 | AAGTATCTTTCCTGTGCCCA | TGGGCACAGGAAAGATACTT | skpp-1-R |
1 | TGGTAGTAATAAGGGCGACC | GGTCGCCCTTATTACTACCA | skpp-2-R |
2 | AGGGGTATCGGATACTCAGA | TCTGAGTATCCGATACCCCT | skpp-3-R |
3 | ATCGATTCCCCGGATATAGC | GCTATATCCGGGGAATCGAT | skpp-4-R |
4 | TACTAACTGCTTCAGGCCAA | TTGGCCTGAAGCAGTTAGTA | skpp-5-R |
... | ... | ... | ... |
2995 | GTCCGTGTAGGATCGCCTTT | AAAGGCGATCCTACACGGAC | skpp-2996-R |
2996 | GACTCTAGTGCGGGTGGTAC | GTACCACCCGCACTAGAGTC | skpp-2997-R |
2997 | TTGACCAGGGTAAGCCGATC | GATCGGCTTACCCTGGTCAA | skpp-2998-R |
2998 | GATTCAAGACGGCACTCGGA | TCCGAGTGCCGTCTTGAATC | skpp-2999-R |
2999 | GTAACACCTGTTCGCCGACT | AGTCGGCGAACAGGTGTTAC | skpp-3000-R |
3000 rows × 3 columns
Looks good. Now let's look for reverse primers that end with the DpnII recognition site GATC.
df_rev_DpnII = df_rev.loc[df_rev['rev_seq'].str.endswith('GATC', na = False)]
df_rev_DpnII
| | rev_seq | rev_seq_comp | rev_primer_name |
---|---|---|---|
349 | CCAACCAGAATCGAACGATC | GATCGTTCGATTCTGGTTGG | skpp-350-R |
468 | GTGACATCACACGGTTGATC | GATCAACCGTGTGATGTCAC | skpp-469-R |
527 | AAGAGGGTCGTATTCCGATC | GATCGGAATACGACCCTCTT | skpp-528-R |
861 | CAGCTTTTGGACGATGGATC | GATCCATCGTCCAAAAGCTG | skpp-862-R |
1584 | AAAGCCCCACGGAATTGATC | GATCAATTCCGTGGGGCTTT | skpp-1585-R |
1695 | TCCGGCTCTCCCTTAAGATC | GATCTTAAGGGAGAGCCGGA | skpp-1696-R |
1856 | CGGCTAAGTGAAGTCCGATC | GATCGGACTTCACTTAGCCG | skpp-1857-R |
1888 | AACGGCAGGGATGAAAGATC | GATCTTTCATCCCTGCCGTT | skpp-1889-R |
1910 | ATCTTCGGAGGGGAGAGATC | GATCTCTCCCCTCCGAAGAT | skpp-1911-R |
2389 | GGCCGTTTAAGGGATCGATC | GATCGATCCCTTAAACGGCC | skpp-2390-R |
2889 | ATTGCGTTTCGCCATGGATC | GATCCATGGCGAAACGCAAT | skpp-2890-R |
2997 | TTGACCAGGGTAAGCCGATC | GATCGGCTTACCCTGGTCAA | skpp-2998-R |
Ok, there are 12 different reverse primers that end with the restriction site.
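Why is filtering on `rev_seq` ending in GATC enough? GATC is its own reverse complement, so the handle we append (`rev_seq_comp`) then starts with GATC, and a ^GATC cut lands exactly at the oligo/handle junction. The sketch below (not in the original notebook; `check_reverse_handle` is a hypothetical helper, and the reverse complement assumes plain A/C/G/T) checks this and also counts GATC sites per handle, since a primer such as skpp-2390-R contains two.

```python
# Complement table for plain A/C/G/T strings (assumption: no IUPAC codes)
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def check_reverse_handle(rev_seq: str, site: str = "GATC") -> dict:
    """Report whether the appended handle starts with the site, and how often it occurs."""
    handle = reverse_complement(rev_seq).upper()
    return {
        "handle": handle,
        "starts_with_site": handle.startswith(site),
        "n_sites": handle.count(site),  # non-overlapping matches
    }


for name, seq in [("skpp-350-R", "CCAACCAGAATCGAACGATC"),
                  ("skpp-2390-R", "GGCCGTTTAAGGGATCGATC")]:
    print(name, check_reverse_handle(seq))
```

A second site in the handle would most likely just release an extra short piece of handle, so it is probably harmless (and skpp-2390-R is not among the six primers used below), but it is worth flagging before ordering.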
Now let's read in the fwd primers.
df_fwd = pd.DataFrame()
i = 0
for record in Bio.SeqIO.parse("forward_finalprimers.fasta", "fasta"):
    # Forward primers are prepended as-is, so no reverse complement is needed
    df_fwd.loc[i, 'fwd_seq'] = str(record.seq)
    df_fwd.loc[i, 'fwd_primer_name'] = record.name
    #df_fwd.loc[i, 'fwd_rev_comp']=str(record.seq.reverse_complement())
    i = i + 1
df_fwd
| | fwd_seq | fwd_primer_name |
---|---|---|
0 | ATATAGATGCCGTCCTAGCG | skpp-1-F |
1 | CCCTTTAATCAGATGCGTCG | skpp-2-F |
2 | TTGGTCATGTGCTTTTCGTT | skpp-3-F |
3 | GGGTGGGTAAATGGTAATGC | skpp-4-F |
4 | TCCGACGGGGAGTATATACT | skpp-5-F |
... | ... | ... |
2995 | GTCGATCACCGCCCCTTTTA | skpp-2996-F |
2996 | CACGGAGGCAGCAAGACTTA | skpp-2997-F |
2997 | AGGTCGAAGTGTCGCGTAAA | skpp-2998-F |
2998 | TGTGCACTATCGATCACGGG | skpp-2999-F |
2999 | GTTTCGTTGTTTTCGGCCGT | skpp-3000-F |
3000 rows × 2 columns
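The two loading cells above grow a DataFrame one cell at a time with `.loc`, which works but can be slow for 3000 rows. A common pandas idiom is to collect rows in a list and build the DataFrame once. The sketch below (helper names are hypothetical, not from the original notebook) takes `(name, sequence)` pairs, which in the notebook would come from `Bio.SeqIO.parse(...)`, and uses a plain `str.translate` reverse complement, assuming sequences contain only A/C/G/T.

```python
import pandas as pd

# Complement table for plain A/C/G/T strings (assumption: no IUPAC codes)
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def primers_to_frame(records, prefix: str, with_revcomp: bool = False) -> pd.DataFrame:
    """Build a primer table from (name, sequence) pairs in one step.

    In the notebook the pairs could be produced with
    ((r.name, str(r.seq)) for r in Bio.SeqIO.parse(path, "fasta")).
    """
    rows = []
    for name, seq in records:
        row = {f"{prefix}_seq": seq}
        if with_revcomp:
            row[f"{prefix}_seq_comp"] = reverse_complement(seq)
        row[f"{prefix}_primer_name"] = name
        rows.append(row)
    return pd.DataFrame(rows)


df = primers_to_frame([("skpp-1-R", "AAGTATCTTTCCTGTGCCCA")], "rev", with_revcomp=True)
print(df.loc[0, "rev_seq_comp"])  # TGGGCACAGGAAAGATACTT
```

The column order matches the `df_rev` and `df_fwd` tables above.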
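Let's look for forward primers that end with the BtsI-v2 recognition site GCAGTG (written GCAGTGNN, since the enzyme cuts after the two following nucleotides). Remember that we'll add the two NN nucleotides after the primer sequence.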
df_fwd_BtsI = df_fwd.loc[df_fwd['fwd_seq'].str.endswith('GCAGTG', na = False)]
df_fwd_BtsI
| | fwd_seq | fwd_primer_name |
---|---|---|
Uh oh, no primers end with that 6 bp sequence. Only 4 end with the first 5 bp (GCAGT), so let's try just the first 4 bp, GCAG:
df_fwd_BtsI = df_fwd.loc[df_fwd['fwd_seq'].str.endswith('GCAG', na = False)]
df_fwd_BtsI
| | fwd_seq | fwd_primer_name |
---|---|---|
469 | CCGTAGATAACACAACGCAG | skpp-470-F |
765 | TCGTCTTAGTACGATCGCAG | skpp-766-F |
1074 | GATTGGATAAATGGCCGCAG | skpp-1075-F |
1140 | AATCTAAGACTCCGTCGCAG | skpp-1141-F |
2097 | GGACCTAGCATCAAACGCAG | skpp-2098-F |
2392 | CATGGAGAAGGCACTTGCAG | skpp-2393-F |
Ok, there are 6 fwd primers that contain this ending sequence.
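The search above relaxed the BtsI-v2 site by hand, from 6 bp to 5 bp to 4 bp. A sketch of the same relaxation as a loop (hypothetical helper `longest_site_prefix`, not from the original notebook) returns the longest prefix of the site that any primer ends with, together with the matching rows:

```python
import pandas as pd


def longest_site_prefix(seqs: pd.Series, site: str, min_len: int = 4):
    """Return (k, matches) for the longest site[:k] (k >= min_len) that some sequence ends with.

    Returns (0, empty Series) if no sequence ends with even site[:min_len].
    """
    for k in range(len(site), min_len - 1, -1):
        matches = seqs[seqs.str.endswith(site[:k], na=False)]
        if not matches.empty:
            return k, matches
    return 0, seqs.iloc[0:0]


sample = pd.Series(["ATATAGATGCCGTCCTAGCG",   # skpp-1-F
                    "CCGTAGATAACACAACGCAG",   # skpp-470-F
                    "CATGGAGAAGGCACTTGCAG"])  # skpp-2393-F
k, matches = longest_site_prefix(sample, "GCAGTG")
print(k, list(matches.index))
```

Applied to the notebook's data, `longest_site_prefix(df_fwd['fwd_seq'], 'GCAGTG')` would be the corresponding call; `site[k:]` then gives the bases still missing from the site.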
Now let's add the rest of the BtsI-v2 site and the final two NNs, making sure the last nucleotide is a 'T', which will allow us to use the USER enzyme as a backup plan. We'll make the two NNs 'CT' to balance the GC content. Therefore, to get GCAGTGNN from GCAG we need to add TGCT. Remember that we are not length-limited with these ORBIT oligos: the homology regions + attB are only 128 nt, or 168 nt with the two 20 nt primers (172 nt including the 4 added bases), well under our 200 nt limit. In fact, as the final step in combine_orders.ipynb, we will simply pad these oligos with random sequence to make them 200 nt.
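The completion rule and the length bookkeeping can be checked with a small sketch. `complete_btsi_handle` is a hypothetical helper, not from the original notebook: it finds the longest prefix of GCAGTG already present at the 3' end of the primer and appends the rest of the site plus the chosen NN ('CT'). It also recomputes the construct length and the reverse-primer start position that the construct cells further down hard-code as 152.

```python
def complete_btsi_handle(primer: str, site: str = "GCAGTG", nn: str = "CT") -> str:
    """Append whatever part of `site` is missing from the primer's 3' end, then NN."""
    p = primer.upper()
    # Longest k such that the primer already ends with site[:k] (0 if none)
    overlap = next((k for k in range(len(site), 0, -1) if p.endswith(site[:k])), 0)
    return primer + site[overlap:] + nn


fwd_full = complete_btsi_handle("CCGTAGATAACACAACGCAG")  # skpp-470-F
rev_handle = "GATCGTTCGATTCTGGTTGG"  # rev_seq_comp of skpp-350-R
orbit_len = 128  # homology regions + attB

total_len = len(fwd_full) + orbit_len + len(rev_handle)
rev_start = total_len - len(rev_handle)

print(fwd_full)              # CCGTAGATAACACAACGCAGTGCT
print(total_len, rev_start)  # 172 152
```

The handle ends in 'T', as required for the USER backup, and the 152 used for `reverse_primers_0` is simply 24 + 128.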
# Work on an explicit copy so pandas does not warn about assigning to a slice
df_fwd_BtsI = df_fwd_BtsI.copy()
# Complete the BtsI-v2 site (GCAG + TG) and add NN = 'CT'
df_fwd_BtsI['fwd_seq_full'] = df_fwd_BtsI['fwd_seq'] + 'TGCT'
df_fwd_BtsI
| | fwd_seq | fwd_primer_name | fwd_seq_full |
---|---|---|---|
469 | CCGTAGATAACACAACGCAG | skpp-470-F | CCGTAGATAACACAACGCAGTGCT |
765 | TCGTCTTAGTACGATCGCAG | skpp-766-F | TCGTCTTAGTACGATCGCAGTGCT |
1074 | GATTGGATAAATGGCCGCAG | skpp-1075-F | GATTGGATAAATGGCCGCAGTGCT |
1140 | AATCTAAGACTCCGTCGCAG | skpp-1141-F | AATCTAAGACTCCGTCGCAGTGCT |
2097 | GGACCTAGCATCAAACGCAG | skpp-2098-F | GGACCTAGCATCAAACGCAGTGCT |
2392 | CATGGAGAAGGCACTTGCAG | skpp-2393-F | CATGGAGAAGGCACTTGCAGTGCT |
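With that, we can concatenate the forward and reverse primer dataframes side by side, pairing each of the six forward primers with one of the first six DpnII reverse primers.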
df_fwd_rev = pd.concat([df_fwd_BtsI.reset_index(drop=True), df_rev_DpnII.iloc[:6].reset_index(drop=True)], axis=1, sort=False)
df_fwd_rev
| | fwd_seq | fwd_primer_name | fwd_seq_full | rev_seq | rev_seq_comp | rev_primer_name |
---|---|---|---|---|---|---|
0 | CCGTAGATAACACAACGCAG | skpp-470-F | CCGTAGATAACACAACGCAGTGCT | CCAACCAGAATCGAACGATC | GATCGTTCGATTCTGGTTGG | skpp-350-R |
1 | TCGTCTTAGTACGATCGCAG | skpp-766-F | TCGTCTTAGTACGATCGCAGTGCT | GTGACATCACACGGTTGATC | GATCAACCGTGTGATGTCAC | skpp-469-R |
2 | GATTGGATAAATGGCCGCAG | skpp-1075-F | GATTGGATAAATGGCCGCAGTGCT | AAGAGGGTCGTATTCCGATC | GATCGGAATACGACCCTCTT | skpp-528-R |
3 | AATCTAAGACTCCGTCGCAG | skpp-1141-F | AATCTAAGACTCCGTCGCAGTGCT | CAGCTTTTGGACGATGGATC | GATCCATCGTCCAAAAGCTG | skpp-862-R |
4 | GGACCTAGCATCAAACGCAG | skpp-2098-F | GGACCTAGCATCAAACGCAGTGCT | AAAGCCCCACGGAATTGATC | GATCAATTCCGTGGGGCTTT | skpp-1585-R |
5 | CATGGAGAAGGCACTTGCAG | skpp-2393-F | CATGGAGAAGGCACTTGCAGTGCT | TCCGGCTCTCCCTTAAGATC | GATCTTAAGGGAGAGCCGGA | skpp-1696-R |
And finally, let's trim the table down to just the sequences we will append to the beginning (fwd_seq_full) and end (rev_seq_comp) of the ORBIT oligos, plus the primer names. In the future, we can return to this notebook to get the actual primer sequences we will use to amplify the ORBIT constructs.
df_fwd_rev = df_fwd_rev[['fwd_seq_full', 'rev_seq_comp', 'fwd_primer_name','rev_primer_name']]
df_fwd_rev
| | fwd_seq_full | rev_seq_comp | fwd_primer_name | rev_primer_name |
---|---|---|---|---|
0 | CCGTAGATAACACAACGCAGTGCT | GATCGTTCGATTCTGGTTGG | skpp-470-F | skpp-350-R |
1 | TCGTCTTAGTACGATCGCAGTGCT | GATCAACCGTGTGATGTCAC | skpp-766-F | skpp-469-R |
2 | GATTGGATAAATGGCCGCAGTGCT | GATCGGAATACGACCCTCTT | skpp-1075-F | skpp-528-R |
3 | AATCTAAGACTCCGTCGCAGTGCT | GATCCATCGTCCAAAAGCTG | skpp-1141-F | skpp-862-R |
4 | GGACCTAGCATCAAACGCAGTGCT | GATCAATTCCGTGGGGCTTT | skpp-2098-F | skpp-1585-R |
5 | CATGGAGAAGGCACTTGCAGTGCT | GATCTTAAGGGAGAGCCGGA | skpp-2393-F | skpp-1696-R |
Now let's actually make our final TWIST constructs that contain our PCR handles, RE sites, and ORBIT targeting oligo.
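The per-library cells in this section repeat the same steps: lower-case the handles, wrap each oligo, label the construct, and record the primer `(index, position)` tuples. Here is a sketch of those steps as one function (hypothetical name `build_constructs`, not from the original notebook). Unlike the cells, it returns a new DataFrame instead of adding columns to the input, and it computes the reverse-primer position from the sequence length rather than hard-coding 152.

```python
import pandas as pd


def build_constructs(oligos: pd.DataFrame, ends: pd.Series, construct: str) -> pd.DataFrame:
    """Wrap each ORBIT oligo in lower-case handles and record primer positions.

    oligos: DataFrame with an 'oligo' column
    ends:   one row of df_fwd_rev (fwd_seq_full, rev_seq_comp,
            fwd_primer_name, rev_primer_name)
    """
    ends = ends.str.lower()
    fwd, rev = ends["fwd_seq_full"], ends["rev_seq_comp"]

    out = pd.DataFrame({"seq": fwd + oligos["oligo"] + rev})
    out["construct"] = construct

    # 'skpp-N-F' / 'skpp-N-R' -> zero-based primer index N - 1
    fwd_idx = int(ends["fwd_primer_name"].split("-")[1]) - 1
    rev_idx = int(ends["rev_primer_name"].split("-")[1]) - 1

    # Forward primer starts at 0; reverse primer starts len(rev) nt before the end
    out["forward_primers_0"] = [(fwd_idx, 0)] * len(out)
    out["reverse_primers_0"] = [(rev_idx, int(n) - len(rev)) for n in out["seq"].str.len()]
    return out
```

Under these assumptions, `build_constructs(df_1, df_fwd_rev.iloc[0], 'orbit_tf_del_FL_short')` should reproduce the columns of `df_1_clean`.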
df_1 = pd.read_csv("twist_orbit_tf_del_FL_short.csv")
df_2 = pd.read_csv("twist_orbit_tf_del_FL_long.csv")
df_3 = pd.read_csv("twist_orbit_tf_del_AO_short.csv")
df_4 = pd.read_csv("twist_orbit_tf_del_AO_long.csv")
len(df_1['oligo'][1])
128
ends_1 = df_fwd_rev.iloc[0,:].str.lower()
df_1['seq'] = ends_1['fwd_seq_full'] + df_1['oligo'] + ends_1['rev_seq_comp']
df_1['construct'] = 'orbit_tf_del_FL_short'
df_1['forward_primers_0'] = [(int(ends_1['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_1['construct'])
df_1['reverse_primers_0'] = [(int(ends_1['rev_primer_name'].split('-')[1])-1, 152)] * len(df_1['construct'])
df_1_clean = df_1[['seq','construct','forward_primers_0','reverse_primers_0']]
df_1_clean
| | seq | construct | forward_primers_0 | reverse_primers_0 |
---|---|---|---|---|
0 | ccgtagataacacaacgcagtgctAATCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAGGGTGTTACATGAATTCATACTCAATTGCTGTCATCGGAGTGgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
1 | ccgtagataacacaacgcagtgctTATGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATCTCTGCCCCGTCGTTTCTGACGGCGGGGAAAATGTTGCTTAgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
2 | ccgtagataacacaacgcagtgctGATGAATGAGTTTTCTATAAACTTATACTTAATAATTAGAAGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCTGTgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
3 | ccgtagataacacaacgcagtgctGCTTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAAAAATTTAGCTAAACACATATGAATTTTCAGATGTGTTTTATCgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
4 | ccgtagataacacaacgcagtgctGGCTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATCGGCTTTTTTAATCCCATACTTTTCCACAGGTAGATCCCAAgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
... | ... | ... | ... | ... |
69 | ccgtagataacacaacgcagtgctTAAGGGCATCTGTTTTTTATATTCAAGAATGAAAAATTTTTGTCAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCGAAgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
70 | ccgtagataacacaacgcagtgctATATGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAACAAATTTTATCAGGTGACGTTCCGTAAAAAGTTGTATGGAGgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
71 | ccgtagataacacaacgcagtgctAGCCATGCACCGTAGACCAGATAAGCTCAGCGCATCCGGCAGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGTTTgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
72 | ccgtagataacacaacgcagtgctGGTTATTTAACGGCGCGAGTGTAATCCTGCCAGTGCAAAAAATCAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTTGTgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
73 | ccgtagataacacaacgcagtgctGGTAAAGTAAGGACATTCTTAACCCCCACTTTGAGGTGCCCGATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAAGAGGGCGTACATCCTTGTACACGTCGGGCAGGAGGGATTAATgatcgttcgattctggttgg | orbit_tf_del_FL_short | (469, 0) | (349, 152) |
74 rows × 4 columns
ends_2 = df_fwd_rev.iloc[1,:].str.lower()
df_2['seq'] = ends_2['fwd_seq_full'] + df_2['oligo'] + ends_2['rev_seq_comp']
df_2['construct'] = 'twist_orbit_tf_del_FL_long'
df_2['forward_primers_0'] = [(int(ends_2['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_2['construct'])
df_2['reverse_primers_0'] = [(int(ends_2['rev_primer_name'].split('-')[1])-1, 152)] * len(df_2['construct'])
df_2_clean = df_2[['seq','construct','forward_primers_0','reverse_primers_0']]
df_2_clean
| | seq | construct | forward_primers_0 | reverse_primers_0 |
---|---|---|---|---|
0 | tcgtcttagtacgatcgcagtgctCTATATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATTCATATTGTACTGTTACGTTGTACAAACCTGTGCCAACGGGgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
1 | tcgtcttagtacgatcgcagtgctGAGTCTGGCGGATGTCGACAGACTCTATTTTTTTATGCAGTTTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACTGGgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
2 | tcgtcttagtacgatcgcagtgctCGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
3 | tcgtcttagtacgatcgcagtgctGTGGCTCTTGCCACGGTTCAGCATCGGCAAACAGATCCAACATTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGGTCgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
4 | tcgtcttagtacgatcgcagtgctTTAGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTTTTAACCTTAACGAAGAGCTATATTAATAACGGCATCAGCgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
... | ... | ... | ... | ... |
221 | tcgtcttagtacgatcgcagtgctAAAGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAACGTCAGAAGGTTAATTCTGTTTCCAGCAGCGTCAGGATACTTgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
222 | tcgtcttagtacgatcgcagtgctCGCGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATGGCGCGATAACGTAGAAAGGCTTCCCGAAGGAAGCCTTGATgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
223 | tcgtcttagtacgatcgcagtgctCTATGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTATAAAAAAAACTTATTATTTATTTTAGTTTTTATCAGTGGgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
224 | tcgtcttagtacgatcgcagtgctTGACGATTTTCCCCGTTCCCGGTTGCTGTACCGGGAACGTATTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGAAAgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
225 | tcgtcttagtacgatcgcagtgctCCGGAAAGATATCGGCTGGCGCGCTATCGAACGCGAGCAGAACTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATCCTTGTGGGTCCTTACGCGTAATATTGACCGGAAGCCAGAGGgatcaaccgtgtgatgtcac | twist_orbit_tf_del_FL_long | (765, 0) | (468, 152) |
226 rows × 4 columns
ends_3 = df_fwd_rev.iloc[2,:].str.lower()
df_3['seq'] = ends_3['fwd_seq_full'] + df_3['oligo'] + ends_3['rev_seq_comp']
df_3['construct'] = 'orbit_tf_del_AO_short'
df_3['forward_primers_0'] = [(int(ends_3['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_3['construct'])
df_3['reverse_primers_0'] = [(int(ends_3['rev_primer_name'].split('-')[1])-1, 152)] * len(df_3['construct'])
df_3_clean = df_3[['seq','construct','forward_primers_0','reverse_primers_0']]
df_3_clean
| | seq | construct | forward_primers_0 | reverse_primers_0 |
---|---|---|---|---|
0 | gattggataaatggccgcagtgctCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGAACggcttgtcgacgacggcggtctccgtcgtcaggatcatCAACGCTGTAAACTTATTTGAGGGTGTTACATGAATTCATACTCAgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
1 | gattggataaatggccgcagtgctGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGCGAggcttgtcgacgacggcggtctccgtcgtcaggatcatCTGTTCGACCAGGAGCTTTAATCTCTGCCCCGTCGTTTCTGACGGgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
2 | gattggataaatggccgcagtgctAAACTTATACTTAATAATTAGAAGTTACATATCATCAGCTGTGTAggcttgtcgacgacggcggtctccgtcgtcaggatcatAAGCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
3 | gattggataaatggccgcagtgctTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGTCAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTAAGAACATTTGCAGTTAAAAATTTAGCTAAACACATATGAATgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
4 | gattggataaatggccgcagtgctTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatATGCGTACCATCAAGCCCTGATCGGCTTTTTTAATCCCATACTTTgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
... | ... | ... | ... | ... |
69 | gattggataaatggccgcagtgctATATTCAAGAATGAAAAATTTTTGTCATTCCTTATGCTCCTTACAggcttgtcgacgacggcggtctccgtcgtcaggatcatCGCCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
70 | gattggataaatggccgcagtgctTGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGCTTggcttgtcgacgacggcggtctccgtcgtcaggatcatAATCTCAAAAGACGATACTGAACAAATTTTATCAGGTGACGTTCCgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
71 | gattggataaatggccgcagtgctAGATAAGCTCAGCGCATCCGGCAGTTATGCCGCACGTTCATCCCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACTCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
72 | gattggataaatggccgcagtgctGTGTAATCCTGCCAGTGCAAAAAATCAACAACCACTCTTAACGCCggcttgtcgacgacggcggtctccgtcgtcaggatcatATACATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
73 | gattggataaatggccgcagtgctAAAGTAAGGACATTCTTAACCCCCACTTTGAGGTGCCCGATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatGTGAAAAAGAAACCGCGTTAAGAGGGCGTACATCCTTGTACACGTgatcggaatacgaccctctt | orbit_tf_del_AO_short | (1074, 0) | (527, 152) |
74 rows × 4 columns
ends_4 = df_fwd_rev.iloc[3,:].str.lower()
df_4['seq'] = ends_4['fwd_seq_full'] + df_4['oligo'] + ends_4['rev_seq_comp']
df_4['construct'] = 'orbit_tf_del_AO_long'
df_4['forward_primers_0'] = [(int(ends_4['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_4['construct'])
df_4['reverse_primers_0'] = [(int(ends_4['rev_primer_name'].split('-')[1])-1, 152)] * len(df_4['construct'])
df_4_clean = df_4[['seq','construct','forward_primers_0','reverse_primers_0']]
df_4_clean
| | seq | construct | forward_primers_0 | reverse_primers_0 |
---|---|---|---|---|
0 | aatctaagactccgtcgcagtgctTATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatGGGCGCGGGAAAGAGAAGTAATTCATATTGTACTGTTACGTTGTAgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
1 | aatctaagactccgtcgcagtgctCAGACTCTATTTTTTTATGCAGTTTTAACTTTGCAGATAGCCGCAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGCCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
2 | aatctaagactccgtcgcagtgctAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTggcttgtcgacgacggcggtctccgtcgtcaggatcatTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
3 | aatctaagactccgtcgcagtgctCAGCATCGGCAAACAGATCCAACATTACCTCTCCTCATTTTCAGCggcttgtcgacgacggcggtctccgtcgtcaggatcatTTTCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
4 | aatctaagactccgtcgcagtgctGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGAGGggcttgtcgacgacggcggtctccgtcgtcaggatcatAGAGAACGCACTGTCGCCTGATTTTTAACCTTAACGAAGAGCTATgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
... | ... | ... | ... | ... |
221 | aatctaagactccgtcgcagtgctGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGCGCCCGTTTTCAGGGCTAACGTCAGAAGGTTAATTCTGTTTCCgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
222 | aatctaagactccgtcgcagtgctGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGAGAggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCAGTTACGACAGATTTGATGGCGCGATAACGTAGAAAGGCTTgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
223 | aatctaagactccgtcgcagtgctTGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGTACggcttgtcgacgacggcggtctccgtcgtcaggatcatCGTGAGGTTAATCGTGATTGATTATAAAAAAAACTTATTATTTATgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
224 | aatctaagactccgtcgcagtgctCCGGTTGCTGTACCGGGAACGTATTTAATTCCCCTGCATCGCCCGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAGCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
225 | aatctaagactccgtcgcagtgctGCGCGCTATCGAACGCGAGCAGAACTAACGCGACAGTTTTGCCAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCGTCATCCTTGTGGGTCCTTACGCGTAATATTGACCGGAAGCCAGgatccatcgtccaaaagctg | orbit_tf_del_AO_long | (1140, 0) | (861, 152) |
226 rows × 4 columns
df_1_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_short.csv")
df_2_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_long.csv")
df_3_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_short.csv")
df_4_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_long.csv")
df_sRNA = pd.read_csv("twist_orbit_small_RNA.csv")
df_sRNA
| | Unnamed: 0 | Gene Name | Product Name | Left-End-Position | Right-End-Position | length | Direction | left_oligo_pos | right_oligo_pos | oligo |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3'ETS-<i>leuZ</i> | small regulatory RNA 3'ETS<sup><i>leuZ</i></sup> | 1991748 | 1991814 | 66 | + | 1991747 | 1991815 | TGGTGATTAAAAATTAAGGAGGGTGTAACGACAAGTTGCAGGCACggcttgtcgacgacggcggtctccgtcgtcaggatcatTGGTACCCGGAGCGGGACTTGAACCCGCACAGCGCGAACGCCGAG |
1 | 1 | agrA | small RNA AgrA | 3648063 | 3648144 | 81 | + | 3648062 | 3648145 | AGCACGTCCTTGCAATAGTTTCAGTATGGTATTAGCATTGATGCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACATCCGGATTCGGACAAGGCTTAATATGACGATGACCCAGTGAA |
2 | 2 | agrB | small regulatory RNA AgrB | 3648294 | 3648375 | 81 | + | 3648293 | 3648376 | CGCTAATTCTTGCAATGTTAGCCACTGGCTAATAGTATTGAGCTGggcttgtcgacgacggcggtctccgtcgtcaggatcatACGTCCTGATTCAGACCTCCTTTCAAATGAATAGCCAACTCAAAA |
3 | 3 | arcZ | small regulatory RNA ArcZ | 3350577 | 3350697 | 120 | + | 3350576 | 3350698 | ACTGATTCATGTAACAAATCATTTAAGTTTTGCTATCTTAACTGCggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTGGCTTTTGCCACCCACGCTTTCAGCACTTCTACGTCGTGACG |
4 | 4 | arrS | small regulatory RNA ArrS | 3657985 | 3658054 | 69 | + | 3657984 | 3658055 | CTGAAGACATGAATGCGTTATTTACTCAGGTAATTTCAATGCGTTggcttgtcgacgacggcggtctccgtcgtcaggatcatATTTTAACTTTAGTAATATTCTTCAGAGATCACAAACTGGTTATT |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
88 | 88 | sroH | small RNA SroH | 4190327 | 4190487 | 160 | + | 4190326 | 4190488 | AGAGATCTGATTGTAAGAGAGTAAATACTCAACTATGATAGAGACggcttgtcgacgacggcggtctccgtcgtcaggatcatGTTATTTTGAGGGCTGAGGAAGCTGCTTATTTCTCAATAAGTTGT |
89 | 89 | ssrA | tmRNA | 2755593 | 2755955 | 362 | + | 2755592 | 2755956 | CTGGTCATGGCGCTCATAAATCTGGTATACTTACCTTTACACATTggcttgtcgacgacggcggtctccgtcgtcaggatcatAAATTCTCCATCGGTGATTACCAGAGTCATCCGATGAAGTCCTAA |
90 | 90 | ssrS | 6S RNA | 3055983 | 3056165 | 182 | + | 3055982 | 3056166 | ATGACACTTTTCGGTTTACTGTGGTAGAGTAACCGTGAAGACAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCCTTCTTATCTGGCACCAGCCATGACGCAACTACCAGAACTCCCA |
91 | 91 | symR | small regulatory RNA antitoxin SymR | 4579835 | 4579911 | 76 | + | 4579834 | 4579912 | TAGCTGGACTTTCCCCATATTTACTGATGATATATACAGGTATTTggcttgtcgacgacggcggtctccgtcgtcaggatcatGACACGCATTCTATTGCACAACCGTTCGAAGCAGAAGTCTCCCCG |
92 | 92 | tff | putative small RNA T44 | 189712 | 189847 | 135 | + | 189711 | 189848 | GCATGGAAACAGTTGCCATGATTAAAACCTCTATATAAAAGTTGGggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCGCTTTATACCACAAATACGTCGTGGACACCAATAATTGTTG |
93 rows × 10 columns
ends_sRNA = df_fwd_rev.iloc[4,:].str.lower()
df_sRNA['seq'] = ends_sRNA['fwd_seq_full'] + df_sRNA['oligo'] + ends_sRNA['rev_seq_comp']
df_sRNA['construct'] = 'orbit_small_RNA'
df_sRNA['forward_primers_0'] = [(int(ends_sRNA['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_sRNA['construct'])
df_sRNA['reverse_primers_0'] = [(int(ends_sRNA['rev_primer_name'].split('-')[1])-1, 152)] * len(df_sRNA['construct'])
df_sRNA_clean = df_sRNA[['seq','construct','forward_primers_0','reverse_primers_0']]
df_sRNA_clean
| | seq | construct | forward_primers_0 | reverse_primers_0 |
---|---|---|---|---|
0 | ggacctagcatcaaacgcagtgctTGGTGATTAAAAATTAAGGAGGGTGTAACGACAAGTTGCAGGCACggcttgtcgacgacggcggtctccgtcgtcaggatcatTGGTACCCGGAGCGGGACTTGAACCCGCACAGCGCGAACGCCGAGgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
1 | ggacctagcatcaaacgcagtgctAGCACGTCCTTGCAATAGTTTCAGTATGGTATTAGCATTGATGCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACATCCGGATTCGGACAAGGCTTAATATGACGATGACCCAGTGAAgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
2 | ggacctagcatcaaacgcagtgctCGCTAATTCTTGCAATGTTAGCCACTGGCTAATAGTATTGAGCTGggcttgtcgacgacggcggtctccgtcgtcaggatcatACGTCCTGATTCAGACCTCCTTTCAAATGAATAGCCAACTCAAAAgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
3 | ggacctagcatcaaacgcagtgctACTGATTCATGTAACAAATCATTTAAGTTTTGCTATCTTAACTGCggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTGGCTTTTGCCACCCACGCTTTCAGCACTTCTACGTCGTGACGgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
4 | ggacctagcatcaaacgcagtgctCTGAAGACATGAATGCGTTATTTACTCAGGTAATTTCAATGCGTTggcttgtcgacgacggcggtctccgtcgtcaggatcatATTTTAACTTTAGTAATATTCTTCAGAGATCACAAACTGGTTATTgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
... | ... | ... | ... | ... |
88 | ggacctagcatcaaacgcagtgctAGAGATCTGATTGTAAGAGAGTAAATACTCAACTATGATAGAGACggcttgtcgacgacggcggtctccgtcgtcaggatcatGTTATTTTGAGGGCTGAGGAAGCTGCTTATTTCTCAATAAGTTGTgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
89 | ggacctagcatcaaacgcagtgctCTGGTCATGGCGCTCATAAATCTGGTATACTTACCTTTACACATTggcttgtcgacgacggcggtctccgtcgtcaggatcatAAATTCTCCATCGGTGATTACCAGAGTCATCCGATGAAGTCCTAAgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
90 | ggacctagcatcaaacgcagtgctATGACACTTTTCGGTTTACTGTGGTAGAGTAACCGTGAAGACAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCCTTCTTATCTGGCACCAGCCATGACGCAACTACCAGAACTCCCAgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
91 | ggacctagcatcaaacgcagtgctTAGCTGGACTTTCCCCATATTTACTGATGATATATACAGGTATTTggcttgtcgacgacggcggtctccgtcgtcaggatcatGACACGCATTCTATTGCACAACCGTTCGAAGCAGAAGTCTCCCCGgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
92 | ggacctagcatcaaacgcagtgctGCATGGAAACAGTTGCCATGATTAAAACCTCTATATAAAAGTTGGggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCGCTTTATACCACAAATACGTCGTGGACACCAATAATTGTTGgatcaattccgtggggcttt | orbit_small_RNA | (2097, 0) | (1584, 152) |
93 rows × 4 columns
df_sRNA_clean.to_csv("../../../../data/twist_order/twist_orbit_small_RNA.csv")
%load_ext watermark
%watermark -v -p wgregseq,numpy,pandas
CPython 3.8.5 IPython 7.19.0 wgregseq 0.0.1 numpy 1.18.1 pandas 1.2.0