Twist: ORBIT cloning_scheme


In [24]:
import pandas as pd
import Bio.Seq as Seq
import Bio.SeqIO

pd.options.display.max_colwidth = 200

The cloning scheme that we will use to get single-stranded oligos with no PCR handle overhangs from the oligo pool comes from this paper on 'MO-MAGE'. Note that there is detailed information in the supplement.

Their strategy was to amplify subpools of oligos as usual, with a few clever modifications. First, the reverse primer is modified to include a 5' phosphate group. The strand carrying this 5' phosphate, i.e. the '-' strand, is then selectively degraded by lambda exonuclease (neb link). It also seems that the fwd primer used in that PCR carries multiple PO bonds at its 5' end, which selectively protect the '+' strand.

To cleave off the PCR handles, they put a uracil in the fwd primer to introduce a site for the USER enzyme, and they included a DpnII site in the reverse primer sequence. Annealing just the reverse primer creates a double-stranded template for DpnII, which cleanly cleaves the handle off the 3' end of the '+' strand. See the diagram below:

drawing

In this way, they got ssDNA oligos with no overhangs.

I will modify their approach slightly, mainly because the USER enzyme is quite expensive and I would like to avoid it if possible. For my cloning scheme I will introduce two restriction sites: BtsI-V2 and DpnII / MboI. DpnII seemed to work well enough in their scheme for cleaving the 3' end, but it requires a special buffer, so I will actually start with MboI, which cleaves at the exact same sequence. Throughout this document I may refer to the DpnII site, but remember that downstream I will actually use MboI. I chose BtsI-V2 for the 5' end because it does not leave a 3' overhang, and I can still include a T as the 5' overhang, allowing us to use the USER enzyme if this approach fails.

drawingdrawing

Further, both of these enzymes are quite cheap, and they both work in the CutSmart buffer at 37 degrees. I think that many different restriction enzymes could work for these two sites. For example, NlaIII could work for the 5' site as well, but it wouldn't allow us to cleanly include a 'T' as a backup plan. Still, it is good to keep in mind for the future. Also keep in mind that REs that work at temps > 50 degrees might cause the guide oligo to melt, leaving ssDNA, which would probably reduce the efficiency of the reaction.

For short recognition sequences, we can actually just find orthogonal Kosuri primers that already have the desired restriction site sequences. It would actually benefit this particular order to have longer sequences, to be closer in length to the reg-seq constructs. The only issue is that when we purify / clean up this reaction we will be trying to purify a 128 bp oligo from a 20 bp oligo, which may already be difficult. The longer those flanking oligos are, the harder that step may be. That said, these PCR handles should have no homology to the genome and hopefully wouldn't affect anything even if they are electroporated directly into cells.

So, this notebook will find orthogonal primers that match our restriction sites (and add a 'CT' to BtsI-V2 site) to append to the ORBIT sequences of interest.
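
To make the intended end product concrete, here is a minimal in-silico sketch of the trimming step. It assumes that BtsI-V2 cuts the '+' strand 2 nt downstream of GCAGTG and that DpnII / MboI cut immediately 5' of GATC; both cut positions should be checked against the supplier's documentation. The cuts are placed by handle position rather than by searching for every site, because only the annealed handle regions are double stranded. The names `trim_handles` and `TOY_OLIGO` are hypothetical, and the toy oligo is a placeholder rather than a real ORBIT sequence.

```python
BTSI_SITE = "GCAGTG"   # BtsI-V2 recognition sequence
GATC_SITE = "GATC"     # DpnII / MboI recognition sequence

FWD_HANDLE = "CCGTAGATAACACAACGCAGTGCT"  # skpp-470-F plus 'TGCT'
REV_HANDLE = "GATCGTTCGATTCTGGTTGG"      # reverse complement of skpp-350-R
TOY_OLIGO = "AAACCCGGGTTTAAACCC"         # hypothetical placeholder insert


def trim_handles(construct, fwd_handle, rev_handle):
    """Return what is left of `construct` after both handles are removed.

    Assumed cut positions (not confirmed by this notebook):
    - BtsI-V2 cuts the '+' strand 2 nt after GCAGTG,
    - DpnII / MboI cut immediately before GATC.
    """
    seq, fwd, rev = construct.upper(), fwd_handle.upper(), rev_handle.upper()
    if not (seq.startswith(fwd) and seq.endswith(rev)):
        raise ValueError("construct is not flanked by the given handles")

    # The 5' cut must land exactly at the end of the forward handle.
    five_cut = fwd.rfind(BTSI_SITE) + len(BTSI_SITE) + 2
    if five_cut != len(fwd):
        raise ValueError("forward handle does not end with GCAGTGNN")

    # The 3' cut must land exactly at the start of the reverse handle.
    if not rev.startswith(GATC_SITE):
        raise ValueError("reverse handle does not start with GATC")
    three_cut = len(seq) - len(rev)

    return construct[five_cut:three_cut]


construct = FWD_HANDLE.lower() + TOY_OLIGO + REV_HANDLE.lower()
print(trim_handles(construct, FWD_HANDLE, REV_HANDLE))
```

If the assumed cut positions hold, the function returns the insert with neither handle attached, which is the goal of the scheme.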


Generate Orthogonal primers with RE sites

First, let's read in the orthogonal primers, starting with the reverse set. We'll go ahead and reverse complement the reverse primers since we will need them in that format to append to the ORBIT sequences.

In [25]:
# Collect one row per FASTA record, then build the DataFrame in a single step.
rev_records = []

for record in Bio.SeqIO.parse("reverse_finalprimers.fasta", "fasta"):
    rev_records.append({
        'rev_seq': str(record.seq),
        'rev_seq_comp': str(record.seq.reverse_complement()),
        'rev_primer_name': record.name,
    })

df_rev = pd.DataFrame(rev_records)
In [26]:
df_rev
Out[26]:
rev_seq rev_seq_comp rev_primer_name
0 AAGTATCTTTCCTGTGCCCA TGGGCACAGGAAAGATACTT skpp-1-R
1 TGGTAGTAATAAGGGCGACC GGTCGCCCTTATTACTACCA skpp-2-R
2 AGGGGTATCGGATACTCAGA TCTGAGTATCCGATACCCCT skpp-3-R
3 ATCGATTCCCCGGATATAGC GCTATATCCGGGGAATCGAT skpp-4-R
4 TACTAACTGCTTCAGGCCAA TTGGCCTGAAGCAGTTAGTA skpp-5-R
... ... ... ...
2995 GTCCGTGTAGGATCGCCTTT AAAGGCGATCCTACACGGAC skpp-2996-R
2996 GACTCTAGTGCGGGTGGTAC GTACCACCCGCACTAGAGTC skpp-2997-R
2997 TTGACCAGGGTAAGCCGATC GATCGGCTTACCCTGGTCAA skpp-2998-R
2998 GATTCAAGACGGCACTCGGA TCCGAGTGCCGTCTTGAATC skpp-2999-R
2999 GTAACACCTGTTCGCCGACT AGTCGGCGAACAGGTGTTAC skpp-3000-R

3000 rows × 3 columns

Looks good. Now let's look for specific primers that end with the DpnII recognition site GATC.

In [27]:
df_rev_DpnII = df_rev.loc[df_rev['rev_seq'].str.endswith('GATC', na = False)]
df_rev_DpnII
Out[27]:
rev_seq rev_seq_comp rev_primer_name
349 CCAACCAGAATCGAACGATC GATCGTTCGATTCTGGTTGG skpp-350-R
468 GTGACATCACACGGTTGATC GATCAACCGTGTGATGTCAC skpp-469-R
527 AAGAGGGTCGTATTCCGATC GATCGGAATACGACCCTCTT skpp-528-R
861 CAGCTTTTGGACGATGGATC GATCCATCGTCCAAAAGCTG skpp-862-R
1584 AAAGCCCCACGGAATTGATC GATCAATTCCGTGGGGCTTT skpp-1585-R
1695 TCCGGCTCTCCCTTAAGATC GATCTTAAGGGAGAGCCGGA skpp-1696-R
1856 CGGCTAAGTGAAGTCCGATC GATCGGACTTCACTTAGCCG skpp-1857-R
1888 AACGGCAGGGATGAAAGATC GATCTTTCATCCCTGCCGTT skpp-1889-R
1910 ATCTTCGGAGGGGAGAGATC GATCTCTCCCCTCCGAAGAT skpp-1911-R
2389 GGCCGTTTAAGGGATCGATC GATCGATCCCTTAAACGGCC skpp-2390-R
2889 ATTGCGTTTCGCCATGGATC GATCCATGGCGAAACGCAAT skpp-2890-R
2997 TTGACCAGGGTAAGCCGATC GATCGGCTTACCCTGGTCAA skpp-2998-R

Ok, there are 12 different reverse primers that contain the restriction site.

Now let's read in the fwd primers.

In [28]:
# Collect one row per FASTA record, then build the DataFrame in a single step.
fwd_records = []

for record in Bio.SeqIO.parse("forward_finalprimers.fasta", "fasta"):
    fwd_records.append({
        'fwd_seq': str(record.seq),
        'fwd_primer_name': record.name,
        #'fwd_rev_comp': str(record.seq.reverse_complement()),
    })

df_fwd = pd.DataFrame(fwd_records)
In [29]:
df_fwd
Out[29]:
fwd_seq fwd_primer_name
0 ATATAGATGCCGTCCTAGCG skpp-1-F
1 CCCTTTAATCAGATGCGTCG skpp-2-F
2 TTGGTCATGTGCTTTTCGTT skpp-3-F
3 GGGTGGGTAAATGGTAATGC skpp-4-F
4 TCCGACGGGGAGTATATACT skpp-5-F
... ... ...
2995 GTCGATCACCGCCCCTTTTA skpp-2996-F
2996 CACGGAGGCAGCAAGACTTA skpp-2997-F
2997 AGGTCGAAGTGTCGCGTAAA skpp-2998-F
2998 TGTGCACTATCGATCACGGG skpp-2999-F
2999 GTTTCGTTGTTTTCGGCCGT skpp-3000-F

3000 rows × 2 columns

Let's look for the BtsI recognition site GCAGTGNN. Remember we'll add the two NN nucleotides after the primer sequence.

In [30]:
df_fwd_BtsI = df_fwd.loc[df_fwd['fwd_seq'].str.endswith('GCAGTG', na = False)]

df_fwd_BtsI
Out[30]:
fwd_seq fwd_primer_name

Uh oh, no primers end with that 6 bp sequence. There are only 4 that end with the first 5 bp, so let's try just the first 4 bp, GCAG:

In [31]:
df_fwd_BtsI = df_fwd.loc[df_fwd['fwd_seq'].str.endswith('GCAG', na = False)]

df_fwd_BtsI
Out[31]:
fwd_seq fwd_primer_name
469 CCGTAGATAACACAACGCAG skpp-470-F
765 TCGTCTTAGTACGATCGCAG skpp-766-F
1074 GATTGGATAAATGGCCGCAG skpp-1075-F
1140 AATCTAAGACTCCGTCGCAG skpp-1141-F
2097 GGACCTAGCATCAAACGCAG skpp-2098-F
2392 CATGGAGAAGGCACTTGCAG skpp-2393-F

Ok, there are 6 fwd primers that contain this ending sequence.
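
The search above, which shortens the required ending until enough primers match, can be sketched as a single pass over every prefix of the site. This is only an illustration: `count_by_site_prefix` is a hypothetical helper, and the four primers are a small sample taken from the tables in this notebook rather than the full set of 3000.

```python
def count_by_site_prefix(seqs, site):
    """Map each prefix of `site` (longest first) to the number of
    sequences that end with that prefix."""
    return {
        site[:k]: sum(s.upper().endswith(site[:k]) for s in seqs)
        for k in range(len(site), 0, -1)
    }


# Small sample of forward primers copied from the outputs above.
primers = [
    "CCGTAGATAACACAACGCAG",  # skpp-470-F
    "ATATAGATGCCGTCCTAGCG",  # skpp-1-F
    "CCCTTTAATCAGATGCGTCG",  # skpp-2-F
    "TTGGTCATGTGCTTTTCGTT",  # skpp-3-F
]

counts = count_by_site_prefix(primers, "GCAGTG")
for prefix, n in counts.items():
    print(prefix, n)
```

Run against `df_fwd['fwd_seq']`, a table like this would show at a glance how short the required ending has to be before enough candidate primers are available.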

Now let's add the rest of the BtsI site and the final two NNs, making sure the last nt is a 'T', which will allow us to use the USER enzyme as a backup plan. We'll just make the last two NNs 'CT' to balance the GC content. Therefore, to get GCAGTGNN from GCAG we need to add TGCT. Remember, length is not limiting for these ORBIT oligos: the oligo homology + attB is only 128 nt, or 168 nt with the two 20 nt primers (172 nt counting the added TGCT), significantly less than our 200 nt limit. In fact, we will simply add random sequence to these oligos to make them 200 nt as the final step in combine_orders.ipynb.
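
As a quick check of this reasoning, the sketch below derives the bases to append to a primer and works through the length budget. The function `site_completion` and the constant names are hypothetical; the lengths (20 nt primers, 128 nt oligo, 200 nt limit) are the ones quoted in this notebook.

```python
BTSI_SITE = "GCAGTG"
OLIGO_LEN = 128      # oligo homology + attB
REV_LEN = 20         # reverse handle
LENGTH_LIMIT = 200   # synthesis limit quoted in the text


def site_completion(primer, site=BTSI_SITE, tail="CT"):
    """Bases to append so that primer + result ends with site + tail."""
    for k in range(len(site), 0, -1):
        if primer.endswith(site[:k]):
            return site[k:] + tail
    return site + tail  # primer shares nothing with the site


primer = "CCGTAGATAACACAACGCAG"  # skpp-470-F, ends with GCAG
suffix = site_completion(primer)
fwd_handle = primer + suffix

total = len(fwd_handle) + OLIGO_LEN + REV_LEN
padding = LENGTH_LIMIT - total

print(suffix, len(fwd_handle), total, padding)
```

The remaining `padding` is the amount of random sequence that would be needed to bring each construct up to the length limit.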

In [32]:
# Work on an explicit copy so the new column is not assigned to a slice of df_fwd
# (this avoids pandas' SettingWithCopyWarning).
df_fwd_BtsI = df_fwd_BtsI.copy()
df_fwd_BtsI['fwd_seq_full'] = df_fwd_BtsI['fwd_seq'] + 'TGCT'

df_fwd_BtsI
Out[32]:
fwd_seq fwd_primer_name fwd_seq_full
469 CCGTAGATAACACAACGCAG skpp-470-F CCGTAGATAACACAACGCAGTGCT
765 TCGTCTTAGTACGATCGCAG skpp-766-F TCGTCTTAGTACGATCGCAGTGCT
1074 GATTGGATAAATGGCCGCAG skpp-1075-F GATTGGATAAATGGCCGCAGTGCT
1140 AATCTAAGACTCCGTCGCAG skpp-1141-F AATCTAAGACTCCGTCGCAGTGCT
2097 GGACCTAGCATCAAACGCAG skpp-2098-F GGACCTAGCATCAAACGCAGTGCT
2392 CATGGAGAAGGCACTTGCAG skpp-2393-F CATGGAGAAGGCACTTGCAGTGCT

With that we can concatenate the fwd and reverse primer dataframes.

In [33]:
df_fwd_rev = pd.concat([df_fwd_BtsI.reset_index(drop = True), df_rev_DpnII[0:6].reset_index(drop = True)], axis = 1, sort = False)
df_fwd_rev
Out[33]:
fwd_seq fwd_primer_name fwd_seq_full rev_seq rev_seq_comp rev_primer_name
0 CCGTAGATAACACAACGCAG skpp-470-F CCGTAGATAACACAACGCAGTGCT CCAACCAGAATCGAACGATC GATCGTTCGATTCTGGTTGG skpp-350-R
1 TCGTCTTAGTACGATCGCAG skpp-766-F TCGTCTTAGTACGATCGCAGTGCT GTGACATCACACGGTTGATC GATCAACCGTGTGATGTCAC skpp-469-R
2 GATTGGATAAATGGCCGCAG skpp-1075-F GATTGGATAAATGGCCGCAGTGCT AAGAGGGTCGTATTCCGATC GATCGGAATACGACCCTCTT skpp-528-R
3 AATCTAAGACTCCGTCGCAG skpp-1141-F AATCTAAGACTCCGTCGCAGTGCT CAGCTTTTGGACGATGGATC GATCCATCGTCCAAAAGCTG skpp-862-R
4 GGACCTAGCATCAAACGCAG skpp-2098-F GGACCTAGCATCAAACGCAGTGCT AAAGCCCCACGGAATTGATC GATCAATTCCGTGGGGCTTT skpp-1585-R
5 CATGGAGAAGGCACTTGCAG skpp-2393-F CATGGAGAAGGCACTTGCAGTGCT TCCGGCTCTCCCTTAAGATC GATCTTAAGGGAGAGCCGGA skpp-1696-R

And finally let's trim it down to just the sequences we will append to the beginning (fwd_seq_full) and end (rev_seq_comp) of the ORBIT oligos. In the future, we can return to this notebook to get the actual primer sequences we will use to amplify the ORBIT constructs.

In [34]:
df_fwd_rev = df_fwd_rev[['fwd_seq_full', 'rev_seq_comp', 'fwd_primer_name','rev_primer_name']]

df_fwd_rev
Out[34]:
fwd_seq_full rev_seq_comp fwd_primer_name rev_primer_name
0 CCGTAGATAACACAACGCAGTGCT GATCGTTCGATTCTGGTTGG skpp-470-F skpp-350-R
1 TCGTCTTAGTACGATCGCAGTGCT GATCAACCGTGTGATGTCAC skpp-766-F skpp-469-R
2 GATTGGATAAATGGCCGCAGTGCT GATCGGAATACGACCCTCTT skpp-1075-F skpp-528-R
3 AATCTAAGACTCCGTCGCAGTGCT GATCCATCGTCCAAAAGCTG skpp-1141-F skpp-862-R
4 GGACCTAGCATCAAACGCAGTGCT GATCAATTCCGTGGGGCTTT skpp-2098-F skpp-1585-R
5 CATGGAGAAGGCACTTGCAGTGCT GATCTTAAGGGAGAGCCGGA skpp-2393-F skpp-1696-R
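
Before appending these handles, it may be worth confirming that each one carries its site where the scheme expects it. The sketch below uses the six pairs copied from the table above. `check_handles` and `extra_gatc_sites` are hypothetical helpers. Whether an additional GATC inside a handle matters depends on which regions are double stranded during the digest, which this notebook does not settle, so the second check is informational only.

```python
# (fwd_seq_full, rev_seq_comp, fwd_primer_name, rev_primer_name), from the table above.
HANDLES = [
    ("CCGTAGATAACACAACGCAGTGCT", "GATCGTTCGATTCTGGTTGG", "skpp-470-F", "skpp-350-R"),
    ("TCGTCTTAGTACGATCGCAGTGCT", "GATCAACCGTGTGATGTCAC", "skpp-766-F", "skpp-469-R"),
    ("GATTGGATAAATGGCCGCAGTGCT", "GATCGGAATACGACCCTCTT", "skpp-1075-F", "skpp-528-R"),
    ("AATCTAAGACTCCGTCGCAGTGCT", "GATCCATCGTCCAAAAGCTG", "skpp-1141-F", "skpp-862-R"),
    ("GGACCTAGCATCAAACGCAGTGCT", "GATCAATTCCGTGGGGCTTT", "skpp-2098-F", "skpp-1585-R"),
    ("CATGGAGAAGGCACTTGCAGTGCT", "GATCTTAAGGGAGAGCCGGA", "skpp-2393-F", "skpp-1696-R"),
]


def check_handles(handles):
    """List handles whose intended site is missing or misplaced."""
    problems = []
    for fwd, rev, fwd_name, rev_name in handles:
        if not fwd.endswith("GCAGTGCT"):
            problems.append(f"{fwd_name}: does not end with GCAGTGCT")
        if not rev.startswith("GATC"):
            problems.append(f"{rev_name}: does not start with GATC")
    return problems


def extra_gatc_sites(handles):
    """Names of handles that contain a GATC besides the intended one."""
    flagged = []
    for fwd, rev, fwd_name, rev_name in handles:
        if "GATC" in fwd:
            flagged.append(fwd_name)
        if "GATC" in rev[1:]:  # skip the intended site at position 0
            flagged.append(rev_name)
    return flagged


problems = check_handles(HANDLES)
flagged = extra_gatc_sites(HANDLES)
print(problems)
print(flagged)
```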

Add primer sequences to ORBIT oligos

Now let's actually make our final TWIST constructs that contain our PCR handles, RE sites, and ORBIT targeting oligo.

In [35]:
df_1 = pd.read_csv("twist_orbit_tf_del_FL_short.csv")
df_2 = pd.read_csv("twist_orbit_tf_del_FL_long.csv")

df_3 = pd.read_csv("twist_orbit_tf_del_AO_short.csv")
df_4 = pd.read_csv("twist_orbit_tf_del_AO_long.csv")
In [36]:
len(df_1['oligo'][1])
Out[36]:
128
In [37]:
# Use the first fwd/rev handle pair, lowercased so the handles stand out in the construct.
ends_1 = df_fwd_rev.iloc[0,:].str.lower()

df_1['seq'] = ends_1['fwd_seq_full'] + df_1['oligo'] + ends_1['rev_seq_comp']

df_1['construct'] = 'orbit_tf_del_FL_short'

# Each entry is (primer number - 1, position); 152 matches the 24 nt fwd handle + 128 nt oligo.
df_1['forward_primers_0'] = [(int(ends_1['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_1['construct'])
df_1['reverse_primers_0'] = [(int(ends_1['rev_primer_name'].split('-')[1])-1, 152)] * len(df_1['construct'])

df_1_clean = df_1[['seq','construct','forward_primers_0','reverse_primers_0']]
df_1_clean
Out[37]:
seq construct forward_primers_0 reverse_primers_0
0 ccgtagataacacaacgcagtgctAATCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAGGGTGTTACATGAATTCATACTCAATTGCTGTCATCGGAGTGgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
1 ccgtagataacacaacgcagtgctTATGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATCTCTGCCCCGTCGTTTCTGACGGCGGGGAAAATGTTGCTTAgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
2 ccgtagataacacaacgcagtgctGATGAATGAGTTTTCTATAAACTTATACTTAATAATTAGAAGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCTGTgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
3 ccgtagataacacaacgcagtgctGCTTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAAAAATTTAGCTAAACACATATGAATTTTCAGATGTGTTTTATCgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
4 ccgtagataacacaacgcagtgctGGCTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATCGGCTTTTTTAATCCCATACTTTTCCACAGGTAGATCCCAAgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
... ... ... ... ...
69 ccgtagataacacaacgcagtgctTAAGGGCATCTGTTTTTTATATTCAAGAATGAAAAATTTTTGTCAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCGAAgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
70 ccgtagataacacaacgcagtgctATATGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGAACAAATTTTATCAGGTGACGTTCCGTAAAAAGTTGTATGGAGgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
71 ccgtagataacacaacgcagtgctAGCCATGCACCGTAGACCAGATAAGCTCAGCGCATCCGGCAGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGTTTgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
72 ccgtagataacacaacgcagtgctGGTTATTTAACGGCGCGAGTGTAATCCTGCCAGTGCAAAAAATCAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTTGTgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)
73 ccgtagataacacaacgcagtgctGGTAAAGTAAGGACATTCTTAACCCCCACTTTGAGGTGCCCGATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAAGAGGGCGTACATCCTTGTACACGTCGGGCAGGAGGGATTAATgatcgttcgattctggttgg orbit_tf_del_FL_short (469, 0) (349, 152)

74 rows × 4 columns
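
The same steps are repeated for each construct in the cells that follow, so they could be wrapped in one helper. The sketch below is a plain-Python version under stated assumptions: `primer_index` and `build_constructs` are hypothetical names, and the second element of `reverse_primers_0` is assumed to be the start position of the reverse handle, which is consistent with 152 = 24 + 128 but is not stated explicitly in this notebook. Computing the offset from the sequence lengths avoids hard-coding 152.

```python
def primer_index(primer_name):
    """Turn a name such as 'skpp-470-F' into the index 469 (primer number minus 1)."""
    return int(primer_name.split("-")[1]) - 1


def build_constructs(oligos, fwd_handle, rev_handle, fwd_name, rev_name, construct):
    """Return one row (dict) per oligo with the handles attached in lowercase."""
    fwd, rev = fwd_handle.lower(), rev_handle.lower()
    rows = []
    for oligo in oligos:
        rows.append({
            "seq": fwd + oligo + rev,
            "construct": construct,
            "forward_primers_0": (primer_index(fwd_name), 0),
            # Assumed meaning: position where the reverse handle starts.
            "reverse_primers_0": (primer_index(rev_name), len(fwd) + len(oligo)),
        })
    return rows


# Toy input: a placeholder 128 nt oligo, not a real ORBIT sequence.
rows = build_constructs(
    oligos=["A" * 128],
    fwd_handle="CCGTAGATAACACAACGCAGTGCT",
    rev_handle="GATCGTTCGATTCTGGTTGG",
    fwd_name="skpp-470-F",
    rev_name="skpp-350-R",
    construct="orbit_tf_del_FL_short",
)
print(rows[0]["forward_primers_0"], rows[0]["reverse_primers_0"], len(rows[0]["seq"]))
```

The resulting list of rows can be passed to `pd.DataFrame(rows)` to get a table with the same four columns as the cleaned DataFrames in this section.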

In [38]:
ends_2 = df_fwd_rev.iloc[1,:].str.lower()

df_2['seq'] = ends_2['fwd_seq_full'] + df_2['oligo'] + ends_2['rev_seq_comp']

df_2['construct'] = 'twist_orbit_tf_del_FL_long'

df_2['forward_primers_0'] = [(int(ends_2['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_2['construct'])
df_2['reverse_primers_0'] = [(int(ends_2['rev_primer_name'].split('-')[1])-1, 152)] * len(df_2['construct'])

df_2_clean = df_2[['seq','construct','forward_primers_0','reverse_primers_0']]
df_2_clean
Out[38]:
seq construct forward_primers_0 reverse_primers_0
0 tcgtcttagtacgatcgcagtgctCTATATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAATTCATATTGTACTGTTACGTTGTACAAACCTGTGCCAACGGGgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
1 tcgtcttagtacgatcgcagtgctGAGTCTGGCGGATGTCGACAGACTCTATTTTTTTATGCAGTTTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACTGGgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
2 tcgtcttagtacgatcgcagtgctCGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
3 tcgtcttagtacgatcgcagtgctGTGGCTCTTGCCACGGTTCAGCATCGGCAAACAGATCCAACATTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGGTCgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
4 tcgtcttagtacgatcgcagtgctTTAGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTTTTAACCTTAACGAAGAGCTATATTAATAACGGCATCAGCgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
... ... ... ... ...
221 tcgtcttagtacgatcgcagtgctAAAGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAACGTCAGAAGGTTAATTCTGTTTCCAGCAGCGTCAGGATACTTgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
222 tcgtcttagtacgatcgcagtgctCGCGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATGGCGCGATAACGTAGAAAGGCTTCCCGAAGGAAGCCTTGATgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
223 tcgtcttagtacgatcgcagtgctCTATGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGggcttgtcgacgacggcggtctccgtcgtcaggatcatTGATTATAAAAAAAACTTATTATTTATTTTAGTTTTTATCAGTGGgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
224 tcgtcttagtacgatcgcagtgctTGACGATTTTCCCCGTTCCCGGTTGCTGTACCGGGAACGTATTTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGAAAgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)
225 tcgtcttagtacgatcgcagtgctCCGGAAAGATATCGGCTGGCGCGCTATCGAACGCGAGCAGAACTAggcttgtcgacgacggcggtctccgtcgtcaggatcatCATCCTTGTGGGTCCTTACGCGTAATATTGACCGGAAGCCAGAGGgatcaaccgtgtgatgtcac twist_orbit_tf_del_FL_long (765, 0) (468, 152)

226 rows × 4 columns

In [39]:
ends_3 = df_fwd_rev.iloc[2,:].str.lower()

df_3['seq'] = ends_3['fwd_seq_full'] + df_3['oligo'] + ends_3['rev_seq_comp']

df_3['construct'] = 'orbit_tf_del_AO_short'

df_3['forward_primers_0'] = [(int(ends_3['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_3['construct'])
df_3['reverse_primers_0'] = [(int(ends_3['rev_primer_name'].split('-')[1])-1, 152)] * len(df_3['construct'])

df_3_clean = df_3[['seq','construct','forward_primers_0','reverse_primers_0']]
df_3_clean
Out[39]:
seq construct forward_primers_0 reverse_primers_0
0 gattggataaatggccgcagtgctCTCTCTGCAACCAAAGTGAACCAATGAGAGGCAACAAGAATGAACggcttgtcgacgacggcggtctccgtcgtcaggatcatCAACGCTGTAAACTTATTTGAGGGTGTTACATGAATTCATACTCAgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
1 gattggataaatggccgcagtgctGCACAATAATGTTGTATCAACCACCATATCGGGTGACTTATGCGAggcttgtcgacgacggcggtctccgtcgtcaggatcatCTGTTCGACCAGGAGCTTTAATCTCTGCCCCGTCGTTTCTGACGGgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
2 gattggataaatggccgcagtgctAAACTTATACTTAATAATTAGAAGTTACATATCATCAGCTGTGTAggcttgtcgacgacggcggtctccgtcgtcaggatcatAAGCATGGTAACCTCTCATCTTACTTATGAAATTTTAATGTATTCgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
3 gattggataaatggccgcagtgctTCGAAGAGAGACACTACCTGCAACAATCAGGAGCGCAATATGTCAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTAAGAACATTTGCAGTTAAAAATTTAGCTAAACACATATGAATgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
4 gattggataaatggccgcagtgctTAAAATAGAATGAATCATCAATCCGCATAAGAAAATCCTATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatATGCGTACCATCAAGCCCTGATCGGCTTTTTTAATCCCATACTTTgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
... ... ... ... ...
69 gattggataaatggccgcagtgctATATTCAAGAATGAAAAATTTTTGTCATTCCTTATGCTCCTTACAggcttgtcgacgacggcggtctccgtcgtcaggatcatCGCCATTACCAATACCTTACATATATTACTCATTAATGTATGTGCgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
70 gattggataaatggccgcagtgctTGAGTGTCGAATCCTTATCCAAAACAAGAGGTAACTCTCATGCTTggcttgtcgacgacggcggtctccgtcgtcaggatcatAATCTCAAAAGACGATACTGAACAAATTTTATCAGGTGACGTTCCgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
71 gattggataaatggccgcagtgctAGATAAGCTCAGCGCATCCGGCAGTTATGCCGCACGTTCATCCCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACTCATTTCATACTTACCTTTTTGTACGTACTTACTAAAAGTAAGgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
72 gattggataaatggccgcagtgctGTGTAATCCTGCCAGTGCAAAAAATCAACAACCACTCTTAACGCCggcttgtcgacgacggcggtctccgtcgtcaggatcatATACATACATACTCCACTAGTTATCGTTGATTTTGTCCAACAACTgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)
73 gattggataaatggccgcagtgctAAAGTAAGGACATTCTTAACCCCCACTTTGAGGTGCCCGATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatGTGAAAAAGAAACCGCGTTAAGAGGGCGTACATCCTTGTACACGTgatcggaatacgaccctctt orbit_tf_del_AO_short (1074, 0) (527, 152)

74 rows × 4 columns

In [40]:
ends_4 = df_fwd_rev.iloc[3,:].str.lower()

df_4['seq'] = ends_4['fwd_seq_full'] + df_4['oligo'] + ends_4['rev_seq_comp']

df_4['construct'] = 'orbit_tf_del_AO_long'

df_4['forward_primers_0'] = [(int(ends_4['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_4['construct'])
df_4['reverse_primers_0'] = [(int(ends_4['rev_primer_name'].split('-')[1])-1, 152)] * len(df_4['construct'])

df_4_clean = df_4[['seq','construct','forward_primers_0','reverse_primers_0']]
df_4_clean
Out[40]:
seq construct forward_primers_0 reverse_primers_0
0 aatctaagactccgtcgcagtgctTATTATGTGATCTAAATCACTTTTAAGTCAGAGTGAATAATGGAAggcttgtcgacgacggcggtctccgtcgtcaggatcatGGGCGCGGGAAAGAGAAGTAATTCATATTGTACTGTTACGTTGTAgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
1 aatctaagactccgtcgcagtgctCAGACTCTATTTTTTTATGCAGTTTTAACTTTGCAGATAGCCGCAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGCCATGACGCCACCGATAACCGTTATTTATCAGACCAAAGAAACgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
2 aatctaagactccgtcgcagtgctAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTggcttgtcgacgacggcggtctccgtcgtcaggatcatTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
3 aatctaagactccgtcgcagtgctCAGCATCGGCAAACAGATCCAACATTACCTCTCCTCATTTTCAGCggcttgtcgacgacggcggtctccgtcgtcaggatcatTTTCATAATCAGCTCCCTGGTTAAGGATAGCCTTTAGGCTGCCCGgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
4 aatctaagactccgtcgcagtgctGCGAGAACTGGTCTTTTATTCGCACTCAGGAGTACATGTATGAGGggcttgtcgacgacggcggtctccgtcgtcaggatcatAGAGAACGCACTGTCGCCTGATTTTTAACCTTAACGAAGAGCTATgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
... ... ... ... ...
221 aatctaagactccgtcgcagtgctGAATTTCGCCAGTTAATGCATCTTTAATCGGGAACTTTCATGAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatAGCGCCCGTTTTCAGGGCTAACGTCAGAAGGTTAATTCTGTTTCCgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
222 aatctaagactccgtcgcagtgctGGAATAATCACGCAATTAACTAAACAAGGTTTAGTGAAGATGAGAggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCAGTTACGACAGATTTGATGGCGCGATAACGTAGAAAGGCTTgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
223 aatctaagactccgtcgcagtgctTGTGATCTCCATTTCGATTGATTTAGTGTTTATTGACGTATGTACggcttgtcgacgacggcggtctccgtcgtcaggatcatCGTGAGGTTAATCGTGATTGATTATAAAAAAAACTTATTATTTATgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
224 aatctaagactccgtcgcagtgctCCGGTTGCTGTACCGGGAACGTATTTAATTCCCCTGCATCGCCCGggcttgtcgacgacggcggtctccgtcgtcaggatcatTAGCATTTCTCCAGCACTCTGGAGAAATAGGCAAGACATTGGCAGgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)
225 aatctaagactccgtcgcagtgctGCGCGCTATCGAACGCGAGCAGAACTAACGCGACAGTTTTGCCAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCGTCATCCTTGTGGGTCCTTACGCGTAATATTGACCGGAAGCCAGgatccatcgtccaaaagctg orbit_tf_del_AO_long (1140, 0) (861, 152)

226 rows × 4 columns

In [41]:
df_1_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_short.csv")
df_2_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_first_last_long.csv")
df_3_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_short.csv")
df_4_clean.to_csv("../../../../data/twist_order/twist_orbit_TF_del_avd_ovlp_long.csv")

Small RNAs

In [42]:
df_sRNA = pd.read_csv("twist_orbit_small_RNA.csv")
df_sRNA
Out[42]:
Unnamed: 0 Gene Name Product Name Left-End-Position Right-End-Position length Direction left_oligo_pos right_oligo_pos oligo
0 0 3'ETS-<i>leuZ</i> small regulatory RNA 3'ETS<sup><i>leuZ</i></sup> 1991748 1991814 66 + 1991747 1991815 TGGTGATTAAAAATTAAGGAGGGTGTAACGACAAGTTGCAGGCACggcttgtcgacgacggcggtctccgtcgtcaggatcatTGGTACCCGGAGCGGGACTTGAACCCGCACAGCGCGAACGCCGAG
1 1 agrA small RNA AgrA 3648063 3648144 81 + 3648062 3648145 AGCACGTCCTTGCAATAGTTTCAGTATGGTATTAGCATTGATGCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACATCCGGATTCGGACAAGGCTTAATATGACGATGACCCAGTGAA
2 2 agrB small regulatory RNA AgrB 3648294 3648375 81 + 3648293 3648376 CGCTAATTCTTGCAATGTTAGCCACTGGCTAATAGTATTGAGCTGggcttgtcgacgacggcggtctccgtcgtcaggatcatACGTCCTGATTCAGACCTCCTTTCAAATGAATAGCCAACTCAAAA
3 3 arcZ small regulatory RNA ArcZ 3350577 3350697 120 + 3350576 3350698 ACTGATTCATGTAACAAATCATTTAAGTTTTGCTATCTTAACTGCggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTGGCTTTTGCCACCCACGCTTTCAGCACTTCTACGTCGTGACG
4 4 arrS small regulatory RNA ArrS 3657985 3658054 69 + 3657984 3658055 CTGAAGACATGAATGCGTTATTTACTCAGGTAATTTCAATGCGTTggcttgtcgacgacggcggtctccgtcgtcaggatcatATTTTAACTTTAGTAATATTCTTCAGAGATCACAAACTGGTTATT
... ... ... ... ... ... ... ... ... ... ...
88 88 sroH small RNA SroH 4190327 4190487 160 + 4190326 4190488 AGAGATCTGATTGTAAGAGAGTAAATACTCAACTATGATAGAGACggcttgtcgacgacggcggtctccgtcgtcaggatcatGTTATTTTGAGGGCTGAGGAAGCTGCTTATTTCTCAATAAGTTGT
89 89 ssrA tmRNA 2755593 2755955 362 + 2755592 2755956 CTGGTCATGGCGCTCATAAATCTGGTATACTTACCTTTACACATTggcttgtcgacgacggcggtctccgtcgtcaggatcatAAATTCTCCATCGGTGATTACCAGAGTCATCCGATGAAGTCCTAA
90 90 ssrS 6S RNA 3055983 3056165 182 + 3055982 3056166 ATGACACTTTTCGGTTTACTGTGGTAGAGTAACCGTGAAGACAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCCTTCTTATCTGGCACCAGCCATGACGCAACTACCAGAACTCCCA
91 91 symR small regulatory RNA antitoxin SymR 4579835 4579911 76 + 4579834 4579912 TAGCTGGACTTTCCCCATATTTACTGATGATATATACAGGTATTTggcttgtcgacgacggcggtctccgtcgtcaggatcatGACACGCATTCTATTGCACAACCGTTCGAAGCAGAAGTCTCCCCG
92 92 tff putative small RNA T44 189712 189847 135 + 189711 189848 GCATGGAAACAGTTGCCATGATTAAAACCTCTATATAAAAGTTGGggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCGCTTTATACCACAAATACGTCGTGGACACCAATAATTGTTG

93 rows × 10 columns

In [43]:
ends_sRNA = df_fwd_rev.iloc[4,:].str.lower()

df_sRNA['seq'] = ends_sRNA['fwd_seq_full'] + df_sRNA['oligo'] + ends_sRNA['rev_seq_comp']

df_sRNA['construct'] = 'orbit_small_RNA'

df_sRNA['forward_primers_0'] = [(int(ends_sRNA['fwd_primer_name'].split('-')[1])-1, 0)] * len(df_sRNA['construct'])
df_sRNA['reverse_primers_0'] = [(int(ends_sRNA['rev_primer_name'].split('-')[1])-1, 152)] * len(df_sRNA['construct'])

df_sRNA_clean = df_sRNA[['seq','construct','forward_primers_0','reverse_primers_0']]
df_sRNA_clean
Out[43]:
seq construct forward_primers_0 reverse_primers_0
0 ggacctagcatcaaacgcagtgctTGGTGATTAAAAATTAAGGAGGGTGTAACGACAAGTTGCAGGCACggcttgtcgacgacggcggtctccgtcgtcaggatcatTGGTACCCGGAGCGGGACTTGAACCCGCACAGCGCGAACGCCGAGgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
1 ggacctagcatcaaacgcagtgctAGCACGTCCTTGCAATAGTTTCAGTATGGTATTAGCATTGATGCGggcttgtcgacgacggcggtctccgtcgtcaggatcatACATCCGGATTCGGACAAGGCTTAATATGACGATGACCCAGTGAAgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
2 ggacctagcatcaaacgcagtgctCGCTAATTCTTGCAATGTTAGCCACTGGCTAATAGTATTGAGCTGggcttgtcgacgacggcggtctccgtcgtcaggatcatACGTCCTGATTCAGACCTCCTTTCAAATGAATAGCCAACTCAAAAgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
3 ggacctagcatcaaacgcagtgctACTGATTCATGTAACAAATCATTTAAGTTTTGCTATCTTAACTGCggcttgtcgacgacggcggtctccgtcgtcaggatcatAGTGGCTTTTGCCACCCACGCTTTCAGCACTTCTACGTCGTGACGgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
4 ggacctagcatcaaacgcagtgctCTGAAGACATGAATGCGTTATTTACTCAGGTAATTTCAATGCGTTggcttgtcgacgacggcggtctccgtcgtcaggatcatATTTTAACTTTAGTAATATTCTTCAGAGATCACAAACTGGTTATTgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
... ... ... ... ...
88 ggacctagcatcaaacgcagtgctAGAGATCTGATTGTAAGAGAGTAAATACTCAACTATGATAGAGACggcttgtcgacgacggcggtctccgtcgtcaggatcatGTTATTTTGAGGGCTGAGGAAGCTGCTTATTTCTCAATAAGTTGTgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
89 ggacctagcatcaaacgcagtgctCTGGTCATGGCGCTCATAAATCTGGTATACTTACCTTTACACATTggcttgtcgacgacggcggtctccgtcgtcaggatcatAAATTCTCCATCGGTGATTACCAGAGTCATCCGATGAAGTCCTAAgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
90 ggacctagcatcaaacgcagtgctATGACACTTTTCGGTTTACTGTGGTAGAGTAACCGTGAAGACAAAggcttgtcgacgacggcggtctccgtcgtcaggatcatCCTTCTTATCTGGCACCAGCCATGACGCAACTACCAGAACTCCCAgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
91 ggacctagcatcaaacgcagtgctTAGCTGGACTTTCCCCATATTTACTGATGATATATACAGGTATTTggcttgtcgacgacggcggtctccgtcgtcaggatcatGACACGCATTCTATTGCACAACCGTTCGAAGCAGAAGTCTCCCCGgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)
92 ggacctagcatcaaacgcagtgctGCATGGAAACAGTTGCCATGATTAAAACCTCTATATAAAAGTTGGggcttgtcgacgacggcggtctccgtcgtcaggatcatGCGCGCTTTATACCACAAATACGTCGTGGACACCAATAATTGTTGgatcaattccgtggggcttt orbit_small_RNA (2097, 0) (1584, 152)

93 rows × 4 columns

In [44]:
df_sRNA_clean.to_csv("../../../../data/twist_order/twist_orbit_small_RNA.csv")

Computational Environment

In [19]:
%load_ext watermark
%watermark -v -p wgregseq,numpy,pandas
CPython 3.8.5
IPython 7.19.0

wgregseq 0.0.1
numpy 1.18.1
pandas 1.2.0