YouTube Deep Summary

Exonic DNA Competition - Intermediate

Exonic • 2025-11-02 • 7:27 minutes • YouTube

🤖 AI-Generated Summary:

Enhancing DNA Sequence Design with AI and Bioinformatics Tools: A Step-by-Step Guide

Hi everyone! I'm Dr. Mahil Kulak, Chief Scientific Officer at Exonic. Today I want to walk you through a slightly more advanced process: optimizing DNA sequences with AI models combined with bioinformatics tools. This approach improves the sequence scores while keeping the sequences biologically realistic, which matters because the fragments will be tested in the wet lab.

Step 1: Generating and Optimizing Random DNA Sequences

We start by generating random DNA sequences and assigning them an initial score using our AI model. The goal is to improve this score, which reflects the quality or functionality of the sequence according to our criteria.

  • Initial random generation: This provides a baseline score.
  • Optimization: Using the optimizer module, we connect it to the random generator and run the model to enhance the sequence scores.
  • Outcome: We aim for a score above 15. In the demonstrated run, the best optimized sequence scored 19.
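
The generate-then-optimize loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a made-up stand-in (GC fraction scaled to 0..20) for the AI model's score, and the simple hill-climbing loop is not necessarily how the optimizer module works internally.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def toy_score(seq):
    """Hypothetical stand-in for the AI model: GC fraction scaled to 0..20."""
    return 20.0 * sum(base in "GC" for base in seq) / len(seq)

def hill_climb(seq, score_fn, rng, steps=500):
    """Try single-base changes and keep only those that raise the score."""
    best, best_score = seq, score_fn(seq)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(BASES) + best[pos + 1:]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

rng = random.Random(0)  # fixed seed so the run is repeatable
start = random_dna(200, rng)
optimized, optimized_score = hill_climb(start, toy_score, rng)
print(f"baseline {toy_score(start):.2f} -> optimized {optimized_score:.2f}")
```

Because only improving mutations are accepted, the optimized score can never fall below the baseline, which mirrors the "baseline first, then improve" workflow in this step.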

Step 2: Ensuring Biological Realism with DNA Bird Embeddings

AI models are less reliable at the extremes, and a very high score is an extreme. To address this, we use the DNA Bird tool, which computes an embedding of our sequence and compares it against real DNA data from Molinos et al.

  • The tool finds the closest neighbors in the embedding space, which represent biologically plausible sequences.
  • This step helps us ground our AI-generated sequences in reality.
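
To make the nearest-neighbor idea concrete, here is a minimal sketch. It assumes a very simple embedding (3-mer counts) and cosine similarity; the embedding model actually used by the tool is not described in the video, and the reference sequences below are invented for illustration.

```python
from itertools import product
from math import sqrt

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_embedding(seq):
    """Embed a sequence as a vector of overlapping 3-mer counts (toy embedding)."""
    counts = {kmer: 0 for kmer in KMERS}
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[kmer] for kmer in KMERS]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, reference, top_n=4):
    """Return the top_n reference sequences most similar to the query."""
    q = kmer_embedding(query)
    ranked = sorted(reference, key=lambda s: cosine(q, kmer_embedding(s)), reverse=True)
    return ranked[:top_n]

# Invented reference set standing in for real measured sequences.
reference = ["TTTTTTTTTTTT", "ACGTACGTACGA", "GGGGGGGGGGGG"]
print(nearest_neighbors("ACGTACGTACGT", reference, top_n=1))
```

The design point is that the AI-generated sequence is never used blindly: it is replaced by, or compared with, sequences that sit close to it in embedding space and are known to exist in real data.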

Step 3: Splitting and Scoring Sequences for Detailed Analysis

To analyze the generated sequences in detail, we use the DNA Fast Splitter utility:

  • It splits the output into the original sequence plus the top four closest neighbors found by DNA Bird.
  • We then score each sequence individually using multiple DNA Score modules, allowing us to compare how each variant performs.
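
A rough sketch of the split-then-score step follows. It assumes the combined output is FASTA-style text with one `>` header per sequence, which is an assumption since the video does not show the format. `toy_score` is again a hypothetical placeholder for a DNA Score module.

```python
def split_fasta(text):
    """Split FASTA-style text into a list of (name, sequence) records."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def toy_score(seq):
    """Hypothetical placeholder score: GC fraction scaled to 0..20."""
    if not seq:
        return 0.0
    return 20.0 * sum(base in "GC" for base in seq) / len(seq)

# Invented example: the original sequence plus two neighbors.
example_output = """>original
ACGTGGCC
>neighbor_1
ACGTGGCA
>neighbor_2
TTGTGGCC
"""

records = split_fasta(example_output)
scores = {name: toy_score(seq) for name, seq in records}
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Scoring each record separately is what allows a side-by-side comparison of the original against its neighbors, as done with the multiple DNA Score modules.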

Step 4: Evaluating Transcription Factor Binding with JASPAR

Biological activity of DNA sequences often depends on transcription factors (TFs) binding to them. To evaluate this:

  • We use the JASPAR matrix tool, which ranks transcription factors by how strongly they are predicted to bind our sequence, strongest binder first.
  • Stronger binders, and more of them, tend to make the DNA fragment more active once it is inside a cell.
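
The ranking idea can be illustrated with a standard position weight matrix (PWM) scan. This is a sketch, not the tool's implementation: the two motifs below are invented count matrices, not real JASPAR profiles, and the pseudocount and uniform background are assumptions.

```python
from math import log2

def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: log2(((column[b] + pseudocount) / total) / background) for b in "ACGT"})
    return pwm

def best_hit(seq, pwm):
    """Slide the PWM along the sequence and return (best_score, start_position)."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best

def rank_motifs(seq, motifs):
    """Rank motifs by their best hit on the sequence, strongest first."""
    hits = {name: best_hit(seq, log_odds(counts)) for name, counts in motifs.items()}
    return sorted(hits.items(), key=lambda item: item[1][0], reverse=True)

def consensus_counts(consensus, n=10):
    """Build a toy count matrix in which every observed site matches the consensus."""
    return [{b: (n if b == base else 0) for b in "ACGT"} for base in consensus]

# Invented toy motifs, labelled as such; real profiles would come from JASPAR.
motifs = {"toy_TATA": consensus_counts("TATA"), "toy_GCGC": consensus_counts("GCGC")}
ranking = rank_motifs("CCTATACC", motifs)
for name, (score, start) in ranking:
    print(f"{name}: best score {score:.2f} at position {start}")
```

The start position of the best hit is what makes the next step possible: once you know where a factor is predicted to bind, you can inspect or edit that site.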

Step 5: Refining Sequences by Modulating Transcription Factor Sites

Using insights from JASPAR:

  • We locate a chosen TF binding site within the sequence by pasting its JASPAR ID into the View Motif option of the DNA Viewer tool.
  • By adding or modifying these sites (for example, pasting in extra copies of a TF binding motif), we can raise the sequence's activity score. In the video, adding copies of one site raised the score from 19 to 22.
  • This manual adjustment lets us balance the scores: the target score should land between 15 and 25, while off-target activity such as K562 should stay below 1.
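
The manual edit and the acceptance check can be sketched as below. Several things here are assumptions: the motif string is a made-up placeholder rather than a real binding site, the edit overwrites bases so the sequence length stays fixed (the video does not say whether the paste inserts or overwrites), and the scores passed to `meets_targets` would come from the scoring modules, not from this code.

```python
def count_sites(seq, motif):
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def overwrite_with_motif(seq, motif, position):
    """Write motif over seq at position, keeping the total length unchanged."""
    if position < 0 or position + len(motif) > len(seq):
        raise ValueError("motif does not fit at this position")
    return seq[:position] + motif + seq[position + len(motif):]

def meets_targets(on_target, off_target, low=15.0, high=25.0, off_max=1.0):
    """Check the thresholds quoted in the video: 15 < target < 25, off-target < 1."""
    return low < on_target < high and off_target < off_max

PLACEHOLDER_MOTIF = "CCGG"  # hypothetical motif, for illustration only
sequence = "A" * 30
one_copy = overwrite_with_motif(sequence, PLACEHOLDER_MOTIF, 10)
two_copies = overwrite_with_motif(one_copy, PLACEHOLDER_MOTIF, 20)
print(count_sites(two_copies, PLACEHOLDER_MOTIF))
print(meets_targets(on_target=22.0, off_target=0.8))
```

After each edit, the sequence would be re-scored and the thresholds re-checked before deciding whether to keep the change.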

Step 6: Finalizing and Submitting the Optimized Sequence

Once satisfied with the improvements:

  • We copy the refined DNA sequence into the DNA design panel.
  • Submit the sequence for further testing or synthesis.
  • The system confirms the validity of the sequence and displays updated scores.
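
Before pasting a sequence into a submission form, a quick local sanity check can catch copy-and-paste damage. The rules below (only A, C, G, T, plus an optional length check) are assumptions chosen for illustration; the platform's actual validity criteria are not described in the video.

```python
def validate_sequence(seq, expected_length=None):
    """Strip whitespace, uppercase, and return (cleaned_sequence, list_of_problems)."""
    cleaned = "".join(seq.split()).upper()
    problems = []
    if not cleaned:
        problems.append("sequence is empty")
    invalid = sorted(set(cleaned) - set("ACGT"))
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    if expected_length is not None and len(cleaned) != expected_length:
        problems.append(f"length {len(cleaned)} != expected {expected_length}")
    return cleaned, problems

cleaned, problems = validate_sequence("acgt\nACGT")
print(cleaned, problems)
```

An empty problem list means the sequence passed these local checks only; the platform's own confirmation remains the authoritative one.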

Key Takeaways

  • Combining AI optimization with embeddings of real biological data helps keep optimized DNA sequences both high-scoring and biologically plausible.
  • Utilizing tools like DNA Fast Splitter and JASPAR provides detailed insights into sequence functionality and protein-DNA interactions.
  • Manual refinement of transcription factor binding sites can further improve sequence performance and reduce potential off-target effects.
  • This integrated approach accelerates the design of functional DNA sequences ready for wet lab testing.

I hope this walkthrough gives you a clear understanding of how advanced AI-driven tools can be harnessed for DNA sequence design. Best of luck with your designs and experiments!

Dr. Mahil Kulak
Chief Scientific Officer, Exonic


📝 Transcript (137 entries):

Hi everyone, my name is Dr. Mahil Kulak and I'm Chief Scientific Officer at Exonic. Today I'm going to show you something more interesting and a bit advanced. Let's look at the table. The first two blocks you've seen already, and we will start from randomness. Let's generate something random. All right, so we get a score. The second block you've seen already: it's an optimizer. I can connect them like that and push the Run button. Wait a bit and we get, hopefully, an improved score. Okay, I'm going to zoom in a bit. Okay, we're waiting. Yeah, so this model generates the most improved score. Oh yeah, a very good score: 19. That's what we want, above 15, and the other ones are pretty good. But we want to make it real, because the AI model is good but it's not good in the extremes, and that's kind of extreme. For a more realistic view we will use the DNA Bird tool. This is the module which takes our DNA, creates an embedding, and then looks at the real web data from the Molinos paper and finds the closest neighbors which exist in this space. I'll push Run again, and here we go. So this is the input, and underneath are different sequences with different scores, hopefully improved, but it's kind of small for me, so I want to see it in more detail. To do so I will use a module called Fast Splitter. It's located under the Utilities section, just over there on the left side. I connect the DNA Bird output to the DNA Fast Splitter and push the Run button again. As you can see, the output is five DNA sequences, where the top one is the original and the four more are the top four sequences the splitter found. We want to see how good those are. For the first one we already know: it's 19, 1 uh4, and minus 2.54 SK and SH. So I connect the DNA node with the DNA Score. I want to see how the other sequences score. So I click on the DNA Score. You can use Ctrl+C, Ctrl+V and create another one. Then another one and another one. Maybe one more. Yeah, let's do more.
And then connect them all individually, so we can see the score for each of the top sequences the DNA Bird module generated. I click the Run button and we see, okay, just a little bit of movement. Okay, this is number one; the numbers are the same. This one is not much of an improvement. This is another number. Sometimes you get very, very different numbers, and very often it's improving, but this time not. So we will stick with the top one. The top one is very nice, but again, it's generated by AI and we want to make it more real. I'll introduce another module. It's called JASPAR. JASPAR is located in the Matrix section on the left side; you can scroll. This is the JASPAR score matrix. I put in another module, take DNA number one, connect it, and push the Run button, and it gives me the list of transcription factors, ranked. So number one is the highest binder, and then it goes down. Those transcription factors form DNA-protein complexes within the DNA fragments which we are going to test in the wet lab. The stronger the binder and the more binders on the DNA, the more active the DNA fragment becomes when it gets inside a cell. We can add more or less desirable binders, transcription factors or binders we can say, by manually adding them to the original sequence. First of all, we're going to take the sequence we're working with. We can copy it from there directly and transfer it into the DNA design panel, replacing the existing DNA which we started from with the new one. And this is it. See, the scores are very much improved, but K562 is still a bit too high. We want to have it below one, which is very close, but still not. Okay. So going back to the transcription factors, we will look at them. Scroll down. What I like is HNF1A; I want to see where it binds on our DNA sequence. I copy the ID number like this and go back to the DNA viewer. Click View Motif and paste the ID number. Here you go. You highlight it, choose this information about it.
And in order to see it more clearly, since now all the signals overlay, I go back to the AI maps. I can remove the signal. Oh, see, this transcription factor binds in the middle of the sequence. And I want to see what's going to happen if I make more. I do the copy and paste. Put them over there. Oh, the score is even better. See, now it's improved: 22. It's below 25, above 15. Off-target, below one. I like it. So now I can take the sequence like this, copy it again, and submit. So push the button, say, I'll text it this time. Oh, let's do it without saving because it's for your purposes. But then I push Complete over there. Okay, here we go. Submit DNA. I paste my sequence. Push the Submit button. Let's see whether it's going to be correct. Oh yeah, that's correct. Can you see? Very good numbers. So yeah, that's it. So good luck.