📝 Exonic Blog

Harnessing AI to Design Custom DNA for Targeted Liver Cancer Therapy: A Deep Dive into Carl Bisaga’s Exonic Model

Hello, fellow biotech enthusiasts! Today, let’s explore the fusion of artificial intelligence and DNA design, as narrated by Carl Bisaga, a fourth-year medical student from Chicago. Carl showcases his custom Exonic model, built to compete in the DNA design competition hosted by Exonic, a platform pioneering crowdsourced DNA design for precision medicine. Here’s a detailed walkthrough of his process, his insights, and the potential of AI in targeted chemotherapy drug development.


The Vision: Precision Medicine Meets AI-Driven DNA Design

Chemotherapy has long been a double-edged sword—effective in killing cancer cells but often wreaking havoc on healthy organs, causing severe side effects like hair loss and organ damage. Carl’s project is part of a larger mission to create DNA sequences that code for peptides specifically localized to the liver, thereby enabling drugs that target liver cancer cells with minimal off-target effects. Imagine a chemotherapy treatment that attacks only liver tumors without harming other tissues—this is the promise of precision medicine.


Phase 1: Generating Random DNA Sequences

The journey begins with creating random DNA sequences. These sequences initially hold little biological meaning—think of them as blank canvases consisting mostly of repeated adenine bases. Carl uses a DNA design panel within Exonic’s platform to generate these base sequences, which serve as the starting point for optimization.
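
For readers who want to reproduce this starting point outside the platform, here is a minimal sketch of a random sequence generator. The function name, the 200-base length, and the uniform sampling over A, C, G, and T are all assumptions made for illustration; the DNA design panel may generate its starting sequences differently.

```python
import random

BASES = "ACGT"

def random_dna(length, seed=None):
    """Return a DNA string of `length` bases drawn uniformly from A, C, G, T."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    return "".join(rng.choice(BASES) for _ in range(length))

# Assumed length of 200 bases, purely for illustration.
seq = random_dna(200, seed=42)
print(len(seq), seq[:40])
```

Passing a seed is handy when you want to compare optimizers on exactly the same starting sequence.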


Phase 2: Tweaking DNA with AI — The EVO2 Model

Random DNA isn’t enough. Carl integrates AI, specifically the EVO2 model, into his workflow to refine these sequences. The AI tidies up the DNA, making it more "organic" and increasing its likelihood of functioning well in wet lab experiments.

  • Why longer sequences? Carl uses custom JavaScript to extend the DNA sequence to five times its original length before feeding it to EVO2. This provides the AI with richer context, enabling it to generate more meaningful and viable sequences.
  • Parameters for AI tweaking: Temperature, Top K, and Top P control how aggressively EVO2 modifies the sequence. Adjusting these parameters allows balancing creativity and biological plausibility.
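
To make these sampling parameters concrete, here is a small sketch of how temperature, Top K, and Top P typically shape the choice of the next base in a generative model. This is not EVO2's code: the logits are made-up numbers, `sample_base` is a hypothetical helper, and `tile_sequence` simply repeats the sequence five times, which is only one guess at how the extension could be done.

```python
import math
import random

def tile_sequence(seq, copies=5):
    """Assumed extension strategy: repeat the sequence `copies` times."""
    return seq * copies

def sample_base(logits, temperature=1.0, top_k=4, top_p=1.0, rng=random):
    """Sample one base from a dict of logits using temperature, top-k and top-p."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {b: v / temperature for b, v in logits.items()}
    peak = max(scaled.values())
    weights = {b: math.exp(v - peak) for b, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, b) for b, w in weights.items()), reverse=True)
    # Top K: keep only the k most likely bases.
    ranked = ranked[:top_k]
    # Top P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, base in ranked:
        kept.append((prob, base))
        cumulative += prob
        if cumulative >= top_p:
            break
    bases = [b for _, b in kept]
    probs = [p for p, _ in kept]
    return rng.choices(bases, weights=probs, k=1)[0]

# Made-up logits for the next base, for illustration only.
logits = {"A": 2.0, "C": 0.1, "G": 0.0, "T": -1.0}
print(tile_sequence("ACGT"))
print(sample_base(logits, temperature=0.7, top_k=2, top_p=0.9))
```

Lower temperature, smaller Top K, and smaller Top P all push the sampler toward the model's most likely base, which is the conservative end of the creativity versus plausibility trade-off.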

Phase 3: Testing and Validating DNA Sequences

No single metric can capture the effectiveness of a DNA sequence, so Carl evaluates multiple parameters to ensure the best candidates:

1. HEPG2 Score (Liver Localization)

  • Derived from the Malinois DNA optimizer, the HEPG2 score estimates how well the peptide localizes to liver cells (HepG2 is a human liver cancer cell line).
  • More blue in the DNA visualizer indicates higher liver targeting, which is desirable.
  • However, over-optimizing for the HEPG2 metric (scores above 6) may reduce overall effectiveness.

2. K562 Score (Blood Localization)

  • Indicates how much the peptide localizes to blood cells.
  • Lower scores are better since blood localization could cause systemic side effects.

3. SKNSH Score (Neuronal Localization)

  • Reflects localization to neurons and the nervous system.
  • Like K562, lower is preferred to avoid unwanted neurological side effects.

4. Perplexity Metric

  • Measures how "natural" the AI believes the DNA sequence is.
  • A score close to 1 indicates the sequence looks biologically plausible; higher scores suggest nonsense sequences.
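
Here is a short sketch of the standard perplexity formula, the exponential of the average negative log-probability the model assigns to each observed base. The probabilities below are invented for illustration, and how the platform computes its own metric is not documented here.

```python
import math

def perplexity(base_probs):
    """exp of the mean negative log-probability assigned to each observed base."""
    if not base_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in base_probs) / len(base_probs)
    return math.exp(mean_nll)

# Invented per-base probabilities: a confident model versus a pure guess.
confident = [0.9, 0.95, 0.85, 0.9]
guessing = [0.25, 0.25, 0.25, 0.25]
print(perplexity(confident))
print(perplexity(guessing))
```

A model that guesses uniformly among four bases lands at a perplexity of about 4, which is why values near 1 signal that the model finds the sequence predictable.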

5. DNABERT-2 Nearest Metric

  • Acts like a “Facebook for DNA,” comparing your sequence to a repository of 70,000 wet lab-tested DNA strands.
  • Helps identify how similar your design is to known sequences, informing submission strategy.

6. JASPAR Score

  • Functions like LinkedIn for DNA, identifying transcription factors that bind to your DNA.
  • Helps predict biological activity and tissue specificity by assessing which transcription factors interact with your sequence.

Putting It All Together: The Workflow in Action

Carl’s model pipeline looks like this:

  1. Generate random DNA
  2. Optimize with the Malinois DNA optimizer
  3. Extend and refine with EVO2 AI
  4. Evaluate with multiple metrics (HEPG2, K562, SKNSH, Perplexity, DNABERT-2, JASPAR)
  5. Select top candidates for submission

This iterative process allows him to balance liver targeting while minimizing off-target effects, all while ensuring the DNA sequences remain biologically plausible.
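
One way to picture the final selection step is a simple ranking function. The sketch below is an assumption, not Carl's actual code: it scores each candidate by its HEPG2 score minus its worst off-target score, and drops candidates whose perplexity exceeds a hypothetical cutoff. All names and numbers are illustrative.

```python
def specificity(scores):
    """On-target score minus the highest (worst) off-target score."""
    return scores["HEPG2"] - max(scores["K562"], scores["SKNSH"])

def rank_candidates(candidates, max_perplexity=3.0):
    """Drop implausible sequences, then sort the rest by specificity."""
    plausible = {
        name: s for name, s in candidates.items()
        if s["perplexity"] <= max_perplexity
    }
    return sorted(plausible, key=lambda n: specificity(plausible[n]), reverse=True)

# Illustrative scores only; these are not competition results.
candidates = {
    "seq_a": {"HEPG2": 5.2, "K562": 0.8, "SKNSH": 0.5, "perplexity": 1.8},
    "seq_b": {"HEPG2": 6.5, "K562": 3.0, "SKNSH": 1.0, "perplexity": 2.0},
    "seq_c": {"HEPG2": 7.0, "K562": 0.5, "SKNSH": 0.4, "perplexity": 3.9},
}
print(rank_candidates(candidates))
```

Note how seq_b has a higher liver score than seq_a but ranks lower because of its blood-cell activity, and seq_c is dropped for looking unnatural to the model.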


Real-World Impact and Future Directions

Carl highlights the broader vision: leveraging AI’s ability to process and understand massive genetic sequences to revolutionize drug design. This approach could lead to highly specific chemotherapy drugs that drastically reduce side effects, improving patient quality of life.

He also emphasizes community involvement, encouraging others to experiment with parameters, explore different optimization nodes, and iterate continuously to find the best DNA sequences.


Final Thoughts

Carl’s model represents a powerful convergence of biology, medicine, and artificial intelligence. By thoughtfully combining random sequence generation, AI refinement, and multi-metric evaluation, he’s pushing the boundaries of what’s possible in DNA-based precision medicine.

As the competition unfolds, we look forward to seeing how these innovative designs perform in the wet lab and, ultimately, how they might transform liver cancer treatment.


About the Author

Carl Bisaga is a passionate fourth-year medical student and AI enthusiast based in Chicago. He actively contributes to the Exonic community, blending his medical knowledge with cutting-edge AI tools to advance the field of synthetic biology and precision medicine.


Good luck to all participants in the competition! Stay tuned for updates and breakthroughs in AI-driven DNA design.


If you’re interested in learning more about AI in biotechnology or want to try your hand at DNA sequence design, explore the Exonic platform and join the community pushing the frontiers of medicine.

Enhancing DNA Sequence Design with AI and Bioinformatics Tools: A Step-by-Step Guide

Hi everyone! I’m Dr. Mahil Kulak, Chief Scientific Officer at Exonic. Today, I want to walk you through an exciting and slightly advanced process of optimizing DNA sequences using AI models combined with bioinformatics tools. This approach not only improves the DNA sequence scores but also ensures their biological realism, which is crucial for successful lab experiments.

Step 1: Generating and Optimizing Random DNA Sequences

We start by generating random DNA sequences and assigning them an initial score using our AI model. The goal is to improve this score, which reflects the quality or functionality of the sequence according to our criteria.

  • Initial random generation: This provides a baseline score.
  • Optimization: Using the optimizer module, we connect it to the random generator and run the model to enhance the sequence scores.
  • Outcome: We aim for a score above 15, and often the AI model can generate sequences with scores as high as 19, which is excellent.

Step 2: Ensuring Biological Realism with DNABERT Embeddings

AI models can sometimes produce extreme or unrealistic sequences. To address this, we use the DNABERT tool, which places our sequences in a real biological context by embedding them and comparing them to actual DNA data from the Malinois paper.

  • The tool finds the closest neighbors in the embedding space, which represent biologically plausible sequences.
  • This step helps us ground our AI-generated sequences in reality.
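
The nearest-neighbor idea can be sketched without any deep learning. In the example below, a k-mer count vector stands in for the learned embedding, and cosine similarity ranks the reference sequences. This is a deliberate simplification and an assumption for illustration; the actual tool uses model embeddings, and the reference sequences here are invented.

```python
import math
from collections import Counter

def kmer_vector(seq, k=3):
    """Stand-in embedding: counts of every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest(query, reference, top_n=4, k=3):
    """Return the top_n (similarity, sequence) pairs closest to the query."""
    q = kmer_vector(query, k)
    scored = [(cosine(q, kmer_vector(r, k)), r) for r in reference]
    scored.sort(reverse=True)
    return scored[:top_n]

# Invented reference set standing in for wet-lab-tested sequences.
reference = ["ACGTACGTAC", "TTTTTTTTTT", "ACGTACGTTT"]
for similarity, seq in nearest("ACGTACGTAC", reference):
    print(round(similarity, 3), seq)
```

The default of four neighbors mirrors the top four closest neighbors used in the next step.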

Step 3: Splitting and Scoring Sequences for Detailed Analysis

To analyze the generated sequences in detail, we use the DNA Fast Splitter utility:

  • It splits the output into the original sequence plus the top four closest neighbors found by DNABERT.
  • We then score each sequence individually using multiple DNA Score modules, allowing us to compare how each variant performs.

Step 4: Evaluating Transcription Factor Binding with JASPAR

Biological activity of DNA sequences often depends on transcription factors (TFs) binding to them. To evaluate this:

  • We use the JASPAR matrix tool, which ranks transcription factors based on their binding affinity to our sequence.
  • The highest-ranking TFs indicate where and how strongly proteins might bind to the DNA, affecting its activity inside cells.
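
Under the hood, this kind of ranking usually comes down to sliding a position weight matrix along the sequence and summing log-odds scores. The sketch below shows that general technique with a made-up four-position count matrix; it is not a real JASPAR profile, and the platform's own scoring may differ.

```python
import math

# Hypothetical count matrix for illustration only (NOT a real JASPAR profile).
COUNTS = {
    "A": [8, 1, 1, 7],
    "C": [1, 1, 8, 1],
    "G": [1, 8, 1, 1],
    "T": [2, 2, 2, 3],
}

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert base counts per position into log2 odds against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        column_total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / column_total) / background)
            for b in "ACGT"
        })
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

def best_hit(seq, pwm):
    """Return the (position, score) of the strongest predicted binding site."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])

pwm = log_odds(COUNTS)
print(best_hit("TTTAGCATTT", pwm))
```

Repeating the scan with one matrix per transcription factor and sorting by best score gives a ranking of the kind described above.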

Step 5: Refining Sequences by Modulating Transcription Factor Sites

Using insights from JASPAR:

  • We identify key TF binding sites within the sequence using the DNA Viewer tool.
  • By adding or modifying these sites (for example, adding copies of a TF binding motif), we can enhance the sequence’s activity scores.
  • This manual adjustment helps us balance scores such as the K562 activity, aiming to keep undesirable off-target effects low (below 1).
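
Programmatically, adding copies of a motif amounts to overwriting a stretch of the sequence so the total length stays the same. The helper names and the motif string below are placeholders chosen for illustration, not a validated binding site, and the sequence would still need to be re-scored afterwards.

```python
def insert_motif(seq, motif, position):
    """Overwrite bases starting at `position` with `motif`, keeping the length fixed."""
    if position < 0 or position + len(motif) > len(seq):
        raise ValueError("motif does not fit at this position")
    return seq[:position] + motif + seq[position + len(motif):]

def count_motif(seq, motif):
    """Count occurrences of `motif`, including overlapping ones."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)

PLACEHOLDER_MOTIF = "CGTCA"  # hypothetical motif, for illustration only

original = "A" * 30
edited = insert_motif(original, PLACEHOLDER_MOTIF, 5)
edited = insert_motif(edited, PLACEHOLDER_MOTIF, 20)
print(edited, count_motif(edited, PLACEHOLDER_MOTIF))
```

Overwriting rather than inserting keeps the sequence at a fixed length, which matters if the scoring models expect a fixed input size (an assumption here).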

Step 6: Finalizing and Submitting the Optimized Sequence

Once satisfied with the improvements:

  • We copy the refined DNA sequence into the DNA design panel.
  • We submit the sequence for further testing or synthesis.
  • The system confirms the validity of the sequence and displays updated scores.

Key Takeaways

  • Combining AI with real biological data embeddings ensures that optimized DNA sequences are both high-scoring and biologically plausible.
  • Utilizing tools like DNA Fast Splitter and JASPAR provides detailed insights into sequence functionality and protein-DNA interactions.
  • Manual refinement of transcription factor binding sites can further improve sequence performance and reduce potential off-target effects.
  • This integrated approach accelerates the design of functional DNA sequences ready for wet lab testing.

I hope this walkthrough gives you a clear understanding of how advanced AI-driven tools can be harnessed for DNA sequence design. Best of luck with your designs and experiments!

Dr. Mahil Kulak
Chief Scientific Officer, Exonic

Getting Started with DNA Design in the Exonic Studio Competition

If you're interested in DNA design and want to dive into an exciting competition, Exonic Studio offers a fantastic platform to get started quickly. In this post, we'll walk you through a super quick start guide on how to use the tools provided inside Exonic Studio to participate in the DNA design competition, optimize sequences, and validate your designs.

Exploring the DNA Design Panel

The first tool you'll encounter is the DNA Design Panel. This is a hands-on, user-friendly interface where you can manually manipulate DNA sequences. Key features include:

  • Copy and Paste DNA Regions: Easily copy segments of DNA and paste them to create new sequences.
  • Randomization: Quickly randomize your DNA sequence to explore different variants.
  • Selection Modes: Choose between grid mode or traditional sequence mode to select and edit your DNA.
  • Visual Layout: The DNA is displayed in a wrapped left-to-right format, making it intuitive to view and edit.

This panel is perfect for experimenting with DNA sequences and making targeted changes.

Using the Malinois DNA Optimizer

On the right side of the interface, you'll find the Malinois DNA Optimizer. This tool leverages a model from a recent research paper, which sets a benchmark for the competition. The optimizer offers two main approaches:

  • Greedy Optimizer: This method randomly flips individual bases and keeps the changes that improve the score, so the sequence gets better incrementally.
  • Gradient-Based Method: A more sophisticated approach that uses gradients to optimize the sequence in a controlled manner.
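
To give a feel for the greedy approach, here is a minimal hill-climbing sketch. The scoring function is a toy placeholder (GC fraction) standing in for the real model, and the function names, step count, and acceptance rule are assumptions, not the optimizer's actual implementation.

```python
import random

BASES = "ACGT"

def toy_score(seq):
    """Placeholder score (GC fraction); a real run would call the model instead."""
    return sum(base in "GC" for base in seq) / len(seq)

def greedy_optimize(seq, score_fn, steps=200, seed=0):
    """Flip one random base per step and keep the change only if the score improves."""
    rng = random.Random(seed)
    best, best_score = seq, score_fn(seq)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        new_base = rng.choice([b for b in BASES if b != best[pos]])
        candidate = best[:pos] + new_base + best[pos + 1:]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

start = "A" * 40
optimized, score = greedy_optimize(start, toy_score)
print(optimized, score)
```

Because it only ever accepts improvements, this kind of search is simple but can stall at a local optimum, which is one reason a gradient-based alternative is useful.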

Running the Gradient-Based Optimization

To use the gradient-based method:

  1. Set the KL clamp parameter to control how much the sequence can change in each optimization step. Increasing this value allows larger changes.
  2. Choose the number of steps to run (e.g., 30 steps).
  3. Run the optimizer.
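
The exact meaning of the KL clamp inside the optimizer isn't spelled out here, so treat the following as one possible interpretation rather than the tool's implementation. In this sketch each position holds a probability distribution over the four bases, a gradient step raises a toy linear score, and the step is halved until the KL divergence between the new and old distributions stays under the clamp. The reward table is invented.

```python
import math

BASES = "ACGT"

def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two distributions over the four bases."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def optimize(rewards, steps=30, learning_rate=1.0, kl_clamp=0.05):
    """Gradient ascent on sum_i sum_b p_i[b] * rewards[i][b], with KL-limited steps."""
    logits = [[0.0] * 4 for _ in rewards]
    for _ in range(steps):
        for i, reward in enumerate(rewards):
            p = softmax(logits[i])
            expected = sum(pb * rb for pb, rb in zip(p, reward))
            # Gradient of the expected reward with respect to the logits.
            grad = [pb * (rb - expected) for pb, rb in zip(p, reward)]
            step = learning_rate
            while True:
                proposal = [l + step * g for l, g in zip(logits[i], grad)]
                if kl_divergence(softmax(proposal), p) <= kl_clamp:
                    break
                step /= 2  # shrink the step until the change is small enough
            logits[i] = proposal
    # Read out the most likely base at each position.
    return "".join(BASES[max(range(4), key=lambda b: row[b])] for row in logits)

# Toy rewards favoring G, then T, then A (columns follow the order A, C, G, T).
rewards = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
print(optimize(rewards))
```

In this reading, a larger clamp lets each step move the base probabilities further, which matches the description that increasing the value allows larger changes.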

This method quickly generates optimized sequences. For example, after running it, you might see a strong score on the target cell line, HepG2, and low off-target scores, indicating specificity.

Balancing Scores and Real-World Validation

One key insight from the competition is that achieving a very high score on the Malinois model isn't the only goal. Some sequences may score exceptionally high yet prove overly optimistic or unrealistic when validated in the wet lab. The challenge lies in balancing:

  • High on-target activity (e.g., HepG2 activation)
  • Low off-target effects
  • Design cleverness that translates to practical success

Fine-Tuning Sequences in the DNA Design Panel

After generating an optimized sequence, you can:

  • Paste it into the DNA Design Panel.
  • Visualize where the model predicts high activity and off-target effects using heatmaps.
  • Manually flip base pairs to reduce off-target activation.
  • Incorporate known DNA motifs that enhance target activity.

This iterative process allows you to combine computational optimization with human intuition.

Submitting Your Design

Once satisfied with your sequence:

  1. Copy and paste it into the submission panel.
  2. Run the internal model pipeline to evaluate your design.
  3. Review the scores and various metrics provided.

This feedback loop helps you refine your approach and improve future designs.

Conclusion

Exonic Studio provides a powerful, interactive environment to design and optimize DNA sequences for a cutting-edge competition. By combining manual design with automated optimization tools like the Malinois DNA Optimizer, you can explore creative solutions and validate them computationally before moving to experimental phases.

Ready to start designing? Jump into Exonic Studio, experiment with the panels, and see how far your DNA designs can go!


Happy Designing!