YouTube Deep Summary



The Evo 2 Play: Community Submission (Synthetic Enhancer Design)

Exonic • 2025-11-28 • 24:02 minutes • YouTube

🤖 AI-Generated Summary:

Harnessing AI to Design Custom DNA for Targeted Liver Cancer Therapy: A Deep Dive into Carl Bisaga’s Exonic Model

Hello, fellow biotech enthusiasts! Today, let’s explore the fusion of artificial intelligence and DNA design, narrated by Carl Bisaga, a fourth-year medical student in Chicago. Carl showcases the custom model he built on Exonic’s platform for a synthetic enhancer design competition hosted by Exonic, a platform that crowdsources DNA design for precision medicine. Here’s a detailed walkthrough of his process, his insights, and the potential of AI in developing targeted chemotherapy.


The Vision: Precision Medicine Meets AI-Driven DNA Design

Chemotherapy has long been a double-edged sword: effective in killing cancer cells but often wreaking havoc on healthy organs, causing severe side effects like hair loss and organ damage. As Carl frames it, his project is part of a larger mission to design DNA sequences for peptides that localize specifically to the liver, so that a toxic drug attached to such a peptide would reach liver cancer cells with minimal off-target effects. Imagine a chemotherapy treatment that attacks only liver tumors without harming other tissues: this is the promise of precision medicine.


Phase 1: Generating Random DNA Sequences

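The journey begins with random DNA. The DNA design panel in Exonic’s platform first loads a placeholder made entirely of adenine, which carries no biological meaning. Carl presses “load random” several times until the color-coded visualizer looks promising (more blue, less red). He then passes the sequence through the Malinois DNA optimizer, a simple algorithm that tries to raise the HepG2 score while lowering the K562 and SK-N-SH scores. With his settings, it may make up to 40 changes of 3 base pairs each. The result is the starting point for AI refinement.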
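The transcript describes Phase 1 as randomizing a 200 bp sequence and then running a simple optimizer with max steps set to 40 and changes of 3 base pairs at a time. The sketch below illustrates that kind of greedy search under stated assumptions: bases are drawn uniformly, each step proposes one random 3 bp replacement and keeps it only if the score improves, and `toy_score` is a hypothetical stand-in objective (GC content), not the Malinois model. Whether the real node counts proposals or accepted changes toward its step limit is not shown in the video.

```python
import random

BASES = "ACGT"


def random_dna(length: int, rng: random.Random) -> str:
    """Draw each base uniformly at random (assumed behavior of 'load random')."""
    return "".join(rng.choice(BASES) for _ in range(length))


def toy_score(seq: str) -> float:
    """Hypothetical stand-in objective (GC fraction); NOT the Malinois model."""
    return sum(base in "GC" for base in seq) / len(seq)


def greedy_optimize(seq, score_fn, max_steps=40, mutation_len=3, rng=None):
    """Propose one random block replacement per step; keep it only if the score improves."""
    if rng is None:
        rng = random.Random()
    best, best_score = seq, score_fn(seq)
    for _ in range(max_steps):
        start = rng.randrange(len(best) - mutation_len + 1)
        patch = random_dna(mutation_len, rng)
        candidate = best[:start] + patch + best[start + mutation_len:]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


rng = random.Random(0)
start_seq = random_dna(200, rng)
optimized, score = greedy_optimize(start_seq, toy_score, rng=rng)
print(f"start={toy_score(start_seq):.3f} optimized={score:.3f}")
```

Because only improving edits are kept, the score can never get worse, which is also why such a simple search can stall at a local maximum.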


Phase 2: Refining DNA with the Evo 2 Model

Random DNA isn’t enough. Carl feeds his sequence into Evo 2, an AI model built into Exonic’s platform. The model tidies up the DNA, making it look more "organic," which Carl expects to improve its chances in the wet-lab round at the end of the competition.

  • Why longer sequences? Carl uses custom JavaScript to extend the DNA sequence to five times its original length before feeding it to Evo 2. A longer prompt gives the model more context, and Carl argues it produces more meaningful sequences as a result. The output length stays at 200 base pairs, the length required for submission.
  • Parameters for AI tweaking: Temperature sets how aggressively Evo 2 departs from its most likely predictions. Top K and Top P restrict sampling to the k most likely bases or to the smallest set whose cumulative probability reaches p. Carl doesn’t cover Top K and Top P in detail and encourages experimenting with all three.
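Both ideas in this list can be sketched in a few lines. The prompt extension below simply tiles the sequence five times; that is an assumption, since the video does not show how Carl’s JavaScript extends it. The sampling helper applies temperature, then top-k, then top-p (nucleus) filtering to a next-base distribution, which is the standard meaning of these settings in language-model sampling. This is a generic illustration over four made-up logits, not Evo 2’s API or Exonic’s node.

```python
import math
import random


def extend_prompt(seq: str, factor: int = 5) -> str:
    """Assumption: tile the sequence to give the model a longer prompt."""
    return seq * factor


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Turn next-base logits into a renormalized distribution after filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits; values < 1 sharpen the distribution, > 1 flatten it.
    scaled = {base: value / temperature for base, value in logits.items()}
    peak = max(scaled.values())
    weights = {base: math.exp(v - peak) for base, v in scaled.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # Top-k: keep only the k most likely bases (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(w for _, w in ranked)
    ranked = [(base, w / total) for base, w in ranked]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for base, p in ranked:
        kept.append((base, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {base: p / norm for base, p in kept}


def sample_base(logits, rng, **settings):
    """Draw one base from the filtered distribution."""
    dist = filter_probs(logits, **settings)
    bases, probs = zip(*dist.items())
    return rng.choices(bases, weights=probs, k=1)[0]


logits = {"A": 2.0, "C": 1.0, "G": 0.5, "T": -1.0}  # made-up values
print(filter_probs(logits, temperature=1.0, top_k=2))
print(sample_base(logits, random.Random(0), temperature=0.7, top_p=0.9))
```

Lower temperature and tighter Top K or Top P all push the model toward conservative, high-probability bases; raising them allows more aggressive changes.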

Phase 3: Testing and Validating DNA Sequences

No single metric can capture the effectiveness of a DNA sequence, so Carl evaluates several before choosing his candidates. With only three submissions allowed per participant, choosing well matters:

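1. HepG2 Score (Liver Localization)

  • Named after HepG2, a human liver cancer cell line. This score and the K562 and SK-N-SH scores come from an earlier research paper, and the Malinois DNA optimizer is built to push them in the right direction. Carl treats it as a proxy for how well the sequence targets the liver.
  • More blue in the DNA visualizer means a higher HepG2 score, which is desirable.
  • However, Carl cites data suggesting that scores above about 6 risk overfitting to this metric, so pushing it into the 20s to 40s (values he has seen, and submitted, in the competition) may not be the best strategy.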

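2. K562 Score (Blood Localization)

  • Named after K562, a human leukemia cell line; Carl reads it as how much the sequence localizes to the blood.
  • Lower scores are better, since blood localization could cause systemic side effects. Carl aims for below 1 as a starting point, and the closer to 0, the better.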

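3. SK-N-SH Score (Neuronal Localization)

  • Named after SK-N-SH, a human neuroblastoma cell line; Carl reads it as localization to neurons and the nervous system, mostly the brain.
  • As with K562, lower is preferred to avoid unwanted neurological side effects; Carl would ideally keep it under 0.6, or lower.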
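The rules of thumb quoted in the video (HepG2 above about 6 risks overfitting, K562 below 1 as a starting point, SK-N-SH ideally under 0.6) can be turned into a simple screen. The sketch below is hypothetical: the `CellScores` type, the ranking key (HepG2 capped at 6, ties broken by lower combined off-target score), the use of the thresholds as hard cutoffs, and the candidate numbers are all illustrative. It is not the platform’s composite score, whose weighting Carl says he doesn’t know.

```python
from dataclasses import dataclass

# Rules of thumb quoted in the video; not official competition thresholds.
HEPG2_OVERFIT = 6.0
K562_MAX = 1.0
SKNSH_MAX = 0.6


@dataclass
class CellScores:
    name: str
    hepg2: float  # higher is better (liver)
    k562: float   # lower is better (blood)
    sknsh: float  # lower is better (neurons)


def passes_screen(s: CellScores) -> bool:
    """Keep candidates whose off-target scores are under the quoted cutoffs."""
    return s.k562 < K562_MAX and s.sknsh < SKNSH_MAX


def screen_notes(s: CellScores) -> list:
    """List the rules of thumb a candidate violates."""
    notes = []
    if s.hepg2 > HEPG2_OVERFIT:
        notes.append("HepG2 above 6: possible overfitting to this metric")
    if s.k562 >= K562_MAX:
        notes.append("K562 not below 1")
    if s.sknsh >= SKNSH_MAX:
        notes.append("SK-N-SH not below 0.6")
    return notes


def rank_key(s: CellScores):
    # Hypothetical ordering: HepG2 capped at the overfit mark, then lower off-target.
    return (-min(s.hepg2, HEPG2_OVERFIT), s.k562 + s.sknsh)


candidates = [  # made-up scores for illustration
    CellScores("cand_a", 2.1, 0.03, 0.58),
    CellScores("cand_b", 25.0, 0.4, 0.3),
    CellScores("cand_c", 4.0, 1.5, 0.2),
]
shortlist = sorted(filter(passes_screen, candidates), key=rank_key)
for c in shortlist:
    print(c.name, screen_notes(c))
```

Capping HepG2 at 6 means an extreme score earns no extra credit, while the note still flags it for a closer look.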

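4. Perplexity Metric

  • Computed by Evo 2, it measures how "natural" the model finds the DNA sequence, much like judging whether a string of words came from a human.
  • Perplexity cannot fall below 1. Values close to 1 mean the model finds the sequence plausible as something seen in nature; Carl notes that nonsense sequences score very high, often above 10.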
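Perplexity has a standard definition: the exponential of the average negative log-probability the model assigns to each observed base. The sketch below computes it from a list of per-base probabilities, which are assumed to be given; how Exonic’s node obtains them from Evo 2 isn’t shown. It makes the scale concrete: a model that predicts every base with certainty scores exactly 1, and one that guesses uniformly among A, C, G, and T scores 4.

```python
import math


def perplexity(base_probs):
    """exp of the mean negative log-probability of each observed base."""
    if not base_probs:
        raise ValueError("need at least one probability")
    nll = -sum(math.log(p) for p in base_probs) / len(base_probs)
    return math.exp(nll)


print(perplexity([1.0] * 10))             # certain model: 1.0
print(perplexity([0.25] * 10))            # uniform guessing over four bases: about 4
print(perplexity([0.9, 0.8, 0.95, 0.7]))  # a fairly confident model
```

Assuming the model predicts one of four bases per position, a perplexity above 4 means it finds the sequence less predictable than random guessing, which fits Carl’s observation that nonsense sequences score above 10.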

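5. DNABERT-2 Nearest Metric

  • Carl calls it a “Facebook for DNA”: it compares your sequence against a repository of 70,000 sequences already tested in the wet lab and returns the closest matches.
  • Each hit has a similarity score from 0 to 1 (1 is most similar) and a distance, the inverse of similarity. Knowing whether a close match performed well or poorly in the lab helps decide whether to submit your design.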
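A nearest-neighbor lookup like this is typically done by embedding each sequence as a vector and ranking the repository by cosine similarity. The sketch below assumes the embeddings already exist (for example, produced by a model such as DNABERT-2), uses made-up three-dimensional vectors, and takes distance as 1 minus similarity; the video only says distance is the inverse of similarity, so the exact formula is an assumption. Cosine similarity ranges from -1 to 1 in general.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, repository, k=3):
    """Return (name, similarity, distance) for the k most similar entries."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in repository.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Assumption: distance = 1 - similarity.
    return [(name, sim, 1.0 - sim) for name, sim in scored[:k]]


repository = {  # made-up embeddings standing in for wet-lab-tested sequences
    "tested_seq_1": [1.0, 0.0, 0.0],
    "tested_seq_2": [0.9, 0.1, 0.0],
    "tested_seq_3": [0.0, 1.0, 0.0],
}
for hit in nearest([1.0, 0.0, 0.0], repository, k=2):
    print(hit)
```

A brute-force scan like this is fine for tens of thousands of entries; much larger repositories usually call for an approximate nearest-neighbor index.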

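6. JASPAR Score

  • Carl likens it to “LinkedIn for DNA”: instead of searching sequences, it searches JASPAR, an open-access database of transcription factor binding profiles, to find which transcription factors are predicted to bind your sequence (ZNF558 was the top hit in one example).
  • Looking up where the top hits are active helps predict tissue specificity: a top factor active only in the brain is a bad sign, while one active in the liver is encouraging. Carl believes scores above 2 indicate strong binding, and the node can return hundreds of factors.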
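JASPAR distributes transcription factor binding profiles as position frequency matrices (counts of each base at each motif position). A common way to score a sequence against such a profile is to convert the counts into log-odds weights and slide the motif along the sequence, as sketched below. The toy three-position motif, the pseudocount of 0.8, and the uniform 0.25 background are illustrative assumptions; the scoring Exonic’s JASPAR node uses, and the meaning of its “above 2” threshold, are not shown in the video.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert {base: [count per position]} into log2-odds weights per position."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2((pfm[b][i] + pseudocount * background) / total / background)
            for b in BASES
        })
    return pwm


def best_hit(seq, pwm):
    """Score every window on the forward strand; return (best_score, start_index).

    A fuller scan would also score the reverse complement of seq.
    """
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


toy_pfm = {  # made-up motif that prefers T-G-A
    "A": [0, 0, 10],
    "C": [0, 0, 0],
    "G": [0, 10, 0],
    "T": [10, 0, 0],
}
pwm = pfm_to_pwm(toy_pfm)
print(best_hit("CCCCTGACCC", pwm))
```

Positive totals mean a window looks more like the motif than like background DNA; a match score says nothing by itself about where the factor is active, which is why Carl looks up the top hits.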

Putting It All Together: The Workflow in Action

Carl’s model pipeline looks like this:

  1. Generate random DNA → 2. Optimize with the Malinois DNA optimizer → 3. Extend and refine with Evo 2 → 4. Evaluate with multiple metrics (HepG2, K562, SK-N-SH, perplexity, DNABERT-2, JASPAR) → 5. Select the top three candidates for submission

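This process lets him balance liver targeting against off-target effects while keeping the sequences biologically plausible. In his demo run, which took about a minute and a half, the final sequence had the lowest K562 score Carl had seen (0.03) and an SK-N-SH score just under 0.6, but its HepG2 score dropped slightly, and DNABERT-2 found no close relative in the repository. To push HepG2 back up, he suggests adding a second Malinois optimizer after Evo 2 and running the steps in a loop until the scores reach a local (ideally global) maximum.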
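The iterative idea Carl proposes (re-running the optimizer after Evo 2 until scores stop improving, then keeping the best three designs for submission) can be sketched as a generic loop. Here `optimize`, `refine`, and `score` are hypothetical placeholders for the optimizer node, the extend-and-Evo-2 step, and whatever combined metric you trust; nothing below calls Exonic or Evo 2, and the GC-content stand-ins exist only so the loop runs.

```python
from typing import Callable, List, Tuple

Step = Callable[[str], str]


def design_loop(seed: str, optimize: Step, refine: Step,
                score: Callable[[str], float], max_rounds: int = 10) -> Tuple[str, float]:
    """Alternate optimizer and refiner; stop at the first round without improvement."""
    best, best_score = seed, score(seed)
    for _ in range(max_rounds):
        candidate = refine(optimize(best))
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # local maximum reached for this search
        best, best_score = candidate, candidate_score
    return best, best_score


def pick_top(candidates: List[str], score: Callable[[str], float], n: int = 3) -> List[str]:
    """Keep the n highest-scoring designs (three submissions in this competition)."""
    return sorted(candidates, key=score, reverse=True)[:n]


# Toy stand-ins: score = GC fraction, optimize = swap one A for G, refine = no-op.
def gc_fraction(seq: str) -> float:
    return sum(b in "GC" for b in seq) / len(seq)


def swap_one_a(seq: str) -> str:
    return seq.replace("A", "G", 1)


def identity(seq: str) -> str:
    return seq


print(design_loop("AAAA", swap_one_a, identity, gc_fraction))
print(pick_top(["AAAA", "GGGG", "GAGA", "CCAA", "ATAT"], gc_fraction))
```

Stopping at the first round without improvement finds a local maximum only; restarting from several random seeds is one simple way to look for better ones.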


Real-World Impact and Future Directions

Carl highlights the broader vision: using AI that can process sequences millions of base pairs long, far beyond what a human could read, to transform drug design. This approach could lead to highly specific chemotherapy drugs that drastically reduce side effects, improving patients’ quality of life.

He also emphasizes community involvement, encouraging others to experiment with parameters, explore different optimization nodes, and iterate continuously to find the best DNA sequences.


Final Thoughts

Carl’s model represents a powerful convergence of biology, medicine, and artificial intelligence. By thoughtfully combining random sequence generation, AI refinement, and multi-metric evaluation, he’s pushing the boundaries of what’s possible in DNA-based precision medicine.

As the competition unfolds, we look forward to seeing how these innovative designs perform in the wet lab and, ultimately, how they might transform liver cancer treatment.


About the Presenter

Carl Bisaga is a passionate fourth-year medical student and AI enthusiast based in Chicago. He actively contributes to the Exonic community, blending his medical knowledge with cutting-edge AI tools to advance the field of synthetic biology and precision medicine.


Good luck to all participants in the competition! Stay tuned for updates and breakthroughs in AI-driven DNA design.


If you’re interested in learning more about AI in biotechnology or want to try your hand at DNA sequence design, explore the Exonic platform and join the community pushing the frontiers of medicine.


📝 Transcript (573 entries):

Hello everybody. My name is Carl Bisaga. Today I'm going to be showing you my custom Exonic model that I'm going to be using to try to win this awesome competition right here. I'm a fourth-year medical student. I'm in downtown Chicago today. A little bit foggy, but we're not going to let that keep us down. So, let's get right into it. I made this nice little explainer, a little PowerPoint screenshot here for you today, to explain what I'm going to do. So, here's the title. We've got how this model works. So, phase one: the first thing we're going to do is create random DNA. This is essentially DNA with very little meaning attached to it, just something that we sort of created on our own. Then we move into phase two, which is tweaking the DNA with AI, specifically the Evo 2 model, which is conveniently built into Exonic's platform. And then we finish up with phase three, which is going to be testing the DNA according to several metrics. No one metric is perfect. So we're going to be looking through several different metrics, including things like the Malinois DNA optimizer metrics such as HepG2. I'll explain the details of that later on. We're going to be looking at JASPAR scores. We're going to be looking at perplexity from Evo 2. So, let's dive right into it. I know this is small resolution. We'll zoom in a little bit here, but just broad strokes, here's what I'm working with today. Here's the grand treasure map of my model. I went ahead and organized it into sections, just as we covered in the plan earlier. So this first rectangle here, this is where we're creating the DNA. It's going to be very low-meaning DNA. It's just kind of random. Step two is going to be within this rectangle right here. That is going to be where we are engaging Evo 2. We're feeding the DNA into the Evo 2 model, and it's going to tidy it up for us.
It's going to make it look organic so that it has a higher chance of success in the wet lab in the final days of the competition, 47 days from now. And the final rectangle here, which I'm showing with my nice mechanical... I got the little cursor effects on my mouse here for you guys today. This final part is where we test the DNA sequences that we've come up with, because they can't all be winners. Let's be real. And we only have three submissions for the competition, so we want to choose our best three. So, let's dive right into it. I'm going to zoom in here. Let's get this nice high resolution so you can actually read what's going on. So, these little panels, these are essentially the building blocks of our machine. It's almost like Legos. We get to plug and play here. You can find all types of different nodes here in the sidebar. I'm going to be covering a few of these today, the highest-yield ones for you, but feel free to explore and take a look at some of these other ones. The most crucial one we start off with here is the DNA design panel. It looks like it loaded in a sequence of DNA with all adenosines here. It's just straight adenosines, which is probably not biologically feasible. It basically means nothing. So we can go and mess around with it here. What I like to do is drag this down a little bit, and I just press load random a few times. And let me tell you exactly what we're looking at here. So why do these colors change? Okay, we kind of have a good one here. I'll explain why. The colors change according to these three metrics we have set out here. These three metrics are derived from an earlier research paper. Essentially, HepG2, that is the good metric. It outputs a number that we want to go up, so good will be blue here. The more blue we see in our little DNA visualizer here, the better it is. And let me explain why this is good.
So I read through some of these papers, and essentially HepG2... So the overarching project we're working on here at Exonic is... well, not me, I'm part of the community, but the overall project that this company's working on is that we are crowdsourcing DNA sequencing, because we feel that... or, you know, I'm blurring the lines here a little bit. I know Ben pretty well. He's a good friend of mine. What can I say? So I like to say "we." Essentially, we're creating DNA strands. We're submitting them to this competition, and the DNA strands are transcribed. Essentially, the proteins in your body read them almost like a book. It's like a cookbook. It tells them what protein to make, what peptide to make. And whatever peptide you make, it actually doesn't quite matter. What matters is where it localizes. We don't care too much about the structure, but we do care about where this protein is found. The dream that we're working toward in this competition is to find a peptide that is only localized to the liver when you give it to a person as a medication. If it only exists in the liver and doesn't exist anywhere else, that means we can rely upon this peptide. We can take this peptide and attach to it a huge toxic chemical. And the reason we want to do that is because there are many cancers that occur in the liver. And if we had a drug that only went to those cancer cells in the liver and destroyed them and didn't cause damage anywhere else in the body, that would be the definition of a perfect chemotherapy drug. That's also part of a subbranch of what's called precision medicine that's up and coming in the world. Just a little bit of background in case you're not from a medical background.
I always like to give a little bit of context. With chemotherapy drugs, especially the earlier generations of chemotherapy drugs, the problem is they do destroy the cancer, but they cause massive damage to every other organ system in the body. That's why you typically see these unfortunate cancer patients have to shave their heads: their hair follicles are dying, they're undergoing myopathy. Basically every organ system is incurring some sort of damage, because the chemotherapy drug just spreads throughout the body and causes damage everywhere, usually more so in the cancer cells, but still systemically. So what we're asking of everyone in the competition here is: let's make chemotherapy more specific. Let's make it cause less damage. We don't want people to have to shave their heads when they do chemotherapy. We wouldn't want them to have kidney injuries and so forth, lung damage. So that gives you so much more emotion to work with on the project, instead of just looking at these little anonymous names of the metrics. So the master plan here is HepG2. HEP is the Latin suffix, sorry, the Latin prefix for liver. So HepG2 means that, according to this simple model we have here, the more blue we have in our square, the more it's going to localize in the liver. And that is good. That's what we want. K562 corresponds to localization in your blood, just like your serum. That means it'll generally hang out in your arteries, your veins, your capillaries, and we don't want that. That is unwanted for the purposes of chemotherapy or immunotherapy. And SK-N-SH corresponds to the drug localizing in the neurons. So essentially your nervous system, and mostly your brain. Your brain is the largest mass of that system.
So that means these two metrics, we actually want them to be low. That means we don't want a lot of red in our visualizer here. So we can just go ahead and keep on randomizing this, just to get a good framework to start off from. Sometimes it can be a little difficult. So I'm going to compromise. We'll get a little bit of purple. You know, sometimes in life you've got to compromise a little bit. [laughter] This is a fantastic start, though. So we have our DNA sequence here. This is the same DNA sequence we see in our image, because it's linked to the Malinois DNA optimizer. This tool specifically aims to increase the HepG2 metric, and it seeks to decrease the localization in the blood, the K5, and in the neurons, the SK. And the way it does this is with a very simple algorithm. It is not nearly as complex as the Evo 2 model, which is why I tend to put it at the beginning and let the AI model clean up afterwards. The parameters we can change here: max steps. So this is where the wisdom of crowds comes in. There's no correct answer for the numbers I'm putting into these parameters. If you started off with a much better sequence, perhaps you want to go lighter on these. Essentially, this means I'm allowing the optimizer to make 40 changes to my DNA, and each of those changes will be three base pairs, or letters, in length at a time. You could change this. If you really want to be drastic, you change it to four and so forth. We'll stick with three here. That's a good starting point. Once I click run, this sequence will output into here. So this is not the sequence that we see on the image here. That'll only appear once we run this whole thing. So I'm going to leave that like it is for now. Okay. So that's it for phase one, where we created the random DNA.
We tweaked it a little bit with some simple algorithms here. That word is giving me a hard time today. [laughter] And the next step, just as a quick reminder, is tweaking the DNA with AI. The way I like to do that is I've written a little code here in JavaScript. Essentially, it takes that randomized DNA as an input, and using JavaScript... you know, "Oh man, code? I have to code? Really, Carl, you're really going to do that to me? I have to spend four hours of my Sunday evening?" No. So I'm going to explain this to you. You can copy and paste this code. Essentially, this code takes the DNA that we have and extends it to five times its length. As you can see here, this is a much longer series of base pairs. And the reason we want that is because we are going to feed that longer sequence into the Evo 2 model. And here's why that's valuable. Let's move to the absolute extremes: imagine we gave the Evo 2 model one single base pair, just an adenosine, and we told it, okay, generate a sequence of 200 base pairs that is going to be valuable for our wet lab. It's going to be total nonsense, because it only has a single data point. On the opposite side of the spectrum, imagine we gave it a million base pairs and told it to generate something very useful. It'll probably generate something more useful from the longer sequence. So essentially this is taking advantage of the way the AI model works. It likes more data. It likes a longer prompt to give you a better answer. And we're inputting this longer sequence into it, and it will go ahead and change it according to these parameters. Essentially, this one is how long we want our output to be. You still have to output a 200 base pair DNA sequence. That's what you're going to submit in the competition.
So you're not going to change this, but you are going to play around with these few parameters. Temperature is essentially how aggressive you want the AI model to be in its changes. Top K and top P I'm not going to cover too much today. Seems like a great job for GPT, to be quite honest. And once it does that, it's going to output its final DNA sequence. That's going to be the last thing that we change here. And it's going to put it into the final section, phase three, which is testing our DNA. Like I said earlier, we're going through a few different tests, because no one test is perfect. It's the same way that you can't pick a boyfriend off of one date. You've got to go visit his parents. You've got to go ask him about his favorite movies. You've got to annoy him a little bit and see how he responds, right? So this is the same thing with our DNA here. So let's go ahead and cover this. HepG2, this is the metric I talked about earlier. There's some data that suggests HepG2 can be overfitted if the number goes too high. That number is six. So if it's over six, there is a slight danger of overfitting your DNA to this metric, and no metric is perfect, right? So it might not necessarily be the best strategy in the competition to make this number very large, such as in the 20s, 30s, 40s, which I've seen in the competition and which I've also submitted. So that is one risk with this metric. For K562, you want this to be below one as a starting point. Ideally, the closer to zero it is, the better, because that means that way on down the line, 10 years from now, for the future patients getting perhaps the drug that you helped design, the lower this is, the fewer side effects they're going to have in their different organ systems.
You know, if you're curious, there actually are other metrics that judge how much the drug or the sequence localizes in the lungs, for example. Every organ is at play here. So it's not just going to be these few metrics. This is just to start off, a jumping-off point if you really wanted to nail down your DNA sequence. And then finally, this is just a composite score of the three up here. I'm not quite sure how much it weights each individual metric. As for DNABERT-2 Nearest, this is a neat little tool that is almost like a Facebook of DNA sequences. It takes your DNA sequence, which is almost like a name you type in on Facebook, and it returns the top closest hits that match your DNA, and it gives you a similarity score. And the nice thing is it's searching through a repository of 70,000 DNA sequences already tested in the wet lab. So if your sequence is very similar to a sequence that has already been tested in the lab and already performs very well, or perhaps not very well, that can help you decide whether or not to submit this DNA to the wet lab. For the scores you're looking at here, similarity goes from zero to one. One is more similar to the listed sequence from the repository that's already been tested, and distance is the inverse of similarity. And it gives you quite a few results here, so you can walk through them as you like. So that's a quick explainer on DNABERT. That was our second little test that we gave to our DNA sequence. And then the final one I'm going to cover here today... there are a few others down here. These are not too important. This is more fine-tuning, which may be important if you're really trying to lock in your DNA choice. The final test here is the JASPAR score.
And this is important because, again, if DNABERT was like Facebook, JASPAR is almost like LinkedIn. It doesn't search through DNA sequences, but it does search through transcription factors that have already been well researched and uploaded to the internet, essentially in an open-source repository. And this score tells you how active this specific transcription factor is. So for example, for this DNA sequence that was submitted to JASPAR, it searched the DNA sequence and looked at what transcription factors bind to it. And the top hit here is ZNF558. And we can go ahead and look up more info about what ZNF558 does. And that is very useful, because we can theorize: if ZNF558 is only active in the brain and this is your top transcription factor, that's probably not a good sign for your DNA sequence. But if it's only active in the liver, then we might be on to something with our DNA sequence. A score above two, I believe, means that transcription factor is really going to enjoy binding to your DNA and is going to be particularly active with your sequence. And there are many, perhaps too many, transcription factors returned here. There are hundreds, so you can really get into the nitty-gritty of the microbiology. So that is it for the quick run-through. Let's go ahead and pause the video. I'm going to go ahead and run my machine and see what comes out. All right, everyone. So, our model finished running. It took about a minute and a half. Very fast. So I'll walk you through what it did step by step. Once again, it took this DNA sequence. This is unchanged from how we had it. Now these metrics are finally accurate. They correspond to the current DNA sequence that we generated randomly. As you can see, terrible, terrible scores.
It's going all over the blood, it's going all over the neurons, it's causing a lot of damage, the patient's not happy, they're ticked off. So we went ahead and optimized it with the Malinois DNA optimizer, and it made some changes to our DNA sequence and improved our score. HepG2 actually went down a little bit, but thankfully the K5 and the SK were cut down a little bit. In particular, the K5 dropped by two points, and the SK is still not the greatest here. Ideally you want that to be under 0.6, I would say, probably even lower than that, but it's a good starting point just to display what this model is up to. We extended it. We ran it through the AI model, Evo 2, revolutionary tech here from 2024. And one thing I didn't touch on earlier that is important to know is this perplexity metric. It's important because it is the AI telling you how organic your DNA sequence looks. It's the equivalent of someone coming up to you and saying a random series of words, and this score would essentially be: do you think this person is a human or not? Because you can write down certain sequences of DNA that are total nonsense chemically and biologically, and the nice thing is the AI will recognize that, and when it is total nonsense, it will give you a very high score. It'll be over 10. Generally, the closer it is to one, the more the AI model believes your DNA sequence is something that could be seen in nature, which is very good. And so we ran it through the model, and these are the final scores we're looking at. HepG2 actually went down a little bit. But the K5, this is actually the lowest K5 I've ever seen. Wow. 0.03. That's very low. Very good. You know, there are probably lower ones in the competition; I haven't looked too much. The SK is just under that 0.6 mark.
So it seemed to emphasize lowering the K5 and the SK instead of increasing the HepG2. One strategy to mitigate that would be to put another Malinois DNA optimizer here and optimize it once again after running it through the Evo 2 model. You could essentially run this in a loop over and over until you reach some sort of local maximum, or perhaps the global maximum as we're hoping, and potentially get a fantastic DNA sequence to put in the competition. And as for DNABERT, as we can see here, this is our Facebook test, right? We're looking at what DNA sequences are closest to the ones we've input into the model. And it looks like we're very, very different from the closest cousin we could find in our repository of 70,000 sequences. So our sequence is actually very different, which could be good or bad depending on your strategy in the competition. And then finally, looking at JASPAR: if we really had good scores on HepG2 and so forth, and DNABERT was also looking good strategically, then we would move into the JASPAR scores, really dive deep into the weeds here, and figure out, okay, what does PBX1 do inside the body? According to the latest research, how do we theorize that this will help our DNA sequence localize to the liver, to the HepG2 cells, and kill those pesky cancer cells? So that is it for me today, everyone. I'm really excited about this competition. I feel like we can make a huge impact here. This is all about the dream of AI. We're using cutting-edge technology. The value of a machine that can understand a sequence millions of base pairs long, which a human could never understand, is revolutionary. And I'm really excited about this project. Really happy I could walk you through it today. And for me today, that is it. I'm going to run out downtown and go get a haircut.
So good luck in the competition, everyone. I'll see you on the leaderboards. Thanks.