[00:00] (0.72s)
Hi everyone, my name is Dr. Mahil Kulak
[00:04] (4.16s)
and I'm chief scientific officer at
[00:06] (6.16s)
Exonic. Today I'm going to show you
[00:08] (8.48s)
something more interesting and a bit
[00:11] (11.20s)
more advanced.
[00:13] (13.12s)
Let's look at the table. The first two
[00:15] (15.28s)
blocks you've seen already, and we
[00:17] (17.44s)
will start from the randomness.
[00:20] (20.08s)
Let's generate something random.
[00:22] (22.88s)
All right. So we get a score, and the
[00:26] (26.88s)
second block you've seen already. It's an
[00:29] (29.36s)
optimizer. I can connect them like that
[00:33] (33.60s)
and push the Run button.
[00:37] (37.04s)
Wait a bit and, hopefully, we get an improved
[00:41] (41.36s)
score.
[00:43] (43.60s)
Okay, I'm going to zoom in a bit. Okay,
[00:48] (48.16s)
we're waiting. Waiting.
[00:52] (52.64s)
Yeah, so this model generates the most
[00:56] (56.40s)
improved score. Oh yeah, a very good score,
[00:59] (59.20s)
19. That's what we want, above 15, and the
[01:03] (63.92s)
other ones are pretty good. But we want
[01:06] (66.64s)
to make it real, because the AI model is good
[01:09] (69.52s)
but it's not good in the extremes, and
[01:11] (71.60s)
that's kind of extreme. For a more
[01:14] (74.72s)
realistic view we will use the DNABERT
[01:17] (77.52s)
tool. This is the module which takes
[01:20] (80.88s)
our DNA and creates an embedding, and then
[01:25] (85.60s)
looks at the real wet-lab data from the Malinois
[01:28] (88.88s)
paper and finds the closest neighbors
[01:31] (91.84s)
which exist in this space.
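The lookup described here, embedding a sequence and then finding its closest real neighbors, can be sketched roughly as follows. This is only an illustration under stated assumptions: `embed` is a toy 2-mer count vector standing in for a learned embedding (it is not DNABERT), the `library` of measured sequences is hypothetical, and cosine similarity is an assumed distance measure, not necessarily what the module uses.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 16 dinucleotides, in a fixed order, define the toy embedding axes.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]

def embed(seq):
    """Toy stand-in for a learned embedding: unit-length 2-mer count vector."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    vec = [counts.get(k, 0) for k in KMERS]
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def nearest_neighbors(query, library, k=4):
    """Return the k library sequences most similar to the query.

    Similarity is the cosine between embeddings (vectors are unit length,
    so the dot product is the cosine). Results are (similarity, sequence).
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(s))), s) for s in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical library of experimentally measured sequences.
library = ["ACGTACGT", "TTTTTTTT", "ACGTACGA", "GGGGCCCC"]
for similarity, seq in nearest_neighbors("ACGTACGT", library, k=4):
    print(f"{similarity:.3f}  {seq}")
```

The default `k=4` mirrors the four neighbor sequences shown next to the original in the demo.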
[01:36] (96.48s)
I'll push Run again and here I go. So
[01:40] (100.72s)
this is the input, and underneath are
[01:43] (103.60s)
different sequences with different
[01:45] (105.68s)
scores, hopefully improved,
[01:48] (108.72s)
but it's kind of small for me. So I want
[01:50] (110.96s)
to see it in more detail. To do so I
[01:55] (115.28s)
will use a module which is called FASTA
[01:58] (118.00s)
splitter. It's located under the Utilities
[02:01] (121.60s)
section. It's just over there on
[02:05] (125.36s)
the left side. And I connect the DNA
[02:08] (128.56s)
FASTA with the DNA FASTA
[02:13] (133.44s)
splitter and push the Run button again.
[02:17] (137.76s)
As you can see, the output is five DNA
[02:20] (140.40s)
sequences, where the top one is the original
[02:24] (144.32s)
and the four others are the top four
[02:28] (148.48s)
sequences the splitter found. We
[02:32] (152.48s)
want to see how good those are. For the
[02:34] (154.88s)
first one we already know: it's 19, 1 uh4,
[02:39] (159.84s)
and minus 2.54
[02:42] (162.64s)
SK-N-SH. So I connect the DNA node
[02:46] (166.56s)
with the DNA score. I want to see how the
[02:49] (169.20s)
other sequences
[02:52] (172.32s)
score. So I click on the DNA
[02:54] (174.80s)
score. You can use Ctrl+C, Ctrl+V and
[02:59] (179.44s)
create another one. Then another one
[03:04] (184.16s)
and another one. Maybe one more. Yeah,
[03:07] (187.68s)
let's do more. And then connect them all
[03:11] (191.28s)
individually. So we can see the score
[03:15] (195.84s)
for each of the top sequences the
[03:21] (201.92s)
DNABERT module generated. I click the button
[03:25] (205.84s)
Run and we see. Okay, just a little bit
[03:29] (209.44s)
move. Okay, this is number one;
[03:32] (212.48s)
the numbers are the same. This one is kind of
[03:36] (216.16s)
not much improvement. This is another
[03:38] (218.80s)
number. Sometimes you get very, very
[03:41] (221.44s)
different numbers, and very often it's
[03:44] (224.08s)
improving, but this time not. So we will
[03:46] (226.56s)
stick with the top one. The top one is very
[03:49] (229.04s)
nice, but again, it's generated by AI
[03:51] (231.44s)
and we want to make it more real. I'll
[03:54] (234.72s)
introduce another module. It's called
[03:57] (237.60s)
JASPAR. JASPAR is located in the matrix section
[04:02] (242.08s)
on the left side. You can scroll;
[04:06] (246.88s)
this is the JASPAR score matrix. I put it
[04:09] (249.76s)
in as another module, and I'll take DNA
[04:13] (253.28s)
number one and connect.
[04:15] (255.84s)
I push the Run button and it gives me the list
[04:19] (259.04s)
of transcription factors, ranked, so
[04:22] (262.00s)
number one is the highest binder and then it
[04:25] (265.68s)
goes down.
[04:27] (267.52s)
Those transcription factors,
[04:31] (271.52s)
DNA and protein complexes, are forming
[04:35] (275.04s)
within this DNA fragment, which we're
[04:37] (277.92s)
going to test in the wet lab. The stronger the
[04:40] (280.64s)
binders and the more binders on the DNA, the more
[04:43] (283.68s)
active the DNA fragment becomes when it
[04:47] (287.52s)
gets inside a cell. We can add more
[04:51] (291.44s)
or less desirable binders,
[04:54] (294.48s)
transcription factors or binders we can
[04:57] (297.12s)
say, by manually adding them to the
[05:00] (300.88s)
original sequence. First of all, we're
[05:03] (303.20s)
going to take the sequence we're
[05:05] (305.60s)
working with. We can copy it from there
[05:08] (308.16s)
directly and transfer it into the DNA design
[05:12] (312.80s)
panel, replacing the existing DNA which we
[05:15] (315.52s)
started from with the new one. And this is
[05:18] (318.48s)
it. See, the scores are very much improved,
[05:22] (322.16s)
but still, K562 is a bit
[05:26] (326.72s)
too high. We want to have it
[05:28] (328.88s)
below one, which it is very close to, but still
[05:30] (330.80s)
not. Okay. So going back to the
[05:34] (334.96s)
transcription factors we will look at
[05:37] (337.12s)
them. Scroll down. What I like is
[05:41] (341.92s)
HNF1A.
[05:45] (345.68s)
To see where it binds with our DNA
[05:48] (348.24s)
sequence, I copy the ID number like this
[05:54] (354.32s)
and go back to the DNA viewer.
[05:58] (358.88s)
Click View Motif
[06:01] (361.60s)
and paste the ID number. Here you go.
[06:04] (364.96s)
You highlight it, choose this
[06:07] (367.12s)
information about it. And in order to
[06:09] (369.68s)
see it more clearly, since all these signals
[06:12] (372.00s)
overlay, I go back to the AI maps. I can
[06:17] (377.04s)
remove the signal. Oh, see, this
[06:20] (380.56s)
transcription factor binds in the middle
[06:22] (382.08s)
of the sequence. And I want to
[06:24] (384.48s)
see what's going to happen if I make
[06:26] (386.88s)
more. I do the copy and paste. Put them
[06:30] (390.00s)
over there.
[06:31] (391.92s)
Oh, the score is even better. See, now it's
[06:35] (395.36s)
improved. 22. It's below 25, above 15.
[06:40] (400.88s)
Off target, below one. I like it. So now
[06:44] (404.64s)
I can take the sequence like this. Copy
[06:48] (408.40s)
it again
[06:51] (411.04s)
and submit. So push button say I'll text
[06:54] (414.16s)
it this time. Oh, let's do without
[06:57] (417.84s)
saving because it's for your purposes.
[07:00] (420.80s)
But then I push complete
[07:04] (424.08s)
over there. Okay, here we go. Submit
[07:07] (427.28s)
DNA. I paste my sequence. Push the
[07:10] (430.80s)
button submit.
[07:13] (433.12s)
Let's see whether it's
[07:15] (435.76s)
going to be correct. Oh yeah, that's
[07:17] (437.44s)
correct. Can you see
[07:20] (440.24s)
very good numbers? So yeah, that's it.
[07:23] (443.28s)
So good luck.