YouTube Deep Summary



So... Shipping AI Apps Is Hard

Chris Raroque • 13:36 • Published 2025-07-18 • YouTube

📚 Chapter Summaries (9)

🤖 AI-Generated Summary:

The Hidden Challenges of Building AI Products: Lessons from Adding an AI Agent to My App

Building AI features isn't just about getting the technology to work—it's about navigating a maze of unexpected challenges that most tutorials never mention. After recently shipping an AI agent for my daily planning app, Ellie, I learned this the hard way. While the agent can time-box your day, bulk-edit tasks, and act as a personal assistant, getting it to production revealed problems I never saw coming.

Here are the crucial lessons every AI product builder needs to know before they hit the same walls I did.

The Cost Crisis: When Your AI Feature Becomes a Money Pit

The Problem: I spent over $30 in a single month just on my own usage, while my app subscription costs only $10. That's a $20 loss per user per month—a recipe for bankruptcy.

The Hidden Cost Drivers:

1. Bloated System Prompts

Your system prompt gets sent with every single message, even a simple "hi." Mine ballooned to 8,000 tokens as I added fixes for edge cases. I optimized it down to 3,000 tokens, but there's still room for improvement.
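
To see why this matters, here is a rough back-of-envelope calculation. The token counts (8,000 trimmed to 3,000) come from the article; the per-token price and monthly message volume are illustrative assumptions, not figures from the video.

```python
# Back-of-envelope: cost attributable to re-sending the system prompt
# with every message. The price below is an assumed illustrative rate
# (~$0.15 per 1M input tokens, typical of small models); actual pricing varies.
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000

def prompt_overhead(system_prompt_tokens: int, messages_per_month: int) -> float:
    """Monthly cost of the system prompt alone, ignoring user/assistant text."""
    return system_prompt_tokens * messages_per_month * PRICE_PER_INPUT_TOKEN

# Assume 3,000 messages/month across users (hypothetical volume).
before = prompt_overhead(8_000, 3_000)  # original 8k-token prompt
after = prompt_overhead(3_000, 3_000)   # after trimming to 3k tokens
print(f"${before:.2f} -> ${after:.2f} per month on system-prompt tokens alone")
```

The absolute numbers are small at this rate, but the overhead scales linearly with both prompt size and message count, and larger models cost an order of magnitude more per token.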

2. Conversation History Overload

During testing, conversations were short. In real usage, people keep chats open for 2-3 days with 50+ messages. Sending the entire conversation history with each new message makes costs grow rapidly, since every new message re-sends everything that came before it.

The Solution: Implement a sliding window technique. I now only send the last 10 messages to the LLM, which works for most use cases. For longer context needs, consider summarizing earlier messages instead of sending them verbatim.
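
The sliding window described above can be sketched in a few lines. This is a minimal illustration, not the app's actual code; the optional `summary` parameter stands in for the "summarize earlier messages" idea, where the summary itself would come from a separate cheap-model call (not shown).

```python
def window_messages(history, max_messages=10, summary=None):
    """Send only the most recent messages to the LLM.

    If a summary of the truncated portion is supplied, prepend it as a
    system message so older context survives in compressed form.
    """
    recent = history[-max_messages:]
    if summary and len(history) > max_messages:
        return [{"role": "system",
                 "content": f"Summary of earlier conversation: {summary}"}] + recent
    return recent

# A 50-message chat like the ones beta testers produced:
history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = window_messages(history, max_messages=10)
print(len(trimmed))  # only 10 of the 50 messages are sent
```

The window size is the knob to tune: one-off command usage tolerates a small window, while genuinely conversational apps need a larger one or the summarization fallback.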

Preventing Abuse: Protecting Your App (and Wallet)

Even well-intentioned users can accidentally break your system. Here's how to build safeguards:

Essential Protection Measures:

  • Message size limits: Cap messages at reasonable token counts (I use 10,000 tokens)
  • Rate limiting: Set daily and monthly message limits per user (100/day, 1,000/month for my use case)
  • Remote kill switch: Build the ability to instantly disable the feature for specific users
  • Analytics system: Track token usage and costs per user from day one using tools like PostHog
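
The four safeguards above can be combined into one gate that runs before any LLM call. The sketch below is hypothetical (class and function names are invented for illustration, and the token count is a crude heuristic; a real app would use the model's tokenizer and persist counters in a database with daily/monthly resets). The limits are the ones from the article.

```python
from collections import defaultdict

MAX_MESSAGE_TOKENS = 10_000   # message size limit from the article
DAILY_LIMIT, MONTHLY_LIMIT = 100, 1_000

def rough_token_count(text: str) -> int:
    # Crude ~4-chars-per-token heuristic; use the real tokenizer in production.
    return len(text) // 4 + 1

class UsageGuard:
    """In-memory sketch; counters never reset here and would live in a DB."""
    def __init__(self):
        self.daily = defaultdict(int)
        self.monthly = defaultdict(int)
        self.killed = set()  # remote kill switch: user ids that are cut off

    def allow(self, user_id: str, message: str) -> bool:
        if user_id in self.killed:
            return False                      # kill switch
        if rough_token_count(message) > MAX_MESSAGE_TOKENS:
            return False                      # oversized message (e.g. a pasted book)
        if self.daily[user_id] >= DAILY_LIMIT or self.monthly[user_id] >= MONTHLY_LIMIT:
            return False                      # rate limit hit
        self.daily[user_id] += 1
        self.monthly[user_id] += 1
        return True

guard = UsageGuard()
print(guard.allow("u1", "add bacon to my grocery list"))  # True
guard.killed.add("u1")                                    # flip the kill switch
print(guard.allow("u1", "hi"))                            # False
```

Per-user token and cost tracking (the analytics piece) would hook into the same gate, emitting an event to PostHog or a database on every allowed message.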

These aren't just technical requirements—they're business survival tools.

Don't Reinvent the Wheel: Leverage Existing Libraries

I initially built everything from scratch—streaming, tool calling, conversation management. It worked, but barely. After switching to the Vercel AI SDK, my 100-line implementation became 10 lines with better reliability.

Key benefits of using established libraries:
- Robust error handling out of the box
- Consistent tool calling with automatic retries
- Proper streaming implementation
- Cleaner, more maintainable code

The time saved debugging custom implementations is worth the slight dependency risk.

The Multi-Model Reality: One Size Doesn't Fit All

I naively thought one model could handle everything. Wrong. Different tasks require different models:

  • GPT-4o Mini: General tasks and cost-effective operations
  • GPT-4o: Complex tasks requiring more reasoning
  • Grok: Time zone handling (surprisingly good at this specific task)

Pro tip: Use a cheap model (like Gemini Flash) as a router to decide which model should handle each request. This optimizes both cost and performance.
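
The routing layer can be sketched as a classify-then-dispatch step. In the video the classifier is itself a cheap LLM call (Gemini Flash); the stub below fakes that call with a trivial keyword heuristic so the sketch runs standalone. Function names are invented for illustration; the model names are the ones mentioned in the article.

```python
def classify_complexity(message: str) -> str:
    """Stand-in for a cheap router-model call that labels each request."""
    lowered = message.lower()
    if "time zone" in lowered or "time-box" in lowered:
        return "timezone"
    if len(message) > 200:   # crude proxy for "needs more reasoning"
        return "complex"
    return "simple"

ROUTES = {
    "simple": "gpt-4o-mini",  # cheap, fast default
    "complex": "gpt-4o",      # more reasoning when needed
    "timezone": "grok",       # expensive specialist, reserved for one task
}

def route_request(message: str) -> str:
    return ROUTES[classify_complexity(message)]

print(route_request("add milk to groceries"))           # gpt-4o-mini
print(route_request("time-box my afternoon meetings"))  # grok
```

Defaulting to the cheapest route and escalating only when the classifier demands it is what captures the cost and speed win described above.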

Form Factor Matters More Than You Think

I built for web first, planning to add mobile later. Big mistake. The primary use case turned out to be voice commands on mobile—dictating quick tasks while on the go.

Consider your AI's natural habitat early:
- Where will users interact with it most?
- What input methods make sense?
- How does the form factor affect the user experience?

AI Settings Are Different (and Better)

Traditional apps use dropdowns and toggles for preferences. AI apps can do something cooler: natural language preferences.

Instead of complex settings menus, users can simply describe their preferences: "I like to go to the gym in the morning, do personal tasks after work, and need 15-minute breaks between meetings."

This text gets injected into prompts, making personalization more intuitive and comprehensive.
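
Mechanically, the injection is just string assembly before each request. A minimal sketch, assuming a hypothetical base prompt and function name (neither is from the app):

```python
BASE_PROMPT = "You are a scheduling assistant for a daily planner app."

def build_system_prompt(base: str, user_preferences: str) -> str:
    """Append the user's free-text preferences so the model factors them
    into any schedule it produces. Names here are illustrative."""
    if not user_preferences.strip():
        return base
    return (f"{base}\n\nUser preferences (follow these when scheduling):\n"
            f"{user_preferences.strip()}")

prefs = ("I like to go to the gym in the morning, do personal tasks "
         "after work, and need 15-minute breaks between meetings.")
print(build_system_prompt(BASE_PROMPT, prefs))
```

One caveat worth noting: free-text preferences add to the per-message token cost discussed earlier, so long preference blocks are another candidate for trimming or summarization.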

How to Beat ChatGPT at Its Own Game

"Why build this when ChatGPT exists?" is the wrong question. The right question is: "How can I make this experience better for my specific use case?"

The difference: ChatGPT asks for permission, confirms actions, and adds friction for safety. As a niche product, you can take calculated risks and remove friction. In my app, saying "create a meeting" just creates it—no confirmations, no extra steps.

The principle: Focused apps usually solve specific problems better than general-purpose tools. Users feel the difference.

The Bottom Line

Shipping AI products is challenging in unexpected ways. The technical implementation is just the beginning. The real challenges are:

  1. Cost management from day one
  2. Abuse prevention systems
  3. Multi-model architecture planning
  4. Form factor optimization
  5. Leveraging existing tools instead of building everything custom
  6. Finding your competitive edge against general AI tools

These aren't just technical considerations—they're fundamental to building sustainable AI products that users actually want to use.

The AI revolution is here, but success belongs to those who understand not just how to build AI features, but how to ship them successfully. Plan for these challenges early, and you'll save yourself months of painful optimization later.

Building an AI product? I'd love to hear about your challenges and solutions. The AI community's shared knowledge is what makes breakthrough products possible.


📝 Transcript Chapters (9 chapters):

📝 Transcript (470 entries):

## Intro / What we are covering today [00:00] So, I recently added an AI agent to my daily planning app, Ellie, and it can do things like time box your day, bulk-edit tasks, and basically act as a personal assistant. This is the first major AI feature that I've shipped, but getting it to the finish line was way harder than I anticipated. There is so much stuff people don't tell you when it comes to shipping AI products, and this is what this video is going to be about. This is not a tutorial video. I already have a step-by-step video on my channel about how to build an AI agent and build this feature from scratch. So, check that out if you want the basics. This is a video about the stuff that is not covered in those basic tutorials because building AI features is a bit different than traditional software. The cost problems, the security problems, the design problems. I'm going to share all the lessons I learned while getting this feature to the finish line. If you're planning on building anything with AI, these are the walls that you're going to hit, and I want to make sure that you ## AI is expensive (how to bring down costs) [00:46] see them coming. Let's start with something that I really didn't keep in my mind while I was building this, and that's cost. When I was building this thing, I really was just trying to get it over the finish line. I kind of had costs in mind, but I wasn't thinking too much of it. But once I started getting closer to shipping and I looked at how much the stuff was costing, just in my case alone, I had spent over $30 in a single month. The problem is that the subscription price of the app is only $10 a month. So I'd be losing $20 every single month just through my own usage. So before launching, I had to sit down and seriously figure out how to optimize this. And I'm going to share some of the stuff that I learned with you guys. So the first thing was that the system prompt was way too long.
When you're developing an app, as you're encountering issues and edge cases, you're going to start adding things to the system prompt to get it to function the way that you want. And in my case, my system prompt got really long. Something I didn't consider was that the system prompt gets sent every single time you're sending a message. Even if I'm just saying hi in the chat, that one word is going to be sent over, but the entire system prompt would also be sent along with it, too. And every subsequent message would include that system prompt. And all of this does add up over time. In my case, I kind of went overboard. The system prompt was almost 8,000 tokens long. So, I did a lot of optimizations to cut that down to around 3,000. And I still think that that's pretty long, and there's a lot more that I can do, but for the time being, it seems okay for now. The second mistake I was doing was sending the entire conversation history with each message. During testing, this was not a problem because I was sending maybe two to three messages at a time. So, the whole chat was really like six messages total. It wasn't a big deal. But something I noticed during actual usage and during the beta testing was people like to keep the chat window open for 2 to 3 days. And those conversations would end up being 50 plus messages long. So imagine sending a single message and then the entire chat history with 50 messages gets sent along with that. That would add up a lot over time. And this is actually where the bulk of the cost was coming from. There's a ton of ways to solve it, but the way that I did it was doing a sort of window technique where I only send the last couple messages in the chat to the LLM for processing. I had to really play around with what that window felt like. The optimal amount really depends on the AI app itself and what the usage is. And in my case, most people were using it to just send one-off instructions to an LLM. 
They didn't really need much of the conversation history to do that. So in my case, I kept it to the last 10 messages, which seemed to work pretty well. The big issue with this is what if they ask about something earlier in the conversation and it's cut off. And yes, that is a big problem again for the use case of this assistant. I think most people won't be doing that. But a technique I might try to do is summarizing and compressing the earlier messages so that they are sent in context, but it doesn't eat up as many tokens as sending the entire conversation. This was just the basics. There's a ton more I plan on exploring with cost optimization, but these were the two biggest things that I did to really get the cost down. ## Preventing people from abusing your app [03:23] The next thing I really didn't have on my mind was how was I going to prevent abuse? Even not intentional abuse, but people accidentally abusing the system. So, an example is there was no limit to what people can put in the chat box. In theory, someone could just insert an entire book in there and then I would be on the hook for that and it would cost me like $20 for that single message. Or someone could just spam the chat with a thousand messages and I would be on the hook for that too. I had to think through a couple of these scenarios and put systems in place to prevent some of this from happening, whether it be intentional or non-intentional abuse. Here's a couple things that I did that you can implement in your own application. The first thing I did was set a message size limit. This is the max size that a message can be before it's either truncated or just rejected by the system. In my case, the max message size is about 10,000 tokens. And in real world usage, I have not come close to that limit. So, I think that's okay for now. The second thing was to add some per user rate limits. This is a limit on the max number of messages that a user can send every single day and every single month.
So in my case, I capped it at 100 messages per day and 1,000 messages per month. And again, it's really dependent on the AI application and how users are going to use it. But in my case, I really can't see people sending more than 100 messages because again, they're really just using this to send commands for Ellie. This isn't like ChatGPT where they're going to be sending thousands of messages a day and having entire conversations. At least in my case, I never got close to sending 100 messages per day. So, I think that's a pretty good limit for now, but if people complain, I'm more than happy to raise that. The third thing I did was to build a remote kill switch. So, this is the ability for me to turn off the assistant for a specific user. I did set up some analytics using a service called PostHog, so I can see how much money is this app incurring, and I can even break down and see how much is each specific user using. If I see someone racking up a huge bill and it looks kind of suspicious to me, what I can do is just press a button, turn it off for them, and then I can reach out to them and ask them, "Hey, just checking what are you doing with this? Why are you sending this many messages?" And if it looks legitimate, I'll turn it back on, and then if it's not, we'll deal with that. I guess on that note, the fourth thing I did to prevent abuse was that analytics system. I do recommend adding some sort of system. And you could either use PostHog or you could do this manually, but you should have a way to view at minimum how many tokens and how much money is your app consuming. And if possible, do that on a per user level so you can see who is using the most. Is there something weird going on? I'm very surprised by the number of apps that don't have that in place on day one. ## Using libraries (like Vercel AI SDK) [05:38] So the next learning is to not reinvent the wheel.
After my last video, a bunch of people reached out to me and said, "Hey, you know, there are libraries out there that do a lot of the stuff that you implemented yourself out of the box." When I built the application in my first video, I did everything from scratch from the streaming to the tool calling to the max number of tool calls that could happen in a single loop. All of that stuff was built manually and from scratch. Then people pointed me to the Vercel AI SDK. I'd heard about it in the past, but I was hesitant because I didn't want to be locked into anything. But after doing a lot more research, I realized that it actually did a lot of the stuff that I did in my first video out of the box and way better than I did it myself. It handled things like being able to do the streaming correctly with proper error handling, tool calling with automatic retries, managing the conversation state. The system that I built kind of worked, but there were times when the stream would fail or some of the tool calls weren't happening consistently, but I had a suspicion to get it more consistent would probably take a lot of effort. So, I did take a look at the AI SDK. I did port it over to test and it actually did solve a lot of the problems that I was facing. Streaming started working out of the box very reliably and the tool calling was way more consistent, which was a big problem with the system that I had set up. And the codebase looked a lot cleaner. So, what took 100 lines of code in the past ended up being like 10 lines with the Vercel AI SDK. The SDK is completely free. It's open-source. And this is not sponsored at all. I just wanted to share this library because that's what I ended up using at the end. No regrets doing it the manual way, though. I did learn a lot in the process, and it really did confirm why things like the AI SDK do exist. And I understand how this stuff works under the hood a lot better, too.
## Plan to use multiple models early [07:08] The next lesson was kind of obvious in hindsight, but not a lot of people talk about this. You're probably going to be using multiple models for a lot of different things. When I started, I naively thought, I can do all of this with one model. I'll probably just use Gemini Flash or something and it'll all work perfectly. I wasted a ton of time tweaking the system prompt, trying to get it to consistently output or call specific tools when it turns out it was actually a problem with the model itself. Because then when I tried different models, certain things started working more consistently. So, that's something I wish I had a little bit more of an open mind with going in. It would have saved a lot of time was that I would probably be using different models for different use cases. So, some specific examples, I ended up using GPT-4o Mini for a lot of stuff because it seemed to outperform Gemini Flash in most cases. There were certain tasks related to time boxing, for example, that it was really struggling with. So, I had to use GPT-4o for those tasks. And for something like time boxing, even GPT-4o was struggling with it. And my suspicion is because the time zones were kind of confusing it. So, after testing a bunch of models, I actually ended up using Grok to do the time boxing stuff. For some weird reason, Grok was very consistent at dealing with multiple time zones. And here's a cool technique that I learned. You can actually put a layer before it starts the agent to actually pick which model to use. So, in my case, I actually have a layer that's using Gemini Flash to then choose which model the agent should be running. So, if it's a really simple task that doesn't really involve time, I'll use GPT-4o Mini. And if it's a little bit more complex or involves time zones, then it switches to GPT-4o.
The big benefit is cost and speed because then it can default to the cheaper faster model for simpler use cases and then only go to the bigger more expensive model when needed. And then I specifically have a tool that calls Grok just for the time boxing stuff, and Grok is way more expensive than GPT-4o, so I only reserve it for that task when it's needed. I have a feeling that in the future I'm probably going to be calling 10 different models here for a bunch of different use cases. But that was a really cool technique: using a very cheap small model to decide which model to use based on the user's input. Here's ## Think about your AI form factor early on [09:07] a couple smaller observations that I had that I really wanted to share with you guys. First is that the form factor actually does matter and I wish I spent a little bit more time considering that when building the product. I originally built the agent just on the web for the sake of speed thinking that I'd port it to iOS later, but I should have thought a little bit harder about where people were going to be using this agent. The main use case that I'm seeing so far and even for myself personally is dictating quick commands on my phone on the go. So, I can say something like, "When you create a task for groceries, add bacon, eggs, and paper towels to the list," and it'll just go ahead and do that and create the task with the relevant subtasks for me. These actions are so much nicer with dictation, and it's so much easier on the mobile version. It's a small detail, but it's something I wish I did consider because I could have launched this a little bit earlier, and probably mobile first if I'd realized that sooner. The next observation is ## AI settings vs App Settings [09:54] actually pretty cool. It's that personalization and settings are very different for AI products than traditional software.
In traditional software like Ellie, for settings, you can toggle things like when does the week start for you or when do you want to start your day. And these are just drop downs in the settings menu. But for AI products, personalization is a little bit different and actually a lot cooler. For time boxing preferences for the user, instead of having a toggle and drop down for everything, I could just have a text box and have the user input whatever preferences they want. So they can say something like, "I like to go to the gym in the morning. I like to do all my personal tasks after work, and I need a 15-minute break in between each meeting at minimum." Because at the end of the day, what I'm going to do is take this text and inject it into the prompt so that when the AI is coming up with the schedule, it just factors all that stuff in. Maybe I'm alone here, but I thought that was really cool and it made me think a lot more about how software is going to be more personalized in the future. The last observation was how my ## How you can beat ChatGPT [10:46] chat and my agent compared to general tools like ChatGPT because a common thing that I hear from people is what's the point of building this if ChatGPT is just going to add this feature or Claude's going to add this feature. I've actually used Claude and ChatGPT's calendar integrations to time box and plan my day. And after using both, I can say that the experience in Ellie was completely different than the experience in ChatGPT. Even though in theory they do the same thing. At the time of recording, to do something like create a calendar event in Claude, you type in the calendar event you want, but it's going to ask permission to run certain tools. Then it's going to run the tool. Then it's going to confirm with you and then it's going to go create the task. For some reason, it just feels kind of clunky and cumbersome with all these steps.
Whereas in Ellie, if I say the exact same thing, it's just going to do it and it's just going to make it happen in one message. I get why they have all these confirmations. They're building for a million users, but we are not them and we can take a little bit more risk than they can. So, in my case, I did feel confident enough to just bypass all of those confirmations and just allow the tool calls to happen automatically. And it really does change the nature of the experience. It feels like there's a lot less friction and makes me want to use it compared to using it in ChatGPT or Claude. I think the best way to think about it is it's like a general app versus a hyper-specific app. When I think about general apps versus focused apps that really solve a problem for a specific niche, in most cases the niche app probably solves the problem a lot better than the general app. And users usually can feel that. And I think the same thing does apply to AI products. ## Final thoughts and advice :) [12:08] Shipping AI products is pretty hard, but not in the ways that I expected. There was a challenge to make sure that the AI was smart enough and execute things the way that I envisioned. But there were also a lot of considerations like cost, security, the form factor, a lot of these things that I don't hear a lot of people talking about that I wish someone told me earlier. To summarize everything here, if you're building an AI product, I recommend tracking cost from day one. Building preventions to prevent abuse, whether it's intentional or unintentional abuse, remembering and honestly planning to use multiple models from the start, considering what the optimal form factor for your AI is, whether it be mobile or voice or on the web.
But really considering that when you're working on a product roadmap, leveraging existing frameworks like the Vercel AI SDK to make sure you're not reinventing the wheel, and then figuring out what your edge is going to be when you're building your product, especially comparing against something general like ChatGPT and Claude and figuring out a way to make your app stand out. And the agent that I showed you at the beginning of the video, hopefully by the time that this video is out, it should be launched and you can actually try it yourself if you want. If you're building an AI product, I would love to hear what you're building and some of the problems that you've encountered along the way. If it wasn't for you guys, I would not have found the AI SDK from Vercel. Please drop any tips that you have. I read every single comment. And if you like this content, check out my Instagram and TikTok. I post almost every other day about building productivity apps. And obviously, if you like this content, don't forget to subscribe. But thank you guys so much for watching and I'll see you guys in the next video.