YouTube Deep Summary



So... Shipping AI Apps Is Hard

Chris Raroque • 13:36 • Published 2025-07-18 • YouTube

📚 Chapter Summaries (9)

🤖 AI-Generated Summary:

The Hidden Challenges of Building AI Products: Lessons from Adding an AI Agent to My App

Building AI features isn't just about getting the technology to work—it's about navigating a maze of unexpected challenges that most tutorials never mention. After recently shipping an AI agent for my daily planning app, Ellie, I learned this the hard way. While the agent can time-box your day, bulk-edit tasks, and act as a personal assistant, getting it to production revealed problems I never saw coming.

Here are the crucial lessons every AI product builder needs to know before they hit the same walls I did.

The Cost Crisis: When Your AI Feature Becomes a Money Pit

The Problem: I spent over $30 in a single month just on my own usage, while my app subscription costs only $10. That's a $20 loss per user per month—a recipe for bankruptcy.

The Hidden Cost Drivers:

1. Bloated System Prompts

Your system prompt gets sent with every single message, even a simple "hi." Mine ballooned to 8,000 tokens as I added fixes for edge cases. I optimized it down to 3,000 tokens, but there's still room for improvement.
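
To see why this matters, here is a rough back-of-envelope calculation. The token counts (8,000 trimmed to 3,000) come from the article; the per-token price and monthly message volume are illustrative assumptions, not figures from the video.

```python
# Back-of-envelope: cost attributable to re-sending the system prompt
# with every message. The price below is an assumed illustrative rate
# (~$0.15 per 1M input tokens, typical of small models); actual pricing varies.
PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000

def prompt_overhead(system_prompt_tokens: int, messages_per_month: int) -> float:
    """Monthly cost of the system prompt alone, ignoring user/assistant text."""
    return system_prompt_tokens * messages_per_month * PRICE_PER_INPUT_TOKEN

# Assume 3,000 messages/month across users (hypothetical volume).
before = prompt_overhead(8_000, 3_000)  # original 8k-token prompt
after = prompt_overhead(3_000, 3_000)   # after trimming to 3k tokens
print(f"${before:.2f} -> ${after:.2f} per month on system-prompt tokens alone")
```

The absolute numbers are small at this rate, but the overhead scales linearly with both prompt size and message count, and larger models cost an order of magnitude more per token.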

2. Conversation History Overload

During testing, conversations were short. In real usage, people keep chats open for 2-3 days with 50+ messages. Sending the entire conversation history with each new message makes costs grow rapidly, since every new message re-sends everything that came before it.

The Solution: Implement a sliding window technique. I now only send the last 10 messages to the LLM, which works for most use cases. For longer context needs, consider summarizing earlier messages instead of sending them verbatim.
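
The sliding window described above can be sketched in a few lines. This is a minimal illustration, not the app's actual code; the optional `summary` parameter stands in for the "summarize earlier messages" idea, where the summary itself would come from a separate cheap-model call (not shown).

```python
def window_messages(history, max_messages=10, summary=None):
    """Send only the most recent messages to the LLM.

    If a summary of the truncated portion is supplied, prepend it as a
    system message so older context survives in compressed form.
    """
    recent = history[-max_messages:]
    if summary and len(history) > max_messages:
        return [{"role": "system",
                 "content": f"Summary of earlier conversation: {summary}"}] + recent
    return recent

# A 50-message chat like the ones beta testers produced:
history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = window_messages(history, max_messages=10)
print(len(trimmed))  # only 10 of the 50 messages are sent
```

The window size is the knob to tune: one-off command usage tolerates a small window, while genuinely conversational apps need a larger one or the summarization fallback.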

Preventing Abuse: Protecting Your App (and Wallet)

Even well-intentioned users can accidentally break your system. Here's how to build safeguards:

Essential Protection Measures:

  • Message size limits: Cap messages at reasonable token counts (I use 10,000 tokens)
  • Rate limiting: Set daily and monthly message limits per user (100/day, 1,000/month for my use case)
  • Remote kill switch: Build the ability to instantly disable the feature for specific users
  • Analytics system: Track token usage and costs per user from day one using tools like PostHog
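
The four safeguards above can be combined into one gate that runs before any LLM call. The sketch below is hypothetical (class and function names are invented for illustration, and the token count is a crude heuristic; a real app would use the model's tokenizer and persist counters in a database with daily/monthly resets). The limits are the ones from the article.

```python
from collections import defaultdict

MAX_MESSAGE_TOKENS = 10_000   # message size limit from the article
DAILY_LIMIT, MONTHLY_LIMIT = 100, 1_000

def rough_token_count(text: str) -> int:
    # Crude ~4-chars-per-token heuristic; use the real tokenizer in production.
    return len(text) // 4 + 1

class UsageGuard:
    """In-memory sketch; counters never reset here and would live in a DB."""
    def __init__(self):
        self.daily = defaultdict(int)
        self.monthly = defaultdict(int)
        self.killed = set()  # remote kill switch: user ids that are cut off

    def allow(self, user_id: str, message: str) -> bool:
        if user_id in self.killed:
            return False                      # kill switch
        if rough_token_count(message) > MAX_MESSAGE_TOKENS:
            return False                      # oversized message (e.g. a pasted book)
        if self.daily[user_id] >= DAILY_LIMIT or self.monthly[user_id] >= MONTHLY_LIMIT:
            return False                      # rate limit hit
        self.daily[user_id] += 1
        self.monthly[user_id] += 1
        return True

guard = UsageGuard()
print(guard.allow("u1", "add bacon to my grocery list"))  # True
guard.killed.add("u1")                                    # flip the kill switch
print(guard.allow("u1", "hi"))                            # False
```

Per-user token and cost tracking (the analytics piece) would hook into the same gate, emitting an event to PostHog or a database on every allowed message.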

These aren't just technical requirements—they're business survival tools.

Don't Reinvent the Wheel: Leverage Existing Libraries

I initially built everything from scratch—streaming, tool calling, conversation management. It worked, but barely. After switching to the Vercel AI SDK, my 100-line implementation became 10 lines with better reliability.

Key benefits of using established libraries:
- Robust error handling out of the box
- Consistent tool calling with automatic retries
- Proper streaming implementation
- Cleaner, more maintainable code

The time saved debugging custom implementations is worth the slight dependency risk.

The Multi-Model Reality: One Size Doesn't Fit All

I naively thought one model could handle everything. Wrong. Different tasks require different models:

  • GPT-4o Mini: General tasks and cost-effective operations
  • GPT-4o: Complex tasks requiring more reasoning
  • Grok: Time zone handling (surprisingly good at this specific task)

Pro tip: Use a cheap model (like Gemini Flash) as a router to decide which model should handle each request. This optimizes both cost and performance.
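
The routing layer can be sketched as a classify-then-dispatch step. In the video the classifier is itself a cheap LLM call (Gemini Flash); the stub below fakes that call with a trivial keyword heuristic so the sketch runs standalone. Function names are invented for illustration; the model names are the ones mentioned in the article.

```python
def classify_complexity(message: str) -> str:
    """Stand-in for a cheap router-model call that labels each request."""
    lowered = message.lower()
    if "time zone" in lowered or "time-box" in lowered:
        return "timezone"
    if len(message) > 200:   # crude proxy for "needs more reasoning"
        return "complex"
    return "simple"

ROUTES = {
    "simple": "gpt-4o-mini",  # cheap, fast default
    "complex": "gpt-4o",      # more reasoning when needed
    "timezone": "grok",       # expensive specialist, reserved for one task
}

def route_request(message: str) -> str:
    return ROUTES[classify_complexity(message)]

print(route_request("add milk to groceries"))           # gpt-4o-mini
print(route_request("time-box my afternoon meetings"))  # grok
```

Defaulting to the cheapest route and escalating only when the classifier demands it is what captures the cost and speed win described above.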

Form Factor Matters More Than You Think

I built for web first, planning to add mobile later. Big mistake. The primary use case turned out to be voice commands on mobile—dictating quick tasks while on the go.

Consider your AI's natural habitat early:
- Where will users interact with it most?
- What input methods make sense?
- How does the form factor affect the user experience?

AI Settings Are Different (and Better)

Traditional apps use dropdowns and toggles for preferences. AI apps can do something cooler: natural language preferences.

Instead of complex settings menus, users can simply describe their preferences: "I like to go to the gym in the morning, do personal tasks after work, and need 15-minute breaks between meetings."

This text gets injected into prompts, making personalization more intuitive and comprehensive.
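
Mechanically, the injection is just string assembly before each request. A minimal sketch, assuming a hypothetical base prompt and function name (neither is from the app):

```python
BASE_PROMPT = "You are a scheduling assistant for a daily planner app."

def build_system_prompt(base: str, user_preferences: str) -> str:
    """Append the user's free-text preferences so the model factors them
    into any schedule it produces. Names here are illustrative."""
    if not user_preferences.strip():
        return base
    return (f"{base}\n\nUser preferences (follow these when scheduling):\n"
            f"{user_preferences.strip()}")

prefs = ("I like to go to the gym in the morning, do personal tasks "
         "after work, and need 15-minute breaks between meetings.")
print(build_system_prompt(BASE_PROMPT, prefs))
```

One caveat worth noting: free-text preferences add to the per-message token cost discussed earlier, so long preference blocks are another candidate for trimming or summarization.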

How to Beat ChatGPT at Its Own Game

"Why build this when ChatGPT exists?" is the wrong question. The right question is: "How can I make this experience better for my specific use case?"

The difference: ChatGPT asks for permission, confirms actions, and adds friction for safety. As a niche product, you can take calculated risks and remove friction. In my app, saying "create a meeting" just creates it—no confirmations, no extra steps.

The principle: Focused apps usually solve specific problems better than general-purpose tools. Users feel the difference.

The Bottom Line

Shipping AI products is challenging in unexpected ways. The technical implementation is just the beginning. The real challenges are:

  1. Cost management from day one
  2. Abuse prevention systems
  3. Multi-model architecture planning
  4. Form factor optimization
  5. Leveraging existing tools instead of building everything custom
  6. Finding your competitive edge against general AI tools

These aren't just technical considerations—they're fundamental to building sustainable AI products that users actually want to use.

The AI revolution is here, but success belongs to those who understand not just how to build AI features, but how to ship them successfully. Plan for these challenges early, and you'll save yourself months of painful optimization later.

Building an AI product? I'd love to hear about your challenges and solutions. The AI community's shared knowledge is what makes breakthrough products possible.


📝 Transcript Chapters (9 chapters):

📝 Transcript (470 entries):

## Intro / What we are covering today [00:00] So, I recently added an AI agent to my daily planning app, Ellie, and it can do things like time box your day, bulk-edit tasks, and basically act as a personal assistant. This is the first major AI feature that I've shipped, but getting it to the finish line was way harder than I anticipated. There is so much stuff people don't tell you when it comes to shipping AI products, and this is what this video is going to be about. This is not a tutorial video. I already have a step-by-step video on my channel about how to build an AI agent and build this feature from scratch. So, check that out if you want the basics. This is a video about the stuff that is not covered in those basic tutorials because building AI features is a bit different than traditional software. The cost problems, the security problems, the design problems. I'm going to share all the lessons I learned while getting this feature to the finish line. If you're planning on building anything with AI, these are the walls that you're going to hit, and I want to make sure that you ## AI is expensive (how to bring down costs) [00:46] see them coming. Let's start with something that I really didn't keep in my mind while I was building this, and that's cost. When I was building this thing, I really was just trying to get it over the finish line. I kind of had costs in mind, but I wasn't thinking too much of it. But once I started getting closer to shipping and I looked at how much the stuff was costing, just in my case alone, I had spent over $30 in a single month. The problem is that the subscription price of the app is only $10 a month. So I'd be losing $20 every single month just through my own usage. So before launching, I had to sit down and seriously figure out how to optimize this. And I'm going to share some of the stuff that I learned with you guys. So the first thing was that the system prompt was way too long.
When you're developing an app, as you're encountering issues and edge cases, you're going to start adding things to the system prompt to get it to function the way that you want. And in my case, my system prompt got really long. Something I didn't consider was that the system prompt gets sent every single time you're sending a message. Even if I'm just saying hi in the chat, that one word is going to be sent over, but the entire system prompt would also be sent along with it, too. And every subsequent message would include that system prompt. And all of this does add up over time. In my case, I kind of went overboard. The system prompt was almost 8,000 tokens long. So, I did a lot of optimizations to cut that down to around 3,000. And I still think that that's pretty long, and there's a lot more that I can do, but for the time being, it seems okay for now. The second mistake I was doing was sending the entire conversation history with each message. During testing, this was not a problem because I was sending maybe two to three messages at a time. So, the whole chat was really like six messages total. It wasn't a big deal. But something I noticed during actual usage and during the beta testing was people like to keep the chat window open for 2 to 3 days. And those conversations would end up being 50 plus messages long. So imagine sending a single message and then the entire chat history with 50 messages gets sent along with that. That would add up a lot over time. And this is actually where the bulk of the cost was coming from. There's a ton of ways to solve it, but the way that I did it was doing a sort of window technique where I only send the last couple messages in the chat to the LLM for processing. I had to really play around with what that window felt like. The optimal amount really depends on the AI app itself and what the usage is. And in my case, most people were using it to just send one-off instructions to an LLM. 
They didn't really need much of the conversation history to do that. So in my case, I kept it to the last 10 messages, which seemed to work pretty well. The big issue with this is what if they ask about something earlier in the conversation and it's cut off. And yes, that is a big problem again for the use case of this assistant. I think most people won't be doing that. But a technique I might try to do is summarizing and compressing the earlier messages so that they are sent in context, but it doesn't eat up as many tokens as sending the entire conversation. This was just the basics. There's a ton more I plan on exploring with cost optimization, but these were the two biggest things that I did to really get the cost down. ## Preventing people from abusing your app [03:23] The next thing I really didn't have on my mind was how was I going to prevent abuse? Even not intentional abuse, but people accidentally abusing the system. So, an example is there was no limit to what people can put in the chat box. In theory, someone could just insert an entire book in there and then I would be on the hook for that and it would cost me like $20 for that single message. Or someone could just spam the chat with a thousand messages and I would be on the hook for that too. I had to think through a couple of these scenarios and put systems in place to prevent some of this from happening, whether it be intentional or non-intentional abuse. Here's a couple things that I did that you can implement in your own application. The first thing I did was set a message size limit. This is the max size that a message can be before it's either truncated or just rejected by the system. In my case, the max message size is about 10,000 tokens. And in real world usage, I have not come close to that limit. So, I think that's okay for now. The second thing was to add some per user rate limits. This is a limit on the max number of messages that a user can send every single day and every single month.
So in my case, I capped it at 100 messages per day and 1,000 messages per month. And again, it's really dependent on the AI application and how users are going to use it. But in my case, I really can't see people sending more than 100 messages because again, they're really just using this to send commands for Ellie. This isn't like ChatGPT where they're going to be sending thousands of messages a day and having entire conversations. At least in my case, I never got close to sending 100 messages per day. So, I think that's a pretty good limit for now, but if people complain, I'm more than happy to raise that. The third thing I did was to build a remote kill switch. So, this is the ability for me to turn off the assistant for a specific user. I did set up some analytics using a service called PostHog, so I can see how much money is this app incurring, and I can even break down and see how much is each specific user using. If I see someone racking up a huge bill and it looks kind of suspicious to me, what I can do is just press a button, turn it off for them, and then I can reach out to them and ask them, "Hey, just checking what are you doing with this? Why are you sending this many messages?" And if it looks legitimate, I'll turn it back on, and then if it's not, we'll deal with that. I guess on that note, the fourth thing I did to prevent abuse was that analytics system. I do recommend adding some sort of system. And you could either use PostHog or you could do this manually, but you should have a way to view at minimum how many tokens and how much money is your app consuming. And if possible, do that on a per user level so you can see who is using the most. Is there something weird going on? I'm very surprised by the number of apps that don't have that in place on day one. ## Using libraries (like Vercel AI SDK) [05:38] So the next learning is to not reinvent the wheel.
After my last video, a bunch of people reached out to me and said, "Hey, you know, there are libraries out there that do a lot of the stuff that you implemented yourself out of the box." When I built the application in my first video, I did everything from scratch from the streaming to the tool calling to the max number of tool calls that could happen in a single loop. All of that stuff was built manually and from scratch. Then people pointed me to the Vercel AI SDK. I'd heard about it in the past, but I was hesitant because I didn't want to be locked into anything. But after doing a lot more research, I realized that it actually did a lot of the stuff that I did in my first video out of the box and way better than I did it myself. It handled things like being able to do the streaming correctly with proper error handling, tool calling with automatic retries, managing the conversation state. The system that I built kind of worked, but there were times when the stream would fail or some of the tool calls weren't happening consistently, but I had a suspicion to get it more consistent would probably take a lot of effort. So, I did take a look at the AI SDK. I did port it over to test and it actually did solve a lot of the problems that I was facing. Streaming started working out of the box very reliably and the tool calling was way more consistent, which was a big problem with the system that I had set up. And the codebase looked a lot cleaner. So, what took 100 lines of code in the past ended up being like 10 lines with the Vercel AI SDK. The SDK is completely free. It's open-source. And this is not sponsored at all. I just wanted to share this library because that's what I ended up using at the end. No regrets doing it the manual way, though. I did learn a lot in the process, and it really did confirm why things like the AI SDK do exist. And I understand how this stuff works under the hood a lot better, too.
## Plan to use multiple models early [07:08] The next lesson was kind of obvious in hindsight, but not a lot of people talk about this. You're probably going to be using multiple models for a lot of different things. When I started, I naively thought, I can do all of this with one model. I'll probably just use Gemini Flash or something and it'll all work perfectly. I wasted a ton of time tweaking the system prompt, trying to get it to consistently output or call specific tools when it turns out it was actually a problem with the model itself. Because then when I tried different models, certain things started working more consistently. So, that's something I wish I had a little bit more of an open mind with going in. It would have saved a lot of time was that I would probably be using different models for different use cases. So, some specific examples, I ended up using GPT-4o Mini for a lot of stuff because it seemed to outperform Gemini Flash in most cases. There were certain tasks related to time boxing, for example, that it was really struggling with. So, I had to use GPT-4o for those tasks. And for something like time boxing, even GPT-4o was struggling with it. And my suspicion is because the time zones were kind of confusing it. So, after testing a bunch of models, I actually ended up using Grok to do the time boxing stuff. For some weird reason, Grok was very consistent at dealing with multiple time zones. And here's a cool technique that I learned. You can actually put a layer before it starts the agent to actually pick which model to use. So, in my case, I actually have a layer that's using Gemini Flash to then choose which model the agent should be running. So, if it's a really simple task that doesn't really involve time, I'll use GPT-4o Mini. And if it's a little bit more complex or involves time zones, then it switches to GPT-4o.
The big benefit is cost and speed because then it can default to the cheaper faster model for simpler use cases and then only go to the bigger more expensive model when needed. And then I specifically have a tool that calls Grok just for the time boxing stuff, and Grok is way more expensive than GPT-4o, so I only reserve it for that task when it's needed. I have a feeling that in the future I'm probably going to be calling 10 different models here for a bunch of different use cases. But that was a really cool technique: using a very cheap small model to decide which model to use based on the user's input. Here's ## Think about your AI form factor early on [09:07] a couple smaller observations that I had that I really wanted to share with you guys. First is that the form factor actually does matter and I wish I spent a little bit more time considering that when building the product. I originally built the agent just on the web for the sake of speed thinking that I'd port it to iOS later, but I should have thought a little bit harder about where people were going to be using this agent. The main use case that I'm seeing so far and even for myself personally is dictating quick commands on my phone on the go. So, I can say something like, "When you create a task for groceries, add bacon, eggs, and paper towels to the list," and it'll just go ahead and do that and create the task with the relevant subtasks for me. These actions are so much nicer with dictation, and it's so much easier on the mobile version. It's a small detail, but it's something I wish I did consider because I could have launched this a little bit earlier, and probably mobile first if I'd realized that sooner. The next observation is ## AI settings vs App Settings [09:54] actually pretty cool. It's that personalization and settings are very different for AI products than traditional software.
In traditional software like Ellie, for settings, you can toggle things like when does the week start for you or when do you want to start your day. And these are just drop downs in the settings menu. But for AI products, personalization is a little bit different and actually a lot cooler. For time boxing preferences for the user, instead of having a toggle and drop down for everything, I could just have a text box and have the user input whatever preferences they want. So they can say something like, "I like to go to the gym in the morning. I like to do all my personal tasks after work, and I need a 15-minute break in between each meeting at minimum." Because at the end of the day, what I'm going to do is take this text and inject it into the prompt so that when the AI is coming up with the schedule, it just factors all that stuff in. Maybe I'm alone here, but I thought that was really cool and it made me think a lot more about how software is going to be more personalized in the future. The last observation was how my ## How you can beat ChatGPT [10:46] chat and my agent compared to general tools like ChatGPT because a common thing that I hear from people is what's the point of building this if ChatGPT is just going to add this feature or Claude's going to add this feature. I've actually used Claude and ChatGPT's calendar integrations to time box and plan my day. And after using both, I can say that the experience in Ellie was completely different than the experience in ChatGPT. Even though in theory they do the same thing. At the time of recording, to do something like create a calendar event in Claude, you type in the calendar event you want, but it's going to ask permission to run certain tools. Then it's going to run the tool. Then it's going to confirm with you and then it's going to go create the task. For some reason, it just feels kind of clunky and cumbersome with all these steps.
Whereas in Ellie, if I say the exact same thing, it's just going to do it and it's just going to make it happen in one message. I get why they have all these confirmations. They're building for a million users, but we are not them and we can take a little bit more risk than they can. So, in my case, I did feel confident enough to just bypass all of those confirmations and just allow the tool calls to happen automatically. And it really does change the nature of the experience. It feels like there's a lot less friction and makes me want to use it compared to using it in ChatGPT or Claude. I think the best way to think about it is it's like a general app versus a hyper-specific app. When I think about general apps versus focused apps that really solve a problem for a specific niche, in most cases the niche app probably solves the problem a lot better than the general app. And users usually can feel that. And I think the same thing does apply to AI products. ## Final thoughts and advice :) [12:08] Shipping AI products is pretty hard, but not in the ways that I expected. There was a challenge to make sure that the AI was smart enough and execute things the way that I envisioned. But there were also a lot of considerations like cost, security, the form factor, a lot of these things that I don't hear a lot of people talking about that I wish someone told me earlier. To summarize everything here, if you're building an AI product, I recommend tracking cost from day one. Building preventions to prevent abuse, whether it's intentional or unintentional abuse, remembering and honestly planning to use multiple models from the start, considering what the optimal form factor for your AI is, whether it be mobile or voice or on the web.
But really considering that when you're working on a product roadmap, leveraging existing frameworks like the Vercel AI SDK to make sure you're not reinventing the wheel, and then figuring out what your edge is going to be when you're building your product, especially comparing against something general like ChatGPT and Claude and figuring out a way to make your app stand out. And the agent that I showed you at the beginning of the video, hopefully by the time that this video is out, it should be launched and you can actually try it yourself if you want. If you're building an AI product, I would love to hear what you're building and some of the problems that you've encountered along the way. If it wasn't for you guys, I would not have found the AI SDK from Vercel. Please drop any tips that you have. I read every single comment. And if you like this content, check out my Instagram and TikTok. I post almost every other day about building productivity apps. And obviously, if you like this content, don't forget to subscribe. But thank you guys so much for watching and I'll see you guys in the next video.