Cartoon 2D

An experiment in turning AI-generated SVGs into something animated, reusable, and actually useful for storytelling.

This post was written as part of my entry for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge


What originally inspired me was actually a very simple question.

Models were getting surprisingly good at two things at once: generating SVGs and understanding text. That made me wonder whether those two strengths could be combined into something much bigger. If a model can understand a story and also generate vector characters and scenes, could those SVGs become animated too? And if they could, could that lead to something beyond just making moving images and turn into a real cartoon-making system?

That question eventually turned into Cartoon 2D.

AI should direct the cartoon. The engine should make it playable.

The Real Problem

A lot of AI creative tools are excellent at giving you one impressive result.

But once you try to make something longer, the cracks start to show. Characters drift. Motion gets weird. Timing becomes inconsistent. Costs rise because too much has to be regenerated from scratch. And even when the output looks good in a still frame, it may stop feeling good the second it starts moving.

That was probably the most important realization in the project: making something that looks nice is not the same as making something that animates well.

And if the goal is long-form storytelling, that difference matters a lot.

Why Long-Form Cartoons Make Sense

I was not especially interested in building a one-shot clip generator. What felt more interesting was the possibility of building something that could support longer stories.

Long-form cartoons are actually a very good fit for this kind of system, because they naturally have recurring characters and lots of dialogue. That means a lot of expensive work only has to be done once.

If you can rig a character once, stabilize their design once, assign them a voice once, and build a reusable set of motion patterns once, the whole system becomes much cheaper and much more predictable. Instead of regenerating everything from zero in every new scene, you start reusing the same actor across the story.

That is a big part of why this project made sense to me. Reuse is not just a bonus. It is the whole economic advantage.

The Architecture Shift

One of the biggest lessons from earlier attempts (ADK + bidi-streaming) was that animation does not just need generation. It needs control.

You need a timeline. You need scrubbing. You need previews. You need to be able to fix one scene without destroying the next. You need to reuse a character without reinventing them every time they appear.

That led to the architectural decision that made the whole project start working:

AI should describe intent. Deterministic code should execute it.

Once I stopped treating the model like a system that should directly output final animation and started treating it more like a creative director, the project became much more coherent. The model could focus on structure, description, scene intent, motion intent, and assets. The runtime could focus on validation, compilation, playback, reuse, and export.
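In code, that split can look like a strict validation gate between the model and the runtime. The shape below is a minimal sketch, not the project's actual schema; the field names (`sceneId`, `actors`, `motion`) are illustrative assumptions.

```typescript
// Hypothetical shape of the structured "intent" the model returns.
// Field names are illustrative, not the project's real schema.
interface SceneIntent {
  sceneId: string;
  actors: string[];
  motion: { actor: string; action: string; durationMs: number }[];
}

// Deterministic gate: reject anything the runtime cannot play,
// instead of trusting the model's raw output.
function validateSceneIntent(raw: unknown): SceneIntent {
  const intent = raw as SceneIntent;
  if (typeof intent?.sceneId !== "string") throw new Error("missing sceneId");
  if (!Array.isArray(intent.actors) || intent.actors.length === 0)
    throw new Error("scene needs at least one actor");
  for (const m of intent.motion ?? []) {
    if (!intent.actors.includes(m.actor))
      throw new Error(`motion references unknown actor: ${m.actor}`);
    if (!(m.durationMs > 0)) throw new Error("non-positive duration");
  }
  return intent;
}
```

The point of a gate like this is that a malformed or hallucinated scene fails loudly at the boundary, before it can corrupt the timeline.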

How Cartoon 2D Works

The workflow starts with a text prompt.

Gemini generates structured scene beats, comic-style scene images, dialogue cues, and motion intent. From there, the system turns those outputs into something the runtime can actually use.

Characters are turned into SVG rigs with pivots, bones, views, and limits. Motion intent gets compiled into playable clips using a deterministic TypeScript runtime built around a canonical IK graph. Dialogue goes through Google Cloud Text-to-Speech, which then feeds timing and lip-sync data back into the animation layer. Scenes land on a timeline where they can be previewed, adjusted, and exported.
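To make "compile motion intent into playable clips" concrete, here is a minimal sketch of the idea: a declarative action becomes evenly spaced keyframes at a fixed frame rate, so playback and scrubbing are fully deterministic. The names (`MotionIntent`, `Keyframe`) and the linear interpolation are assumptions for illustration, not the real runtime.

```typescript
// Illustrative compiler from declarative motion to per-frame keyframes.
interface MotionIntent { joint: string; from: number; to: number; durationMs: number; }
interface Keyframe { timeMs: number; joint: string; angle: number; }

const FPS = 30;

function compileClip(intent: MotionIntent): Keyframe[] {
  const frameMs = 1000 / FPS;
  const frames = Math.max(1, Math.round(intent.durationMs / frameMs));
  const clip: Keyframe[] = [];
  for (let i = 0; i <= frames; i++) {
    const t = i / frames; // normalized time 0..1
    clip.push({
      timeMs: Math.round(i * frameMs),
      joint: intent.joint,
      angle: intent.from + (intent.to - intent.from) * t, // linear interpolation
    });
  }
  return clip;
}
```

Because the same intent always compiles to the same keyframes, a clip can be regenerated, scrubbed, and re-exported without drift.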

The important thing is that the model is not being asked to directly produce a finished long cartoon frame by frame. It is being asked to produce creative direction and structured intent. The engine then has to make that intent playable.

That was the only way I found to make the system feel less like a prompt demo and more like actual software.

The Newest Models Really Helped

One thing that became very obvious while building this is that the newest models genuinely matter.

For rigging especially, the latest Gemini Pro models were a real unlock. They did a noticeably better job at returning clean, good-looking SVG assets from image references. That mattered more than I expected, because weak rig output makes every downstream problem harder. If the character starts from a bad place, animation gets harder, cleanup gets harder, and consistency gets harder.

So for this kind of system, using the strongest models really did improve the result in a meaningful way.

The Hardest Part Is Still Animation

The biggest struggle in the whole project was animation quality.

And honestly, it still is.

A character can look great in a generated image and still move in a way that feels awkward, floaty, stiff, or just plain wrong when it starts playing back. That gap between “good image” and “good animation” is huge.

So I do not want to present this as some magical workflow where you paste in a full cartoon script and instantly get a polished finished episode. It is not that. Not yet.

Right now, the system works best step by step:

  • scene by scene
  • actor by actor
  • clip by clip

That is the honest version.

And I think that is fine, because animation quality is hard and deserves real attention. If more focused research had gone into the animation layer itself, I think the result could have been much stronger. In many ways, that still feels like the biggest area for improvement.

What Was Difficult Besides Animation

Consistency was another major challenge.

At first, characters came back looking different in every scene. The fix was an identity-lock system: save reference portraits for actors, then feed those back into later generations so Gemini has a stable visual anchor.
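The mechanics of an identity lock can be very simple. The sketch below assumes one stored portrait per actor and a hypothetical request shape; the real system may store more state, but the principle is the same: the first accepted portrait becomes the anchor for every later generation.

```typescript
// Sketch of the identity-lock idea: keep one reference portrait per
// actor and attach it to every later generation request.
const portraits = new Map<string, string>(); // actorId -> portrait image (e.g. base64)

function lockIdentity(actorId: string, portrait: string): void {
  // First accepted portrait wins; later scenes reuse it instead of
  // letting each generation redefine the character.
  if (!portraits.has(actorId)) portraits.set(actorId, portrait);
}

function buildSceneRequest(prompt: string, actorIds: string[]) {
  return {
    prompt,
    referenceImages: actorIds
      .filter((id) => portraits.has(id))
      .map((id) => ({ actorId: id, image: portraits.get(id)! })),
  };
}
```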

Rigging was also messy. Sometimes generated characters came back structurally weak, missing pieces, over-segmented, or arranged in ways that were hard to animate. To keep the pipeline usable, I added deterministic cleanup and repair passes that normalize the rig output before playback.
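A repair pass of that kind might look like the following sketch. `RigPart` and the default-pivot rule are illustrative assumptions; the real cleanup is more involved, but the shape is the same: drop unusable geometry, deduplicate over-segmented parts, and fill in missing data with deterministic defaults.

```typescript
// Sketch of a deterministic repair pass over a generated rig.
interface RigPart { name: string; path: string; pivot?: [number, number]; }

function repairRig(parts: RigPart[]): RigPart[] {
  const seen = new Set<string>();
  const repaired: RigPart[] = [];
  for (const part of parts) {
    if (!part.path) continue;            // drop empty geometry
    if (seen.has(part.name)) continue;   // drop duplicate segments
    seen.add(part.name);
    repaired.push({
      ...part,
      pivot: part.pivot ?? [0, 0],       // fall back to a default pivot
    });
  }
  return repaired;
}
```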

The editor itself was also a big part of the work. The timeline UI, scrubbing, audio placement, scene previews, and export logic were not “just AI problems.” They were product and engineering problems, and they took a lot of iteration.

Lip sync added another layer. Dialogue audio is generated with Google Cloud Text-to-Speech, and timing data is converted into visemes that drive SVG mouth shapes or jaw motion. That meant the audio system, the rigging system, and the playback system all had to agree on the same structure.
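The viseme step can be sketched as a plain lookup from timed phonemes to mouth shapes. The phoneme-to-viseme table below is a tiny invented subset, and the assumption that timing data arrives as per-phoneme timepoints is mine; the real mapping is richer, but unknown sounds falling back to a rest pose is the key safety rule.

```typescript
// Sketch of turning TTS timing data into viseme keyframes for the mouth rig.
interface Timepoint { phoneme: string; timeMs: number; }
interface VisemeFrame { timeMs: number; viseme: string; }

// Illustrative subset of a phoneme -> viseme table.
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: "open", IY: "wide", UW: "round", M: "closed", B: "closed", sil: "rest",
};

function toVisemes(timepoints: Timepoint[]): VisemeFrame[] {
  return timepoints.map((tp) => ({
    timeMs: tp.timeMs,
    viseme: PHONEME_TO_VISEME[tp.phoneme] ?? "rest", // unknown sounds fall back to rest
  }));
}
```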

How It Was Built

This project was built in a very iterative way, and AI-assisted development was a real part of that process.

Antigravity, Codex, and Claude all helped at different points with architecture exploration, implementation, refactoring, debugging, and moving through ideas more quickly. That made it easier to test different directions and keep momentum while building.

But those tools did not remove the hard part. They mostly helped me get to the hard part faster.

The real work was still deciding where AI should be trusted, where deterministic code was necessary, and how much structure the system needed in order to stay usable.

One Direction That Could Improve It a Lot

One idea I keep coming back to is that the system could become much stronger if more of the animation vocabulary was predefined up front.

For example, instead of asking the model to invent every object class and every kind of motion from scratch, a better system might predefine more of that world in advance:

  • known object types
  • known rig families
  • known animation categories
  • known motion libraries

That would make the system more deterministic, and I think it would also provide the quality bump the project still needs.

In other words, one of the best next steps may not be “let the model invent everything,” but “let the model choose intelligently from a stronger predefined animation language.”

That feels like a very practical path forward.
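As a sketch of what "choose from a predefined animation language" could mean in practice: the model returns a motion id, and the runtime resolves it against a known library. The library contents and the fallback rule below are invented for illustration.

```typescript
// Sketch: the model picks a motion id from a known library instead of
// inventing motion from scratch. Library entries are illustrative.
const MOTION_LIBRARY: Record<string, { joints: string[]; durationMs: number }> = {
  walk_cycle: { joints: ["hip", "knee", "ankle"], durationMs: 1000 },
  wave: { joints: ["shoulder", "elbow", "wrist"], durationMs: 600 },
  idle_breathe: { joints: ["chest"], durationMs: 2000 },
};

function resolveMotion(requested: string): { joints: string[]; durationMs: number } {
  const motion = MOTION_LIBRARY[requested];
  // Unknown ids degrade to a safe default instead of failing the scene.
  return motion ?? MOTION_LIBRARY["idle_breathe"];
}
```

The attraction of this design is that every motion the model can request is one the runtime already knows how to play well.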

The Google Stack

The project leans heavily on Google’s ecosystem.

Gemini is the creative layer. Gemini 3.1 Flash Image Preview is used for storyboard and scene image generation. Gemini 3.1 Pro Preview is used for heavier reasoning tasks like rig drafting, motion planning, and structured asset generation.

Google Cloud Text-to-Speech is used for dialogue generation and timing data that drives lip sync.

The application itself is deployed on Google Cloud Run, and server-side FFmpeg is used to package rendered frames and audio into MP4 output.
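The packaging step boils down to one FFmpeg invocation that muxes a numbered frame sequence with the dialogue track. The sketch below only builds the argument list; paths and the frame-name pattern are placeholders, while the flags themselves are standard FFmpeg options.

```typescript
// Sketch of the server-side FFmpeg invocation that muxes rendered
// frames and a dialogue track into an MP4.
function buildFfmpegArgs(framesDir: string, audioPath: string, outPath: string, fps = 30): string[] {
  return [
    "-framerate", String(fps),
    "-i", `${framesDir}/frame_%05d.png`, // numbered frame sequence
    "-i", audioPath,
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",               // widest player compatibility
    "-c:a", "aac",
    "-shortest",                         // stop at the shorter of the two streams
    outPath,
  ];
}
```

At runtime these arguments would be handed to a spawned `ffmpeg` process.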

The API integration side was actually fairly smooth. The harder engineering work was in the translation layer: taking creative, probabilistic output and turning it into something stable enough to animate and edit.

What I Learned

The biggest lesson from this project is that architecture matters more than novelty.

The key question was never just “which model should I call?” The real question was: what should the model be responsible for, and what should code be responsible for?

Once that split became clear, the project improved in every direction. It became more controllable, more reusable, more predictable, and much easier to think about.

I also came away believing even more strongly that guardrails are not the enemy of creativity. In production systems, they are often what makes creativity useful.

And maybe most importantly, I learned that in animation it is better to block bad output than to pretend it is good.

Where I Think This Is Going

Even with all the rough edges, I think the prototype turned out well.

It no longer feels like a toy prompt demo. It feels like the beginning of a real direction: AI-assisted cartoon production where the model handles creative intent and the runtime handles validation, reuse, playback, and editing.

That is the part I find exciting.

Not the idea of replacing animation with prompting, but the idea of building animation software that becomes dramatically faster and more useful because AI is embedded in the right places.

That is what Cartoon 2D is trying to do.

AI should direct the cartoon. The engine should make it playable.


One thing that did not make it into the final version was the SFX system. It actually worked pretty well in free Google Colab and even locally on my MacBook.

The setup looked like this:

!pip install fastapi uvicorn pydantic diffusers transformers accelerate soundfile pyngrok nest-asyncio torchsde

The real blocker was deployment cost. To run it properly, I really needed a dedicated T4-class GPU, and that was more than I could justify for this prototype at the moment. So for now, sound effects are not generated in the final deployed version.

Distribution for Google Play and App Store promo codes

As a developer, you get a generous number of promo codes from the App Store and Google Play, and sharing them for my own apps and games was the #1 way I got early users.

However, managing them in spreadsheets is incredibly tedious and not fun for users. I built a tool to automate the process so you can focus on the product.

Developers, your first campaign is on the house.
Users can just go and claim the codes without any hassle.

https://proffer.codes

5Y Habit Tracker: Art Edition

Does your current habit tracker feel like a chore? It’s time to make self-improvement look as good as it feels.

With 5Y Habit Tracker: Art Edition, every habit you complete is a brushstroke on a larger canvas. Designed for visual thinkers and long-term planners, 5Y allows you to unlock a unique piece of fine art for every year you remain dedicated.

How it works:
1. Define Your Themes: Categorize your habits into 10 life themes (each theme with 5 famous artworks / 5 years).
2. Stay Consistent: Track your stats simply and effectively.
3. Reveal the Beauty: Every 365 days, a new masterpiece is yours.

Plus, we’ve integrated a Private AI companion. Whether you need to clear your head or celebrate a win, you can chat without judgment. Your data stays on your device—no clouds, no prying eyes.

Build a gallery of success over the next five years. Download 5Y today.

Big Christmas Giveaway

Last giveaway of the year!

I wanted to wrap up 2025 by giving back to the community. It’s been a busy year for me as a solo developer; I managed to release two brand new titles and finally finished a massive consolidation update for my puzzle series.

If you are looking for something new to play over the holidays, I’d love to share some promo codes. Just let me know which game interests you!

The New 2025 Releases:

+ Trivia Player: This is for the hardcore trivia fans. I built this because I was tired of multiple-choice quizzes that hold your hand. This is authentic trivia—real-time chat battles where you have to actually type the answer. No luck, just pure knowledge.

+ AI Movie Quiz: I used AI to generate unique, slightly twisted clips and images based on famous movies. You have to guess the film from the AI’s “dream.” It’s a surreal take on the classic movie quiz format.

The Big Update:

Subliminal Words: I spent a lot of time this year merging my other games (Subliminal Faces and Subliminal Football Quiz) into this one main game. It’s a visual puzzle game where words, faces, and objects are cleverly camouflaged in the image (think “Magic Eye” meets word search).

The Classics (Also available):

I also have codes left for my older arcade and puzzle games if you want to check out my back catalog:

+ Palindrome
+ Dice Guess
+ Snackroach
+ Penalty 2D

Also Slopper on Android!

How to get a code: Just send a mail to indrekl[at]gmail.com and tell me which code you want, and I’ll send you one while supplies last!

Happy Holidays!

App Store Link
Google Play Link

regards,
🧑‍🎄

Slopper: Private AI Replies

Replying to posts, joining conversations, and staying active. Who has the time? Slopper uses a powerful floating bubble to help you generate countless high-quality, relevant replies in any app. Engage with 10x more posts on X (Twitter), Instagram, Reddit, and more. Best of all, it runs 100% on your device. No servers, no tracking, and no one else training on your data. Boost your engagement without ever sacrificing your privacy.

AI Movie Quiz

Test your movie knowledge with AI-generated clips inspired by legendary films. Watch unique visual sequences and guess which iconic movies they represent, from classic Hollywood blockbusters to modern masterpieces.

Features:

  • AI-created movie clips and scenes
  • Progressive difficulty levels
  • Hint system
  • Covers decades of cinema across all genres
  • Regular updates with new movies

Perfect for movie buffs and trivia lovers. Challenge yourself or compete with friends to see who knows cinema best!

 

Trivia Player

Tired of quizzes that treat you like you need your hand held with obvious answer choices?

This quiz respects your intelligence. It’s authentic trivia – the kind you’d find at intense pub quiz nights where you and your friends battle it out with pure knowledge. Our players don’t mess around; for ‘What’s the capital of Australia?’, they just type ‘Canberra’ cold. No hints needed, no second guesses about Sydney. If you’re confident in your knowledge and ready to put your thinking skills to the test, then step right up!

Features
• Compete against other players in real-time.
• Huge number of unique questions: always something new to learn.
• Leaderboards for question and round wins.
• Light/Dark mode with multiple color themes.
• 24/7: Enjoy endless fun anytime.

Subliminal Words

Discover all the hidden words that are cleverly camouflaged, and with each level, unlock a sentence of an unfolding story.

Like those magic eye pictures that used to hold our gaze, this game has subliminal words that emerge from the chaos, forming part of the image—a secret message that becomes clear as you look “closer”.

FEATURES
• Complete 200 Levels
• Unlock 12 unique achievements
• Unlock the story – every level you complete unlocks a sentence in an overarching story
• Activate the hint ‘Highlight’ to reveal a letter
• Activate the hint ‘Manipulate’ to use zoom in/out and rotate
• Activate the hint ‘Remove 3’ to remove letters that are not needed
• Choose Your favourite color theme (red, yellow, green, blue, purple)
• Enjoy a clean and intuitive UI

 

Dice Guess


Roll the dice along the track (in your mind) and guess which number will be on top when it reaches the last platform. Train your brain? I’m not really sure it trains it, but it is a solid brain teaser. Ask your math friend for more info.

FEATURES
● 50 free levels
● 150 premium levels
● Leaderboard
● No ads

Snackroach

Help out your distant relative Snackroach! Help the little guy reach the food gate. Make as few taps as possible. Avoid enemies, obstacles, and falling rocks!

FEATURES
• Levels with smart puzzles
• Levels that require you to be fast
• Novel obstacles, enemies and falling rocks
• Enjoyable character animations
• Hats
• Progressively increasing difficulty
• 69 free levels
• 12 achievements
• Global leaderboard
• Progress saved to cloud
• Easy to start!

HOW TO PLAY
• Tap the location where You want to move
• Score high by making fewer taps