Starting today, you can generate short videos from text prompts in Tripplet. The feature is available to all Pro and Max users. Here's what it can do, what it can't do yet, and where we're taking it.
What's available today
Video generation supports durations from 2 to 10 seconds, aspect ratios of 16:9, 9:16, and 1:1, and motion styles including smooth, dynamic, and cinematic. You write a text prompt describing the scene and motion — "a mountain lake at sunset, slow zoom in, cinematic" — and receive an MP4 in approximately 15-30 seconds.
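For readers who think in code, here is a rough sketch of those options as a request check. This post doesn't describe a programmatic interface, so `VideoRequest`, `validate`, and every field name below are hypothetical and chosen only for illustration. The style set holds just the three styles named above, though others may exist.

```python
from dataclasses import dataclass

# Values taken from the announcement; the names themselves are hypothetical.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
# Only the three styles named in the post; the real list may be longer.
KNOWN_MOTION_STYLES = {"smooth", "dynamic", "cinematic"}
MIN_DURATION_S = 2
MAX_DURATION_S = 10


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one text-to-video request."""
    prompt: str
    duration_s: float = 5
    aspect_ratio: str = "16:9"
    motion_style: str = "smooth"


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    errors = []
    if not req.prompt.strip():
        errors.append("prompt must not be empty")
    if not (MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S):
        errors.append(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds")
    if req.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.motion_style not in KNOWN_MOTION_STYLES:
        errors.append(f"unknown motion style: {req.motion_style}")
    return errors


ok = VideoRequest(
    prompt="a mountain lake at sunset, slow zoom in, cinematic",
    duration_s=6,
    motion_style="cinematic",
)
too_long = VideoRequest(prompt="a city street at night", duration_s=30)

print(validate(ok))        # no problems reported
print(validate(too_long))  # reports the duration problem
```

A 30-second request fails this check today because the current range tops out at 10 seconds.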
As with image generation, we run prompt enhancement through Suzhou 3.1 before passing the prompt to the generation model. This significantly improves output quality for vague or short prompts.
Current limitations
Video generation is compute-intensive, so at launch Pro users are limited to 20 videos per day. Max users have no daily limit. We expect to relax the Pro limit as we scale infrastructure.
The current model handles simple scenes well but struggles with complex multi-object interactions, text within video, and accurate human faces. We're not hiding these limitations — the output you get reflects the state of the art. We'll improve it.
What's next
Over the next 60 days, we're working on longer durations (up to 30 seconds), video-to-video editing (upload a clip and describe a transformation), and image-to-video (animate a still image). We're also working on quality improvements to the base model.
Video generation is moving faster than any other modality right now. We're committed to staying at the frontier.