Sora 2 Video Generation
Generate videos using the Sora 2 model from text prompts or with image references.
Beta
Overview
Sora 2 is OpenAI’s video generation model capable of creating richly detailed, dynamic clips from natural language prompts or image references. Built on years of research into multimodal diffusion and trained on diverse visual data, Sora brings a deep understanding of 3D space, motion, and scene continuity to video generation.
Key Features
- Text-to-video and image-to-video generation
- Fast generation speed, ideal for rapid iteration
- Good quality results suitable for social media content and prototypes
Supported Resolutions
| Size | Aspect Ratio | Use Case |
|---|---|---|
| 720x1280 | 9:16 | Vertical/Portrait (mobile, social media) |
| 1280x720 | 16:9 | Horizontal/Landscape (standard video) |
| 1024x1792 | ~9:16 (exactly 4:7) | Tall Portrait (extended vertical) |
| 1792x1024 | ~16:9 (exactly 7:4) | Wide Landscape (cinematic) |
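
The exact ratio of each size can be checked by dividing width and height by their greatest common divisor. The sketch below does that for the four sizes in the table; `reduced_ratio` is a helper name invented for this illustration, not part of the API. The two 1024-based sizes reduce to 4:7 and 7:4, which is close to, but not exactly, 9:16 and 16:9.

```python
from math import gcd

# The four sizes listed in the table above.
SUPPORTED_SIZES = ["720x1280", "1280x720", "1024x1792", "1792x1024"]


def reduced_ratio(size: str) -> str:
    """Return the exact width:height ratio of a 'WIDTHxHEIGHT' string in lowest terms."""
    width, height = (int(part) for part in size.split("x"))
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


for size in SUPPORTED_SIZES:
    print(size, "->", reduced_ratio(size))
```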
Duration Options
Videos can be generated in three duration options:
- 4 seconds: Quick clips (default)
- 8 seconds: Standard duration
- 12 seconds: Extended clips
Effective Prompting
For best results, describe shot type, subject, action, setting, and lighting. For example:
- “Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.”
- “Close-up of a steaming coffee cup on a wooden table, morning light through blinds, soft depth of field.”
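
One way to keep prompts consistent is to assemble them from the five elements named above. The helper below is a minimal sketch; the function name `compose_prompt` and the joining template are illustrative choices, not something the API requires.

```python
def compose_prompt(shot_type: str, subject: str, action: str,
                   setting: str, lighting: str) -> str:
    """Join the five recommended prompt elements into a single sentence."""
    # Strip stray whitespace so the pieces join cleanly.
    pieces = [p.strip() for p in (shot_type, subject, action, setting, lighting)]
    shot_type, subject, action, setting, lighting = pieces
    scene = f"{shot_type} of {subject} {action} {setting}"
    return f"{scene}, {lighting}."


prompt = compose_prompt(
    shot_type="Wide shot",
    subject="a child",
    action="flying a red kite",
    setting="in a grassy park",
    lighting="golden hour sunlight",
)
print(prompt)
```

Camera movement or other details can be appended to the lighting element or added as an extra clause.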
Content Restrictions
The API enforces several content restrictions:
- Only content suitable for audiences under 18
- Copyrighted characters and copyrighted music will be rejected
- Real people, including public figures, cannot be generated
- Input images containing human faces are currently rejected
Example Requests
Text-to-Video
Image-to-Video (with Reference)
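The request bodies for both styles can be sketched as follows. This is an illustration under stated assumptions, not the documented schema: the field names `prompt`, `seconds`, and `image` are hypothetical (only `size` is named on this page), `seconds` is assumed to be sent as an integer, and the reference image URL is a placeholder. The endpoint URL is not shown on this page, so the sketch stops at building the JSON body.

```python
import json

# Values taken from the duration and resolution lists on this page.
ALLOWED_SECONDS = (4, 8, 12)
ALLOWED_SIZES = ("720x1280", "1280x720", "1024x1792", "1792x1024")


def build_body(prompt, size, seconds=4, image=None):
    """Build a request body dict; field names other than 'size' are assumptions."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds not in ALLOWED_SECONDS:
        raise ValueError(f"seconds must be one of {ALLOWED_SECONDS}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}")
    body = {"prompt": prompt, "size": size, "seconds": seconds}
    if image is not None:
        # URL or Base64 data URI; hypothetical field name.
        body["image"] = image
    return body


# Text-to-video: prompt only.
text_to_video = build_body(
    "Close-up of a steaming coffee cup on a wooden table, "
    "morning light through blinds, soft depth of field.",
    size="1280x720",
    seconds=8,
)

# Image-to-video: same fields plus a reference image (placeholder URL).
image_to_video = build_body(
    "Camera slowly pans upward over a grassy park at golden hour.",
    size="720x1280",
    image="https://example.com/reference.jpg",
)

print(json.dumps(text_to_video, indent=2))
print(json.dumps(image_to_video, indent=2))
```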
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
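As a small illustration, the header can be built like this. `auth_headers` is a hypothetical helper, and the `Content-Type` value assumes the body is sent as JSON, which this page does not state explicitly.

```python
def auth_headers(token: str) -> dict:
    """Return HTTP headers carrying the bearer token."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is JSON.
        "Content-Type": "application/json",
    }


headers = auth_headers("YOUR_TOKEN")  # placeholder token
print(headers["Authorization"])
```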
Body
Text description for the video. For best results, describe:
- Shot type (wide shot, close-up, etc.)
- Subject (what is the main focus)
- Action (what is happening)
- Setting (where the action takes place)
- Lighting (time of day, mood)
Example: "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward."
Maximum length: 2000 characters
Optional image reference that guides generation. Can be a URL or Base64-encoded data.
Format for Base64: data:image/jpeg;base64,{base64_data}
- Supported formats: image/jpeg, image/png, image/webp
- Image resolution must match the target video's size parameter
- Max file size: 10MB
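
A minimal sketch of preparing a Base64 image reference under these constraints follows. It assumes that "10MB" means 10 × 1024 × 1024 bytes measured on the raw file, and that PNG and WebP images use the same `data:` form with their own MIME type; only the JPEG form is spelled out on this page. It does not check that the image resolution matches the target size, which would need an image library.

```python
import base64

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 10MB limit applies to raw bytes
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI of the form data:<mime>;base64,<data>."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image format: {mime_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10MB limit")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Typical usage would read a file in binary mode, for example `to_data_uri(open("reference.jpg", "rb").read())`, where `reference.jpg` is a placeholder path.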
Clip duration in seconds.
Available options: 4, 8, 12
Output resolution formatted as width x height.
Available options: 720x1280, 1280x720, 1024x1792, 1792x1024
Response
Accepted - Task created successfully