Veo 3.0 Video Generation
Generate videos from text prompts or images using Google's Veo 3.0 model.
Beta
Overview
Generate high-fidelity videos with stunning realism and natively generated audio using Google’s Veo 3.0 model.
Key Features
- Text-to-Video: Generate videos from descriptive text prompts
- Image-to-Video: Animate a starting image into a video sequence
- Audio generation: Natively generates synchronized audio with video
- 4K support: Generate videos up to 4K resolution
Supported Configurations
| Aspect Ratio | Resolution | Duration Options | Notes |
|---|---|---|---|
| 16:9 | 720p | 4s, 6s, 8s | All features supported |
| 9:16 | 720p | 4s, 6s, 8s | All features supported |
| 16:9 | 1080p | 8s only | - |
| 9:16 | 1080p | 8s only | - |
| 16:9 | 4k | 8s only | Higher latency and cost |
| 9:16 | 4k | 8s only | Higher latency and cost |
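The table above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the function name `validate_config` and the constant names are illustrative and not part of the API.

```python
# Allowed durations (seconds) per resolution, transcribed from the table above.
ALLOWED_DURATIONS = {
    "720p": {4, 6, 8},
    "1080p": {8},
    "4k": {8},
}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def validate_config(aspect_ratio: str, resolution: str, duration: int) -> None:
    """Raise ValueError if the combination is not listed in the table.

    Hypothetical client-side helper; the service performs its own validation.
    """
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if resolution not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration not in ALLOWED_DURATIONS[resolution]:
        allowed = sorted(ALLOWED_DURATIONS[resolution])
        raise ValueError(
            f"{resolution} supports durations {allowed}, got {duration}"
        )


validate_config("16:9", "720p", 6)   # listed in the table, passes silently
validate_config("9:16", "4k", 8)     # listed in the table, passes silently
```

Catching an unsupported combination such as 1080p with a 4 second duration locally avoids a round trip that would presumably be rejected.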
Prompt Writing Tips
For best results, include these elements in your prompt:
- Subject: The main focus (object, person, animal, scenery)
- Action: What the subject is doing (walking, running, turning)
- Style: Creative direction (sci-fi, horror film, film noir, cartoon)
- Camera positioning (optional): aerial view, eye-level, dolly shot
- Composition (optional): wide shot, close-up, single-shot
- Ambiance (optional): blue tones, night, warm tones
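One way to keep prompts consistent is to assemble them from the elements listed above. The sketch below is only an illustration: `build_prompt` is a hypothetical helper, and joining the elements with commas is one convention among many, not something the model requires.

```python
from typing import Optional


def build_prompt(
    subject: str,
    action: str,
    style: str,
    camera: Optional[str] = None,
    composition: Optional[str] = None,
    ambiance: Optional[str] = None,
) -> str:
    """Join the required and optional prompt elements into one string."""
    required = {"subject": subject, "action": action, "style": style}
    for name, value in required.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")

    # Subject and action read naturally as a single phrase.
    parts = [f"{subject.strip()} {action.strip()}", style.strip()]
    # Optional elements are appended only when provided.
    for optional in (camera, composition, ambiance):
        if optional and optional.strip():
            parts.append(optional.strip())
    return ", ".join(parts)


prompt = build_prompt(
    "a red fox",
    "running through snow",
    "film noir",
    camera="aerial view",
    ambiance="blue tones",
)
print(prompt)
# prints: a red fox running through snow, film noir, aerial view, blue tones
```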
Audio Prompting
Veo 3.0 can generate synchronized audio. Include audio cues in your prompt:
- Dialogue: Use quotes for specific speech (e.g., “This must be the key,” he murmured)
- Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
- Ambient Noise: Describe the environment’s soundscape (e.g., a faint, eerie hum)
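The three kinds of audio cue can be appended to a visual prompt as plain sentences. This is a rough sketch: `add_audio_cues` is a hypothetical helper, and the connecting phrases ("A voice says", "Sound of", "In the background") are assumptions chosen for readability, not wording the model is documented to expect.

```python
def add_audio_cues(
    prompt: str,
    dialogue: str = "",
    sound_effect: str = "",
    ambient: str = "",
) -> str:
    """Append dialogue, sound effect, and ambient cues to a visual prompt."""
    sentences = [prompt.strip().rstrip(".")]
    if dialogue:
        # Dialogue is wrapped in quotes, as the tips above recommend.
        sentences.append(f'A voice says, "{dialogue}"')
    if sound_effect:
        sentences.append(f"Sound of {sound_effect}")
    if ambient:
        sentences.append(f"In the background, {ambient}")
    return ". ".join(sentences) + "."


full_prompt = add_audio_cues(
    "A detective opens a dusty drawer, film noir, close-up",
    dialogue="This must be the key",
    sound_effect="tires screeching loudly",
    ambient="a faint, eerie hum",
)
print(full_prompt)
```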
Example Requests
Text-to-Video
Image-to-Video
4K Generation
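A minimal sketch of request bodies for the three cases named above. The field names (`prompt`, `image`, `aspect_ratio`, `resolution`, `duration`), the example prompts, and the image URL are all assumptions made for illustration; this page does not state the actual parameter names or the endpoint, so check them against the Body section of the live reference before use.

```python
import json
from typing import Any, Dict, Optional


def make_request_body(
    prompt: str,
    *,
    image: Optional[str] = None,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration: int = 8,
) -> Dict[str, Any]:
    """Build a JSON-serializable body. All keys are hypothetical names."""
    body: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
    }
    if image is not None:
        # URL or Base64 data URI of the first frame.
        body["image"] = image
    return body


# Text-to-Video: 720p allows 4, 6, or 8 seconds.
text_to_video = make_request_body(
    "A red fox running through snow, film noir, aerial view", duration=6
)

# Image-to-Video: the image becomes the first frame (placeholder URL).
image_to_video = make_request_body(
    "The camera slowly pushes in as leaves start to fall",
    image="https://example.com/first-frame.png",
)

# 4K Generation: 4k only supports an 8 second duration.
four_k = make_request_body(
    "Waves crashing on a rocky shore at night, wide shot",
    resolution="4k",
    duration=8,
)

print(json.dumps(four_k, indent=2))
```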
Regional Restrictions
In EU, UK, CH, and MENA regions, person_generation is restricted:
- Only allow_adult is available for all generation modes
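The restriction can be expressed as a small client-side guard. In this sketch the region labels and the function name `check_person_generation` are hypothetical; the page does not say how a caller's region is determined, and the service applies the rule itself regardless.

```python
# Region labels taken from the sentence above; how a region is detected
# is not specified, so the caller passes it in explicitly.
RESTRICTED_REGIONS = {"EU", "UK", "CH", "MENA"}


def check_person_generation(region: str, person_generation: str) -> None:
    """Raise ValueError if the value is not permitted in a restricted region."""
    if region.upper() in RESTRICTED_REGIONS and person_generation != "allow_adult":
        raise ValueError(
            f"in region {region!r} only 'allow_adult' is available, "
            f"got {person_generation!r}"
        )


check_person_generation("EU", "allow_adult")  # permitted, passes silently
```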
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
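As a short illustration, the header described above can be built like this. The helper name `auth_headers` is hypothetical, and the token value shown is a placeholder.

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Return the Authorization header in the form 'Bearer <token>'."""
    if not token:
        raise ValueError("missing auth token")
    return {"Authorization": f"Bearer {token}"}


headers = auth_headers("YOUR_TOKEN")  # placeholder, not a real token
print(headers)
# prints: {'Authorization': 'Bearer YOUR_TOKEN'}
```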
Body
Text description for the video. Supports audio cues.
Use descriptive language including:
- Subject (object, person, animal, scenery)
- Action (what the subject is doing)
- Style (sci-fi, horror film, film noir, cartoon, etc.)
- Camera positioning and motion (optional): aerial view, eye-level, dolly shot
- Composition (optional): wide shot, close-up, single-shot
- Ambiance (optional): blue tones, night, warm tones
Audio Prompting:
- Dialogue: Use quotes for specific speech (e.g., "This must be the key," he murmured)
- Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
- Ambient Noise: Describe the environment's soundscape (e.g., a faint, eerie hum)
Maximum length: 2000
Text describing what not to include in the video.
Do not use instructive language like "no" or "don't". Instead, describe what you don't want to see (e.g., "wall, frame" instead of "No walls").
Maximum length: 500
Initial image to animate (first frame). Can be a URL or Base64-encoded data.
Format for Base64: data:image/png;base64,{base64_data}
Supported formats: JPEG, JPG, PNG, BMP, WEBP
Max file size: 20MB
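A minimal sketch of preparing a Base64 image in the data URI format shown above. Several points are assumptions: the helper name `to_data_uri` is hypothetical, the MIME types for formats other than PNG are the standard ones rather than something this page specifies, and the 20MB limit is interpreted as 20 * 1024 * 1024 bytes of raw image data.

```python
import base64

# Assumption: "20MB" is read as 20 MiB of raw (pre-encoding) image data.
MAX_IMAGE_BYTES = 20 * 1024 * 1024

# Standard MIME types for the supported formats (only PNG is shown on this page).
MIME_TYPES = {
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "bmp": "image/bmp",
    "webp": "image/webp",
}


def to_data_uri(data: bytes, extension: str) -> str:
    """Encode raw image bytes as data:<mime>;base64,<payload>."""
    ext = extension.lower().lstrip(".")
    if ext not in MIME_TYPES:
        raise ValueError(f"unsupported image format: {extension!r}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes, limit is {MAX_IMAGE_BYTES}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[ext]};base64,{encoded}"


# Typical use: read the file, then encode it.
# with open("first-frame.png", "rb") as fh:
#     image_field = to_data_uri(fh.read(), "png")
```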
Video aspect ratio (width:height).
Available options: 16:9, 9:16
Video resolution.
Note: 1080p and 4k only support 8 second duration.
Available options: 720p, 1080p, 4k
Length of the generated video in seconds.
Note: Must be 8 when using 1080p or 4k resolution.
Available options: 4, 6, 8
Response
Accepted - Task created successfully