Veo 3.0 Video Generation

Create Generation Task

curl --request POST \
  --url https://api.mulerun.com/vendors/google/v1/veo-3.0/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "negative_prompt": "blurry, low quality, pixelated",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": 8
}
'

{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  }
}
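
The same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the names `build_generation_request` and `submit` are illustrative assumptions, while the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.mulerun.com/vendors/google/v1/veo-3.0/generation"


def build_generation_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `submit(build_generation_request(token, payload))` with a valid token would be expected to return a body shaped like the sample response above, but that depends on your account having access.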

Beta

This model is currently in public beta. Access is limited, and API requests may be unstable.

Overview

Generate high-fidelity videos with stunning realism and natively generated audio using Google’s Veo 3.0 model.

Key Features

Text-to-Video: Generate videos from descriptive text prompts
Image-to-Video: Animate a starting image into a video sequence
Audio generation: Natively generates synchronized audio with video
4K support: Generate videos up to 4K resolution

Veo 3.0 does not support reference images, frame interpolation (last_frame), or video extension. For these features, use Veo 3.1.

Supported Configurations

Aspect Ratio	Resolution	Duration Options	Notes
16:9	720p	4s, 6s, 8s	All features supported
9:16	720p	4s, 6s, 8s	All features supported
16:9	1080p	8s only	-
9:16	1080p	8s only	-
16:9	4k	8s only	Higher latency and cost
9:16	4k	8s only	Higher latency and cost
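
The table can be checked on the client before a request is sent. The following is a sketch under the assumption that the table is exhaustive; `validate_config` and the two constants are hypothetical names, not part of the API.

```python
# Allowed durations (seconds) per resolution, transcribed from the table above.
# The table lists the same durations for both aspect ratios.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_DURATIONS = {
    "720p": {4, 6, 8},
    "1080p": {8},
    "4k": {8},
}


def validate_config(aspect_ratio: str, resolution: str, duration: int) -> list:
    """Return a list of problems; an empty list means the combination is in the table."""
    problems = []
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in ALLOWED_DURATIONS:
        problems.append(f"unsupported resolution: {resolution!r}")
    elif duration not in ALLOWED_DURATIONS[resolution]:
        allowed = sorted(ALLOWED_DURATIONS[resolution])
        problems.append(f"{resolution} supports durations {allowed}, got {duration}")
    return problems
```

For example, `validate_config("16:9", "4k", 4)` reports one problem, because 4k is listed with the 8-second duration only.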

Prompt Writing Tips

For best results, include these elements in your prompt:

Subject: The main focus (object, person, animal, scenery)
Action: What the subject is doing (walking, running, turning)
Style: Creative direction (sci-fi, horror film, film noir, cartoon)
Camera positioning (optional): aerial view, eye-level, dolly shot
Composition (optional): wide shot, close-up, single-shot
Ambiance (optional): blue tones, night, warm tones
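
One way to keep prompts consistent is to assemble them from the elements listed above. This is an illustrative sketch: `build_prompt` is a hypothetical helper, the comma-separated layout is an assumption rather than a required format, and the 2000-character limit is the one documented for the prompt field.

```python
MAX_PROMPT_LENGTH = 2000  # limit documented for the prompt field


def build_prompt(subject, action, style, camera=None, composition=None, ambiance=None):
    """Join the required and optional prompt elements into one description."""
    parts = [f"{subject} {action}", style, camera, composition, ambiance]
    # Optional elements left as None are skipped.
    prompt = ", ".join(part.strip() for part in parts if part)
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(
            f"prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_LENGTH}"
        )
    return prompt


print(build_prompt("A red fox", "running through snow", "film noir",
                   camera="dolly shot", ambiance="night"))
# prints: A red fox running through snow, film noir, dolly shot, night
```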

Audio Prompting

Veo 3.0 can generate synchronized audio. Include audio cues in your prompt:

Dialogue: Use quotes for specific speech (e.g., “This must be the key,” he murmured)
Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
Ambient Noise: Describe the environment’s soundscape (e.g., a faint, eerie hum)
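
The three kinds of audio cue can be appended to a visual description as extra sentences. The sketch below is only one possible phrasing: `with_audio_cues` and its sentence templates are assumptions for illustration, and the model does not require this exact wording.

```python
from typing import Optional


def with_audio_cues(prompt: str,
                    dialogue: Optional[str] = None,
                    sound_effect: Optional[str] = None,
                    ambient: Optional[str] = None) -> str:
    """Append dialogue, sound-effect, and ambient-noise cues to a prompt."""
    sentences = [prompt.strip().rstrip(".")]
    if dialogue:
        # Quotes mark the exact words to be spoken.
        sentences.append(f'A voice says, "{dialogue}"')
    if sound_effect:
        sentences.append(f"Sound effect: {sound_effect}")
    if ambient:
        sentences.append(f"Ambient noise: {ambient}")
    return ". ".join(sentences) + "."
```

Cues that are left as `None` are omitted, so the helper returns the original prompt (with a closing period) when no audio is described.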

Example Requests

Text-to-Video

{
  "prompt": "A serene sunset over a calm ocean, with gentle waves lapping against the shore",
  "negative_prompt": "blurry, low quality, pixelated",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": 8
}

Image-to-Video

{
  "prompt": "The character starts walking forward slowly",
  "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",
  "aspect_ratio": "9:16",
  "resolution": "720p",
  "duration": 6
}

4K Generation

{
  "prompt": "A stunning drone view of the Grand Canyon during sunset",
  "aspect_ratio": "16:9",
  "resolution": "4k",
  "duration": 8
}

Regional Restrictions

In the EU, UK, CH, and MENA regions, person_generation is restricted: only allow_adult is available, in all generation modes.

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

prompt

string

required

Text description for the video. Supports audio cues.

Use descriptive language including:

Subject (object, person, animal, scenery)
Action (what the subject is doing)
Style (sci-fi, horror film, film noir, cartoon, etc.)
Camera positioning and motion (optional): aerial view, eye-level, dolly shot
Composition (optional): wide shot, close-up, single-shot
Ambiance (optional): blue tones, night, warm tones

Audio Prompting:

Dialogue: Use quotes for specific speech (e.g., "This must be the key," he murmured)
Sound Effects: Explicitly describe sounds (e.g., tires screeching loudly)
Ambient Noise: Describe the environment's soundscape (e.g., a faint, eerie hum)

Maximum string length: 2000

negative_prompt

string

Text describing what not to include in the video.

Do not use instructive language like "no" or "don't". Instead, describe what you don't want to see (e.g., "wall, frame" instead of "No walls").

Maximum string length: 500
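
Both rules for this field can be checked before sending. This is a rough sketch: `check_negative_prompt` is a hypothetical helper, and the word list covers only the two examples named above plus "do not", which is an assumption.

```python
import re

MAX_NEGATIVE_PROMPT_LENGTH = 500
# "no" and "don't" come from the field description; "do not" is an added assumption.
INSTRUCTIVE_WORDS = re.compile(r"\b(?:no|don['’]t|do not)\b", re.IGNORECASE)


def check_negative_prompt(text: str) -> list:
    """Return a list of problems found in a negative_prompt value."""
    problems = []
    if len(text) > MAX_NEGATIVE_PROMPT_LENGTH:
        problems.append(
            f"{len(text)} characters; the limit is {MAX_NEGATIVE_PROMPT_LENGTH}"
        )
    match = INSTRUCTIVE_WORDS.search(text)
    if match:
        problems.append(
            f"instructive wording {match.group(0)!r}; list the unwanted content instead"
        )
    return problems
```

The word-boundary match means "No walls" is flagged while "snow, noise" is not.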

image

string | null

Initial image to animate (first frame). Can be a URL or Base64 encoded data.

Format for Base64: data:image/png;base64,{base64_data}

Supported formats: JPEG, JPG, PNG, BMP, WEBP

Max file size: 20MB
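
A local image can be converted to the Base64 form as follows. This is a sketch with stated assumptions: `encode_image` is a hypothetical helper, the MIME types for formats other than PNG follow the same `data:` pattern by analogy, and "20MB" is read as 20 * 1024 * 1024 bytes.

```python
import base64

# Assumption: "20MB" means 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
# Only the PNG form appears in the field description; the others are assumed by analogy.
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def encode_image(data: bytes, extension: str) -> str:
    """Return a data URI for raw image bytes with the given file extension."""
    mime = MIME_BY_EXTENSION.get(extension.lower())
    if mime is None:
        raise ValueError(f"unsupported image format: {extension!r}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; the limit is {MAX_IMAGE_BYTES}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With a file on disk, `encode_image(path.read_bytes(), path.suffix)` for a `pathlib.Path` produces the value to place in the image field.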

aspect_ratio

enum<string>

default: 16:9

Video aspect ratio (width:height).

Available options: 16:9, 9:16

resolution

enum<string>

default: 720p

Video resolution.

Note: 1080p and 4k support only the 8-second duration.

Available options: 720p, 1080p, 4k

duration

enum<integer>

default: 8

Length of the generated video in seconds.

Note: Must be 8 when using 1080p or 4k resolution.

Available options: 4, 6, 8
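
Because aspect_ratio, resolution, and duration all have defaults, a request may omit them. The sketch below shows the values that would apply under the documented defaults; `with_defaults` is a hypothetical helper, and filling defaults on the client is optional since the server is documented to apply them.

```python
# Defaults as documented for the three enum fields.
DEFAULTS = {"aspect_ratio": "16:9", "resolution": "720p", "duration": 8}


def with_defaults(payload: dict) -> dict:
    """Return a copy of the payload with omitted enum fields set to their defaults."""
    return {**DEFAULTS, **payload}


print(with_defaults({"prompt": "A calm lake at dawn", "resolution": "1080p"}))
# duration falls back to 8, which is the only duration 1080p accepts
```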

Response

202 - application/json

Accepted - Task created successfully

task_info

object

Task metadata. In the example response it contains id, created_at, and updated_at.
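
A 202 response body can be parsed into a small typed record. This is a sketch that assumes the body matches the sample response (`task_info` with `id`, `created_at`, and `updated_at`); the `TaskInfo` class and `parse_task_info` function are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInfo:
    id: str
    created_at: datetime
    updated_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_task_info(body: str) -> TaskInfo:
    """Extract the task id and timestamps from a response body."""
    info = json.loads(body)["task_info"]
    return TaskInfo(
        id=info["id"],
        created_at=_parse_timestamp(info["created_at"]),
        updated_at=_parse_timestamp(info["updated_at"]),
    )
```

The returned `id` is the value to keep for later status lookups on the task.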
