Image to Video
Generate videos from images using the Kling v2.6 model with audio and voice generation support.
Overview
Generate videos from images using the Kling v2.6 model with built-in audio and custom voice generation support.
Key Features
- Image-to-video generation with audio
- Standard and Professional quality modes
- 5s or 10s duration
- End frame control (image_tail)
- Motion brush support (static_mask, dynamic_masks)
- Audio generation support (new in v2.6)
- Custom voice generation (new in v2.6)
Image Requirements
| Property | Requirement |
|---|---|
| Formats | JPEG, JPG, PNG |
| Dimensions | Min 300px for both width and height |
| Aspect Ratio | Between 1:2.5 and 2.5:1 |
| File Size | Max 10MB |
| Input | Public URL or Base64 encoded data |
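The requirements in the table can be checked locally before submitting a task. The following is a minimal sketch, not part of the API: the function name `check_image` and its arguments are hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes. The aspect-ratio check uses integer arithmetic so the 1:2.5 and 2.5:1 boundaries are included exactly.

```python
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}
MIN_SIDE_PX = 300
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10MB is interpreted as 10 MiB


def check_image(width, height, size_bytes, extension):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if extension.lower().lstrip(".") not in ALLOWED_EXTENSIONS:
        problems.append("format must be JPEG, JPG, or PNG")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must each be at least 300px")
    # width/height must lie in [1/2.5, 2.5], i.e. [2/5, 5/2], checked without floats.
    if not (2 * height <= 5 * width and 2 * width <= 5 * height):
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file size must not exceed 10MB")
    return problems


print(check_image(1280, 720, 2_000_000, "png"))
print(check_image(3000, 1000, 2_000_000, ".JPG"))
```

The first call reports no problems; the second reports only the aspect-ratio violation, since 3000:1000 is wider than 2.5:1.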
Audio & Voice Generation
sound Parameter
- on: Generate video with synchronized audio
- off: Generate silent video (default)
voice_list Parameter
Reference custom voices in video generation:
- Up to 2 voices per task
- Use <<<voice_1>>> syntax in prompt to specify voice
- Requires sound: "on"
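The three voice rules above can be verified on the client side before a task is created. This is a sketch under stated assumptions: `check_voice_usage` is a hypothetical helper, not an API call, and it treats each `voice_list` entry as an opaque value because this section does not define the entry format.

```python
import re

VOICE_TAG = re.compile(r"<<<voice_(\d+)>>>")
MAX_VOICES = 2


def check_voice_usage(prompt, voice_list, sound):
    """Return a list of rule violations for the given prompt, voice_list, and sound value."""
    problems = []
    if len(voice_list) > MAX_VOICES:
        problems.append("voice_list accepts at most 2 voices")
    if voice_list and sound != "on":
        problems.append('voice_list requires sound: "on"')
    # Every <<<voice_N>>> tag must point at an existing position in voice_list (1-based).
    referenced = sorted({int(n) for n in VOICE_TAG.findall(prompt)})
    for n in referenced:
        if n < 1 or n > len(voice_list):
            problems.append(f"<<<voice_{n}>>> has no matching entry in voice_list")
    return problems


print(check_voice_usage('The man <<<voice_1>>> said, "Hello."', ["VOICE_ID_A"], "on"))
```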
Example Requests
Basic Image-to-Video with Audio
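A request body for this case might look like the sketch below. It only builds and prints the JSON body, because the endpoint URL is not given in this section. The field names `image`, `prompt`, and `sound` come from this page; the image URL and prompt text are placeholders.

```python
import json

# Placeholder image URL; replace it with a public image that meets the Image Requirements.
payload = {
    "image": "https://example.com/input.png",
    "prompt": "A woman turns toward the camera and smiles.",
    "sound": "on",  # generate synchronized audio with the video
}

body = json.dumps(payload, indent=2)
print(body)
```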
With Custom Voice
With Multiple Voices
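The sketch below builds a body that references two voices. It assumes each `voice_list` entry is an object with a `voice_id` key, which this page does not confirm; adjust the entry shape to match the custom voice API. `build_two_voice_payload`, the voice IDs, and the image URL are placeholders.

```python
import json


def build_two_voice_payload(image_url, voice_ids):
    """Build a request body whose prompt tags map to voice_list by position."""
    return {
        "image": image_url,
        "prompt": 'The man <<<voice_1>>> said, "Hello." The woman <<<voice_2>>> replied, "Hi."',
        "sound": "on",  # must be on whenever voice_list is used
        # Assumed entry shape: {"voice_id": ...}; not confirmed by this page.
        "voice_list": [{"voice_id": vid} for vid in voice_ids],
    }


payload = build_two_voice_payload(
    "https://example.com/input.png", ["VOICE_ID_A", "VOICE_ID_B"]
)
print(json.dumps(payload, indent=2))
```

Here <<<voice_1>>> resolves to the first entry of voice_list and <<<voice_2>>> to the second.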
With Dynamic Masks
Parameters
sound
- Options: on, off
- Default: off
- Note: Must be on when using voice_list
voice_list
- Optional: Yes
- Max Items: 2
- Description: Custom voice IDs for voice generation
- Note: Use voice IDs from the custom voice API, NOT Lip-Sync API
Prompt Voice Syntax
Use <<<voice_N>>> in your prompt to specify which voice speaks:
- <<<voice_1>>>: First voice in voice_list
- <<<voice_2>>>: Second voice in voice_list (if provided)
The man <<<voice_1>>> said, "Hello."
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
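As a small illustration of the header format, the sketch below builds the request headers from a token. `auth_headers` is a hypothetical helper, and the `Content-Type` value assumes the body is sent as JSON, which this section does not state explicitly.

```python
def auth_headers(token):
    """Return HTTP headers carrying the bearer token."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",  # assumption: JSON request body
    }


print(auth_headers("YOUR_TOKEN")["Authorization"])
```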
Body
Reference image. Accepts a Base64-encoded image or an image URL.
Important: When using Base64 encoding, do not add a prefix such as data:image/png;base64,. Provide only the Base64-encoded string itself.
- Supported image formats: .jpg, .jpeg, .png
- Image file size cannot exceed 10MB
- Width and height must each be at least 300px
- Aspect ratio must be between 1:2.5 and 2.5:1
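The Base64 rule above is easy to get wrong when the image comes from a browser or a data URI. The following sketch shows two hypothetical helpers: one encodes raw bytes with Python's standard `base64` module, and one strips a data-URI prefix if it is present.

```python
import base64


def to_raw_base64(image_bytes):
    """Encode image bytes as a plain Base64 string with no data-URI prefix."""
    return base64.b64encode(image_bytes).decode("ascii")


def strip_data_uri_prefix(value):
    """Drop a leading 'data:<mime>;base64,' prefix, leaving only the Base64 payload."""
    if value.startswith("data:") and "base64," in value:
        return value.split("base64,", 1)[1]
    return value


encoded = to_raw_base64(b"hello")
print(encoded)
print(strip_data_uri_prefix("data:image/png;base64," + encoded))
```

Both print statements output the same bare string, which is the form the image field expects.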
Reference image for end frame control. Accepts a Base64-encoded image or an image URL.
- At least one of image and image_tail must be provided
- image + image_tail cannot be used at the same time as dynamic_masks / static_mask
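The two constraints above can be expressed as a pre-flight check. This sketch reads the second rule literally, meaning a request that sets both image and image_tail may not also set static_mask or dynamic_masks; that interpretation is an assumption, and `check_frame_inputs` is a hypothetical helper.

```python
def check_frame_inputs(params):
    """Return a list of violations of the image / image_tail / mask rules."""
    problems = []
    has_image = bool(params.get("image"))
    has_tail = bool(params.get("image_tail"))
    has_mask = bool(params.get("static_mask")) or bool(params.get("dynamic_masks"))
    if not (has_image or has_tail):
        problems.append("provide at least one of image or image_tail")
    # Assumed reading: end frame control and motion brush are mutually exclusive.
    if has_image and has_tail and has_mask:
        problems.append("image + image_tail cannot be combined with static_mask or dynamic_masks")
    return problems


print(check_frame_inputs({"image": "https://example.com/start.png"}))
```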
Positive text prompt. Cannot exceed 2500 characters.
Use <<<voice_1>>> to specify the voice, matching the sequence in voice_list.
Example: The man <<<voice_1>>> said, "Hello."
Maximum length: 2500
Negative text prompt. Cannot exceed 2500 characters.
Maximum length: 2500
List of voices referenced when generating videos.
- A video generation task can reference up to 2 voices
- When voice_list is not empty and prompt references the voice ID, billing is based on "with voice generation"
Maximum items: 2
Generate audio simultaneously when generating videos.
- on: Enable audio generation (required when using voice_list)
- off: Disable audio generation (silent video)
Allowed values: on, off
Video generation mode.
- std: Standard mode, which is cost-effective.
- pro: Professional mode, which takes longer to generate but produces higher-quality video.
Allowed values: std, pro
Static brush application area (a mask image created by the user with the motion brush).
- Accepts a Base64-encoded image or an image URL
- The aspect ratio of the mask image must match the input image
Dynamic Brush Configuration List. Multiple configurations can be set up (up to 6 groups).
Maximum items: 6
Video length in seconds.
Allowed values: 5, 10
Response
Accepted - Task created successfully