Image to Video - MuleRun Docs

Image to Video Generation

curl --request POST \
  --url https://api.mulerun.com/vendors/klingai/v1/kling-v2.6/image-to-video/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "mode": "pro",
  "duration": 5,
  "image": "https://example.com/image.jpg",
  "prompt": "The man <<<voice_1>>> said, 'Hello, welcome to the show.'",
  "sound": "on",
  "voice_list": [
    {
      "voice_id": "voice_id_1"
    }
  ]
}
EOF

{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2025-09-21T00:00:00.000Z",
    "updated_at": "2025-09-21T00:00:00.000Z"
  }
}
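The same request can be assembled in Python with only the standard library. This is a minimal sketch, not an official SDK: build_generation_request is a hypothetical helper, <token> is a placeholder, and the request is only built here (not sent) so it can be inspected offline.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
ENDPOINT = (
    "https://api.mulerun.com/vendors/klingai/v1/"
    "kling-v2.6/image-to-video/generation"
)


def build_generation_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example.

    build_generation_request is a hypothetical helper used for illustration.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


payload = {
    "mode": "pro",
    "duration": 5,
    "image": "https://example.com/image.jpg",
    "prompt": "The man <<<voice_1>>> said, 'Hello, welcome to the show.'",
    "sound": "on",
    "voice_list": [{"voice_id": "voice_id_1"}],
}
req = build_generation_request("<token>", payload)

# To actually submit the task (needs a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)["task_info"]
#     print(task["id"], task["status"])
```

The response body is expected to match the task_info example shown above; the commented-out lines show one way to read it.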


This API supports the Kling v2.6 video generation model, including audio and voice generation. Refer to Kling’s official documentation for more details.

Overview

Generate videos from images using the Kling v2.6 model with built-in audio and custom voice generation support.

Key Features

Image-to-video generation with audio
Standard and Professional quality modes
5s or 10s duration
End frame control (image_tail)
Motion brush support (static_mask, dynamic_masks)
Audio generation support (new in v2.6)
Custom voice generation (new in v2.6)

Image Requirements

Property	Requirement
Formats	JPEG, JPG, PNG
Dimensions	Min 300px for both width and height
Aspect Ratio	Between 1:2.5 and 2.5:1
File Size	Max 10MB
Input	Public URL or Base64 encoded data
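A client can pre-check local files against these limits before uploading. The sketch below assumes the caller already knows the file name, pixel dimensions, and byte size (for example from an imaging library); image_problems is a hypothetical helper, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption because the table does not say MB or MiB.

```python
# Limits from the Image Requirements table.
# Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}
MIN_SIDE_PX = 300
MAX_BYTES = 10 * 1024 * 1024


def image_problems(filename: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return reasons an image would violate the documented limits (empty if none)."""
    problems = []
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must both be at least 300px")
    # Aspect ratio must lie between 1:2.5 and 2.5:1, i.e. 0.4 <= width/height <= 2.5.
    # Cross-multiplying keeps the comparison in exact integer arithmetic.
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    if size_bytes > MAX_BYTES:
        problems.append("file size exceeds 10MB")
    return problems


print(image_problems("photo.png", 1280, 720, 2_000_000))
```

An empty list means the file passes all four checks; the server remains the final authority on what it accepts.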

Audio & Voice Generation

sound Parameter

on: Generate video with synchronized audio
off: Generate silent video (default)

voice_list Parameter

Reference custom voices in video generation:

Up to 2 voices per task
Use <<<voice_1>>> (or <<<voice_2>>>) in the prompt to specify which voice speaks
Requires sound: "on"
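The two voice_list rules above (at most 2 voices, and sound must be "on") can be enforced when building the request body. This is a sketch under those rules only; build_voice_payload is a hypothetical helper, and other fields are passed through unchecked.

```python
MAX_VOICES = 2  # "Up to 2 voices per task"


def build_voice_payload(image: str, prompt: str, voice_ids: list[str], **extra) -> dict:
    """Assemble a request body that respects the documented voice_list rules.

    build_voice_payload is hypothetical. It encodes only two rules from this
    page: at most two voices, and sound must be "on" whenever voice_list is used.
    """
    if len(voice_ids) > MAX_VOICES:
        raise ValueError(f"at most {MAX_VOICES} voices per task, got {len(voice_ids)}")
    payload = {"image": image, "prompt": prompt, **extra}
    if voice_ids:
        # An explicit sound="off" conflicts with voice_list, so reject it.
        if payload.get("sound", "on") != "on":
            raise ValueError('voice_list requires sound: "on"')
        payload["sound"] = "on"
        payload["voice_list"] = [{"voice_id": v} for v in voice_ids]
    return payload


body = build_voice_payload(
    "https://example.com/person.jpg",
    "The man <<<voice_1>>> said, 'Hello, welcome to our channel.'",
    ["your_custom_voice_id"],
    mode="pro",
    duration=5,
)
print(body)
```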

Example Requests

Basic Image-to-Video with Audio

{
  "image": "https://example.com/landscape.jpg",
  "prompt": "Birds chirping as camera pans across the landscape",
  "mode": "std",
  "duration": 5,
  "sound": "on"
}

With Custom Voice

{
  "image": "https://example.com/person.jpg",
  "prompt": "The man <<<voice_1>>> said, 'Hello, welcome to our channel.'",
  "mode": "pro",
  "duration": 5,
  "sound": "on",
  "voice_list": [
    {"voice_id": "your_custom_voice_id"}
  ]
}

With Multiple Voices

{
  "image": "https://example.com/two_people.jpg",
  "prompt": "Person A <<<voice_1>>> says 'Good morning!' and Person B <<<voice_2>>> replies 'Good morning to you too!'",
  "mode": "pro",
  "duration": 10,
  "sound": "on",
  "voice_list": [
    {"voice_id": "voice_id_1"},
    {"voice_id": "voice_id_2"}
  ]
}

With Dynamic Masks

{
  "image": "https://example.com/scene.jpg",
  "prompt": "Object moves along the path with ambient sounds",
  "static_mask": "https://example.com/static_mask.png",
  "dynamic_masks": [
    {
      "mask": "https://example.com/dynamic_mask.png",
      "trajectories": [
        {"x": 100, "y": 200},
        {"x": 300, "y": 400}
      ]
    }
  ],
  "mode": "std",
  "duration": 5,
  "sound": "on"
}

Parameters

sound

Options: on, off
Default: off
Note: Must be on when using voice_list

voice_list

Optional: Yes
Max Items: 2
Description: Custom voice IDs for voice generation
Note: Use voice IDs from the custom voice API, not from the Lip-Sync API

Prompt Voice Syntax

Use <<<voice_N>>> in your prompt to specify which voice speaks:

<<<voice_1>>> - First voice in voice_list
<<<voice_2>>> - Second voice in voice_list (if provided)

Example: The man <<<voice_1>>> said, "Hello."
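Because each <<<voice_N>>> placeholder refers to the N-th entry of voice_list (1-based), a client can check that every placeholder has a matching entry before submitting. This is a minimal sketch; the helper names are hypothetical and the regular expression assumes placeholders always use exactly the <<<voice_N>>> form shown above.

```python
import re

# Matches placeholders such as <<<voice_1>>> and captures the numeric index.
VOICE_TAG = re.compile(r"<<<voice_(\d+)>>>")


def referenced_voice_indices(prompt: str) -> list[int]:
    """Return distinct 1-based voice indices used in the prompt, in order of first use."""
    seen = []
    for match in VOICE_TAG.finditer(prompt):
        index = int(match.group(1))
        if index not in seen:
            seen.append(index)
    return seen


def check_voice_references(prompt: str, voice_list: list[dict]) -> None:
    """Raise ValueError if a placeholder points past the end of voice_list."""
    for index in referenced_voice_indices(prompt):
        if not 1 <= index <= len(voice_list):
            raise ValueError(f"<<<voice_{index}>>> has no matching entry in voice_list")


prompt = ("Person A <<<voice_1>>> says 'Good morning!' and "
          "Person B <<<voice_2>>> replies 'Good morning to you too!'")
print(referenced_voice_indices(prompt))
check_voice_references(prompt, [{"voice_id": "voice_id_1"}, {"voice_id": "voice_id_2"}])
```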

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (application/json)

image (string, required)

Reference image. Accepts either Base64-encoded image data or an image URL.

Important: When using Base64 encoding, do not add any prefix such as data:image/png;base64,. Provide only the Base64-encoded string itself.

Supported image formats: .jpg, .jpeg, .png
File size must not exceed 10MB
Width and height must each be at least 300px
Aspect ratio must be between 1:2.5 and 2.5:1
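The no-prefix rule means the value is the raw Base64 text, not a data URI. A minimal sketch using Python's base64 module; the helper names are hypothetical.

```python
import base64
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Return bare Base64 text for raw image bytes (no data: URI prefix)."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path: str) -> str:
    """Read an image file from disk and return bare Base64 text."""
    return encode_image_bytes(Path(path).read_bytes())


def strip_data_uri_prefix(value: str) -> str:
    """Remove a leading 'data:<mime>;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


print(strip_data_uri_prefix("data:image/png;base64,iVBORw0KGgo="))
```

If your tooling already produces data URIs (browsers often do), passing them through strip_data_uri_prefix yields the form this field expects.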

image_tail (string | null)

Reference image for end-frame control. Accepts either Base64-encoded image data or an image URL.

At least one of image and image_tail must be provided
End-frame control (image + image_tail) cannot be combined with motion brush (static_mask/dynamic_masks)
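The two image_tail rules can be checked on the request body before it is sent. This sketch follows the wording above literally (the conflict is image and image_tail together with any mask); frame_control_problems is a hypothetical helper and does not validate the image values themselves.

```python
def frame_control_problems(payload: dict) -> list[str]:
    """Check the image / image_tail / motion-brush combination rules."""
    problems = []
    has_image = bool(payload.get("image"))
    has_tail = bool(payload.get("image_tail"))
    uses_brush = bool(payload.get("static_mask")) or bool(payload.get("dynamic_masks"))
    if not (has_image or has_tail):
        problems.append("provide at least one of image or image_tail")
    if has_image and has_tail and uses_brush:
        problems.append("image + image_tail cannot be combined with static_mask/dynamic_masks")
    return problems


print(frame_control_problems({"image": "a.jpg", "image_tail": "b.jpg", "static_mask": "m.png"}))
```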

prompt (string, max length 2500)

Positive text prompt. Cannot exceed 2500 characters.

Use <<<voice_1>>> to specify the voice, matching the order of entries in voice_list. Example: The man <<<voice_1>>> said, "Hello."

negative_prompt (string, max length 2500)

Negative text prompt. Cannot exceed 2500 characters.

voice_list (object[] | null, max 2 items)

List of voices referenced when generating the video. Each item specifies a voice_id.

A video generation task can reference up to 2 voices
When voice_list is not empty and the prompt references a voice, the task is billed as "with voice generation"

sound (enum<string>, default: off)

Generate audio together with the video.

on: Enable audio generation (required when using voice_list)
off: Disable audio generation (silent video)

Available options: on, off

mode (enum<string>, default: std)

Video generation mode.

std: Standard Mode, which is cost-effective.
pro: Professional Mode, which takes longer to generate but produces higher-quality video.

Available options: std, pro

static_mask (string | null)

Static brush application area (a mask image the user creates with the motion brush).

Accepts either Base64-encoded image data or an image URL
The mask image must have the same aspect ratio as the input image

dynamic_masks (object[] | null, max 6 items)

Dynamic brush configuration list. Up to 6 configuration groups can be set.
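The motion-brush limits on this page (at most 6 dynamic mask groups, and a static mask with the same aspect ratio as the input image) can be pre-checked from image dimensions. This is a sketch: mask_problems is a hypothetical helper, and treating "must match" as an exact ratio match is an assumption.

```python
MAX_DYNAMIC_MASK_GROUPS = 6


def mask_problems(image_size, static_mask_size=None, dynamic_mask_count=0) -> list[str]:
    """Check motion-brush limits.

    image_size and static_mask_size are (width, height) tuples in pixels.
    """
    problems = []
    if dynamic_mask_count > MAX_DYNAMIC_MASK_GROUPS:
        problems.append(f"at most {MAX_DYNAMIC_MASK_GROUPS} dynamic mask groups allowed")
    if static_mask_size is not None:
        iw, ih = image_size
        mw, mh = static_mask_size
        # Equal aspect ratios <=> iw/ih == mw/mh; cross-multiply to stay exact.
        if iw * mh != mw * ih:
            problems.append("static_mask aspect ratio must match the input image")
    return problems


print(mask_problems((1280, 720), static_mask_size=(640, 360), dynamic_mask_count=1))
```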


duration (enum<integer>, default: 5)

Video length in seconds.

Available options: 5, 10
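The enum fields and length limits above can be normalized on the client so the effective request is explicit. This is a sketch: normalize_body is a hypothetical helper, filling in the documented defaults client-side is an assumption (the server applies them anyway), and Python's len counts code points, which may differ from how the server counts characters.

```python
# Documented enum values and defaults for the optional body fields.
ENUMS = {
    "sound": ({"on", "off"}, "off"),
    "mode": ({"std", "pro"}, "std"),
    "duration": ({5, 10}, 5),
}
MAX_PROMPT_CHARS = 2500


def normalize_body(body: dict) -> dict:
    """Return a copy with documented defaults filled in; raise on invalid values."""
    result = dict(body)
    for field, (allowed, default) in ENUMS.items():
        value = result.setdefault(field, default)
        if value not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}, got {value!r}")
    for field in ("prompt", "negative_prompt"):
        text = result.get(field)
        if text is not None and len(text) > MAX_PROMPT_CHARS:
            raise ValueError(f"{field} exceeds {MAX_PROMPT_CHARS} characters")
    return result


print(normalize_body({"image": "https://example.com/landscape.jpg"}))
```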

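Response

202 Accepted (application/json): task created successfully.

task_info (object)

Task metadata, including id, status, created_at, and updated_at, as shown in the example response at the top of this page.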
