Kling AI Provider

The Kling AI provider adds support for Kling AI's video generation models, including text-to-video, image-to-video, motion control, and multi-shot video generation.

Setup

The Kling AI provider is available in the @ai-sdk/klingai module. You can install it with:

```typescript
pnpm add @ai-sdk/klingai
```

Provider Instance

You can import the default provider instance klingai from @ai-sdk/klingai:

```typescript
import { klingai } from '@ai-sdk/klingai';
```

If you need a customized setup, you can import createKlingAI from @ai-sdk/klingai and create a provider instance with your settings:

```typescript
import { createKlingAI } from '@ai-sdk/klingai';

const klingai = createKlingAI({
  accessKey: 'your-access-key',
  secretKey: 'your-secret-key',
});
```

You can use the following optional settings to customize the Kling AI provider instance:

  • accessKey string

    Kling AI access key. Defaults to the KLINGAI_ACCESS_KEY environment variable.

  • secretKey string

    Kling AI secret key. Defaults to the KLINGAI_SECRET_KEY environment variable.

  • baseURL string

    Use a different URL prefix for API calls, e.g. to use proxy servers. The default prefix is https://api-singapore.klingai.com.

  • headers Record<string,string>

    Custom headers to include in the requests.

  • fetch (input: RequestInfo, init?: RequestInit) => Promise<Response>

    Custom fetch implementation. You can use it as middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
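
For example, a logging middleware can be layered on top of any fetch implementation before passing it to the provider. This is a sketch; `withLogging` is an illustrative helper, not part of the SDK:

```typescript
// Hypothetical logging middleware (not part of the SDK): wraps a fetch
// implementation and logs the request URL before delegating to it.
const withLogging =
  (fetchImpl: typeof fetch): typeof fetch =>
  (input, init) => {
    console.log('Kling AI request:', String(input));
    return fetchImpl(input, init);
  };
```

Pass the wrapped function via the `fetch` option, e.g. `createKlingAI({ fetch: withLogging(fetch) })`.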

Video Models

You can create Kling AI video models using the .video() factory method. For more on video generation with the AI SDK, see generateVideo().

This provider currently supports three video generation modes: text-to-video, image-to-video, and motion control. Multi-shot generation is available as an option within the text-to-video and image-to-video modes on Kling v3.0+.

Not all options are supported by every model version and mode combination. See the KlingAI Capability Map for detailed compatibility across models.

Text-to-Video

Generate videos from text prompts:

```typescript
import { klingai, type KlingAIVideoModelOptions } from '@ai-sdk/klingai';
import { experimental_generateVideo as generateVideo } from 'ai';

const { videos } = await generateVideo({
  model: klingai.video('kling-v2.6-t2v'),
  prompt: 'A chicken flying into the sunset in the style of 90s anime.',
  aspectRatio: '16:9',
  duration: 5,
  providerOptions: {
    klingai: {
      mode: 'std',
    } satisfies KlingAIVideoModelOptions,
  },
});
```

Image-to-Video

Generate videos from a start frame image with an optional text prompt. The popular start+end frame feature is available via the imageTail option:

```typescript
import { klingai, type KlingAIVideoModelOptions } from '@ai-sdk/klingai';
import { experimental_generateVideo as generateVideo } from 'ai';

const { videos } = await generateVideo({
  model: klingai.video('kling-v2.6-i2v'),
  prompt: {
    image: 'https://example.com/start-frame.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  providerOptions: {
    klingai: {
      // Pro mode required for start+end frame control
      mode: 'pro',
      // Optional: end frame image
      imageTail: 'https://example.com/end-frame.png',
    } satisfies KlingAIVideoModelOptions,
  },
});
```

Multi-Shot Video Generation

Generate videos with multiple storyboard shots, each with its own prompt and duration (Kling v3.0+):

```typescript
import { klingai, type KlingAIVideoModelOptions } from '@ai-sdk/klingai';
import { experimental_generateVideo as generateVideo } from 'ai';

const { videos } = await generateVideo({
  model: klingai.video('kling-v3.0-t2v'),
  prompt: '',
  aspectRatio: '16:9',
  duration: 10,
  providerOptions: {
    klingai: {
      mode: 'pro',
      multiShot: true,
      shotType: 'customize',
      multiPrompt: [
        {
          index: 1,
          prompt: 'A sunrise over a calm ocean, warm golden light.',
          duration: '4',
        },
        {
          index: 2,
          prompt: 'A flock of seagulls take flight from the beach.',
          duration: '3',
        },
        {
          index: 3,
          prompt: 'Waves crash against rocky cliffs at sunset.',
          duration: '3',
        },
      ],
      sound: 'on',
    } satisfies KlingAIVideoModelOptions,
  },
});
```

Multi-shot also works with image-to-video by combining a start frame image with per-shot prompts.

Motion Control

Generate video by transferring motion from a reference video to a character image:

```typescript
import { klingai, type KlingAIVideoModelOptions } from '@ai-sdk/klingai';
import { experimental_generateVideo as generateVideo } from 'ai';

const { videos } = await generateVideo({
  model: klingai.video('kling-v2.6-motion-control'),
  prompt: {
    image: 'https://example.com/character.png',
    text: 'The character performs a smooth dance move',
  },
  providerOptions: {
    klingai: {
      videoUrl: 'https://example.com/reference-motion.mp4',
      characterOrientation: 'image',
      mode: 'std',
    } satisfies KlingAIVideoModelOptions,
  },
});
```

Video Provider Options

The following provider options are available via providerOptions.klingai. Options vary by mode — see the KlingAI Capability Map for per-model support.

Common Options

  • mode 'std' | 'pro'

    Video generation mode. 'std' is cost-effective. 'pro' produces higher quality but takes longer.

  • pollIntervalMs number

    Polling interval in milliseconds for checking task status. Defaults to 5000.

  • pollTimeoutMs number

    Maximum wait time in milliseconds for video generation. Defaults to 600000 (10 minutes).

Text-to-Video and Image-to-Video Options

  • negativePrompt string

    A description of what to avoid in the generated video (max 2500 characters).

  • sound 'on' | 'off'

    Whether to generate audio simultaneously. Only V2.6 and later models support this option, and it requires mode: 'pro'.

  • cfgScale number

    Controls how strictly generation adheres to the prompt. Higher values mean stronger prompt adherence; lower values allow more creative flexibility. Range: [0, 1]. Not supported by V2.x models.

  • cameraControl object

    Camera movement control with a type preset ('simple', 'down_back', 'forward_up', 'right_turn_forward', 'left_turn_forward') and optional config with horizontal, vertical, pan, tilt, roll, zoom values (range: [-10, 10]).

  • multiShot boolean

    Enable multi-shot video generation (Kling v3.0+). When true, the video is split into up to 6 storyboard shots with individual prompts and durations.

  • shotType 'customize' | 'intelligence'

    Storyboard method for multi-shot generation. 'customize' uses multiPrompt for user-defined shots. 'intelligence' lets the model auto-segment based on the main prompt. Required when multiShot is true.

  • multiPrompt Array<{index, prompt, duration}>

    Per-shot details for multi-shot generation. Each shot has an index (number), prompt (string, max 512 characters), and duration (string, in seconds). Shot durations must sum to the total duration. Required when multiShot is true and shotType is 'customize'.

  • voiceList Array<{voice_id: string}>

    Voice references for voice control (Kling v3.0+). Up to 2 voices. Reference via <<<voice_1>>> template syntax in the prompt. Requires sound: 'on'. Cannot coexist with elementList on the I2V endpoint.
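
Since per-shot durations must sum to the total video duration, a small pre-flight check can catch mismatches before a task is submitted. The helper below is a sketch for illustration, not part of the SDK:

```typescript
interface Shot {
  index: number;
  prompt: string;
  duration: string; // seconds, as a string, per the Kling AI API
}

// Hypothetical helper: verifies that per-shot durations sum to the requested
// total duration and that each prompt stays within the 512-character limit.
function validateMultiPrompt(shots: Shot[], totalDuration: number): void {
  const sum = shots.reduce((acc, shot) => acc + Number(shot.duration), 0);
  if (sum !== totalDuration) {
    throw new Error(`Shot durations sum to ${sum}s, expected ${totalDuration}s`);
  }
  for (const shot of shots) {
    if (shot.prompt.length > 512) {
      throw new Error(`Shot ${shot.index} prompt exceeds 512 characters`);
    }
  }
}
```

Running this against a multiPrompt array before calling generateVideo avoids a round trip to the API for an invalid storyboard.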

Image-to-Video Only Options

  • imageTail string

    End frame image for start+end frame control. Accepts an image URL or raw base64-encoded data. Requires mode: 'pro' for most models.

  • staticMask string

    Static brush mask image for motion brush. Accepts an image URL or raw base64-encoded data.

  • dynamicMasks Array

    Dynamic brush configurations for motion brush. Up to 6 groups, each with a mask (image URL or base64) and trajectories (array of {x, y} coordinates).

  • elementList Array<{element_id: number}>

    Reference elements for element control (Kling v3.0+ I2V). Supports video character elements and multi-image elements. Up to 3 reference elements. Cannot coexist with voiceList.
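
A motion-brush request combines staticMask with one or more dynamicMasks groups. As a sketch, the provider options might look like this (the mask URLs and trajectory coordinates are illustrative placeholders):

```typescript
// Sketch of motion-brush provider options for an image-to-video request.
// Mask URLs and trajectory coordinates are illustrative placeholders.
const motionBrushOptions = {
  mode: 'pro',
  // Regions painted with the static brush stay still:
  staticMask: 'https://example.com/static-mask.png',
  // Up to 6 dynamic groups, each with its own mask and movement path:
  dynamicMasks: [
    {
      mask: 'https://example.com/dynamic-mask-1.png',
      trajectories: [
        { x: 100, y: 200 },
        { x: 150, y: 180 },
        { x: 200, y: 160 },
      ],
    },
  ],
};
```

This object would be passed as providerOptions.klingai alongside a start-frame prompt, on a model that supports motion brush (see the capability table below).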

Motion Control Only Options

  • videoUrl string (required)

    URL of the reference motion video. Supports .mp4/.mov, max 100MB, duration 3–30 seconds.

  • characterOrientation 'image' | 'video' (required)

    Orientation of the characters in the generated video. 'image' matches the reference image orientation (max 10s video). 'video' matches the reference video orientation (max 30s video).

  • keepOriginalSound 'yes' | 'no'

    Whether to keep the original sound from the reference video. Defaults to 'yes'.

  • watermarkEnabled boolean

    Whether to generate watermarked results simultaneously.

Video generation is an asynchronous process that can take several minutes. Consider setting pollTimeoutMs to at least 10 minutes (600000ms) for reliable operation.
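
For example, a patient polling configuration that checks every 10 seconds for up to 10 minutes could be sketched as follows (the values are illustrative):

```typescript
// Illustrative polling settings: check status every 10s, wait up to 10 minutes.
const pollingOptions = {
  pollIntervalMs: 10_000, // 10 seconds between status checks
  pollTimeoutMs: 600_000, // 10 minutes before giving up
};

// At most pollTimeoutMs / pollIntervalMs status checks will be made.
const maxPolls = pollingOptions.pollTimeoutMs / pollingOptions.pollIntervalMs;
```

These fields go in providerOptions.klingai together with the other options above.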

Video Model Capabilities

Text-to-Video

| Model | Description |
| --- | --- |
| kling-v3.0-t2v | Latest v3.0, multi-shot, voice control, sound (3-15s) |
| kling-v2.6-t2v | V2.6, sound in pro mode |
| kling-v2.5-turbo-t2v | Optimized for speed, std and pro |
| kling-v2.1-master-t2v | High-quality generation, pro only |
| kling-v2-master-t2v | Master-quality generation |
| kling-v1.6-t2v | V1.6 generation, std and pro |
| kling-v1-t2v | Original V1 model, supports camera control (std) |

Image-to-Video

| Model | Description |
| --- | --- |
| kling-v3.0-i2v | Latest v3.0, multi-shot, element/voice control, sound (3-15s) |
| kling-v2.6-i2v | V2.6, sound and end-frame in pro mode |
| kling-v2.5-turbo-i2v | Optimized for speed, end-frame in pro |
| kling-v2.1-master-i2v | High-quality generation, pro only |
| kling-v2.1-i2v | V2.1 generation, end-frame in pro |
| kling-v2-master-i2v | Master-quality generation |
| kling-v1.6-i2v | V1.6 generation, end-frame in pro |
| kling-v1.5-i2v | V1.5 generation, end-frame and motion brush in pro |
| kling-v1-i2v | Original V1 model, end-frame and motion brush in std/pro |

Motion Control

| Model | Description |
| --- | --- |
| kling-v2.6-motion-control | Transfers motion from a reference video to a character image |

You can also pass any available provider model ID as a string if needed.