Kling Motion Control — Copy the Move, Keep Your Character

Kling Motion Control is motion transfer made practical: feed it a video of someone moving, a picture of who should be moving, and a line about the scene — and it returns your character performing that exact movement, expressions and camera included. This page runs both Kling generations side by side, with reference videos up to thirty seconds at 720p or 1080p. Below: how the three inputs divide the work, which motions transfer cleanly, and the framing rule that decides most results — drawn from Kuaishou's official guides and field testing.

Full-Body Motion Sync

Precise Hand Control

Up to 30s Videos

720p & 1080p Output

Reference Image + Video

Fast Generation

Three Inputs, Three Different Jobs

Kuaishou's own documentation splits the work cleanly. Knowing which input controls what is most of the skill.

The motion reference — video

Supplies the skeleton: every movement, its timing, the physics, and by default the camera and orientation too.

MP4 or MOV up to 50MB, 3–30 seconds, one clearly visible performer — clean framing beats high production value.

The character reference — image

Supplies the performer: face, body, outfit — who is doing the moving.

JPG or PNG, at least 300px, up to 10MB, aspect ratio between 2:5 and 5:2, with every limb the motion will need actually visible.

The text prompt — scene

Supplies the world: background, lighting, mood, style. It does not steer the motion — the video already owns that.

Describe atmosphere, not action: "neon stage, haze, hard rim light" works; "dance faster" does nothing.
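The character-image limits above are easy to check before uploading. The sketch below is not part of Kling's tooling: the function name is hypothetical, the caller is expected to supply the pixel dimensions and file size, and it assumes that the 300px minimum is inclusive and applies to the shorter side and that "10MB" means binary megabytes.

```python
from fractions import Fraction

# Limits as stated on this page; the exact interpretation is an assumption.
MAX_IMAGE_BYTES = 10 * 1024 * 1024   # "up to 10MB", read as binary megabytes
MIN_SIDE_PX = 300                    # assumed to apply to the shorter side
MIN_ASPECT = Fraction(2, 5)          # width:height of 2:5
MAX_ASPECT = Fraction(5, 2)          # width:height of 5:2
ALLOWED_IMAGE_EXTS = {"jpg", "jpeg", "png"}


def check_character_image(width, height, size_bytes, ext):
    """Return a list of problems; an empty list means the image passes."""
    problems = []
    if ext.lower().lstrip(".") not in ALLOWED_IMAGE_EXTS:
        problems.append("format must be JPG, JPEG, or PNG")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 10MB")
    if min(width, height) < MIN_SIDE_PX:
        problems.append("shorter side is under 300px")
    # Exact rational comparison avoids float rounding at the 2:5 and 5:2 edges.
    aspect = Fraction(width, height)
    if not MIN_ASPECT <= aspect <= MAX_ASPECT:
        problems.append("aspect ratio is outside 2:5 to 5:2")
    return problems
```

A 1080x1920 portrait PNG of 2MB passes with no problems. A check like this cannot tell whether the needed limbs are visible; that part still takes a human look.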

Kling 2.6 or Kling 3.0 — Which Generation?

Both run here. The official line: 3.0 builds on 2.6 with stronger faces and a wider performance range.

Kling 3.0 Motion Control

Kuaishou positions it as the cinematic step up: stronger facial consistency across scenarios and high-precision capture for performance work. The default selection here.

Best for close-ups, expression-led performance, and anything where the face carries the shot.

Kling 2.6 Motion Control

The generation that made motion transfer reliable — testers consistently report distinct fingers and real weight transfer from it, two historic weak points of AI motion.

Best for high-volume social output and dance content, where turnaround matters more than maximum facial fidelity.

Working rule: body-led content runs fine on 2.6; face-led content earns 3.0.

What Transfers Cleanly — and What to Approach Slowly

Compiled from official guidance and recurring field results.

Transfers well

Choreographed dance — the signature use, frame-accurate to the reference
Martial arts and sports moves with full-body visibility
Hand gestures and finger detail — distinct fingers since 2.6
Facial expressions riding on the performance, stronger again in 3.0
Weight and momentum: stomps, jumps, and landings read physically

Push carefully

Extremely fast or chaotic movement — official guidance warns output may shorten
References where limbs are blocked or leave the frame
Heavily stylized characters far from human proportions
Multi-person references — isolate one performer first
Long takes near the 30-second cap with complex action throughout

The Framing Rules That Decide the Result

Kling's own guide calls one of these 'the most important setting in the entire interface'.

Match the framing: full body to full body.

If the motion video shows a full-body shot, the character image must be full-body too — half-body against full-body is the most common cause of broken outputs, per the official guide.

Choose who sets the orientation.

Matches Video, the default, lets the reference drive movement, expression, camera, and facing — and supports 3–30 second references. Matches Image keeps your character's original facing and works on 3–10 second references.
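Because the allowed reference length depends on the orientation mode, it helps to check a clip against the mode you plan to use. The sketch below encodes the limits stated on this page; the mode keys `matches_video` and `matches_image`, the function name, and the reading of "50MB" as binary megabytes are assumptions made for illustration, not identifiers from Kling.

```python
# Duration windows per orientation mode, in seconds, as stated on this page.
DURATION_LIMITS = {
    "matches_video": (3.0, 30.0),
    "matches_image": (3.0, 10.0),
}
MAX_VIDEO_BYTES = 50 * 1024 * 1024   # "under 50MB", read as binary megabytes
ALLOWED_VIDEO_EXTS = {"mp4", "mov"}


def check_reference_video(duration_s, size_bytes, ext, orientation="matches_video"):
    """Return a list of problems; an empty list means the clip passes."""
    if orientation not in DURATION_LIMITS:
        raise ValueError(f"unknown orientation mode: {orientation!r}")
    low, high = DURATION_LIMITS[orientation]
    problems = []
    if ext.lower().lstrip(".") not in ALLOWED_VIDEO_EXTS:
        problems.append("format must be MP4 or MOV")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("file is larger than 50MB")
    if not low <= duration_s <= high:
        problems.append(f"duration must be {low:g}-{high:g} seconds for {orientation}")
    return problems
```

A 12-second MP4 passes under the default mode but is flagged under `matches_image`, which is the case most likely to surprise someone switching modes late.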

Output length follows the reference — usually.

The render matches your motion video's duration, but highly complex or fast action can come back shorter. Plan the edit around the move, not the clock.

Four Jobs Motion Transfer Does Well

Each card pairs the brief with the inputs, the payoff, and the catch.

Dance, recreated on anyone

The brief: A trending routine that should be performed by your character, not the original dancer.

The inputs: The routine clip + a full-body character image, framing matched.

What returns: Your character performing the routine beat-for-beat, camera moves included.

Why it works: Choreography is the documented signature case — timing and physics carry over intact.

Watch out for: Routines with floorwork and heavy occlusion; pick a take where limbs stay visible.

Motion posters that stop the scroll

The brief: A key visual that breathes: a character poster with living motion inside it.

The inputs: A short, controlled motion clip — a turn, a cape lift, hair in wind — plus your poster art.

What returns: A loop-ready animated poster for premieres, drops, and announcements.

Why it works: Brief, deliberate motion is the easiest transfer there is — minimal drift, maximum polish.

Watch out for: Text-heavy art: type can wobble during motion — composite the title afterwards in an editor.

Cinematic performance previz

The brief: Blocking an acted scene before committing to a shoot.

The inputs: A reference performance — yours, filmed on a phone — plus the designed character, 3.0 selected.

What returns: The character delivering the performance with facial consistency held across the shot.

Why it works: Exactly the scenario Kuaishou names for 3.0: cinematic performance and high-precision capture.

Watch out for: Treating the output as final theatrical delivery. Use it as previz with production-grade ambitions.

A brand mascot that actually moves

The brief: The mascot needs to dance, wave, and react across an entire campaign.

The inputs: One library of motion clips + the mascot sheet, reused combination by combination.

What returns: A consistent mascot performance series produced without a suit or a studio.

Why it works: One motion library times one character image equals repeatable output — the pattern that scales.

Watch out for: Mascots with non-human proportions (giant heads, missing limbs) drift more; run a five-second test first.

Where Motion Transfer Breaks — and the Fixes

Five failure modes that show up in real use, each with the working answer.

Hands grow extra fingers when the image hides them.

Fix: If the motion needs hands, the image must show hands — pockets and crossed arms force the model to hallucinate, and that is where six-finger glitches live.

Very fast action comes back blurred or shortened.

Fix: Slow the reference at capture, split the move into beats, or transfer the cleanest section of the take.

Occluded or cluttered references confuse the skeleton.

Fix: Re-shoot or trim so one performer stays fully visible against a distinct background; a tripod beats handheld.

Characters far from human proportions drift mid-motion.

Fix: Keep designs roughly humanoid, run a five-second test before the full take, and favor stylized-but-bipedal characters.

The scene prompt cannot rescue a weak motion video.

Fix: Atmosphere is the prompt's only job here. Fix problems at the source — a better reference in means a better performance out.
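The "split the move into beats" fix above comes down to choosing cut points so that every piece is still a usable reference. The sketch below only computes the cut points. The 8-second default beat is an arbitrary illustrative value, the 3-second floor mirrors the minimum reference length quoted on this page, and the actual trimming is left to whatever editor you use.

```python
def split_into_beats(total_s, beat_s=8.0, min_s=3.0):
    """Cut [0, total_s] into consecutive (start, end) segments of about beat_s seconds.

    A trailing remainder shorter than min_s is merged into the previous
    segment so that no piece falls below the minimum reference length.
    """
    if total_s < min_s:
        raise ValueError("clip is shorter than the minimum reference length")
    if beat_s < min_s:
        raise ValueError("beat length must be at least the minimum length")
    segments = []
    start = 0.0
    while total_s - start > beat_s:
        segments.append((start, start + beat_s))
        start += beat_s
    if segments and total_s - start < min_s:
        # Remainder too short to stand alone: extend the last full beat.
        prev_start, _ = segments.pop()
        segments.append((prev_start, total_s))
    else:
        segments.append((start, total_s))
    return segments
```

A 20-second take yields three segments ending at 8, 16, and 20 seconds, while an 18-second take yields two, because the 2-second tail is folded into the second beat. Fixed-length cuts are only a starting point; moving each cut to a natural pause in the choreography usually makes the rejoin less visible.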

Input Prep Is the Real Prompt Engineering

On this tool, quality is decided before you type a word. Three checklists cover it.

Motion video checklist

One performer, fully in frame for the whole take
3–30 seconds, MP4 or MOV, under 50MB
Stable camera — unless you want the camera move transferred too
Action readable at a glance: if you squint and lose it, so will the model

Character image checklist

Framing matched to the video — full-body for full-body
Every limb the motion uses, visible: no pockets, no crossed arms
Sharp, at least 300px, aspect ratio between 2:5 and 5:2
Facing roughly aligned with the video's general orientation

Scene prompt checklist

Atmosphere only: place, light, weather, style
Name the look the way a gaffer would: "warm tungsten practicals, light haze"
No action words — the video owns the choreography
Keep wording identical when running multiple characters through one motion
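The last item, identical wording across characters, is easier to guarantee when the runs are planned from one template. The sketch below builds plain job records and calls no API; `TransferJob`, `plan_batch`, and all field names and default values are hypothetical and serve only to keep one motion clip and one prompt fixed while the character image changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferJob:
    """One planned run: a fixed motion clip and prompt, one character image."""
    motion_video: str
    character_image: str
    prompt: str
    generation: str
    orientation: str


def plan_batch(motion_video, character_images, prompt,
               generation="3.0", orientation="matches_video"):
    """Return one job per character image, all sharing the same prompt text."""
    # Collapse stray whitespace once so every job carries byte-identical wording.
    fixed_prompt = " ".join(prompt.split())
    return [
        TransferJob(motion_video, image, fixed_prompt, generation, orientation)
        for image in character_images
    ]
```

Calling `plan_batch("wave.mp4", ["hero.png", "mascot.png"], "warm tungsten practicals, light haze")` gives two records that differ only in `character_image`, which makes any visual difference between the outputs attributable to the character rather than the prompt.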

Motion Control, Image-to-Video, or a Mocap Pipeline?

Three ways to make a character move — each owns a different brief.

Motion Control — this page

The movement already exists on video and must be copied precisely: dance, performance, choreography, gesture.

Image-to-video

You want the model to invent plausible motion from a still — ambient, loose, described in a prompt rather than copied from footage.

A mocap pipeline

You need frame-exact skeletal data for game engines or VFX, or the shot involves extreme stylization and occlusion. The traditional rig still earns its cost there.

How Motion Transfer Works Here

Two uploads and a line of scene direction — the tool sits at the top of this page.

Upload the move

Drop in a 3–30 second MP4 or MOV of the motion — one visible performer, steady framing, under 50MB.

Add the performer

Upload the character image with framing matched to the video and every needed limb in view; pick the generation and the orientation mode.

Set the scene and run

One line of atmosphere — place, light, mood — then generate at 720p or 1080p and inspect hands and face at full size.

Kling Motion Control: Field FAQ

The setup questions that decide output quality — answered from official docs and tested results.

Kling 2.6 vs Kling 3.0 Motion Control: which should I use?

Lead with 3.0. It is the default here for a reason: Kuaishou positions it as the upgrade for facial consistency and high-precision performance capture. Drop to 2.6 for body-led, high-volume content like dance feeds, where its proven motion fidelity and quick turnaround matter more than maximum face quality. The inputs are identical either way, so switching costs nothing but a re-run.

Why do my character's hands come out wrong?

Almost always because the image hid them. If the motion needs hands but your character has them in pockets or crossed, the model must invent hands from nothing, and that is where six-finger glitches and blurry textures come from. Use an image with both hands clearly visible and re-run. Hand rendering itself has been a strength since 2.6; hidden-limb hallucination is the real culprit.

Why is the generated video shorter than my reference?

Output length normally matches the reference, but Kling's official guidance notes that highly complex or fast-paced action can return shorter renders. Treat it as a signal: the move outran the model. Slow the reference at capture, trim to the cleanest section, or split the sequence into beats and rejoin them in an edit.

Does my image have to match the video's framing?

Yes. The official guide calls this the most important setting in the entire interface. Full-body video demands a full-body image; half-body pairs with half-body. A mismatch forces the model to invent the missing anatomy, and invented anatomy is where outputs break. Check the two frames side by side before anything else.

Can it copy facial expressions, or only body movement?

Both. Under the default Matches Video orientation, the character follows the reference's movements and expressions, camera and facing included. Facial fidelity is also the headline improvement of the 3.0 generation (Kuaishou cites stronger facial consistency across scenarios), so for expression-led shots, run 3.0 and keep the face unobstructed in both inputs.

What makes a good motion reference video?

One performer, fully visible, start to finish: that single property predicts more than anything else. Then: 3–30 seconds in MP4 or MOV under 50MB, a steady camera unless you want the camera move copied, action readable at a glance, minimal background clutter. Production value is irrelevant; a clean phone clip on a tripod regularly beats polished footage with occlusion.

Matches Video vs Matches Image: what actually changes?

Who controls the facing. Matches Video, the default, hands everything to the reference (movement, expression, camera, and orientation) and accepts 3–30 second clips. Matches Image keeps your character's original facing from the still while transferring the motion, and works with 3–10 second references. Use the default for faithful recreation; switch when the character's own pose and direction are the point.

Can I change the background and lighting during the transfer?

Yes, that is exactly what the text prompt is for. The division of labor: video drives motion, image defines the character, prompt sets the world. Write atmosphere like a set note ('rain-soaked rooftop at night, neon spill, light haze') and the performance plays inside it. What the prompt cannot do is alter the choreography; action words are ignored by design.

How do I make a motion poster from a still image?

Pair your poster art with a short, controlled motion clip (a slow turn, a cape lift, hair in wind) and transfer it. Brief, deliberate movement is the easiest case there is, which is why animated posters are among the most reliable outputs. One production note: keep title typography out of the generated layer and composite text afterwards, since type can wobble during motion.

Is this a replacement for motion capture?

For a growing share of work, functionally yes; for the rest, honestly no. Social content, previz, motion posters, and mascot animation no longer justify suits and studios, because a reference video does the job same-day. Game-engine skeletons, frame-exact VFX data, and extreme non-human rigs still belong to traditional mocap. The dividing line: do you need a video performance, or the underlying data?

What if the motion is too fast or too complex?

Stage it. Official guidance flags fast, complex action as the case where output degrades or shortens, so start with a simpler section, confirm the character holds, then escalate. Three practical levers: capture the reference in slow motion, cut the sequence into beats and transfer each, and keep the most chaotic moment mid-clip rather than at the start.

Can I reuse one motion video across multiple characters?

Yes, and that is the production pattern worth building. One clean motion reference becomes a template: swap character images through it and every cast member performs identically, which is exactly what campaign series and mascot libraries need. Keep the scene prompt wording fixed across runs for visual consistency, and hold each new image to the same framing rule.

Keep the Character Working

Generate fresh footage, rewrite existing shots, or give the character a voice.

AI Video Generator

AI Video Editor

AI Avatar Generator

The Move Is Already Filmed — Recast It

Upload the motion, add your character, describe the stage. Kling Motion Control returns the performance recast at up to 1080p — dance, gesture, and expression intact.
