Kling Motion Control — Copy the Move, Keep Your Character
Kling Motion Control is motion transfer made practical: feed it a video of someone moving, a picture of who should be moving, and a line about the scene — and it returns your character performing that exact movement, expressions and camera included. This page runs both Kling generations side by side, with reference videos up to thirty seconds at 720p or 1080p. Below: how the three inputs divide the work, which motions transfer cleanly, and the framing rule that decides most results — drawn from Kuaishou's official guides and field testing.
Three Inputs, Three Different Jobs
Kuaishou's own documentation splits the work cleanly. Knowing which input controls what is most of the skill.
The motion reference — video
Supplies the skeleton: every movement, its timing, the physics, and by default the camera and orientation too.
MP4 or MOV up to 50MB, 3–30 seconds, one clearly visible performer — clean framing beats high production value.
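The limits above are easy to check before uploading. What follows is a minimal Python sketch, not part of Kling's tooling: the function name `check_motion_video` is hypothetical, the caller is assumed to already know the clip's duration, and "50MB" is read as 50 MiB, which is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Limits taken from the text above; reading "50MB" as 50 MiB is an assumption.
MAX_VIDEO_BYTES = 50 * 1024 * 1024
ALLOWED_VIDEO_EXTS = {".mp4", ".mov"}
MIN_SECONDS, MAX_SECONDS = 3.0, 30.0


def check_motion_video(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the clip fits the stated limits."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_VIDEO_EXTS:
        problems.append(f"unsupported container {ext or '(none)'}: use MP4 or MOV")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("file is larger than 50MB")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(f"duration {duration_s:.1f}s is outside 3-30 seconds")
    return problems


if __name__ == "__main__":
    print(check_motion_video("routine.mp4", 20_000_000, 12.0))
    print(check_motion_video("routine.avi", 80_000_000, 45.0))
```

The check only covers the numeric limits. Whether one performer stays clearly visible is still a judgment call made by eye.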
The character reference — image
Supplies the performer: face, body, outfit — who is doing the moving.
JPG or PNG, at least 300px, up to 10MB, aspect ratio between 2:5 and 5:2, with every limb the motion will need actually visible.
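The aspect window is the least intuitive of these limits: 2:5 to 5:2 means width divided by height must fall between 0.4 and 2.5. A minimal Python sketch of the image checks follows. The function name `check_character_image` is hypothetical, the 300px minimum is assumed to apply to both width and height, and "10MB" is read as 10 MiB.

```python
from __future__ import annotations

from pathlib import Path

# Limits taken from the text above; the MiB reading and the
# "both sides at least 300px" reading are assumptions.
MAX_IMAGE_BYTES = 10 * 1024 * 1024
ALLOWED_IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
MIN_SIDE_PX = 300


def check_character_image(filename: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image fits the stated limits."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_IMAGE_EXTS:
        problems.append("use JPG or PNG")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 10MB")
    if min(width, height) < MIN_SIDE_PX:
        problems.append("shorter side is under 300px")
    # 2:5 <= width/height <= 5:2, written with integers to avoid float rounding:
    # width/height >= 2/5  <=>  5*width >= 2*height
    # width/height <= 5/2  <=>  2*width <= 5*height
    if not (5 * width >= 2 * height and 2 * width <= 5 * height):
        problems.append("aspect ratio is outside 2:5 to 5:2")
    return problems


if __name__ == "__main__":
    print(check_character_image("hero.png", 1080, 1920, 2_000_000))
    print(check_character_image("banner.png", 400, 1200, 2_000_000))
```

A 1080x1920 portrait passes; a 400x1200 strip is 1:3, narrower than 2:5, so it is flagged.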
The text prompt — scene
Supplies the world: background, lighting, mood, style. It does not steer the motion — the video already owns that.
Describe atmosphere, not action: "neon stage, haze, hard rim light" works; "dance faster" does nothing.
Kling 2.6 or Kling 3.0 — Which Generation?
Both run here. The official line: 3.0 builds on 2.6 with stronger faces and a wider performance range.
Kling 3.0 Motion Control
Kuaishou positions it as the cinematic step up: stronger facial consistency across scenarios and high-precision capture for performance work. The default selection here.
Close-ups, expression-led performance, and anything where the face carries the shot.
Kling 2.6 Motion Control
The generation that made motion transfer reliable — testers consistently report distinct fingers and real weight transfer from it, two historic weak points of AI motion.
High-volume social output and dance content, where turnaround matters more than maximum facial fidelity.
Working rule: body-led content runs fine on 2.6; face-led content earns 3.0.
What Transfers Cleanly — and What to Approach Slowly
Compiled from official guidance and recurring field results.
Transfers well
- Choreographed dance — the signature use, frame-accurate to the reference
- Martial arts and sports moves with full-body visibility
- Hand gestures and finger detail — distinct fingers since 2.6
- Facial expressions riding on the performance, stronger again in 3.0
- Weight and momentum: stomps, jumps, and landings read physically
Push carefully
- Extremely fast or chaotic movement — official guidance warns output may shorten
- References where limbs are blocked or leave the frame
- Heavily stylized characters far from human proportions
- Multi-person references — isolate one performer first
- Long takes near the 30-second cap with complex action throughout
The Framing Rules That Decide the Result
Kling's own guide calls one of these 'the most important setting in the entire interface'.
Match the framing: full body to full body.
If the motion video shows a full-body shot, the character image must be full-body too — half-body against full-body is the most common cause of broken outputs, per the official guide.
Choose who sets the orientation.
Matches Video, the default, lets the reference drive movement, expression, camera, and facing — and supports 3–30 second references. Matches Image keeps your character's original facing and works on 3–10 second references.
Output length follows the reference — usually.
The render matches your motion video's duration, but highly complex or fast action can come back shorter. Plan the edit around the move, not the clock.
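The two orientation modes accept different reference lengths, so the clip's duration decides which modes are available. A minimal Python sketch of that rule follows. The keys `matches_video` and `matches_image` are hypothetical local labels for the two settings, not identifiers from Kling's interface or API.

```python
from __future__ import annotations

# Duration windows in seconds, as stated in the text above.
# The dictionary keys are illustrative labels only.
ORIENTATION_LIMITS = {
    "matches_video": (3.0, 30.0),
    "matches_image": (3.0, 10.0),
}


def allowed_modes(duration_s: float) -> list[str]:
    """List the orientation modes whose duration window contains the clip length."""
    return [
        mode
        for mode, (shortest, longest) in ORIENTATION_LIMITS.items()
        if shortest <= duration_s <= longest
    ]


if __name__ == "__main__":
    for seconds in (2.0, 8.0, 20.0):
        print(seconds, allowed_modes(seconds))
```

An 8-second clip can use either mode, a 20-second clip only the default, and a 2-second clip neither.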
Four Jobs Motion Transfer Does Well
Each card pairs the brief with the inputs, the payoff, and the catch.
Dance, recreated on anyone
The brief: A trending routine that should be performed by your character, not the original dancer.
The inputs: The routine clip + a full-body character image, framing matched.
What returns: Your character performing the routine beat-for-beat, camera moves included.
Why it works: Choreography is the documented signature case — timing and physics carry over intact.
Watch out for: Routines with floorwork and heavy occlusion; pick a take where limbs stay visible.
Motion posters that stop the scroll
The brief: A key visual that breathes: a character poster with living motion inside it.
The inputs: A short, controlled motion clip — a turn, a cape lift, hair in wind — plus your poster art.
What returns: A loop-ready animated poster for premieres, drops, and announcements.
Why it works: Brief, deliberate motion is the easiest transfer there is — minimal drift, maximum polish.
Watch out for: Text-heavy art: type can wobble during motion — composite the title afterwards in an editor.
Cinematic performance previz
The brief: Blocking an acted scene before committing to a shoot.
The inputs: A reference performance — yours, filmed on a phone — plus the designed character, 3.0 selected.
What returns: The character delivering the performance with facial consistency held across the shot.
Why it works: Exactly the scenario Kuaishou names for 3.0: cinematic performance and high-precision capture.
Watch out for: Treating the result as final theatrical delivery. It is previz, even when the ambitions are production-grade.
A brand mascot that actually moves
The brief: The mascot needs to dance, wave, and react across an entire campaign.
The inputs: One library of motion clips + the mascot sheet, reused combination by combination.
What returns: A consistent mascot performance series produced without a suit or a studio.
Why it works: One motion library times one character image equals repeatable output — the pattern that scales.
Watch out for: Mascots with non-human proportions — giant heads, missing limbs — drift more; test with five seconds first.
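The "one motion library times one character image" pattern is a plain cross product. A minimal Python sketch of planning such a batch follows. The job dictionaries and the function name `plan_batch` are hypothetical bookkeeping for your own queue, not a Kling request format.

```python
from __future__ import annotations

from itertools import product


def plan_batch(motion_clips: list[str], character_images: list[str], scene_prompt: str) -> list[dict]:
    """Pair every motion clip with every character image under one shared scene prompt."""
    return [
        {"motion": clip, "character": image, "prompt": scene_prompt}
        for clip, image in product(motion_clips, character_images)
    ]


if __name__ == "__main__":
    jobs = plan_batch(
        ["wave.mp4", "dance.mp4", "react.mp4"],  # hypothetical file names
        ["mascot_front.png"],
        "bright studio, soft key light",
    )
    for job in jobs:
        print(job)
```

Three clips and one mascot image give three jobs. Holding the prompt fixed across the batch keeps the look consistent from clip to clip.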
Where Motion Transfer Breaks — and the Fixes
Five failure modes that show up in real use, each with the working answer.
Hands grow extra fingers when the image hides them.
Fix: If the motion needs hands, the image must show hands — pockets and crossed arms force the model to hallucinate, and that is where six-finger glitches live.
Very fast action comes back blurred or shortened.
Fix: Slow the reference at capture, split the move into beats, or transfer the cleanest section of the take.
Occluded or cluttered references confuse the skeleton.
Fix: Re-shoot or trim so one performer stays fully visible against a distinct background; a tripod beats handheld.
Characters far from human proportions drift mid-motion.
Fix: Keep designs roughly humanoid, run a five-second test before the full take, and favor stylized-but-bipedal characters.
The scene prompt cannot rescue a weak motion video.
Fix: Atmosphere is the prompt's only job here. Fix problems at the source — a better reference in means a better performance out.
Input Prep Is the Real Prompt Engineering
On this tool, quality is decided before you type a word. Three checklists cover it.
Motion video checklist
- One performer, fully in frame for the whole take
- 3–30 seconds, MP4 or MOV, under 50MB
- Stable camera — unless you want the camera move transferred too
- Action readable at a glance: if you squint and lose it, so will the model
Character image checklist
- Framing matched to the video — full-body for full-body
- Every limb the motion uses, visible: no pockets, no crossed arms
- Sharp, at least 300px, aspect ratio between 2:5 and 5:2
- Facing roughly aligned with the video's general orientation
Scene prompt checklist
- Atmosphere only: place, light, weather, style
- Name the look the way a gaffer would: "warm tungsten practicals, light haze"
- No action words — the video owns the choreography
- Keep wording identical when running multiple characters through one motion
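The "no action words" rule can be linted crudely before a run. A minimal Python sketch follows. The word list in `ACTION_WORDS` is an illustrative assumption, not an official vocabulary, and a plain word match will produce some false positives ("heat wave", for example).

```python
from __future__ import annotations

import re

# Illustrative list only; extend it to match the verbs your team tends to write.
ACTION_WORDS = {
    "dance", "dancing", "jump", "jumping", "run", "running",
    "walk", "walking", "spin", "spinning", "wave", "waving",
    "faster", "slower",
}


def find_action_words(prompt: str) -> list[str]:
    """Return the action words found in the prompt, in order of appearance."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return [word for word in words if word in ACTION_WORDS]


if __name__ == "__main__":
    print(find_action_words("neon stage, haze, hard rim light"))
    print(find_action_words("dance faster under neon light"))
```

An empty result does not mean the prompt is good, only that none of the listed motion words slipped in.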
Motion Control, Image-to-Video, or a Mocap Pipeline?
Three ways to make a character move — each owns a different brief.
Motion Control — this page
The movement already exists on video and must be copied precisely: dance, performance, choreography, gesture.
Image-to-video
You want the model to invent plausible motion from a still — ambient, loose, described in a prompt rather than copied from footage.
A mocap pipeline
You need frame-exact skeletal data for game engines or VFX, or the shot involves extreme stylization and heavy occlusion. The traditional rig still earns its cost there.
How Motion Transfer Works Here
Two uploads and a line of scene direction — the tool sits at the top of this page.
Upload the move
Drop in a 3–30 second MP4 or MOV of the motion — one visible performer, steady framing, under 50MB.
Add the performer
Upload the character image with framing matched to the video and every needed limb in view; pick the generation and the orientation mode.
Set the scene and run
One line of atmosphere — place, light, mood — then generate at 720p or 1080p and inspect hands and face at full size.
The Move Is Already Filmed — Recast It
Upload the motion, add your character, describe the stage. Kling Motion Control returns the performance recast at up to 1080p — dance, gesture, and expression intact.