Sync Lipsync 2.0 – Zero‑Shot AI Video Lip‑Sync Model


URL Source: https://sync.so/blog/lipsync-2-0/

lipsync 2.0

most natural lipsyncing model in the world

Quick Overview:

- lipsync-2: the most natural video-to-video lipsyncing model in the world
- Zero-shot: no need to wait for an “actor”, “clone”, or “avatar” to train before using it
- Learns and generates a speaker’s unique style of speech
- Works across live-action, animated, and AI-generated humans
- Use it to build video translation, word-level video editing, and character re-animation workflows (including generating realistic AI UGC)

A whole new model

Introducing lipsync-2, the world's first zero-shot lipsyncing model that preserves a speaker's unique style without additional training or fine-tuning.

lipsync-2 is a leap forward in realism, expressiveness, control, quality, and speed across live-action, animated, and AI-generated video.

Features

Introducing zero-shot lipsync: style preservation

lipsync 2.0 learns a representation of a person's speaking style by watching how they speak in the input video.

Notice how, even across different languages, we preserve Nicolas Cage's speaking style. Sync is the first zero-shot lipsyncing model to achieve this.


Temperature control: Control how expressive the generated lipsync is.


Active speaker detection: Handle long videos with multiple speakers. We built ASD-1, a state-of-the-art active speaker detection pipeline that associates each unique voice with a unique face and applies lipsync only when it detects that person is actively speaking.
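To illustrate the gating idea (apply lipsync to a face only while that person is speaking), here is a minimal sketch. It is not ASD-1's implementation, which the post does not describe. It assumes a hypothetical upstream detector has already matched voices to faces and produced one label per frame: the ID of the face currently speaking, or `None`. The function `lipsync_segments`, the `Segment` type, and the `min_len` parameter are all illustrative names, not part of any sync API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """Half-open frame range [start, end) where lipsync would be applied."""
    start: int
    end: int


def lipsync_segments(
    active_face_per_frame: Sequence[Optional[str]],
    target_face: str,
    min_len: int = 1,
) -> list[Segment]:
    """Return the frame ranges where `target_face` is the active speaker.

    `active_face_per_frame[i]` is the face ID that a (hypothetical) active
    speaker detector labeled as speaking at frame i, or None if nobody is.
    Runs shorter than `min_len` frames are dropped as likely noise.
    """
    segments: list[Segment] = []
    start: Optional[int] = None
    for i, face in enumerate(active_face_per_frame):
        if face == target_face:
            if start is None:
                start = i  # a new speaking run for the target begins
        elif start is not None:
            # The target stopped speaking (silence or another speaker).
            if i - start >= min_len:
                segments.append(Segment(start, i))
            start = None
    # Close a run that continues through the final frame.
    n = len(active_face_per_frame)
    if start is not None and n - start >= min_len:
        segments.append(Segment(start, n))
    return segments


if __name__ == "__main__":
    labels = [None, "A", "A", "B", "B", "A", None, "A", "A", "A"]
    print(lipsync_segments(labels, "A"))
    print(lipsync_segments(labels, "A", min_len=2))
```

Frames outside the returned segments would be left untouched, so a second person on screen, or the target while listening, keeps their original mouth motion. The `min_len` filter is one simple way to suppress single-frame flicker in the detector's labels. A production pipeline would likely smooth labels in other ways as well.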

Flawless animation: Works across animated characters, from Pixar-level animation to AI-generated characters. Translation is only the beginning: with the power to edit dialogue in any video in post-production, we’re on the cusp of reimagining how we create, edit, and consume video forever.

**Record Once & Edit Dialogue Forever:** A world where you only ever have to hit record once. lipsync-2 is the only model that lets you edit dialogue while preserving the original speaker’s style, without needing to train or fine-tune beforehand.

AI Video

In an age where we can generate any video by typing a few lines of text, we don’t have to limit ourselves to what we can capture with a camera.

At sync, we believe AI lipsync is just the beginning.

We live in an extraordinary age.

A high schooler can craft a masterpiece with an iPhone. A studio can produce a movie at a tenth of the cost, 10x faster. Every video can be distributed worldwide in any language, instantly. We make video as malleable as text.

Additional Resources:

Get $5 in credits here

Docs
