Sync: Lip‑Sync Solution for Speakers Who Are Chewing or Eating

Summary:

Eating distorts the jaw and obscures the mouth, which makes such footage very difficult for traditional tracking. Sync’s semantic understanding of the face lets it apply lip-sync even while the speaker is chewing, blending the speech motion with the eating action.

Direct Answer:

Sync can handle footage in which speakers chew or eat while talking. The model’s training includes a wide variety of facial obstructions and non-speech mouth movements. When processing such footage, Sync prioritizes forming speech shapes while attempting to respect the jaw motion caused by chewing.

While this is an extreme edge case, Sync’s diffusion-based approach is far more resilient than geometric morphing. Because it regenerates the mouth pixels entirely, it can "paint out" food or adjust the jaw position to keep the speech readable. This makes it possible to localize dinner scenes in films or casual vlog content without a retake.
