LongCat Avatar

About LongCat Avatar

Introduction to LongCat Avatar

LongCat Avatar is an audio-driven avatar model for long-duration video generation. It focuses on identity consistency, precise lip synchronization, and natural human dynamics, including gestures and idle movements during silent segments. The system is designed to maintain visual quality across extended or theoretically infinite-length sequences.

Built on the LongCat-Video architecture, the model supports multiple generation modes and efficient inference suitable for production workflows. It targets creators, studios, education teams, research groups, and SaaS providers that need consistent, realistic avatar videos at scale.

Key Takeaways

Supports audio-text-to-video (AT2V), audio-text-image-to-video (ATI2V), and audio-conditioned video continuation in a unified framework
Cross-Chunk Latent Stitching minimizes quality degradation across long sequences
Disentangled motion modeling produces natural gestures and idle behavior, even without speech
Reference Skip Attention preserves identity without copy-paste artifacts
Multi-person and theoretically infinite-length sequence support
Efficient inference via coarse-to-fine generation and Block Sparse Attention; optimized for fast 720p/30fps generation
Output resolutions up to 1080p; designed for stable long-form content
Open-source (MIT License) with local deployment support; model size noted at 13.6B parameters
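Cross-Chunk Latent Stitching is only described at a high level on this page. As a rough illustration of the general idea of keeping a long timeline in latent space, the sketch below generates chunk by chunk, hands the tail latents of each chunk straight to the next one, and decodes to pixels only once at the end. The chunk generator, the decoder, `CHUNK_LEN`, `overlap`, and the convention that each chunk re-emits its context frames are all hypothetical stand-ins, not LongCat Avatar's actual interface or algorithm.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # toy stand-in for one latent video frame


def generate_long_latents(
    audio_chunks: Sequence[int],
    generate_chunk: Callable[[List[Latent], int], List[Latent]],
    overlap: int,
) -> List[Latent]:
    """Build a long latent timeline chunk by chunk without leaving latent space.

    Hypothetical convention: generate_chunk(context, audio) returns a chunk whose
    first len(context) frames correspond to the context it was conditioned on.
    """
    timeline: List[Latent] = []
    context: List[Latent] = []
    for audio in audio_chunks:
        chunk = generate_chunk(context, audio)
        # Drop the frames that re-cover the context so the timeline has no duplicates.
        timeline.extend(chunk[len(context):])
        # Carry the tail latents forward directly: no decode -> re-encode round trip.
        context = chunk[-overlap:] if overlap > 0 else []
    return timeline


CHUNK_LEN = 5  # hypothetical number of latent frames per chunk


def toy_generate_chunk(context: List[Latent], audio: int) -> List[Latent]:
    """Toy model: echo the context, then append new frames tagged with the audio index."""
    new_frames = [[float(audio)] for _ in range(CHUNK_LEN - len(context))]
    return [list(frame) for frame in context] + new_frames


decode_calls = 0


def toy_decode(latents: List[Latent]) -> List[float]:
    """Toy decoder, called once for the whole timeline."""
    global decode_calls
    decode_calls += 1
    return [frame[0] for frame in latents]


latents = generate_long_latents([0, 1, 2], toy_generate_chunk, overlap=2)
video = toy_decode(latents)
print(len(video), video)  # prints 11 [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

Keeping the hand-off between chunks in latent space is what avoids repeated decode-encode cycles; in a real model each such round trip would typically add reconstruction error at every chunk boundary.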

How LongCat Avatar Works

The workflow begins with an audio input (speech, music, or podcast) and optional references (image or text). Users select a generation mode (AT2V, ATI2V, or audio-conditioned video continuation), then configure length, resolution, and whether multi-person generation is needed. The system is optimized for long-form stability and identity consistency across extended sequences.
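The choices in this workflow can be captured as a small request object. The sketch below is hypothetical: the class, field names, and validation rules are illustrative assumptions, not LongCat Avatar's API. The resolution set and the 60-second default are taken from the pricing table on this page, where the 60-second cap is a per-generation service limit rather than a stated model limit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    AT2V = "audio-text-to-video"
    ATI2V = "audio-text-image-to-video"
    CONTINUATION = "audio-conditioned video continuation"


# Resolutions listed in the pricing table (assumed to apply to every mode).
SUPPORTED_RESOLUTIONS = {"480p", "720p", "1080p"}


@dataclass
class GenerationRequest:
    """Hypothetical request shape; field names are illustrative only."""

    mode: Mode
    audio_path: str
    audio_seconds: float
    prompt: str = ""
    reference_image: Optional[str] = None  # identity reference for ATI2V
    source_video: Optional[str] = None  # clip to extend in continuation mode
    resolution: str = "720p"
    multi_person: bool = False

    def validate(self, max_audio_seconds: float = 60.0) -> List[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        problems: List[str] = []
        if self.mode is Mode.ATI2V and not self.reference_image:
            problems.append("ATI2V requires a reference image")
        if self.mode is Mode.CONTINUATION and not self.source_video:
            problems.append("continuation requires a source video to extend")
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            problems.append(f"unsupported resolution: {self.resolution}")
        if not 0 < self.audio_seconds <= max_audio_seconds:
            problems.append(f"audio must be between 0 and {max_audio_seconds:g} s")
        return problems


req = GenerationRequest(
    mode=Mode.ATI2V,
    audio_path="interview.wav",
    audio_seconds=45.0,
    prompt="A presenter speaking calmly in a studio",
    reference_image="host.png",
)
print(req.validate())  # prints []
```

Returning a list of problems instead of raising on the first one lets a UI report every misconfiguration at once.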

Technically, LongCat Avatar separates the roles of audio and motion using a disentangled guidance mechanism. Cross-Chunk Latent Stitching reduces visual drift by avoiding redundant decode-encode cycles over long timelines. Reference Skip Attention preserves character identity without rigid cloning. A coarse-to-fine strategy combined with Block Sparse Attention enables practical, production-ready inference at 720p/30fps, with support for higher output resolutions up to 1080p.
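The page does not say which sparsity pattern LongCat Avatar's Block Sparse Attention uses. To show the general technique, the sketch below assumes a simple local pattern for self-attention: the sequence is split into fixed-size blocks, and each query block scores only key blocks within `window` blocks of itself, so whole blocks of the attention matrix are never computed. `block`, `window`, and the local-window pattern are hypothetical choices for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def allowed_key_blocks(q_block: int, n_blocks: int, window: int) -> range:
    """Key blocks one query block may attend to (assumed local-window pattern)."""
    return range(max(0, q_block - window), min(n_blocks, q_block + window + 1))


def block_sparse_attention(q: Matrix, k: Matrix, v: Matrix, block: int, window: int) -> Matrix:
    """Scaled dot-product self-attention that scores only keys in allowed blocks."""
    n, d = len(q), len(q[0])
    n_blocks = math.ceil(n / block)
    out: Matrix = []
    for i, qi in enumerate(q):
        # Gather key indices from permitted blocks; every other block is skipped.
        keys = [
            j
            for kb in allowed_key_blocks(i // block, n_blocks, window)
            for j in range(kb * block, min((kb + 1) * block, n))
        ]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        row = [
            sum(w * v[j][c] for w, j in zip(weights, keys)) / total
            for c in range(len(v[0]))
        ]
        out.append(row)
    return out


def density(n: int, block: int, window: int) -> float:
    """Fraction of the full n x n attention matrix that is actually computed."""
    n_blocks = math.ceil(n / block)
    computed = 0
    for i in range(n):
        for kb in allowed_key_blocks(i // block, n_blocks, window):
            computed += min((kb + 1) * block, n) - kb * block
    return computed / (n * n)


print(density(n=8, block=2, window=1))  # prints 0.625 (40 of 64 scores computed)
```

Because each query scores at most `(2 * window + 1) * block` keys regardless of sequence length, the share of skipped work grows as sequences get longer.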

Core Benefits and Applications

Long-form presenters and lectures: Maintain consistent identity and natural delivery across extended recordings.
Podcasts and interviews: Generate hour-scale speaking videos with stable appearance and lip-sync.
Entertainment and performance: Produce expressive acting or singing with rhythm-aware movement.
Sales, marketing, and corporate communications: Create presenters that handle pauses and silent moments naturally.
Multi-person conversations: Support multi-speaker interactions with individual identity preservation and turn-taking.

Pricing Overview (one-time credit packs)

Plan	Price	Credits	Approx. Videos	Resolution	Max Audio per Generation	Multi-Person	Priority	Notes
Base	$9.90	90	Up to 18	480p/720p/1080p	60 s	Not listed	Standard	Audio-driven avatar generation
Pro	$29.90	400	Up to 80	480p/720p/1080p	60 s	Yes	Priority	Designed for multi-person support
Ultimate	$49.90	800	Up to 160	480p/720p/1080p	60 s	Yes (interactions)	Priority	Listed as supporting long-form video generation
Creator	$99.90	1800	Up to 360	480p/720p/1080p	60 s	Yes	Highest	Listed as “multi-person & infinite-length support,” production-ready architecture, commercial license

Notes: The model is optimized for long-duration content and theoretically supports infinite-length sequences. Pricing tiers list a 60-second audio duration per generation for credit-based usage, which may reflect service-level constraints rather than model capability.
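The table's figures imply a consistent rate: every tier works out to 5 credits per video (for example, 90 / 18 = 5 and 1800 / 360 = 5), so larger packs lower the price per video rather than the credits per video. The sketch below recomputes this from the table. `credits_for_audio` is a hypothetical helper that assumes long audio is split into 60-second generations at the implied 5 credits each; the page does not state how credits are charged per generation, so treat it as a rough estimate only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price_usd: float
    credits: int
    max_videos: int  # "Approx. Videos" column (an upper bound)


# Figures copied from the pricing table above.
PLANS = [
    Plan("Base", 9.9, 90, 18),
    Plan("Pro", 29.9, 400, 80),
    Plan("Ultimate", 49.9, 800, 160),
    Plan("Creator", 99.9, 1800, 360),
]


def credits_for_audio(
    total_seconds: float, seconds_per_gen: float = 60.0, credits_per_gen: int = 5
) -> int:
    """Hypothetical estimate: chain 60 s generations, each costing the implied 5 credits."""
    generations = math.ceil(total_seconds / seconds_per_gen)
    return generations * credits_per_gen


for plan in PLANS:
    credits_per_video = plan.credits / plan.max_videos
    usd_per_video = plan.price_usd / plan.max_videos
    print(f"{plan.name:<9} {credits_per_video:.0f} credits/video  ${usd_per_video:.2f}/video")

# A one-hour recording would need 60 chained generations under these assumptions.
print(credits_for_audio(3600))  # prints 300
```

By this estimate, an hour of audio (about 300 credits) would fit within the Pro pack's 400 credits.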
