

AVTR-1 Real-Time Open Weights Model
#43avtr · v1 · real time open weights · since 2026-05-26 · 3× · last 30 June 2026
AVTR-1 is an open-weights, real-time avatar model by Avaturn (Goodsize Inc.), built on a flow-matching autoregressive architecture designed for live dialogue. Given a portrait image and dual-stream audio (a speech track and a listening track), it generates every frame of the face in real time at 25 fps on a single GPU, rather than overlaying a generated mouth onto a pre-recorded video clip. The model operates in full-duplex mode: the avatar reacts continuously throughout the conversation, not only when it is the avatar's turn to speak. Model weights, inference stack, and streaming infrastructure are publicly available on GitHub and Hugging Face.
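To illustrate what full-duplex, dual-stream input could look like on the caller's side, here is a minimal sketch that cuts the speech and listening tracks into aligned windows, one pair per video chunk. It is not the AVTR-1 API: the function name `paired_chunks`, the 16 kHz sample rate, and the plain-list audio representation are assumptions for illustration only. The 25 fps rate and the 5-frame chunk size are the figures given in this entry.

```python
from typing import Iterator, List, Tuple

FPS = 25               # frame rate stated in this entry
CHUNK_FRAMES = 5       # frames per chunk stated in this entry
SAMPLE_RATE = 16_000   # ASSUMPTION: the entry does not state an audio sample rate

# Audio samples that cover one 5-frame chunk (200 ms at 25 fps).
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_FRAMES // FPS


def paired_chunks(
    speech: List[float], listening: List[float]
) -> Iterator[Tuple[List[float], List[float]]]:
    """Yield aligned (speech, listening) windows, one pair per video chunk.

    Hypothetical helper: both tracks are passed on every chunk, so a
    full-duplex model would see the listening track even while the avatar
    itself is silent.
    """
    if len(speech) != len(listening):
        raise ValueError("both audio tracks must have the same length")
    last_start = len(speech) - SAMPLES_PER_CHUNK
    for start in range(0, last_start + 1, SAMPLES_PER_CHUNK):
        end = start + SAMPLES_PER_CHUNK
        yield speech[start:end], listening[start:end]


if __name__ == "__main__":
    one_second_speech = [0.0] * SAMPLE_RATE      # avatar is silent
    one_second_listening = [0.1] * SAMPLE_RATE   # user is talking
    pairs = list(paired_chunks(one_second_speech, one_second_listening))
    print(len(pairs), "chunks,", len(pairs) * CHUNK_FRAMES, "frames")
```

Under these assumptions, one second of audio yields five chunk pairs, which corresponds to the 25 frames the model is described as producing per second.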
Features
| Feature | Details |
| --- | --- |
| Inference Speed | 25 fps; 5-frame chunks processed in ≤200 ms (real-time factor ≥1.0×); reference benchmark on NVIDIA A100 (sm80) |
| Price Tier | Free (AVTR-1 Community License) for individuals, research, and commercial use up to $10M annual revenue; commercial license agreement required above $10M ARR |
| Memory Requirement | One NVIDIA A100 GPU per session; requires CUDA 12.x, TensorRT 10.x, and an Ampere or newer GPU (sm80+) |
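The inference-speed figures follow from simple arithmetic: five frames at 25 fps cover 200 ms of playback, so a chunk must be generated in at most 200 ms to keep up. The sketch below works that out. The function names are hypothetical, and the definition of real-time factor used here (playback time covered divided by processing time) is an assumption, since the entry does not define the term.

```python
FPS = 25           # output frame rate stated in the table
CHUNK_FRAMES = 5   # frames generated per chunk, per the table


def chunk_budget_ms(fps: int = FPS, chunk_frames: int = CHUNK_FRAMES) -> float:
    """Playback time covered by one chunk, in milliseconds."""
    return chunk_frames * 1000.0 / fps


def real_time_factor(
    processing_ms: float, fps: int = FPS, chunk_frames: int = CHUNK_FRAMES
) -> float:
    """ASSUMED definition: playback time covered / time spent generating it.

    A value of 1.0 or more means generation keeps pace with playback.
    """
    if processing_ms <= 0:
        raise ValueError("processing_ms must be positive")
    return chunk_budget_ms(fps, chunk_frames) / processing_ms


if __name__ == "__main__":
    print("budget per chunk (ms):", chunk_budget_ms())
    # Illustrative timings only, not measured results.
    for measured_ms in (160.0, 200.0, 250.0):
        rtf = real_time_factor(measured_ms)
        status = "keeps up" if rtf >= 1.0 else "falls behind"
        print(f"{measured_ms:.0f} ms per chunk -> RTF {rtf:.2f} ({status})")
```

The sample timings in the loop are made-up inputs chosen to show both sides of the threshold; only the 200 ms budget is derived from the listed figures.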