
SentiAvatar, the First Interactive 3D Digital Human Framework from SentiPulse and GSAI, Now Open Source
Release includes motion dataset, foundation model, and streaming architecture designed to align speech, gesture, and expression in live conversation
JINAN, China, April 9, 2026 /PRNewswire/ -- SentiPulse, a leading AI company focused on emotional foundation models and user experience innovation, in collaboration with a PhD team from the Gaoling School of Artificial Intelligence (GSAI), Renmin University of China (RUC), announced the open-source release of SentiAvatar, a framework for building expressive interactive 3D digital humans.
SentiAvatar powers SUSU, a real-time 3D avatar capable of conversation, expressive motion, and emotional delivery. The release includes the full SentiAvatar framework, the SUSU character model, and the SuSuInterActs high-quality motion dataset, all now freely available on GitHub.
In the rapidly advancing field of 3D digital humans, one long-overlooked yet critical issue is becoming increasingly clear: unnatural expression. The avatar's mouth moves and its hands gesture, but the actions don't match the meaning, and the face looks stiff. The combination quickly triggers the uncanny valley effect.
The reason is simple: human communication has never relied on spoken language alone. A shrug conveys helplessness, a nod signals agreement, and a slight raise of the eyebrow hints at doubt. These nonverbal signals—gestures, posture, facial expressions—are the soul of real conversation.
Yet getting a 3D digital human to naturally "gesture and move as it speaks" in real conversation has proven far harder than expected. This is not purely an engineering problem; it involves three persistent, unsolved challenges: the lack of high-quality data, the need to understand composite semantic actions, and the difficulty of syncing motion with the rhythm of speech.
SentiAvatar: A Framework for 3D Digital Human Motion Generation
The SuSuInterActs Dataset
SentiPulse built SuSuInterActs around a single character: SUSU (age 22, warm and lively, emotionally rich). The dataset contains 21,000 clips and 37 hours of multimodal conversational data, including synchronized speech, annotated behavioral text, full-body motion, and facial expressions—helping address the lack of high-quality Chinese-language datasets in the field.
Motion Foundation Model: Pre-trained on 200K+ Sequences
As conversational motion data is inherently limited by dialogue scenarios, the team pre-trained a proprietary Motion Foundation Model on more than 200,000 heterogeneous motion sequences (approximately 676 hours), learning general motion patterns that go far beyond dialogue-specific actions.
Core Architecture: Plan-Then-Infill
SentiAvatar introduces a novel dual-channel parallel architecture, Plan-Then-Infill. It separates body motion from facial expression, first planning what action to perform and then infilling how to execute it frame by frame.
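The two-stage idea can be illustrated with a minimal sketch (all function names and the rule-based logic below are hypothetical, not the released SentiAvatar API): a planner first maps the utterance to a coarse action label, and an infiller then expands that label into per-frame motion.

```python
# Minimal plan-then-infill sketch (illustrative only, not the actual model).

def plan_action(utterance: str) -> str:
    """Stage 1: decide WHAT to do, at the semantic level."""
    if "?" in utterance:
        return "raise_eyebrow"
    if "yes" in utterance.lower():
        return "nod"
    return "idle_gesture"

def infill_frames(action: str, n_frames: int = 30) -> list[dict]:
    """Stage 2: decide HOW to do it, frame by frame.
    Here we just ramp a single intensity value up and back down."""
    frames = []
    for i in range(n_frames):
        # Triangular intensity envelope: 0 -> 1 -> 0 over the clip.
        t = i / (n_frames - 1)
        intensity = 1.0 - abs(2.0 * t - 1.0)
        frames.append({"action": action, "frame": i,
                       "intensity": round(intensity, 3)})
    return frames

action = plan_action("Do you agree?")
motion = infill_frames(action, n_frames=5)
print(action, [f["intensity"] for f in motion])
```

Splitting the decision of *what* to do from *how* to do it is what lets the semantic channel and the frame-level channel run in parallel.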
State-of-the-Art Real-Time Performance
SentiAvatar achieves new state-of-the-art results on both the SuSuInterActs and BEATv2 benchmarks. Compared with mainstream models: MoMask takes no speech input, so its rhythm feels static and disconnected; EMAGE syncs with audio but ignores semantic intent; T2M-GPT can misinterpret the meaning of actions; and HunYuan-Motion can produce unstable outputs, often with distorted or unnatural movements. SentiAvatar delivers semantically accurate motion that stays tightly aligned with the audio.
The framework generates six-second motion sequences within 0.3 seconds and supports infinite-turn streaming interaction. This means digital humans can continuously generate coherent gestures and expressions during live conversation—without waiting for a full sentence to finish before processing, directly addressing one of the core causes of unnatural expression.
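The streaming behavior described above can be sketched as chunk-by-chunk generation (a toy illustration under assumed names, not the SentiAvatar codebase): each incoming audio chunk immediately yields a motion chunk, so latency is bounded per chunk rather than per sentence.

```python
# Toy sketch of chunked streaming motion generation (illustrative only).
from typing import Iterable, Iterator

def generate_motion_stream(audio_chunks: Iterable[bytes],
                           frames_per_chunk: int = 15) -> Iterator[list[int]]:
    """Yield one motion chunk per audio chunk, carrying over the last
    frame index so consecutive chunks stay continuous."""
    next_frame = 0
    for chunk in audio_chunks:
        # A real system would run the motion model on this chunk; here we
        # just emit a contiguous run of frame indices to show continuity.
        frames = list(range(next_frame, next_frame + frames_per_chunk))
        next_frame += frames_per_chunk
        yield frames

stream = generate_motion_stream([b"chunk1", b"chunk2", b"chunk3"],
                                frames_per_chunk=3)
print(list(stream))  # frame indices stay contiguous across chunks
```

Because the generator consumes audio incrementally, the avatar can keep moving through an arbitrarily long conversation instead of stalling until a full sentence arrives.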
Open Source and Beyond: From Digital Humans to Digital Life
The SentiPulse team invites research organizations and individual developers worldwide to push the boundaries of 3D motion generation. Whether you want to build your own 3D companion from scratch or extend SUSU with richer expressive capabilities for games, film production, robotics, or beyond—the open-source framework is ready.
GitHub: https://sentiavatar.github.io/
Technical report: https://arxiv.org/abs/2604.02908
About SentiPulse
SentiPulse, founded in September 2025, is an AI company focused on emotional foundation models and user experience innovation. The company is dedicated to deepening the relationship between humans and AI through advanced technology—not simply as tools, but as a bridge to more natural and expressive interaction. The team consists of top researchers from leading Chinese universities and cross-disciplinary experts with deep expertise in multimodal models and 3D digital humans.
SOURCE SentiPulse