D-ID Launches V4 Expressive Visual Agents for Real-Time, LLM-Connected Interaction at Enterprise Scale
V4 Avatars combine low-latency and highly cost-effective performance, diffusion-powered expressive delivery, and consistent identity for real-time user engagement and long-form enterprise video
NEW YORK, March 16, 2026 /PRNewswire/ -- D-ID, a leader in enterprise-grade AI avatar solutions, today announced the launch of V4 Expressive Visual Agents, a new generation of ultra-high-fidelity digital humans designed for real-time, LLM-connected conversations, as well as scripted long-form enterprise video content.
Built on a new diffusion-based model and trained on performances captured from real actors, V4 Expressive Visual Agents deliver faster generation, low-latency (sub-0.5-second) conversational turns, and highly accurate lip sync at up to 4K resolution, enabling expressive, natural interactions that scale reliably across enterprise use cases.
Available today to 1,500 enterprise customers and millions of subscribers, V4 Avatars are engineered specifically for low-latency delivery, making them suitable for real-time, conversational experiences, as well as longer-form content such as training modules, explainers, and multilingual educational videos. To date, more than 800,000 visual agents and 300 million non-interactive avatars have been created using previous D-ID models. At launch, V4 Expressive Visual Agents are available to users on all D-ID plans, starting from as little as $5.90 a month, showcasing the groundbreaking cost efficiency of the V4 AI model.
Research shows that humanlike facial cues improve knowledge transfer, retention, and comprehension. As a result, enterprises are increasingly adopting high-fidelity avatars for onboarding, training, customer engagement, and internal communications, particularly where clarity, trust, and consistency matter.
V4 Expressive Visual Agents are the first high-quality expressive avatars to dynamically align with selected sentiments, ensuring that tone and intent match the underlying message. This allows spoken content to land clearly and confidently, with natural pacing and emphasis. They are designed to act as a visual interface layer for AI systems, enabling real-time, two-way interactions rather than one-way video playback. As an LLM responds, the avatar automatically adapts facial expressions and delivery based on context and sentiment, so empathy looks empathetic, urgency feels urgent, and confidence reads as confidence. This makes both customer-facing and employee-facing agents more natural, trustworthy, and effective.
V4 Expressive Visual Agents also add an optional camera layer that enables real-time sentiment awareness, feeding nonverbal cues into both the LLM response and the avatar's expressive delivery, including tone and facial expression. In addition, through D-ID's MCP Apps, V4 Expressive Visual Agents can surface interactive UI elements inline during the conversation: contextual visuals such as images, charts, and video, as well as structured interactions like forms and quizzes.
Unlike short-form video generation tools optimized for cinematic clips lasting only seconds, V4 Avatars are designed for continuous, consistent output. Enterprises can generate minutes or hours of video with a stable avatar identity, as well as run real-time conversations at scale, at a fraction of the price (70x cheaper than Google Veo 3 Fast), making them far more cost-effective for courses, explainers, multilingual training, and repeatable content series. These savings compound in real-time interactions, which cost pennies per chat with D-ID.
"We have come a long way since our first models that delighted the world by turning still images into talking portraits," said D-ID Co-founder and CEO Gil Perry. "Today, with V4, we're setting a new benchmark for avatar fidelity and performance while keeping it fast enough for real-time conversations and consistent, efficient and secure enough for enterprise scale. This advancement in avatar technology positions D-ID as the frontrunner in providing the visual interface layer for the next wave of AI adoption as businesses seek to make interactions more natural and humanlike."
Following the acquisition of simpleshow in September 2025, D-ID expanded its enterprise distribution footprint and integrated its AI avatar capabilities into simpleshow's corporate training and explainer video ecosystem. Since then, D-ID's ARR has grown by 250%, reflecting cross-sell expansion and increased enterprise demand for interactive AI-driven video.
D-ID is the world leader in generative AI for video and digital humans, enabling frictionless, real-time interaction through its Real-Time Streaming API. Its technology powers lifelike digital presenters, learning companions, and virtual assistants for Fortune 500 companies and mission-driven organizations alike. In September 2025, D-ID acquired simpleshow, the global pioneer in AI-based explainer video creation. Based in Berlin, simpleshow helps organizations in more than 70 countries simplify complex messages through smart, scalable, and human-centric video communication. https://www.d-id.com
PRESS CONTACT:
Leah Stern, D-ID Press Office: [email protected]
Promo video
https://vimeo.com/1155661354/930ea90e6f
Spotlight on sentiments
https://vimeo.com/1154695614/53fdb27bcf
Evolution of D-ID avatars
https://vimeo.com/1154662907/32c1c66a85
Expressions for versatility
https://vimeo.com/1154639506/7582e1af88
Media Kit
https://drive.google.com/drive/folders/1BVOi8a6KqUx1y5KN6xf5coYe3sMiNe-5
Video: https://www.youtube.com/watch?v=hPI6_ei_6Y8
SOURCE D-ID
