SAN FRANCISCO, Jan. 22, 2026 /PRNewswire/ -- FlashLabs, an applied AI research and engineering lab building real-time agentic systems, today announced the release of Chroma 1.0, the world's first open-source, end-to-end, real-time speech-to-speech AI model with personalized voice cloning.

Chroma is built to remove one of the largest bottlenecks in human–AI interaction: latency. By operating natively in voice—without the traditional ASR → LLM → TTS pipeline—Chroma enables natural, fluid conversations that feel immediate, responsive, and human.


"Voice is the most universal interface in the world, yet it has remained closed, fragmented, and delayed," said Yi Shi, Founder and Chief Research & Engineering at FlashLabs. "With Chroma, we're open-sourcing real-time voice intelligence so builders, researchers, and companies can create AI systems that truly work at human speed."

Built for Real-Time, Not Post-Processing

Unlike conventional voice systems that stitch together multiple components, Chroma is natively speech-to-speech, enabling:

End-to-end TTFT under 150ms

Natural conversational turn-taking

Low-latency emotional and prosodic control

Stable real-time inference without cascading delays

With Day-0 SGLang support, Chroma further reduces latency and improves throughput, achieving approximately 135ms end-to-end TTFT and real-time factors optimized for live deployment.
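
For readers who want to check latency figures like these on their own hardware, the following is a minimal sketch of how TTFT and real-time factor (RTF) are commonly measured for a streaming audio model. It is not FlashLabs' benchmark code: `fake_stream` is a hypothetical stand-in for a model's streaming output, the 24 kHz sample rate is an assumption, and TTFT is approximated as the time until the first audio chunk arrives.

```python
import time
from typing import Iterable, Iterator, List, Tuple

SAMPLE_RATE = 24_000  # assumed output sample rate; not stated in the release


def fake_stream(n_chunks: int = 5, chunk_samples: int = 2_400,
                delay_s: float = 0.01) -> Iterator[List[float]]:
    """Hypothetical stand-in for a streaming speech model: yields silent chunks."""
    for _ in range(n_chunks):
        time.sleep(delay_s)            # simulated per-chunk compute time
        yield [0.0] * chunk_samples    # 0.1 s of audio at 24 kHz


def measure(stream: Iterable[List[float]],
            sample_rate: int = SAMPLE_RATE) -> Tuple[float, float]:
    """Return (ttft_seconds, real_time_factor) for one streamed response."""
    start = time.perf_counter()
    ttft = None
    total_samples = 0
    for chunk in stream:
        if ttft is None:
            # Time from request start to the first chunk of output audio.
            ttft = time.perf_counter() - start
        total_samples += len(chunk)
    elapsed = time.perf_counter() - start
    if ttft is None or total_samples == 0:
        raise ValueError("stream produced no audio")
    audio_seconds = total_samples / sample_rate
    # RTF = generation time / duration of generated audio; below 1.0 means
    # the model produces audio faster than it plays back.
    return ttft, elapsed / audio_seconds


if __name__ == "__main__":
    ttft, rtf = measure(fake_stream())
    print(f"TTFT: {ttft * 1000:.1f} ms, RTF: {rtf:.3f}")
```

In a real measurement the timer would start when the end of the user's speech is detected, and results would be averaged over many requests.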

High-Fidelity Voice Cloning in Seconds

Chroma introduces few-second reference voice cloning, allowing users to generate highly realistic, personalized voices from minimal audio input.

In internal evaluations:

Speaker similarity score (SIM): 0.817

+10.96% above human baseline (0.73)

Best-in-class performance among both open and closed baselines

These results demonstrate that high-quality voice identity no longer requires large datasets or long fine-tuning cycles.
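
The release does not state how SIM is computed. A common convention in voice-cloning evaluation is the cosine similarity between speaker embeddings of the reference audio and the generated audio. The sketch below illustrates that convention only. The embedding vectors are toy values, and the speaker-embedding model that would produce them is assumed and not shown.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real speaker embeddings have hundreds of dimensions.
    reference_embedding = [1.0, 0.0]   # hypothetical: from the reference clip
    generated_embedding = [1.0, 1.0]   # hypothetical: from the cloned voice
    print(round(cosine_similarity(reference_embedding, generated_embedding), 4))
```

Under this convention a score of 1.0 means the two embeddings point in the same direction, so values closer to 1.0 indicate a closer match to the reference speaker.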

Strong Reasoning at Efficient Scale

Despite using compact ~4B-parameter architectures, Chroma delivers strong reasoning and dialogue capabilities by leveraging modern multimodal backbones and optimized real-time inference. This makes it suitable for edge deployment, agents, call centers, and interactive systems where latency and cost matter.

Applications

Chroma enables a new class of real-time voice applications, including:

Autonomous voice agents

AI call centers

Real-time translators

Conversational assistants

Interactive characters and NPCs

Multimodal AI systems

Availability

Chroma 1.0 is available today.

About FlashLabs

FlashLabs is an applied AI research lab focused on real-time, agentic, and multimodal intelligence. The team builds open and production-grade systems that power autonomous agents across voice, text, and action.

Media Contact:

Koki Kobayashi

6506097501

[email protected]

SOURCE FlashLabs