BEIJING, Feb. 5, 2026 /PRNewswire/ -- Kling AI, the AI-powered creative platform, today announced the launch of its Kling 3.0 models (Video 3.0, Video 3.0 Omni, Image 3.0 and Image 3.0 Omni), giving creators greater narrative control and stronger consistency in AI video generation. The model series features major upgrades in consistency, photorealistic output, extended video duration up to 15s, and native audio generation across multiple languages, dialects, and accents.

Video 3.0's improved element consistency with video and image references

Powered by an integrated unified training framework, the Kling 3.0 model series supports full multimodal input and output spanning text, images, audio, and video, bringing the understanding, generation, and editing of video together in one streamlined AI workflow. The models integrate multiple tasks, including text-to-video, image-to-video, reference-to-video, and in-video editing, into a single, native multimodal architecture, enabling the models to follow complex narrative logic, deliver precise shot control, and maintain strong prompt adherence.

The Kling AI 3.0 models are now available for exclusive early access to Ultra subscribers and will soon be available to the public.

Video 3.0: Cinematic-quality Video Production

Video 3.0 pushes creative control further with improved element consistency, enabling creators to upload reference videos and multiple image references to ensure characters, objects, and scenes remain visually coherent across frames.

Its key features include:

Native Audio Across Languages & Accents: The model can generate speech in English, Chinese, Japanese, Korean, and Spanish, with accents such as American, British and Indian. It can also produce complex multi-character dialogue scenes in which each character speaks a different language, with precise user control over content, delivery and speaking order.

Extended Video Duration: The Video 3.0 model also supports longer video generation, up to 15 seconds in length. The extended duration lets the model handle intricate sequences, including long takes and multiple plot twists, with smooth, film-like transitions.

Intelligent Multi-Shot Storytelling: Video 3.0 understands multi-scene, multi-shot instructions, dynamically adjusting camera angles and shots to match creative direction, from classic shot-reverse-shot dialogues to advanced cross-cutting dialogue and voice-over.

Better Preservation of Text in Imagery: The model can retain or generate text, such as signage, captions, and branded elements, with high accuracy. This is particularly valuable for e-commerce advertising, where, for example, a character can wear a branded shirt and the logo remains sharp and readable throughout the video.

Photorealistic Output: Video 3.0 can produce photorealistic output with lifelike characters in expressive, dynamic performances for heightened realism.

Video 3.0 Omni: Advanced Storyboarding and Reference Control

Building on the "Elements" feature from Kling Video O1, Video 3.0 Omni offers advanced reference‑based generation for unmatched consistency. Creators can upload a reference video, enabling the AI to extract visual traits and voice characteristics of a character and replicate them faithfully across new scenes.

The new Video 3.0 Omni model also rolls out a multi-shot storyboard feature that lets users generate professional shots, specifying the duration, shot size, perspective, narrative content and camera movements for each shot in the storyboard.

Image 3.0 Omni: Ultra-High-Resolution Visuals with Cinematic Realism

Alongside its video updates, Kling AI is introducing Image 3.0 and Image 3.0 Omni, now supporting 2K and 4K ultra-high-definition output for professional use cases, from virtual scene visualization to full-scale production assets. The model delivers exceptional realism, preserving textures, lighting, and material qualities with remarkable precision and consistency.

Built on the foundation of the newly introduced Kling O1 and 2.6 series, the Kling AI 3.0 model lineup embodies the Multi‑modal Visual Language (MVL) framework, marking a decisive evolution from basic video generation to sophisticated professional orchestration. It delivers tangible advances in narrative precision, output quality and cinematic control.

Since its launch in June 2024, Kling AI has grown to serve more than 60 million creators worldwide. To date, it has produced more than 600 million videos and forged partnerships with more than 30,000 enterprise clients. Its adoption spans the film and advertising industries, accelerating the visualization of storyboards and product concepts, and enhancing production workflows from animation and CGI to the creation of entirely new visual assets.

The debut of Kling 3.0 signals a fundamental shift in AI's role, from a mere generation tool to an intelligent creative partner capable of grasping artistic intent, ushering in an era where anyone can turn their ideas into films.

