XPENG-PKU Research Breakthrough: XPENG, in collaboration with Peking University, has developed FastDriveVLA, a novel visual token pruning framework that enables autonomous driving AI to "drive like a human" by focusing only on essential information, achieving a 7.5x reduction in computational load.

GUANGZHOU, China, Dec. 28, 2025 /PRNewswire/ -- XPENG, in collaboration with Peking University, has had its paper "FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning" accepted by AAAI 2026, one of the world's top conferences in artificial intelligence. AAAI 2026 received 23,680 submissions, with only 4,167 papers accepted, an acceptance rate of just 17.6%.

The paper introduces FastDriveVLA, an efficient visual token pruning framework specifically designed for end-to-end autonomous driving Vision-Language-Action (VLA) models. This work offers a new approach to visual token pruning by enabling AI to "drive like a human", focusing only on essential visual information while filtering out irrelevant data.

As large AI models evolve rapidly, VLA models are being widely adopted in end-to-end autonomous driving systems due to their strong capabilities in complex scene understanding and action reasoning. These models encode images into large numbers of visual tokens, which serve as the foundation for the model to "see" the world and make driving decisions. However, processing so many tokens increases the computational load onboard the vehicle, which slows inference and degrades real-time performance.

While visual token pruning has been recognized as a viable method to accelerate VLA inference, existing approaches, whether based on text-visual attention or token similarity, have shown limitations in driving scenarios. To address this, XPENG and PKU developed FastDriveVLA, a novel reconstruction-based token pruning framework inspired by how human drivers focus on relevant foreground information while ignoring non-critical background areas.

The method introduces an adversarial foreground-background reconstruction strategy that enhances the model's ability to identify and retain valuable tokens. On the nuScenes autonomous driving benchmark, FastDriveVLA achieved state-of-the-art performance across various pruning ratios. When the number of visual tokens was reduced from 3,249 to 812, the framework achieved a nearly 7.5x reduction in computational load while maintaining high planning accuracy.
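For readers who want a concrete picture of the pruning step, the following is a minimal Python sketch of generic top-k token selection, not the FastDriveVLA implementation. The function name `prune_tokens`, the random feature vectors, and the random saliency scores are all hypothetical stand-ins; in the paper the scores come from a trained reconstruction-based pruner whose details are not given here. Only the token counts (3,249 before pruning, 812 after) are taken from the release.

```python
import random
from typing import List, Sequence, Tuple


def prune_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep: int,
) -> Tuple[List[int], List[Sequence[float]]]:
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    `scores` is a hypothetical per-token saliency value (higher = more
    important). How such scores are produced is outside this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 <= keep <= len(tokens):
        raise ValueError("keep must be between 0 and the number of tokens")

    # Rank token indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore spatial/sequence order among the survivors.
    kept_idx = sorted(ranked[:keep])
    return kept_idx, [tokens[i] for i in kept_idx]


# Illustrative data only: random 4-dimensional "tokens" and random scores.
random.seed(0)
NUM_TOKENS = 3249  # token count before pruning, as stated in the release
KEEP = 812         # token count after pruning, as stated in the release

tokens = [[random.random() for _ in range(4)] for _ in range(NUM_TOKENS)]
scores = [random.random() for _ in range(NUM_TOKENS)]

kept_idx, kept_tokens = prune_tokens(tokens, scores, KEEP)
print(len(kept_tokens), "of", NUM_TOKENS, "tokens retained")
```

Keeping 812 of 3,249 tokens retains roughly a quarter of the visual input. The sketch sorts the surviving indices so the retained tokens stay in their original order, a design choice made here for illustration rather than one confirmed by the source.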

This is the second time this year that XPENG has been recognized at a top-tier global AI conference. In June, XPENG was the only Chinese automaker invited to speak at CVPR WAD, where it shared advances in autonomous driving foundation models. At its AI Day in November, XPENG unveiled its VLA 2.0 architecture, which removes the "language translation" step and enables direct Visual-to-Action generation, a breakthrough that redefines the conventional V-L-A pipeline.

These accomplishments reflect XPENG's full-stack in-house capabilities, from model architecture design and training to distillation and vehicle deployment. Looking ahead, XPENG remains committed to achieving L4-level autonomous driving to accelerate the integration of physical AI systems into vehicles, with the goal of delivering safe, efficient, and comfortable intelligent driving experiences to users around the world.

