Gimlet Cloud, built for running agentic AI inference, to deploy d-Matrix Corsair low latency, memory-optimized accelerators alongside GPUs

10x performance benefits in latency and throughput per watt compared to a GPU-only approach

Job division between GPUs and d-Matrix accelerators enables faster interactivity, massive power savings

SAN FRANCISCO, March 12, 2026 /PRNewswire/ -- d-Matrix, a pioneer in low-latency AI inference compute for data centers, and Gimlet Labs, an applied AI research and product company, today announced that Gimlet is incorporating d-Matrix Corsair™ accelerators into the Gimlet Cloud alongside traditional GPUs to deliver 10x speedups for agentic AI inference workloads.

d-Matrix and Gimlet's combined solution can deliver order-of-magnitude improvements in both inference latency and throughput per watt compared to traditional GPU-only deployments. The solution is well suited to latency-sensitive workloads, including speculative decoding, a technique commonly adopted in large-scale AI deployments to reduce latency.

With d-Matrix Corsair accelerators on Gimlet Cloud, workloads already well optimized for agentic AI can achieve even greater performance gains, with token delivery speeds that support the industry-leading levels of interactivity required for today's most critical applications.

"Model providers are spending billions on inference, and the demand for fast tokens is higher than ever - but power remains a scarce resource," said Zain Asgar, founder and CEO of Gimlet Labs. "d-Matrix hardware is the ideal solution for the phases of inference that GPUs waste energy on. By leveraging Corsair for use cases like speculative decoding, we can deliver dramatically faster performance for our customers for the same footprint."

"From day one, d-Matrix has been uniquely focused on inference, founded on our belief that inference would not be a one-size-fits-all compute problem. As the only multi-silicon inference cloud, Gimlet is leading the industry with a fundamental new approach that delivers dramatic leaps forward in performance that homogeneous infrastructure simply cannot deliver," said Sid Sheth, founder and CEO of d-Matrix. "With power limits capping how fast AI can advance, it's imperative that AI service providers have the right tools for the right job and that we embrace doing more with less."

Gimlet's software stack is the first to intelligently divide and map agentic workloads across a variety of accelerators spanning multiple vendors, generations and architectures, and to run each segment on the best-suited hardware. Gimlet's data centers incorporate these different hardware types and connect them via high-speed interconnects to serve frontier labs and other AI-native companies.

d-Matrix Corsair's unique memory-optimized architecture delivers high memory bandwidth and low latency, making it ideal for running memory-bound portions of the AI model. Corsair ships as a standard PCIe card with air cooling, which enables rapid deployments in existing data centers.

The companies plan to make their combined solution available to select customers through Gimlet Cloud in the second half of 2026. To learn more, see the technical writeup at https://gimletlabs.ai/blog/low-latency-spec-decode-corsair. Customers can request early access at https://d-matrix.ai/gimletlabs.

About d-Matrix

d-Matrix is pioneering accelerated computing for AI inference, breaking through the limits of latency, cost and energy. Its Corsair compute accelerators, JetStream IO accelerators, and Aviator software deliver fast, sustainable AI inference at data center scale. Learn more at https://d-matrix.ai/.

About Gimlet Labs

Gimlet Labs' mission is to drive breakthrough improvements in AI efficiency that result in massive increases in the compute available for AI workloads. For more information, visit https://gimletlabs.ai/.

SOURCE d-Matrix