SAN FRANCISCO, Oct. 1, 2025 /PRNewswire/ -- Runloop.ai, the leading enterprise infrastructure platform for AI agents, today announced the launch of its Custom Benchmarks product. The new offering enables organizations to create highly specialized, private benchmarks that accurately measure and refine AI agents on their unique, proprietary codebases and business logic. To highlight the product's broad applications and strategic value, Runloop.ai is collaborating with Fermatix.ai, a specialist in full-cycle data generation, on a landmark pilot program.
The explosion of AI agents has created a critical need for rigorous, relevant evaluation and functional training. While public benchmarks are crucial for general model evaluation, they often fail to capture the specific requirements of AI agents or the validation needs of enterprises. Runloop.ai's Custom Benchmarks product solves this problem by providing a secure, scalable platform for companies to build benchmarks that test against their own internal business logic, tech stacks, and performance metrics.
Key features of Runloop.ai's Custom Benchmarks product include:
- Private benchmarking: Securely test AI agents on proprietary code without exposing intellectual property.
- Accurate performance evaluation: Measure agent effectiveness in real-world, business-specific conditions.
- Scalable infrastructure: A reliable and isolated environment for running thousands of tests simultaneously.
- Strategic model refinement: Obtain data for targeted improvement and retraining of AI agents for specific tasks.
"As AI agents move from prototypes to production, the benchmarks we use to evaluate them must evolve from generic tests to strategic assets," said Jonathan Wall, CEO of Runloop.ai. "Our new Custom Benchmarks product empowers enterprises to define what 'good' looks like for their unique business, enabling them to fine-tune and trust their AI agents in real-world scenarios. The pilot with Fermatix.ai is the perfect example of this in action, demonstrating the value of this approach in the most demanding environments."
Fermatix.ai, a company known for creating expert-level training data for industry-critical tasks and highly specialized domains using annotators who are practicing industry experts, brings ideal expertise to this pilot. By leveraging Runloop.ai's infrastructure, Fermatix.ai is strategically expanding its capabilities to offer custom, in-house verification for its clients. The collaboration allows Fermatix.ai to move beyond its current offerings and provide a new level of assurance by creating benchmarks tailored to specific enterprise needs. This pilot program will demonstrate how Fermatix.ai's expertise in data engineering and expert-level annotation can be applied to create high-fidelity, multilingual benchmarks on Runloop.ai's platform.
"At Fermatix.ai, we've built our reputation on creating expert-level training data with practicing industry professionals as annotators," said Sergey Anchutin, CEO and Founder of Fermatix.ai. "This partnership with Runloop.ai represents a strategic evolution—moving beyond one-time data labeling to creating reusable benchmarks that deliver ongoing value to our clients. By leveraging our domain expertise and Runloop's infrastructure, we're not just providing data anymore; we're building the testing standards that will define how enterprises evaluate their AI agents across industry-critical tasks."
The Custom Benchmarks product is now available to all Runloop.ai Pro clients, with early results from the Fermatix.ai pilot program expected to be shared in the coming months.
About Runloop.ai
Runloop provides infrastructure and tooling for building, testing, refining, and deploying AI agents at scale. Founded by engineers with deep experience in building large-scale systems, Runloop offers secure, isolated environments, rich developer tooling, and a suite of benchmarking capabilities that help companies deploy and manage AI agents with confidence.
Media contact:
Michelle Faulkner
Big Swing
617-510-6998
[email protected]
https://www.linkedin.com/company/runloopai https://x.com/runloopdev https://github.com/runloopai
SOURCE Runloop.ai