Sup AI Sets New Benchmark Record with 52.15% on Humanity's Last Exam


News provided by

Sup AI

Dec 10, 2025, 10:12 ET

Surpasses every individual frontier model on the world's hardest open-source AI reasoning test

Important Disclosure: This is an independent evaluation conducted by Sup AI and is not officially endorsed, validated, or recognized by the Center for AI Safety, Scale AI, or the HLE benchmark creators. Sup AI is not affiliated with CAIS or Scale AI.

PALO ALTO, Calif., Dec. 10, 2025 /PRNewswire/ -- Sup AI announced today that its multi-model orchestration system has achieved 52.15% accuracy on Humanity's Last Exam (HLE), the most challenging publicly available benchmark for advanced AI reasoning. This performance establishes Sup AI as the new state-of-the-art (SOTA), outperforming all individual frontier models including Google's Gemini 3 Pro Preview, OpenAI's GPT-5 Pro and GPT-5.1, Anthropic's Claude Opus 4.5, and xAI's Grok-4.

Figure: Model Accuracy on HLE Benchmark

HLE is designed to resist saturation as AI capabilities improve, blending advanced mathematics, scientific reasoning, and logic into 2,500 expert-crafted questions. Crossing 50% accuracy on this benchmark marks a significant milestone in the progression of general AI reasoning capabilities.

Note: All models evaluated, including Sup AI, use enhanced evaluation settings such as custom instructions, web search, and low-confidence retries. These settings raise every model's score relative to published benchmarks, but relative rankings remain stable, and Sup AI maintains a clear lead.

Sup AI's Results at a Glance

Metric                                 Value
Accuracy                               52.15%
Questions Evaluated                    1,369
Lead Over Next Best Model              +7.49 points
ECE (stated-confidence calibration)    35.22%

Sup AI's score is statistically significant at p < 0.001, with a 95% confidence interval of ±2.65 percentage points.
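The release does not say which estimator produced the interval. As a rough check, the sketch below assumes a normal-approximation (Wald) interval for a proportion, which reproduces the reported ±2.65 points from the stated accuracy and question count. The function name `wald_half_width` is illustrative and not taken from Sup AI's repository.

```python
import math


def wald_half_width(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumption: a Wald interval with z = 1.96 (95%); the release does not
    name the estimator it used.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


# Figures reported in the release.
accuracy = 0.5215
questions = 1369

half_width_points = 100 * wald_half_width(accuracy, questions)
lead_points = round(52.15 - 44.66, 2)  # Sup AI minus Gemini 3 Pro Preview

print(f"95% CI half-width: ±{half_width_points:.2f} points")
print(f"Lead over next best model: +{lead_points:.2f} points")
```

Under this assumption the interval is roughly 49.50% to 54.80%, and the lead over the next best model equals the difference between the top two rows of the accuracy table.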

A Clear Lead Over Frontier Models

Sup AI's ensemble system consistently outperformed every major standalone model:

Model                     Accuracy
Sup AI                    52.15%
Gemini 3 Pro Preview      44.66%
GPT-5 Pro                 39.43%
GPT-5.1                   38.18%
Claude Opus 4.5           29.56%
DeepSeek v3.2 Thinking    24.08%

This margin underscores a core principle: an orchestrated ensemble, when engineered properly, can outperform its strongest component models by a wide margin.

Why Sup AI Wins: Ensemble Intelligence

Sup AI dynamically routes each question to a set of frontier models most suited to the problem, analyzes probability distributions across their outputs, and synthesizes an answer weighted by confidence, specialization, and inter-model agreement. If confidence is insufficient or models disagree meaningfully, Sup AI automatically retries.

Sup AI also enables multimodal handling even for models that lack native support, pre-processing images or PDFs when required.

The result is not a simple vote. Instead, it's a structured, confidence-weighted synthesis that consistently outperforms every individual model.
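To make the idea of confidence-weighted synthesis with a retry signal concrete, here is a deliberately simplified sketch. It is not Sup AI's implementation: the release describes analysis of probability distributions across outputs, which this example does not model. The names `ModelAnswer`, `synthesize`, the `weight` field, and the `min_share` threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ModelAnswer:
    model: str            # hypothetical model label
    answer: str           # normalized final answer
    confidence: float     # self-reported confidence in [0, 1]
    weight: float = 1.0   # hypothetical specialization weight for this question


def synthesize(answers, min_share=0.6):
    """Pick the answer with the largest confidence-weighted support.

    Returns (answer, share). The answer is None when the winning share of
    total support is below min_share, which a caller could treat as a
    signal to retry. The threshold value is an assumption.
    """
    support = defaultdict(float)
    for a in answers:
        support[a.answer] += a.confidence * a.weight

    total = sum(support.values())
    if total == 0:
        return None, 0.0

    best = max(support, key=support.get)
    share = support[best] / total
    return (best if share >= min_share else None), share


agreeing = [
    ModelAnswer("model_a", "42", 0.9),
    ModelAnswer("model_b", "42", 0.6),
    ModelAnswer("model_c", "41", 0.7),
]
split = [
    ModelAnswer("model_a", "x", 0.5),
    ModelAnswer("model_b", "y", 0.5),
]

print(synthesize(agreeing))  # strong agreement: an answer is returned
print(synthesize(split))     # even split: None signals a retry
```

In the first case two models agree and carry most of the weighted support, so their answer is returned. In the second the support is split evenly, so the function returns None and the caller would retry.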

About Humanity's Last Exam

Humanity's Last Exam (HLE) is a high-difficulty benchmark developed by independent researchers to evaluate deep reasoning, mathematical problem-solving, scientific understanding, and multi-step logic. With 2,500 public questions and no "trivial" shortcuts, HLE remains one of the few benchmarks where frontier models do not cluster near human-expert-level performance.

Sup AI evaluated 1,369 randomly selected questions from this set, using the standard Sup AI chat settings available to any user of the platform.

Leadership Perspective

"Crossing 50% on HLE isn't about luck. It's about architecture," said Ken Mueller, CEO of Sup AI. "No single model dominates every domain, but an orchestrated system that understands when to trust, when to weight, and when to retry can. Sup AI shows that careful ensemble engineering can push beyond the ceiling of any standalone model."

Evaluation Methodology

  • Each question (text and image) was submitted to the Sup AI API using the platform's normal system prompt.
  • Responses were structured into explanation, answer, and a self-reported confidence score.
  • GPT-5.1 served as the automated judge, evaluating answer correctness via strict extraction, semantic equivalence, and numerical-tolerance matching.
  • Accuracy and calibration metrics were computed using established statistical estimators.
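The calibration metric reported above (ECE) compares self-reported confidence with observed accuracy. The release does not specify its binning scheme, so the sketch below assumes the common equal-width formulation with 10 bins; the function name and bin count are assumptions rather than details from Sup AI's evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: self-reported confidences in [0, 1]
    correct:     booleans, True where the judged answer was correct
    Assumption: equal-width bins; the release does not state the binning.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - acc)
    return ece


# Toy data: four answers at 90% stated confidence, half of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False])
print(f"ECE: {100 * toy_ece:.2f}%")
```

In the toy data the stated confidence is 90% while only half the answers are correct, so the gap between confidence and accuracy is 40 points. A lower ECE means stated confidence tracks accuracy more closely.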

Sup AI's complete evaluation code, predictions, judged outputs, metrics, and per-question results are publicly available for full reproducibility.

Significance

Sup AI's performance demonstrates:

  1. It is possible to meaningfully surpass top individual models with an ensemble system.
  2. Specialization matters: different models dominate different domains, and orchestration captures these strengths.
  3. Benchmarks like HLE remain valuable, showing substantial headroom even as AI surpasses 50% accuracy.
  4. These capabilities are available today through Sup AI's production API and chat interface.

Availability

The full evaluation, code, and results can be accessed at:

GitHub Repository:

https://github.com/supaihq/hle

Sup AI Platform:

https://sup.ai

HLE Benchmark:

https://lastexam.ai

Sup AI invites researchers, engineers, and enterprise teams to independently reproduce the results and explore the platform's orchestration capabilities.

Citation

@misc{supai-hle-2025,
  title={Sup AI Achieves 52.15\% on Humanity's Last Exam},
  author={Sup AI},
  year={2025},
  url={https://github.com/supaihq/hle}
}

For press inquiries or partnership discussions, please contact: [email protected]

SOURCE Sup AI
