
PALO ALTO, Calif., Dec. 9, 2025 /PRNewswire/ -- Today, EPRI released first-of-its-kind, domain-specific benchmarking results for the electric power sector. This initial application included multiple-choice and open-ended questions rooted in real-world utility topics, providing a more realistic view of how large language models (LLMs) perform. Results indicate expert oversight remains imperative, especially with open-ended questions, where accuracy fell below 50% in some cases.
Many existing benchmarks assess broad academic knowledge, such as math, science, and coding, and may not capture the operational and contextual complexity of real-world utility environments. Benchmarking with electric power-specific questions, such as generation and transmission and distribution asset-related inquiries, helps assess how well LLMs understand and respond to technical, regulatory, and operational questions that utilities face.
"As utilities integrate AI into power system planning and operations, this benchmarking establishes a critical foundation for evaluating domain-specific tools and models. Accuracy is paramount, as errors can lead to significant operational and reliability consequences," said EPRI Vice President of AI Transformation and Chief AI Officer Remi Raphael. "Independent benchmarking by EPRI ensures the utility industry can trust and act on unbiased, credible insight."
Key takeaways from EPRI's initial benchmarking report include:
- Open-ended questions exposed a reliability gap. When the same questions were asked in open-ended form instead of as multiple-choice questions (MCQs), accuracy dropped by an average of 27 percentage points. On expert-level questions, top models scored only 46–71%.
- MCQs provide a strong but incomplete baseline. On EPRI's MCQs, leading frontier models scored 83–86%, broadly consistent with their performance on external math and science benchmarks, but these scores benefit from the structure of MCQs.
- Open-weight models are closing the gap. These are LLMs whose trained parameters — known as weights — are publicly available. While typically one generation behind proprietary frontier systems, they are rapidly improving. Their ability to be self-hosted can give utilities valuable deployment flexibility.
- Web search modestly improves accuracy. Allowing models to search the web boosted scores slightly (2–4%), while also introducing the risk of retrieving irrelevant or misleading information.
EPRI used a dataset of more than 2,100 questions and answers, generated by 94 power sector experts, drawing from publicly available sources, including the institute's reports covering 35 power sector topics. The benchmarking tested capabilities in three phases, with reproducibility across multiple LLMs, including GPT-5, Grok 4, and Gemini 2.5 Pro. Phase one measured model knowledge through multiple-choice questions, phase two repeated the tests with web search, and phase three assessed open-ended responses using both knowledge and search. Each phase included three runs per model, with confidence intervals reported to capture variability.
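As a rough illustration of how three runs per model can be summarized with a confidence interval, the sketch below computes a mean accuracy and a 95% Student-t interval. It is not EPRI's published method, which this release does not describe; the function name mean_with_ci, the choice of a t-interval, and the sample accuracies are all assumptions made for illustration.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
# Only small sample sizes are covered, since the example uses three runs.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def mean_with_ci(run_accuracies):
    """Return (mean, lower, upper) for a 95% t-interval over per-run accuracies.

    Hypothetical helper: assumes runs are independent and roughly normal.
    """
    n = len(run_accuracies)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch supports between 2 and 5 runs")
    mean = statistics.mean(run_accuracies)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(run_accuracies) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Made-up accuracies for one model across three runs (not EPRI data).
runs = [0.84, 0.85, 0.86]
mean, low, high = mean_with_ci(runs)
print(f"mean accuracy {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

With only three runs the t critical value is large (4.303), so even small run-to-run differences produce a wide interval, which is one reason reporting the interval alongside the mean is useful.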
The effort stems from EPRI's Open Power AI Consortium, launched earlier this year to drive the development and deployment of AI approaches tailored for the power sector, including future domain-augmented tools.
Future phases of EPRI's benchmarking effort will build on this foundation by evaluating domain-augmented tools and models and expanding beyond generic tests into real utility applications.
The full report, Benchmarking Large Language Models for the Electric Power Sector, and an interactive site, WattWorks: The Power Sector's AI Benchmarking Hub, are available online.
Contact:
Rachel Gantz
Senior Manager of Corporate Media Relations
202-293-7517
[email protected]
About EPRI
Founded in 1972, EPRI is the world's preeminent independent, non-profit energy research and development organization, with offices around the world. EPRI's trusted experts collaborate with more than 450 companies in 45 countries, driving innovation to ensure the public has clean, safe, reliable, and affordable access to electricity across the globe. Together...shaping the future of energy.®
SOURCE EPRI