
AINGENS Launches Clinical AI Reliability Test Showing Zero Hallucinations
AINGENS has released findings from a pilot clinical reliability assessment of MACg, demonstrating zero hallucinations and 100% accuracy across 75 questions spanning five clinical trial publications. The results show that well-designed, evidence-first workflows can substantially reduce hallucination risk when using AI to extract and summarize clinical trial data for medical and scientific content.
COLUMBIA, Md., March 2, 2026 /PRNewswire/ -- AI-generated content errors aren't disappearing. Courts have sanctioned attorneys for filing briefs with fabricated case citations produced by generative AI. Healthcare professionals risk patient harm when using unverified AI recommendations. Medical writers face compliance exposure when AI outputs lack traceable sourcing. "The issue isn't whether AI hallucinations exist. It's whether your workflow is designed to control them," said Ome Ogbru, PharmD, CEO and Founder of AINGENS.
The life sciences software company has released new findings from a pilot clinical reliability assessment of its flagship platform, MACg (Medical Affairs Content Generator), showing that workflow and system design are major drivers of hallucination risk and that hallucinations are not an unavoidable feature of AI. In this 75-question evaluation of clinical trial data extraction tasks, MACg showed zero hallucinations and 100% accuracy for requested numerical and categorical data elements across five peer-reviewed clinical trial publications. The findings suggest that well-designed, document-grounded workflows can substantially reduce hallucination risk in structured clinical data extraction tasks.
"The models have improved, and how you design the workflow matters enormously," points out Dr. Ogbru. "In this evaluation, when MACg was used as designed and anchored to uploaded trial PDFs, it matched the clinician's manual reading for all requested data points. It is intended as a co-pilot, with human experts still responsible for final review and interpretation."
A Pragmatic Pilot Evaluation of Clinical Trial Extraction
The assessment was designed to reflect a realistic medical writing use case and to evaluate MACg's performance in a structured, document-grounded setting.
- Five peer-reviewed clinical publications were uploaded, including phase 3 trials and one systematic review.
- Fifteen predefined, structured questions were asked per study, for a total of 75 questions, covering trial design, endpoints, efficacy, safety, limitations, and author interpretation.
- Questions were asked one at a time and prompts did not include special instructions to "avoid hallucinations" or "use only attached documents," simulating real-world medical writing conditions.
A single PharmD reviewer compared each of the 75 responses directly with the source PDFs and evaluated hallucination frequency, factual accuracy, contextual understanding, and appropriate use of general medical knowledge. Across all responses, MACg produced zero hallucinations and correctly extracted all requested numerical data, including sample sizes, effect estimates, confidence intervals, and p-values. When trial-specific information was missing or only partially reported, MACg did not invent numbers; it acknowledged the limitation or stayed qualitative rather than guessing. This was a single-system, single-rater pilot evaluation in a document-grounded, non-adversarial setting.
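The scoring described above amounts to a simple tally, which can be sketched as follows. This is a minimal illustration and not AINGENS' evaluation tooling: the `Rating` record, its field names, and the `summarize` helper are hypothetical, and the ratings are placeholders filled in only to mirror the totals reported in this release.

```python
from dataclasses import dataclass

STUDIES = 5               # publications uploaded (from the release)
QUESTIONS_PER_STUDY = 15  # predefined questions asked per publication


@dataclass(frozen=True)
class Rating:
    """One reviewer judgment for one response (hypothetical record layout)."""
    study: int
    question: int
    hallucinated: bool
    accurate: bool


def summarize(ratings):
    """Return the response count, hallucination count, and accuracy rate."""
    total = len(ratings)
    hallucinations = sum(r.hallucinated for r in ratings)
    accurate = sum(r.accurate for r in ratings)
    return {
        "total": total,
        "hallucinations": hallucinations,
        "accuracy": accurate / total if total else 0.0,
    }


# Placeholder ratings that mirror the reported totals; not the actual review data.
ratings = [
    Rating(study=s, question=q, hallucinated=False, accurate=True)
    for s in range(1, STUDIES + 1)
    for q in range(1, QUESTIONS_PER_STUDY + 1)
]
summary = summarize(ratings)
print(summary)
```

With one record per question, the same helper would also work for a larger trial set or for comparing ratings from several reviewers.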
Hallucinations Persist in Healthcare, Legal, and Scientific Settings
AI-generated errors carry real consequences. Physicians and pharmacists have used unverified AI recommendations in patient care, creating safety risks. Attorneys have submitted legal briefs with fabricated citations. One case reached the Supreme Court. Generic, open-ended models operating without structured oversight or source verification are known to have higher hallucination risk. Evidence-first platforms like MACg reduce it.
AINGENS' clinical reliability test provides measurable evidence. In a documented, evidence-first configuration, MACg showed 0/75 hallucinations, 100% accuracy, and full source alignment. The platform achieved this by embedding four core design principles aligned with the study findings:
- Document-grounded reasoning: MACg prioritizes uploaded PDFs and PubMed search results for trial‑specific questions and, in this evaluation, consistently grounded its answers in those sources.
- Conservative handling of missing information: When information is incomplete, MACg acknowledges limitations or stays qualitative rather than fabricating trial-specific values.
- Transparent support for claims: In many cases, MACg surfaces verbatim excerpts and inline references, allowing the evaluator to trace claims back to specific passages.
- Contextual integration: The platform links information across methods, results, tables, and figures without importing irrelevant data from other documents.
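Two of these principles, conservative handling of missing information and traceable support, can be illustrated with a small sketch. This is not MACg's implementation, which the release does not describe. It is a deliberately simple pattern-matching example, and the function names, the regular expression, and the two sample sentences are all assumptions made up for illustration.

```python
import re
from typing import Optional, Tuple

# Matches forms such as "p = 0.03", "P<0.001", or "p-value = .04".
P_VALUE = re.compile(r"\bp(?:-value)?\s*([<=>])\s*(0?\.\d+)", re.IGNORECASE)


def extract_p_value(source_text: str) -> Optional[Tuple[str, str]]:
    """Return (value, verbatim excerpt) if the source reports a p-value, else None."""
    match = P_VALUE.search(source_text)
    if match is None:
        return None  # nothing in the source to ground an answer, so do not guess
    value = f"{match.group(1)} {match.group(2)}"
    return value, match.group(0)


def answer(source_text: str) -> str:
    """Answer from the source only, quoting the supporting excerpt."""
    found = extract_p_value(source_text)
    if found is None:
        return "The source does not report a p-value for this outcome."
    value, excerpt = found
    return f'p {value} (source excerpt: "{excerpt}")'


# Made-up sentences for illustration; not taken from any trial publication.
reported = "The hazard ratio was 0.72 (p = 0.0004)."
missing = "The hazard ratio favored treatment."
print(answer(reported))
print(answer(missing))
```

The point of the design is that the missing-value branch returns an explicit statement of absence, and every reported value carries the excerpt it came from.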
Reliability Data Released to Shape Credibility Standards in Medical AI
AINGENS has published the methodology and results of this pilot to increase transparency around AI reliability in medical and scientific communication. The company plans to expand its testing program with larger and more diverse trial sets, multiple independent reviewers, and a more challenging and ambiguous prompt style. The goal is to help establish clearer benchmarks and governance standards for how AI should be evaluated and deployed in evidence‑critical workflows.
"We released this data because the industry needs proof, not promises. Hallucinations are a symptom of design, but if you anchor every output to evidence, you mitigate the risk. Our test proves that it works," concludes Dr. Ogbru.
About AINGENS
AINGENS is a life sciences software company transforming how scientific and medical content is created in regulated healthcare environments. Founded by Ome Ogbru, PharmD, with more than 20 years of experience in pharma and biotech, the company combines deep life sciences expertise with advanced technologies to build integrated AI‑powered platforms that streamline some of the most time‑consuming steps in scientific, clinical, and medical workflows.
Its flagship platform, MACg (Medical Affairs Content Generator), is an end-to-end, evidence-based workspace that integrates real-time PubMed search, document-grounded reasoning, automated citation generation, drafting, slide generation and collaboration in a private, secure environment. By embedding traceability and source alignment directly into the workflow, AINGENS helps medical affairs and medical writing teams accelerate content creation without compromising scientific rigor or regulatory integrity. Learn more at https://macg.ai.
References:
Ogbru, O. (2026). MACg's hallucination, accuracy, and contextual understanding. AINGENS. aingens.com/resources-and-news/reliability-of-ma-cg-for-source-aligned-clinical-trial-data-extraction-hallucination-accuracy-and-contextual-understanding
Minuskin, E. (2025, October 8). EY survey: Companies advancing responsible AI governance linked to better business outcomes. Ey.com; EY. ey.com/en_gl/newsroom/2025/10/ey-survey-companies-advancing-responsible-ai-governance-linked-to-better-business-outcomes
Merken, S. (2026, February 3). Judge fines lawyers $12,000 over AI-generated submissions in patent case. Reuters. reuters.com/legal/litigation/judge-fines-lawyers-12000-over-ai-generated-submissions-patent-case-2026-02-03/
Hollenbeck, S. (2025, September 30). Heavy AI Users Face 3x More Hallucinations and Spend 10x Longer to Get Answers. Rev.com; Rev. rev.com/blog/ai-results
Metz, C., & Weise, K. (2025, May 5). A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful. The New York Times. nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html
Sankaran, A. (2025, December 18). The Hallucination Tax: Generative AI's Accuracy Problem. Forbes. forbes.com/councils/forbesbusinesscouncil/2025/12/18/the-hallucination-tax-generative-ais-accuracy-problem/
Media Inquiries:
Karla Jo Helms
JOTO PR™
727-777-4629
Jotopr.com
SOURCE AINGENS