
Yonyou AI Lab Introduces Framework to Make Enterprise AI Decisions Traceable and Auditable
BEIJING, April 2, 2026 /PRNewswire/ -- Yonyou AI Lab has released a preprint introducing the ontology harness, a framework designed to make enterprise AI decisions fully traceable, verifiable, and auditable. The research argues that the accountability crisis in enterprise AI is not a model capability problem but an architectural one, and proposes a concrete solution backed by experimental data.
When Something Goes Wrong, No One Can Explain Anything
Enterprise AI deployments face a structural accountability gap that accuracy benchmarks cannot capture. When AI-driven decisions in compliance reviews, expense approvals, or supply chain routing go wrong, organizations frequently cannot explain how the decision was reached, what authorization it went through, or whether it can be reproduced. Regulators, auditors, and internal management are not asking whether the conclusion was correct. They are asking where it came from.
Yonyou AI Lab's position is that the answer is not to deploy a smarter model, but to change the architecture entirely: an architecture that makes AI decisions fundamentally traceable at every step. The ontology harness is that architecture.
Ontology Harness: From "Model-First" to "Ontology-First"
In conventional deployments, the large model holds full reasoning authority, treating the enterprise ontology as a data source it may consult or ignore. The ontology harness inverts this. The Enterprise Ontology (EO) becomes the governing authority, encoding every entity, business rule, and authorization boundary, while the Large Ontology Model (LOM) serves as the harness that couples this authority to real business tasks. Every reasoning step must occur within ontology-authorized boundaries. Every decision leaves a verifiable trail.
The core pipeline is event → simulation → decision. Each incoming business event activates scenario conditions pre-encoded in the ontology, which drive deterministic graph mutations in an isolated sandbox, evolving a scenario-valid subgraph from which all decisions are exclusively derived. An answer produced by bypassing this simulation step, even if it happens to be correct, has no compliance basis and cannot be audited.
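The event → simulation → decision pipeline can be sketched in a few lines of Python. This is an illustrative sketch only: the event kind, the expense-limit rule, and all function names are invented for this example and are not the published harness API.

```python
import copy
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    payload: dict

def expense_limit_check(graph, payload):
    # Deterministic graph mutation: derive the decision set from policy.
    limit = graph["policy"]["expense_limit"]
    graph["decision_set"] = ["approve"] if payload["amount"] <= limit else ["escalate"]

# Hypothetical ontology: event kind -> pre-encoded, named scenario rules.
ONTOLOGY_RULES = {"expense.submitted": [("expense_limit_check", expense_limit_check)]}

def simulate(graph, event):
    sandbox = copy.deepcopy(graph)          # isolated sandbox; source graph untouched
    trail = []                              # verifiable audit trail of applied rules
    for name, mutate in ONTOLOGY_RULES.get(event.kind, []):
        mutate(sandbox, event.payload)
        trail.append(name)
    return sandbox, trail

def decide(sandbox):
    # Decisions derive exclusively from the scenario-valid subgraph.
    return sandbox["decision_set"][0]

graph = {"policy": {"expense_limit": 500}}
event = Event("expense.submitted", {"amount": 420})
sandbox, trail = simulate(graph, event)
print(decide(sandbox), trail)   # approve ['expense_limit_check']
```

Note that an answer produced without calling simulate would carry an empty trail, which is exactly the condition an auditor would flag.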
Why General-Purpose LLMs Fall Short
Consumer AI operates under a single-stage contract: find the best answer across the full knowledge space. Enterprise AI requires a two-stage contract that cannot be reversed: first determine the scenario-valid decision set through ontology-governed simulation, then find the optimal solution within that set. Stage one is a non-skippable compliance gate. General-purpose LLMs treat scenario constraints as soft preferences and skip simulation when they believe they already know the answer. This is not a model defect; it is a consequence of how they are trained. Deploying them directly into core enterprise processes with audit requirements is using the wrong architecture for the job.
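The irreversibility of the two-stage contract can be made concrete with a small sketch. The candidates, the region-based validity rule, and the cost score below are all invented for illustration, assuming a routing scenario where only certain regions are authorized.

```python
def scenario_valid_set(candidates, context):
    """Stage 1 (non-skippable compliance gate): keep only authorized options."""
    return [c for c in candidates if c["region"] in context["authorized_regions"]]

def best_within(valid, score):
    """Stage 2: optimize strictly inside the scenario-valid decision set."""
    if not valid:
        raise ValueError("no compliant option; must not fall back to the full space")
    return max(valid, key=score)

candidates = [
    {"route": "A", "region": "EU", "cost": 120},
    {"route": "B", "region": "US", "cost": 90},   # globally cheapest, but unauthorized
    {"route": "C", "region": "EU", "cost": 150},
]
context = {"authorized_regions": {"EU"}}

valid = scenario_valid_set(candidates, context)
choice = best_within(valid, score=lambda c: -c["cost"])
print(choice["route"])   # A: cheapest *compliant* route, not the global optimum B
```

Reversing the stages (optimize first, then filter) would select route B and only afterward discover it is unauthorized, which is precisely the failure mode attributed to general-purpose LLMs.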
Validating the Architecture with Data
Yonyou AI Lab benchmarked LOM-action, its ontology-governed model, against leading general-purpose LLMs across a set of enterprise graph reasoning tasks. The research introduces Tool-Chain F1, a metric that measures whether a model genuinely follows the complete simulation pipeline rather than arriving at correct answers by bypassing it. The gap between answer accuracy and Tool-Chain F1 is defined as Illusive Accuracy (IA): the higher the IA, the more "correct answers" were obtained by bypassing the architecture with no audit basis.
Results confirm a decisive architectural advantage. LOM-action achieved 93.82% accuracy and 98.74% Tool-Chain F1, against just 24–36% F1 for frontier models despite similar answer accuracy, a gap of roughly four-fold. A manual review of 50 frontier-model outputs on basic tasks found that 47 were pure text responses invoking no tools whatsoever: the model answered from parametric memory, happened to be correct, and left no verifiable trace. On the scenario simulation tasks that most closely resemble real enterprise decisions, LOM-action achieved 100% accuracy with a Tool-Chain F1 of 98.7%, while frontier models reached only 64–66% accuracy with F1 below 35%. The research team recommends a Tool-Chain F1 of at least 0.90 and an Illusive Accuracy of no more than 0.30 as deployment-readiness thresholds for simulation-sensitive systems.
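The metric and the recommended thresholds reduce to simple arithmetic. The sketch below assumes only the definitions given in this release: Illusive Accuracy is the gap between answer accuracy and Tool-Chain F1, and deployment readiness requires Tool-Chain F1 ≥ 0.90 with IA ≤ 0.30.

```python
def illusive_accuracy(answer_accuracy, tool_chain_f1):
    # IA: the share of "correct answers" with no architectural audit basis.
    return answer_accuracy - tool_chain_f1

def deployment_ready(answer_accuracy, tool_chain_f1, min_f1=0.90, max_ia=0.30):
    ia = illusive_accuracy(answer_accuracy, tool_chain_f1)
    return tool_chain_f1 >= min_f1 and ia <= max_ia

# Figures as reported in the release:
print(deployment_ready(0.9382, 0.9874))  # LOM-action: True
print(deployment_ready(0.65, 0.35))      # frontier-model range: False
```

A model can therefore score well on accuracy alone and still fail the readiness check, which is the failure mode Tool-Chain F1 is designed to expose.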
Four Principles for Production Deployment
The paper also outlines four engineering principles for production deployment of the ontology harness:
- Business logic belongs in the ontology, not code. Code-implemented rules duplicate knowledge and create unauditable bypass paths.
- All context must be ontology-aligned. Every entity entering the reasoning pipeline must be mapped to its canonical ontology code before any simulation begins.
- Ontology-fine-tuned models should be preferred over general-purpose LLMs for all applicable pipeline tasks. A smaller model trained to enforce the simulation pipeline is more trustworthy than a larger model that treats constraints as optional.
- The ontology schema and graph query logic must remain human-readable through audit interfaces. This enables operators to verify which nodes were matched, which were pruned, and from which graph state decisions were derived.
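The second principle, ontology alignment of all context, can be sketched as a hard mapping gate. The ontology codes and entity aliases below are hypothetical placeholders, not real Yonyou identifiers.

```python
# Hypothetical canonical-code table: surface forms -> ontology codes.
CANONICAL = {
    "travel expense": "EXP-TRAVEL-001",
    "t&e": "EXP-TRAVEL-001",
    "office supplies": "EXP-SUPPLY-002",
}

def align(entity: str) -> str:
    """Map an entity to its canonical ontology code before any simulation."""
    code = CANONICAL.get(entity.strip().lower())
    if code is None:
        # Unmapped entities must not enter the reasoning pipeline.
        raise KeyError(f"no canonical ontology code for {entity!r}")
    return code

print(align("T&E"))   # EXP-TRAVEL-001
```

The point of failing hard on unmapped entities is that a silent pass-through would create exactly the kind of unauditable bypass path the first principle warns against.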
The full preprint is available at: https://www.preprints.org/manuscript/202604.0021
About Yonyou AI Lab
Yonyou AI Lab is the AI research division of Yonyou Network Technology Co., Ltd., focusing on enterprise AI architecture, ontology-driven reasoning, and auditable decision frameworks.
SOURCE Yonyou