PSG Consulting and Innovating for the Public Good Groundbreaking Report on Large Language Models Exposes Imbalance in Training Data

WASHINGTON, Feb. 24, 2026 /PRNewswire/ -- PSG Consulting and Innovating for the Public Good: R&D for Democracy (IFPG) today released original research revealing that the largest single source of AI Large Language Model (LLM) training data is structurally skewed along ideological and factual lines and is vulnerable to ongoing manipulation. LLMs now shape how people encounter facts, narratives and moral claims. They function as unseen editors of the public sphere. Political bias and factual inaccuracies in these systems are not peripheral issues — they threaten democracy's strength and sustainability.

The report, AI Large Language Model Training: The Potential Risks of Ideological Skewing, presents first-of-its-kind research conducted by Dewey Square Group analyzing the sources and types of websites that allow or block "AI web crawlers" — the automated tools that collect open-web training data for LLMs.

The original research analyzes 27 AI web crawlers across 153 U.S. news and political outlets, categorized by ideological lean and factual reliability using ratings from the independent Media Bias/Fact Check. The sample includes conservative, non-partisan and left-leaning websites. It finds that center-left and high-factuality outlets impose the strictest crawler blocks, while conservative, far-right and low-factuality sites impose far fewer. As a result, the portion of the open web available for training LLMs becomes structurally skewed toward lower-quality and more politically conservative material.
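This release does not state how crawler blocks were detected. One common mechanism is a site's robots.txt file, which lists the user agents it allows or disallows. The following is a minimal Python sketch under that assumption, not the study's method. The robots.txt content is hypothetical, the crawler names are examples only (the release does not list the 27 crawlers tracked), and `crawler_allowed` and `accessible_share` are illustrative helper names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; not taken from any outlet in the study.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def accessible_share(robots_txt: str, user_agents: list[str], url: str) -> float:
    """Fraction of the listed crawlers that may fetch url under this robots.txt."""
    allowed = [ua for ua in user_agents if crawler_allowed(robots_txt, ua, url)]
    return len(allowed) / len(user_agents)


if __name__ == "__main__":
    agents = ["GPTBot", "CCBot"]  # example user agents only
    print(accessible_share(SAMPLE_ROBOTS_TXT, agents, "https://example.com/story"))
```

Repeating a check like this for every outlet and crawler would yield an allow/block grid of the kind the findings summarize. Note that robots.txt is advisory, so a sketch like this captures a publisher's stated policy rather than what any crawler actually does.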

Key Findings

Center-left outlets impose the strictest restrictions on AI crawlers, with less than 40% of data accessible to crawlers.

Far-right sites impose few restrictions and are nearly 80% accessible to AI crawlers.

Among the seven highest-impact AI web crawlers, the median center-left outlet blocks 100% of access, while the median conservative or far-right outlet blocks none.

Websites rated "very low" for factual accuracy are approximately 90% accessible to AI crawlers, while high-factuality outlets fall below 50%.

Center-left sites explicitly block specific AI crawlers in approximately 32% of interactions across all 27 crawlers tracked, roughly 8× the conservative rate and nearly 10× the far-right rate.

The research documented approximately a dozen major public licensing agreements with AI platforms, collectively worth hundreds of millions of dollars and covering more than 50 publications across the political spectrum. However, these deals do not correct the underlying asymmetry: conservative and far-right content often enters training pipelines through both open-web crawling and licensing agreements, while center-left content that is blocked from open-web crawling must rely on licensing as its primary pathway into training data.

AI "Bothsides-ism" has emerged as a new risk category. This is evidenced in Anthropic's November 2025 report on Measuring Political Bias in Claude, which introduces a "political even-handedness" measure. Critics have raised concerns that such approaches could incentivize false equivalence, for example by presenting vaccine science and anti-vaccine claims with equal depth.

Transparency in AI training data has sharply declined since 2020. Subsequent model releases from major developers have offered only general descriptions, with few source breakdowns or detailed dataset proportions, making it impossible for independent researchers to verify whether training data is being responsibly sourced and implemented. Fine-tuning and alignment processes may partially offset the asymmetries documented in this research, but again, the AI platforms disclose too little about these interventions for independent researchers to verify their effectiveness.
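The accessibility percentages and median block rates in the findings above are per-group aggregates over an outlet-by-crawler grid. The following Python sketch shows one way such figures could be computed and why a group's median outlet can block 100% of crawlers even when the group-wide accessible share is above zero. All outlet names, group labels, crawler names and values are hypothetical placeholders, not data from the report, and the report's actual aggregation method may differ.

```python
from statistics import median

# Hypothetical crawler list and outlets; none of this is data from the report.
CRAWLERS = ["crawler_a", "crawler_b", "crawler_c", "crawler_d"]

# outlet -> (group label, set of crawlers that outlet blocks)
OUTLETS = {
    "outlet_1": ("group_x", {"crawler_a", "crawler_b", "crawler_c", "crawler_d"}),
    "outlet_2": ("group_x", {"crawler_a", "crawler_b", "crawler_c"}),
    "outlet_3": ("group_x", {"crawler_a", "crawler_b", "crawler_c", "crawler_d"}),
    "outlet_4": ("group_y", set()),
    "outlet_5": ("group_y", {"crawler_a"}),
    "outlet_6": ("group_y", set()),
}


def block_rates_by_group(outlets, crawlers):
    """Map each group to the list of per-outlet block rates (0.0 to 1.0)."""
    tracked = set(crawlers)
    rates = {}
    for group, blocked in outlets.values():
        rates.setdefault(group, []).append(len(blocked & tracked) / len(tracked))
    return rates


def summarize(outlets, crawlers):
    """Per group: mean accessible share and the median outlet's block rate."""
    summary = {}
    for group, rates in block_rates_by_group(outlets, crawlers).items():
        summary[group] = {
            "accessible_share": 1 - sum(rates) / len(rates),
            "median_block_rate": median(rates),
        }
    return summary


if __name__ == "__main__":
    for group, stats in summarize(OUTLETS, CRAWLERS).items():
        print(group, stats)
```

In this toy grid, the median outlet in `group_x` blocks every crawler while the median outlet in `group_y` blocks none, which mirrors the shape of the median comparison reported above without reproducing its numbers.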

"Our research reveals that the least factually accurate media outlets are the very outlets that are most accessible to AI crawlers. Meanwhile, the most accurate media outlets have notably lower exposure to AI crawlers," said Tim Chambers, Principal and Co-Founder of Dewey Square Group. "These results show a clear inverted funnel: as factual reliability increases, exposure decreases. This is concerning, especially as LLMs wield increasing power over the information made available through traditional online search, social media placements, personal assistants and new venues emerging every day."

"The findings in this report should be a call to action on the part of policymakers in Congress and in the states. As this technology increasingly impacts every aspect of our lives, it is imperative that we ensure factual distortions and imaginary realities are not allowed to shape public understanding. We either regulate these powerful AI tools or we let them regulate us," said Page Gardner, President of PSG Consulting and Founder of IFPG.

The report documents how data bias can arise through uneven crawler access, where publisher blocking patterns determine which content is available for AI training. These asymmetries can be compounded by later weighting and tuning decisions that remain opaque and unavailable for independent review. But these vulnerabilities are not the only risks across the chain of influence shaping LLM outputs. Human bias can enter through the demographics of contractors who rate model responses. Structural bias can enter through the concentration of capital, media and AI ownership, which gives a small network of investors and executives outsized influence. Regulatory lag allows these forces to interact with little oversight and an increasing lack of transparency.

Regulators and lawmakers have both the authority and the responsibility to address these structural asymmetries, and they must act to ensure a fair and accurate information ecosystem for all Americans. The report concludes with strategic recommendations for policymakers, developers, and state and federal regulators.

To download the full report, visit: https://www.psgconsulting.com/research-publications/potential-risks-of-ideological-skewing

