SAN FRANCISCO, Oct. 25, 2024 /PRNewswire/ -- The Common Crawl Foundation, a non-profit organization founded in 2007 and dedicated to providing a copy of the internet to the public, and Constellation Network, a Web3 blockchain ecosystem notable for providing solutions to the US Department of Defense, today announced a strategic partnership aimed at using blockchain technology to democratize and enhance the accessibility and utility of web-crawled data for artificial intelligence (AI) and data applications.

This collaboration will explore potential opportunities for improving the large language models used by AI, starting with Common Crawl's vast dataset, which is used by 80% of large language models, spans more than 250 billion web pages crawled to date (19 billion in 2024 alone), and consists of an archive of nearly 9 petabytes of crawled data. By leveraging Constellation's decentralized network, Hypergraph, to add immutability, provenance, and auditability around the data, the partnership aims to provide joint solutions for responsible and transparent AI.

With AI projected to be a $3T industry by 2030, demand is growing for secure ways to share the common data sets used to train large language models, better storage of queried and cleaned data, opportunities to monetize data, and greater transparency about the source of data. Constellation offers a unique approach to providing tools that converge existing infrastructure with distributed and decentralized networks. Together with Common Crawl's history of data collection and growing data utility, this partnership aims to further democratize data.

"This partnership represents a significant step forward in securing trusted distribution of Common Crawl," said Rich Skrenta, Executive Director of the Common Crawl Foundation. "By combining our comprehensive web archive with Constellation's proven implementation of blockchain technology, researchers and developers from around the world can trust what they're getting from Common Crawl and have a model for authenticating large open data sets, such as those used for AI training."

Ben Jorgensen, CEO of Constellation Network, states, "The partnership between Constellation Network and Common Crawl highlights mainstream adoption of Web3 solutions outside the echo chambers of crypto. This alignment continues Constellation's mission of our zero-trust network being used as a public good for a data-focused future." Jorgensen continues, "Our aim is to further attract new developers by showcasing capabilities, such as integrating immutability throughout digital workflows, and thus further differentiate ourselves from earlier generations of blockchain technology."

The two organizations will take a phased approach to implementing this initiative, starting with a customizable subnet, called a metagraph, which will integrate a subset of Common Crawl's data. This subnet is currently live on Constellation's test network and will soon be deployed to Constellation's public network, Hypergraph. Further details of the live metagraph will be shared in the coming weeks, along with information on how organizations and developers can participate.

About Common Crawl Foundation

The Common Crawl Foundation is a 501(c)(3) non-profit organization dedicated to providing a copy of the internet to the public, free of charge. Its web archive consists of petabytes of data collected over years of web crawling, serving as a critical resource for researchers, businesses, and developers worldwide.

About Constellation Network

Constellation Network is a Web3 blockchain ecosystem that bridges crypto economies with traditional businesses. Its flagship network, Hypergraph, provides a solution for fast, scalable, and zero-fee transactions. Constellation's network is validated by the US Department of Defense, which has been a customer since 2019.

