Artificial intelligence (AI) has rapidly changed the way we live and work. Still, the challenge of AI data bias has come to the fore. As we move towards the future of Web3, it is only natural that we will see innovative new products, solutions and services that leverage both Web3 and AI together. And while some commentators argue that decentralized technologies may be the answer to data bias, that couldn’t be further from the truth.
The size of the Web3 market is still relatively small and difficult to quantify: the ecosystem is in its early stages of development, and the exact definition of Web3 is still evolving. The market was estimated at close to $2 billion in 2021, and various analysts and research firms have reported an expected compound annual growth rate (CAGR) of around 45%. Combined with the rapid development of Web3 solutions and consumer adoption, that puts the Web3 market on track to reach about $80 billion by 2030.
While the industry is growing rapidly, its current state, coupled with other tech industry factors, is pushing AI data bias in the wrong direction.
The relationship between bias, quality and volume
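The compounding behind a projection like this can be sketched in a few lines. The snippet below is a minimal illustration, assuming the rounded inputs quoted above (a base of $2 billion in 2021 and a 45% CAGR); the function name and constants are hypothetical. Because the inputs are rounded and analysts may use a different base year or value, it is a way to sanity-check the order of magnitude rather than to reproduce the $80 billion figure.

```python
def project_market_size(base_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward: base_value * (1 + cagr) ** years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return base_value * (1.0 + cagr) ** years


# Assumed, rounded inputs taken from the text (billions of U.S. dollars).
BASE_YEAR = 2021
BASE_VALUE = 2.0
CAGR = 0.45

for end_year in (2025, 2030):
    size = project_market_size(BASE_VALUE, CAGR, end_year - BASE_YEAR)
    print(f"{end_year}: roughly ${size:.1f} billion")
```

Small changes to the base value, the rate, or the number of compounding years move the result by tens of billions of dollars, which is one reason estimates of the Web3 market vary so widely.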
AI systems rely on large amounts of high-quality data to train their algorithms. OpenAI’s GPT-3 family of models, which underpins ChatGPT, was trained on a huge amount of high-quality data. The exact amount of data used for training has not been disclosed by OpenAI, but it is estimated to be in the hundreds of billions of words or more.
This data was filtered and pre-processed to ensure high quality and relevance to the language generation task. OpenAI used advanced machine learning (ML) techniques such as transformers to train the model on this large dataset, enabling it to learn patterns and relationships between words and phrases and generate high-quality text.
The quality of the AI training data has a significant impact on the performance of a machine learning model, and the size of the dataset can also be a critical factor in determining the model’s ability to generalize to new data and tasks. But it is also true that both quality and volume have a significant impact on data bias.
Unique risk of bias
Bias in AI is an important issue as it can lead to unfair, discriminatory and harmful effects in areas such as employment, credit, housing and criminal justice, among others.
In 2018, Amazon scrapped an AI recruitment tool that showed bias against women. The tool was trained on resumes submitted to Amazon over a 10-year period that came mostly from male applicants, leading the AI to downgrade resumes containing words like “women” and “female”.
In 2019, researchers discovered that a commercially available AI algorithm used to predict patient outcomes was biased against Black patients. The algorithm was trained on data mainly from Caucasian patients, leading to a higher false positive rate for Black patients.
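A gap in false positive rates of this kind can be measured directly once predictions are broken out by group. The sketch below is illustrative only: the function name, group labels and records are hypothetical toy data, not figures from the study mentioned above.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: FPR} for (group, actual, predicted) boolean records.

    FPR = false positives / all actual negatives, computed per group.
    Groups with no actual negatives are omitted.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, actual, predicted in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


# Hypothetical toy data: (group, actual outcome, model prediction).
records = [
    ("group_a", False, False),
    ("group_a", False, False),
    ("group_a", False, True),
    ("group_a", False, False),
    ("group_a", True, True),
    ("group_b", False, True),
    ("group_b", False, True),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", True, True),
]

rates = false_positive_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: false positive rate = {rate:.2f}")
```

In this toy data the model wrongly flags one in four negatives in group_a and two in four in group_b. A single aggregate accuracy figure would hide that gap.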
The decentralized nature of Web3 solutions combined with artificial intelligence creates a unique risk of bias. The quality and availability of data in this environment can be challenging, making it difficult to thoroughly train AI algorithms, not only because of the lack of Web3 solutions in use, but also because of the population that is able to use them.
We can draw an analogy from genomic data collected by companies like 23andMe, which are biased against poor and marginalized communities. The cost, availability, and targeted marketing of DNA testing services such as 23andMe limit access to these services for those in low-income communities or those living in a region where the service does not operate, such as poorer and less developed countries.
As a result, the data these companies collect may not accurately reflect the genomic diversity of the wider population, leading to potential inaccuracies in genetic testing and in the development of healthcare and medical treatments.
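One simple way to quantify this kind of sampling gap is to compare each group’s share of a dataset with its share of the wider population. The following is a minimal sketch under stated assumptions: the group names, sample counts and population shares are all hypothetical and chosen only to illustrate the calculation.

```python
from collections import Counter


def representation_ratios(sample_groups, population_shares):
    """Return {group: sample share / population share}.

    A ratio near 1.0 means the group is represented proportionally;
    below 1.0 means under-represented, above 1.0 over-represented.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {
        group: (counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
        if share > 0
    }


# Hypothetical population shares and a hypothetical dataset of 100 records.
population_shares = {"high_income": 0.2, "middle_income": 0.5, "low_income": 0.3}
sample_groups = ["high_income"] * 60 + ["middle_income"] * 35 + ["low_income"] * 5

ratios = representation_ratios(sample_groups, population_shares)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}x its population share")
```

With these made-up numbers, the low-income group appears at well under a fifth of its population share, which is the pattern the genomic example describes.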
And that brings us to another reason why Web3 increases AI data bias.
Industry bias and focus on ethics
The lack of diversity in the Web3 start-up industry is a major concern. As of 2022, women held 26.7% of tech jobs. Of these, 56% are women of color. Women are even less represented in technology leadership positions.
In Web3, this imbalance is even greater. According to various analysts, fewer than 5% of Web3 startups have a female founder. This lack of diversity means there is a high probability that AI data bias will be unknowingly overlooked as a problem by male and Caucasian founders.
To meet these challenges, the Web3 industry must prioritize diversity and inclusivity in both its data sources and its teams. In addition, the industry needs to rethink why diversity, equality and inclusion are necessary.
From a financial and scalability point of view, products and services designed from diverse perspectives are more likely to serve billions of customers rather than millions, making startups with diverse teams more likely to achieve high returns and to scale globally. The Web3 industry also needs to focus on data quality and accuracy, ensuring that the data used to train AI algorithms is free of bias.
Can Web3 find an answer to AI data bias?
One solution to these challenges is the development of decentralized data markets, which enable secure and transparent exchange of data between individuals and organizations. This can help reduce the risk of biased data as it allows a wider range of data to be used in training AI algorithms. In addition, blockchain technology can be used to ensure the transparency and accuracy of data so that algorithms are not biased.
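The transparency property that blockchains offer rests largely on hash-linking: each record’s hash depends on the record before it, so later changes are detectable. The sketch below illustrates that idea with Python’s standard `hashlib`; it is not a real blockchain or any particular data market’s design, and the function names and record fields are hypothetical. It shows tamper evidence only; it says nothing about whether the underlying data is representative.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first link


def record_hash(record: dict, previous_hash: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so each entry commits to everything before it."""
    chain = []
    previous = GENESIS_HASH
    for record in records:
        current = record_hash(record, previous)
        chain.append({"record": record, "prev": previous, "hash": current})
        previous = current
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    previous = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != previous:
            return False
        if record_hash(entry["record"], previous) != entry["hash"]:
            return False
        previous = entry["hash"]
    return True


# Hypothetical training records contributed to a shared dataset.
chain = build_chain([{"id": 1, "label": "approved"}, {"id": 2, "label": "denied"}])
print(verify_chain(chain))  # the untouched chain verifies

chain[0]["record"]["label"] = "denied"  # simulate tampering after the fact
print(verify_chain(chain))  # verification now fails
```

A check like this can show that a dataset has not been altered since it was recorded, which supports auditing. Whether the recorded data covers a wide enough population is a separate question.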
But ultimately, finding broad sources of data will remain a major challenge for many years, until Web3 solutions are used by a mainstream audience.
While Web3 and blockchain continue to feature in the mainstream news, such products and services are likely to appeal to people in the startup and tech communities — who, as we know, lack diversity, but who are also a relatively small slice of the global pie.
It is difficult to estimate what percentage of the world’s population works in startups. In recent years, the industry has created approximately three million jobs in the United States. Set against the entire U.S. population, and disregarding job losses, the tech industry is not remotely representative of working-age citizens.
Until Web3 solutions become more widespread, extend their appeal beyond those inherently interested in technology, and become sufficiently affordable and accessible to a wider population, access to high-quality data in sufficient quantities to train AI systems will remain a major obstacle. The industry must take steps to address this problem now.
Alexandra Karpova is the head of marketing at Lumerin.