Cascading AI Biases
A large language model is a type of system that’s been trained on a large corpus of data and is capable of generating new content based on the patterns it picked up from that corpus.
The most popular (as of the mid-2020s) artificial intelligence tools, like ChatGPT, Gemini, and Claude, are LLM-based AIs, and though this is only one approach to building AI systems, it’s proven to be immensely flexible in terms of the types of data ingested, the types of media generated, and the utility such tools provide to all sorts of users.
There’s been a fair amount of concern about seeming bias in these systems, though, and that concern has increased as they’ve become more popular.
AI tools are being used for facial recognition, for instance, and some populations (people with dark skin in particular) are more commonly misidentified by these systems, which has led to some pretty significant, negative consequences for folks who have been misidentified.
The suspicion is that this particular type of bias is the result of training data that’s heavily weighted toward light-skinned faces. But because these tools are trained on all the stuff people have made over the generations, and because people are biased, there’s bound to be quite a lot of latent favoritism (and the opposite) baked into these systems, as well.
Interestingly, recent research has shown that in addition to the human-derived biases these LLM-based AI systems carry, they may also be creating their own internal biases, and then sharing those biases with each other.