Understanding Gap
Evolutionary linguistics is the study of language through the lens of biological evolution and psychology.
The general idea is that language should be considered an extension of biology: a natural thing, not a synthetic invention. Structural linguistics, which focuses more on how language systems interact with each other and change as they conflict and entwine, can help us understand how and why language changes over time in different parts of the world. Evolutionary linguistics, by contrast, attempts to shine a light on how and why language changes based on natural and social pressures. Just as a bird’s beak might evolve based on the types of seeds available to eat, a language might evolve based on the priorities, needs, threats, and opportunities faced by those who speak it.
Today, many of the most prominent threats and opportunities, especially for young people, are found in the digital world, and that includes social interactions, status games, and the pursuit of knowledge and power.
Language has always tended to evolve between generations, as our generational labels are roughly based on the changing realities in which groups of people born at different times live their lives (and importantly, the realities they experience at different ages: as young children, as teens, as young adults, etc.). Because new generational groups are shaped by different variables than previous generations, the language they speak also changes.
A recent study—co-authored by someone who’s part of the youngest current US generation, Generation Alpha, which encompasses Americans who were born between 2010 and 2024—looked at how the rapid evolution of language within this age demographic has made moderating online spaces difficult.
More specifically, because of the blending of language and emoji, and how phrases can have multiple (and sometimes opposite) meanings based on context, it’s become near-impossible for human or software moderators to keep tabs on online spaces to prevent bullying and abuse. The language, this paper posits, is just changing too rapidly for these “adults in the room” to keep up.
The folks behind this paper found that Gen Alpha respondents could understand the Gen Alpha lingo with which they were presented (“in my flop era,” “secured the bag,” “and I oop,” and “let him cook,” among many others) at a basic level 98% of the time, in different contexts 96% of the time, and with regard to its safety ramifications 92% of the time. Parents were only able to do so 68%, 42%, and 35% of the time, respectively, and professional moderators only managed 72%, 45%, and 38%.
The researchers also tested four mainstream LLM-based AI systems (GPT-4, Claude, Gemini, and Llama 3), which scored, on average, about the same as the parents.
That latter point is interesting, as a lot of online moderation is being performed by AI systems (at least the front-line work) these days, and these systems seem to be trained on a corpus of data that doesn’t include a sufficient amount of Gen Alpha lingo, at least in the proper context. These systems either fail to parse different meanings of these words and emoji based on differing contexts, or they simply don’t have enough training data to attempt that parsing.
Which makes a sort of sense, considering how much more lingo from other generations is in current AI training sets (Gen Alpha is very new, after all). But this dearth creates an odd, dissonant situation, as Gen Alpha as a demographic spends a lot more time online than any other generation, and was born into a world in which everyone lived on their smartphones. They’re native denizens of these digital spaces, then, but the online worlds they occupy were shaped by people who had different worldviews, experiences, and consequently, linguistic norms.
Adults will almost certainly slowly pick up on some of these linguistic changes, probably adopting enough current Gen Alpha lingo to make it uncool amongst the youths, prompting young people to move on to other lingo (same as it ever was). AI systems will slowly incorporate more of such data, too, perhaps even prioritizing it as the corpora of existing data produced by previous generations are exhausted, and as the companies behind these systems make deals with the platforms on which Gen Alpha users spend most of their online time to gobble up what’s being produced, today, in real time.
In the immediate future, though, there may be stratification in not just our generational, but also our digital linguistic landscapes, and that could create gaps that allow all sorts of abuse to occur, unrecognized and unmoderated.