Opinion Piece
For the last 20 years, Wikipedia and its sister projects have been a beacon of free knowledge. A global network of volunteers has managed to create the single biggest collection of human knowledge the world has ever seen – it’s a truly remarkable story. In a world of trillion-dollar tech giants, Wikipedia sustains its place among the top 10 websites worldwide with nothing but donations from grateful readers and an army of volunteers.
As I arrived at the annual ‘Wikimania’ conference, this year hosted by Katowice, Poland’s City of Science, I wasn’t quite sure what to expect. The last time I attended this gathering of Wiki’s great and good was five years ago, and the world has changed somewhat in that time – and I’m not talking about Covid. As 1,000 Wikimedians from 143 countries shuffled into the auditorium, the first thing that struck me was that the Wiki community is as strong, enthusiastic and passionate as ever.
I was thrilled to learn that other organizations, like the Czech National Library, are engaging with Wikidata (think Wikipedia, but as linked open data) to enrich authority and catalogue data – something I’ve been banging on about for years. A number of community-led authority control initiatives from India, Italy and Switzerland were also showcased. Seeing others tread the same path offers both vindication and an opportunity to swap notes and learn from each other.
I also learned about a number of great projects aimed at using Wiki platforms to document endangered languages, heritage and archives, often in places which lack the basic infrastructure to do this in an official, state-led capacity.
But I’m skirting around the biggest issue. Imagine you could just ask a helpful computer to scour the internet for the exact facts you need, or to summarize complicated topics in seconds. Now imagine that this machine could learn your habits, the way you like to receive information, the languages you speak, and then personalize every response to you. Wouldn’t that pose an existential threat to Wikipedia – and to any other website that presents knowledge, in just about any format?
Yes, I’m talking about AI – in particular, large language models (LLMs) like ChatGPT and Gemini. Using more electricity than a small country, these goliath machines get more powerful with every iteration, and whilst there is still some debate as to whether they will ever be able to rid themselves of the occasional hallucination or get to grips with the nuances of human knowledge, I’ve seen enough to know that they present a viable threat to traditional websites as destinations.
Let’s be clear: I’m not anti-AI. There are exciting opportunities to use this tech to make our lives easier, to hand over mundane and repetitive tasks to the machines so that we can focus on the things that make us human – creativity, critical thinking, community building and the like. However, trusting an algorithm with the world’s knowledge makes me uneasy. For a decade the likes of Google have relied on Wikipedia as their main source of knowledge. And whilst Wikipedia is not perfect, we all have the power to effect change, to root out misinformation, to tackle gender bias or to expand coverage in different languages. It’s an interactive and democratic process.
Last year a Google update known as the ‘Killer Whale Update’ led to a 50% drop in Wikipedia being referenced as a source of facts about people. Instead, Google is getting its information from its ‘Knowledge Vault’ – an amalgam of billions of pieces of information scraped together from all over the web, with little or no human oversight, to create a ‘trusted’ source of knowledge. Its very name conjures up images of a dark, shady, impenetrable bunker. Unlike Wikimedia projects, there is no place here for public scrutiny and interaction.
We have also seen the introduction of AI-generated summaries for web search results, and the top brass at Wikimedia have openly voiced their concern that Google’s plans to expand this feature erode the established practice of providing a list of websites as the primary search output. This threatens to drastically reduce traffic to Wikipedia, and whilst the encyclopedia may still play a role in informing AI, without that traffic donations will likely tumble. It also raises the question: will editors still contribute without the clear incentive of a large (human) readership?
“A world without Wikimedia places knowledge in the hands of corporate organizations whose primary incentive is money, not truth”
A world without Wikimedia places knowledge in the hands of corporate organizations whose primary incentive is money, not truth. It’s likely that the AI snake has already started eating its tail, consuming and learning ‘facts’ from other AI-generated content – the original sources of which have long since been lost along the way.
My main takeaway from this year’s Wikimania is that we have reached a watershed moment, like the dot-com boom and the birth of Wikipedia itself. If Wikipedia is to survive when the dust settles on the GenAI revolution, it must adapt. In fact, any website that purely offers information – from historical knowledge to medical advice or product reviews – could well end up serving primarily as a repository rather than a destination, although how exactly Google plans to maintain advertising revenues whilst diverting traffic away from websites remains to be seen.
“It’s sink or swim, and I fear sinking would erode the quality and diversity of the online information we increasingly depend on.”
The free and open knowledge movement has to find a way to adapt to a changing internet. It’s sink or swim, and I fear sinking would erode the quality and diversity of the online information we increasingly depend on.
The National Library of Wales, too, has to adapt. We must continue to support open knowledge by collaborating with Wikimedia and others, but we must also accept the new reality. Our justification for working so closely with Wikimedia has been that ‘that’s where people go for their knowledge, so we have a duty to support the quality and diversity of content relating to Wales, its language and people’. If that is no longer the case, then we have a duty to find new ways of contributing to, and influencing, the datasets which power the AI knowledge engines of the future.
Jason Evans
Open Data Manager at the National Library of Wales