Could Programs Like Wiki Bot Ever Produce All Internet Content?
Imagine finishing a novel, and realizing that it’s one of the best novels you’ve ever read. Then, someone tells you that the novel was written by a robot. Would you believe them?
Today, the world of linguistics and artificial intelligence is in the earliest, pioneering stage of developing “bot” authors. At present, at least two of the most significant content producers on the Internet, Wikipedia and the Associated Press, use robots to write online articles.
At first blush, this seems like a shocking evolution of the art of writing. Most people believe that there are certain human tasks or jobs that could never be replaced by robots — and an activity that’s as creative and complex as writing is one of those. Or is it?
The Wiki Bot
The virtual droid that has received the most press recently is a Wikipedia bot named Lsjbot. It is the creation of Sverker Johansson of Sweden, who wrote code to scrape information from a number of trusted sources and piece it together into short articles, called “stubs”, on topics related to animal taxonomy.
Lsjbot reportedly pumps out 10,000 articles per day and has so far written more than 2.7 million articles, all of them human-readable and intelligible. Media reports such as one in Popular Science claim that this represents “8.5 percent of the articles on Wikipedia”. However, as the Wikimedia blog explains, those Swedish-language articles make up a large share of Swedish Wikipedia, but none of them appear on the far larger and more popular English Wikipedia.
That doesn’t mean English Wikipedia is free from the bot invasion. The real invasion started back in 2002, when Wikipedia editor “Ram-Man” created a program he called “Rambot”. It was essentially a script that scraped tables from the U.S. Census and pushed out thousands of articles per day covering just about every small town, city, and county in the United States, along with some municipalities in other countries.
Just about any place you search for on Wikipedia likely had its first-draft article created by Rambot. Even the tiny 800-person town where I grew up has its own Wikipedia page, created in 2002!
Other Wikipedia bots over the years have included:
- Robbot – A bot that initially was used to resolve interlanguage links, and eventually to resolve disambiguation page links.
- Asteroids – This bot scraped NASA data and wrote thousands of Wiki articles about asteroids.
Today just under a thousand bots prowl Wikipedia, constantly editing existing pages whenever they find errors or omissions. The most active is Cydebot, which to date has made more than 4.5 million edits to Wikipedia pages.
Other Bot-Created Content
In July of this year, the Associated Press announced that it would be producing automated, robot-written business articles. Forbes reportedly uses bots to post short stock-based articles about companies that are doing well in the market.
The most impressive use of bot technology for article creation came from journalist and programmer Ken Schwencke of the Los Angeles Times, who wrote a program called Quakebot that automatically writes articles about earthquakes only moments after they occur. The data for the articles comes directly from U.S. Geological Survey alerts. In an interview with Slate, Schwencke reported that earlier this year, thanks to Quakebot, the LA Times became the first media outlet to report on a morning tremor, publishing within three minutes of the quake.
The post consisted of only four short paragraphs and was generated by inserting the relevant data into a template that Schwencke had written ahead of time.
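To make the template approach concrete, here is a minimal Python sketch of how a Quakebot-style story might be assembled. It assumes a hypothetical `QuakeAlert` record and template wording invented for illustration; the actual Quakebot template and the exact fields it reads from USGS alerts are not described here.

```python
from dataclasses import asdict, dataclass

# Hypothetical template wording, written by a human ahead of time.
TEMPLATE = (
    "A magnitude {magnitude:.1f} earthquake struck {distance_km:.0f} kilometers "
    "from {place} at {time}, according to the U.S. Geological Survey. "
    "The quake occurred at a depth of {depth_km:.1f} kilometers."
)


@dataclass
class QuakeAlert:
    """Hypothetical fields extracted from one earthquake alert."""
    magnitude: float
    place: str
    distance_km: float
    time: str
    depth_km: float


def write_story(alert: QuakeAlert) -> str:
    # The program contributes no wording of its own: it only fills the
    # template's placeholders with values taken from the alert.
    return TEMPLATE.format(**asdict(alert))


if __name__ == "__main__":
    alert = QuakeAlert(magnitude=4.4, place="Exampleville, California",
                       distance_km=8.0, time="6:25 a.m.", depth_km=5.0)
    print(write_story(alert))
```

Everything a reader would recognize as “writing” lives in `TEMPLATE`; feeding in a different alert changes only the numbers and place names.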
Just like the Forbes stock reports and the AP’s business articles, the reports are fast, efficient and get the job done, but do they represent a future where more complex and creative articles might get written by bots? Should human writers be worried?
Writing About Complex Stories
Of course, linguistics has been part of artificial intelligence for a very long time. In the article “Artificial Intelligence”, published in the Handbook of Pragmatics, the authors wrote:
Generating an extended piece of discourse involves some careful amount of planning. This complex task has conveniently been divided into two subtasks: deciding what to say and deciding how to say.
In other words, to get a machine to produce discourse that seems authentic to humans, AI scientists must do more than have the “bot” piece together the right words; it also needs to understand how to say those things within the context of the subject matter. That is difficult enough for the human mind, even though an appreciation for context is instilled in children from a very young age. For machines, it’s a whole different ballgame.
Generating discourse is a multiple constrained process in which various knowledge sources should be taken into account: knowledge of the domain of discourse, the situational context and past discourse, as well as knowledge about the interlocutor or reader.
Understanding the subject matter, drawing on a knowledge base of existing information and data, and, most importantly, understanding what the reader wants are all critical to producing not only informational text but also more abstract writing such as creative fiction.
Human authors, even very young ones, learn to do this intuitively. For programmers to build artificial intelligence that can do the same, it would take a level of algorithm generation (and self-teaching) far beyond anything the data-scraping bots of Wikipedia, the Associated Press, and others are yet capable of. Still, the Handbook authors explain why it isn’t impossible:
First, new symbols and structures can be created dynamically during program execution. Second, structures can be recursively defined and can thus represent a potentially infinite number of actual structures. And third, programs are also symbolic structures and can thus be created or manipulated by other programs.
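The second point, recursive definitions, is easy to see in a toy example. The sketch below uses a hypothetical grammar (not taken from the Handbook) with a single self-referencing rule; the hypothetical `expand` function lists every sentence reachable within a given recursion depth.

```python
# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of symbols. Symbols that are not keys
# in GRAMMAR are terminals (literal text).
GRAMMAR = {
    "SENTENCE": [["NOUN_PHRASE", "slept."]],
    "NOUN_PHRASE": [
        ["the knight"],
        ["the knight who saw", "NOUN_PHRASE"],  # the rule refers to itself
    ],
}


def expand(symbol: str, depth: int) -> list[str]:
    """Return every string derivable from `symbol` using at most `depth` nested rule expansions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals are emitted as-is
    if depth <= 0:
        return []  # out of budget: this nonterminal cannot be expanded
    results: list[str] = []
    for alternative in GRAMMAR[symbol]:
        partials = [""]
        for part in alternative:
            expansions = expand(part, depth - 1)
            partials = [f"{p} {e}".strip() for p in partials for e in expansions]
        results.extend(partials)
    return results


if __name__ == "__main__":
    for depth in range(2, 5):
        print(depth, expand("SENTENCE", depth))
```

Each extra level of allowed depth adds exactly one new sentence, and without a depth limit the list would never end: two short rules describe a potentially infinite set of sentences.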
Ken Schwencke’s Quakebot shows the simpler approach: a human writes a template in advance, and the program just inserts the data where it needs to go. The article “sounds” human-written, but only because it actually was written by a human ahead of time.
Other companies, such as Narrative Science, are taking the concept much further by applying a rough form of AI to the content they produce for companies like Forbes and the government intelligence contractor In-Q-Tel.
Programs That Write Like Humans
The programmers at Narrative Science take complex data (whether it’s the scoring pattern and player stats over the course of a football game, or the stock values and business data of companies) and use the data itself to determine exactly what needs to be said and how to say it.
For example, in 2011 The New York Times published a snippet of a Narrative Science sports report that shows what this technology is capable of:
WISCONSIN appears to be in the driver’s seat en route to a win, as it leads 51-10 after the third quarter. Wisconsin added to its lead when Russell Wilson found Jacob Pedersen for an eight-yard touchdown to make the score 44-3.
As you can see, Narrative Science builds algorithms that use both the context (sports) and the data (scores and player stats) to formulate a report that sounds much like what sports fans would expect from a human sportswriter.
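The article doesn’t describe Narrative Science’s actual algorithms, so the following is only a rough Python sketch, under our own assumptions, of how a program could split the Handbook’s two subtasks: a hypothetical `what_to_say` step picks a story angle from the score margin, and a hypothetical `how_to_say` step turns that angle into a sentence. The thresholds, phrasings, and opponent name are invented for illustration; only the “driver’s seat” wording is borrowed from the snippet above.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Hypothetical snapshot of a game at the end of a period."""
    leader: str
    trailer: str
    leader_score: int
    trailer_score: int
    period: str  # e.g. "the third quarter"


def what_to_say(game: GameState) -> str:
    """Content selection: pick the story angle from the score margin.

    The thresholds (21 and 8 points) are illustrative assumptions.
    """
    margin = game.leader_score - game.trailer_score
    if margin >= 21:
        return "blowout"
    if margin >= 8:
        return "comfortable"
    return "close"


# Surface realization: hypothetical opening phrases keyed by angle.
OPENERS = {
    "blowout": "{leader} appears to be in the driver's seat en route to a win",
    "comfortable": "{leader} looks to be in control",
    "close": "{leader} is locked in a tight game with {trailer}",
}


def how_to_say(angle: str, game: GameState) -> str:
    """Turn the chosen angle and the raw numbers into one sentence."""
    opener = OPENERS[angle].format(leader=game.leader, trailer=game.trailer)
    return (f"{opener}, holding a {game.leader_score}-{game.trailer_score} "
            f"lead after {game.period}.")


if __name__ == "__main__":
    game = GameState("Wisconsin", "Rival State", 51, 10, "the third quarter")
    print(how_to_say(what_to_say(game), game))
```

The same numbers that fill in the sentence also decide its tone, which is the step beyond a fixed template: a 41-point lead gets the “driver’s seat” opener, while a 3-point lead would be framed as a tight game.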
Where Bots Go From Here
Even this impressive use of data analysis and AI linguistics is very limited in scope and capability. Company founder Kris Hammond made an over-the-top claim that in 20 years the company’s own computer program might be able to win a Pulitzer Prize in journalism.
While the enthusiasm is commendable, the reality is that it’ll probably take well over twenty years to accomplish that feat.
Case in point: just this year, researchers at the University of New South Wales in Australia created a computer program they called the “Moral Storytelling System”. The goal was to have the system create a fable based on user preferences.
The Moral Storytelling System produced a fable intended to convey the lesson of retribution, in which a fairy is punished for stealing a knight’s sword. This is the story the program came up with:
Once upon a time there lived a unicorn, a knight and a fairy. The unicorn loved the knight.
One summer’s morning the fairy stole the sword from the knight. As a result, the knight didn’t have the sword anymore. The knight felt distress that he didn’t have the sword anymore. The knight felt anger towards the fairy about stealing the sword because he didn’t have the sword anymore. The unicorn and the knight started to hate the fairy.
The next day the unicorn kidnapped the fairy. As a result, the fairy was not free. The fairy felt distress that she was not free.
Not exactly an award-winning story. This is what Narrative Science and companies like it are up against. If they want that Pulitzer, they’ve got a long way to go.
The act of writing is similar to the act of painting, where the imagination and whims of the human mind take shape in ways that are sometimes difficult to comprehend. There are so many other promising areas where artificial intelligence could be applied that it seems almost wrong to try to replace human creativity with the cold, empty logic of software. But if any area is destined to be the last frontier of artificial intelligence, this will be it.
What do you think about the attempt to use bots to write like humans? How far in the future do you imagine this will become real? Share your thoughts and insights in the comments section below.