British Library Embarks On Project To Archive Billions Of Webpages & Tweets [Updates]

The British Library is set to begin a massive archiving project of 1 billion webpages from local (.uk) domains starting April 6th. The archiving project is meant to preserve digital records for posterity, and also includes public Facebook updates and tweets. It will also include eBooks and iPad editions of newspapers, among other forms of digital information. The digital records will be preserved across six libraries in the UK and Ireland. The project aims to capture the slices of life from our present and preserve them for the future.

Lucie Burgess, head of strategy at the library, believes many momentous events have passed by without any effort to capture them. For instance, the response of social media to the Queen’s Diamond Jubilee celebrations has been lost to the “digital blackhole” of the 21st century. The British Library will try to identify UK sites in the .org and .com domains as well, and only publicly available information will be archived. Readers at any of the six libraries participating in the project will be able to access the data.


According to the library, the digital archiving project seeks to replicate the existing archive of printed matter. But while the print archive has collected only 750 million pages in 300 years, the digital one will collect 1 billion webpages in a year and continue beyond that. The British Library project reminds us of a similar project started early this year by the Library of Congress to archive all public tweets in the U.S.

What do you think of digital archiving? Should Facebook updates and tweets be archived?

Source: The British Library via The Guardian

Image credit: archive image via Shutterstock


29 Comments

1 votes

Johann

As long as you can opt out on your own sites via robots.txt then I don’t see an issue. archive.org (Wayback Machine) has been doing this since 1996.

1 votes

Chris Marcoe

I think the question would be… why would you need to archive all tweets and Facebook posts? Seriously, in 50 years, who is going to be interested in what my 16YO son said about his new accomplishment on Minecraft?

1 votes

Saikat Basu

Well, the social media of today is a microcosm of the world we live in. We still talk about how the world reacted when JFK was shot or to the moon landings. Wouldn’t it have been interesting to capture all this in a mass of social media updates? Today’s news will be tomorrow’s history. Future sociologists will have a field day.

0 votes

dragonmouth

“Today’s news will be tomorrow’s history. Future sociologists will have a field day.”

Was there intelligent life on Earth in those years? :)

0 votes

Jeremy Garnett

To which the answer will invariably be ‘No’, when compared to the intelligent life of the future. Especially as the intelligent life of the future may be to us as we are to the dinosaurs.

0 votes

Saikat Basu

Well, we also think about the monkeys, don’t we? :)

1 votes

dragonmouth

Why is the NSA recording ALL electronic transmission? Because somewhere in the billions of messages might be one transmission between “terrorists” or some other miscreants. The government wants to KNOW what its people are up to.

0 votes

Chris Marcoe

I actually thought of this after I posted. And I agree with you. Too bad it couldn’t be edited for just the useful stuff.

0 votes

Saikat Basu

Well, we don’t know exactly how they will filter it. Maybe, they have some algorithm in place. I can’t see them recording what we had for breakfast.

0 votes

Chris Marcoe

That is exactly what I mean. I post quite a bit on my Facebook about baking and also what I am inventing for dinner. It really doesn’t seem like something the Gov’t would need. Of course, 100K years down the road, some anthropologist might decide to study what I made for dinner back in the day, so…

0 votes

dragonmouth

“Too bad it couldn’t be edited for just the useful stuff.”

Who knows what the government considers “important”? With their proclivity for seeing conspiracies and terrorists under every rock and behind every tree, an innocent exchange about a vacation could be construed as a description of an al Qaeda training camp.

0 votes

Chris Marcoe

True. I didn’t think of that. Though I think it might be a little extreme.

0 votes

Muz RC

Yes, I agree with your statement. Who knows, maybe one of them is among us here… XD

1 votes

Christine St Syr Griffin

I think it’s pretty awesome. What a huge task! How many actual persons will be working on the project? And wow, are they hiring? Christine

0 votes

Saikat Basu

If you are somewhere near the six libraries, you can enquire :) Drop us a note if you get a position.

1 votes

Lisa Santika Onggrid

Well, some Facebook statuses and tweets are, admittedly, worth archiving, but they should work on a filter to skip ‘Good morning’s and ‘I eat eggs for breakfast’s from curation. I’m sure your grandchildren will give you more respect if they don’t see your hangover status updates.

0 votes

dragonmouth

“they should work on a filter to skip ‘Good morning’s and ‘I eat eggs for breakfast’s”

Ahhh. But they never know when one of those innocuous phrases could be a code phrase for something sinister, so they have to capture all of them. :)

0 votes

Saikat Basu

I don’t think the British Library is MI-5/6 or the NSA. Only “public” updates will be archived. It’s out in the open as it is.

0 votes

dragonmouth

Are you sure of that??? :)

0 votes

Jeremy Garnett

What better way to record faux pas that can influence politics and social positioning for decades to come?

0 votes

macwitty

Well, I think this will say a lot about our time, and I just do not know how they will handle the permissions on, e.g., Facebook.

1 votes

Scott M

A great project. I find myself agreeing with many others about archiving Facebook and tweets, from a concern of privacy as well as that of relevance. I think archiving tweets and Facebook posts might only serve to underline the vapidity of the majority of what is currently messaged. I haven’t any way to accurately gauge the importance of most of what is posted, but I doubt the figure would be very high. High-profile tweeters, perhaps, such as politicians, those involved in historical contexts such as revolutions, or professionals keeping others updated, but the vast majority are simply forgettable other than to those immediately involved in the messages.

0 votes

Max

Haven’t we heard this before?

1 votes

Oklahoma_Mike

I can see a day when they want you to pay for access to this currently free data to “cover their expenses”. What is free today will eventually be taxed in some form or another.

0 votes

hussein

It is a fantastic library.

0 votes

Norman P

Sounds like a huge waste of resources to me. In a way it is kind of funny to think about some poor sap 100 years from now stuck in a little office sorting through all the random nonsense I’ve posted on Facebook and trying to make sense of it all…good luck guy, hope you packed a lunch!

1 votes

Saikat Basu

Aah, let’s not be too skeptical here. I think it will have its uses…let’s not forget that 100 years from now, our current time will be “history”.

0 votes

Norman P

Great point!

0 votes

Jeff D

Why should they bother to do this? It seems a tremendous expense to archive essentially useless and unimportant information. Shouldn’t they concentrate their efforts on something important?