Why an important part of internet history may disappear forever

Since its creation, the internet has revolutionized the way we communicate. It has become a vital part of our history, and preserving its data could be key to the work of future historians. However, this record of how we live our lives digitally could be deleted in the future.

The effort to maintain human history on the Internet

The Internet Archive, a non-profit organization, has embarked on what may be the most ambitious digital archive project in history. In total, it has already gathered 866 billion web pages, 44 million books, 10.6 million videos of movies and TV shows, and more.

Founded in 1996, the Internet Archive is the most ambitious project to protect humanity’s digital archive.

@MarkGraham

Its objective is clear, but complex: to preserve as much information as possible in an era characterized by dizzying speed and the constant creation of content. “The risks are multiple. Not only that the technology can fail (that certainly happens). But more importantly, that institutions fail or companies go bankrupt. News organizations are absorbed by other news organizations or, more and more frequently, they are closed,” explained Mark Graham, director of the Internet Archive’s Wayback Machine.

The mission is not simple. A quarter of all websites that existed at some point between 2013 and 2023 no longer exist, according to a recent study by the Pew Research Center, a Washington DC-based think tank, which raised the alarm about the disappearance of our digital history.

The researchers found that the problem is more serious the older a web page is: 38% of the websites Pew tried to access that existed in 2013 no longer work. Even among web pages published at some point in 2023, around 8% had disappeared by October of that same year.

This isn’t just a concern for history buffs and internet obsessives. According to the study, one in five government websites contains at least one broken link. Pew also found that more than half of Wikipedia articles have an invalid link in their references section, which means that the evidence supporting the online encyclopedia’s information is slowly disappearing.

The Internet Archive’s Wayback Machine seeks precisely to protect this information. Using armies of bots, the organization scours the labyrinths of the internet, downloads working copies of websites as they change over time and makes them available to the public free of charge.

Other organizations are working on similar projects. The United States Library of Congress, for example, preserves government websites, the sites of members of Congress, and a collection of American news sites. The same institution also kept a copy of every tweet sent since the founding of Twitter (now known as X), until the project closed in 2017.

Other governments carry out their own initiatives. The UK Web Archive conducts an annual crawl of websites with .uk domain names, capturing a snapshot of the British internet at least once a year.

Threats to the internet’s digital archive

Last week, the Internet Archive announced a major partnership with Google, in which the technology giant will include links to the Wayback Machine in search results, although no financial details of the agreement were published. However, other recent news shows that the project remains fragile. That vulnerability was revealed in a court case brought against the Internet Archive by four major book publishers, who alleged that the practice of scanning physical books and lending digital copies violates US copyright law.

Before the Covid-19 pandemic, the Internet Archive lent only one digital copy at a time of each physical book in its collection. But during the quarantine, the organization lifted that restriction, allowing users to borrow unlimited digital copies of books to try to compensate for the closure of physical libraries.

The Internet has become humanity’s great memory, and various organizations are working to preserve it.

In response to the publishers’ complaint, a US court ruled in 2023 that this practice was illegal and, at the beginning of September, the Internet Archive’s appeal against that decision was rejected. The organization previously said it had agreed to pay a publishing industry trade group an undisclosed sum in connection with the case.

The Internet Archive faces a similar case brought by record labels over its digitization of records, a dispute that could cost it US$400 million if it loses. That amount could endanger the survival of the non-profit organization.

Existential legal battles are not the only dangers threatening the world of digital preservation. The British Library’s UK Web Archive suffered a cyberattack in October 2023 that knocked its digital systems offline. Almost a year later, the archive is still dealing with the fallout, and online access to much of its collection remains unavailable.

The Internet Archive shares these concerns. If its work were to stop and “that void was not immediately filled, then much of what is currently available on the public web would be at risk,” Graham said.

An unofficial answer

Without a formal effort to coordinate attempts to preserve the internet, this titanic task remains in the hands of amateurs, volunteers and a few unofficial organizations that generally operate independently. “It makes sense that the archival response is decentralized,” explained Mar Hicks, a technology historian at the University of Virginia in the US. “But one of the problems is the variety of priorities.”

Hicks added that “when everything is so decentralized, the priorities are going to be very different.” The concern with such an ad hoc, decentralized approach is that efforts may overlap, wasting valuable archival resources on duplicate or triplicate copies of popular websites, all while overlooking areas that may have historical importance because they fall between the responsibilities of different groups.

“Archivists will say these problems have been around for a long time,” Hicks said. The problem is amplified by the volume of material produced in our digital world: almost 1 billion emails are sent every day and more than 500 hours of video content are published on YouTube every minute.

“The Internet is essentially a hose of information and material,” Hicks said. “There’s no point in trying to capture everything that comes out of the hose. That wouldn’t make sense from a resource standpoint.”

For Hicks, there needs to be some kind of prioritization of what is saved from our generation’s digital footprint. Otherwise, we run the risk that rapidly rising costs will sideline efforts to save the history of the web, not to mention the oceans of digital files that are offline.

One thing is clear, Hicks noted: we should all contribute to supporting the fight for preservation. “From a very pragmatic perspective, if we don’t pay these people and make sure these archives are funded, they won’t exist in the future, they’ll disintegrate, and then the whole point of collecting them will have gone out the window,” Hicks said. “Because the objective of the archive is not simply that it is collected, but that it persists indefinitely into the future.”

Source: Ambito
