Preserving The Internet: The Wayback Machine

The Wayback Machine

News broke this week that The Wayback Machine has now archived 400 billion webpages. The Wayback Machine lets users see how a website looked in the past. Launched in 1996, the project now requires an incredible 5 petabytes of storage to hold all its data.

The figures behind the project are impressive: The Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day – making Archive.org the 250th most popular site on the internet.

The organisation’s mission is equally impressive, and it is stated explicitly on its website: “Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars”.

Brewster Kahle launched the project in 1996, at the same time as he started the now-famous web crawling company Alexa Internet. The project has its roots in the development of software that could crawl and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. The archived content itself wasn’t available to the public until 2001, but by 1999 the archive had already expanded its collections to include texts, audio, moving images and software.

Uses of such an archive are widespread. There is a day-to-day practical use, as evidenced when The Wayback Machine provided access to important Federal Government sites that went dark during the Federal Government shutdown in the United States. There is also an educational aspect, with important lessons to be learned from the vast amount of big data stored within its archives. Finally, there is a historical aspect, as the development of the internet has been preserved for future generations to enjoy.
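For readers who want to try the day-to-day use described above, archived copies can be addressed directly by URL. The sketch below is a minimal illustration, not official Internet Archive code. It assumes the public address pattern web.archive.org/web/<timestamp>/<page URL>, where the timestamp is a run of digits in YYYYMMDDhhmmss order (a shorter prefix such as a year or a date also works) and the service serves the capture closest to it. The function name wayback_url and the example.com address are hypothetical placeholders.

```python
# Assumed public address pattern for Wayback Machine snapshots.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, timestamp: str) -> str:
    """Build the address of an archived copy of page_url.

    timestamp is 4 to 14 ASCII digits in YYYYMMDDhhmmss order,
    e.g. "2013", "20131001" or "20131001120000".
    """
    if not (timestamp.isascii() and timestamp.isdigit()):
        raise ValueError("timestamp must contain digits only")
    if not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits long")
    # The original page address is appended as-is after the timestamp.
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


# Hypothetical example: a page as it looked around 1 October 2013.
print(wayback_url("http://example.com/", "20131001"))
# https://web.archive.org/web/20131001/http://example.com/
```

Opening the resulting address in a browser should show the capture nearest to the requested date, if one exists.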

The cloud-based project has not been without its controversies. In 2012 China restored access to the database after blocking it for several years, while in the USA an activist sued the organisation for $100,000 after claiming that archiving her site breached her terms of service. The dispute was ultimately settled out of court.

The success of the project has started to attract imitators. Online companies such as Archive.It, Freezepage, and iTools all offer similar services, but none of them can offer the same quality and depth of content as The Wayback Machine.

Is this a vital project or a waste of valuable storage space? Are there ethical questions surrounding the uninhibited archiving of so many sites, or are they simply taking a virtual photograph of events? Let us know in the comments below.

By Daniel Price
