Preserving The Cloud: The Wayback Machine

News broke this week that The Wayback Machine has now archived 400 billion webpages. The Wayback Machine lets users see how a website looked in the past, and having launched in 1996, the project now requires an incredible 5 petabytes of storage to maintain all its data.
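For readers who want to look up an old version of a page programmatically, here is a minimal Python sketch. It is not taken from the project's own documentation: the address patterns (`web.archive.org/web/<timestamp>/<url>` for a snapshot and `archive.org/wayback/available` for an availability lookup) and the function names are assumptions based on how the public site is commonly used. The code only builds the addresses; it does not contact the archive.

```python
from urllib.parse import urlencode

# Assumed public address patterns; neither is stated in this article.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build the address of the snapshot of page_url closest to timestamp.

    timestamp is a digit string such as YYYY, YYYYMMDD or YYYYMMDDhhmmss.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits, e.g. 20010101")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: str) -> str:
    """Build a query address asking whether a snapshot exists near timestamp."""
    params = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


if __name__ == "__main__":
    print(snapshot_url("http://example.com/", "20010101"))
    print(availability_query("example.com", "20010101"))
```

Pasting the first printed address into a browser should, if the assumed pattern holds, show the capture nearest to the requested date.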

The figures behind the project are impressive; The Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day – making Archive.org the 250th most popular site on the entire internet.
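To put the figures quoted in this article in perspective, the short Python calculation below works out what they imply per day, per visitor and per page. It is a back-of-the-envelope sketch only: the inputs are the round numbers quoted here (each described as a minimum or an approximation), and a decimal petabyte of 10^15 bytes is assumed.

```python
# Round figures quoted in the article; treat the results as rough estimates.
QUERIES_PER_SECOND = 1_000
VISITORS_PER_DAY = 500_000
ARCHIVED_PAGES = 400 * 10**9      # 400 billion pages
STORAGE_BYTES = 5 * 10**15        # 5 petabytes, assuming decimal units

SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
queries_per_visitor = queries_per_day / VISITORS_PER_DAY
bytes_per_page = STORAGE_BYTES / ARCHIVED_PAGES

print(f"Queries per day:     {queries_per_day:,}")          # 86,400,000
print(f"Queries per visitor: {queries_per_visitor:.1f}")    # 172.8
print(f"Storage per page:    {bytes_per_page / 1000:.1f} kB")  # 12.5 kB
```

In other words, the quoted numbers imply roughly 86 million queries a day, a little over 170 per visitor, and an average of about 12.5 kilobytes of storage per archived page.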

The organisation’s mission is equally impressive, and it sets out its ideology explicitly on its website: “Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars.”

Brewster Kahle launched the project in 1996, at the same time as he started the now-famous web crawling company Alexa Internet. The project has its roots in the development of software that could crawl and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. The archived content itself wasn’t available until 2001, but by 1999 the archive had already expanded its collections to include texts, audio, moving images and software.

Uses of such an archive are widespread. There is a day-to-day practical use, as evidenced when The Wayback Machine provided access to important Federal Government sites that went dark during the Federal Government shutdown in the United States. There is also an educational aspect, with important lessons to be learned from the vast amount of big data stored within its archives. Finally, there is a historical aspect, as the development of our internet has been preserved for future generations to enjoy.

The cloud-based project has not been without its controversies. In 2012 China restored access to the database after blocking it for several years, while in the USA an activist sued the organisation for $100,000 after claiming that archiving her site breached her terms of service. The dispute was ultimately settled out of court.

The success of the project has started to attract imitators. Online companies such as Archive.It, Freezepage, and iTools all offer similar services, but none of them can offer the same quality and depth of content as The Wayback Machine.

Is this a vital project or a waste of valuable storage space? Are there ethical questions surrounding the uninhibited archiving of so many sites, or are they simply taking a virtual photograph of events? Let us know in the comments below.

By Daniel Price

Daniel is a Manchester-born UK native who has abandoned cold and wet Northern Europe and currently lives on the Caribbean coast of Mexico. A former Financial Consultant, he now balances his time between writing articles for several industry-leading tech (CloudTweaks.com & MakeUseOf.com), sports, and travel sites and looking after his three dogs.