Preserving The Cloud: The Wayback Machine

News broke this week that The Wayback Machine has now archived 400 billion webpages. The Wayback Machine enables users to see how a website looked in the past, and having launched in 1996, the project now requires an incredible 5 petabytes of storage to maintain all its data.

The figures behind the project are impressive: The Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day, making Archive.org the 250th most popular site on the entire internet.
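
To put those figures in perspective, here is a rough back-of-envelope sketch. It uses only the rounded numbers quoted above as lower bounds, and it assumes decimal petabytes and that the 5 petabytes cover the archived webpages alone, so the results are indicative rather than official statistics.

```python
# Back-of-envelope figures derived from the numbers quoted in the article.
# All inputs are the article's rounded values ("over 1,000", "over 500,000"),
# so the results are rough estimates, not official statistics.

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds in a day

queries_per_second = 1_000              # "over 1,000 times every second"
visitors_per_day = 500_000              # "over 500,000 people a day"
pages_archived = 400_000_000_000        # 400 billion webpages
storage_bytes = 5 * 10**15              # 5 PB, assuming decimal petabytes

queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_visitor = queries_per_day / visitors_per_day
bytes_per_page = storage_bytes / pages_archived

print(f"Queries per day: {queries_per_day:,}")
print(f"Queries per visitor per day: {queries_per_visitor:.1f}")
print(f"Average storage per page: {bytes_per_page / 1000:.1f} KB")
```

On these assumptions, the database handles roughly 86 million queries a day, or around 170 per visitor, and each archived page averages about 12.5 KB of storage.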

The organisation’s mission is equally impressive, and its ideology is stated explicitly on its website: “Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars.”

Brewster Kahle launched the project in 1996, at the same time as he started the now-famous web crawling company Alexa Internet. The project has its roots in the development of software that could crawl and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. The archived content itself wasn’t available until 2001, but by 1999 the archive had already expanded its collections to include texts, audio, moving images and software.

Uses of such an archive are widespread. There is a day-to-day practical use, as evidenced when The Wayback Machine provided access to important Federal Government sites that went dark during the Federal Government shutdown in the United States. There is also an educational aspect, with important lessons to be learned from the vast amount of data stored within its archives. Finally, there is a historical aspect, as the development of our internet has been preserved for future generations to enjoy.

The cloud-based project has not been without its controversies. In 2012 China restored access to the database after blocking it for several years, while in the USA an activist sued the organisation for $100,000 after claiming that archiving her site breached her terms of service. The dispute was ultimately settled out of court.

The success of the project has started to attract imitators. Online companies such as Archive.It, Freezepage, and iTools all offer similar services, but none of them can offer the same quality and depth of content as The Wayback Machine.

Is this a vital project or a waste of valuable storage space? Are there ethical questions surrounding the uninhibited archiving of so many sites, or is the project simply taking a virtual photograph of events? Let us know in the comments below.

By Daniel Price

About Daniel Price

Daniel is a Manchester-born UK native who has abandoned cold and wet Northern Europe and currently lives on the Caribbean coast of Mexico. A former Financial Consultant, he now balances his time between writing articles for several industry-leading tech (CloudTweaks.com & MakeUseOf.com), sports, and travel sites and looking after his three dogs.
