Preserving The Cloud: The Wayback Machine

Preserving The Cloud: The Wayback Machine

Preserving The Cloud: The Wayback Machine

News broke this week that The Wayback Machine has now archived 4 hundred billion webpages. The Wayback Machine enables users to see how another website looked in the past, and after launching in 1996 the project now requires an incredible 5 petabytes of storage to maintain all its data.

The figures behind the project are impressive; The Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day – making Archive.org the 250th most popular site on the entire internet.

wayback_machine_logo

The company’s mission is equally impressive, explicitly expressing their ideology on their website “Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars”.

Brewster Kahle launched the project in 1996 at the same time he started the now-famous web crawling company Alexa Internet. The project has its roots in the development of software that could crawl and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. The archived content itself wasn’t available until 2001 but by 1999 the archive had already expanded its collections to include texts, audio, moving images and software.

Uses of such an archive are widespread. There is a day-today practical use, as evidenced when The Wayback Machine provided access to important Federal Government sites that went dark during the Federal Government shutdown in the United States. There is also an educational aspect, with importance lessons to be learned from the vast amount of big data stored within its archives. Finally, there is a historical aspect, as the development of our internet has been preserved for future generations to enjoy.

The cloud-based project has not been without its controversies. In 2012 China restored access to the database after blocking it for several years, while in the USA an activist sued the organisation for $100,000 after claiming that archiving her site breached her terms of service. The dispute was ultimately settled out of court.

The success of the project has started to attract imitators. Online companies such as Archive.It, Freezepage, and iTools all offer similar services, but not of them can offer the same quality and depth of content as The Wayback Machine.

Is this a vital project or a waste of valuable storage space? Are there ethical questions surrounding the unhibited archiving of so many sites, or are there taking a virtual photograph of events? Let us know in the comments below.

By Daniel Price

About Daniel Price

Daniel is a Manchester-born UK native who has abandoned cold and wet Northern Europe and currently lives on the Caribbean coast of Mexico. A former Financial Consultant, he now balances his time between writing articles for several industry-leading tech (CloudTweaks.com & MakeUseOf.com), sports, and travel sites and looking after his three dogs.

Find out more
View All Articles

Sorry, comments are closed for this post.

Controversial Chinese Cybersecurity Law Under Review Again

Controversial Chinese Cybersecurity Law Under Review Again

Cybersecurity Law BEIJING. The National People’s Congress, the equivalence of the Chinese Parliament, moved forward in drafting a second version of a controversial cybersecurity law first introduced almost a year ago. This means the law is thought to be closer to passing and will bring greater censorship for both foreign and domestic citizens and businesses.…

Personal Account of Google CEO Compromised

Personal Account of Google CEO Compromised

Personal Account Compromised The security of our information online, whether it’s our banking details, emails or personal information, is important. Hackers pose a very real threat to our privacy when there are vulnerabilities in the security of the services we use online. It can be worrying then when the CEO of perhaps the largest holder…

How You Can Improve Customer Experience With Fast Data Analytics

How You Can Improve Customer Experience With Fast Data Analytics

Fast Data Analytics In today’s constantly connected world, customers expect more than ever before from the companies they do business with. With the emergence of big data, businesses have been able to better meet and exceed customer expectations thanks to analytics and data science. However, the role of data in your business’ success doesn’t end…

Data Protection and Session Fixation Attacks

Data Protection and Session Fixation Attacks

Keeping the man out of the middle: preventing session fixation attacks In a nutshell, session fixation is a type of man in the middle attack where an attacker is able to pretend to be a victim using a session variable. For instance, let’s say you have an application that uses sessions to validate the user.…

E-Commerce Advances For Savvy Marketers

E-Commerce Advances For Savvy Marketers

Digital Marketing Platforms Advertising and marketing techniques have progressed rapidly in the last decade with both channel focus and the direction of content shifting considerably due primarily to advances in cloud technology. Gartner’s Magic Quadrant for Digital Commerce 2016 singles out a few ecommerce providers who are topping their sector in both ability to execute…

Cloud Infographic – Big Data Analytics Trends

Cloud Infographic – Big Data Analytics Trends

Big Data Analytics Trends As data information and cloud computing continues to work together, the need for data analytics continues to grow. Many tech firms predict that big data volume will grow steadily 40% per year and in 2020, will grow up to 50 times that. This growth will also bring a number of cost…

How Data Science And Machine Learning Is Enabling Cloud Threat Protection

How Data Science And Machine Learning Is Enabling Cloud Threat Protection

Data Science and Machine Learning Security breaches have been consistently rising in the past few years. Just In 2015, companies detected 38 percent more security breaches than in the previous year, according to PwC’s Global State of Information Security Survey 2016. Those breaches are a major expense — an average of $3.79 million per company,…

Surprising Facts and Stats About The Big Data Industry

Surprising Facts and Stats About The Big Data Industry

Facts and Stats About The Big Data Industry If you start talking about big data to someone who is not in the industry, they immediately conjure up images of giant warehouses full of servers, staff poring over page after page of numbers and statistics, and some big brother-esque official sat in a huge government building…

Infographic: The Evolving Internet of Things

Infographic: The Evolving Internet of Things

Evolving Internet of Things  The Internet of Things, or IoT, a term devised in 1999 by British entrepreneur Kevin Ashton, represents the connection of physical devices, systems and services via the internet, and Gartner and Lucas Blake’s new infographic (below) explores the evolution of the IoT industry, investigating its potential impact across just about every…

Cloud Infographic: The Explosive Growth Of The Cloud

Cloud Infographic: The Explosive Growth Of The Cloud

The Explosive Growth Of The Cloud We’ve been covering cloud computing extensively over the past number of years on CloudTweaks and have truly enjoyed watching the adoption and growth of it. Many novices are still trying to wrap their mind around what the cloud it is and what it does, while others such as thought…

Cloud Computing – The Real Story Is About Business Strategy, Not Technology

Cloud Computing – The Real Story Is About Business Strategy, Not Technology

Enabling Business Strategies The cloud is not really the final destination: It’s mid-2015, and it’s clear that the cloud paradigm is here to stay. Its services are growing exponentially and, at this time, it’s a fluid model with no steady state on the horizon. As such, adopting cloud computing has been surprisingly slow and seen more…

Cloud Infographic – The Internet Of Things In 2020

Cloud Infographic – The Internet Of Things In 2020

The Internet Of Things In 2020 The growing interest in the Internet of Things is amongst us and there is much discussion. Attached is an archived but still relevant infographic by Intel which has produced a memorizing snapshot at how the number of connected devices have exploded since the birth of the Internet and PC.…