Preserving The Cloud: The Wayback Machine

News broke this week that The Wayback Machine has now archived 400 billion webpages. The Wayback Machine lets users see how a website looked in the past, and having launched in 1996, the project now requires an incredible 5 petabytes of storage to hold all its data.

The figures behind the project are impressive: The Wayback Machine’s database is queried over 1,000 times every second by over 500,000 people a day, making Archive.org the 250th most popular site on the entire internet.
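To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the figures quoted above at face value and assumes a decimal petabyte (10^15 bytes) and a constant query rate across the whole day; the variable names are purely illustrative.

```python
# Figures quoted in the article, taken at face value.
PAGES_ARCHIVED = 400_000_000_000   # 400 billion webpages
STORAGE_BYTES = 5 * 10**15         # 5 petabytes (assumption: decimal petabytes)
QUERIES_PER_SECOND = 1_000         # "over 1,000", so a lower bound
USERS_PER_DAY = 500_000            # "over 500,000", so a lower bound

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the query rate is constant across the day.
queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
queries_per_user = queries_per_day / USERS_PER_DAY
bytes_per_page = STORAGE_BYTES / PAGES_ARCHIVED

print(f"Queries per day:       {queries_per_day:,}")
print(f"Queries per user/day:  {queries_per_user:.1f}")
print(f"Average bytes/page:    {bytes_per_page:,.0f}")
```

On these assumptions that works out to roughly 86 million queries a day, a little over 170 per visitor, and an average of about 12.5 kilobytes of storage per archived page.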

The organisation’s mission is equally impressive, and its website spells it out explicitly: “Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars”.

Brewster Kahle launched the project in 1996, at the same time as he started the now-famous web crawling company Alexa Internet. The project has its roots in the development of software that could crawl and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews bulletin board system, and downloadable software. The archived content itself wasn’t available until 2001, but by 1999 the archive had already expanded its collections to include texts, audio, moving images and software.

Uses of such an archive are widespread. There is a day-to-day practical use, as evidenced when The Wayback Machine provided access to important Federal Government sites that went dark during the Federal Government shutdown in the United States. There is also an educational aspect, with important lessons to be learned from the vast amount of big data stored within its archives. Finally, there is a historical aspect, as the development of the internet has been preserved for future generations to enjoy.
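For readers curious about the practical use in particular, here is a minimal Python sketch of how a snapshot address can be put together. It relies on the publicly visible `web.archive.org/web/<timestamp>/<site>` address pattern and the `archive.org/wayback/available` lookup endpoint as we understand them; the function names, the example site and the date are our own illustrative choices, and the code only builds the addresses rather than contacting the archive.

```python
from datetime import date
from urllib.parse import urlencode


def snapshot_url(site: str, when: date) -> str:
    """Build a Wayback Machine address for the capture closest to `when`.

    Assumption: the archive redirects a YYYYMMDD timestamp to the
    nearest capture it actually holds.
    """
    stamp = when.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{stamp}/{site}"


def availability_query(site: str, when: date) -> str:
    """Build a lookup address that asks whether a capture exists.

    Assumption: the endpoint accepts `url` and `timestamp` parameters
    and answers with JSON describing the closest snapshot.
    """
    params = urlencode({"url": site, "timestamp": when.strftime("%Y%m%d")})
    return f"https://archive.org/wayback/available?{params}"


# Hypothetical example: a placeholder site on an arbitrary date.
print(snapshot_url("example.com", date(2013, 10, 1)))
print(availability_query("example.com", date(2013, 10, 1)))
```

Pasting the first address into a browser should show the page as it looked around that date, provided the archive holds a capture of it.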

The cloud-based project has not been without its controversies. In 2012 China restored access to the database after blocking it for several years, while in the USA an activist sued the organisation for $100,000 after claiming that archiving her site breached her terms of service. The dispute was ultimately settled out of court.

The success of the project has started to attract imitators. Online companies such as Archive.It, Freezepage, and iTools all offer similar services, but none of them can offer the same quality and depth of content as The Wayback Machine.

Is this a vital project or a waste of valuable storage space? Are there ethical questions surrounding the uninhibited archiving of so many sites, or is the project simply taking a virtual photograph of events? Let us know in the comments below.

By Daniel Price

About Daniel Price

Daniel is a Manchester-born UK native who has abandoned cold and wet Northern Europe and currently lives on the Caribbean coast of Mexico. A former Financial Consultant, he now balances his time between writing articles for several industry-leading tech (CloudTweaks.com & MakeUseOf.com), sports, and travel sites and looking after his three dogs.

