The week of May 8, 2016

The guerrilla archivists saving the Internet’s dying websites from oblivion

By Jesse Hicks

It’s probably safe to say that no matter how conscientious a citizen of the Internet you consider yourself to be, you’re not that worried about the impending demise of myVIP, the second-largest social network in Hungary. The site started in 2006 and quickly reached 2.5 million users—not bad in a country of just under 10 million people—only to see its future slowly choked away by the social networking behemoth that is Facebook. Today, myVIP is on its way to becoming a ghost town, populated only by abandoned profiles from yesteryear. And then, soon enough, it’ll probably just fade away: servers unplugged, data deleted, leaving behind not much more than hazy memories.

It’s the story of many a website that’s failed to pay its way. It’s almost a law of the Internet that things disappear without a trace. On the slow, quotidian level, it’s called “link rot”; more rarely, entire sites wink out. Social networks, dependent as they are on being maintained by actual, you know, people, are especially vulnerable to becoming uncool, declining, and then vanishing completely. It’s easy to see this as inevitable.

Jason Scott, however, wants you to think otherwise. Seven years ago he co-founded Archive Team, an ad hoc group of volunteers who came together when Yahoo announced it would be closing GeoCities. The search giant (this was 2009, remember) had bought GeoCities a decade earlier, at the tail end of the dot-com boom. The site, which offered free webpages to anyone who wanted to experiment with building a homepage, grew to at least 38 million pages. It was a massive trove of Web 1.0 ephemera; just check out Scott’s collection of “under construction” GIFs harvested from the site for a glimpse of a very different Internet.

And this trove was almost lost when Yahoo decided to pull the plug, in a decision Time called an “Internet atrocity.” According to accounts from the time, Yahoo never gave much explanation for the choice, nor did it respond to requests to help archive the data. But Archive Team and others scrambled to download as much of the GeoCities data as they could; today the site lives on through several mirrors, including a massive 652-gigabyte torrent file.

It was an early, if qualified, victory. Perhaps most importantly, it brought together a group of like-minded people who could build the resources to perform this kind of ad hoc archiving whenever it’s necessary. And from their list of ongoing projects, it seems to be plenty necessary. Besides myVIP, which the Archive Team project main page suggests is “now on the verge of disappearing,” there’s GameFront, a host for video game files and news; LiveJournal, the blog community nearly as old as GeoCities; and Blingee, the image-hosting site that let users glitterbomb their photos.

You might expect Archive Team to have its eye on long-in-the-tooth sites like those, and you wouldn’t be far off. And, of course, it’s watching everything Yahoo owns, noting, “Yahoo is untrustworthy, let’s do a census of all their products.” But it’s also considering archiving Orkut, the Google-owned social network that failed, and the Pirate Bay, whose legal troubles make it likely to one day disappear. The group also looked at archiving RadioShack sites after the company filed for bankruptcy, and Scott says now that Ted Cruz has dropped out of the Republican primary race, they’ll be looking to preserve his campaign websites.

That’s just a small sample of the eclectic range of sites Archive Team covers. As Scott explains it, that broad inclusiveness reflects Archive Team’s core philosophy. Some people, he says, might ask why. Why save 500 gigabytes of Army manuals? Why stockpile hip-hop mixtapes, old fishing magazines, and video game manuals? What’s it all for?


The answer, he says, is something close to “because we can.” He offers a comparison to analog archives, which often have to revisit their collections and make a case for keeping or selling off different pieces. It’s a function of having finite space and resources. You can only fit so many boxes of papers into a physical archive, no matter how large or well-financed it is.

With digital archives, that’s less of a concern. Massive amounts of data can be stored on cheap hard drives indefinitely, which postpones the question of “Is this something worth having?” As he puts it, “the cost of having them versus not having them is so minuscule, it’s worth it.” That said, he does note that the Internet Archive, which serves as the repository for much of Archive Team’s work, does sometimes point out the cost of saving it all. The Internet Archive is a nonprofit, and donation reminders to support the Archive feature prominently on nearly every Archive Team webpage.

It’s nice to be able to archive almost anything that seems even marginally worth keeping. That wasn’t always the case. The GeoCities project required learning on the fly and scrounging up resources, and Scott says, even today, the team is like a crew of EMTs: With every new project they have to find the best way to save the patient. And they do. “There are many good people, both directly and indirectly, doing good work with the Archive Team,” Scott says, and today they have what he calls “world-class tools.” Their ArchiveBot can pull down between 100 and 400 gigabytes a day. Almost anyone can donate their time and computing power, and the whole organization runs pretty organically.

Scott says that these days he considers himself a mascot rather than a co-founder. He’s frustrated that more tech companies don’t respect their users enough to respect their data, to safeguard it in any meaningful way. They shut down or fail, and who knows where the data goes. He’d like to see more companies try to preserve their histories. But, he says, “On the whole there’s not financial or legal incentive for them to do so, so they don’t.”

It’s part of what he sees as a larger, almost pathological disregard for end users. Startups shouldn’t be trusted with your data, he says. “There’s nothing about startup culture that convinces me it’s not just a bunch of sociopaths who have learned a new way to run a lemonade stand.”

Until that broader environment changes, he sees a continuing need for Archive Team and its work. It’s been seven years since the Internet atrocity that was the shuttering of GeoCities, but the same problems remain. “I am a tiny cog in this machine now,” Scott says. “This thing has grown far beyond what I ever imagined, and has expanded into areas I would never have expected. It’s a shame it needs to exist. But I’m very glad it does.”

Illustrations by Max Fleishman