Why Were New York Government Websites Hidden From an Internet Archive for 13 Years?
Someone in New York State government apparently didn't want the Wayback Machine archiving their goods.
The nonprofit Internet Archive has an impossibly ambitious mission: to save a copy of every last piece of the public internet, forever, and to make the records freely available for anyone to use.
The archive currently stands at 2 petabytes, which is more than all the text contained in the Library of Congress. They've been treating the internet — the fleeting, here-today-gone-tomorrow internet — as something worth preserving.
Since its inception in 1996, the organization has become a critical resource for academics and researchers interested in the internet as a cultural repository. One part of the project, called the Wayback Machine, has been especially popular. It's like a time capsule for the Web, preserving copies of billions of pages exactly as they appeared at a particular moment.
It couldn't be simpler: type in a URL, and the Wayback Machine will display snapshots of that URL on various dates. It's especially useful for looking back at deleted information, which has made the Wayback Machine an indispensable tool for journalists. Some politician decided to remove a particularly blockheaded press release from his site? The Wayback Machine sees all, and preserves every misstep.
So it's remarkable that for the past 13 years, some of the most important websites of New York State's government have been deliberately excluded from the archive, their records hidden from public view. Sixty-three "state.ny.us" addresses, to be exact, including the site of the New York State Assembly. Even more remarkably, the problem wasn't noticed until now.
The exclusion wasn't an oversight by the archive itself. The group's servers grab virtually every public webpage by default. According to the archive's Chris Butler, someone within the New York State government requested, way back in 2001, that a broad swath of domains be eliminated from the archiving process.
"It was at a time when the Wayback Machine had only been public for a short time," Butler said, "for less than a year." Butler explained that the situation was particularly odd because government websites are among the group's highest priority for preservation. "They're obviously really important records."
Who requested that the sites be removed? Butler isn't exactly sure. And why? He's not sure about that either.
The archive is a pretty accommodating bunch; they'll exclude any website upon request by the owner or manager, and they don't demand an explanation. They do know it had something to do with the attacks of 9-11 — security concerns, presumably — but other than that, there aren't any notes in their system that would shed more light on the mystery.
How exactly this went unnoticed for 13 years isn't clear, either. Butler said the group has never received a call about it, although someone must have tried to look at an archived version of one of these sites at some point in the past decade.
The Voice has been unable to reach anyone in state government who might be able to clear things up. New York State's Office of Information Technology Services wasn't sure who would have been responsible for such a request. And this many years on, it's possible we'll never know the story.
The good news is that the Internet Archive has been backing up the websites all along — they just haven't been adding them to their public interface — and after a call from a reporter, public access has been restored.
Now you can go back and see what the Assembly's "World Wide Web site" looked like deep in the hazy past of Dec. 8, 1996, when websites were simple and capitalization errant. (There was a new feature in the Assembly directory on that date, since so many members had begun using their "Internet email addresses.") You can also see the messages posted by the Assembly just after 9-11. Or maybe something less depressing. (If you do decide to check it out, you may still get a message that says, "Sorry, this URL has been excluded." Just hit Refresh.)
And if you haven't played with the Wayback Machine before, you should. It's a pretty entertaining window into the days of crackling dial-up connections and stark, bare-bones layouts.
Send any tips about the Great Wayback Mystery to firstname.lastname@example.org, or @j0ncampbell, if you're into the whole Twittery thing.