The Epstein Library on the Justice Department’s website is a model of disorganization. In early December, Keller was clicking through the tens of thousands of pages of documents in the library and feeling “annoyed disbelief” at the chaos: files that could be hundreds of pages long, text that was sometimes blurry or sideways, a wire transfer with no context, an email chain with half the names blacked out, a flight log with only initials. “It’s disorienting,” he says. “You’re reading fragments of something huge and trying to figure out which fragments matter and how they connect.”
One night, he spent about four hours trying to trace a single person’s name across some 30 documents in the archive. “I just stopped and thought, I’m doing by hand what a database could do in milliseconds,” he says. As a builder of database infrastructure at a midsize company, he knew exactly what to do next. “I opened a code editor and started building. By 3 am I had a basic search prototype working against a few hundred documents,” he says.
Around that time, a website called Jmail.world was making a splash as a tool for people to peruse Epstein’s emails as if using a Gmail interface. Launched in mid-November and built by a group of tech-savvy volunteers, it has since grown to include, among other things, his photos, flights, and Amazon purchase history, also displayed as if the reader is viewing Epstein’s own accounts. Keller used the tool and liked it. “Jmail was proof that the community could build better tools than the government was providing,” he told me.
It also helped him hone his own project. “Instead of thinking about one category of documents, I started thinking about the network,” he says. “How do you connect a person who appears in an email to a flight they were on, to a wire transfer, to a deposition they gave? That cross-referencing problem is what I wanted to solve.”
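The cross-referencing problem Keller describes can be illustrated with a small inverted index, a structure that maps each name to every document it appears in, whatever the document type. This is only a sketch under stated assumptions: the record layout, the placeholder names, and the functions `build_index` and `appearances` are hypothetical and are not taken from Keller's project.

```python
from collections import defaultdict

# Hypothetical records: (document id, document type, names found in it).
# The names are placeholders, not real entries from the archive.
documents = [
    ("doc-001", "email", ["A. Example", "B. Sample"]),
    ("doc-002", "flight_log", ["A. Example"]),
    ("doc-003", "wire_transfer", ["B. Sample"]),
    ("doc-004", "deposition", ["A. Example"]),
]


def build_index(docs):
    """Map each lowercased name to the (id, type) of every document naming it."""
    index = defaultdict(list)
    for doc_id, doc_type, names in docs:
        for name in names:
            index[name.lower()].append((doc_id, doc_type))
    return index


def appearances(index, name):
    """Return every document in which the name appears, across all types."""
    return index.get(name.lower(), [])


index = build_index(documents)
for doc_id, doc_type in appearances(index, "A. Example"):
    print(doc_id, doc_type)
```

Once the index is built, a lookup takes a single dictionary access instead of a manual read through dozens of files. A real system would also have to reconcile initials, misspellings, and redacted names, which this sketch ignores.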
Then, on December 19, the Justice Department released its first big tranche, adding hundreds of thousands of new documents to the existing archive. Suddenly, Keller’s workload ballooned to an all-time high. The prototype he had built earlier in the month became the foundation for processing all of it.
Most nights he worked until 3 or 4 am, sipping cold coffee while navigating a sea of open tabs.
Because of his childhood, he says, “when the first documents started dropping, I couldn’t look away. I understood at a gut level what was being described in those files.” In the evenings, he’d return home from his day job and, once everyone in his family was in bed, he’d hole up in his home office and spend hours scrolling through downloaded PDFs.
Many documents were posted as images, and he’d run each page through layers of software to convert them into searchable text; sometimes one system would fail to convert the text and he’d run it through a second or third. Then he’d use another system to extract important details such as names, organizations, dates, and locations. He’d perform hash verification, a process that checks whether the Justice Department’s files have been tampered with, and redaction analysis, to scan for inconsistencies in how the government blacked out information. He tracked all his work in a meticulous, digital, color-coded ledger. “It’s not importing files,” he says. “It’s rebuilding a crime scene from 2 million fragments of evidence.”
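Hash verification of the kind described here generally means computing a cryptographic digest of a downloaded file and comparing it with a previously recorded value; if even one byte has changed, the digests will not match. The sketch below assumes SHA-256 and a reference digest noted at the time of the first download. Neither detail comes from Keller's account, and the function names `sha256_of_file` and `verify` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's digest matches the recorded reference value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for PDFs that run to hundreds of pages. A check like this can only show that a file differs from an earlier copy; it cannot say what changed or why.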

