RYE HOWARD-STONE

Document forensics for investigations too big to read by hand.

I run your corpus — FOIA dumps, court records, leaked archives, public-records scrapes — and return a sourced findings memo with stable document IDs, reproducible queries, and alteration forensics. You keep the byline. You verify everything.

Book a free 30-minute call. We scope your corpus and your question.

PROOF What reading the whole corpus surfaces — an actual finding.

$ casestack scan --case doj-epstein-2025 --check gaps
comparing document identifier sequence across federal releases…

  FINDING   ~64,000 identifiers present in the Release 1 index
            are absent from Release 2. Removed without notice.
  SCOPE     1,380,000 documents / ~3,000,000 pages — full read, not sampled.
  METHOD    deterministic identifier-sequence diff. Reproducible:
            casestack scan --check gaps --diff release1 release2

Surfaced on epstein-data.com — the 218 GB DOJ release, made full-text searchable and used by reporters at WIRED and NPR. A person can't read 3 million pages. The machine can. Every claim it returns carries a stable document ID and a query you can re-run.
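For readers who want to see what an identifier-sequence diff involves, here is a minimal Python sketch. It is not CaseStack's implementation. It assumes each release index can be reduced to a list of identifiers with a numeric suffix, and the `DOC-0001` format and all function names are hypothetical.

```python
import re


def parse_id(identifier):
    """Split an identifier such as 'DOC-0002' into (prefix, number).

    The numeric-suffix format is an assumption for illustration.
    """
    match = re.fullmatch(r"(.*?)(\d+)", identifier)
    if match is None:
        raise ValueError(f"no numeric suffix in {identifier!r}")
    return match.group(1), int(match.group(2))


def missing_ids(release1_ids, release2_ids):
    """Return identifiers present in release 1 but absent from release 2."""
    return sorted(set(release1_ids) - set(release2_ids), key=parse_id)


def group_runs(sorted_ids):
    """Collapse sorted identifiers into contiguous (first, last, count) runs."""
    runs = []  # each run is [first_id, last_id, count]
    previous = None
    for identifier in sorted_ids:
        prefix, number = parse_id(identifier)
        if previous == (prefix, number - 1):
            # Directly follows the previous identifier: extend the current run.
            runs[-1][1] = identifier
            runs[-1][2] += 1
        else:
            runs.append([identifier, identifier, 1])
        previous = (prefix, number)
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    release1 = ["DOC-0001", "DOC-0002", "DOC-0003", "DOC-0004", "DOC-0005"]
    release2 = ["DOC-0001", "DOC-0004"]
    for first, last, count in group_runs(missing_ids(release1, release2)):
        print(first, last, count)
```

The diff is deterministic because it uses only set difference and sorting, so the same two indexes always give the same result. Grouping the missing identifiers into runs shows whether removals cluster in blocks or are scattered.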

§ 01 Who this is for

Solo investigative journalists and 2–8 person newsrooms sitting on a document set too large to read before deadline.
Independent and Substack investigators competing against better-resourced outlets.
Anyone holding a FOIA release, court archive, or leak that contains a story they can't physically extract alone.

§ 02 What you get

A sourced findings memo — every claim tied to a stable document ID, a reproducible query, and a page reference.
A searchable full-text database of your corpus — the same engine behind epstein-data.com, pointed at your documents. Built on CaseStack, open source.
Alteration & gap forensics where applicable — what was removed, changed, or is missing, with a methodology you can print.
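As a rough illustration of how removed, changed, and added documents can be told apart, the sketch below compares two versions of a corpus by content hash. It assumes each version is available as a mapping from document ID to raw bytes. The function names are hypothetical, and this is not necessarily how CaseStack does it.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def compare_releases(old, new):
    """Classify document IDs as removed, added, or changed.

    `old` and `new` map document ID -> raw bytes (an assumed input shape).
    """
    old_fp = {doc_id: fingerprint(data) for doc_id, data in old.items()}
    new_fp = {doc_id: fingerprint(data) for doc_id, data in new.items()}
    shared = old_fp.keys() & new_fp.keys()
    return {
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "added": sorted(new_fp.keys() - old_fp.keys()),
        # Same ID in both versions, but the bytes differ.
        "changed": sorted(d for d in shared if old_fp[d] != new_fp[d]),
    }


if __name__ == "__main__":
    old = {"A": b"page one", "B": b"page two", "C": b"page three"}
    new = {"A": b"page one", "B": b"page two, redacted", "D": b"page four"}
    print(compare_releases(old, new))
```

A byte-level hash flags any difference, including a re-save that changes only metadata. A real forensic pass would probably also compare extracted text before calling something an alteration.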

§ 03 How it works

01 Send me the corpus — or where it lives — and the question you're chasing.
02 I run it and return the findings memo with stable citations.
03 You verify, you publish, you keep the byline. The methodology stays open.

§ 04 Engagements

Search Build

$400 flat

Your document set — up to a few thousand pages — ingested, deduplicated, OCR'd, and handed back as a searchable database, with a 20-minute orientation. The fast one.

Buy now

Findings Memo

from $1,500, scoped on the call

The full forensic pass: entity linking, deduplication, gap and alteration analysis, and a sourced memo where every claim is citable and reproducible. Journalism-grade output.

Book the call

Large corpus / commercial

Custom, quoted after scoping

Hundred-gigabyte-class releases, ongoing engagements, or running CaseStack commercially in-house. We scope it together on the call.

Book the call

§ 05 Who's doing the work

I built epstein-data.com: 1.38 million documents, roughly 3 million pages, full-text searchable, used by reporters at WIRED and NPR. I found about 64,000 documents silently removed from a federal release, because the machine can read the whole corpus and a person can't. I hold a PhD in computer science and have spent a decade on large-scale document and sequence analysis. I do all of the work myself; nothing is outsourced.

§ 06 Questions

How long does it take? Scoped on the call; it depends on corpus size and the question. Small sets turn around in days.
What do you need from me? The documents (or portal access) and the question you're chasing.
Can my editor trust the output? Every claim carries a stable document ID and a reproducible query. You verify before you print; the methodology is open.
What if my document set is huge, weird, or encrypted? That's the normal case. Book the call and we'll scope it.

If you're sitting on documents you can't read in time, the story doesn't wait.

Book a free 30-minute call