Open-source research infrastructure for AI-assisted empirical
economics. The substance lives in arcanum-workspace
on GitHub. The descriptions below are high-level overviews; click
through to the repository for the working code.
A pipeline for extracting structured data from scanned PDFs at archival quality. It combines direct agent vision for tables, equations, and figures with consensus across multiple OCR engines for body text. Designed for the Anwar Shaikh archival project (see TNSE) and reused for scholarly books, regulatory filings, and historical scanned material across the workspace.
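The body-text consensus idea can be sketched as a token-level majority vote across engine outputs. This is a minimal illustration, not the pipeline's actual alignment and voting logic; the function name and the assumption that engine outputs are pre-aligned token lists are hypothetical.

```python
from collections import Counter

def consensus_text(engine_outputs: list[list[str]]) -> list[str]:
    """Token-level majority vote across OCR engine outputs.

    Each element of engine_outputs is one engine's token list for the
    same text region; tokens are compared position by position.
    (Hypothetical sketch; the real pipeline lives in the repository.)
    """
    n = max(len(tokens) for tokens in engine_outputs)
    merged = []
    for i in range(n):
        # Count what each engine read at position i, skipping engines
        # whose output is shorter than the region.
        votes = Counter(
            tokens[i] for tokens in engine_outputs if i < len(tokens)
        )
        merged.append(votes.most_common(1)[0][0])
    return merged
```

With three engines that each misread a different word, the majority reading survives even though no single engine was fully correct.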
A set of templates for agent-driven empirical data construction projects, designed so that the final pipeline is reproducible by anyone, with no AI agent in the loop. A researcher clones the package, sets API keys, runs the master script, and gets the same data with the same hashes. Used at scale on the Capitalism Data Replication project and on the Shaikh-Tonak national-accounts extension.
A language-agnostic eight-phase architecture for empirical research projects. Script-name prefixes encode the phase (setup, load, process, validate, manual adjust, analyze, output, explore), so anyone reading the file system can reconstruct the dependency graph. Phase-gated execution prevents skipping validation; a single registry file is the source of truth; manual adjustments are explicit and justified rather than silent.
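The prefix convention can be made concrete with a small sketch. Assume, hypothetically, that scripts carry a two-digit phase prefix like `03_validate_deflators.py` (the exact naming scheme is defined in the repository, not here):

```python
import re

# The eight phases, in execution order, as described above.
PHASES = ["setup", "load", "process", "validate",
          "manual_adjust", "analyze", "output", "explore"]

def phase_of(script_name: str) -> str:
    """Map a script's numeric prefix to its phase name."""
    m = re.match(r"(\d{2})_", script_name)
    if not m:
        raise ValueError(f"no phase prefix on {script_name!r}")
    return PHASES[int(m.group(1)) - 1]

def run_order(scripts: list[str]) -> list[str]:
    """Sort scripts by phase prefix, so a later-phase script never
    runs before an earlier one and validation cannot be skipped."""
    return sorted(scripts, key=lambda s: int(s[:2]))
```

Because the phase is encoded in the file name itself, the dependency graph is visible from a plain directory listing, in any language's project.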
Empirical economics done with AI assistance can produce work that other researchers cannot reproduce, audit, or extend. The tools above are an attempt to do AI-augmented research without sacrificing the standards the discipline expects. They are open-source because the underlying research questions are, and the infrastructure required to ask them properly should be too.