Open-source research infrastructure for AI-assisted empirical
economics. The substance lives in arcanum-workspace
on GitHub. The descriptions below are high-level overviews; click
through to the repository for the working code.
A pipeline for extracting structured data from scanned PDFs at archival quality. It combines direct agent vision for tables, equations, and figures with consensus across multiple OCR engines for body text. Designed for the Anwar Shaikh archival project (see TNSE) and reused for scholarly books, regulatory filings, and historical scanned material across the workspace.
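The body-text consensus idea can be sketched as a token-level majority vote across engine outputs. This is a minimal illustration, not the pipeline's actual alignment and voting logic; the function name and the assumption that engine outputs are pre-aligned token lists are hypothetical.

```python
from collections import Counter

def consensus_text(engine_outputs: list[list[str]]) -> list[str]:
    """Token-level majority vote across OCR engine outputs.

    Each element of engine_outputs is one engine's token list for the
    same text region; tokens are compared position by position.
    (Hypothetical sketch; the real pipeline lives in the repository.)
    """
    n = max(len(tokens) for tokens in engine_outputs)
    merged = []
    for i in range(n):
        # Count what each engine read at position i, skipping engines
        # whose output is shorter than the region.
        votes = Counter(
            tokens[i] for tokens in engine_outputs if i < len(tokens)
        )
        merged.append(votes.most_common(1)[0][0])
    return merged
```

With three engines that each misread a different word, the majority reading survives even though no single engine was fully correct.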
A set of templates for agent-driven empirical data construction projects, designed so that the final pipeline is reproducible by anyone, with no AI agent in the loop. A researcher clones the package, sets API keys, runs the master script, and gets the same data with the same hashes. Used at scale on the Capitalism Data Replication project and on the Shaikh-Tonak national-accounts extension.
A language-agnostic eight-phase architecture for empirical research projects. Script-name prefixes encode the phase (setup, load, process, validate, manual adjust, analyze, output, explore), so anyone reading the file system can reconstruct the dependency graph. Phase-gated execution prevents skipping validation; a single registry file is the source of truth; manual adjustments are explicit and justified rather than silent.
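The prefix convention can be made concrete with a small sketch. Assume, hypothetically, that scripts carry a two-digit phase prefix like `03_validate_deflators.py` (the exact naming scheme is defined in the repository, not here):

```python
import re

# The eight phases, in execution order, as described above.
PHASES = ["setup", "load", "process", "validate",
          "manual_adjust", "analyze", "output", "explore"]

def phase_of(script_name: str) -> str:
    """Map a script's numeric prefix to its phase name."""
    m = re.match(r"(\d{2})_", script_name)
    if not m:
        raise ValueError(f"no phase prefix on {script_name!r}")
    return PHASES[int(m.group(1)) - 1]

def run_order(scripts: list[str]) -> list[str]:
    """Sort scripts by phase prefix, so a later-phase script never
    runs before an earlier one and validation cannot be skipped."""
    return sorted(scripts, key=lambda s: int(s[:2]))
```

Because the phase is encoded in the file name itself, the dependency graph is visible from a plain directory listing, in any language's project.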
Empirical economics done with AI assistance can produce work that other researchers cannot reproduce, audit, or extend. The tools above are an attempt to do AI-augmented research without sacrificing the standards the discipline expects. They are open-source because the underlying research questions are, and the infrastructure required to ask them properly should be too.