Open-source research infrastructure for AI-assisted empirical
economics. The substance lives in arcanum-workspace
on GitHub. The descriptions below are high-level overviews; click
through to the repository for the working code.
Accurate extraction of text, tables, equations, and figure descriptions from archival and scanned documents using vision-language (multimodal LLM) architectures. The protocol combines direct agent vision with a retry-and-recovery workflow so that difficult pages are extracted rather than skipped.
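As a rough illustration of the retry-and-recovery idea, and not the code in the repository, the sketch below tries a list of extraction strategies in order and keeps a record of each failure, so a hard page is escalated rather than dropped. Every name here (`PageResult`, `extract_with_recovery`, `direct_vision`, `enhanced_rerender`) is hypothetical, and the two extractors are stand-ins for real vision-model calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PageResult:
    page: int
    text: Optional[str]          # None only if every strategy failed
    attempts: int                # number of strategies tried
    strategy: Optional[str]      # name of the strategy that succeeded
    errors: list[str] = field(default_factory=list)


def extract_with_recovery(
    page: int,
    strategies: list[tuple[str, Callable[[int], str]]],
) -> PageResult:
    """Try each strategy in order; fall through to the next on failure."""
    errors: list[str] = []
    for attempt, (name, extract) in enumerate(strategies, start=1):
        try:
            text = extract(page)
        except Exception as exc:  # record the failure and escalate
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return PageResult(page, text, attempt, name, errors)
        errors.append(f"{name}: empty output")
    # Nothing worked: return an explicit failure record instead of skipping.
    return PageResult(page, None, len(strategies), None, errors)


# Stand-in extractors (hypothetical): the first fails on page 2.
def direct_vision(page: int) -> str:
    if page == 2:
        raise ValueError("low contrast scan")
    return f"text of page {page}"


def enhanced_rerender(page: int) -> str:
    return f"text of page {page} (recovered)"


STRATEGIES = [
    ("direct_vision", direct_vision),
    ("enhanced_rerender", enhanced_rerender),
]

results = [extract_with_recovery(p, STRATEGIES) for p in (1, 2)]
for r in results:
    print(r.page, r.strategy, r.attempts, r.errors)
```

The point of returning a `PageResult` even on total failure is that the page stays visible in the output and can be queued for manual review.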
Reproducible economic data construction: a staged pipeline that carries a data series from source research through validated replication packages, documentation, and publication. The framework powers the Heterodata data sites, where each constructed series ships with its full provenance trail.
The standardized project and pipeline layout underlying the framework. Staged scripts (load, process, validate, manual adjust, analyze, output) encode the dependency graph in the file system; registries are the single source of truth; provenance records make every value traceable. Anyone reading the directory can reconstruct how the data was built.
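One way to picture how staged scripts can encode the dependency graph in the file system is sketched below. It assumes a purely hypothetical naming scheme, `NN_<stage>[_<label>].py`, which may differ from the layout the repository actually uses; only the six stage names come from the description above. Under that assumption, the run order can be recovered from filenames alone.

```python
import re

# Stage names from the description above; the order is the dependency order.
STAGE_ORDER = ("load", "process", "validate", "manual_adjust", "analyze", "output")

# Hypothetical filename convention: "01_load_cpi.py", "03_validate.py", ...
PATTERN = re.compile(r"^(\d+)_(" + "|".join(STAGE_ORDER) + r")(?:_.*)?\.py$")


def execution_order(filenames):
    """Sort scripts by stage first, then by numeric prefix within a stage."""
    keyed = []
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            raise ValueError(f"not a staged script: {name}")
        stage_index = STAGE_ORDER.index(match.group(2))
        keyed.append((stage_index, int(match.group(1)), name))
    return [name for _, _, name in sorted(keyed)]


scripts = ["03_validate.py", "06_output.py", "01_load_cpi.py", "02_process_cpi.py"]
ordered = execution_order(scripts)
print(ordered)
```

A file that does not fit the convention raises an error rather than being silently ignored, which keeps the directory listing an honest description of the pipeline.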
The Claude Code-based research workspace and tooling that orchestrates all of the above. It collects the skill templates, workspace standards, and supporting infrastructure used across my research projects.
Empirical economics done with AI assistance can produce work that other researchers cannot reproduce, audit, or extend. The tools above are an attempt to do AI-augmented research without sacrificing the standards the discipline expects. They are open-source because the underlying research questions are, and the infrastructure required to ask them properly should be too.