EOIR FOIA Data Processing Tool
A high-performance tool for downloading, processing, and managing U.S. immigration court data from the Department of Justice's public FOIA releases.
Overview
The Executive Office for Immigration Review (EOIR) releases anonymized immigration court data through FOIA requests. This tool automates the entire pipeline of downloading, extracting, cleaning, and loading this data into a PostgreSQL database for analysis.
Features
- Automated Downloads: Fetch the latest FOIA data releases with progress tracking
- Smart Extraction: Automatically extract and organize ZIP files
- Data Cleaning: Clean and validate CSV files with parallel processing support
- Database Management: Load data into PostgreSQL with versioned table names
- Pipeline Automation: One-command execution of the entire workflow
- Docker Support: Fully containerized with Docker Compose
- Progress Tracking: Real-time progress bars and status updates
- Incremental Updates: Only download new data when available
Requirements
- Python 3.10+
- PostgreSQL database
- Docker and Docker Compose (optional, for containerized deployment)
Installation
Install from PyPI
pip install eoir
Local Development Installation
- Clone the repository:
git clone https://github.com/marrowb/eoir.git
cd eoir
- Install the package in development mode:
pip install -e .
- Copy the environment template and configure:
cp .env.example .env
# Edit .env with your PostgreSQL credentials
Docker Installation
- Clone the repository:
git clone https://github.com/marrowb/eoir.git
cd eoir
- Copy the environment template:
cp .env.example .env
# Edit .env if needed (defaults work for Docker)
- Start the services:
docker-compose up -d
- Access the application:
docker-compose exec app bash
# Or use the run script:
./run shell
Quick Start
Using Docker (Recommended)
# Run the complete pipeline
./run eoir run-pipeline
# Or run individual commands
./run eoir download status # Check for new data
./run eoir download fetch # Download latest data
./run eoir db init # Initialize database
./run eoir clean # Clean CSV files
Local Development
# Run the complete pipeline
eoir run-pipeline
# Or run individual commands
eoir download status # Check for new data
eoir download fetch # Download latest data
eoir db init # Initialize database
eoir clean # Clean CSV files
CLI Commands
eoir download
Manage FOIA data downloads from the DOJ.
# Check if new data is available
eoir download status
# Download the latest FOIA release
eoir download fetch
# Download without extracting
eoir download fetch --no-unzip
eoir db
Database management commands.
# Initialize database and create tables
eoir db init
# Create a database dump
eoir db dump
# Dump with custom output directory
eoir db dump -o /path/to/dumps
eoir clean
Clean and process CSV files.
# Clean all CSV files in the latest download
eoir clean
# Clean with custom worker count
eoir clean --workers 16
# Clean specific input directory
eoir clean --input-dir /path/to/csvs
eoir run-pipeline
Execute the complete data pipeline.
# Run full pipeline with defaults
eoir run-pipeline
# Run with custom settings
eoir run-pipeline --workers 16 --output-dir custom_dumps
# Skip download if data exists
eoir run-pipeline --skip-download
eoir config
View configuration settings.
# Show current configuration
eoir config show
Architecture
Project Structure
eoir/
├── src/eoir/
│ ├── cli/ # Command-line interface modules
│ │ ├── download.py # Download commands
│ │ ├── db.py # Database commands
│ │ ├── clean.py # CSV cleaning commands
│ │ └── pipeline.py # Pipeline orchestration
│ ├── core/ # Core business logic
│ │ ├── download.py # Download functionality
│ │ ├── db.py # Database operations
│ │ ├── clean.py # CSV processing
│ │ └── models.py # Data models
│ ├── metadata/ # Data definitions
│ │ ├── foia_tables.sql # Database schema
│ │ └── json/ # Table and column metadata
│ ├── logging/ # Structured logging
│ └── settings.py # Configuration management
├── docker-compose.yml # Docker services
├── Dockerfile # Container definition
└── run # Development helper script
Data Flow
- Download: Fetches ZIP file from https://fileshare.eoir.justice.gov/FOIA-TRAC-Report.zip
- Extract: Unzips to timestamped directory in downloads/
- Clean: Processes CSV files to handle encoding and data issues
- Load: Imports cleaned data into PostgreSQL with versioned table names
- Track: Records download history and file metadata
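The extract step can be pictured with a short sketch. It is not the tool's actual code: the function name `extract_release` and the `YYYYMMDD_HHMMSS` directory format are assumptions made for illustration, and the real naming scheme may differ.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def extract_release(zip_path: Path, downloads_dir: Path, when: datetime) -> Path:
    """Unzip a release archive into downloads_dir/<timestamp>/ and return that path."""
    # Hypothetical timestamp format; the tool may name its directories differently.
    target = downloads_dir / when.strftime("%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the function, keeps the directory name predictable when the step is re-run or tested.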
Database Schema
The tool creates versioned tables based on the download date. For example, a download on June 25th creates tables like:
- foia_appeal_06_25
- foia_case_06_25
- foia_schedule_06_25
See src/eoir/metadata/foia_tables.sql for the complete schema.
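As a rough illustration of the naming convention, the sketch below builds a versioned table name from a base name and a download date. The helper `versioned_table_name` is hypothetical, and it assumes the suffix is the zero-padded month and day, which is what the June 25th example suggests.

```python
from datetime import date


def versioned_table_name(base: str, download_date: date) -> str:
    """Append a month_day suffix to a base table name."""
    # Assumption: the suffix is MM_DD, inferred from the June 25th example.
    return f"{base}_{download_date:%m_%d}"


if __name__ == "__main__":
    # The year is arbitrary here; only month and day appear in the suffix.
    for base in ("foia_appeal", "foia_case", "foia_schedule"):
        print(versioned_table_name(base, date(2025, 6, 25)))
```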
Data Reference
Processed Tables
The tool processes 20 different CSV files containing various immigration court records:
| CSV File | Database Table | Description |
|---|---|---|
| A_TblCase.csv | foia_case_XX_XX | Case information |
| tblAppeal.csv | foia_appeal_XX_XX | Appeal records |
| tbl_schedule.csv | foia_schedule_XX_XX | Court schedules |
| B_TblProceeding.csv | foia_proceeding_XX_XX | Proceeding details |
| tbl_EOIR_Attorney.csv | foia_atty_XX_XX | Attorney information |
See src/eoir/metadata/json/tables.json for the complete mapping.
Data Format
- Encoding: Latin-1
- Delimiter: Tab (\t)
- Escape Character: Backslash (\\)
- Dialect: Excel-tab
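For readers who want to inspect a raw file outside the tool, the format settings above map onto Python's standard `csv` module roughly as follows. This is a minimal sketch: the function name `read_foia_rows` is hypothetical, and whether the tool passes exactly these options internally is an assumption.

```python
import csv
from pathlib import Path
from typing import Iterator


def read_foia_rows(path: Path) -> Iterator[list[str]]:
    """Yield rows from a tab-delimited, Latin-1 encoded FOIA file."""
    # newline="" lets the csv module handle line endings itself.
    with path.open(encoding="latin-1", newline="") as handle:
        # "excel-tab" is a dialect built into the csv module;
        # the backslash escape character is layered on top of it.
        reader = csv.reader(handle, dialect="excel-tab", escapechar="\\")
        yield from reader
```

Because the function is a generator, rows are read lazily, which helps with the larger files in a release.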
Development
Using the Run Script
The run script provides convenient commands for development:
./run eoir --help # Run EOIR CLI
./run shell # Start interactive shell
./run manage # Database management
./run psql # PostgreSQL console
./run pip install package # Install Python packages
./run yarn # Manage frontend (if applicable)
Environment Variables
Configure the following in your .env file:
# PostgreSQL Configuration
POSTGRES_USER=eoir
POSTGRES_PASSWORD=changeme
POSTGRES_DB=eoir
POSTGRES_HOST=postgres # 'postgres' for Docker, 'localhost' for local
POSTGRES_PORT=5434 # External port (internal always 5432)
# Logging
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
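To show how these variables fit together, the sketch below assembles a PostgreSQL connection URL from them. The helper `postgres_dsn` is hypothetical and not part of the package, and the fallback values are simply copied from the sample above, so treat them as assumptions.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgresql:// URL from the POSTGRES_* variables."""
    # Fallbacks mirror the sample .env values; they are assumptions, not tool defaults.
    user = env.get("POSTGRES_USER", "eoir")
    password = env.get("POSTGRES_PASSWORD", "changeme")
    database = env.get("POSTGRES_DB", "eoir")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5434")
    # Percent-encode credentials so characters like '@' or ':' do not break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(postgres_dsn())
```

When connecting from the host machine to the Docker database, `POSTGRES_HOST=localhost` with the external port applies; inside the Compose network the host is `postgres` and the port is 5432, as the comments in the sample note.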
Docker Development
# Build and start services
docker-compose up -d --build
# View logs
docker-compose logs -f app
# Stop services
docker-compose down
# Remove all data (including database)
docker-compose down -v
Troubleshooting
Common Issues
- Port conflicts: If port 5434 is in use, change POSTGRES_PORT in .env
- Permission errors: Ensure downloads/ and dumps/ directories are writable
- Memory issues: Reduce worker count with --workers flag for large files
- Encoding errors: The tool handles Latin-1 encoding automatically
Debug Mode
Enable debug logging for troubleshooting:
export LOG_LEVEL=DEBUG
eoir run-pipeline
Contributing
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature-name)
- Make your changes
- Run tests (if available)
- Submit a pull request
License
MIT License
Copyright (c) 2025 Backlog Immigration LLC
Acknowledgments
This tool processes publicly available FOIA data from the U.S. Department of Justice Executive Office for Immigration Review.