Scrapers for comparative water law judicial decisions across Brazil, Canada, and the Netherlands (2016-2026)

Water Law Judicial Decisions Dataset

A collection of scrapers for building a comparative dataset of water law judicial decisions across Brazil (27 state courts), Canada (federal + provincial courts via CanLII), and the Netherlands (Raad van State + CBb via Rechtspraak.nl).

Scope: 2016–2026 | Cases collected: 8,368 Brazilian decisions (8 courts) + Netherlands


Repository Structure

water-law-dataset/
├── scrapers/
│   ├── brazil/            # One scraper per accessible TJ court
│   │   ├── tjac_scraper.py   (TJAC – Acre,        ESAJ POST)
│   │   ├── tjdft_scraper.py  (TJDFT – Brasília,   Elasticsearch REST)
│   │   ├── tjpi_scraper.py   (TJPI – Piauí,       Rails GET)
│   │   ├── tjrj_scraper.py   (TJRJ – Rio de Janeiro, ASP.NET WebForms)
│   │   ├── tjrr_scraper.py   (TJRR – Roraima,     ESAJ POST)
│   │   ├── tjsc_scraper.py   (TJSC – Santa Catarina, ESAJ AJAX)
│   │   └── tjto_scraper.py   (TJTO – Tocantins,   PHP+Solr GET)
│   ├── canada/
│   │   └── canlii_scraper.py  (CanLII REST API — requires free API key)
│   └── netherlands/
│       └── rechtspraak_scraper.py  (Rechtspraak Open Data — no auth)
├── utils/
│   ├── merge_national.py      # Merges per-court JSON files into national CSV/XLSX
│   └── make_progress_charts.py
├── data/                      # Output directory (gitignored — add your JSON/CSV here)
├── .env.example
├── requirements.txt
└── README.md

Quick Start

1. Clone and configure

git clone https://github.com/YOUR_USERNAME/water-law-dataset.git
cd water-law-dataset
cp .env.example .env
# Edit .env and set OUTPUT_DIR and any API keys you need
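
How the scrapers read `.env` is not stated here. Since they rely only on the standard library, one plausible approach is a small KEY=VALUE parser. The sketch below is an assumption, not the project's code; `load_env_file` and `get_setting` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_setting(name, default=None, env_file=".env"):
    """Prefer a real environment variable, then the .env file, then the default."""
    if name in os.environ:
        return os.environ[name]
    return load_env_file(env_file).get(name, default)
```

With this in place, `get_setting("OUTPUT_DIR", "./data")` would return an exported value if one exists and fall back to `.env` otherwise.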

2. Run a scraper

All scrapers use only the Python standard library (Python 3.8+). No pip install needed for scraping.

# Set output directory (or edit .env)
# Linux/macOS:
export OUTPUT_DIR=./data
# Windows (cmd.exe); keep the line free of trailing text, which cmd would store in the value:
set OUTPUT_DIR=.\data

# Run any scraper
python scrapers/brazil/tjsc_scraper.py
python scrapers/brazil/tjdft_scraper.py
python scrapers/netherlands/rechtspraak_scraper.py

Output is written to $OUTPUT_DIR/<court>_cases_2016_2026.json.
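
To inspect a finished run, the output file can be read back with the standard library. This is a minimal sketch under two assumptions not confirmed above: that `<court>` is the lowercase court abbreviation (for example `tjsc`), and that each file holds a JSON array of case records. `output_path` and `load_cases` are hypothetical names.

```python
import json
import os
from pathlib import Path


def output_path(court, output_dir=None):
    """Build the documented path $OUTPUT_DIR/<court>_cases_2016_2026.json."""
    base = Path(output_dir or os.environ.get("OUTPUT_DIR", "./data"))
    return base / f"{court}_cases_2016_2026.json"


def load_cases(court, output_dir=None):
    """Load one court's records; assumes the file is a JSON array of objects."""
    with open(output_path(court, output_dir), encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `len(load_cases("tjsc"))` would report how many records a TJSC run produced, provided the assumptions hold.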

3. CanLII (Canada) — requires API key

# Register free at https://developer.canlii.org/
export CANLII_API_KEY=your_key_here
python scrapers/canada/canlii_scraper.py
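
Because the Canadian scraper depends on `CANLII_API_KEY`, it helps to fail early with a clear message when the variable is missing. The check below is a sketch of that idea only; `require_api_key` is a hypothetical helper and may not match how `canlii_scraper.py` handles the key.

```python
import os


def require_api_key(env=None):
    """Return CANLII_API_KEY from the environment or raise with a hint."""
    env = os.environ if env is None else env
    key = env.get("CANLII_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CANLII_API_KEY is not set; register at "
            "https://developer.canlii.org/ and export the key first."
        )
    return key
```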

4. Merge into national dataset

pip install pandas openpyxl          # only needed for merge + charts
export DATA_DIR=./data
python utils/merge_national.py
python utils/make_progress_charts.py
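
For readers without pandas, the merge step can be approximated with the standard library. The sketch below is not `merge_national.py`. It assumes the per-court files sit directly in the data directory, follow the `<court>_cases_2016_2026.json` naming, each contain a JSON array, and use the Brazilian record fields listed under Output JSON Schema. `merge_court_files` is a hypothetical name.

```python
import csv
import json
from pathlib import Path

# Column order follows the record fields documented in this README.
FIELDS = ["tribunal", "estado", "num_processo", "data_julgamento", "ano",
          "classe", "camara_orgao", "relator", "ementa", "url"]


def merge_court_files(data_dir, out_csv):
    """Concatenate every *_cases_2016_2026.json file in data_dir into one CSV."""
    rows = []
    for path in sorted(Path(data_dir).glob("*_cases_2016_2026.json")):
        with open(path, encoding="utf-8") as fh:
            rows.extend(json.load(fh))
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        # Unknown keys are dropped; missing keys are written as empty cells.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The function returns the number of merged rows, which gives a quick check against the per-court counts.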

Brazilian Courts — Access Status

| UF | Court | Cases | Method / blocking issue | Status |
|----|-------|-------|-------------------------|--------|
| SP | TJSP | 574 | ESAJ POST | ✅ Done |
| SC | TJSC | 1,224 | ESAJ AJAX | ✅ Done |
| RR | TJRR | 21 | ESAJ POST | ✅ Done |
| AC | TJAC | 33 | ESAJ POST | ✅ Done |
| PI | TJPI | 15 | Rails GET | ✅ Done |
| TO | TJTO | 17 | PHP+Solr GET | ✅ Done |
| DF | TJDFT | 5,265 | Elasticsearch REST | ✅ Done |
| RJ | TJRJ | 1,219 | ASP.NET WebForms | ✅ Done |
| MG | TJMG | - | DWR + CAPTCHA | ❌ Blocked |
| BA | TJBA | - | GraphQL (server 500) | ❌ Blocked |
| PR | TJPR | - | Full-text too broad (334K results) | ❌ Blocked |
| CE | TJCE | - | ESAJ TLS error | ❌ Blocked |
| SE | TJSE | - | JSF + Turnstile CAPTCHA | ❌ Blocked |
| ES | TJES | - | JSF + Turnstile CAPTCHA | ❌ Blocked |
| AM | TJAM | - | CAS SSO required | ❌ Blocked |
| GO | TJGO | - | React SPA, no public API | ❌ Blocked |
| RO | TJRO | - | Angular SPA, no public API | ❌ Blocked |
| MT | TJMT | - | SPA, requires JS | ❌ Blocked |
| MS | TJMS | - | ESAJ timeout | ❌ Blocked |
| RS | TJRS | - | DNS/timeout | ❌ Blocked |
| PB | TJPB | - | Cloudflare 520 | ❌ Blocked |
| AP | TJAP | - | HTTP 403 | ❌ Blocked |
| RN | TJRN | - | HTTP 403 | ❌ Blocked |
| PE | TJPE | - | Timeout/DNS | ❌ Blocked |
| PA | TJPA | - | Timeout/DNS | ❌ Blocked |
| AL | TJAL | - | Timeout/DNS | ❌ Blocked |
| MA | TJMA | - | No case-law (jurisprudência) endpoint | ❌ Blocked |

Total collected: 8,368 cases from 8 courts (TJSP + TJSC + TJDFT + TJRJ + TJRR + TJAC + TJPI + TJTO)
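
The total can be checked against the per-court counts in the access-status table. The snippet below copies those counts and sums them; it also shows how heavily the collection leans on one court.

```python
# Case counts copied from the access-status table for the eight completed courts.
COLLECTED = {
    "TJSP": 574,
    "TJSC": 1224,
    "TJRR": 21,
    "TJAC": 33,
    "TJPI": 15,
    "TJTO": 17,
    "TJDFT": 5265,
    "TJRJ": 1219,
}

total = sum(COLLECTED.values())
# Fraction of all collected Brazilian decisions that come from TJDFT.
share_tjdft = COLLECTED["TJDFT"] / total

print(total)                    # 8368
print(f"{share_tjdft:.1%}")     # share of TJDFT, a little under two thirds
```

Since TJDFT alone contributes well over half of the records, cross-court comparisons may need per-court normalization.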


Search Queries Used

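Primary: água abastecimento fornecimento saneamento (water, supply, provision, sanitation)
Secondary: corte suspensão fornecimento água (cutoff, suspension, supply, water)
Tertiary: proteção manancial recursos hídricos ambiental (protection, water source, water resources, environmental)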
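
These queries contain accented characters, so they must be percent-encoded as UTF-8 before they go into a GET URL or a form-encoded POST body. The sketch below shows that step with `urllib.parse`. The parameter name `q` is a placeholder assumption; each portal uses its own field names, and some may expect a different encoding.

```python
from urllib.parse import parse_qs, urlencode

# Query strings exactly as listed above.
QUERIES = {
    "primary": "água abastecimento fornecimento saneamento",
    "secondary": "corte suspensão fornecimento água",
    "tertiary": "proteção manancial recursos hídricos ambiental",
}


def build_query_string(terms, param="q"):
    """Percent-encode a search phrase as UTF-8 under a placeholder parameter name."""
    return urlencode({param: terms}, encoding="utf-8")


encoded = build_query_string("água abastecimento")
print(encoded)  # q=%C3%A1gua+abastecimento

# Decoding recovers the original phrase, accents included.
assert parse_qs(encoded)["q"] == ["água abastecimento"]
```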


Output JSON Schema

Each case record contains:

{
  "tribunal": "TJSC",
  "estado": "SC",
  "num_processo": "0001234-56.2022.8.24.0001",
  "data_julgamento": "2022-03-15",
  "ano": 2022,
  "classe": "Apelação Cível",
  "camara_orgao": "1ª Câmara de Direito Público",
  "relator": "Des. João Silva",
  "ementa": "DIREITO À ÁGUA. Fornecimento. ...",
  "url": "https://..."
}
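
A quick structural check against this schema can catch scraper regressions before merging. The sketch below assumes that every field is always present, that `data_julgamento` is an ISO date, and that `ano` equals its year. These rules are inferred from the single example above rather than stated by the project, and `validate_record` is a hypothetical helper.

```python
from datetime import date

# Field names and types taken from the example record.
REQUIRED_FIELDS = {
    "tribunal": str, "estado": str, "num_processo": str,
    "data_julgamento": str, "ano": int, "classe": str,
    "camara_orgao": str, "relator": str, "ementa": str, "url": str,
}


def validate_record(record):
    """Return a list of problems found in one case record (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    raw_date = record.get("data_julgamento")
    if isinstance(raw_date, str):
        try:
            judged = date.fromisoformat(raw_date)
        except ValueError:
            problems.append("data_julgamento: not an ISO date (YYYY-MM-DD)")
        else:
            # Assumed rule: the year field mirrors the judgment date.
            if record.get("ano") != judged.year:
                problems.append("ano does not match data_julgamento")
    return problems
```

Running it over a loaded file and printing only the non-empty results gives a short list of records worth inspecting by hand.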

Legal Note

These scrapers query publicly accessible case-law (jurisprudência) portals. All decisions are public court records. This dataset is intended for academic comparative law research.


License

MIT
