Scrapers for comparative water law judicial decisions across Brazil, Canada, and the Netherlands (2016-2026)
Project description
Water Law Judicial Decisions Dataset
A collection of scrapers for building a comparative dataset of water law judicial decisions across Brazil (27 state courts), Canada (federal + provincial courts via CanLII), and the Netherlands (Raad van State + CBb via Rechtspraak.nl).
Scope: 2016–2026 | Cases collected: 8,368 Brazilian decisions (8 courts) + Netherlands
Repository Structure
water-law-dataset/
├── scrapers/
│ ├── brazil/ # One scraper per accessible TJ court
│ │ ├── tjac_scraper.py (TJAC – Acre, ESAJ POST)
│ │ ├── tjdft_scraper.py (TJDFT – Brasília, Elasticsearch REST)
│ │ ├── tjpi_scraper.py (TJPI – Piauí, Rails GET)
│ │ ├── tjrj_scraper.py (TJRJ – Rio de Janeiro, ASP.NET WebForms)
│ │ ├── tjrr_scraper.py (TJRR – Roraima, ESAJ POST)
│ │ ├── tjsc_scraper.py (TJSC – Santa Catarina, ESAJ AJAX)
│ │ └── tjto_scraper.py (TJTO – Tocantins, PHP+Solr GET)
│ ├── canada/
│ │ └── canlii_scraper.py (CanLII REST API — requires free API key)
│ └── netherlands/
│ └── rechtspraak_scraper.py (Rechtspraak Open Data — no auth)
├── utils/
│ ├── merge_national.py # Merges per-court JSON files into national CSV/XLSX
│ └── make_progress_charts.py
├── data/ # Output directory (gitignored — add your JSON/CSV here)
├── .env.example
├── requirements.txt
└── README.md
Quick Start
1. Clone and configure
git clone https://github.com/YOUR_USERNAME/water-law-dataset.git
cd water-law-dataset
cp .env.example .env
# Edit .env and set OUTPUT_DIR and any API keys you need
2. Run a scraper
All scrapers use only the Python standard library (Python 3.8+). No pip install needed for scraping.
# Set output directory (or edit .env)
export OUTPUT_DIR=./data # Linux/Mac
set OUTPUT_DIR=.\data # Windows
# Run any scraper
python scrapers/brazil/tjsc_scraper.py
python scrapers/brazil/tjdft_scraper.py
python scrapers/netherlands/rechtspraak_scraper.py
Output is written to $OUTPUT_DIR/<court>_cases_2016_2026.json.
3. CanLII (Canada) — requires API key
# Register free at https://developer.canlii.org/
export CANLII_API_KEY=your_key_here
python scrapers/canada/canlii_scraper.py
4. Merge into national dataset
pip install pandas openpyxl # only needed for merge + charts
export DATA_DIR=./data
python utils/merge_national.py
python utils/make_progress_charts.py
Brazilian Courts — Access Status
| UF | Court | Cases | Method | Status |
|---|---|---|---|---|
| SP | TJSP | 574 | ESAJ POST | ✅ Done |
| SC | TJSC | 1,224 | ESAJ AJAX | ✅ Done |
| RR | TJRR | 21 | ESAJ POST | ✅ Done |
| AC | TJAC | 33 | ESAJ POST | ✅ Done |
| PI | TJPI | 15 | Rails GET | ✅ Done |
| TO | TJTO | 17 | PHP+Solr GET | ✅ Done |
| DF | TJDFT | 5,265 | Elasticsearch REST | ✅ Done |
| RJ | TJRJ | 1,219 | ASP.NET WebForms | ✅ Done |
| MG | TJMG | — | DWR + CAPTCHA | ❌ Blocked |
| BA | TJBA | — | GraphQL (server 500) | ❌ Blocked |
| PR | TJPR | — | Full-text too broad (334K results) | ❌ Blocked |
| CE | TJCE | — | ESAJ TLS error | ❌ Blocked |
| SE | TJSE | — | JSF + Turnstile CAPTCHA | ❌ Blocked |
| ES | TJES | — | JSF + Turnstile CAPTCHA | ❌ Blocked |
| AM | TJAM | — | CAS SSO required | ❌ Blocked |
| GO | TJGO | — | React SPA, no public API | ❌ Blocked |
| RO | TJRO | — | Angular SPA, no public API | ❌ Blocked |
| MT | TJMT | — | SPA, requires JS | ❌ Blocked |
| MS | TJMS | — | ESAJ timeout | ❌ Blocked |
| RS | TJRS | — | DNS/timeout | ❌ Blocked |
| PB | TJPB | — | Cloudflare 520 | ❌ Blocked |
| AP | TJAP | — | HTTP 403 | ❌ Blocked |
| RN | TJRN | — | HTTP 403 | ❌ Blocked |
| PE | TJPE | — | Timeout/DNS | ❌ Blocked |
| PA | TJPA | — | Timeout/DNS | ❌ Blocked |
| AL | TJAL | — | Timeout/DNS | ❌ Blocked |
| MA | TJMA | — | No jurisprudência endpoint | ❌ Blocked |
Total collected: 8,368 cases from 8 courts (TJSP + TJSC + TJDFT + TJRJ + TJRR + TJAC + TJPI + TJTO)
Search Queries Used
Primary: água abastecimento fornecimento saneamento
Secondary: corte suspensão fornecimento água
Tertiary: proteção manancial recursos hídricos ambiental
Output JSON Schema
Each case record contains:
{
"tribunal": "TJSC",
"estado": "SC",
"num_processo": "0001234-56.2022.8.24.0001",
"data_julgamento": "2022-03-15",
"ano": 2022,
"classe": "Apelação Cível",
"camara_orgao": "1ª Câmara de Direito Público",
"relator": "Des. João Silva",
"ementa": "DIREITO À ÁGUA. Fornecimento. ...",
"url": "https://..."
}
Legal Note
These scrapers query publicly accessible jurisprudência portals. All decisions are public court records. This dataset is intended for academic comparative law research.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file water_law_dataset-0.1.0.tar.gz.
File metadata
- Download URL: water_law_dataset-0.1.0.tar.gz
- Upload date:
- Size: 27.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e832635a9f401f19b06a177cc24861c58b1a77f8ad35ed837aa7d56bce058868
|
|
| MD5 |
bf6883ea64296baa0d81ab30190a5267
|
|
| BLAKE2b-256 |
f839d959b681caf5e78ad9403e2f7c7cb7c420471618a3b1a867ab8e6b4f61ee
|
Provenance
The following attestation bundles were made for water_law_dataset-0.1.0.tar.gz:
Publisher:
python-publish.yml on jrklaus8/water-law-dataset
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
water_law_dataset-0.1.0.tar.gz -
Subject digest:
e832635a9f401f19b06a177cc24861c58b1a77f8ad35ed837aa7d56bce058868 - Sigstore transparency entry: 1395813134
- Sigstore integration time:
-
Permalink:
jrklaus8/water-law-dataset@1ffd1120565ac4a45b28651a430d47688020eb36 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/jrklaus8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
python-publish.yml@1ffd1120565ac4a45b28651a430d47688020eb36 -
Trigger Event:
release
-
Statement type:
File details
Details for the file water_law_dataset-0.1.0-py3-none-any.whl.
File metadata
- Download URL: water_law_dataset-0.1.0-py3-none-any.whl
- Upload date:
- Size: 35.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
485b4d759c6764c5ac45a6758460be68ec287ea7755bb58b0d5525f532d34c13
|
|
| MD5 |
6194f05e495139b91e9c10990f4ebe9d
|
|
| BLAKE2b-256 |
023d15a3607eb15825a2df991ae6a12b5ed458bc72687ab9b83fd05f81332c4e
|
Provenance
The following attestation bundles were made for water_law_dataset-0.1.0-py3-none-any.whl:
Publisher:
python-publish.yml on jrklaus8/water-law-dataset
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
water_law_dataset-0.1.0-py3-none-any.whl -
Subject digest:
485b4d759c6764c5ac45a6758460be68ec287ea7755bb58b0d5525f532d34c13 - Sigstore transparency entry: 1395813166
- Sigstore integration time:
-
Permalink:
jrklaus8/water-law-dataset@1ffd1120565ac4a45b28651a430d47688020eb36 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/jrklaus8
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
python-publish.yml@1ffd1120565ac4a45b28651a430d47688020eb36 -
Trigger Event:
release
-
Statement type: