A modern EUR-Lex parser for Python - fetch and parse EU legal documents
eurlxp
A modern EUR-Lex parser for Python. Fetch and parse EU legal documents with async support, type hints, and a CLI.
Note: This is a modern rewrite inspired by kevin91nl/eurlex, built with uv, httpx, Pydantic, and Typer.
Features
- Modern Python - Supports Python 3.10-3.14
- Async support - Fetch multiple documents concurrently
- Type hints - Full type annotations for IDE support
- CLI - Command-line interface with Typer
- Pydantic models - Validated, structured data
- Drop-in compatible - Same API as the original eurlex package
Installation
```bash
# Using pip
pip install eurlxp

# Using uv
uv add eurlxp

# With SPARQL support (required for get_celex_dataframe, run_query, get_regulations, etc.)
# Quote the extra so shells like zsh do not treat the brackets as a glob pattern
pip install "eurlxp[sparql]"
# or
uv add "eurlxp[sparql]"
```
Note: The SPARQL functions (`get_celex_dataframe`, `run_query`, `get_regulations`, `get_documents`, `guess_celex_ids_via_eurlex`) require the optional `sparql` dependencies. If you see `ImportError: SPARQL dependencies not installed`, install them with `pip install "eurlxp[sparql]"`.
How It Works
This package fetches EU legal documents from EUR-Lex through its public HTML endpoint:
https://eur-lex.europa.eu/legal-content/{LANG}/TXT/HTML/?uri=CELEX:{CELEX_ID}
You can verify this manually with curl:
```bash
# Fetch a regulation (EU Drone Regulation 2019/947)
curl -s "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32019R0947" | head -50

# Or with a different language (German)
curl -s "https://eur-lex.europa.eu/legal-content/DE/TXT/HTML/?uri=CELEX:32019R0947" | head -50
```
The equivalent using this package's CLI:
```bash
# Fetch as HTML
uvx eurlxp fetch 32019R0947 --format html | head -50

# Fetch and parse to JSON
uvx eurlxp fetch 32019R0947 --format json | head -30

# Fetch and parse to CSV
uvx eurlxp fetch 32019R0947 --format csv | head -10

# Get document info (shows row count, articles, etc.)
uvx eurlxp info 32019R0947
```
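The URL pattern from "How It Works" is simple enough to build by hand, for example to log which page a fetch hit. The sketch below does this with plain string formatting. `build_celex_url` is a hypothetical helper for illustration, not part of eurlxp's API, and it assumes the lower-case `language` codes used by `get_html_by_celex_id` correspond to the upper-case codes in the URL path (`EN`, `DE`).

```python
# Hypothetical helper, not part of eurlxp: builds the public EUR-Lex HTML URL
# from the template shown in "How It Works".
URL_TEMPLATE = "https://eur-lex.europa.eu/legal-content/{lang}/TXT/HTML/?uri=CELEX:{celex_id}"


def build_celex_url(celex_id: str, language: str = "en") -> str:
    """Return the EUR-Lex HTML URL for a CELEX ID in the given language."""
    # Assumption: EUR-Lex paths use upper-case language codes such as "EN" or "DE".
    return URL_TEMPLATE.format(lang=language.upper(), celex_id=celex_id.strip())


if __name__ == "__main__":
    print(build_celex_url("32019R0947"))
    print(build_celex_url("32019R0947", language="de"))
```

The two calls produce the same URLs as the English and German curl commands above.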
Quick Start
```python
from eurlxp import get_html_by_celex_id, parse_html

# Fetch and parse a regulation
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get Article 1
df_article_1 = df[df.article == "1"]
print(df_article_1.iloc[0].text)
# "This Regulation lays down detailed provisions for the operation of unmanned aircraft systems..."
```
Async Usage
```python
import asyncio

from eurlxp import AsyncEURLexClient, parse_html


async def fetch_documents():
    async with AsyncEURLexClient() as client:
        # Fetch multiple documents concurrently; the result maps CELEX ID -> HTML
        docs = await client.fetch_multiple(["32019R0947", "32019R0945"])
        for celex_id, html in docs.items():
            df = parse_html(html)
            print(f"{celex_id}: {len(df)} rows")


asyncio.run(fetch_documents())
```
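This README does not document how `fetch_multiple` schedules its requests internally. A common pattern for fetching many pages concurrently without flooding a server is `asyncio.gather` combined with an `asyncio.Semaphore` that caps the number of requests in flight. The sketch below shows that pattern with a stand-in coroutine (`fake_fetch`, hypothetical) instead of real HTTP calls, so it illustrates the technique rather than eurlxp's implementation.

```python
import asyncio


async def fake_fetch(celex_id: str) -> str:
    """Hypothetical stand-in for an HTTP request; returns dummy HTML."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><body>{celex_id}</body></html>"


async def fetch_many(celex_ids: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch all IDs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(celex_id: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before "fetching"
            return celex_id, await fake_fetch(celex_id)

    # gather returns results in input order; dict() builds CELEX ID -> HTML
    results = await asyncio.gather(*(fetch_one(c) for c in celex_ids))
    return dict(results)


if __name__ == "__main__":
    docs = asyncio.run(fetch_many(["32019R0947", "32019R0945"]))
    for celex_id, html in docs.items():
        print(celex_id, len(html))
```

Swapping `fake_fetch` for a real request function (for example one built on an `httpx.AsyncClient`) keeps the same structure; the semaphore size is the knob for politeness towards EUR-Lex.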
CLI Usage
```bash
# Fetch a document
eurlxp fetch 32019R0947 -o regulation.html

# Parse and convert to CSV
eurlxp fetch 32019R0947 -f csv -o regulation.csv

# Get document info
eurlxp info 32019R0947

# Convert slash notation to CELEX ID
eurlxp celex 2019/947
# Output: 32019R0947
```
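The `celex` conversion above follows the CELEX layout: sector digit, four-digit year, document-type letter, and the act number zero-padded to four digits (`3` + `2019` + `R` + `0947`). A minimal sketch of that conversion, assuming the newer `YYYY/N` slash notation; `slash_to_celex` is a hypothetical name, and this is not eurlxp's `get_celex_id` implementation (older acts written as number/year are not handled).

```python
def slash_to_celex(slash_notation: str, document_type: str = "R", sector_id: str = "3") -> str:
    """Convert 'YYYY/N' slash notation (e.g. '2019/947') to a CELEX ID.

    Hypothetical sketch: assumes a four-digit year followed by the act number.
    """
    year, number = (part.strip() for part in slash_notation.split("/"))
    if len(year) != 4 or not year.isdigit() or not number.isdigit():
        raise ValueError(f"expected 'YYYY/N' notation, got {slash_notation!r}")
    # CELEX = sector + year + document type + act number padded to 4 digits
    return f"{sector_id}{year}{document_type}{number.zfill(4)}"


print(slash_to_celex("2019/947"))  # 32019R0947
```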
API Reference
Functions
| Function | Description |
|---|---|
| `get_html_by_celex_id(celex_id, language="en")` | Fetch HTML by CELEX ID |
| `get_html_by_cellar_id(cellar_id, language="en")` | Fetch HTML by CELLAR ID |
| `parse_html(html)` | Parse HTML to DataFrame |
| `get_celex_id(slash_notation, document_type="R", sector_id="3")` | Convert slash notation to CELEX ID |
| `get_possible_celex_ids(slash_notation)` | Get all possible CELEX IDs |
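A slash notation such as `2019/947` does not by itself say whether the act is a regulation, directive, or decision, which is why more than one CELEX ID is possible. The hypothetical sketch below generates one candidate per document type using the sector-3 CELEX type codes `R` (regulation), `L` (directive), and `D` (decision); `possible_celex_ids` is an illustrative name, and the real `get_possible_celex_ids` may cover other types or return them in a different order.

```python
# Subset of CELEX document-type codes for sector 3 (legislation) used in this sketch
DOCUMENT_TYPES = {"R": "regulation", "L": "directive", "D": "decision"}


def possible_celex_ids(slash_notation: str, sector_id: str = "3") -> list[str]:
    """Return one candidate CELEX ID per document type (hypothetical helper)."""
    year, number = (part.strip() for part in slash_notation.split("/"))
    # Dict iteration follows insertion order, so candidates come out as R, L, D
    return [f"{sector_id}{year}{doc_type}{number.zfill(4)}" for doc_type in DOCUMENT_TYPES]


print(possible_celex_ids("2019/947"))
# ['32019R0947', '32019L0947', '32019D0947']
```

Each candidate can then be tried against the HTML endpoint until one resolves to a document.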
Classes
| Class | Description |
|---|---|
| `EURLexClient` | Synchronous HTTP client |
| `AsyncEURLexClient` | Asynchronous HTTP client |
DataFrame Columns
| Column | Description |
|---|---|
| `text` | The text content |
| `type` | Content type (text, link, etc.) |
| `document` | Document title |
| `article` | Article number |
| `article_subtitle` | Article subtitle |
| `paragraph` | Paragraph number |
| `group` | Group heading |
| `section` | Section heading |
| `ref` | Reference path (e.g., `["(1)", "(a)"]`) |
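Because `parse_html` spreads a document over many rows, reassembling an article usually means filtering and joining. The sketch below uses a small hand-made pandas DataFrame with a subset of the documented columns (the row contents are placeholders, not real parser output) to show filtering on `type` and joining `text` per `article`.

```python
import pandas as pd

# Placeholder rows using the documented column names; not real parse_html output.
df = pd.DataFrame(
    [
        {"article": "1", "paragraph": "1", "type": "text", "text": "First sentence of Article 1."},
        {"article": "1", "paragraph": "2", "type": "text", "text": "Second sentence of Article 1."},
        {"article": "2", "paragraph": "1", "type": "text", "text": "Definitions follow."},
    ]
)

# Keep only plain-text rows, then join each article's fragments in row order.
text_rows = df[df["type"] == "text"]
article_text = text_rows.groupby("article", sort=True)["text"].apply(" ".join)

print(article_text["1"])
# First sentence of Article 1. Second sentence of Article 1.
```

The same two steps apply to a DataFrame returned by `parse_html`, assuming its `article` values are strings as in the Quick Start filter `df.article == "1"`.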
Development
```bash
# Clone the repository
git clone https://github.com/morrieinmaas/eurlxp.git
cd eurlxp

# Install with dev dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uv run pyright
```
Publishing to PyPI
```bash
# Build the package
uv build

# Publish to PyPI (requires PYPI_TOKEN)
uv publish --token $PYPI_TOKEN
```
License
MIT License - see LICENSE for details.
Credits
Inspired by kevin91nl/eurlex.
Download files
File details
Details for the file eurlxp-0.2.5.tar.gz.
File metadata
- Download URL: eurlxp-0.2.5.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fba0401391e5ade5cee12d46a42b2f68fbd8a80d4b9a1b0fa85b8571e7eb154a` |
| MD5 | `e8760c625bd1e87d803bb8fc8fdacf52` |
| BLAKE2b-256 | `5ab3fed79214f65ca66754470f63eb71f14cd6cd0881dc3ba320ab9dc0c5c18f` |
File details
Details for the file eurlxp-0.2.5-py3-none-any.whl.
File metadata
- Download URL: eurlxp-0.2.5-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7f4b5b97507402d5f37bd2eafca3d9833a703758c834d73b3b997613020d1683` |
| MD5 | `31dbaa0fc9fca4978b6baad756d77325` |
| BLAKE2b-256 | `0602247f98a1c24fbfbb0cab96b26e8e7d66be3973c79d549491101b6ced595c` |
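To check a downloaded file against the SHA256 digests listed for each release file, Python's standard `hashlib` is enough. A minimal sketch; `sha256_of` and `verify` are illustrative helper names, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify(Path("eurlxp-0.2.5.tar.gz"), "fba0401391e5ade5cee12d46a42b2f68fbd8a80d4b9a1b0fa85b8571e7eb154a")` should return `True` for an unmodified download saved in the current directory.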