
A modern EUR-Lex parser for Python - fetch and parse EU legal documents


eurlxp


A modern EUR-Lex parser for Python. Fetch and parse EU legal documents with async support, type hints, and a CLI.

Note: This is a modern rewrite inspired by kevin91nl/eurlex, built with uv, httpx, Pydantic, and Typer.

Features

  • Modern Python - Supports Python 3.10-3.14
  • Async support - Fetch multiple documents concurrently
  • Type hints - Full type annotations for IDE support
  • CLI - Command-line interface with Typer
  • Pydantic models - Validated, structured data
  • Drop-in compatible - Same API as the original eurlex package

Installation

# Using pip
pip install eurlxp

# Using uv
uv add eurlxp

# With SPARQL support (quote the extra so shells such as zsh don't expand the brackets)
pip install "eurlxp[sparql]"

How It Works

This package fetches EU legal documents from EUR-Lex through its public HTML endpoint, which takes a language code and a CELEX ID:

https://eur-lex.europa.eu/legal-content/{LANG}/TXT/HTML/?uri=CELEX:{CELEX_ID}

You can verify this manually with curl:

# Fetch a regulation (EU Drone Regulation 2019/947)
curl -s "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32019R0947" | head -50

# Or with a different language (German)
curl -s "https://eur-lex.europa.eu/legal-content/DE/TXT/HTML/?uri=CELEX:32019R0947" | head -50

The equivalent using this package's CLI:

# Fetch as HTML
uvx eurlxp fetch 32019R0947 --format html | head -50

# Fetch and parse to JSON
uvx eurlxp fetch 32019R0947 --format json | head -30

# Fetch and parse to CSV
uvx eurlxp fetch 32019R0947 --format csv | head -10

# Get document info (shows row count, articles, etc.)
uvx eurlxp info 32019R0947
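For reference, the URL template can also be filled in and fetched without eurlxp. The sketch below uses only the standard library (eurlxp itself is built on httpx). The helper names `build_celex_url` and `fetch_celex_html` are hypothetical and not part of the package's API; upper-casing the language code simply mirrors the `EN`/`DE` URLs above.

```python
import urllib.request

# URL template from above; {lang} is an upper-case language code such as EN or DE.
EURLEX_HTML_URL = (
    "https://eur-lex.europa.eu/legal-content/{lang}/TXT/HTML/?uri=CELEX:{celex_id}"
)


def build_celex_url(celex_id: str, language: str = "en") -> str:
    """Fill in the EUR-Lex HTML URL for one CELEX ID and language (hypothetical helper)."""
    return EURLEX_HTML_URL.format(lang=language.upper(), celex_id=celex_id.strip())


def fetch_celex_html(celex_id: str, language: str = "en", timeout: float = 30.0) -> str:
    """Download the document's HTML. This performs a real network request."""
    url = build_celex_url(celex_id, language)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Only builds the URL; no request is made here.
    print(build_celex_url("32019R0947", "de"))
```

Calling `fetch_celex_html("32019R0947")` makes a live request to the same URL as the first curl command above.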

Quick Start

from eurlxp import get_html_by_celex_id, parse_html

# Fetch and parse a regulation
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get Article 1
df_article_1 = df[df.article == "1"]
print(df_article_1.iloc[0].text)
# "This Regulation lays down detailed provisions for the operation of unmanned aircraft systems..."

Async Usage

import asyncio
from eurlxp import AsyncEURLexClient, parse_html

async def fetch_documents():
    async with AsyncEURLexClient() as client:
        # Fetch multiple documents concurrently
        docs = await client.fetch_multiple(["32019R0947", "32019R0945"])
        for celex_id, html in docs.items():
            df = parse_html(html)
            print(f"{celex_id}: {len(df)} rows")

asyncio.run(fetch_documents())
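`fetch_multiple` handles the concurrency for you. If you need the same behaviour around your own fetch function, a common pattern is `asyncio.gather` combined with an `asyncio.Semaphore` that caps the number of in-flight requests. This is a sketch of that pattern, not eurlxp's implementation: `fetch_many`, `fake_fetch` and the default limit of 5 are hypothetical, and `fake_fetch` stands in for a real HTTP call so the example runs offline.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable

# Any coroutine function that takes a CELEX ID and returns HTML.
Fetcher = Callable[[str], Awaitable[str]]


async def fetch_many(
    celex_ids: Iterable[str],
    fetch_one: Fetcher,
    max_concurrency: int = 5,
) -> dict[str, str]:
    """Run fetch_one for every ID concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(celex_id: str) -> tuple[str, str]:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return celex_id, await fetch_one(celex_id)

    ids = list(dict.fromkeys(celex_ids))  # de-duplicate while keeping input order
    results = await asyncio.gather(*(bounded(c) for c in ids))
    return dict(results)


async def fake_fetch(celex_id: str) -> str:
    # Hypothetical stand-in for a real HTTP request so the sketch runs offline.
    await asyncio.sleep(0.01)
    return f"<html>{celex_id}</html>"


if __name__ == "__main__":
    docs = asyncio.run(fetch_many(["32019R0947", "32019R0945"], fake_fetch))
    for celex_id, html in docs.items():
        print(celex_id, len(html))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the resulting dict lists IDs in the order they were requested. Keeping the limit modest is also a courtesy to the EUR-Lex servers.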

CLI Usage

# Fetch a document
eurlxp fetch 32019R0947 -o regulation.html

# Parse and convert to CSV
eurlxp fetch 32019R0947 -f csv -o regulation.csv

# Get document info
eurlxp info 32019R0947

# Convert slash notation to CELEX ID
eurlxp celex 2019/947
# Output: 32019R0947
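A CELEX ID for these acts is the sector digit (3 for legal acts), the four-digit year, a document-type letter (R for regulations, L for directives, D for decisions) and the act number zero-padded to four digits, so 2019/947 becomes 3 + 2019 + R + 0947. A minimal sketch of that conversion, assuming the year-first notation used for acts adopted since 2015; `slash_to_celex` is a hypothetical helper whose defaults mirror those of `get_celex_id` in the API reference.

```python
import re

# "YYYY/N", tolerating spaces around the slash (year-first notation assumed).
_SLASH_NOTATION = re.compile(r"\s*(\d{4})\s*/\s*(\d{1,4})\s*")


def slash_to_celex(
    slash_notation: str, document_type: str = "R", sector_id: str = "3"
) -> str:
    """Convert e.g. '2019/947' to '32019R0947' (year first, number zero-padded)."""
    match = _SLASH_NOTATION.fullmatch(slash_notation)
    if match is None:
        raise ValueError(f"expected 'YYYY/N' notation, got {slash_notation!r}")
    year, number = match.groups()
    return f"{sector_id}{year}{document_type}{int(number):04d}"


if __name__ == "__main__":
    print(slash_to_celex("2019/947"))  # 32019R0947
```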

API Reference

Functions

  • get_html_by_celex_id(celex_id, language="en") - Fetch a document's HTML by its CELEX ID
  • get_html_by_cellar_id(cellar_id, language="en") - Fetch a document's HTML by its CELLAR ID
  • parse_html(html) - Parse EUR-Lex HTML into a pandas DataFrame (see DataFrame Columns)
  • get_celex_id(slash_notation, document_type="R", sector_id="3") - Convert slash notation (e.g. 2019/947) to a CELEX ID
  • get_possible_celex_ids(slash_notation) - List all CELEX IDs a slash notation could refer to
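A slash reference on its own is ambiguous: it does not say whether the act is a regulation, directive or decision, and older regulations put the number before the year (e.g. Regulation (EC) No 1907/2006, CELEX 32006R1907). The sketch below enumerates candidates under those two ambiguities. It is not eurlxp's logic; `candidate_celex_ids`, the default document types and the 1950 to 2100 plausible-year window are assumptions made for illustration.

```python
import re
from itertools import product

# Assumed window of plausible years for EU acts; widen it if needed.
_MIN_YEAR, _MAX_YEAR = 1950, 2100


def _is_year(part: str) -> bool:
    return len(part) == 4 and _MIN_YEAR <= int(part) <= _MAX_YEAR


def candidate_celex_ids(
    slash_notation: str,
    document_types: tuple[str, ...] = ("R", "L", "D"),
    sector_id: str = "3",
) -> list[str]:
    """List CELEX IDs a slash reference could denote, in a stable order."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*", slash_notation)
    if match is None:
        raise ValueError(f"expected 'A/B' notation, got {slash_notation!r}")
    first, second = match.groups()

    # Try "YYYY/N" (current style) and "N/YYYY" (e.g. older regulations).
    orderings = []
    if _is_year(first):
        orderings.append((first, second))
    if _is_year(second):
        orderings.append((second, first))

    ids = [
        f"{sector_id}{year}{doc_type}{int(number):04d}"
        for (year, number), doc_type in product(orderings, document_types)
    ]
    return list(dict.fromkeys(ids))  # drop duplicates, keep order
```

Each candidate can then be tried against EUR-Lex until one resolves to an existing document.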

Classes

  • EURLexClient - Synchronous HTTP client
  • AsyncEURLexClient - Asynchronous HTTP client

DataFrame Columns

  • text - The text content
  • type - Content type (text, link, etc.)
  • document - Document title
  • article - Article number
  • article_subtitle - Article subtitle
  • paragraph - Paragraph number
  • group - Group heading
  • section - Section heading
  • ref - Reference path (e.g., ["(1)", "(a)"])
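Because the parsed table holds one row per text fragment, reassembling an article means filtering on `article` and joining `text` in document order. The sketch below does that in plain Python on rows exported with pandas' `df.to_dict(orient="records")`. The sample rows and the `text_by_article` helper are invented for illustration, and it assumes that rows outside any article (such as recitals) have an empty or missing `article` value.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]

# Invented sample rows shaped like parse_html output after
# df.to_dict(orient="records"); only the column names come from the list above.
sample_rows: list[Row] = [
    {"text": "Recital text.", "type": "text", "article": None, "paragraph": None, "ref": ["(1)"]},
    {"text": "Article 1, first fragment.", "type": "text", "article": "1", "paragraph": None, "ref": []},
    {"text": "Article 2, point (a).", "type": "text", "article": "2", "paragraph": "1", "ref": ["(a)"]},
    {"text": "Article 2, point (b).", "type": "text", "article": "2", "paragraph": "1", "ref": ["(b)"]},
]


def text_by_article(rows: list[Row]) -> dict[str, str]:
    """Join each article's text fragments in their original row order."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for row in rows:
        article = row.get("article")
        # Missing values may arrive as None or as a float NaN from pandas,
        # so only non-empty strings count as article numbers.
        if isinstance(article, str) and article:
            grouped[article].append(row["text"])
    return {article: "\n".join(parts) for article, parts in grouped.items()}


if __name__ == "__main__":
    for article, text in text_by_article(sample_rows).items():
        print(f"Article {article}:\n{text}\n")
```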

Development

# Clone the repository
git clone https://github.com/morrieinmaas/eurlxp.git
cd eurlxp

# Install with dev dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uv run pyright

Publishing to PyPI

# Build the package
uv build

# Publish to PyPI (requires PYPI_TOKEN)
uv publish --token $PYPI_TOKEN

License

MIT License - see LICENSE for details.

Credits

Inspired by kevin91nl/eurlex.


Download files


Source Distribution

eurlxp-0.2.4.tar.gz (13.8 kB view details)

Uploaded Source

Built Distribution


eurlxp-0.2.4-py3-none-any.whl (16.4 kB view details)

Uploaded Python 3

File details

Details for the file eurlxp-0.2.4.tar.gz.

File metadata

  • Download URL: eurlxp-0.2.4.tar.gz
  • Size: 13.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22 (CI, Ubuntu 24.04)

File hashes

Hashes for eurlxp-0.2.4.tar.gz
Algorithm Hash digest
SHA256 46e6fe2084324b2a0b0b33f5655257da9776c58c9fc2f7d9288c6bdebd0794fd
MD5 82f129578008b3ae8dc72026b7052679
BLAKE2b-256 84c36537ce25dcf72819dbe918de27d218b2bedb3188a70083ea562a9187f62c
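The published digests let you check a manually downloaded archive before installing it. A minimal sketch using `hashlib`, reading the file in chunks so large archives need not fit in memory; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_download(Path("eurlxp-0.2.4.tar.gz"), "46e6fe2084324b2a0b0b33f5655257da9776c58c9fc2f7d9288c6bdebd0794fd")` should return `True` for an unmodified source distribution. pip can enforce the same check at install time with `--require-hashes`.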


File details

Details for the file eurlxp-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: eurlxp-0.2.4-py3-none-any.whl
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22 (CI, Ubuntu 24.04)

File hashes

Hashes for eurlxp-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e70272408ec35cf213ff0b6b46ea9490fc68edbcb0dcb1e5d14f94563a2bf1e2
MD5 29d6901724b311ed8792107e86de675f
BLAKE2b-256 4fc97f4b2572cdaa03a714ec864e9dddfb11edcab3593dcdfcf5c5ac3a797494

