EUR-Lex Parser

License: MIT

An EUR-Lex parser for Python.

Usage

You can install this package as follows:

pip install -U eurlex

After installing this package, you can download and parse any document from EUR-Lex. For example, the 32019R0947 regulation:

from eurlex import get_html_by_celex_id, parse_html

# Retrieve and parse the document with CELEX ID "32019R0947" into a Pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]

# Check the subtitle and corresponding text of Article 1
assert df_article_1_line_1.article_subtitle == "Subject matter"
assert df_article_1_line_1.text == (
    "This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations."
)

Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
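To make the CELEX number less opaque, here is a minimal sketch that splits an ID such as 32019R0947 into its parts. It is not part of the eurlex package: the names CelexParts and split_celex_id are hypothetical, and the pattern only assumes the common "sector, year, document type, number" layout, so suffixed forms such as corrigenda or consolidated texts are rejected.

```python
import re
from typing import NamedTuple


class CelexParts(NamedTuple):
    """Hypothetical container for the parts of a simple CELEX number."""

    sector: str    # one character, e.g. "3" for EU legal acts
    year: str      # four digits
    doc_type: str  # one or two letters, e.g. "R" for a regulation
    number: str    # kept as a string so leading zeros survive


# Assumption: only the plain "sector + year + type + number" shape is handled.
_CELEX_PATTERN = re.compile(
    r"^(?P<sector>[0-9A-Z])(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d+)$"
)


def split_celex_id(celex_id: str) -> CelexParts:
    """Split a simple CELEX number into sector, year, type and number."""
    match = _CELEX_PATTERN.match(celex_id.strip())
    if match is None:
        raise ValueError(f"Unsupported CELEX number: {celex_id!r}")
    return CelexParts(**match.groupdict())


parts = split_celex_id("32019R0947")
print(parts.sector, parts.year, parts.doc_type, parts.number)
```

For 32019R0947 this yields sector 3, year 2019, type R and number 0947, which can be handy for grouping downloaded documents by year or type before parsing them.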

For more information about the methods in this package, see the unit tests and doctests.

Data Structure

The following columns are available in the parsed dataframe:

  • text: The text
  • type: The type of the data
  • document: The document in which the text is found
  • article: The article in which the text is found
  • article_subtitle: The subtitle of the article (when available)
  • ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))

In some cases, additional fields are available, for example the group field, which contains the bold text under which a text is found.
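The sketch below illustrates how the article and ref columns can be combined to select everything under a given paragraph. Plain dictionaries stand in for DataFrame rows so that the example runs without network access. Apart from the Article 1 text quoted in the usage example, every value is a made-up placeholder, the empty ref for that row is an assumption, and select_rows is a hypothetical helper rather than a function of this package.

```python
from typing import Any, Sequence

# Hypothetical rows mimicking some of the documented columns.
# Only the Article 1 text comes from the usage example; the rest is placeholder data.
rows: list[dict[str, Any]] = [
    {
        "article": "1",
        "article_subtitle": "Subject matter",
        "ref": [],  # assumption: top-level text carries an empty ref
        "text": (
            "This Regulation lays down detailed provisions for the operation "
            "of unmanned aircraft systems as well as for personnel, including "
            "remote pilots and organisations involved in those operations."
        ),
    },
    {"article": "2", "article_subtitle": None, "ref": ["(1)"], "text": "placeholder A"},
    {"article": "2", "article_subtitle": None, "ref": ["(1)", "(a)"], "text": "placeholder B"},
    {"article": "2", "article_subtitle": None, "ref": ["(2)"], "text": "placeholder C"},
]


def select_rows(
    records: Sequence[dict[str, Any]],
    article: str,
    ref_prefix: Sequence[str] = (),
) -> list[dict[str, Any]]:
    """Return the rows of one article whose ref starts with ref_prefix."""
    prefix = list(ref_prefix)
    return [
        row
        for row in records
        if row["article"] == article and row["ref"][: len(prefix)] == prefix
    ]


# Everything under Article 2, paragraph (1), including subparagraph (a)
for row in select_rows(rows, "2", ["(1)"]):
    print(row["ref"], row["text"])
```

With a real parsed DataFrame, the same prefix comparison could be applied to the ref column after filtering on article, assuming ref holds a list of markers as described above.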

Architecture

The dependency graph below is generated by import-cruiser and refreshed by the pre-commit hook. It focuses on src/eurlex and its non-dev external dependencies, while keeping the public import surface available through eurlex.

Module map

  • fetch.py: download EUR-Lex HTML and resolve multiple-choice responses
  • parser.py: turn HTML into tabular records
  • sparql.py: build and run SPARQL queries
  • language.py: language-code normalization
  • uri.py: query-parameter and IRI helpers
  • xml.py: XML and tag/class helpers
  • constants.py: prefix and language-code tables

EUR-Lex dependency graph

Contributing

Feel free to submit issues, ideas, or pull requests.

Branching and pull requests

Please do your work on a feature branch that follows the feature/* naming pattern, for example feature/my-new-improvement.

When your work is ready, open a pull request from that feature branch to the target branch (typically main) for review.

Local checks

For development, install the project and its hooks, then let pre-commit run the same checks that CI expects:

python -m pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files

The final hook runs the doctests and enforces 100% coverage for eurlex, so you should see the same failures locally before a commit lands.

The README examples are also exercised automatically through pytest-readme, so they stay in sync with the code instead of becoming decorative fiction.

The runnable examples in examples/ are executed by the test suite as well, so they are part of the coverage target rather than a separate side quest.

CI tests the package on Python 3.11, 3.12, and 3.13, while the pre-commit hooks keep the code quality checks on a single pinned environment.

Version tags that start with v (for example, v0.1.8) now create a GitHub Release, attach the built distributions, and publish the package to PyPI after the checks pass.
