An EUR-Lex parser for Python.

EUR-Lex Parser

Usage

You can install this package as follows:

pip install -U eurlex

After installing this package, you can download and parse any document from EUR-Lex. For example, the following retrieves and parses the regulation with CELEX ID 32019R0947:

from eurlex import get_html_by_celex_id, parse_html

# Retrieve and parse the document with CELEX ID "32019R0947" into a Pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]

# Check the subtitle and corresponding text of Article 1
assert df_article_1_line_1.article_subtitle == "Subject matter"
assert df_article_1_line_1.text == (
    "This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations."
)

Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
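
As a rough illustration of how such an identifier is put together, the sketch below splits a CELEX number of the common legislation form into sector, year, document type and number. The split_celex_id helper and its regular expression are assumptions made for this example and are not part of the eurlex package; CELEX numbers for other kinds of documents (corrigenda, consolidated texts, case law) follow different patterns that it does not handle.

```python
import re
from typing import NamedTuple


class CelexParts(NamedTuple):
    sector: str
    year: str
    doc_type: str
    number: str


# Assumed pattern for simple legislation IDs: one sector digit, a four-digit
# year, one or two uppercase letters for the document type, a four-digit number.
_CELEX_PATTERN = re.compile(r"^(\d)(\d{4})([A-Z]{1,2})(\d{4})$")


def split_celex_id(celex_id: str) -> CelexParts:
    """Split a simple CELEX number into its four components (hypothetical helper)."""
    match = _CELEX_PATTERN.match(celex_id)
    if match is None:
        raise ValueError(f"Unsupported CELEX number: {celex_id!r}")
    return CelexParts(*match.groups())


parts = split_celex_id("32019R0947")
print(parts)
# → CelexParts(sector='3', year='2019', doc_type='R', number='0947')
```

Here the leading 3 is the sector, 2019 the year, R the document type (a regulation) and 0947 the document number.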

For more information about the methods in this package, see the unit tests and doctests.

Data Structure

The following columns are available in the parsed dataframe:

  • text: The text
  • type: The type of the data
  • document: The document in which the text is found
  • article: The article in which the text is found
  • article_subtitle: The subtitle of the article (when available)
  • ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))

In some cases, additional fields are available. For example, the group field contains the bold heading under which the text is found.
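
To make the column layout more concrete, here is a dependency-free sketch that models a few parsed rows as plain dictionaries and selects the rows nested under a given ref. All row values are hand-written placeholders rather than real parser output, the rows_under helper is hypothetical, and the type and document columns are left out because their possible values are not listed here.

```python
# Placeholder rows for illustration only; not real output of parse_html.
rows = [
    {"article": "1", "article_subtitle": "Subject matter", "ref": [], "text": "Intro text"},
    {"article": "2", "article_subtitle": "Definitions", "ref": ["(1)"], "text": "Paragraph text"},
    {"article": "2", "article_subtitle": "Definitions", "ref": ["(1)", "(a)"], "text": "Subparagraph text"},
]


def rows_under(rows, article, ref_prefix):
    """Return the rows of `article` whose ref starts with `ref_prefix` (hypothetical helper)."""
    prefix = list(ref_prefix)
    return [
        row
        for row in rows
        if row["article"] == article and row["ref"][: len(prefix)] == prefix
    ]


# Everything in Article 2 at or below paragraph (1)
paragraph_1 = rows_under(rows, "2", ["(1)"])
print([row["text"] for row in paragraph_1])
# → ['Paragraph text', 'Subparagraph text']
```

On the parsed dataframe itself, the article part of this selection corresponds to the df[df.article == "2"] style of filter used in the usage example.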

Architecture

The dependency graph below is generated by import-cruiser and refreshed by the pre-commit hook. It focuses on src/eurlex and its non-dev external dependencies, while keeping the public import surface available through eurlex.

Module map

  • fetch.py: download EUR-Lex HTML and resolve multiple-choice responses
  • parser.py: turn HTML into tabular records
  • sparql.py: build and run SPARQL queries
  • language.py: language-code normalization
  • uri.py: query-parameter and IRI helpers
  • markup.py: XML and tag/class helpers
  • constants.py: prefix and language-code tables

[Figure: EUR-Lex dependency graph]

Contributing

Feel free to send any issues, ideas or pull requests.

Branching and pull requests

Please do your work on a feature branch that follows the feature/* naming pattern, for example feature/my-new-improvement.

When your work is ready, open a pull request from that feature branch to the target branch (typically main) for review.

Local checks

For development, install the project and its hooks, then let pre-commit run the same checks that CI expects:

python -m pip install -e .[dev]
pre-commit install
pre-commit run --all-files

The final hook runs the doctests and enforces 100% coverage for eurlex, so you should see the same failures locally before a commit lands.
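
For contributors unfamiliar with doctests, the sketch below shows the general shape of one: an interactive example embedded in a docstring, which the standard doctest module executes and compares against the recorded output. The strip_celex_prefix function is hypothetical and exists only for this illustration; it is not a function of the eurlex package, and the project's own hook may invoke doctests differently.

```python
import doctest


def strip_celex_prefix(value: str) -> str:
    """Remove a leading "CELEX:" label from an identifier (hypothetical helper).

    >>> strip_celex_prefix("CELEX:32019R0947")
    '32019R0947'
    >>> strip_celex_prefix("32019R0947")
    '32019R0947'
    """
    prefix = "CELEX:"
    return value[len(prefix):] if value.startswith(prefix) else value


# Parse the examples out of the docstring and run them explicitly, so the
# sketch behaves the same whether it is run as a script or imported.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    strip_celex_prefix.__doc__,
    {"strip_celex_prefix": strip_celex_prefix},
    "strip_celex_prefix",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.attempted, results.failed)
```

A docstring example whose actual output differs from the recorded output is counted in results.failed, which is the kind of failure a doctest-running hook reports.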

The README examples are also exercised automatically through pytest-readme, so they stay in sync with the code instead of becoming decorative fiction.

The runnable examples in examples/ are executed by the test suite as well, so they are part of the coverage target rather than a separate side quest.

CI tests the package on Python 3.11, 3.12, and 3.13, while the pre-commit hooks keep the code quality checks on a single pinned environment.

Version tags that start with v — for example v0.1.8 — now create a GitHub Release, attach the built distributions, and publish the package to PyPI after the checks pass.
