An EUR-Lex parser for Python.
EUR-Lex Parser
Usage
You can install this package as follows:
pip install -U eurlex
After installing this package, you can download and parse any document from EUR-Lex. For example, Regulation (EU) 2019/947, which has CELEX number 32019R0947:
from eurlex import get_html_by_celex_id, parse_html
# Retrieve the document with CELEX ID "32019R0947" and parse it into a pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)
# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]
# Check the subtitle and corresponding text of Article 1
assert df_article_1_line_1.article_subtitle == "Subject matter"
assert df_article_1_line_1.text == (
"This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations."
)
Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
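A CELEX number such as 32019R0947 packs several fields into one string: the sector (3, legislation), the year (2019), a document-type letter (R, regulation) and a sequential number (0947). The sketch below splits that common form with a regular expression. The names split_celex and CelexParts are hypothetical and not part of eurlex, and other forms (consolidated versions with date suffixes, corrigenda, treaty identifiers) are assumed to be out of scope.

```python
import re
from typing import NamedTuple

# Assumed common CELEX layout: 1-digit sector, 4-digit year,
# 1-2 letter document type, then the document number.
CELEX_PATTERN = re.compile(
    r"^(?P<sector>\d)(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d+)$"
)


class CelexParts(NamedTuple):
    sector: str
    year: int
    doc_type: str
    number: str


def split_celex(celex_id: str) -> CelexParts:
    """Split a CELEX number of the common form into its fields (hypothetical helper)."""
    match = CELEX_PATTERN.match(celex_id)
    if match is None:
        raise ValueError(f"Unsupported CELEX format: {celex_id!r}")
    return CelexParts(
        sector=match["sector"],
        year=int(match["year"]),
        doc_type=match["doc_type"],
        number=match["number"],  # kept as a string to preserve leading zeros
    )


print(split_celex("32019R0947"))
```

Keeping the number as a string preserves the leading zero in 0947, so the parts can be joined back into the original CELEX number.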
For more information about the methods in this package, see the unit tests and doctests.
Data Structure
The following columns are available in the parsed dataframe:
- text: The text
- type: The type of the data
- document: The document in which the text is found
- article: The article in which the text is found
- article_subtitle: The subtitle of the article (when available)
- ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))
In some cases, additional fields are available, for example the group field, which contains the bold text under which the text is found.
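To make the ref column concrete, here is a sketch that treats parsed rows as plain dictionaries and selects everything nested under a given paragraph by comparing ref prefixes. The rows, their texts and the rows_under helper are invented for illustration, and the sketch assumes ref holds a list of labels as in the example above.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical rows shaped like the columns described above;
# the article number, ref paths and texts are invented.
rows: list[Row] = [
    {"article": "2", "ref": ["(1)"], "text": "Paragraph 1 intro"},
    {"article": "2", "ref": ["(1)", "(a)"], "text": "Point (a)"},
    {"article": "2", "ref": ["(1)", "(b)"], "text": "Point (b)"},
    {"article": "2", "ref": ["(2)"], "text": "Paragraph 2"},
]


def rows_under(rows: list[Row], article: str, prefix: list[str]) -> list[Row]:
    """Return the rows of `article` whose ref path starts with `prefix`."""
    return [
        row
        for row in rows
        if row["article"] == article and row["ref"][: len(prefix)] == prefix
    ]


for row in rows_under(rows, "2", ["(1)"]):
    # The nesting depth of a row is the length of its ref path.
    print(len(row["ref"]), row["ref"], row["text"])
```

For paragraph (1) this yields the paragraph row followed by points (a) and (b). With the DataFrame returned by parse_html, the same prefix test could be applied through df.ref.apply(...), again assuming the ref column holds lists.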
Architecture
The dependency graph below is generated by import-cruiser and refreshed by the pre-commit hook. It focuses on src/eurlex and its non-dev external dependencies, while keeping the public import surface available through eurlex.
Module map
- fetch.py: download EUR-Lex HTML and resolve multiple-choice responses
- parser.py: turn HTML into tabular records
- sparql.py: build and run SPARQL queries
- language.py: language-code normalization
- uri.py: query-parameter and IRI helpers
- xml.py: XML and tag/class helpers
- constants.py: prefix and language-code tables
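The kind of work done by language.py and uri.py can be illustrated with a small sketch that normalizes a language code and builds a EUR-Lex HTML URL for a CELEX number. The function names, the four-entry language table and the URL pattern are assumptions for illustration only; they are not the package's actual API, and get_html_by_celex_id may build its request differently.

```python
from urllib.parse import urlencode

# Partial two-letter -> three-letter table (assumption; a complete table
# would cover all official EU languages).
LANG_2_TO_3 = {"EN": "ENG", "FR": "FRA", "DE": "DEU", "NL": "NLD"}
LANG_3_TO_2 = {three: two for two, three in LANG_2_TO_3.items()}


def normalize_language(code: str) -> str:
    """Return the upper-case two-letter code for inputs like 'en', 'EN' or 'eng'."""
    code = code.strip().upper()
    if code in LANG_2_TO_3:
        return code
    if code in LANG_3_TO_2:
        return LANG_3_TO_2[code]
    raise ValueError(f"Unknown language code: {code!r}")


def eurlex_html_url(celex_id: str, language: str = "en") -> str:
    """Build a EUR-Lex HTML URL for a CELEX number (hypothetical helper)."""
    lang = normalize_language(language)
    # safe=":" keeps the "CELEX:" prefix readable instead of encoding it as %3A.
    query = urlencode({"uri": f"CELEX:{celex_id}"}, safe=":")
    return f"https://eur-lex.europa.eu/legal-content/{lang}/TXT/HTML/?{query}"


print(eurlex_html_url("32019R0947", "eng"))
```

Normalizing the language first means callers can pass either form of the code, while the URL builder only ever sees the two-letter variant.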
Contributing
Feel free to send any issues, ideas or pull requests.
Branching and pull requests
Please do your work on a feature branch that follows the feature/* naming pattern, for example feature/my-new-improvement.
When your work is ready, open a pull request from that feature branch to the target branch (typically main) for review.
Local checks
For development, install the project and its hooks, then let pre-commit run the same checks that CI expects:
python -m pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files
The final hook runs the doctests and enforces 100% coverage for eurlex, so you should see the same failures locally before a commit lands.
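For contributors new to doctests, here is a minimal sketch of the format: the examples live in the docstring and are checked by comparing the printed result with the expected line. The article_label function is hypothetical and not part of eurlex; running the file directly, or using python -m doctest or pytest --doctest-modules, executes such examples, but which runner the hook uses is not shown here.

```python
import doctest


def article_label(article: str) -> str:
    """Return a human-readable label for an article number.

    Hypothetical helper, used only to show the doctest format.

    >>> article_label("1")
    'Article 1'
    >>> article_label(" 12 ")
    'Article 12'
    """
    return f"Article {article.strip()}"


if __name__ == "__main__":
    # Runs every docstring example in this module; silent when all pass.
    doctest.testmod()
```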
The README examples are also exercised automatically through pytest-readme, so they stay in sync with the code instead of becoming decorative fiction.
The runnable examples in examples/ are executed by the test suite as well, so they are part of the coverage target rather than a separate side quest.
CI tests the package on Python 3.11, 3.12, and 3.13, while the pre-commit hooks keep the code quality checks on a single pinned environment.
Pushing a version tag that starts with v (for example v0.1.8) creates a GitHub Release, attaches the built distributions, and publishes the package to PyPI after the checks pass.
Download files
File details
Details for the file eurlex-0.1.8.tar.gz.
File metadata
- Download URL: eurlex-0.1.8.tar.gz
- Upload date:
- Size: 17.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f63da7a0f93fdeca02010a6b51446081bb486e34222b5532a9d41d06541dbc3b |
| MD5 | 9f7ecfbc5673ac2fe774d8f0e314c2b8 |
| BLAKE2b-256 | aaaee98030f8fdc48bb2e03e7b90bba4ed642e8a8bbbe198e7abcfcf9d78d569 |
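To check that a downloaded eurlex-0.1.8.tar.gz matches the SHA256 digest in the table above, the file can be hashed in chunks with hashlib. This is a minimal sketch; the file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for eurlex-0.1.8.tar.gz (copied from the table above).
EXPECTED_SHA256 = "f63da7a0f93fdeca02010a6b51446081bb486e34222b5532a9d41d06541dbc3b"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hexdigest() is always lower case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded source distribution.
    archive = Path("eurlex-0.1.8.tar.gz")
    if archive.exists():
        print("OK" if matches_published_digest(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

On Python 3.11 and later, hashlib.file_digest(handle, "sha256") performs the same chunked reading in a single call.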
Provenance
The following attestation bundles were made for eurlex-0.1.8.tar.gz:
- Publisher: building.yaml on kevin91nl/eurlex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: eurlex-0.1.8.tar.gz
- Subject digest: f63da7a0f93fdeca02010a6b51446081bb486e34222b5532a9d41d06541dbc3b
- Sigstore transparency entry: 1226062595
- Sigstore integration time:
- Permalink: kevin91nl/eurlex@f74addbc69d6c2becd1d93022262130f6616b0a2
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/kevin91nl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: building.yaml@f74addbc69d6c2becd1d93022262130f6616b0a2
- Trigger Event: push
File details
Details for the file eurlex-0.1.8-py3-none-any.whl.
File metadata
- Download URL: eurlex-0.1.8-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 414414e884904b79167cf9d680d82d37f0e8c338998f8668ffb2742360f8aa5e |
| MD5 | 6f5e5b2bbfdada5499431fc6891c805a |
| BLAKE2b-256 | 54be4915c09d13840fcabf9a1dbd993b27854890bcc016ff0c201abc403a77bd |
Provenance
The following attestation bundles were made for eurlex-0.1.8-py3-none-any.whl:
- Publisher: building.yaml on kevin91nl/eurlex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: eurlex-0.1.8-py3-none-any.whl
- Subject digest: 414414e884904b79167cf9d680d82d37f0e8c338998f8668ffb2742360f8aa5e
- Sigstore transparency entry: 1226062698
- Sigstore integration time:
- Permalink: kevin91nl/eurlex@f74addbc69d6c2becd1d93022262130f6616b0a2
- Branch / Tag: refs/tags/v0.1.8
- Owner: https://github.com/kevin91nl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: building.yaml@f74addbc69d6c2becd1d93022262130f6616b0a2
- Trigger Event: push