An EUR-Lex parser for Python.

Project description

EUR-Lex Parser

Usage

You can install this package as follows:

pip install -U eurlex

After installing this package, you can download and parse any document from EUR-Lex. For example, to download and parse the regulation with CELEX ID 32019R0947:

from eurlex import get_html_by_celex_id, parse_html

# Retrieve and parse the document with CELEX ID "32019R0947" into a Pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]

# Display the subtitle of Article 1
print(df_article_1_line_1.article_subtitle)
# Output: Subject matter

# Display the corresponding text
print(df_article_1_line_1.text)
# Output: This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations.

Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
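
As a rough illustration of what a CELEX number encodes, the sketch below splits an ID such as 32019R0947 into its parts: a sector character (3 covers legislation), a four-digit year, a document-type code (R denotes a regulation) and a document number. The helper `split_celex_id` and the `CelexParts` type are hypothetical names introduced here and are not part of the eurlex package. The pattern only covers this common four-part layout; IDs with suffixes (for example corrigenda) or other layouts are rejected.

```python
import re
from typing import NamedTuple


class CelexParts(NamedTuple):
    """Parts of a CELEX ID in the common sector/year/type/number layout."""
    sector: str
    year: int
    doc_type: str
    number: str


# Assumed layout: 1-character sector, 4-digit year, 1-2 letter type, 4-digit number.
_CELEX_PATTERN = re.compile(
    r"(?P<sector>[0-9A-Z])(?P<year>\d{4})(?P<doc_type>[A-Z]{1,2})(?P<number>\d{4})"
)


def split_celex_id(celex_id: str) -> CelexParts:
    """Split a CELEX ID into its parts (hypothetical helper, not a eurlex function)."""
    match = _CELEX_PATTERN.fullmatch(celex_id.strip())
    if match is None:
        raise ValueError(f"Unsupported CELEX ID layout: {celex_id!r}")
    return CelexParts(
        sector=match["sector"],
        year=int(match["year"]),
        doc_type=match["doc_type"],
        number=match["number"],
    )


print(split_celex_id("32019R0947"))
# CelexParts(sector='3', year=2019, doc_type='R', number='0947')
```

A check like this can catch typos in an ID before it is passed to get_html_by_celex_id, though it does not confirm that the document exists on EUR-Lex.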

For more information about the methods in this package, see the unit tests and doctests.

Data Structure

The following columns are available in the parsed dataframe:

  • text: The text
  • type: The type of the data
  • document: The document in which the text is found
  • article: The article in which the text is found
  • article_subtitle: The subtitle of the article (when available)
  • ref: The position of the text within the article, as a list of nested labels (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))

In some cases, additional fields are available. For example, the group field contains the bold text under which a text is found.
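
To make the article and ref columns more concrete, here is a minimal sketch that turns them into a readable citation such as "Article 2(1)(a)". The helper `format_citation` is a hypothetical name, and the sample rows are hand-written to mirror the columns listed above; they are not real parser output. It also assumes that text directly under an article has an empty or missing ref, which may not match what the parser actually returns.

```python
from typing import Optional, Sequence


def format_citation(article: str, ref: Optional[Sequence[str]]) -> str:
    """Join an article number and its nested ref labels into a citation string."""
    # Treat a missing ref like an empty one (assumption about top-level rows).
    return f"Article {article}" + "".join(ref or [])


# Hand-written sample records with the documented column names.
# Similar dicts could be obtained from the parsed DataFrame via df.to_dict("records").
rows = [
    {"article": "1", "ref": [], "text": "Text directly under the article."},
    {"article": "2", "ref": ["(1)", "(a)"], "text": "Text under paragraph (1), subparagraph (a)."},
]

for row in rows:
    print(format_citation(row["article"], row["ref"]), "-", row["text"])
# Article 1 - Text directly under the article.
# Article 2(1)(a) - Text under paragraph (1), subparagraph (a).
```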

Code Contribution

Feel free to send any issues, ideas or pull requests.
