An EUR-Lex parser for Python.

EUR-Lex Parser

Usage

You can install this package as follows:

pip install -U eurlex

After installing this package, you can download and parse any document from EUR-Lex by its CELEX ID. For example, for the regulation with CELEX ID 32019R0947:

from eurlex import get_html_by_celex_id, parse_html

# Retrieve and parse the document with CELEX ID "32019R0947" into a pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]

# Display the subtitle of Article 1
print(df_article_1_line_1.article_subtitle)
# Output: Subject matter

# Display the corresponding text
print(df_article_1_line_1.text)
# Output: This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations.

Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.
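
As a rough illustration of how a CELEX number such as 32019R0947 is put together, the sketch below splits it into sector, year, document type and sequence number. The helper split_celex_id and the CelexParts type are hypothetical names and are not part of the eurlex package. The pattern is an assumption that covers only the common legislation layout (one sector digit, four year digits, one or two letters, four digits); other CELEX forms, such as corrigenda or consolidated texts, are not handled.

```python
import re
from typing import NamedTuple


class CelexParts(NamedTuple):
    """Components of a CELEX number (hypothetical helper type)."""
    sector: str
    year: str
    doc_type: str
    number: str


# Assumed layout: sector digit, 4-digit year, 1-2 letter type, 4-digit number.
_CELEX_RE = re.compile(r"^(\d)(\d{4})([A-Z]{1,2})(\d{4})$")


def split_celex_id(celex_id: str) -> CelexParts:
    """Split a CELEX ID of the assumed layout into its components."""
    match = _CELEX_RE.match(celex_id)
    if match is None:
        raise ValueError(f"Unsupported CELEX ID layout: {celex_id!r}")
    return CelexParts(*match.groups())


parts = split_celex_id("32019R0947")
print(parts)
# CelexParts(sector='3', year='2019', doc_type='R', number='0947')
```

A check like this can be useful for catching typos in an ID before requesting the document.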

For more information about the methods in this package, see the unit tests and doctests.

Data Structure

The parsed DataFrame contains the following columns:

  • text: The text
  • type: The type of the data
  • document: The document in which the text is found
  • article: The article in which the text is found
  • article_subtitle: The subtitle of the article (when available)
  • ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))

In some cases, additional fields are available, for example the group field, which contains the bold text under which the text is found.
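
To make the column layout concrete, here is a minimal sketch that builds a small DataFrame by hand with the documented columns and selects the rows under a given paragraph of an article. The rows are illustrative and were not produced by parse_html; the "text" value used for the type column is an assumption, and rows_under is a hypothetical helper rather than a function of this package.

```python
import pandas as pd

# Hand-written rows that mimic the documented columns. The values are
# illustrative only; the "type" values in particular are an assumption.
rows = [
    {"text": "First paragraph.", "type": "text", "document": "32019R0947",
     "article": "1", "article_subtitle": "Subject matter", "ref": ["(1)"]},
    {"text": "First point.", "type": "text", "document": "32019R0947",
     "article": "1", "article_subtitle": "Subject matter", "ref": ["(1)", "(a)"]},
    {"text": "Second paragraph.", "type": "text", "document": "32019R0947",
     "article": "2", "article_subtitle": "Definitions", "ref": ["(2)"]},
]
df = pd.DataFrame(rows)


def rows_under(df, article, ref_prefix):
    """Return rows of `article` whose ref list starts with `ref_prefix`."""
    n = len(ref_prefix)
    in_article = df["article"] == article
    under_ref = df["ref"].apply(lambda ref: list(ref[:n]) == list(ref_prefix))
    return df[in_article & under_ref]


# Everything in Article 1 at or below paragraph (1)
subset = rows_under(df, "1", ["(1)"])
print(subset["text"].tolist())
# ['First paragraph.', 'First point.']
```

Because ref holds a list per row, the comparison is done row by row with apply instead of a vectorised equality check.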

Code Contribution

Feel free to open issues, share ideas or submit pull requests.
