Eurlex parser for fetching and parsing Eurlex data.

Eurlex Parser

This Python package fetches and parses data (regulations, directives, and proposals) from Eurlex, the official website for European Union law. It extracts the parts of a legal document identified by its CELEX ID and can export the result as JSON or as a Pandas DataFrame.

Installation

pip install eurlex-parser

Usage

Functions

  • get_data_by_celex_id(celex_id: str, language: str = "en") -> dict: Fetches and parses the data for the given CELEX ID. Returns a dictionary with the document's title, preamble, articles, final part, and annexes, along with the notes, references, summary, and related documents described under Data Structure.

  • get_json_by_celex_id(celex_id: str) -> str: Fetches and parses the data for the given CELEX ID and returns it in JSON format.

  • get_articles_by_celex_id(celex_id: str) -> pd.DataFrame: Fetches and parses the articles for the given CELEX ID and returns them as a Pandas DataFrame.

  • get_summary_by_celex_id(celex_id: str, language: str = "en") -> dict: Fetches and parses the summary for the given CELEX ID and returns it as a dictionary containing the document's title, chapters, and the last modified date. (Note: The summary is not available for all documents.)
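Because a summary is not available for every document, calling code may want to treat a missing summary as a normal case. The sketch below is one way to do that. How the package signals a missing summary is not stated here, so the wrapper assumes it could be either an exception or an empty result and treats both as "no summary". The name fetch_summary_or_none is hypothetical and not part of eurlex-parser.

```python
from typing import Callable, Optional


def fetch_summary_or_none(
    fetch: Callable[[str], Optional[dict]], celex_id: str
) -> Optional[dict]:
    """Return the summary dict, or None when no usable summary comes back.

    Assumption: a missing summary may surface as an exception or as an
    empty/None result; both are treated as "no summary" here.
    """
    try:
        summary = fetch(celex_id)
    except Exception:  # broad on purpose: the failure mode is not documented
        return None
    # Collapse empty results ({} or None) into a single None value.
    return summary or None
```

With the package installed, this would be called as fetch_summary_or_none(get_summary_by_celex_id, '32013R0575').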

Examples

The examples below show how to fetch and parse a document by its CELEX ID. Each CELEX ID corresponds to a Eurlex URL; for example, 32013R0575 corresponds to https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575

  1. Fetch and print data for a given CELEX ID:

    from eurlex import get_data_by_celex_id
    
    data = get_data_by_celex_id('32013R0575')
    print(data)
    
  2. Save data as a JSON file:

    from eurlex import get_json_by_celex_id
    
    json_data = get_json_by_celex_id('32013R0575')
    with open('32013R0575.json', 'w', encoding='utf-8') as f:
        f.write(json_data)
    
  3. Load articles into a Pandas DataFrame:

    from eurlex import get_articles_by_celex_id
    
    df = get_articles_by_celex_id('32013R0575')
    print(df.head())
    
  4. Fetch and print summary for a given CELEX ID:

    from eurlex import get_summary_by_celex_id
    
    summary = get_summary_by_celex_id('32013R0575')
    print(summary)
    

You can find some generated JSON files in the examples directory.
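The correspondence between a CELEX ID and its Eurlex URL, shown for 32013R0575 at the start of this section, can be sketched as a small helper. This assumes the same URL pattern holds for other CELEX IDs; celex_to_url is a hypothetical name used for illustration and is not part of eurlex-parser.

```python
def celex_to_url(celex_id: str, language: str = "en") -> str:
    """Build the Eurlex page URL for a CELEX ID (hypothetical helper)."""
    celex_id = celex_id.strip()
    if not celex_id:
        raise ValueError("celex_id must not be empty")
    # URL pattern copied from the 32013R0575 example in this README.
    return (
        "https://eur-lex.europa.eu/legal-content/"
        f"{language}/TXT/?uri=celex:{celex_id}"
    )


print(celex_to_url("32013R0575"))
# https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575
```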

Data Structure

The main data structure returned by get_data_by_celex_id is a dictionary with the following format:

{
  "title": "Document Title",
  "preamble": {
    "text": "Preamble text",
    "notes": [
      {
        "id": "1",
        "text": "Note text",
        "url": "https://eur-lex.europa.eu/...",
        "reference": null
      }
    ]
  },
  "articles": [
    {
      "id": "Article ID",
      "title": "Article Title",
      "text": "Article text",
      "metadata": {
        "parent_title1": "Parent Title 1",
        "parent_title2": "Parent Title 2"
      },
      "notes": [
        {
          "id": "1",
          "text": "Note text",
          "url": "https://eur-lex.europa.eu/...",
          "reference": null
        }
      ],
      "references": [
        "Directive ..../../..",
        "Regulation (EU) No .../...."
      ]
    }
  ],
  "notes": [
    {
      "id": "1",
      "text": "Note text",
      "url": "https://eur-lex.europa.eu/...",
      "reference": null
    }
  ],
  "references": [
    "Directive ..../../..",
    "Regulation (EU) No .../...."
  ],
  "final_part": "Final part text",
  "annexes": [
    {
      "id": "Annex ID",
      "title": "Annex Title",
      "text": "Annex text",
      "table": "Markdown table text"
    }
  ],
  "summary": {
    "title": "Document Title",
    "chapters": {
      "Chapter Title 1": "Chapter content 1",
      "Chapter Title 2": "Chapter content 2"
    },
    "last_modified": "Last modified date"
  },
  "related_documents": {
    "modifies": [
      {
        "Relation": "Modifies",
        "Act": {
          "celex": "CELEX Number",
          "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "Addition",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ],
    "modified_by": [
      {
        "Relation": "Corrected by",
        "Act": {
          "celex": "CELEX Number",
          "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ]
  }
}
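As a rough illustration of working with this structure, the sketch below walks the "articles" list and the "related_documents" entry of a dictionary in the documented shape. The helper names article_overview and amending_celex_ids are hypothetical, and the sample dictionary is hand-written placeholder data, not real output from the package.

```python
def article_overview(data: dict) -> list:
    """Summarise each article as id, title, and number of references."""
    overview = []
    for article in data.get("articles", []):
        overview.append(
            {
                "id": article.get("id"),
                "title": article.get("title"),
                "n_references": len(article.get("references", [])),
            }
        )
    return overview


def amending_celex_ids(data: dict) -> list:
    """List the CELEX numbers found under related_documents.modified_by."""
    related = data.get("related_documents", {})
    return [entry["Act"]["celex"] for entry in related.get("modified_by", [])]


# Hand-written placeholder in the documented shape (not real Eurlex data).
sample = {
    "title": "Document Title",
    "articles": [
        {"id": "art_1", "title": "Article Title 1", "text": "...",
         "references": ["Directive ..../../.."]},
        {"id": "art_2", "title": "Article Title 2", "text": "...",
         "references": []},
    ],
    "related_documents": {
        "modified_by": [
            {"Relation": "Corrected by",
             "Act": {"celex": "CELEX Number",
                     "url": "https://eur-lex.europa.eu/..."}}
        ]
    },
}

print(article_overview(sample))
print(amending_celex_ids(sample))
```

Using .get with defaults keeps the helpers tolerant of documents where a key such as "related_documents" is absent, which is an assumption about real output rather than documented behaviour.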

Notes

  • The package currently supports fetching data in English (en) only, so the language parameter should be left at its default value.

License

This project is licensed under the MIT License.
