Library for extracting cellar data

Maintainers

shashankmc

Project description

Cellar extractor

This library provides functions and classes for retrieving CELLAR case law data from EUR-Lex.

Version

Python 3.9+

Contributors

Pranav Bapat

Piotr Lewandowski

shashankmc

gijsvd

venvis

How to install?

pip install cellar-extractor

What are the functions?

get_cellar
Gets all the ECLI data from the EUR-Lex SPARQL endpoint and returns it in CSV or JSON format, either in-memory or as a saved file.
get_cellar_extra
Gets all the ECLI data from the EUR-Lex SPARQL endpoint and, on top of that, scrapes the EUR-Lex website to acquire the full text, keywords, case law directory codes and EuroVoc identifiers. Users who have a EUR-Lex account with access to the EUR-Lex webservices can also pass their webservices login credentials to the method in order to extract data about the works citing a work and the works cited by it. The full text is returned as a JSON file and the rest of the data as a CSV. Both can be returned in-memory or as saved files.
get_nodes_and_edges_lists
Returns two list objects, one with the nodes and one with the edges of the citations within the passed DataFrame. This allows the creation of a network graph of the citations. The lists can only be returned in-memory.
filter_subject_matter
Returns a DataFrame containing only the cases whose subject-matter column contains a given phrase.
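
To illustrate the kind of node and edge lists described for get_nodes_and_edges_lists, here is a minimal standard-library sketch. It does not call cellar_extractor: the helper name build_nodes_and_edges, the input shape (a dict mapping a case ID to the IDs it cites) and the identifiers are assumptions made for this illustration, and the library's real output format may differ.

```python
def build_nodes_and_edges(citations, only_local=False):
    """Return (nodes, edges) from a mapping of case ID -> list of cited case IDs.

    Hypothetical helper: it mirrors the idea of the only_local flag, not the
    library's actual implementation.
    """
    local = set(citations)
    nodes = set(local)
    edges = []
    for source, targets in citations.items():
        for target in targets:
            if only_local and target not in local:
                continue  # skip citations that point outside the given cases
            nodes.add(target)
            edges.append((source, target))
    return sorted(nodes), edges


# Made-up identifiers, used only to show the shape of the result.
citations = {"case_A": ["case_B", "case_X"], "case_B": [], "case_C": ["case_A"]}
nodes, edges = build_nodes_and_edges(citations, only_local=True)
```

With only_local=True the edge to case_X is dropped because case_X is not one of the given cases; with the default it would be kept and case_X would appear as an extra node.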

What are the classes?

FetchOperativePart
A class whose instance, when called, returns a list of all the text contained in the operative part of a judgement of the Court of Justice of the European Union (CJEU, formerly known as the European Court of Justice (ECJ)); English only. The FetchOperativePart class has eleven functions, each of which scrapes the operative part for a different HTML page structure:
- html_page_structure_one - Retrieves the operative part from the document with the respective CELEX ID by parsing a nested table structure. The relevant text lies inside the span tag with the class coj-bold.
- html_page_structure_two - Retrieves the operative part from the document with the respective CELEX ID by parsing a paragraph (p) structure. The relevant text lies inside the p tag with the class normal that comes after the span tag containing the keyword "operative".
- structure_three - Retrieves the operative part from the document with the respective CELEX ID by parsing a nested table structure. The relevant text lies inside the span tag with the class coj-bold.
- structure_four - Retrieves the operative part from the document with the respective CELEX ID by parsing a paragraph (p) structure. The relevant text lies inside the p tag that comes after the span tag containing the keyword "operative".
- structure_five - Retrieves the operative part from the document with the respective CELEX ID by parsing a paragraph (p) structure. The relevant text lies inside the p tag with the class normal that comes after the span tag containing the keyword "operative".
- structure_six - Retrieves the operative part from the document with the respective CELEX ID by parsing a header (h2) structure. The relevant text lies inside the p tag that comes after the h2 tag containing the keywords "operative part".
- structure_seven - Retrieves the operative part from the document with the respective CELEX ID by parsing a table structure. The relevant text lies inside the span tag that comes after the p tag with the class normal.
- structure_eight - Retrieves the operative part from the document with the respective CELEX ID. The text is extracted from the span tag nested inside the tbody tag. Returns a list as output.
- structure_nine - Retrieves the operative part from the document with the respective CELEX ID. The operative part is under the bold (b) tag after the p tag containing the keywords "on those grounds".
- structure_ten - Retrieves the operative part from the document with the respective CELEX ID. Because the content is loaded by JavaScript (client-side) functions, the text of the current page is retrieved and the operative part is taken from what follows the phrase "On those grounds".
- structure_eleven - Retrieves the operative part from the document with the respective CELEX ID. The operative part is under the paragraph (p) tag after the b tag containing the keywords "operative part".
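
As an illustration of the first structure described above (text inside span tags with the class coj-bold, nested in tables), here is a minimal sketch using only the standard-library html.parser module. It is not the library's own scraper: the class name BoldSpanCollector and the HTML fragment are made up for this example.

```python
from html.parser import HTMLParser


class BoldSpanCollector(HTMLParser):
    """Collect the text of every <span class="coj-bold"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching span
        self.current = []   # text fragments of the span being read
        self.results = []   # finished span texts

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "coj-bold" in classes:
            self.depth += 1  # also counts spans nested inside a match

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Normalise whitespace before storing the span text.
                text = " ".join("".join(self.current).split())
                if text:
                    self.results.append(text)
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Made-up HTML fragment imitating a nested table layout.
sample = """
<table><tr><td><table><tr><td>
  <p><span class="coj-bold">1. The appeal is dismissed.</span></p>
  <p><span>Not part of the ruling.</span></p>
  <p><span class="coj-bold">2. Each party bears its own costs.</span></p>
</td></tr></table></td></tr></table>
"""

collector = BoldSpanCollector()
collector.feed(sample)
operative_part = collector.results
```

The other structures differ mainly in which tag or keyword marks the start of the operative part, so a similar parser with a different matching rule would apply.
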
Writing
A class that writes the text of the operative part of a European case law judgement (English only) into CSV, JSON and TXT files (generated upon initialization). The Writing class has three functions:
- to_csv() - Writes the operative part along with the CELEX ID into a CSV file.
- to_json() - Writes the operative part along with the CELEX ID into a JSON file.
- to_txt() - Writes the operative part along with the CELEX ID into a TXT file.
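
The three output formats can be pictured with a short standard-library sketch. The file names and exact layout that the Writing class produces are not documented here, so the column names, the JSON keys and the placeholder values below are assumptions; the sketch builds the content in memory rather than writing files.

```python
import csv
import io
import json

celex_id = "EXAMPLE_CELEX_ID"  # placeholder, not a real identifier
operative_part = ["First point of the ruling.", "Second point of the ruling."]

# CSV: one row per operative-part paragraph, keyed by the CELEX ID.
csv_buffer = io.StringIO()
writer = csv.writer(csv_buffer)
writer.writerow(["celex_id", "operative_part"])
for paragraph in operative_part:
    writer.writerow([celex_id, paragraph])
csv_text = csv_buffer.getvalue()

# JSON: a single object holding the ID and the list of paragraphs.
json_text = json.dumps(
    {"celex_id": celex_id, "operative_part": operative_part}, indent=2
)

# Plain text: the ID on the first line, then one paragraph per line.
txt_text = "\n".join([celex_id, *operative_part])
```
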
CellarSparqlQuery
A class with methods that extract extra data for each court case using a SPARQL query:
- get_endorsements - Fetches endorsements of the judgement
- get_subjects - Fetches subjects of the judgement
- get_parties - Fetches parties of the judgement
- get_keywords - Fetches keywords of the judgement
- get_citations - Fetches court cases cited by the source judgement
- get_grounds - Fetches grounds of the judgement

What are the parameters?

get_cellar
Parameters:
- max_ecli: int, optional, default 100
  Maximum number of ECLIs to retrieve.
- sd: date, optional, default '2022-05-01'
  Start of the last-modification date range (yyyy-mm-dd).
- ed: date, optional, default current date
  End of the last-modification date range (yyyy-mm-dd).
- save_file: ['y', 'n'], optional, default 'y'
  Save data in a data folder, or return in-memory.
- file_format: ['csv', 'json'], optional, default 'csv'
  Returns the data as a JSON/dictionary, or as a CSV/Pandas DataFrame object.
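
Since sd and ed are given in yyyy-mm-dd form, it can help to check them before starting a long extraction. The following is a small sketch with the standard datetime module; check_date_range is a hypothetical helper, not part of cellar_extractor, and the library may perform its own validation.

```python
from datetime import date


def check_date_range(sd, ed=None):
    """Parse yyyy-mm-dd strings and make sure the start is not after the end."""
    start = date.fromisoformat(sd)
    # Mirror the documented default: no end date means "today".
    end = date.fromisoformat(ed) if ed else date.today()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end


start, end = check_date_range("2022-01-01", "2022-12-31")
```

date.fromisoformat raises a ValueError for strings that are not valid yyyy-mm-dd dates, so malformed input is caught by the same call.
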
get_cellar_extra
- max_ecli: int, optional, default 100
  Maximum number of ECLIs to retrieve.
- sd: date, optional, default '2022-05-01'
  Start of the last-modification date range (yyyy-mm-dd).
- ed: date, optional, default current date
  End of the last-modification date range (yyyy-mm-dd).
- save_file: ['y', 'n'], optional, default 'y'
  Save the full text of cases as a JSON file and the rest of the data as a CSV file, or return them in-memory as a dictionary and a Pandas DataFrame object.
- threads: int, optional, default 10
  Extracting the additional data takes a lot of time, and multi-threading can cut this time down. Even so, the method may take a couple of minutes for a couple of hundred cases. A maximum of 10 threads is recommended, as this method may also affect the device's internet connection.
- username: string, optional, default empty string
  The username for the EUR-Lex webservices.
- password: string, optional, default empty string
  The password for the EUR-Lex webservices.
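
The effect of the threads parameter can be pictured with a generic thread-pool pattern from the standard library. This is only a sketch of the idea: how cellar_extractor distributes work internally is not shown in this document, and fetch_extra_data and the identifiers below are stand-ins invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_extra_data(celex_id):
    """Stand-in for a slow per-case request; a real version would do network I/O."""
    return {"celex": celex_id, "length": len(celex_id)}


celex_ids = [f"CASE_{i}" for i in range(25)]  # placeholder identifiers

# Cap the pool at 10 workers, matching the recommended maximum above.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch_extra_data, celex_ids))  # map keeps input order
```

Threads help here because the work is dominated by waiting on network responses; more than about 10 workers mostly adds load on the connection and the remote server.
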
get_nodes_and_edges_lists
- df: DataFrame object, required, default None
  DataFrame of cellar metadata acquired from the get_cellar_extra method with eurlex webservice credentials passed. This method will only work on dataframes with citations data.
- only_local: boolean, optional, default False
  Flag for nodes and edges generation. If set to True, the network created will only include nodes and edges between cases exclusively inside the given dataframe.
filter_subject_matter
- df: DataFrame object, required, default None
  DataFrame of cellar metadata acquired from any of the cellar extraction methods listed above.
- phrase: string, required, default None
  The phrase which has to be present in the subject matter of cases. Case insensitive.
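
The case-insensitive matching described for filter_subject_matter can be sketched without the library. The real function works on a DataFrame; this illustration uses plain dictionaries instead, and the helper name filter_rows_by_phrase, the column name subject_matter and the rows are all assumptions.

```python
def filter_rows_by_phrase(rows, phrase, column="subject_matter"):
    """Keep rows whose `column` contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [row for row in rows if needle in str(row.get(column, "")).casefold()]


# Made-up rows, used only to show the matching behaviour.
rows = [
    {"id": "case_A", "subject_matter": "Consumer protection"},
    {"id": "case_B", "subject_matter": "Competition; State aid"},
    {"id": "case_C", "subject_matter": "PROTECTION of personal data"},
]
matches = filter_rows_by_phrase(rows, "protection")
```
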
Analyzer
- celex id: str, required
  Passed to the constructor when initializing the class.
Writing
- celex id: str, required
  Passed to the constructor when initializing the class.

Examples

import cellar_extractor as cell

# Examples that save the results to files:

cell.get_cellar(save_file='y', max_ecli=200, sd='2022-01-01', file_format='csv')
cell.get_cellar_extra(max_ecli=100, sd='2022-01-01', threads=10)

# Examples that return the results in-memory:

df = cell.get_cellar(save_file='n', file_format='csv', sd='2022-01-01', max_ecli=1000)
# get_cellar_extra returns the metadata and the full text of the cases
df, full_text = cell.get_cellar_extra(save_file='n', max_ecli=100, sd='2022-01-01', threads=10)

Create an instance of the FetchOperativePart class with a CELEX ID, then call the instance to get the operative part as a list.

import cellar_extractor as cell
celex_id = 'YOUR_CELEX_ID'  # replace with the CELEX ID of a judgement (str)
instance = cell.FetchOperativePart(celex_id)
output_list = instance()
print(output_list)  # prints the operative part of the case as a list

The Writing class also takes a CELEX ID through its constructor when it is initialized, and writes the content of the operative part into different files depending on the function called.

import cellar_extractor as cell
celex_id = 'YOUR_CELEX_ID'  # replace with the CELEX ID of a judgement (str)
instance = cell.Writing(celex_id)
output = instance.to_csv()   # for csv
output = instance.to_txt()   # for txt
output = instance.to_json()  # for json

License

Previously under the MIT License, as of 28/10/2022 this work is licensed under the Apache License, Version 2.0.

Apache License, Version 2.0

Copyright (c) 2022 Maastricht Law & Tech Lab

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
    
    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Release history

Current version: 1.2.3

Version	Release date
1.2.3	Apr 2, 2025
1.2.2	Apr 2, 2025
1.2.1	Oct 24, 2024
1.2.0	Oct 24, 2024
1.1.3	Jul 8, 2024
1.1.2	Jul 8, 2024
1.1.1	Jul 6, 2024
1.0.62	Oct 18, 2023
1.0.61	Oct 3, 2023
1.0.60	Sep 27, 2023
1.0.59	Aug 15, 2023
1.0.58	Aug 5, 2023
1.0.57	Aug 5, 2023
1.0.56	Aug 5, 2023
1.0.55	Aug 4, 2023
1.0.54	Aug 3, 2023
1.0.53	Jul 15, 2023
1.0.52	Jul 15, 2023
1.0.51	Jul 15, 2023
1.0.50	Jun 30, 2023
1.0.49	Apr 11, 2023
1.0.48	Apr 11, 2023
1.0.47	Apr 11, 2023
1.0.46	Apr 11, 2023
1.0.45	Apr 10, 2023
1.0.44	Mar 24, 2023
1.0.43	Mar 22, 2023
1.0.42	Mar 20, 2023
1.0.41	Mar 10, 2023
1.0.40	Feb 28, 2023
1.0.39	Feb 8, 2023
1.0.38	Feb 8, 2023
1.0.37	Jan 9, 2023
1.0.36	Jan 4, 2023
1.0.35	Dec 16, 2022
1.0.34	Dec 16, 2022
1.0.33	Dec 13, 2022
1.0.32	Dec 12, 2022
1.0.31	Dec 6, 2022
1.0.30	Dec 6, 2022
1.0.29	Nov 28, 2022
1.0.28	Nov 28, 2022
1.0.27	Nov 18, 2022
1.0.26	Nov 17, 2022
1.0.25	Nov 14, 2022
1.0.24	Nov 12, 2022
1.0.23	Nov 12, 2022
1.0.22	Nov 11, 2022
1.0.20	Nov 11, 2022
1.0.19	Nov 11, 2022
1.0.18	Nov 11, 2022
1.0.17	Nov 11, 2022
1.0.16	Nov 11, 2022
1.0.15	Nov 7, 2022
1.0.14	Nov 4, 2022
1.0.13	Nov 4, 2022
1.0.12	Nov 4, 2022
1.0.11	Nov 3, 2022
1.0.10	Nov 3, 2022
1.0.3	Nov 3, 2022
1.0.2	Nov 3, 2022
1.0.1	Nov 3, 2022
1.0.0	Nov 3, 2022

Download files

Source Distribution

cellar_extractor-1.2.3.tar.gz (37.9 kB)

Uploaded Apr 2, 2025 Source

Built Distribution

cellar_extractor-1.2.3-py3-none-any.whl (37.5 kB)

Uploaded Apr 2, 2025 Python 3

File details

Details for the file cellar_extractor-1.2.3.tar.gz.

File metadata

Download URL: cellar_extractor-1.2.3.tar.gz
Upload date: Apr 2, 2025
Size: 37.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cellar_extractor-1.2.3.tar.gz
Algorithm	Hash digest
SHA256	`5a8ef7f38a7257f20bec55e2cd826052b193963593b72182120e671159fb8394`
MD5	`c9d01b920aff5ca37a3824ba4d4cf648`
BLAKE2b-256	`e41cfbe6c446f960bebdfa36f08368770395fe5b4599140e1d110b9596518c47`
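
A downloaded file can be checked against the SHA256 digest listed above with the standard hashlib module. The sketch below is a generic verification pattern, not something shipped with cellar_extractor; the helper names sha256_of_file and matches_expected are chosen for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, with the source distribution saved in the current directory, matches_expected("cellar_extractor-1.2.3.tar.gz", "5a8ef7f38a7257f20bec55e2cd826052b193963593b72182120e671159fb8394") should return True if the download is intact.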

Provenance

The following attestation bundles were made for cellar_extractor-1.2.3.tar.gz:

Publisher: github-actions.yml on maastrichtlawtech/extraction_libraries

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cellar_extractor-1.2.3.tar.gz
- Subject digest: 5a8ef7f38a7257f20bec55e2cd826052b193963593b72182120e671159fb8394
- Sigstore transparency entry: 191467178
- Sigstore integration time: Apr 2, 2025
Source repository:
- Permalink: maastrichtlawtech/extraction_libraries@734912c94f7930a60956ec2f2acce84b2a37f5e2
- Branch / Tag: refs/heads/cellar
- Owner: https://github.com/maastrichtlawtech
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: github-actions.yml@734912c94f7930a60956ec2f2acce84b2a37f5e2
- Trigger Event: push

File details

Details for the file cellar_extractor-1.2.3-py3-none-any.whl.

File metadata

Download URL: cellar_extractor-1.2.3-py3-none-any.whl
Upload date: Apr 2, 2025
Size: 37.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cellar_extractor-1.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eefd30ef7523637b3c1b7c964f52d94eef1151a5c19bcec7bdab5280d0704a4f`
MD5	`46f68679c9cd88f8cd2ee8de094c383e`
BLAKE2b-256	`f08d33214ec4138d7f6a8beeb9602f0768a24930275e923abe152668837130c2`

Provenance

The following attestation bundles were made for cellar_extractor-1.2.3-py3-none-any.whl:

Publisher: github-actions.yml on maastrichtlawtech/extraction_libraries

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cellar_extractor-1.2.3-py3-none-any.whl
- Subject digest: eefd30ef7523637b3c1b7c964f52d94eef1151a5c19bcec7bdab5280d0704a4f
- Sigstore transparency entry: 191467182
- Sigstore integration time: Apr 2, 2025
Source repository:
- Permalink: maastrichtlawtech/extraction_libraries@734912c94f7930a60956ec2f2acce84b2a37f5e2
- Branch / Tag: refs/heads/cellar
- Owner: https://github.com/maastrichtlawtech
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: github-actions.yml@734912c94f7930a60956ec2f2acce84b2a37f5e2
- Trigger Event: push
