Library for extracting cellar data
Cellar extractor
This library provides two functions for retrieving CELLAR case law data from EUR-Lex.
Version
Python 3.9
Contributors
Pranav Bapat
Piotr Lewandowski
shashankmc
gijsvd
How to install?
pip install cellar-extractor
What are the functions?
get_cellar
Gets all the ECLI data from the EUR-Lex SPARQL endpoint and returns it in CSV or JSON format, either in memory or as a saved file.
get_cellar_extra
Gets all the ECLI data from the EUR-Lex SPARQL endpoint and additionally scrapes the EUR-Lex website to acquire
the full text, keywords, case law directory codes and EuroVoc identifiers. Users who have a EUR-Lex account with access to the EUR-Lex webservices can also
pass their webservices login credentials to the method in order to extract data about works citing a work and works
being cited by a work. The full text is returned as JSON and the rest of the data as CSV. Both can be returned in memory or saved as files.
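One way to pass the webservices credentials without hard-coding them is to read them from environment variables. The following is only a sketch: the helper webservice_credentials and the variable names EURLEX_USERNAME and EURLEX_PASSWORD are hypothetical and not part of the library. Only the username and password parameters of get_cellar_extra come from this README.

```python
import os


def webservice_credentials(env=None):
    """Build the username/password keyword arguments for get_cellar_extra.

    EURLEX_USERNAME and EURLEX_PASSWORD are hypothetical variable names
    chosen for this sketch. Missing values fall back to empty strings,
    the documented default of both parameters.
    """
    env = os.environ if env is None else env
    return {
        "username": env.get("EURLEX_USERNAME", ""),
        "password": env.get("EURLEX_PASSWORD", ""),
    }


creds = webservice_credentials()
# With the library installed, the credentials could then be passed on like this:
# import cellar_extractor as cell
# cell.get_cellar_extra(max_ecli=100, sd='2022-01-01', threads=10, **creds)
```

Keeping the credentials out of the source code means scripts can be shared or committed without exposing the account.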
What are the parameters?
get_cellar
Parameters:
- max_ecli: int, optional Maximum number of ECLIs to retrieve
- sd: date, optional, default '2022-05-01' Start of the last-modification date range (yyyy-mm-dd)
- ed: date, optional, default current date End of the last-modification date range (yyyy-mm-dd)
- save_file: ['y', 'n'], optional, default 'y' Save the data in a data folder ('y'), or return it in memory ('n')
- file_format: ['csv', 'json'], optional, default 'csv' Returns the data as JSON (a dictionary) or as CSV (a Pandas DataFrame object).
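Because a query can run for a while, it can help to check the arguments before calling get_cellar. The sketch below is a hypothetical pre-flight helper, not part of the library: check_cellar_args is an invented name, and the checks simply mirror the value ranges listed above.

```python
from datetime import date, datetime


def _normalise(day):
    # strptime raises ValueError for anything that is not a year-month-day
    # date; isoformat() re-serialises it as zero-padded yyyy-mm-dd.
    return datetime.strptime(day, "%Y-%m-%d").date().isoformat()


def check_cellar_args(sd="2022-05-01", ed=None, save_file="y", file_format="csv"):
    """Validate get_cellar arguments and return them as keyword arguments."""
    sd = _normalise(sd)
    # The documented default end date is the current date.
    ed = date.today().isoformat() if ed is None else _normalise(ed)
    if sd > ed:  # ISO date strings sort chronologically
        raise ValueError(f"sd ({sd}) is later than ed ({ed})")
    if save_file not in ("y", "n"):
        raise ValueError("save_file must be 'y' or 'n'")
    if file_format not in ("csv", "json"):
        raise ValueError("file_format must be 'csv' or 'json'")
    return {"sd": sd, "ed": ed, "save_file": save_file, "file_format": file_format}


kwargs = check_cellar_args(sd="2022-01-01", ed="2022-02-01", save_file="n")
# With the library installed:
# import cellar_extractor as cell
# df = cell.get_cellar(max_ecli=200, **kwargs)
```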
get_cellar_extra
- max_ecli: int, optional Maximum number of ECLIs to retrieve
- sd: date, optional, default '2022-05-01' Start of the last-modification date range (yyyy-mm-dd)
- ed: date, optional, default current date End of the last-modification date range (yyyy-mm-dd)
- save_file: ['y', 'n'], optional, default 'y' With 'y', save the full text of cases as a JSON file and the rest of the data as a CSV file; with 'n', return the full text as a dictionary and the rest of the data as a Pandas DataFrame object
- threads: int, optional, default 10 Extracting the additional data takes a lot of time, and multi-threading can cut this down. Even so, the method may take a couple of minutes for a couple of hundred cases. A maximum of 10 threads is recommended, as this method may also affect the device's internet connection.
- username: string, optional, default empty string The username for the EUR-Lex webservices.
- password: string, optional, default empty string The password for the EUR-Lex webservices.
In both functions, max_ecli defaults to 100.
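To illustrate why the threads parameter of get_cellar_extra matters, here is a minimal sketch of how a thread pool shortens a batch of slow, network-bound requests. It is not the library's implementation: fetch_extra and fetch_all are hypothetical stand-ins, the request is simulated with a short sleep, and the ECLI strings are made-up placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_extra(ecli):
    """Stand-in for one slow web request (hypothetical, no network access)."""
    time.sleep(0.05)  # simulate network latency
    return ecli, f"full text of {ecli}"


def fetch_all(eclis, threads=10):
    """Run fetch_extra over all ECLIs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, whatever order they finish in
        return dict(pool.map(fetch_extra, eclis))


# Placeholder identifiers, not real cases
eclis = [f"ECLI:EU:C:2022:{n}" for n in range(1, 21)]

start = time.perf_counter()
sequential = dict(fetch_extra(e) for e in eclis)
t_seq = time.perf_counter() - start

start = time.perf_counter()
threaded = fetch_all(eclis, threads=10)
t_thr = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s, 10 threads: {t_thr:.2f}s")
```

Since the workers spend most of their time waiting on the network, more threads mean more simultaneous connections, which is presumably why a higher count can affect the device's internet connection.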
Examples
import cellar_extractor as cell
Below are examples that save the data to files:
cell.get_cellar(save_file='y', max_ecli=200, sd='2022-01-01', file_format='csv')
cell.get_cellar_extra(max_ecli=100, sd='2022-01-01', threads=10)
Below are examples that return the data in memory:
df = cell.get_cellar(save_file='n', file_format='csv', sd='2022-01-01', max_ecli=1000)
df, full_text = cell.get_cellar_extra(save_file='n', max_ecli=100, sd='2022-01-01', threads=10)
License
Previously under the MIT License, as of 28/10/2022 this work is licensed under the Apache License, Version 2.0.
Apache License, Version 2.0
Copyright (c) 2022 Maastricht Law & Tech Lab
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.