Theolex document processing
# Legal-doc-processing
[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/) [![codecov](https://codecov.io/gh/THEOLEX-IO/legal_doc_processing/branch/master/graph/badge.svg)](https://codecov.io/gh/THEOLEX-IO/legal_doc_processing) [![Build Status](https://travis-ci.org/mtchavez/python-package-boilerplate.png?branch=master)](https://travis-ci.org/mtchavez/python-package-boilerplate) [![Requires.io](https://requires.io/github/mtchavez/python-package-boilerplate/requirements.svg?branch=master)](https://requires.io/github/mtchavez/python-package-boilerplate/requirements?branch=master) [![DeepSource](https://deepsource.io/gh/THEOLEX-IO/legal_doc_processing.svg/?label=active+issues&show_trend=true)](https://deepsource.io/gh/THEOLEX-IO/legal_doc_processing/?ref=repository-badge)
## What is it?
<br>
legal-doc-processing is an open source NLP library dedicated to legal documents. It offers a wide range of tools to analyse, structure and extract information from legal documents such as orders, complaints and press releases. <br> <br>
## Installation
<br>
Go to your project directory and activate a virtual environment:
- `cd my-project`
- `python3 -m venv env`
- `source ./env/bin/activate`

Then install from PyPI: `pip install legal-doc-processing`

Or install from git:
- `git clone https://github.com/THEOLEX-IO/legal_doc_processing.git`
- `cd legal_doc_processing`
- `pip install -r requirements.txt`

On first use, run the following command to bootstrap the package: `python -c "from legal_doc_processing import boot; boot()"`

This command downloads the data collections and mandatory web assets. It can take about a minute, depending on your connection. <br> <br>
## Usage
<br>
There are 2 main modules in legal-doc-processing:
- `ld` for LegalDoc objects, i.e. official documents such as orders and complaints
- `pr` for PressRelease objects, i.e. the legal press releases related to each case

So you can import them as follows:
- `from legal_doc_processing import *` (import everything)
- `from legal_doc_processing import ld` (import the legal document module)
- `from legal_doc_processing import pr` (import the press release module)
<br>
### Instantiation
You can initialise an object in 2 ways:
* with text directly: `from legal_doc_processing import ld`, then `doc = ld.LegalDoc("this is a document")`
* with the path to a file: `doc = ld.read_LegalDoc("this/is/my/file.txt")`

Press releases follow the same pattern:
* with text directly: `from legal_doc_processing import pr`, then `press = pr.PressRelease("this is a press release")`
* with the path to a file: `press = pr.read_PressRelease("this/is/my/file.txt")`
Once instantiated, you can print the object: `print(doc)` <br>
All interesting features are stored in the `feature_dict` attribute: `print(doc.feature_dict)` <br>
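When many documents sit in one folder, the file-based constructor described above can be applied in a loop. The following is only a minimal sketch, not part of the package: `load_folder` and `read_text` are hypothetical helpers, and `read_text` merely stands in for `ld.read_LegalDoc` so that the snippet runs without the package installed.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def load_folder(folder: str, reader: Callable[[str], T], pattern: str = "*.txt") -> Dict[str, T]:
    """Apply `reader` to every file in `folder` matching `pattern`, keyed by file name."""
    return {path.name: reader(str(path)) for path in sorted(Path(folder).glob(pattern))}


def read_text(path: str) -> str:
    """Stand-in reader for this demo only; it just returns the raw file text."""
    return Path(path).read_text(encoding="utf-8")


# Build a throwaway folder with two small files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "order.txt").write_text("this is an order", encoding="utf-8")
    Path(tmp, "complaint.txt").write_text("this is a complaint", encoding="utf-8")
    docs = load_folder(tmp, read_text)

print(sorted(docs))
```

With the package installed, passing `ld.read_LegalDoc` as `reader` should yield LegalDoc objects keyed by file name, assuming it accepts a path string as shown in the instantiation examples.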
### Predictions
You can now make predictions: `defendant = doc.predict("defendant")` and `case = doc.predict("case")`. Print the results with `print(defendant)` and `print(case)`. <br>
The `predict` method returns a value and also updates the object itself, which you can check with `print(doc)` and `print(doc.feature_dict)`. <br>
The easiest option is to predict everything at once: `features = doc.predict("all")`. You can then inspect the result with `print(features)`, `print(doc)` and `print(doc.feature_dict)`.
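If only a handful of features are needed, the per-feature calls can be wrapped in a small helper. This is a sketch under assumptions: `predict_many` is a hypothetical convenience function, and `FakeDoc` is a stand-in for `ld.LegalDoc` with a toy rule, used so the code runs without the package or its models. It does not reproduce the library's predictions.

```python
from typing import Dict, Iterable, Protocol


class SupportsPredict(Protocol):
    """Any object exposing predict(name), as described for LegalDoc."""

    def predict(self, name: str): ...


def predict_many(doc: SupportsPredict, names: Iterable[str]) -> Dict[str, object]:
    """Call doc.predict once per feature name and gather the results in a dict."""
    return {name: doc.predict(name) for name in names}


class FakeDoc:
    """Hypothetical stand-in for ld.LegalDoc so the sketch runs offline."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.feature_dict: Dict[str, object] = {}

    def predict(self, name: str):
        # Toy rule, NOT the library's model: return the first word of the text.
        words = self.text.split()
        value = words[0] if words else None
        # Mimic the in-place update of the object described in this section.
        self.feature_dict[name] = value
        return value


doc = FakeDoc("ACME Corp. was ordered to pay a penalty")
results = predict_many(doc, ["defendant", "case"])
print(results)
print(doc.feature_dict)
```

Typing the helper against a `Protocol` rather than a concrete class means the same function should accept a real `LegalDoc` or `PressRelease`, provided they expose `predict` as shown above.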
## Requirements
Package requirements are handled using pip. To install them, run:
` pip install -r requirements.txt `
## Tests
Testing is set up using [pytest](http://pytest.org) and coverage is handled with the pytest-cov plugin.
Run your tests with `pytest` in the root directory.
Coverage is run by default and is configured in the `pytest.ini` file. To see an HTML report of the coverage, open `htmlcov/index.html` after running the tests.
## Pipeline steps
Cleaning and feature engineering -> segmentation -> classification -> information extraction
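To make the four stages concrete, here is a toy sketch of such a pipeline using only the standard library. Everything in it is an assumption made for illustration: the function names, the regex-based sentence splitter and the keyword rule are hypothetical and far simpler than whatever the package does internally.

```python
import re
from typing import Dict, List


def clean(text: str) -> str:
    """Cleaning: collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> List[str]:
    """Segmentation: naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def classify(sentence: str) -> str:
    """Classification: toy keyword rule standing in for a real model."""
    return "penalty" if "penalty" in sentence.lower() else "other"


def extract(sentences: List[str]) -> Dict[str, List[str]]:
    """Information extraction: collect dollar amounts from 'penalty' sentences."""
    amounts: List[str] = []
    for sentence in sentences:
        if classify(sentence) == "penalty":
            amounts.extend(re.findall(r"\$\d[\d,]*", sentence))
    return {"penalty_amounts": amounts}


def run_pipeline(raw: str) -> Dict[str, List[str]]:
    """Chain the stages in the order listed above."""
    return extract(segment(clean(raw)))


raw = "  The firm   settled.\nIt must pay a penalty of $1,500,000.  "
print(run_pipeline(raw))
```

Keeping each stage as a plain function makes it easy to test stages in isolation and to swap the toy rules for trained models later.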