Alacorder collects and processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases. Google Chrome required for direct case PDF retrieval from Alacourt.com. `alacorder.snowpalace` requires additional packages `polars`, `xlsxwriter`, and `xlsx2csv`.
Project description
___ __ __
/ | / /___ _________ _________/ /__ _____
/ /| | / / __ `/ ___/ __ \/ ___/ __ / _ \/ ___/
/ ___ |/ / /_/ / /__/ /_/ / / / /_/ / __/ /
/_/ |_/_/\__,_/\___/\____/_/ \__,_/\___/_/
ALACORDER 78.6
Getting Started with Alacorder
Alacorder collects and processes case detail PDFs into data tables suitable for research purposes.
Preview
alacorderonpolarswithalacorder.snowpalace. Make a copy of your data before trying outsnowpalace.
GitHub | PyPI | Report an issue
Installation
Alacorder can run on most devices. If your device can run Python 3.9 or later, it can run Alacorder.
- To skip installation, download prebuilt executable for your OS (MacOS or Windows) from GitHub.
- If you already have Python installed, open Command Prompt or Terminal and enter
pip install alacorderorpip3 install alacorder. - Install Anaconda Distribution to install Alacorder if the above methods do not work.
- After installation, create a virtual environment, open a terminal, and then repeat these instructions.
Usage: python -m alacorder [OPTIONS] COMMAND [ARGS]...
Alacorder retrieves case detail PDFs from Alacourt.com and processes them
into text archives and data tables suitable for research purposes. Invoke
`python -m alacorder start`) to launch graphical user interface.
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
append Append one case text archive to another
archive Create full text archive from case PDFs
fetch Fetch cases from Alacourt.com with input query spreadsheet...
mark Mark query template sheet with cases found in archive or PDF...
start Launch graphical user interface
table Export data tables from archive or directory
The alacorder package includes a desktop interface, a command line interface, and a python module for parsing case PDFs.
Once you have a Python environment up and running, you can launch the guided interface in two ways:
-
Utilize the graphical interface: Use the command line tool
python -m alacorder start, orpython3 -m alacorder start. -
Use the command line interface: Add the flag
--helpor simply runpython -m alacorderto access list of subcommands for command line interface.
Alacorder can be used without writing any code, and exports to common formats like Excel (.xls, .xlsx), Stata (.dta), CSV (.csv), JSON (.json), and Apache Parquet (.parquet).
- Alacorder compresses case text into case archives (
.pkl.xz,.parquet) to save storage and processing time.
Special Queries with alac
from alacorder import alac
For more advanced queries, the alac module can extract fields and tables from case records with just a few lines of code.
-
Call
alac.setinputs("/pdf/dir/")andalac.setoutputs("/to/table.xlsx")to configure your input and output paths. Then callalac.set(input_conf, output_conf, **kwargs)to complete the configuration process. Feed the output to any of thealac.write...()functions to start a task. -
Call
alac.archive(config)to export a full text archive. It's recommended that you create a full text archive (.pkl.xz) file before making tables from your data. Full text archives can be scanned faster than PDF directories and require less storage. Full text archives can be imported to Alacorder the same way as PDF directories. -
Call
alac.tables(config)to export detailed case information tables. If export type is.xlsor.xlsx, thecases,fees, andchargestables will be exported. -
Call
alac.charges(config)to exportchargestable only. -
Call
alac.fees(config)to exportfeestable only. -
Call
alac.cases(config)to exportcasestable orallif output extension supportsmultitableexport.
import warnings
warnings.filterwarnings('ignore')
from alacorder import alac
pdf_directory = "/Users/crimson/Desktop/Tutwiler/"
archive = "/Users/crimson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/crimson/Desktop/Tutwiler.xlsx"
pdfconf = alac.setinputs(pdf_directory)
arcconf = alac.setoutputs(archive)
# write archive to Tutwiler.pkl.xz
c = alac.set(pdfconf, arcconf)
alac.archive(c)
print("Full text archive complete. Now processing case information into tables at " + tables)
d = alac.setpaths(archive, tables) # runs setinputs(), setoutputs() and set() at once
alac.tables(d)
# write tables to Tutwiler.xlsx
alac.tables(tabconf)
Custom Parsing with alac.map()
If you need to conduct a custom search of case records, Alacorder has the tools you need to extract custom fields from case PDFs without any fuss. Try out alac.map() to search thousands of cases in seconds.
from alacorder import alac
import re
archive = "/Users/crimson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/crimson/Desktop/Tutwiler.xlsx"
def findName(text):
name = ""
if bool(re.search(r'(?a)(VS\.|V\.)(.+)(Case)*', text, re.MULTILINE)) == True:
name = re.search(r'(?a)(VS\.|V\.)(.+)(Case)*', text, re.MULTILINE).group(2).replace("Case Number:","").strip()
else:
if bool(re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE)) == True:
name = re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE).group(1).replace(":","").replace("Case Number:","").strip()
return name
c = alac.setpaths(archive, tables, count=2000) # set configuration
alac.map(c, findName, alac.getPaymentToRestore) # Name, Convictions table
Working with case data in Python
Out of the box, Alacorder exports to .xlsx, .xls, .csv, .json, .dta, and .parquet. But you can use alac, pandas, and other python libraries to create your own data collection workflows and customize Alacorder exports.
The snippet below prints the fee sheets from a directory of case PDFs as it reads them.
from alacorder import alac
c = alac.setpaths("/Users/crimson/Desktop/Tutwiler/","/Users/crimson/Desktop/Tutwiler.xls")
for path in c['contents']:
text = alac.getPDFText(path)
cnum = alac.getCaseNumber(text)
charges_outputs = alac.getCharges(text, cnum)
if len(charges_outputs[0]) > 1:
print(charges_outputs[0])
Extending Alacorder with pandas and other tools
Alacorder runs on pandas, a python library you can use to work with and analyze tabular data. pandas can read from and write to all major data storage formats. It can connect to a wide variety of services to provide for easy export.
import pandas as pd
contents = pd.read_pickle("/path/to/pkl")
If you would like to visualize data without exporting to Excel or another format, create a jupyter notebook and import a data visualization library like matplotlib to get started. The resources below can help you get started. jupyter is a Python kernel you can use to create interactive notebooks for data analysis and other purposes. It can be installed using pip install jupyter or pip3 install jupyter and launched using jupyter notebook. Your device may already be equipped to view .ipynb notebooks.
Resources
pandascheat sheet- regex cheat sheet
- anaconda (tutorials on python data analysis)
- The Python Tutorial
jupyterintroduction
© 2023 Sam Robson
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file alacorder-78.6.0.5.tar.gz.
File metadata
- Download URL: alacorder-78.6.0.5.tar.gz
- Upload date:
- Size: 80.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2aacd332d38471d03b4f21f74e241566865599dc2bf8f1c1d35dcc3fc28053a8
|
|
| MD5 |
cea9fdd83efb9ecf973266ce4fe42698
|
|
| BLAKE2b-256 |
7998dc437533341fea034ca8999e46c52eced78492df0104ea49b29d064a0364
|
File details
Details for the file alacorder-78.6.0.5-py3-none-any.whl.
File metadata
- Download URL: alacorder-78.6.0.5-py3-none-any.whl
- Upload date:
- Size: 81.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8ca41290f2cb828a3d3be9698ad3cc8c84850ba8cd9b1853f332eca2ba5103af
|
|
| MD5 |
453484c591f80dd1845a695394bc3476
|
|
| BLAKE2b-256 |
68f9fad9b21aeaf8f18fa77942a74d9caba06c0cf8fd57c03490156d19456ec4
|