A citation extraction tool.

eyecite

eyecite is an open source tool for extracting legal citations from text strings. Originally built for use with [Courtlistener.com](https://www.courtlistener.com/), it is now a freestanding package.

Its main purpose is to facilitate the conversion of raw text into structured citation entities. It includes mechanisms to recognize and extract “full” citation references (e.g., Bush v. Gore, 531 U.S. 98), “short form” references (e.g., 531 U.S., at 99), “supra” references (e.g., Bush, supra, at 100), “id.” references (e.g., Id., at 101), and “ibid.” references (e.g., Ibid.).

Further development is intended and all contributors, corrections, and additions are welcome.

Background

This project is the culmination of [years](https://free.law/2012/05/11/building-a-citator-on-courtlistener/) [of](https://free.law/2015/11/30/our-new-citation-finder/) [work](https://free.law/2020/03/05/citation-data-gets-richer/) to build a citator within Courtlistener.com. This project represents the next step in that development: Decoupling the parsing logic and exposing it for third-party use as a standalone Python package.

Quickstart

Feed in a raw string of text (or HTML) and receive a list of structured citation objects, in the order in which they appear in the text.

from eyecite.find_citations import get_citations

text = 'bob lissner v. test 1 U.S. 12, 347-348 (4th Cir. 1982)'
found_citations = get_citations(text)

returns:
[FullCitation(plaintiff='lissner', defendant='test', volume=1,
           reporter='U.S.', page='12', year=1982,
           extra='347-348', court='ca4',
           canonical_reporter='U.S.', lookup_index=0,
           reporter_index=5, reporter_found='U.S.')]

Once these Citation objects are obtained, you can find them in the original text by calling their as_regex() methods, which return a bespoke regex representation for each extracted citation.

citation_regex = found_citations[0].as_regex()

returns:
'1(\s+)U\.S\.(\s+)12(\s?)'

import re

match = re.search(citation_regex, text)

returns:
<re.Match object; span=(20, 29), match='1 U.S. 12'>
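
Because as_regex() returns an ordinary pattern string, it should work with any function in Python's re module. The sketch below reuses the pattern shown above verbatim, so it runs without eyecite installed, and marks every occurrence in the text. The helper name `highlight` and the `**` marker are illustrative assumptions, not part of eyecite.

```python
import re

# Pattern copied from the as_regex() output shown above.
citation_regex = r'1(\s+)U\.S\.(\s+)12(\s?)'
text = 'bob lissner v. test 1 U.S. 12, 347-348 (4th Cir. 1982)'


def highlight(pattern, text, marker='**'):
    """Wrap every match of `pattern` in `text` with `marker` (hypothetical helper)."""
    return re.sub(pattern, lambda m: marker + m.group(0) + marker, text)


# Character offsets of every occurrence of the citation.
spans = [m.span() for m in re.finditer(citation_regex, text)]

# The same text with each occurrence wrapped in the marker.
highlighted = highlight(citation_regex, text)
```

Note that the pattern ends in an optional whitespace group, so a match may include one trailing space when the citation is followed by one.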

Options

get_citations(), the main executable function, takes several parameters.

  1. html ==> bool; whether the passed string is HTML or not

  2. do_post_citation ==> bool; whether additional, post-citation information should be extracted (e.g., the court, year, and/or date range of the citation)

  3. do_defendant ==> bool; whether the pre-citation defendant (and possibly plaintiff) reference should be extracted

  4. disambiguate ==> bool; whether each citation’s (possibly ambiguous) reporter should be resolved to its (unambiguous) form
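
As a rough illustration, the four options can be passed as keyword arguments. In the sketch below the option names come from the list above, while the values, the `OPTIONS` dictionary, and the `extract` wrapper are assumptions made for the example. The finder is passed in as an argument rather than imported, so the sketch does not depend on a particular eyecite version being installed.

```python
# Option names are the four listed above; the values are example choices only.
OPTIONS = {
    "html": False,             # the input is plain text, not HTML
    "do_post_citation": True,  # also extract court/year information
    "do_defendant": False,     # skip the defendant/plaintiff lookup
    "disambiguate": True,      # resolve the reporter to its unambiguous form
}


def extract(text, finder, **overrides):
    """Call a citation finder with OPTIONS, letting the caller override any of them.

    `finder` is expected to be eyecite's get_citations (hypothetical wrapper,
    not part of eyecite itself).
    """
    return finder(text, **{**OPTIONS, **overrides})


# Intended usage (requires eyecite to be installed):
#   from eyecite.find_citations import get_citations
#   citations = extract('1 U.S. 12', get_citations, do_defendant=True)
```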

Some notes

Some things to keep in mind are:

  1. This project depends on information made available in two other Free Law Project packages, [reporters-db](https://github.com/freelawproject/reporters-db) and [courts-db](https://github.com/freelawproject/courts-db).

  2. This package performs no matching or resolution action. In other words, it is up to the user to decide what to do with the “short form,” “supra,” “id.,” and “ibid.” citations that this tool extracts. In theory, these citations are all references to “full” citations also mentioned in the text – and are therefore in principle resolvable to those citations – but this task is beyond the scope of this parsing package. See [here](https://github.com/freelawproject/courtlistener/tree/master/cl/citations) for an example of how Courtlistener implements this package and handles this problem.
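
To make the resolution problem concrete, here is a deliberately naive sketch of what a caller might do with the extracted citations. It is not how Courtlistener handles it, and it does not use eyecite's own classes: `Cite` is a hypothetical stand-in with only the fields the heuristics need, and `resolve` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cite:
    """Hypothetical stand-in for an extracted citation (not an eyecite class)."""
    kind: str                       # "full", "short", "supra", or "id"
    name: Optional[str] = None      # case name or party, e.g. "Bush"
    volume: Optional[int] = None
    reporter: Optional[str] = None


def resolve(cites: List[Cite]) -> List[Optional[Cite]]:
    """Pair each citation, in document order, with a full citation or None."""
    fulls: List[Cite] = []               # full citations seen so far
    targets: List[Optional[Cite]] = []   # result, parallel to `cites`
    for cite in cites:
        target = None
        if cite.kind == "full":
            fulls.append(cite)
            target = cite
        elif cite.kind == "id":
            # "Id." / "Ibid." refer to whatever the previous citation resolved to.
            target = targets[-1] if targets else None
        elif cite.kind == "supra":
            # Most recent full citation whose case name contains the party name.
            for full in reversed(fulls):
                if cite.name and full.name and cite.name in full.name:
                    target = full
                    break
        elif cite.kind == "short":
            # Most recent full citation with the same volume and reporter.
            for full in reversed(fulls):
                if (full.volume, full.reporter) == (cite.volume, cite.reporter):
                    target = full
                    break
        targets.append(target)
    return targets


# The example references from the introduction, in order of appearance.
bush = Cite("full", name="Bush v. Gore", volume=531, reporter="U.S.")
document = [
    bush,                                        # Bush v. Gore, 531 U.S. 98
    Cite("short", volume=531, reporter="U.S."),  # 531 U.S., at 99
    Cite("supra", name="Bush"),                  # Bush, supra, at 100
    Cite("id"),                                  # Id., at 101
]
targets = resolve(document)
```

Here every entry in `targets` is the `bush` citation. Real documents are messier (intervening citations, repeated party names, several cases in the same volume), which is why resolution is left to the caller.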

Installation

Installing eyecite is easy.

pip install eyecite

Or install the latest development version from GitHub:

pip install git+https://github.com/freelawproject/eyecite.git@master

Deployment

  1. Update version info in setup.py and in pyproject.toml.

For an automated deployment, tag the commit with vx.y.z, and push it to master. An automated deploy will be completed for you.
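
Before tagging, it can help to confirm that the tag and both version strings agree. The following is only a sketch: it assumes each file declares its version on a line of the form `version = "x.y.z"` (any spacing, either quote style), which may not match the actual layout of setup.py, and the helper names are hypothetical.

```python
import re

# Assumed layout: a line such as `version="1.0.0",` or `version = "1.0.0"`.
VERSION_RE = re.compile(r'^\s*version\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def find_version(file_text):
    """Return the first declared version string in file_text, or None."""
    match = VERSION_RE.search(file_text)
    return match.group(1) if match else None


def release_is_consistent(tag, setup_py_text, pyproject_text):
    """True when both files declare the same version and tag is 'v' + that version."""
    setup_version = find_version(setup_py_text)
    pyproject_version = find_version(pyproject_text)
    return (
        setup_version is not None
        and setup_version == pyproject_version
        and tag == "v" + setup_version
    )
```

In practice the two texts would be read from setup.py and pyproject.toml, for example with `pathlib.Path("setup.py").read_text()`.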

For a manual deployment, follow these steps:

  1. Install the requirements using poetry install

  2. Set up a config file at ~/.pypirc

  3. Generate a universal distribution that works in py2 and py3 (see setup.cfg)

python setup.py sdist bdist_wheel

  4. Upload the distributions

twine upload dist/* -r pypi  # or: -r pypitest

Testing

eyecite comes with a robust test suite of different citation strings that it is equipped to handle. Run these tests as follows:

python3 -m unittest discover -s tests -p 'test_*.py'

License

This repository is available under the permissive BSD license, making it easy and safe to incorporate in your own libraries.

Pull and feature requests welcome. Online editing in GitHub is possible (and easy!).
