Flair tokenizer adapted to French court decisions, using spaCy tokenization

JuriSpacyTokenizer

Description

Tokenizer(s) used in our NLP projects, built using Flair and spaCy.

Installation

pip install jurispacy-tokenizer
python -m spacy download fr_core_news_sm-3.6.0 --direct
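
Before moving on, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the sketch assumes the spaCy model is importable under the name `fr_core_news_sm`, as spaCy model packages normally are.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# The library itself (import name as used in the usage examples) and the French spaCy model.
required = ["jurispacy_tokenizer", "fr_core_news_sm"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

This only checks that the packages can be located. It does not verify that the installed versions are compatible with each other.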

Usage

Tokenize strings

You can use this library to tokenize a string into a list of strings representing tokens:

from jurispacy_tokenizer import JuriSpacyTokenizer

tokenizer = JuriSpacyTokenizer()
text = "M.Paul et Jean-Pierre sont heureux."

tokens = tokenizer.tokenize(text)

for token in tokens:
    print(token)

This should output:

M.
Paul
et
Jean-Pierre
sont
heureux
.
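
Note how "M.Paul" is split into the title "M." and the name "Paul", while "Jean-Pierre" stays whole. To make that behaviour concrete, here is a stand-alone sketch that reproduces the same output with the standard `re` module. It is not how JuriSpacyTokenizer works internally (the library relies on spaCy and Flair). The pattern, the list of titles, and the name `toy_tokenize` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern, for illustration only. Alternatives are tried in order:
#   1. a title abbreviation kept together with its period ("M.", "MM.", "Dr.")
#   2. a word, optionally hyphenated ("Jean-Pierre")
#   3. any other single non-space character (punctuation)
TOKEN_PATTERN = re.compile(
    r"\b(?:MM|M|Dr)\."
    r"|\w+(?:-\w+)*"
    r"|\S"
)


def toy_tokenize(text):
    """Split text into token strings using the toy pattern above."""
    return TOKEN_PATTERN.findall(text)


print(toy_tokenize("M.Paul et Jean-Pierre sont heureux."))
# ['M.', 'Paul', 'et', 'Jean-Pierre', 'sont', 'heureux', '.']
```

A regular expression like this breaks down quickly on real court decisions (elisions, numbers, legal references), which is the kind of case a spaCy-based tokenizer is better placed to handle.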

Tokenize longer text into sentences

You can also parse longer text to create Flair Sentence objects:

from jurispacy_tokenizer import JuriSpacyTokenizer

tokenizer = JuriSpacyTokenizer()

text = """Bonjour tout le monde! Je m'appelle Amaury.

Je travaille avec Paul."""

sentences = tokenizer.get_tokenized_sentences(text)

for s in sentences:
    print(s)

This should output:

Sentence[5]: "Bonjour tout le monde!"
Sentence[5]: "Je m'appelle Amaury."
Sentence[5]: "Je travaille avec Paul."
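
The number in brackets is the token count of each sentence. As a minimal sketch of how the token strings might be read back, the helper below assumes only that a sentence can be iterated and that each token exposes a `.text` attribute, which holds for Flair's `Sentence` and `Token` objects. The name `token_texts` is hypothetical and not part of this library. Stand-in objects are used so that the sketch runs without Flair installed.

```python
from types import SimpleNamespace


def token_texts(sentences):
    """Return one list of token strings per sentence."""
    return [[token.text for token in sentence] for sentence in sentences]


# Stand-in for a tokenized sentence: any iterable of objects with a .text attribute.
fake_sentence = [SimpleNamespace(text=t) for t in ["Je", "travaille", "avec", "Paul", "."]]

print(token_texts([fake_sentence]))
# [['Je', 'travaille', 'avec', 'Paul', '.']]
```

Passing the `sentences` list from the example above instead of the stand-in would be expected to give three lists of five strings each, matching the `Sentence[5]` counts.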

Download files

Source Distribution

jurispacy_tokenizer-1.2.0.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

jurispacy_tokenizer-1.2.0-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file jurispacy_tokenizer-1.2.0.tar.gz.

File metadata

  • Download URL: jurispacy_tokenizer-1.2.0.tar.gz
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.5

File hashes

Hashes for jurispacy_tokenizer-1.2.0.tar.gz
  • SHA256: 9152d7dcf7e56d02648f61655163f56a8733b6001076cac654715767354f0508
  • MD5: 8a5210b93db7d4258c346bb2cad40e96
  • BLAKE2b-256: c08e566dec4055b650973d119be3ac89e9da60d3df8f2356a4bec178715653e4

File details

Details for the file jurispacy_tokenizer-1.2.0-py3-none-any.whl.

File hashes

Hashes for jurispacy_tokenizer-1.2.0-py3-none-any.whl
  • SHA256: 2e2bd8ba19202ebd29e38d8f9d3c39ed520532f10ef85277f81f6620998ce5f3
  • MD5: 7b92fb260336d04a2e820d255dad2e4d
  • BLAKE2b-256: 459407fd233a55fff6235c154db2cbef66680b60df278d13c8d76a762aa208fa
