Flair tokenizer adapted to French court decisions, using spaCy tokenization

JuriSpacyTokenizer

Description

Tokenizer used in our NLP projects, built with Flair and spaCy.

Installation

pip install jurispacy-tokenizer
python -m spacy download fr_core_news_sm-3.8.0 --direct
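
spaCy models are installed as ordinary Python packages, so checking that the model is importable can catch a missing download before the tokenizer is constructed. The sketch below uses only the standard library. `is_model_installed` is a hypothetical helper, not part of jurispacy-tokenizer, and it assumes the tokenizer relies on the `fr_core_news_sm` model named in the installation step.

```python
from importlib.util import find_spec


def is_model_installed(package_name: str = "fr_core_news_sm") -> bool:
    """Return True if the given top-level package can be located for import."""
    # find_spec returns None when no top-level package of that name is found.
    return find_spec(package_name) is not None


if __name__ == "__main__":
    if is_model_installed():
        print("fr_core_news_sm is available")
    else:
        print("fr_core_news_sm is missing; run the spacy download command above")
```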

Usage

Tokenize strings

You can use this library to tokenize a string into a list of strings representing tokens:

from jurispacy_tokenizer import JuriSpacyTokenizer

tokenizer = JuriSpacyTokenizer()
text = "M.Paul et Jean-Pierre sont heureux."

tokens = tokenizer.tokenize(text)

for token in tokens:
    print(token)

This should output:

M.
Paul
et
Jean-Pierre
sont
heureux
.
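
Because `tokenize` returns plain strings, downstream tasks such as entity annotation often need to map each token back to its position in the original text. The following is a minimal sketch of one way to do that. `token_offsets` is a hypothetical helper, not part of the library, and it assumes every token appears verbatim in the text, in order.

```python
from typing import List, Tuple


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for each token, scanning left to right."""
    offsets = []
    cursor = 0  # position in `text` just after the previous token
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


if __name__ == "__main__":
    text = "M.Paul et Jean-Pierre sont heureux."
    # Tokens copied from the expected output shown above.
    tokens = ["M.", "Paul", "et", "Jean-Pierre", "sont", "heureux", "."]
    for token, (start, end) in zip(tokens, token_offsets(text, tokens)):
        print(token, start, end)
```

Slicing the text with each pair, as in `text[start:end]`, gives back the token string, which makes a convenient sanity check.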

Tokenize longer text into sentences

You can also parse longer text to create Flair Sentence objects:

from jurispacy_tokenizer import JuriSpacyTokenizer

tokenizer = JuriSpacyTokenizer()

text = """Bonjour tout le monde! Je m'appelle Amaury.

Je travaille avec Paul."""

sentences = tokenizer.get_tokenized_sentences(text)

for s in sentences:
    print(s)

This should output:

Sentence[5]: "Bonjour tout le monde!"
Sentence[5]: "Je m'appelle Amaury."
Sentence[5]: "Je travaille avec Paul."
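
To work with the individual tokens of each sentence, the sentences can be flattened into lists of strings. This sketch assumes the returned objects behave like standard Flair Sentence objects, meaning they can be iterated to yield tokens that expose a `text` attribute. `sentences_to_token_texts` is a hypothetical helper, not part of the library.

```python
from typing import Iterable, List


def sentences_to_token_texts(sentences: Iterable) -> List[List[str]]:
    """Return one list of token strings per sentence."""
    # Each sentence is assumed to be iterable over tokens with a `.text` attribute.
    return [[token.text for token in sentence] for sentence in sentences]
```

Applied to the `sentences` from the example above, this would be expected to give three lists of five strings each, in line with the `Sentence[5]` counts shown.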
