Parser for entity/address free text (based on libpostal/spacy)

Payment text parser

Description

Takes a text string and parses entity/address free text in order to:

  • Flag entity fields ('ORG' for companies, 'PER' for individuals, 'PER_ORG' when the decision is uncertain), based on Spacy
  • Flag address components ('house number', 'town', 'country', ...), based on Libpostal
  • Flag other fields (i.e. neither entity nor address) with POS tags ('NE', 'ADJ', 'NN', ...), based on CoreNLP

More generally, the package includes the following features:

  • A data generator for entity/address fields and free text fields (based on open data)
  • A model distinguishing entity/address fields from free text (using Keras/TensorFlow and CoreNLP) so that dedicated heuristics can be applied
  • A series of cleaning and postprocessing steps, including true case recognition (CoreNLP)
  • A parser of entity/address/other fields as described above, using a re-trained vanilla Spacy model (based on labeled open data)
  • Simple heuristics and metrics applied after the parsing to improve accuracy
  • Part-of-speech (POS) tagging of the remaining fields (Spacy and/or CoreNLP) for downstream processing

This package is intended to be used together with the upstream Swiftflow pipeline, which parses all fields from SWIFT MT messages, including the entity/address and free text fields that are decisive for interbank transactional communication.

Installation

The package relies mainly on Libpostal and Spacy. It also uses Keras on TensorFlow to recognize whether the text input is free text or entity/address text.

Prerequisite: Libpostal

Refer to the Libpostal installation instructions. Once Libpostal is installed, the Python binding postal is installed with pip as part of the package (see below).

Payment_text_parser

The other dependencies, including Spacy, are installed by pip together with the present package:

Create environment

Python 3.7 is recommended.

Native Python:

/usr/local/bin/python3 -m venv <my_env>
source <my_env>/bin/activate

Conda:

conda create --name <my_env> python=3.7
conda activate <my_env>

From pip

pip install payment-text-parser --use-feature=2020-resolver
python -m spacy download de_core_news_sm

From git

pip install git+https://gitlab.com/alpina-analytics/payment_text_parser.git
python -m spacy download de_core_news_sm

From requirements.txt

git clone https://gitlab.com/alpina-analytics/payment_text_parser.git
cd payment_text_parser
pip install -r requirements.txt
python -m spacy download de_core_news_sm
export PYTHONPATH=$(pwd)
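
After any of the installation routes above, it can help to confirm that the main dependencies are importable before running the parser. The following is a minimal sketch, not part of the package: the helper name check_dependencies is hypothetical, and the module list assumes the import names postal (the Libpostal binding), spacy, de_core_news_sm (the downloaded Spacy model) and payment_text_parser.

```python
import importlib.util


def check_dependencies(modules=("postal", "spacy", "de_core_news_sm", "payment_text_parser")):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    # find_spec returns None for a top-level module that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

A missing postal entry usually points back to the Libpostal prerequisite rather than to pip itself.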

Usage

Script

from payment_text_parser.entity_extractor.entity_extractor import ExtractorClass

text = "John Deere Les Abues 2 75000 Paris"  # entity/address free text to parse
e = ExtractorClass(text)
d_res = e.d_res  # parsing result
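
When several payment lines have to be parsed, the same pattern can be wrapped in a small loop. The sketch below is only an illustration: parse_many is a hypothetical helper, and the extractor class is passed in as an argument so that the only things assumed about it are the one-argument constructor and the d_res attribute shown above.

```python
def parse_many(texts, extractor_cls):
    """Parse each non-empty text with extractor_cls and collect d_res by input text."""
    results = {}
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty or whitespace-only lines
        results[text] = extractor_cls(text).d_res
    return results


# Usage with the package (requires the installation described above):
#   from payment_text_parser.entity_extractor.entity_extractor import ExtractorClass
#   results = parse_many(["John Deere Les Abues 2 75000 Paris"], ExtractorClass)
#   for text, d_res in results.items():
#       print(text, "->", d_res)
```

Identical input lines collapse to one dictionary entry here; a list of pairs would be the alternative if duplicates matter.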

Webserver

# Launch
python main.py

# Test
curl -H "Content-type: application/json" -X POST http://127.0.0.1:5000/parse -d '{"text":"John Deere Les Abues 2 75000 Paris"}'
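
The same test request can be sent from Python with the standard library only. This is a minimal sketch under the assumptions of the curl call above (server on 127.0.0.1:5000, JSON body with a "text" key, endpoint /parse); the function names are hypothetical, and since the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request


def build_parse_request(text, url="http://127.0.0.1:5000/parse"):
    """Build the POST request equivalent to the curl test call."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_remote(text, url="http://127.0.0.1:5000/parse", timeout=10):
    """Send the text to a running webserver and return the raw response body."""
    request = build_parse_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires the webserver started with `python main.py`):
#   print(parse_remote("John Deere Les Abues 2 75000 Paris"))
```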

Optional: start the Stanford CoreNLP server

Required if:

  • Field type detection is enabled with ExtractorClass(text,check_field_type=True)
  • POS tagging of the remaining fields is enabled with ExtractorClass(text,create_nlp_tags_rest_text=True)

If the server is not started, a warning message is displayed, but full processing can still take place.

The CoreNLP server can be started as follows:

cd ./core_nlp/stanford-corenlp-full-2018-10-05
java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-german.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9000  -port 9000 -timeout 15000
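
Because the CoreNLP-dependent options are optional, one possible pattern is to enable them only when something is listening on the server port. The sketch below assumes the server runs locally on port 9000 as in the command above; corenlp_is_up and extract are hypothetical helpers, and passing both flags in the same call is an assumption, since each flag is only documented individually here.

```python
import socket


def corenlp_is_up(host="127.0.0.1", port=9000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def extract(text):
    """Parse text, enabling the CoreNLP-based options only if the server is reachable."""
    # Imported lazily so the reachability check works without the package installed
    from payment_text_parser.entity_extractor.entity_extractor import ExtractorClass

    use_corenlp = corenlp_is_up()
    extractor = ExtractorClass(
        text,
        check_field_type=use_corenlp,
        create_nlp_tags_rest_text=use_corenlp,
    )
    return extractor.d_res
```

The check only confirms that the port accepts connections; it does not verify that the process behind it is actually CoreNLP.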

References

Spacy

https://spacy.io/

Libpostal

https://github.com/openvenues/libpostal

CoreNLP

https://stackoverflow.com/questions/33259191/installing-libicu-dev-on-mac
https://stackoverflow.com/questions/50217214/import-error-for-icu-in-mac-and-ubuntu-although-pyicu-is-installed-correctly/50364835#50364835
https://www.khalidalnajjar.com/setup-use-stanford-corenlp-server-python/
