

Project description

Installation from pip3

pip3 install --verbose plaintext_analyzer 
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
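Before you run the analyzers, you can confirm that both spaCy pipelines are importable. The sketch below assumes the models only need to be installed as packages, which is how `python -m spacy download` installs them. It checks for them without loading them. The helper name `missing_models` is hypothetical and is not part of plaintext_analyzer.

```python
import importlib.util

# Pipelines fetched with `python -m spacy download <name>` are installed as
# ordinary Python packages, so importlib can locate them without loading them.
REQUIRED_MODELS = ("en_core_web_trf", "es_dep_news_trf")


def missing_models(names=REQUIRED_MODELS):
    """Return the names from `names` that are not importable (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("Both spaCy models are installed.")
```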

Usage

Please refer to the API docs.

Executable usage

  • Get vocabularies from a plaintext file
pta_vocab --source en_plaintext.txt --stype FILE --lang en  
  • Get vocabularies from raw text
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get vocabularies from a plaintext file and write them to CSV files
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
  • Get vocabularies from raw text and write them to a CSV file
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab 
  • Get phrases from a plaintext file
pta_phrase --source en_plaintext.txt --stype FILE --lang en  
  • Get phrases from raw text
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get phrases from a plaintext file and write them to CSV files
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
  • Get phrases from raw text and write them to a CSV file
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase 
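You can also drive the commands above from a Python script. The sketch below is a minimal example. It assumes that the flags shown in the examples (`--source`, `--stype`, `--lang`, `--dstname`) are the only ones you need, and that `FILE` and `RAW` are the only valid `--stype` values, because those are the only two shown. The helpers `build_command` and `run_analysis` are hypothetical. Passing the arguments as a list avoids shell quoting, which matters for RAW text that contains apostrophes such as "eater's".

```python
import subprocess


def build_command(tool, source, stype, lang, dstname=None):
    """Assemble the argv list for pta_vocab or pta_phrase (hypothetical helper).

    Only the flags used in the examples are emitted; --dstname is optional.
    """
    # Assumption: FILE and RAW are the only stype values (the two documented ones)
    if stype not in ("FILE", "RAW"):
        raise ValueError("stype must be 'FILE' or 'RAW'")
    cmd = [tool, "--source", source, "--stype", stype, "--lang", lang]
    if dstname is not None:
        cmd += ["--dstname", dstname]
    return cmd


def run_analysis(tool, source, stype, lang, dstname=None):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    # Without shell=True each list item reaches the program as one argument,
    # so quotes and apostrophes in RAW text need no escaping.
    cmd = build_command(tool, source, stype, lang, dstname)
    return subprocess.run(cmd, check=True)
```

For example, `run_analysis("pta_vocab", "The breakfast varies according to location and the eater's income.", "RAW", "en", "en_vocab")` mirrors the last vocabulary example, with shorter text.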

Package usage

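# PlaintextReader, VocabAnalyzer and PhraseAnalyzer come from the
# plaintext_analyzer package; see the API docs for their import paths.

def parser_vocab(source, stype, lang):
    # Read the source (a file path for stype FILE, raw text for RAW)
    # and collect its sentences
    sf = PlaintextReader(source, stype, lang)
    sens = sf.sentences

    # Build a vocabulary overview from those sentences
    analyzer = VocabAnalyzer(lang)
    exs = analyzer.overview_vocabs(sens)

    print(exs)


def parser_phrase(source, stype, lang):
    # Same reading step as parser_vocab; only the analyzer differs
    sf = PlaintextReader(source, stype, lang)
    sens = sf.sentences

    # Build a phrase overview from those sentences
    analyzer = PhraseAnalyzer(lang)
    exs = analyzer.overview_phrases(sens)

    print(exs)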
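To analyze several files at once, you can call either function in a loop over a directory. The sketch below takes the parser as an argument, so it works with `parser_vocab` or `parser_phrase` and does not depend on the package's import paths. The helper `analyze_directory` and the `*.txt` pattern are hypothetical choices and are not part of the package.

```python
from pathlib import Path


def analyze_directory(parse, directory, lang, pattern="*.txt"):
    """Run `parse` (e.g. parser_vocab or parser_phrase) on every matching file.

    Hypothetical helper: files are passed with stype "FILE", in sorted order,
    and the names of the processed files are returned.
    """
    processed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            parse(str(path), "FILE", lang)
            processed.append(path.name)
    return processed
```

For example, `analyze_directory(parser_vocab, "corpus", "en")` prints one vocabulary overview per `.txt` file in `corpus/`.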

Development

Clone project

git clone https://github.com/qishe-nlp/plaintext-analyzer.git

Install poetry

Install dependencies

poetry update

Test

poetry run pytest -rP

This runs the tests under tests/.

Execute

poetry run pta_vocab --help
poetry run pta_phrase --help

Create Sphinx docs

poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html

Host docs on GitHub Pages

cp -rf apidocs/build/html/* docs/
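On systems without `cp`, such as Windows, the same copy step can be done from Python. This is a sketch, not part of the project's tooling. The function name `publish_docs` is hypothetical, and the default paths match the command above. One difference: unlike the shell glob `*`, `shutil.copytree` also copies dotfiles such as Sphinx's `.buildinfo`.

```python
import shutil
from pathlib import Path


def publish_docs(build_dir="apidocs/build/html", docs_dir="docs"):
    """Copy the built Sphinx HTML into docs_dir, overwriting existing files."""
    src = Path(build_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist; run `make html` first")
    # dirs_exist_ok=True (Python 3.8+) merges into an existing docs_dir, like cp -rf.
    # Unlike the shell glob, copytree also copies dotfiles such as .buildinfo.
    return Path(shutil.copytree(src, docs_dir, dirs_exist_ok=True))
```

Run `publish_docs()` from the repository root after `make html`.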

Build

  • Update the version in both pyproject.toml and plaintext_analyzer/__init__.py
  • Build the Python package with poetry build
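The version is kept in two files, so they can drift apart. The sketch below shows a pre-build check under two assumptions: `pyproject.toml` has a line such as `version = "0.1.9"` (the first such line is used), and `plaintext_analyzer/__init__.py` defines `__version__ = "..."`. Both assumptions, and the helpers `read_versions` and `versions_match`, are hypothetical.

```python
import re
from pathlib import Path

# Assumptions: pyproject.toml contains a line such as `version = "0.1.9"`
# (the first one found is used) and __init__.py defines `__version__ = "..."`.
PYPROJECT_VERSION = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
INIT_VERSION = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_versions(pyproject_text, init_text):
    """Return (pyproject_version, init_version); None where no match is found."""
    p = PYPROJECT_VERSION.search(pyproject_text)
    i = INIT_VERSION.search(init_text)
    return (p.group(1) if p else None, i.group(1) if i else None)


def versions_match(root="."):
    """True if both files in the project at `root` declare the same version."""
    root = Path(root)
    pyproject = (root / "pyproject.toml").read_text(encoding="utf-8")
    init = (root / "plaintext_analyzer" / "__init__.py").read_text(encoding="utf-8")
    py_version, init_version = read_versions(pyproject, init)
    return py_version is not None and py_version == init_version
```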

Git commit and push

Publish from local dev env

  • Set the TestPyPI repository and credential environment variables for Poetry (see the Poetry documentation)
  • Publish to TestPyPI with poetry publish -r test

Publish through CI

git tag [x.x.x]
git push origin master



Download files


Source Distribution

plaintext-analyzer-0.1.9.tar.gz (7.3 kB)

Built Distribution

plaintext_analyzer-0.1.9-py3-none-any.whl (8.3 kB)

File details

Details for the file plaintext-analyzer-0.1.9.tar.gz.

File metadata

  • Download URL: plaintext-analyzer-0.1.9.tar.gz
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.0 Linux/5.11.0-1022-azure

File hashes

Hashes for plaintext-analyzer-0.1.9.tar.gz
Algorithm     Hash digest
SHA256        1f5aa8c82feccfac9c84472a47d227267b059291013564b19359ec4c095da9d8
MD5           8437ba1bf9110186baebb4f021da3c9e
BLAKE2b-256   6b631bee02a437da521f60ef02f346ded7e1277c46ccec330f622e03c5c05741
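To check a downloaded file against the SHA256 digest listed here, hash it locally and compare. The sketch below uses only `hashlib`. The names `sha256_of`, `verify` and `EXPECTED_SDIST_SHA256` are hypothetical. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 listed for plaintext-analyzer-0.1.9.tar.gz
EXPECTED_SDIST_SHA256 = "1f5aa8c82feccfac9c84472a47d227267b059291013564b19359ec4c095da9d8"


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("plaintext-analyzer-0.1.9.tar.gz")` returns True only for an unmodified copy of the source distribution.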



File details

Details for the file plaintext_analyzer-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: plaintext_analyzer-0.1.9-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.0 Linux/5.11.0-1022-azure

File hashes

Hashes for plaintext_analyzer-0.1.9-py3-none-any.whl
Algorithm     Hash digest
SHA256        869874eb9dd3a2e3c4cb7817025f8e0f0ca569f26a366c6080351d13014e9fcd
MD5           722d51e6e73430c9e62147644515ea40
BLAKE2b-256   d3d04b8f92af552de27cb7b423acc9fc47a4621ca228856e6453ec1ab2fa044e


