
Project description

Installation from pip3

pip3 install --verbose plaintext_analyzer 
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
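
To confirm the spaCy pipelines downloaded correctly, a quick check such as the following can be run (an optional sketch, not part of the package):

# Load each downloaded pipeline once and print its components.
import spacy

for model in ("en_core_web_trf", "es_dep_news_trf"):
    nlp = spacy.load(model)
    print(model, "loaded with pipes:", nlp.pipe_names)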

Usage

Please refer to the API docs.

Executable usage

  • Get vocabularies from a plaintext file
pta_vocab --source en_plaintext.txt --stype FILE --lang en  
  • Get vocabularies from raw text
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get vocabularies from a plaintext file and write them to CSV files
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
  • Get vocabularies from raw text and write them to a CSV file
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab 
  • Get phrases from a plaintext file
pta_phrase --source en_plaintext.txt --stype FILE --lang en  
  • Get phrases from raw text
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get phrases from a plaintext file and write them to CSV files
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
  • Get phrases from raw text and write them to a CSV file
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase 
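
The executables can also be driven from a script; a minimal sketch using only the standard library, assuming pta_vocab is on the PATH and using the flags documented above:

import subprocess

# Run pta_vocab on a plaintext file and capture the console output.
result = subprocess.run(
    ["pta_vocab", "--source", "en_plaintext.txt", "--stype", "FILE", "--lang", "en"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)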

Package usage

# Import path assumed from the package name; adjust if these classes live in submodules.
from plaintext_analyzer import PlaintextReader, VocabAnalyzer, PhraseAnalyzer

def parser_vocab(source, stype, lang):

  # Read the source (a file path or raw text, depending on stype) and split it into sentences.
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Print an overview of the vocabulary found in the sentences.
  analyzer = VocabAnalyzer(lang)
  exs = analyzer.overview_vocabs(sens)

  print(exs)

def parser_phrase(source, stype, lang):

  # Read the source and split it into sentences.
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Print an overview of the phrases found in the sentences.
  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.overview_phrases(sens)

  print(exs)
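
The functions above can then be called directly; for example (a sketch mirroring the CLI examples):

# Vocabulary overview of a plaintext file
parser_vocab("en_plaintext.txt", "FILE", "en")

# Phrase overview of a raw-text string
parser_phrase("In Bangladesh tea is preferred to coffee.", "RAW", "en")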

Development

Clone project

git clone https://github.com/qishe-nlp/plaintext-analyzer.git

Install poetry

Install dependencies

poetry update

Test

poetry run pytest -rP

which runs the tests under tests/*
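
A new test case would follow the same pattern as the package usage above; an illustrative sketch (not one of the project's shipped tests, import path assumed):

# tests/test_reader_sketch.py
from plaintext_analyzer import PlaintextReader

def test_sentences_from_raw_text():
    # RAW input should be split into at least one sentence.
    reader = PlaintextReader("Tea is popular. Coffee is less common.", "RAW", "en")
    assert len(reader.sentences) >= 1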

Execute

poetry run pta_vocab --help
poetry run pta_phrase --help

Create Sphinx docs

poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html

Host docs on GitHub Pages

cp -rf apidocs/build/html/* docs/

Build

  • Change the version in pyproject.toml and plaintext_analyzer/__init__.py
  • Build the Python package with poetry build
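
For example, a hypothetical bump to 0.1.6 keeps the two version strings in sync (the __version__ attribute name is an assumption):

# pyproject.toml
version = "0.1.6"

# plaintext_analyzer/__init__.py
__version__ = "0.1.6"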

Git commit and push

Publish from local dev env

  • Set the PyPI test repository credentials in Poetry (refer to the Poetry docs; a sketch follows this list)
  • Publish to TestPyPI with poetry publish -r test
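
A sketch of that configuration (the repository alias and token value are placeholders, not real values):

poetry config repositories.test https://test.pypi.org/legacy/
poetry config pypi-token.test <your-testpypi-token>
poetry publish -r test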

Publish through CI

git tag [x.x.x]
git push origin master

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

plaintext-analyzer-0.1.5.tar.gz (5.9 kB)


Built Distribution

plaintext_analyzer-0.1.5-py3-none-any.whl (7.1 kB)


File details

Details for the file plaintext-analyzer-0.1.5.tar.gz.

File metadata

  • Download URL: plaintext-analyzer-0.1.5.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.9 CPython/3.8.2 Linux/5.8.0-1041-azure

File hashes

Hashes for plaintext-analyzer-0.1.5.tar.gz

  • SHA256: 16d466ae30b9843b98cb6becea2227b617428b736f854a62e10cf318ab31ce7d
  • MD5: e363c2e77695e3a55278c7b84a26ac9f
  • BLAKE2b-256: b05dbd481b6c3dc4cec059a779b7051735661cb927403b43c2d64ae0bf4e8aac
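
To verify a downloaded archive against the SHA256 digest above, a short check with the standard library:

import hashlib

# Expected digest taken from the list above.
expected = "16d466ae30b9843b98cb6becea2227b617428b736f854a62e10cf318ab31ce7d"
with open("plaintext-analyzer-0.1.5.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "MISMATCH")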


File details

Details for the file plaintext_analyzer-0.1.5-py3-none-any.whl.

File metadata

File hashes

Hashes for plaintext_analyzer-0.1.5-py3-none-any.whl

  • SHA256: 9f7d66df9684042428c385dbc72df31712801c4be9bc5bfe7615e182b6e3cddd
  • MD5: 90f32c6bf7fe43219f598fb73e818a8a
  • BLAKE2b-256: e53a0544bf4125bca5b00a8362ed8c0565e0950ac301b5134a66e955695636c1
