
Project description

Installation via pip3

pip3 install --verbose plaintext_analyzer 
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
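
To sanity-check the model downloads, a quick load in Python (not part of this package) should succeed for both pipelines:

import spacy

# Loading both transformer models confirms they were downloaded correctly
for model in ("en_core_web_trf", "es_dep_news_trf"):
  nlp = spacy.load(model)
  print(model, "loaded with pipeline:", nlp.pipe_names)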

Usage

Please refer to the API docs.

Executable usage

  • Get vocabularies from a plaintext file
pta_vocab --source en_plaintext.txt --stype FILE --lang en  
  • Get vocabularies from raw text
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get vocabularies from a plaintext file and write them to CSV files
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
  • Get vocabularies from raw text and write them to a CSV file
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab 
  • Get phrases from a plaintext file
pta_phrase --source en_plaintext.txt --stype FILE --lang en  
  • Get phrases from raw text
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get phrases from a plaintext file and write them to CSV files
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
  • Get phrases from raw text and write them to a CSV file
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase 

Package usage

# The classes below are assumed to be exported from the top-level package.
from plaintext_analyzer import PlaintextReader, VocabAnalyzer, PhraseAnalyzer

def parser_vocab(source, stype, lang):

  # Read the source (a file path or raw text, per stype) and split it into sentences
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Collect vocabulary information for the given language
  analyzer = VocabAnalyzer(lang)
  exs = analyzer.overview_vocabs(sens)

  print(exs)

def parser_phrase(source, stype, lang):

  # Read the source (a file path or raw text, per stype) and split it into sentences
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Collect phrase information for the given language
  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.overview_phrases(sens)

  print(exs)
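
For example (a minimal sketch), the helpers above can be driven the same way as the executables, assuming stype accepts the same "FILE" and "RAW" values as the command line and en_plaintext.txt is the input file from the CLI examples:

# Analyze an English plaintext file with the helpers defined above
parser_vocab("en_plaintext.txt", "FILE", "en")
parser_phrase("en_plaintext.txt", "FILE", "en")

# Raw text works the same way as --stype RAW
parser_vocab("Tea is preferred to coffee in Bangladesh.", "RAW", "en")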

Development

Clone the project

git clone https://github.com/qishe-nlp/plaintext-analyzer.git

Install poetry

Install dependencies

poetry update

Test

poetry run pytest -rP

which runs the tests under tests/*

Execute

poetry run pta_vocab --help
poetry run pta_phrase --help

Create sphinx docs

poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html

Host docs on GitHub Pages

cp -rf apidocs/build/html/* docs/

Build

  • Change the version in pyproject.toml and plaintext_analyzer/__init__.py (see the sketch below)
  • Build the Python package with poetry build
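
For reference, the version bump only means keeping two strings in sync; a sketch with a hypothetical next version, assuming the conventional __version__ attribute:

# plaintext_analyzer/__init__.py (assumed layout); keep in sync with the
# version field under [tool.poetry] in pyproject.toml
__version__ = "0.1.5"  # hypothetical next version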

Git commit and push

Publish from local dev env

  • Set the PyPI test repository credentials in poetry; refer to the poetry docs
  • Publish to the PyPI test index with poetry publish -r test

Publish through CI

git tag [x.x.x]
git push origin master

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

plaintext-analyzer-0.1.4.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

plaintext_analyzer-0.1.4-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file plaintext-analyzer-0.1.4.tar.gz.

File metadata

  • Download URL: plaintext-analyzer-0.1.4.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.8.2 Linux/5.4.0-1047-azure

File hashes

Hashes for plaintext-analyzer-0.1.4.tar.gz
  • SHA256: 5bbb3a30f485f123c1175d49d6f7a40bef29d048bcf442d6774418e3f65ab044
  • MD5: f1fe08ee11f1fc7449761ce389b9bde3
  • BLAKE2b-256: 40e882235550d48c35b54809d72ecaa9105c0c461fd8841f47a50d0436b9348b



File details

Details for the file plaintext_analyzer-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for plaintext_analyzer-0.1.4-py3-none-any.whl
  • SHA256: 886ae6fe7330ee16dcb01992a220cda2405d26491893cbfc3348c1f36bc28388
  • MD5: 0139fe8f06e20a0252dd8a38bcf0da8b
  • BLAKE2b-256: 517982b2f019aaf4e86c7e241ad0d433df6609ef8c2e9a98d22283b426f735c1


