Installation with pip3

pip3 install --verbose plaintext_analyzer 
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf

Usage

Please refer to the API docs.

Executable usage

  • Get vocabularies from plaintext file
pta_vocab --source en_plaintext.txt --stype FILE --lang en  
  • Get vocabularies from text
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get vocabularies from plaintext file, and write to csv files
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
  • Get vocabularies from text, and write to csv file
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab 
  • Get phrases from plaintext file
pta_phrase --source en_plaintext.txt --stype FILE --lang en  
  • Get phrases from text
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en  
  • Get phrases from plaintext file, and write to csv files
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
  • Get phrases from text, and write to csv file
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase 
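The commands above all share the same four flags, so they can be assembled programmatically. The following is a minimal sketch, not part of the package: build_pta_command is a hypothetical helper, and it only uses the flags shown above (--source, --stype, --lang, --dstname) and the two stype values that appear in the examples (FILE and RAW).

```python
import shlex
from typing import List, Optional


def build_pta_command(tool: str, source: str, stype: str, lang: str,
                      dstname: Optional[str] = None) -> List[str]:
    """Return an argv list for pta_vocab or pta_phrase (hypothetical helper)."""
    if tool not in ("pta_vocab", "pta_phrase"):
        raise ValueError(f"unknown tool: {tool}")
    # Only the two source types used in the examples above are accepted here.
    if stype not in ("FILE", "RAW"):
        raise ValueError(f"unknown stype: {stype}")
    argv = [tool, "--source", source, "--stype", stype, "--lang", lang]
    if dstname is not None:
        # --dstname asks the tool to write its results to csv.
        argv += ["--dstname", dstname]
    return argv


if __name__ == "__main__":
    cmd = build_pta_command("pta_vocab", "en_plaintext.txt", "FILE", "en", "en_vocab")
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually run it (needs the package and the spaCy model installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting problems when the RAW source text contains quotes or apostrophes.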

Package usage

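# Import path assumed from the package name; check the API docs if it differs.
from plaintext_analyzer import PlaintextReader, VocabAnalyzer, PhraseAnalyzer

def parser_vocab(source, stype, lang):
  # stype is "FILE" (source is a path) or "RAW" (source is the text itself)
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Extract vocabularies from the sentences and print them
  analyzer = VocabAnalyzer(lang)
  exs = analyzer.overview_vocabs(sens)

  print(exs)

def parser_phrase(source, stype, lang):
  # Same reading step as parser_vocab
  sf = PlaintextReader(source, stype, lang)
  sens = sf.sentences

  # Extract phrases from the sentences and print them
  analyzer = PhraseAnalyzer(lang)
  exs = analyzer.overview_phrases(sens)

  print(exs)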
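Both functions above follow the same read-then-analyze pattern and print the result. The sketch below shows that pattern as one function that returns the result instead, so callers can reuse it. It is an illustration only: analyze is a hypothetical name, and StubReader and StubVocabAnalyzer are hypothetical stand-ins that mimic the constructor and method names used above so the example runs without the package or a spaCy model. The real classes may return differently shaped data.

```python
from typing import Any, List


class StubReader:
    """Hypothetical stand-in for PlaintextReader: splits RAW text on periods."""

    def __init__(self, source: str, stype: str, lang: str) -> None:
        if stype != "RAW":
            raise ValueError("this stub only handles RAW text")
        self.sentences: List[str] = [s.strip() for s in source.split(".") if s.strip()]


class StubVocabAnalyzer:
    """Hypothetical stand-in for VocabAnalyzer: returns sorted unique lowercase words."""

    def __init__(self, lang: str) -> None:
        self.lang = lang

    def overview_vocabs(self, sentences: List[str]) -> List[str]:
        words = {w.lower() for s in sentences for w in s.split()}
        return sorted(words)


def analyze(source: str, stype: str, lang: str,
            reader_cls: Any, analyzer_cls: Any, method: str) -> Any:
    """Read sentences with reader_cls, then call the named analyzer method on them."""
    reader = reader_cls(source, stype, lang)
    analyzer = analyzer_cls(lang)
    return getattr(analyzer, method)(reader.sentences)


result = analyze("Tea is popular. Tea is hot.", "RAW", "en",
                 StubReader, StubVocabAnalyzer, "overview_vocabs")
print(result)
```

With the package installed, the stubs would be replaced by PlaintextReader and VocabAnalyzer (or PhraseAnalyzer with "overview_phrases").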

Development

Clone project

git clone https://github.com/qishe-nlp/plaintext-analyzer.git

Install poetry

Install dependencies

poetry update

Test

poetry run pytest -rP

This runs the tests under tests/*.

Execute

poetry run pta_vocab --help
poetry run pta_phrase --help

Create sphinx docs

poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html

Host docs on github pages

cp -rf apidocs/build/html/* docs/

Build

  • Change the version in both pyproject.toml and plaintext_analyzer/__init__.py
  • Build the Python package with poetry build
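Because the version has to be changed in two files, it is easy for them to drift apart. The following is a minimal sketch of a consistency check that could be run before poetry build. It assumes that pyproject.toml contains a line of the form version = "x.y.z" and that plaintext_analyzer/__init__.py defines __version__ = "x.y.z"; neither format is confirmed by this README, and the function names are hypothetical.

```python
import re
from typing import Optional


def pyproject_version(text: str) -> Optional[str]:
    """Return the first `version = "..."` value at the start of a line, if any."""
    m = re.search(r'^version\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)
    return m.group(1) if m else None


def init_version(text: str) -> Optional[str]:
    """Return the `__version__` string (assumed variable name), if any."""
    m = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, flags=re.MULTILINE)
    return m.group(1) if m else None


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """True only when both versions were found and are identical."""
    a = pyproject_version(pyproject_text)
    b = init_version(init_text)
    return a is not None and a == b


if __name__ == "__main__":
    # Sample contents for illustration; read the real files with open(...).read().
    sample_pyproject = '[tool.poetry]\nname = "plaintext_analyzer"\nversion = "0.1.13"\n'
    sample_init = '__version__ = "0.1.13"\n'
    print(versions_match(sample_pyproject, sample_init))
```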

Git commit and push

Publish from local dev env

  • Configure the test PyPI repository and credentials in poetry (see the poetry docs)
  • Publish to test PyPI with poetry publish -r test

Publish through CI

git tag [x.x.x]
git push origin master

Download files

Source Distribution

plaintext_analyzer-0.1.13.tar.gz (6.2 kB view details)

Uploaded Source

Built Distribution

plaintext_analyzer-0.1.13-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file plaintext_analyzer-0.1.13.tar.gz.

File metadata

  • Download URL: plaintext_analyzer-0.1.13.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.0 Linux/5.15.0-1037-azure

File hashes

Hashes for plaintext_analyzer-0.1.13.tar.gz
Algorithm     Hash digest
SHA256        12a13a9490ab206e0f5328adb9bb32c5b1e98e9c8b22c3121398bb9d3e201cfd
MD5           45f6a844b7401088267cc43aa17129a8
BLAKE2b-256   967fb24abc74f452c1b92ce59bcfe2d546b357a5b9a5acac09f57dffb3c1170e
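
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the Python standard library; the function names sha256_of_file and matches_expected are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, matches_expected("plaintext_analyzer-0.1.13.tar.gz", "<SHA256 value from the table>") returns True only when the download is intact.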

File details

Details for the file plaintext_analyzer-0.1.13-py3-none-any.whl.

File metadata

  • Download URL: plaintext_analyzer-0.1.13-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.11.0 Linux/5.15.0-1037-azure

File hashes

Hashes for plaintext_analyzer-0.1.13-py3-none-any.whl
Algorithm     Hash digest
SHA256        69c693ebc58dce423b0b5b0a5d93373442e9fd38bd812e56550dad76422c9b69
MD5           15147bb0f06346f53d0759bdafc617d4
BLAKE2b-256   594e9cd181ace81c07c59b4d09defa18db556e41912e5ba04383beccf5a433eb
