Project description
Installation from pip3
pip3 install --verbose plaintext_analyzer
python -m spacy download en_core_web_trf
python -m spacy download es_dep_news_trf
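Before running the tools, it can help to confirm that both spaCy pipelines are installed. The sketch below relies on `spacy download` installing each model as a regular Python package; the helper name `missing_models` is hypothetical and not part of plaintext_analyzer.

```python
from importlib.util import find_spec

# Model packages installed by the two `spacy download` commands above.
REQUIRED_MODELS = ("en_core_web_trf", "es_dep_news_trf")


def missing_models(names=REQUIRED_MODELS):
    """Return the model package names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All required spaCy models are installed.")
```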
Usage
Please refer to the API docs.
Executable usage
- Get vocabularies from plaintext file
pta_vocab --source en_plaintext.txt --stype FILE --lang en
- Get vocabularies from text
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en
- Get vocabularies from plaintext file, and write to csv files
pta_vocab --source en_plaintext.txt --stype FILE --lang en --dstname en_vocab
- Get vocabularies from text, and write to csv file
pta_vocab --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_vocab
- Get phrases from plaintext file
pta_phrase --source en_plaintext.txt --stype FILE --lang en
- Get phrases from text
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en
- Get phrases from plaintext file, and write to csv files
pta_phrase --source en_plaintext.txt --stype FILE --lang en --dstname en_phrase
- Get phrases from text, and write to csv file
pta_phrase --source "The typical Bangladeshi breakfast consists of flour-based flatbreads such as chapati, roti or paratha, served with a curry. Usually the curry can be vegetable, home-fried potatoes, or scrambled eggs. The breakfast varies according to location and the eater's income. In villages and rural areas, rice with curry (potato mash, dal ) is mostly preferred by day laborers. In the city, sliced bread with jam or jelly is chosen due to time efficiency. In Bangladesh tea is preferred to coffee and is an essential part of most breakfasts. Having toasted biscuits, bread or puffed rice with tea is also very popular." --stype RAW --lang en --dstname en_phrase
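The commands above can also be driven from a Python script. The following is a minimal sketch that only assembles the flags shown in this section (`--source`, `--stype`, `--lang`, `--dstname`); the helper names `build_pta_command` and `run_pta` are hypothetical, and `run_pta` assumes the executables are on the PATH.

```python
import shlex
import subprocess


def build_pta_command(tool, source, stype, lang, dstname=None):
    """Build the argument list for pta_vocab or pta_phrase."""
    if tool not in ("pta_vocab", "pta_phrase"):
        raise ValueError(f"unknown tool: {tool}")
    cmd = [tool, "--source", source, "--stype", stype, "--lang", lang]
    # --dstname is only added when the results should be written to csv.
    if dstname is not None:
        cmd += ["--dstname", dstname]
    return cmd


def run_pta(tool, source, stype, lang, dstname=None):
    """Run the tool and return its standard output as text."""
    cmd = build_pta_command(tool, source, stype, lang, dstname)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_pta_command("pta_phrase", "en_plaintext.txt", "FILE", "en", "en_phrase")
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the RAW source text contains quotes or parentheses.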
Package usage
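# Assumed import path; see the API docs for the exact module layout.
from plaintext_analyzer import PlaintextReader, VocabAnalyzer, PhraseAnalyzer


def parser_vocab(source, stype, lang):
    # Read the source (a file path or raw text, depending on stype).
    sf = PlaintextReader(source, stype, lang)
    sens = sf.sentences
    # Extract an overview of the vocabulary from the sentences.
    analyzer = VocabAnalyzer(lang)
    exs = analyzer.overview_vocabs(sens)
    print(exs)


def parser_phrase(source, stype, lang):
    sf = PlaintextReader(source, stype, lang)
    sens = sf.sentences
    # Extract an overview of the phrases from the sentences.
    analyzer = PhraseAnalyzer(lang)
    exs = analyzer.overview_phrases(sens)
    print(exs)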
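When calling these functions from a script, the caller has to decide whether a given string is a file path or literal text. One possible way to do that is sketched below; `guess_stype` is a hypothetical helper, not part of plaintext_analyzer, and it assumes that FILE and RAW (the two values shown in this README) are the source types of interest.

```python
import os


def guess_stype(source):
    """Return "FILE" if source names an existing file, otherwise "RAW"."""
    # os.path.isfile returns False (rather than raising) for strings that
    # are not valid paths, so long passages of text fall through to RAW.
    return "FILE" if os.path.isfile(source) else "RAW"


if __name__ == "__main__":
    for candidate in ("en_plaintext.txt", "In Bangladesh tea is preferred to coffee."):
        print(candidate, "->", guess_stype(candidate))
```

The result can then be passed as the stype argument, for example parser_vocab(source, guess_stype(source), "en").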
Development
Clone project
git clone https://github.com/qishe-nlp/plaintext-analyzer.git
Install poetry
Install dependencies
poetry update
Test
poetry run pytest -rP
which runs the tests under tests/*
Execute
poetry run pta_vocab --help
poetry run pta_phrase --help
Create sphinx docs
poetry shell
cd apidocs
sphinx-apidoc -f -o source ../plaintext_analyzer
make html
python -m http.server -d build/html
Host docs on github pages
cp -rf apidocs/build/html/* docs/
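On systems without a POSIX shell, the copy step can be approximated in Python. This is a rough sketch using the paths from the command above; the function name `publish_docs` is hypothetical. Unlike the shell glob, it also copies hidden files at the top level of the build directory.

```python
import shutil
from pathlib import Path


def publish_docs(build_dir="apidocs/build/html", docs_dir="docs"):
    """Copy the built HTML tree into docs_dir, overwriting existing files."""
    src = Path(build_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"build output not found: {src}")
    # dirs_exist_ok=True (Python 3.8+) merges into an existing docs directory
    # instead of failing when it is already present.
    shutil.copytree(src, docs_dir, dirs_exist_ok=True)
    # Return the relative paths now present in docs_dir, for a quick check.
    dst = Path(docs_dir)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())


if __name__ == "__main__":
    for name in publish_docs():
        print(name)
```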
Build
- Change version in pyproject.toml and plaintext_analyzer/__init__.py
- Build python package by
poetry build
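Because the version is kept in two files, a quick consistency check before building can catch a missed update. The sketch below assumes a `version = "x.y.z"` line in pyproject.toml and a `__version__ = "x.y.z"` line in plaintext_analyzer/__init__.py; both layouts are assumptions, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed layouts: `version = "x.y.z"` in pyproject.toml and
# `__version__ = "x.y.z"` in plaintext_analyzer/__init__.py.
PYPROJECT_RE = re.compile(r'^version\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)
INIT_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_version(path, pattern):
    """Return the first version string in the file that matches pattern."""
    match = pattern.search(Path(path).read_text(encoding="utf-8"))
    if match is None:
        raise ValueError(f"no version string found in {path}")
    return match.group(1)


def versions_match(pyproject="pyproject.toml", init="plaintext_analyzer/__init__.py"):
    """Return True when both files declare the same version."""
    return read_version(pyproject, PYPROJECT_RE) == read_version(init, INIT_RE)


if __name__ == "__main__":
    print("versions match" if versions_match() else "versions differ")
```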
Git commit and push
Publish from local dev env
- Set the test PyPI environment variables in poetry; refer to the poetry docs
- Publish to pypi test by
poetry publish -r test
Publish through CI
- GitHub Actions builds and publishes the package to the test PyPI repo
git tag [x.x.x]
git push origin master
- Manually publish to the PyPI repo through GitHub Actions