A minimalist collection of text processing tools for Python 3

Project description

ChirpText is a collection of text processing tools for Python 3.

It is not meant to be a heavyweight toolkit like the popular NLTK, but a small package that you can pip-install anywhere and use to process textual data in a few lines of code.

Main features

Simple file data manipulation using an enhanced open() function (txt, gz, binary, etc.)
CSV helper functions
Parse Japanese text with the MeCab library (does not require the mecab-python3 package, even on Windows; only a binary release such as mecab.exe is required)
Built-in "lite" text annotation formats (texttaglib TTL/CSV and TTL/JSON)
Helper functions and useful data for processing English, Japanese, Chinese and Vietnamese
Application configuration file management that can make an educated guess about where config files are located
Quick text-based report generation
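
The "educated guess" about config file locations can be pictured as a search over a few conventional places. The sketch below uses only the standard library. The function name guess_config_path, the candidate directories, and the file extensions are illustrative assumptions, not chirptext's actual API or search order.

```python
from pathlib import Path


def guess_config_path(app_name, search_dirs=None, extensions=(".json", ".ini", ".cfg")):
    """Return the first existing config file for app_name, or None.

    Hypothetical helper: the candidate locations below are assumptions
    chosen for illustration, not chirptext's real lookup rules.
    """
    if search_dirs is None:
        search_dirs = [
            Path.cwd(),                          # ./myapp.json
            Path.cwd() / "config",               # ./config/myapp.json
            Path.home() / ("." + app_name),      # ~/.myapp/myapp.json
            Path.home() / ".config" / app_name,  # ~/.config/myapp/myapp.json
        ]
    for folder in search_dirs:
        for ext in extensions:
            candidate = Path(folder) / (app_name + ext)
            if candidate.is_file():
                return candidate
    return None
```

Calling guess_config_path("myapp") returns a Path when one of the candidates exists and None otherwise, so the caller can fall back to defaults.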

Installation

chirptext is available on PyPI and can be installed using pip:

pip install chirptext

Note: chirptext no longer supports Python 2. Please upgrade to Python 3 to use this package.

Sample code

Using MeCab on Windows

You can download the MeCab binary package from http://taku910.github.io/mecab/#download and install it. Once it is installed, you can try:

>>> from chirptext import deko
>>> sent = deko.parse('猫が好きです。')
>>> sent.tokens
[[猫(名詞-一般/*/*|猫|ネコ|ネコ)], [が(助詞-格助詞/一般/*|が|ガ|ガ)], [好き(名詞-形容動詞語幹/*/*|好き|スキ|スキ)], [です(助動詞-*/*/*|です|デス|デス)], [。(記号-句点/*/*|。|。|。)], [EOS(-//|||)]]
>>> sent.words
['猫', 'が', '好き', 'です', '。']
>>> sent[0].pos
'名詞'
>>> sent[0].root
'猫'
>>> sent[0].reading
'ネコ'

If you installed MeCab to a custom location, for example C:\mecab\bin\mecab.exe, try:

>>> deko.set_mecab_bin("C:\\mecab\\bin\\mecab.exe")
>>> deko.get_mecab_bin()
'C:\\mecab\\bin\\mecab.exe'

# That's all; now you can use MeCab
>>> deko.parse('雨が降る。').words
['雨', 'が', '降る', '。']
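
To illustrate why only the mecab executable is needed, here is a minimal sketch of the general technique: pipe text through the binary and parse what it prints. It is not deko's implementation. The function names are hypothetical, and the parser assumes MeCab's default output format (surface form, a tab, comma-separated features, then an EOS line) with UTF-8 encoding.

```python
import subprocess


def parse_mecab_output(output):
    """Turn MeCab's default output into a list of (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":  # EOS marks the end of a sentence
            continue
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


def run_mecab(text, mecab_bin="mecab"):
    """Pipe text through the mecab executable and parse its output.

    Assumes mecab_bin is on PATH (or is a full path to mecab.exe) and that
    the installed dictionary reads and writes UTF-8.
    """
    proc = subprocess.run(
        [mecab_bin],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return parse_mecab_output(proc.stdout)
```

With an IPADIC-style dictionary, each feature list would carry the part of speech first and the root form and readings near the end, which is the information deko exposes as pos, root and reading in the session above.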

Convenient IO APIs

>>> from chirptext import chio
>>> chio.write_tsv('data/test.tsv', [['a', 'b'], ['c', 'd']])
>>> chio.read_tsv('data/test.tsv')
[['a', 'b'], ['c', 'd']]

>>> chio.write_file('data/content.tar.gz', 'Support writing to .tar.gz file')
>>> chio.read_file('data/content.tar.gz')
'Support writing to .tar.gz file'

>>> for row in chio.read_tsv_iter('data/test.tsv'):
...     print(row)
... 
['a', 'b']
['c', 'd']
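
For readers curious how extension-aware reading and writing of this kind can work, the following is a rough standard-library sketch of the idea. The helpers open_text, save_tsv and load_tsv are hypothetical stand-ins written for this example; they are not chio's functions and make no claim about how chio is implemented.

```python
import csv
import gzip


def open_text(path, mode="rt", encoding="utf-8"):
    """Open a plain or gzip-compressed text file through one call.

    Hypothetical helper: it dispatches on the .gz suffix only.
    """
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding=encoding, newline="")
    return open(path, mode, encoding=encoding, newline="")


def save_tsv(path, rows):
    """Write rows (lists of strings) as tab-separated values."""
    with open_text(path, "wt") as outfile:
        csv.writer(outfile, delimiter="\t").writerows(rows)


def load_tsv(path):
    """Read a tab-separated file back into a list of rows."""
    with open_text(path, "rt") as infile:
        return list(csv.reader(infile, delimiter="\t"))
```

Because the compression decision lives in one place, the TSV helpers work unchanged for both test.tsv and test.tsv.gz.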

Sample TextReport

from chirptext import TextReport

# Choosing where a report is written
rp = TextReport()  # by default, TextReport writes to standard output, i.e. the terminal
rp = TextReport(TextReport.STDOUT)  # same as above
rp = TextReport('~/tmp/my-report.txt')  # output to a file
rp = TextReport.null()  # output to /dev/null, i.e. nowhere
rp = TextReport.string()  # output to a string. Call rp.content() to get the string
rp = TextReport(TextReport.STRINGIO)  # same as above

# Used in a with statement, TextReport closes the output stream automatically.
# LOREM_IPSUM (the sample text) and ct (a counter of its letters) are assumed to be defined beforehand.
with TextReport.string() as rp:
    rp.header("Lorem Ipsum Analysis", level="h0")
    rp.header("Raw", level="h1")
    rp.print(LOREM_IPSUM)
    rp.header("Top 5 most common letters")
    ct.summarise(report=rp, limit=5)
    print(rp.content())

Output

+---------------------------------------------------------------------------------- 
| Lorem Ipsum Analysis 
+---------------------------------------------------------------------------------- 
 
Raw 
------------------------------------------------------------ 
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 
 
Top 5 most common letters
------------------------------------------------------------ 
i: 42 
e: 37 
t: 32 
o: 29 
a: 29
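
The letter tally at the end of this report can be approximated with the standard library alone. This is only a sketch of the idea: letter_report is a hypothetical function, and counting case-insensitively and skipping non-letters are assumptions, so its numbers may not match the output above exactly.

```python
import io
from collections import Counter


def letter_report(text, limit=5, out=None):
    """Write the most common letters in text to out, one "letter: count" per line."""
    out = out if out is not None else io.StringIO()
    # Assumption: fold case and ignore anything that is not a letter.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    for letter, count in counts.most_common(limit):
        out.write(f"{letter}: {count}\n")
    return out
```

Passing an open file or sys.stdout as out writes the tally there instead of to an in-memory string, mirroring the choice of output targets that TextReport offers.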

Useful links

Documentation: https://chirptext.readthedocs.io
Source code: https://github.com/letuananh/chirptext/
PyPI: https://pypi.org/project/chirptext/

Release history

0.2a6.post2 pre-release

Oct 4, 2022

0.2a6.post1 pre-release

Mar 15, 2022

0.2a6 pre-release

Mar 15, 2022

0.2a5 pre-release

Mar 15, 2022

0.2a4.post2 pre-release

May 30, 2021

0.2a4.post1 pre-release

May 27, 2021

0.2a4 pre-release

May 20, 2021

0.2a3.post3 pre-release

May 20, 2021

0.2a3 pre-release

May 20, 2021

0.2a2 pre-release

May 20, 2021

0.2a1 pre-release

May 17, 2021

This version

0.1.2

May 20, 2021

0.1.1

May 17, 2021

0.1

May 13, 2021

0.1rc2 pre-release

May 9, 2021

0.1rc1.post1 pre-release

May 2, 2021

0.1a21 pre-release

Apr 23, 2021

0.1a20 pre-release

Apr 7, 2021

0.1a19 pre-release

Jun 1, 2020

0.1a18 pre-release

Jul 18, 2018

0.1a17 pre-release

May 30, 2018

0.1a16 pre-release

Apr 16, 2018

0.1a15 pre-release

Apr 11, 2018

0.1a14 pre-release

Apr 11, 2018

0.1a13 pre-release

Apr 4, 2018

0.1a12 pre-release

Apr 3, 2018

0.1a11 pre-release

Apr 2, 2018

0.1a10 pre-release

Mar 29, 2018

0.1a9 pre-release

Mar 28, 2018

0.1a8 pre-release

Feb 26, 2018

0.1a7 pre-release

Feb 22, 2018

0.1a6 pre-release

Feb 7, 2018

0.1a5 pre-release

Feb 7, 2018

0.1a4 pre-release

Feb 5, 2018

0.1a3 pre-release

Feb 5, 2018

0.1a2 pre-release

Jan 24, 2018

Download files

Source Distribution

chirptext-0.1.2.tar.gz (70.6 kB)

Uploaded May 20, 2021 Source

File details

Details for the file chirptext-0.1.2.tar.gz.

File metadata

Download URL: chirptext-0.1.2.tar.gz
Upload date: May 20, 2021
Size: 70.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.5

File hashes

Hashes for chirptext-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`643c9c11508f509b37257e3f3a629a668c0908ae5f582d7b9f131234a7379303`
MD5	`620afb738e18470d86ce2a2868ad9b75`
BLAKE2b-256	`661f7da1ed147b9d8687283dfea4f8413ab9067242591978db493fe3a8d4b96c`
