Python wrapper for the CETEMPublico corpus
cetem-publico is a Python wrapper for the CETEMPublico corpus. It takes care of downloading, storing and importing the corpus into NLTK.
This is still a work in progress; the API might break without warning.
Installing
Install and update using pip:
pip install [--user] cetem-publico
A Simple Example
import CETEMPublico
cp = CETEMPublico.load()  # loads a small 10 KB sample
# or
cp = CETEMPublico.load(full=True)  # loads the full 12 GB corpus
print(cp.tagged_sents())
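Once the corpus is loaded, the tagged sentences can be processed like any other NLTK-style tagged corpus. The sketch below assumes that tagged_sents() follows the usual NLTK convention of yielding sentences as lists of (token, tag) pairs; this README does not confirm the exact shape. The helper tag_frequencies and the sample data, including its tag names, are hypothetical and only stand in for real corpus output so the snippet runs without downloading anything.

```python
from collections import Counter
from itertools import islice


def tag_frequencies(tagged_sents, limit=None):
    """Count tags over an iterable of sentences.

    Each sentence is assumed to be a list of (token, tag) pairs.
    `limit` caps how many sentences are read, which helps when
    iterating over a very large corpus.
    """
    counts = Counter()
    for sent in islice(tagged_sents, limit):
        counts.update(tag for _token, tag in sent)
    return counts


# Hypothetical stand-in for cp.tagged_sents(); the tags are invented
# for illustration and are not the corpus's real tag set.
sample = [
    [("O", "DET"), ("jornal", "N"), ("saiu", "V"), (".", "PUNCT")],
    [("A", "DET"), ("equipa", "N"), ("agradece", "V"), (".", "PUNCT")],
]

print(tag_frequencies(sample).most_common())
```

With the real corpus, the same helper could be called as tag_frequencies(cp.tagged_sents(), limit=1000), provided the assumption about the sentence format holds. Capping the number of sentences is a cautious default given the 12 GB size of the full corpus.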
Acknowledgements
This module only exists thanks to the Publico newspaper and the team responsible for the CETEMPublico corpus.
Bugs and stuff
Open a GitHub issue or, preferably, send me a pull request.
License
MIT