Pre-processing of NLP training corpora
Protogenie
How to cite
@software{thibault_clerice_2020_3883586,
author = {Thibault Clérice},
title = {Protogenie, post-processing for NLP dataset},
month = jun,
year = 2020,
publisher = {Zenodo},
doi = {10.5281/zenodo.3883585},
url = {https://doi.org/10.5281/zenodo.3883585}
}
Install from release
pip install protogenie
Install unstable
pip install --upgrade https://github.com/hipster-philology/protogenie/archive/master.zip
Install from source
Start by cloning the repository and moving into the created folder:
git clone https://github.com/hipster-philology/protogenie.git
cd protogenie/
Create a virtual environment, activate it, and install the dependencies:
pip install -r requirements.txt
Configuration file
To configure protogenie, have a look at the examples in ./tests/test_config. More generally, you can and should use the schema: ./ppa_splitter/schema.rng
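As a minimal sketch of how the schema could be used, the snippet below checks a configuration file against it before running anything. It rests on assumptions not stated in this README: that the configuration files are XML (the .rng extension suggests a RelaxNG schema), and that the third-party lxml library is installed. The function name validate_config is hypothetical and is not part of protogenie.

```python
from pathlib import Path


def validate_config(config_path, schema_path="./ppa_splitter/schema.rng"):
    """Validate an XML configuration file against a RelaxNG schema.

    Hypothetical helper, not part of protogenie.
    Returns a list of error messages; an empty list means the file is valid.
    """
    config_path, schema_path = Path(config_path), Path(schema_path)
    for path in (config_path, schema_path):
        if not path.is_file():
            raise FileNotFoundError(f"No such file: {path}")

    # Assumption: lxml is installed (pip install lxml). It is imported here so
    # that a missing dependency only surfaces when validation is attempted.
    from lxml import etree

    schema = etree.RelaxNG(etree.parse(str(schema_path)))
    document = etree.parse(str(config_path))
    if schema.validate(document):
        return []
    # error_log holds one entry per problem reported by the validator
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]
```

Calling validate_config on one of your own configuration files from the repository root returns an empty list when the file conforms to the schema, and otherwise one message per problem with its line number.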
Workflow