Iterator for prevert files
Prevert iterator
To use the prevert parser, copy the file prevert.py into your working directory.
Usage
# import libraries
from prevert import dataset
import pandas as pd
If you are using the MaCoCu corpora in the XML format, dataset() needs only the path to the file as its argument:
# Open the dataset with the prevert parser
dset = dataset("/data/monolingual/mk.xml")
dset consists of docs, where you can access the metadata by doc.meta['attribute_name']. Docs consist of paragraphs, where you can access the metadata by par.meta['attribute_name'].
Basic use:
for doc in dset:  # iterating through documents of a dataset
    print(doc.meta)  # all attributes
    print(eval(doc.meta['lang_distr'])[0][0])  # most prominent language in the document
    print(str(doc))  # whole document text
    for par in doc:  # iterating through paragraphs of a document
        print(par.meta['id'])  # specific attribute
        print(str(par))  # whole paragraph text
    print(doc.to_prevert())  # obtaining the original format
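The loop above can be wrapped into two small helpers, sketched here under stated assumptions. The names top_language and paragraph_rows are hypothetical and not part of prevert. The sketch assumes that lang_distr is stored as a string holding a Python literal, such as "[('mk', 0.9), ('en', 0.1)]", with the most prominent language first; this format is inferred from the eval call above and not confirmed elsewhere. Under that assumption, ast.literal_eval can stand in for eval, since it parses literals only and does not execute code.

```python
import ast
from typing import Any, Dict, Iterable, List


def top_language(lang_distr: str) -> str:
    """Return the first language code from a lang_distr attribute string.

    Assumption: the string is a Python literal holding (language, share)
    pairs, ordered from most to least prominent.
    """
    pairs = ast.literal_eval(lang_distr)  # parses literals only, never runs code
    return pairs[0][0]


def paragraph_rows(docs: Iterable[Any]) -> List[Dict[str, str]]:
    """Flatten documents into one dict per paragraph.

    Works with any objects that expose .meta and iterate over paragraphs,
    which is the interface used in the loop above.
    """
    rows = []
    for doc in docs:
        lang = top_language(doc.meta["lang_distr"])  # document-level attribute
        for par in doc:
            rows.append(
                {"par_id": par.meta["id"], "lang": lang, "text": str(par)}
            )
    return rows
```

With dset opened as shown earlier, paragraph_rows(dset) would return a list of dicts, and pd.DataFrame(paragraph_rows(dset)) would turn it into a table with one row per paragraph. An empty lang_distr list would raise an IndexError in top_language, so corpora with missing values may need an extra check.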