PyCIFF
This package provides a Python interface to PISA's Common Index File Format (CIFF) import/export toolset, which is written in Rust.
Usage
Converting CIFF to PISA:
pyciff.ciff_to_pisa(input_file, output, generate_lexicons)
- `input_file` is the input CIFF file.
- `output` is the basename of the output PISA canonical files.
- `generate_lexicons` is a Boolean flag; if True, the `.termlex` and `.doclex` files will be created.
Example (using the toy CIFF file stored in this repo):
$> cd tests
$> python -c "import pyciff; pyciff.ciff_to_pisa('toy-complete-20200309.ciff', 'my-pisa-files', False)"
----- CIFF HEADER -----
Version: 1
No. Postings Lists: 9
Total Postings Lists: 9
No. Documents: 3
Total Documents: 3
Total Terms in Collection 16
Average Document Length: 5.333333333333333
Description: Export of toy 3-document collection from Anserini's io.anserini.integration.TrecEndToEndTest test case
-----------------------
Processing postings
[00:00:00] [========================================] / (0s)
Processing document lengths
[00:00:00] [========================================] / (0s)
$> ls my-pisa-files.*
my-pisa-files.docs my-pisa-files.documents my-pisa-files.freqs my-pisa-files.sizes my-pisa-files.terms
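After a conversion it can be handy to confirm that the expected files exist. The following is a minimal sketch, assuming the output suffixes are exactly those in the listing above, plus `.termlex` and `.doclex` when `generate_lexicons` is True. The helper names `expected_pisa_files` and `missing_pisa_files` are hypothetical and not part of pyciff.

```python
from pathlib import Path

# Suffixes taken from the `ls` output above; assumed to be the complete set.
CANONICAL_SUFFIXES = (".docs", ".documents", ".freqs", ".sizes", ".terms")
# Extra files described as being written when generate_lexicons is True.
LEXICON_SUFFIXES = (".termlex", ".doclex")


def expected_pisa_files(basename, generate_lexicons=False):
    """Return the paths a conversion is expected to create for `basename`."""
    suffixes = CANONICAL_SUFFIXES + (LEXICON_SUFFIXES if generate_lexicons else ())
    # Append the suffix to the name as-is, so a basename containing dots stays intact.
    return [Path(f"{basename}{suffix}") for suffix in suffixes]


def missing_pisa_files(basename, generate_lexicons=False):
    """Return the expected output files that do not exist on disk."""
    return [p for p in expected_pisa_files(basename, generate_lexicons) if not p.exists()]
```

Under these assumptions, `missing_pisa_files('my-pisa-files')` should return an empty list once the example above has completed.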
Converting PISA to CIFF:
pyciff.pisa_to_ciff(collection_input, terms_input, titles_input, output, description)
- `collection_input` is the basename of the (canonical PISA) collection.
- `terms_input` is a newline-delimited file containing a single term per line (the first line is the 0-th postings list).
- `titles_input` is a newline-delimited file containing a single document identifier per line (the first line is the 0-th document identifier).
- `output` is the name of the CIFF file to output.
- `description` is stored inside the CIFF blob, and can be used to describe the collection, parsing, etc.
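The two newline-delimited inputs can be produced with a few lines of Python. The following is a minimal sketch, assuming UTF-8 text files. `write_lines` and `read_lines` are hypothetical helpers, not part of pyciff.

```python
from pathlib import Path


def write_lines(path, items):
    """Write one item per line; line i (0-based) holds the i-th entry."""
    items = list(items)
    for item in items:
        # An embedded line break would shift every following identifier by one line.
        if "\n" in item or "\r" in item:
            raise ValueError(f"item contains a line break: {item!r}")
    Path(path).write_text("".join(f"{item}\n" for item in items), encoding="utf-8")


def read_lines(path):
    """Read the file back into a list, so index i gives the i-th entry."""
    lines = Path(path).read_text(encoding="utf-8").split("\n")
    # Drop the empty string left behind by the trailing newline.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Reading the file back with `read_lines` gives a list whose index matches the postings-list or document number, which makes it easy to spot an off-by-one before running the conversion.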
Example using the example files created previously:
# Still working in `tests` directory
$> python3 -c "import pyciff; pyciff.pisa_to_ciff('my-pisa-files', 'my-pisa-files.terms', 'my-pisa-files.documents', 'my-new.ciff', 'My example description')"
Collecting posting lists statistics
[00:00:00] [========================================] / (0s)
Computing average document length
[00:00:00] [========================================] / (0s)
Writing postings
[00:00:00] [========================================] / (0s)
$> ls my-new.ciff
my-new.ciff
Deployment
To upload to PyPI:
docker run --rm -v $(pwd):/io konstin2/maturin publish -u USER -p PASSWORD
To upload to Test PyPI:
docker run --rm -v $(pwd):/io konstin2/maturin publish -r https://test.pypi.org/legacy/ -u USER -p PASSWORD
Download files
File details
Details for the file pyciff-0.2.0.tar.gz.
File metadata
- Download URL: pyciff-0.2.0.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49a8d9d34eeef0dbedcd64d23c0376c2046e636fb340da500eac6ab65f9a018d |
| MD5 | 72745ba5267901c9f3af01db2342954a |
| BLAKE2b-256 | d239f884e3b9ea50d764c552cf0e37977124daed0b36a2717def71754c2d5a02 |
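A downloaded file can be checked against its published SHA256 digest using only the standard library. The following is a minimal sketch. `sha256_of` and `matches_digest` are hypothetical helper names, and the file is assumed to have been saved locally under its published name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, a call such as `matches_digest("pyciff-0.2.0.tar.gz", "49a8d9d34eeef0dbedcd64d23c0376c2046e636fb340da500eac6ab65f9a018d")` should return True if the download is intact.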
File details
Details for the file pyciff-0.2.0-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.6 kB
- Tags: PyPy, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7aecb54fd6fd5b00c16d553a0a4202aa45b12c98cfc40ced8f65643fb65c0f9 |
| MD5 | b93b6c2f53c1deb5b2710ef2cff9644d |
| BLAKE2b-256 | 0687c9bc9e0f8a04d3899af6375d371b1cc04cb788974b1c973019aeca2e7d35 |
File details
Details for the file pyciff-0.2.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.4 kB
- Tags: CPython 3.10, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d802dc5513aefb015a0781bd8ef91325f6ff1195527cd74654566383a0d111e6 |
| MD5 | aa07a037641aa31f4f57e530b95033a3 |
| BLAKE2b-256 | 2340b33908d2ca4bdc04da9ef3e5ea39881c17c7fe57d9da2f4dc9fe4010fb07 |
File details
Details for the file pyciff-0.2.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.4 kB
- Tags: CPython 3.9, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b6f42c61dc37451d3308532c28dc0b6b3601954ebb39a52b9aac10c20dc5e25 |
| MD5 | 72623dfdb62557ae8369b4f00b75d8a6 |
| BLAKE2b-256 | 9a57767ecb6e033951125179b601feef3c6df6b25af42445c18a33b8aae768de |
File details
Details for the file pyciff-0.2.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.4 kB
- Tags: CPython 3.8, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41e4aaa619d075961a50774d2ae185556461a12d115103670b326fc5b90cd674 |
| MD5 | 597a40a0ef3bcf5064c29f8fbc832d99 |
| BLAKE2b-256 | 853a5080798eebec88e66b92ddb814caef919565566df2446abb3c7354418a11 |
File details
Details for the file pyciff-0.2.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.4 kB
- Tags: CPython 3.7m, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 842e6dc3f6dd44ec8af306085aa4aabf3522c45f2d5b6b25914d79000442214d |
| MD5 | 7b85db07cfac2b8ff99f0d8db9c19fc5 |
| BLAKE2b-256 | f0079b52ac2828ccfbf71bcedaed840b9d66d8a1f4acc3e3202eb013a329bdb2 |
File details
Details for the file pyciff-0.2.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl.
File metadata
- Download URL: pyciff-0.2.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl
- Upload date:
- Size: 449.5 kB
- Tags: CPython 3.6m, manylinux: glibc 2.5+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/0.12.10-beta.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2b3c63eb00bd36d2e5543d1840438795708764cc09497d8a2e3d3dd1ef6cc8c |
| MD5 | e71f646dde960f3092b2a61aa749a9ea |
| BLAKE2b-256 | 63c13822983e45b8f2fe20605ba5ea4f85155b12216372380ab5b50d95bb94dd |