PyCIFF
This package provides a Python interface to PISA's Common Index File Format (CIFF) import/export toolset, which is written in Rust.
Usage
Converting CIFF to PISA:
pyciff.ciff_to_pisa(input_file, output, generate_lexicons)
- input_file is the input CIFF file.
- output is the basename of the output PISA canonical files.
- generate_lexicons is a Boolean flag; if True, the .termlex and .doclex files will be created.
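
A small helper can check that a conversion produced the files one would expect. This is only a sketch: the five base suffixes are copied from the example listing in this README, the two lexicon suffixes come from the generate_lexicons description above, and the names expected_pisa_files and missing_pisa_files are hypothetical helpers, not part of pyciff.

```python
from pathlib import Path

# Suffixes seen in this README's example listing (conversion without lexicons).
CANONICAL_SUFFIXES = (".docs", ".documents", ".freqs", ".sizes", ".terms")
# Extra files the README says are created when generate_lexicons is True.
LEXICON_SUFFIXES = (".termlex", ".doclex")


def expected_pisa_files(output, generate_lexicons):
    """Return the paths a ciff_to_pisa call is expected to create (hypothetical helper)."""
    suffixes = CANONICAL_SUFFIXES + (LEXICON_SUFFIXES if generate_lexicons else ())
    return [Path(f"{output}{suffix}") for suffix in suffixes]


def missing_pisa_files(output, generate_lexicons):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_pisa_files(output, generate_lexicons) if not p.exists()]
```

After running pyciff.ciff_to_pisa('toy-complete-20200309.ciff', 'my-pisa-files', False), calling missing_pisa_files('my-pisa-files', False) should return an empty list if all five files are present.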
Example (using the toy CIFF file stored in this repo):
$> cd tests
$> python -c "import pyciff; pyciff.ciff_to_pisa('toy-complete-20200309.ciff', 'my-pisa-files', False)"
----- CIFF HEADER -----
Version: 1
No. Postings Lists: 9
Total Postings Lists: 9
No. Documents: 3
Total Documents: 3
Total Terms in Collection 16
Average Document Length: 5.333333333333333
Description: Export of toy 3-document collection from Anserini's io.anserini.integration.TrecEndToEndTest test case
-----------------------
Processing postings
[00:00:00] [========================================] / (0s)
Processing document lengths
[00:00:00] [========================================] / (0s)
$> ls my-pisa-files.*
my-pisa-files.docs my-pisa-files.documents my-pisa-files.freqs my-pisa-files.sizes my-pisa-files.terms
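
The header values printed above appear to be related by simple arithmetic: dividing the total number of terms by the number of documents gives the reported average document length. A quick check, assuming that is how the average is derived (the variable names are illustrative):

```python
# Values copied from the CIFF header printed in the example above.
total_terms_in_collection = 16
total_documents = 3

# Assumption: average document length = total terms / total documents.
average_document_length = total_terms_in_collection / total_documents
print(average_document_length)
```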
Converting PISA to CIFF:
pyciff.pisa_to_ciff(collection_input, terms_input, titles_input, output, description)
- collection_input is the basename of the (canonical PISA) collection.
- terms_input is a newline-delimited file containing a single term per line (the first line is the 0-th postings list).
- titles_input is a newline-delimited file containing a single document identifier per line (the first line is the 0-th document identifier).
- output is the name of the CIFF file to output.
- description is stored inside the CIFF blob, and can be used to describe the collection/parsing/etc.
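
If the terms or titles files have to be produced by hand, the newline-delimited layout described above can be sketched as follows. The helpers write_lines and read_lines, the file name example.terms, and the sample terms are all hypothetical; the only thing taken from this README is the one-entry-per-line layout, where line i (counting from 0) belongs to the i-th postings list or document.

```python
import tempfile
from pathlib import Path


def write_lines(path, items):
    """Write one item per line, so that line i (0-based) holds items[i]."""
    items = list(items)
    # Validate before touching the file: an embedded line break would shift every later index.
    for item in items:
        if "\n" in item or "\r" in item:
            raise ValueError(f"item contains a line break: {item!r}")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for item in items:
            f.write(item + "\n")


def read_lines(path):
    """Read the file back into a list; index i corresponds to line i."""
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


# Demo with made-up terms, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    terms_path = Path(tmp) / "example.terms"
    write_lines(terms_path, ["apple", "banana", "cherry"])
    terms = read_lines(terms_path)

print(terms[0])  # the term for the 0-th postings list
```

Rejecting embedded line breaks up front keeps the line number and the list index in step, which is what the format relies on.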
Example using the example files created previously:
# Still working in `tests` directory
$> python3 -c "import pyciff; pyciff.pisa_to_ciff('my-pisa-files', 'my-pisa-files.terms', 'my-pisa-files.documents', 'my-new.ciff', 'My example description')"
Collecting posting lists statistics
[00:00:00] [========================================] / (0s)
Computing average document length
[00:00:00] [========================================] / (0s)
Writing postings
[00:00:00] [========================================] / (0s)
$> ls my-new.ciff
my-new.ciff
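
The example above passes the .terms and .documents files that ciff_to_pisa wrote next to the collection. A minimal sketch of a wrapper that derives those two paths from the basename, assuming that naming holds (pisa_to_ciff_args and convert_back are hypothetical names, and the arguments are passed positionally exactly as in the example):

```python
def pisa_to_ciff_args(basename, output, description):
    """Build the positional arguments used in the README example (hypothetical helper)."""
    return (
        basename,                 # collection_input
        f"{basename}.terms",      # terms_input, as written by ciff_to_pisa in the example
        f"{basename}.documents",  # titles_input, as written by ciff_to_pisa in the example
        output,
        description,
    )


def convert_back(basename, output, description):
    """Call pyciff.pisa_to_ciff with the derived arguments."""
    import pyciff  # imported lazily so pisa_to_ciff_args works without the package installed

    pyciff.pisa_to_ciff(*pisa_to_ciff_args(basename, output, description))
```

With these helpers, convert_back('my-pisa-files', 'my-new.ciff', 'My example description') makes the same call as the one-liner shown above.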
Deployment
To upload to PyPI:
docker run --rm -v $(pwd):/io konstin2/maturin publish -u USER -p PASSWORD
To upload to Test PyPI:
docker run --rm -v $(pwd):/io konstin2/maturin publish -r https://test.pypi.org/legacy/ -u USER -p PASSWORD