PyCIFF

This package provides a Python interface to PISA's Common Index File Format (CIFF) import/export toolset, which is written in Rust.

Usage

Converting CIFF to PISA:

pyciff.ciff_to_pisa(input_file, output, generate_lexicons)
  • input_file is the input CIFF file.
  • output is the basename of the output PISA canonical files.
  • generate_lexicons is a Boolean flag; if True, the .termlex and .doclex files will be created.

Example (using the toy CIFF file stored in this repo):

$> cd tests

$> python -c "import pyciff; pyciff.ciff_to_pisa('toy-complete-20200309.ciff', 'my-pisa-files', False)"

----- CIFF HEADER -----
Version: 1
No. Postings Lists: 9
Total Postings Lists: 9
No. Documents: 3
Total Documents: 3
Total Terms in Collection 16
Average Document Length: 5.333333333333333
Description: Export of toy 3-document collection from Anserini's io.anserini.integration.TrecEndToEndTest test case
-----------------------
Processing postings
  [00:00:00] [========================================] / (0s)
Processing document lengths
  [00:00:00] [========================================] / (0s)

$> ls my-pisa-files.*
my-pisa-files.docs  my-pisa-files.documents  my-pisa-files.freqs  my-pisa-files.sizes  my-pisa-files.terms
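
The same check can be done from Python after a conversion. The following is a minimal sketch, not part of pyciff: `expected_pisa_files` and `missing_pisa_files` are hypothetical helper names, and the suffix lists are assumptions taken from the `ls` output above and from the description of `generate_lexicons`.

```python
from pathlib import Path

# Suffixes seen in the `ls` output above (assumed to be the complete set).
CANONICAL_SUFFIXES = ("docs", "documents", "freqs", "sizes", "terms")
# Extra files described for generate_lexicons=True.
LEXICON_SUFFIXES = ("termlex", "doclex")


def expected_pisa_files(basename, generate_lexicons=False):
    """Return the paths we expect ciff_to_pisa to have written for `basename`."""
    suffixes = CANONICAL_SUFFIXES
    if generate_lexicons:
        suffixes = suffixes + LEXICON_SUFFIXES
    return [Path(f"{basename}.{suffix}") for suffix in suffixes]


def missing_pisa_files(basename, generate_lexicons=False):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_pisa_files(basename, generate_lexicons) if not p.exists()]
```

After the conversion shown above, `missing_pisa_files('my-pisa-files')` should come back empty if all five files were written.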

Converting PISA to CIFF:

pyciff.pisa_to_ciff(collection_input, terms_input, titles_input, output, description)
  • collection_input is the basename of the (canonical PISA) collection.
  • terms_input is a newline delimited file containing a single term per line (the first line is the 0-th postings list).
  • titles_input is a newline delimited file containing a single document identifier per line (the first line is the 0-th document identifier).
  • output is the name of the CIFF file to output.
  • description is stored inside the CIFF blob, and can be used to describe the collection/parsing/etc.
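
If the terms or document identifiers live in Python lists rather than in files, they first have to be written out in the newline-delimited layout described above. The sketch below shows one way to do that; `write_lines` and `read_lines` are hypothetical helpers, not pyciff functions, and the file names in the trailing comment are illustrative only.

```python
def write_lines(path, items):
    """Write one item per line, so that line i (0-based) holds items[i]."""
    for item in items:
        # A line break inside an item would shift every later identifier.
        if "\n" in item or "\r" in item:
            raise ValueError(f"item contains a line break: {item!r}")
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(f"{item}\n")


def read_lines(path):
    """Read the file back into a list, dropping the trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical usage (file names are assumptions, not fixed by pyciff):
#   write_lines("my.terms", terms)    # terms[0] names the 0-th postings list
#   write_lines("my.titles", titles)  # titles[0] is the 0-th document identifier
#   pyciff.pisa_to_ciff("my-collection", "my.terms", "my.titles", "out.ciff", "description")
```

Rejecting embedded line breaks up front keeps the line number of each entry aligned with its postings list or document index.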

Example (using the PISA files created in the previous example):

# Still working in `tests` directory

$> python3 -c "import pyciff; pyciff.pisa_to_ciff('my-pisa-files', 'my-pisa-files.terms', 'my-pisa-files.documents', 'my-new.ciff', 'My example description')"

Collecting posting lists statistics
  [00:00:00] [========================================] / (0s)
Computing average document length
  [00:00:00] [========================================] / (0s)
Writing postings
  [00:00:00] [========================================] / (0s)

$> ls my-new.ciff
my-new.ciff
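
The two one-liners above can also be combined into a single round-trip script. This is a minimal sketch under stated assumptions: `round_trip` is a hypothetical wrapper, and it assumes (as in the examples above) that the `.terms` and `.documents` files written by `ciff_to_pisa` can be passed directly as `terms_input` and `titles_input`. The two converter arguments exist only so the wrapper can be exercised without pyciff installed; by default it uses the real pyciff functions.

```python
def round_trip(ciff_in, basename, ciff_out, description,
               ciff_to_pisa=None, pisa_to_ciff=None):
    """Convert CIFF -> PISA canonical files -> CIFF, mirroring the examples above."""
    if ciff_to_pisa is None or pisa_to_ciff is None:
        import pyciff  # imported lazily so stubs can be used without the package
        ciff_to_pisa = ciff_to_pisa or pyciff.ciff_to_pisa
        pisa_to_ciff = pisa_to_ciff or pyciff.pisa_to_ciff

    # Step 1: CIFF -> PISA, without lexicons (generate_lexicons=False).
    ciff_to_pisa(ciff_in, basename, False)

    # Step 2: PISA -> CIFF, reusing the term and document files from step 1.
    pisa_to_ciff(basename, f"{basename}.terms", f"{basename}.documents",
                 ciff_out, description)
```

Called from the `tests` directory as `round_trip('toy-complete-20200309.ciff', 'my-pisa-files', 'my-new.ciff', 'My example description')`, this issues the same two calls as the shell examples.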

Deployment

To upload to Test PyPI:

docker run --rm -v $(pwd):/io konstin2/maturin publish -r https://test.pypi.org/legacy/ -u USER -p PASSWORD

To upload to PyPI:

docker run --rm -v $(pwd):/io konstin2/maturin publish -u USER -p PASSWORD

Download files

Source Distribution

  • pyciff-0.2.0.tar.gz (8.3 kB): Source

Built Distributions

  • pyciff-0.2.0-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.6 kB): PyPy, manylinux: glibc 2.5+ x86-64
  • pyciff-0.2.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.4 kB): CPython 3.10, manylinux: glibc 2.5+ x86-64
  • pyciff-0.2.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.4 kB): CPython 3.9, manylinux: glibc 2.5+ x86-64
  • pyciff-0.2.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.4 kB): CPython 3.8, manylinux: glibc 2.5+ x86-64
  • pyciff-0.2.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.4 kB): CPython 3.7m, manylinux: glibc 2.5+ x86-64
  • pyciff-0.2.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (449.5 kB): CPython 3.6m, manylinux: glibc 2.5+ x86-64
