Python-first active learning engine backed by libhicalengine

hical

hical helps you review large document collections more efficiently. Instead of working through a corpus uniformly, you give it a few relevant examples and review feedback, and it keeps reranking the remaining documents so the next batch is more likely to matter.

Its primary use is helping reviewers find more relevant documents faster. It is also useful for building labeled datasets and evaluation or test collections without judging the entire corpus by hand.
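To make the reranking idea concrete, here is a toy sketch in plain Python. It is not hical's algorithm (this README does not describe the engine's model); it assumes a crude term-overlap score, and the names tokenize, score, and rerank are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def score(doc_text: str, profile: Counter) -> int:
    """Sum the profile weight of every distinct term in the document."""
    return sum(profile[term] for term in set(tokenize(doc_text)))


def rerank(unjudged: dict[str, str], profile: Counter) -> list[str]:
    """Return unjudged doc ids, highest score first (ties broken by id)."""
    return sorted(unjudged, key=lambda doc_id: (-score(unjudged[doc_id], profile), doc_id))


corpus = {
    "doc-1": "Oranges and groves across central Florida",
    "doc-2": "Shoreline cleanup and beach restoration projects",
    "doc-3": "Florida beach restoration after the storm",
}

# Start from a seed query, as a reviewer would.
profile = Counter(tokenize("florida oranges"))
ranking = rerank(corpus, profile)

# Feedback step: the reviewer judges doc-3 relevant, so its terms join the
# profile and the remaining documents are reranked against the new profile.
judged = "doc-3"
profile.update(tokenize(corpus.pop(judged)))
reranked = rerank(corpus, profile)
```

In this toy, doc-2 shares no terms with the seed, but after doc-3 is judged relevant it picks up weight from "beach" and "restoration". That is the general shape of feedback-driven reranking, whatever model the real engine uses.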

In practice, hical gives you a Python workflow to build a corpus dataset from raw documents or ir_datasets, open it, and run interactive review sessions.

The source lives in the CALEngine repository. The intended user-facing surface is the hical Python package.

Installation

Install the package from PyPI:

python -m pip install hical

If you want ir_datasets support:

python -m pip install "hical[datasets]"

If you are building from a source checkout, see docs/developer/development.md.
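After installing, one way to confirm which version is present is to ask the standard library's importlib.metadata. This is a minimal sketch; installed_version is a hypothetical helper name, not part of hical.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hical")
print("hical is not installed" if version is None else f"hical {version}")
```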

Quickstart

Create a tiny JSONL corpus:

cat > docs.jsonl <<'EOF'
{"id": "doc-1", "title": "Florida citrus", "body": "Oranges and groves across central Florida."}
{"id": "doc-2", "title": "Coastal cleanup", "body": "Shoreline cleanup and beach restoration projects."}
EOF
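If you would rather generate the corpus from Python than from a shell heredoc, the same file can be written with the standard json module. This is a sketch that reproduces the two sample records above; nothing in it is specific to hical.

```python
import json
from pathlib import Path

docs = [
    {"id": "doc-1", "title": "Florida citrus", "body": "Oranges and groves across central Florida."},
    {"id": "doc-2", "title": "Coastal cleanup", "body": "Shoreline cleanup and beach restoration projects."},
]

# JSONL means one JSON object per line, with no enclosing array.
path = Path("docs.jsonl")
with path.open("w", encoding="utf-8") as handle:
    for doc in docs:
        handle.write(json.dumps(doc, ensure_ascii=False) + "\n")

# Read the file back to confirm every line parses on its own.
lines = path.read_text(encoding="utf-8").splitlines()
parsed = [json.loads(line) for line in lines]
```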

Create a minimal config:

cat > corpus.yaml <<'EOF'
input:
  format: jsonl
  path: ./docs.jsonl
  doc_id_field: id
  text_fields:
    - title
    - body
output:
  path: ./docs.bin
  min_df: 1
  optimize_for_fast_load: true
EOF
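Before building, it can help to confirm that every record actually carries the fields named under doc_id_field and text_fields. The check below is a sketch in plain Python; check_records is a hypothetical helper, and whether hical-build-corpus performs similar validation itself is not stated here.

```python
import json


def check_records(lines, doc_id_field, text_fields):
    """Return a list of human-readable problems found in JSONL lines."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)
        doc_id = record.get(doc_id_field)
        if doc_id is None:
            problems.append(f"line {number}: missing {doc_id_field!r}")
        elif doc_id in seen_ids:
            problems.append(f"line {number}: duplicate id {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        for field in text_fields:
            if field not in record:
                problems.append(f"line {number}: missing text field {field!r}")
    return problems


# Two deliberately flawed records: a repeated id and a missing "body" field.
sample = [
    '{"id": "doc-1", "title": "Florida citrus", "body": "Oranges and groves."}',
    '{"id": "doc-1", "title": "Coastal cleanup"}',
]
problems = check_records(sample, doc_id_field="id", text_fields=["title", "body"])
```

An empty list means the records match the config's field names.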

Build the corpus:

hical-build-corpus --config corpus.yaml
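The same build can be driven from a script by calling the CLI through subprocess. The sketch below uses only the command and flag shown above; build_corpus_command and run_build are hypothetical wrapper names. The package also lists hical.build_corpus as a Python entry point, but its signature is not given in this README, so it is not used here.

```python
import shutil
import subprocess


def build_corpus_command(config_path: str) -> list[str]:
    """Build the argv list for the CLI call shown above."""
    return ["hical-build-corpus", "--config", config_path]


def run_build(config_path: str) -> int:
    """Run the build and return its exit code; raise if the CLI is not on PATH."""
    command = build_corpus_command(config_path)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH; is hical installed?")
    return subprocess.run(command, check=False).returncode
```

Calling run_build("corpus.yaml") from the directory that holds the config mirrors the shell command.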

Open the dataset and start reviewing:

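import hical

# Open the corpus file produced by hical-build-corpus.
dataset = hical.open_dataset("docs.bin")
# Seed the session with a short example of relevant text.
session = dataset.start_session(relevant_seeds=["florida oranges"], batch_size=2)

# Fetch the next batch of documents to review.
batch = session.next_batch(2)
for item in batch:
    print(item.doc_id, item.score)
    if item.original_text:
        print(item.original_text)

# Record a relevant judgment for the first document in the batch.
session.judge_relevant(batch[0])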

The normal flow is:

  1. build a .bin corpus file
  2. open it as a Dataset
  3. start a Session
  4. fetch documents and record judgments
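The last step is usually a loop rather than a single call. The sketch below is one possible shape for it, using only the two session methods shown in the Quickstart (next_batch and judge_relevant). The function name review and the is_relevant callback are hypothetical, and treating an empty batch as "nothing left" is an assumption. The Quickstart does not show how non-relevant judgments are recorded, so the sketch leaves them out; a real review loop would likely need them.

```python
def review(session, is_relevant, batch_size=2, max_batches=10):
    """Fetch batches and record relevant judgments until the session runs dry.

    `session` is anything with next_batch(n) and judge_relevant(item).
    `is_relevant` is a hypothetical callback standing in for the human reviewer.
    """
    relevant_ids = []
    for _ in range(max_batches):  # hard cap so the loop always terminates
        batch = session.next_batch(batch_size)
        if not batch:  # assumption: an empty batch means nothing is left
            break
        for item in batch:
            if is_relevant(item):
                session.judge_relevant(item)
                relevant_ids.append(item.doc_id)
    return relevant_ids
```

With a real session this might be called as review(session, is_relevant=lambda item: "florida" in (item.original_text or "").lower()).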

For the purpose and workflow at a higher level, see docs/overview.md.

Common Tasks

Build your own corpus

Use hical-build-corpus with JSONL, CSV, TSV, archive, or ir_datasets input. For working configs and sample inputs, see:

Use with ir_datasets

Build directly from a dataset id:

hical-build-ir-dataset --dataset-id cranfield --output ./cranfield.bin

Inspect fields first when the document type has multiple useful fields:

hical-build-ir-dataset --dataset-id beir/msmarco --list-fields

Then choose specific fields to combine:

hical-build-ir-dataset \
  --dataset-id beir/msmarco \
  --text-field title \
  --text-field text \
  --output ./msmarco.bin
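When the dataset id and fields come from a script, the argument list can be assembled programmatically. This sketch uses only the flags that appear in this section (--dataset-id, --list-fields, --text-field, --output); build_ir_dataset_command is a hypothetical helper, and any other options the CLI may accept are not covered.

```python
def build_ir_dataset_command(dataset_id, output=None, text_fields=(), list_fields=False):
    """Assemble argv using only the flags shown in this section."""
    command = ["hical-build-ir-dataset", "--dataset-id", dataset_id]
    if list_fields:
        # Inspection mode: list the available fields and stop there.
        command.append("--list-fields")
        return command
    for field in text_fields:
        # --text-field is repeated once per field, in the order given.
        command += ["--text-field", field]
    if output is not None:
        command += ["--output", output]
    return command


command = build_ir_dataset_command(
    "beir/msmarco", output="./msmarco.bin", text_fields=["title", "text"]
)
```

The resulting list can be passed to subprocess.run(command, check=True).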

For more, see docs/ir-datasets.md.

Use the Python API

The main public entry points are:

  • hical.build_corpus
  • hical.build_ir_dataset_corpus
  • hical.inspect_ir_dataset
  • hical.open_dataset
  • dataset.start_session
  • session.next_batch
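A quick way to check that an installed version exposes the module-level names in this list is to probe for them with hasattr. This is a sketch; missing_entry_points is a hypothetical helper, and it only tests that the names exist, not how they behave. The last two entries are methods on dataset and session objects, so they are not covered.

```python
import importlib

MODULE_LEVEL_ENTRY_POINTS = (
    "build_corpus",
    "build_ir_dataset_corpus",
    "inspect_ir_dataset",
    "open_dataset",
)


def missing_entry_points(module, names=MODULE_LEVEL_ENTRY_POINTS):
    """Return the names from `names` that `module` does not expose."""
    return [name for name in names if not hasattr(module, name)]


try:
    hical = importlib.import_module("hical")
except ImportError:
    print("hical is not installed")
else:
    print("missing:", missing_entry_points(hical) or "none")
```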

For the fuller dataset/session workflow, see docs/python-api.md.

Documentation

Supported Platforms

Published wheels are smoke-tested on:

  • Linux x86_64
  • macOS x86_64
  • macOS arm64
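To check the current machine against this list from Python, the standard platform module is enough. The sketch assumes the usual values reported by platform.system() and platform.machine() ("Darwin" for macOS, "arm64" for Apple silicon); is_smoke_tested is a hypothetical helper. A platform outside the set is not necessarily unsupported, only untested according to this list.

```python
import platform

# (system, machine) pairs matching the list above; names follow what
# platform.system() and platform.machine() typically report.
SMOKE_TESTED = {
    ("Linux", "x86_64"),
    ("Darwin", "x86_64"),
    ("Darwin", "arm64"),
}


def is_smoke_tested(system: str, machine: str) -> bool:
    """True if the (system, machine) pair is in the smoke-tested set."""
    return (system, machine) in SMOKE_TESTED


current = (platform.system(), platform.machine())
print(current, "smoke-tested" if is_smoke_tested(*current) else "not in the smoke-tested list")
```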

Contributing

If you want to work on the repository internals rather than just use the Python package, start here:

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

  • hical-0.2.1-cp313-cp313-manylinux_2_28_x86_64.whl (13.1 MB): CPython 3.13, manylinux (glibc 2.28+), x86-64
  • hical-0.2.1-cp313-cp313-macosx_14_0_x86_64.whl (15.2 MB): CPython 3.13, macOS 14.0+, x86-64
  • hical-0.2.1-cp313-cp313-macosx_14_0_arm64.whl (15.1 MB): CPython 3.13, macOS 14.0+, ARM64
  • hical-0.2.1-cp312-cp312-manylinux_2_28_x86_64.whl (13.1 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64
  • hical-0.2.1-cp312-cp312-macosx_14_0_x86_64.whl (15.2 MB): CPython 3.12, macOS 14.0+, x86-64
  • hical-0.2.1-cp312-cp312-macosx_14_0_arm64.whl (15.1 MB): CPython 3.12, macOS 14.0+, ARM64
  • hical-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl (13.1 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • hical-0.2.1-cp311-cp311-macosx_14_0_x86_64.whl (15.2 MB): CPython 3.11, macOS 14.0+, x86-64
  • hical-0.2.1-cp311-cp311-macosx_14_0_arm64.whl (15.1 MB): CPython 3.11, macOS 14.0+, ARM64
  • hical-0.2.1-cp310-cp310-manylinux_2_28_x86_64.whl (13.1 MB): CPython 3.10, manylinux (glibc 2.28+), x86-64
  • hical-0.2.1-cp310-cp310-macosx_14_0_x86_64.whl (15.2 MB): CPython 3.10, macOS 14.0+, x86-64
  • hical-0.2.1-cp310-cp310-macosx_14_0_arm64.whl (15.1 MB): CPython 3.10, macOS 14.0+, ARM64

File details

Details for the file hical-0.2.1-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 56c0085e2fc75022f3dd798dee296e2c2a3e9a02675c46099d92b6905ac0beca
MD5 2b85938675f2df44b1a71ae0d539cd8e
BLAKE2b-256 d59569ced280b39fc450aef351a5ac2564d58221db5db2e02ae659c94dfb35d1
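A downloaded wheel can be checked against its published SHA256 digest with the standard hashlib module. This is a minimal sketch; sha256_of and matches are hypothetical helper names, and the file path is whatever location you downloaded the wheel to.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with the published digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the wheel above, that would be matches("hical-0.2.1-cp313-cp313-manylinux_2_28_x86_64.whl", "56c0085e2fc75022f3dd798dee296e2c2a3e9a02675c46099d92b6905ac0beca"). pip can also enforce digests itself through its hash-checking mode (--require-hashes with a requirements file).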

Provenance

The following attestation bundles were made for hical-0.2.1-cp313-cp313-manylinux_2_28_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hical-0.2.1-cp313-cp313-macosx_14_0_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp313-cp313-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 4fe503895e1252ebd371ec4478a8572d7038b65ac727bdfb7dac1e065af907f6
MD5 a63f9f082338a95a1835d9c4a6f4b9d2
BLAKE2b-256 69cecb46a9735aa268324d0565274a99cdeabad7e70b9f48095007fe9f9cbe41

Provenance

The following attestation bundles were made for hical-0.2.1-cp313-cp313-macosx_14_0_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp313-cp313-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp313-cp313-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 c14dbe0e39e3a9fbc30a3156a0ea550506298828d6398a6c6057fbc05aa9aa6b
MD5 3d5a7783d6e3555214d8edd78423b1f5
BLAKE2b-256 247a5f4e792c111c932a711efd27d4eff25604260c663edf5bb99fa56ce8b1ef

Provenance

The following attestation bundles were made for hical-0.2.1-cp313-cp313-macosx_14_0_arm64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e478a26ea1ce12785287e37c3441a48603764cd510692be829c8e6995ef19dd0
MD5 db76de4f4fdc0691cd3e8a48f10ff335
BLAKE2b-256 ead1ed9566e78f521316af3ef7c7340e6bce90eb8a2486d4b22215e432971926

Provenance

The following attestation bundles were made for hical-0.2.1-cp312-cp312-manylinux_2_28_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp312-cp312-macosx_14_0_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp312-cp312-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 a660f76de56a922689b8951f2e632593791c97c5a2b9eb4356b693dec577951e
MD5 16ab84446f5fdcde9a36815f1e9f2dc6
BLAKE2b-256 21e0080cb9a8f8fd3e3f118882d043e0076929695e8c96f9cf5f5c071c21032d

Provenance

The following attestation bundles were made for hical-0.2.1-cp312-cp312-macosx_14_0_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp312-cp312-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 0f79b305a9aaafa72da5a78ede32ffa32f78df0b3b047606b644acfc4f712472
MD5 d93fd3f3a61bfc84997293ee8bf65b74
BLAKE2b-256 80f91878fc7be478d1a6aea369b7dabbae2f0b5661b5c7cbbc44b06b7b07f715

Provenance

The following attestation bundles were made for hical-0.2.1-cp312-cp312-macosx_14_0_arm64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1fa2a407c52622a70c195d49d7c58936ae9ff8e963077bdc6e405f6080c7495c
MD5 794e4c57a7c4883ba79411928f72aa59
BLAKE2b-256 b6a9ce06fcdb434afdc28c5681d552aadfc565421e85408d78b6c240c3ad6c6e

Provenance

The following attestation bundles were made for hical-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp311-cp311-macosx_14_0_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp311-cp311-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 dfcdf9f9edbac8cea4565e4acd98306931847fe2dcd7f3329536faced5d55959
MD5 d947b539104a31196614b4597ac5565f
BLAKE2b-256 28a88fe6e2da2732d755e330e19102b48e6f98b21ab9cd58f027f8f1b02f97e6

Provenance

The following attestation bundles were made for hical-0.2.1-cp311-cp311-macosx_14_0_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp311-cp311-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 d38297fe8fb618c18f396de0f42ac92fd98f9c212ce5b5de0bb7b090ce2a1d3b
MD5 5eed369e82dea6bd09da29f09e8975cf
BLAKE2b-256 350fff88ae0215558cf4bd91909988b7da6a66b968d62012d7dfc6643f3b0176

Provenance

The following attestation bundles were made for hical-0.2.1-cp311-cp311-macosx_14_0_arm64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 460c555bcc07a4d3ad104becba208a7b5e0c738a449706cf75563cf15fa0688d
MD5 e30e1a9cde6c4dd4e012826730712434
BLAKE2b-256 818f2801296015141d6d96b77df986ec78a8328873bbcd503b42f3abdf7ade63

Provenance

The following attestation bundles were made for hical-0.2.1-cp310-cp310-manylinux_2_28_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp310-cp310-macosx_14_0_x86_64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp310-cp310-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 c1d4bed16e09e209614b58d511bba64fb6bf76d840e5b69fa41d092bc4a3efa4
MD5 30863ada74563f6e762450ae52693f62
BLAKE2b-256 862dd1db614f01c69fd59590d84124220b0c966421b7dbc0b20b10499b37b3f7

Provenance

The following attestation bundles were made for hical-0.2.1-cp310-cp310-macosx_14_0_x86_64.whl:

Publisher: package.yml on gathera/CALEngine

File details

Details for the file hical-0.2.1-cp310-cp310-macosx_14_0_arm64.whl.

File metadata

File hashes

Hashes for hical-0.2.1-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 54ff96f945e025ccfda1138e4b34fb82877704cae3e43acbb8bbd281ca0e1624
MD5 0c9f4e10e57330af30ffc77239329a06
BLAKE2b-256 e35e74913419c0eec959d701ebfceb8e2d839fa3f54d73252b3587d9ff6a3c04

Provenance

The following attestation bundles were made for hical-0.2.1-cp310-cp310-macosx_14_0_arm64.whl:

Publisher: package.yml on gathera/CALEngine
