Loci Similes

LociSimiles is a Python package for finding intertextual links in Latin literature using pre-trained language models.

Basic Usage

# Import paths are assumed here; adjust them to the installed package layout
from locisimiles.document import Document
from locisimiles.pipeline import (
    ClassificationPipelineWithCandidategeneration,
    pretty_print,
)

# Load example query and source documents
query_doc = Document("../data/hieronymus_samples.csv")
source_doc = Document("../data/vergil_samples.csv")

# Load the pipeline with pre-trained models
pipeline = ClassificationPipelineWithCandidategeneration(
    classification_name="...",
    embedding_model_name="...",
    device="cpu",
)

# Run the pipeline with the query and source documents
results = pipeline.run(
    query=query_doc,    # Query document
    source=source_doc,  # Source document
    top_k=3             # Number of top similar candidates to classify
)

# Print the results in a readable form
pretty_print(results)

# Save results to CSV or JSON
pipeline.to_csv("results.csv")
pipeline.to_json("results.json")

Command-Line Interface

LociSimiles provides a command-line tool for running the pipeline directly from the terminal:

Basic Usage

locisimiles query.csv source.csv -o results.csv
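
The two input files are CSVs with the columns seg_id and text, as listed under Options. A minimal sketch for producing such files with Python's standard library follows. It assumes the CLI expects a header row and UTF-8 encoding; the helper name `write_segments`, the segment IDs, and the query text are invented for illustration and are not part of the package.

```python
import csv
from pathlib import Path


def write_segments(path, segments):
    """Write (seg_id, text) pairs to a CSV file with a header row."""
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["seg_id", "text"])  # column names from the Options list
        writer.writerows(segments)


# Illustrative data only: the IDs and the query sentence are made up.
query_segments = [
    ("q_001", "arma virumque cano, ut ait poeta"),
]
source_segments = [
    ("s_001", "Arma virumque cano, Troiae qui primus ab oris"),
    ("s_002", "Italiam fato profugus Laviniaque venit"),
]

write_segments("query.csv", query_segments)
write_segments("source.csv", source_segments)
```

The resulting query.csv and source.csv can then be passed to the command shown above.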

Advanced Usage

locisimiles query.csv source.csv -o results.csv \
  --classification-model julian-schelb/xlm-roberta-large-class-lat-intertext-v1 \
  --embedding-model julian-schelb/multilingual-e5-large-emb-lat-intertext-v1 \
  --top-k 20 \
  --threshold 0.7 \
  --device cuda \
  --verbose

Options

  • Input/Output:

    • query: Path to query document CSV file (columns: seg_id, text)
    • source: Path to source document CSV file (columns: seg_id, text)
    • -o, --output: Path to output CSV file for results (required)
  • Models:

    • --classification-model: HuggingFace model for classification (default: xlm-roberta-large-class-lat-intertext-v1)
    • --embedding-model: HuggingFace model for embeddings (default: multilingual-e5-large-emb-lat-intertext-v1)
  • Pipeline Parameters:

    • -k, --top-k: Number of top candidates to retrieve per query segment (default: 10)
    • -t, --threshold: Classification probability threshold for filtering results (default: 0.85)
  • Device:

    • --device: Choose auto, cuda, mps, or cpu (default: auto-detect)
  • Other:

    • -v, --verbose: Enable detailed progress output
    • -h, --help: Show help message
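
To make the interaction of --top-k and --threshold concrete, here is a conceptual sketch of a retrieve-then-classify flow: rank source segments by cosine similarity, keep the top k, then keep only candidates whose classification probability reaches the threshold. This is not the package's implementation. The vectors and probabilities are toy stand-ins for the embedding and classification models, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(query_vec, source_vecs, k):
    """Return the k (source_id, similarity) pairs with the highest similarity."""
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in source_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def filter_by_threshold(scored, threshold):
    """Keep (source_id, probability) pairs with probability >= threshold."""
    return [(sid, prob) for sid, prob in scored if prob >= threshold]


# Toy embeddings standing in for the embedding model's output.
source_vecs = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 1.0]}
query_vec = [1.0, 0.0]

# Stage 1: candidate retrieval (corresponds to --top-k).
candidates = top_k_candidates(query_vec, source_vecs, k=2)

# Stage 2: classification; these probabilities are made up for the example.
toy_probabilities = {"s1": 0.93, "s3": 0.40}
classified = [(sid, toy_probabilities[sid]) for sid, _ in candidates]

# Final filter (corresponds to --threshold).
matches = filter_by_threshold(classified, threshold=0.85)
print(matches)
```

In this picture, a larger k sends more candidates to the classifier and so costs more compute, while a higher threshold leaves fewer rows flagged as matches.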

Output Format

The CLI saves results to a CSV file with the following columns:

  • query_id: Query segment identifier
  • query_text: Query text content
  • source_id: Source segment identifier
  • source_text: Source text content
  • similarity: Cosine similarity score (0-1)
  • probability: Classification confidence (0-1)
  • above_threshold: "Yes" if probability ≥ threshold, otherwise "No"
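
A results file with these columns can be post-processed with the standard library. The sketch below assumes the file has a header row using exactly the column names listed above; `load_matches` is a hypothetical helper, not part of the package. It either keeps the rows the CLI flagged with "Yes" or re-filters by a different probability cutoff without rerunning the models.

```python
import csv


def load_matches(path, min_probability=None):
    """Read a results CSV and return the rows that count as matches.

    With min_probability=None, keep rows where above_threshold is "Yes".
    Otherwise keep rows whose probability is at least min_probability.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if min_probability is None:
        return [row for row in rows if row["above_threshold"] == "Yes"]
    return [row for row in rows if float(row["probability"]) >= min_probability]
```

For example, `load_matches("results.csv", min_probability=0.6)` would apply a looser cutoff than the one used at run time. Note that it can only recover rows that were written to the file in the first place.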

Optional Gradio GUI

Install the optional GUI extra to experiment with a minimal Gradio front end:

pip install "locisimiles[gui]"

Launch the interface from the command line:

locisimiles-gui
