
HonestRoles

HonestRoles is a deterministic, config-driven pipeline runtime for job data, built on Polars with explicit plugin manifests.

Start With the App

Use the HonestRoles app first: honestroles.com.

Choose Your Path

  • App users: start in the browser at honestroles.com
  • Developers and integrators: use the CLI/SDK sections below

Install (Developer)

$ python -m venv .venv
$ . .venv/bin/activate
$ python -m pip install --upgrade pip
$ pip install honestroles

5-Minute First Run (Developer)

From the repository root:

$ python examples/create_sample_dataset.py
$ honestroles run --pipeline-config examples/sample_pipeline.toml --plugins examples/sample_plugins.toml
$ ls -lh examples/jobs_scored.parquet

Expected CLI diagnostics include stage_rows, plugin_counts, and final_rows.
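The diagnostic names above suggest a quick sanity check after a run; a minimal sketch, assuming the diagnostics can be treated as a plain dict (the shape below is illustrative, not the actual CLI output format):

```python
# Illustrative sanity check over pipeline diagnostics.
# The dict shape is hypothetical; the real CLI prints
# stage_rows, plugin_counts, and final_rows in its own format.
diagnostics = {
    "stage_rows": {"ingest": 1200, "clean": 1150, "score": 1150},
    "plugin_counts": {"cleaners": 2, "scorers": 3},
    "final_rows": 1150,
}

def rows_reconcile(diag: dict) -> bool:
    """The final row count should match the last stage's row count."""
    last_stage_rows = list(diag["stage_rows"].values())[-1]
    return last_stage_rows == diag["final_rows"]

print(rows_reconcile(diagnostics))  # True for the sample above
```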

CLI

$ honestroles ingest sync --source greenhouse --source-ref stripe --quality-policy ingest_quality.toml --strict-quality --merge-policy updated_hash --retain-snapshots 30 --prune-inactive-days 90 --format table
$ honestroles ingest validate --source greenhouse --source-ref stripe --quality-policy ingest_quality.toml --strict-quality --format table
$ honestroles ingest sync-all --manifest ingest.toml --format table
$ honestroles recommend build-index --input-parquet dist/ingest/greenhouse/stripe/jobs.parquet --policy recommendation.toml --format table
$ honestroles recommend match --index-dir dist/recommend/index/<index_id> --candidate-json examples/candidate.json --top-k 25 --include-excluded --format table
$ honestroles recommend evaluate --index-dir dist/recommend/index/<index_id> --golden-set examples/recommend_golden_set.json --thresholds recommend_eval.toml --format table
$ honestroles recommend feedback add --profile-id jane_doe --job-id 12345 --event interviewed --format table
$ honestroles publish neondb migrate --database-url-env NEON_DATABASE_URL --schema honestroles_api --format table
$ honestroles publish neondb sync --database-url-env NEON_DATABASE_URL --schema honestroles_api --jobs-parquet dist/ingest/greenhouse/stripe/jobs.parquet --index-dir dist/recommend/index/<index_id> --sync-report dist/ingest/greenhouse/stripe/sync_report.json --require-quality-pass --format table
$ honestroles publish neondb verify --database-url-env NEON_DATABASE_URL --schema honestroles_api --format table
$ honestroles init --input-parquet data/jobs.parquet --pipeline-config pipeline.toml --plugins-manifest plugins.toml
$ honestroles doctor --pipeline-config pipeline.toml --plugins plugins.toml --format table
$ honestroles reliability check --pipeline-config pipeline.toml --plugins plugins.toml --strict --format table
$ honestroles run --pipeline-config pipeline.toml --plugins plugins.toml
$ honestroles plugins validate --manifest plugins.toml
$ honestroles config validate --pipeline pipeline.toml
$ honestroles report-quality --pipeline-config pipeline.toml
$ honestroles runs list --limit 10 --command ingest.sync --format table
$ honestroles scaffold-plugin --name my-plugin --output-dir .
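The --plugins manifest referenced above is a TOML file. Its real schema is defined by the project; as a loose illustration of what an explicit plugin manifest might declare (table and field names here are hypothetical, not the actual schema):

```toml
# Hypothetical plugin manifest sketch; the actual honestroles
# schema may use different tables and keys.
[[plugins]]
name = "strip-html"
stage = "clean"
entrypoint = "my_plugins.cleaners:strip_html"

[[plugins]]
name = "seniority-score"
stage = "score"
entrypoint = "my_plugins.scorers:seniority"
```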

Python API

from honestroles import (
    HonestRolesRuntime,
    build_retrieval_index,
    evaluate_relevance,
    match_jobs,
    migrate_neondb,
    publish_neondb_sync,
    record_feedback_event,
    summarize_feedback,
    sync_source,
    sync_sources_from_manifest,
    validate_ingestion_source,
    verify_neondb_contract,
)

ingest = sync_source(
    source="greenhouse",
    source_ref="stripe",
    quality_policy_file="ingest_quality.toml",
    strict_quality=False,
    merge_policy="updated_hash",
    retain_snapshots=30,
    prune_inactive_days=90,
)
print(ingest.rows_written, ingest.output_parquet)

validation = validate_ingestion_source(
    source="greenhouse",
    source_ref="stripe",
    quality_policy_file="ingest_quality.toml",
    strict_quality=True,
)
print(validation.report.status, validation.rows_evaluated)

batch = sync_sources_from_manifest(manifest_path="ingest.toml")
print(batch.status, batch.total_sources, batch.fail_count)

index = build_retrieval_index(
    input_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    policy_file="recommendation.toml",
)
matches = match_jobs(
    index_dir=index.index_dir,
    candidate_json="examples/candidate.json",
    top_k=25,
    include_excluded=True,
)
print(matches.status, len(matches.results))

evaluation = evaluate_relevance(
    index_dir=index.index_dir,
    golden_set="examples/recommend_golden_set.json",
    thresholds_file="recommend_eval.toml",
)
print(evaluation.status, evaluation.metrics)

record_feedback_event(profile_id="jane_doe", job_id="12345", event="interviewed")
print(summarize_feedback(profile_id="jane_doe").weights)

print(migrate_neondb(database_url_env="NEON_DATABASE_URL").status)
publish_result = publish_neondb_sync(
    database_url_env="NEON_DATABASE_URL",
    jobs_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    index_dir=index.index_dir,
    sync_report="dist/ingest/greenhouse/stripe/sync_report.json",
)
print(publish_result.batch_id, verify_neondb_contract(database_url_env="NEON_DATABASE_URL").status)

runtime = HonestRolesRuntime.from_configs(
    pipeline_config_path="pipeline.toml",
    plugin_manifest_path="plugins.toml",
)
result = runtime.run()

print(result.diagnostics)
print(result.dataset.to_polars().head())
print(result.application_plan[:3])
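Downstream code typically consumes match results by ranking them; a sketch of that step, assuming each result exposes a job id and a score (the dict shape here is hypothetical, not the actual objects returned by match_jobs):

```python
# Hypothetical shape for recommendation results; the real objects
# come from match_jobs() above and their attributes may differ.
matches = [
    {"job_id": "12345", "score": 0.91},
    {"job_id": "67890", "score": 0.84},
    {"job_id": "24680", "score": 0.87},
]

# Rank by score, descending, and keep the top two.
top = sorted(matches, key=lambda m: m["score"], reverse=True)[:2]
print([m["job_id"] for m in top])  # ['12345', '24680']
```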

Documentation

Development

$ pip install -e ".[dev,docs]"
$ pytest -q
$ pytest tests/docs -q
$ bash scripts/check_docs_refs.sh
# Optional live connector smoke (requires refs):
# HONESTROLES_SMOKE_GREENHOUSE_REF, HONESTROLES_SMOKE_LEVER_REF,
# HONESTROLES_SMOKE_ASHBY_REF, HONESTROLES_SMOKE_WORKABLE_REF
$ bash scripts/run_ingest_smoke.sh
# Optional Neon DB smoke (requires NEON_DATABASE_URL):
$ PYTHON_BIN=.venv/bin/python DATABASE_URL_ENV=NEON_DATABASE_URL SCHEMA=honestroles_api bash scripts/run_neondb_smoke.sh

For local profiling data, keep large parquet inputs under data/ and write generated artifacts under dist/ (both are ignored by git).
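That layout can be created up front; a minimal sketch (illustrative only; honestroles itself does not create these directories):

```python
import tempfile
from pathlib import Path

# Conventional local layout: raw inputs under data/, generated
# artifacts under dist/. Both directories are git-ignored.
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()                         # large parquet inputs
(root / "dist" / "ingest").mkdir(parents=True)  # generated artifacts
print(sorted(p.name for p in root.iterdir()))   # ['data', 'dist']
```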

Maintainer Notes

  • PyPI publishing is manual and token-based via bash scripts/publish_pypi.sh.
  • The script reads PYPI_API_KEY (or PYPI_API_TOKEN) from the environment or a local .env file.
  • The GitHub Release workflow is manual (workflow_dispatch) only.
  • Before publishing, run the deterministic coverage gate:
$ PYTHON_BIN=.venv/bin/python bash scripts/run_coverage.sh
  • Full maintainer runbook: docs/for-maintainers/release-and-pypi.md.

License

MIT
