
HonestRoles

Clean, filter, label, and rate job description data using heuristics and local LLMs.

HonestRoles is a Python package designed to transform raw job posting data into structured, scored, and searchable datasets. It provides a modular pipeline for normalization, high-performance filtering, and automated labeling using both traditional heuristics and local LLMs (Ollama).

Features

  • 🧹 Clean: HTML stripping, location normalization (city/region/country), salary parsing, and record deduplication.
  • 🔍 Filter: High-performance FilterChain with predicates for location, salary, skills, and keyword matching.
  • 🏷️ Label: Automated seniority detection, role categorization, and tech stack extraction.
  • ⭐️ Rate: Comprehensive job description scoring for completeness and quality.
  • 🤖 LLM Integration: plug local Ollama models (e.g., Llama 3) into labeling and rating for deep semantic analysis.

Installation

pip install honestroles

For development:

git clone https://github.com/hypertrial/honestroles.git
cd honestroles
pip install -e ".[dev]"

Quickstart

import honestroles as hr
from honestroles import schema

# Load raw job data (Parquet or DuckDB)
df = hr.read_parquet("jobs_current.parquet")

# 1. Clean and normalize data
df = hr.clean_jobs(df)

# 2. Apply complex filtering
chain = hr.FilterChain()
chain.add(hr.filter.by_location, regions=["California", "New York"])
chain.add(hr.filter.by_salary, min_salary=120_000, currency="USD")
chain.add(hr.filter.by_skills, required=["Python", "React"])
df = chain.apply(df)

# 3. Label roles (Heuristics + LLM)
df = hr.label_jobs(df, use_llm=True, model="llama3")

# 4. Rate job quality
df = hr.rate_jobs(df)

# Access data using schema constants
print(df[[schema.TITLE, schema.CITY, schema.COUNTRY]].head())

# Save structured results
hr.write_parquet(df, "jobs_scored.parquet")
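
The reader above targets Parquet. For DuckDB sources, one option is to pull a DataFrame out of DuckDB directly and feed it into the same pipeline; a minimal sketch using the duckdb Python package (the database path and table name here are assumptions):

import duckdb
import honestroles as hr

# Load from a DuckDB database instead of Parquet (hypothetical path/table);
# duckdb's .df() returns a pandas DataFrame the pipeline can consume.
con = duckdb.connect("jobs.duckdb")
df = con.execute("SELECT * FROM jobs_current").df()

df = hr.clean_jobs(df)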

Contract-First Flow

For raw source data, apply contract normalization and validation before any other processing:

import honestroles as hr

df = hr.read_parquet("jobs_current.parquet", validate=False)
df = hr.normalize_source_data_contract(df)
df = hr.validate_source_data_contract(df)

df = hr.clean_jobs(df)
df = hr.filter_jobs(df, remote_only=True)
df = hr.label_jobs(df, use_llm=False)
df = hr.rate_jobs(df, use_llm=False)
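
If the contract check fails, it is worth surfacing the error before the rest of the pipeline runs. A minimal guard, assuming validate_source_data_contract raises an exception on contract violations (the exact exception type is not specified here):

import honestroles as hr

df = hr.read_parquet("jobs_current.parquet", validate=False)
df = hr.normalize_source_data_contract(df)
try:
    df = hr.validate_source_data_contract(df)
except Exception as exc:  # assumed failure mode; see the contract docs below
    raise SystemExit(f"source data violates contract: {exc}")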

See /docs/quickstart_contract.md and /docs/source_data_contract_v1.md.

Documentation index: /docs/index.md.

Core Modules

Schema Constants

Always use honestroles.schema for consistent column referencing:

from honestroles import schema

# Available constants:
# schema.TITLE, schema.DESCRIPTION_TEXT, schema.COMPANY
# schema.CITY, schema.REGION, schema.COUNTRY
# schema.SALARY_MIN, schema.SALARY_MAX, etc.
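
For example, the constants drop straight into DataFrame indexing, so a renamed column only needs updating in one place (a brief sketch; df is assumed to be a pandas DataFrame, as in the Quickstart):

from honestroles import schema

# Build a compact summary view without hard-coding column strings
summary = df[[schema.TITLE, schema.COMPANY, schema.SALARY_MIN, schema.SALARY_MAX]]
top_paying = summary.sort_values(schema.SALARY_MAX, ascending=False).head(10)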

Filtering with FilterChain

The FilterChain allows you to compose multiple filtering rules efficiently:

import honestroles as hr
from honestroles import FilterChain, filter_jobs, schema

# Functional approach:
df = filter_jobs(df, remote_only=True, min_salary=100_000)

# Composable approach:
chain = FilterChain()
chain.add(hr.filter.by_keywords, include=["Engineer"], exclude=["Manager"])
chain.add(hr.filter.by_completeness, required_fields=[schema.DESCRIPTION_TEXT, schema.APPLY_URL])
filtered_df = chain.apply(df)
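
The built-in predicates are passed to chain.add as plain callables with bound keyword arguments. If custom predicates follow the same convention, writing one could look like this (the predicate signature, taking the DataFrame plus the bound kwargs and returning the filtered frame, is an assumption rather than documented API):

import honestroles as hr
from honestroles import schema

# Hypothetical predicate: keep rows whose title mentions any given technology
def by_title_tech(df, techs):
    pattern = "|".join(techs)
    return df[df[schema.TITLE].str.contains(pattern, case=False, na=False)]

chain = hr.FilterChain()
chain.add(by_title_tech, techs=["Rust", "Go"])
df = chain.apply(df)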

Local LLM Usage (Ollama)

Ensure Ollama is running locally:

ollama serve
ollama pull llama3

Then enable LLM-based labeling or quality rating:

df = hr.label_jobs(df, use_llm=True, model="llama3")
df = hr.rate_jobs(df, use_llm=True, model="llama3")
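
If the server is not reachable, LLM-backed calls will fail, so it can help to probe Ollama's default HTTP endpoint (localhost:11434, an Ollama default rather than an HonestRoles setting) and fall back to heuristics. A small sketch:

import urllib.request

import honestroles as hr

def ollama_available(url="http://localhost:11434"):
    # Ollama answers a plain GET on / with a 200 when the server is up
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

use_llm = ollama_available()
df = hr.label_jobs(df, use_llm=use_llm, model="llama3")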

Package Layout

src/honestroles/
├── clean/        # HTML stripping, normalization, and dedup
├── filter/       # Composed FilterChain and predicates
├── io/           # Parquet and DuckDB I/O with validation
├── label/        # Seniority, Category, and Tech Stack labeling
├── llm/          # Ollama client and prompt templates
├── rate/         # Completeness, Quality, and Composite ratings
└── schema.py     # Centralized column name constants

Testing

Run the test suite with pytest:

pytest

Stability

  • Changelog: /CHANGELOG.md
  • Performance guardrails: /docs/performance.md
  • Docs index: /docs/index.md
