Candidate Transformer

A production-grade Python framework for converting heterogeneous candidate information into unified canonical profiles.

Installation

# Standard installation
pip install candidate-transformer

# Development installation
git clone https://github.com/example/candidate-transformer.git
cd candidate-transformer
pip install -e ".[dev]"

Quick Start

Python API

import json

from candidate_transformer import CandidateTransformer

# Initialize the facade (loads default.json automatically if no config is provided)
transformer = CandidateTransformer()

# Load heterogeneous data sources, each through its named connector
with open("sample_data/recruiter.csv", encoding="utf-8") as f:
    transformer.load("recruiter_csv", f)

with open("sample_data/ats.json", encoding="utf-8") as f:
    transformer.load("ats_json", f)

# Execute the pipeline and export the projected JSON
output = transformer.export()
print(json.dumps(output, indent=2))

CLI Execution

Transform a single source or multiple heterogeneous sources sequentially:

candidate-transformer transform \
    --source recruiter_csv=sample_data/recruiter.csv \
    --source ats_json=sample_data/ats.json \
    --source resume_text=sample_data/resume.txt \
    --config configs/default.json

The CLI supports true multi-source merging. Each --source flag takes a connector=file_path pair and may be repeated; the CLI loads every source, deduplicates identities across them, and merges the results into canonical profiles.
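Each --source value is a connector=file_path pair, which maps naturally onto argparse's append action. The sketch below assumes the CLI is built on argparse; parse_source_arg and build_parser are hypothetical names for illustration, not the package's actual internals.

```python
import argparse
from pathlib import Path


def parse_source_arg(value: str) -> tuple[str, Path]:
    """Split one 'connector=file_path' value into (connector, path)."""
    # Split on the first '=' only, so a file path that contains '=' stays intact.
    connector, sep, path = value.partition("=")
    connector, path = connector.strip(), path.strip()
    if not sep or not connector or not path:
        # argparse turns a ValueError raised by a type= callable into a usage error.
        raise ValueError(f"expected connector=file_path, got {value!r}")
    return connector, Path(path)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the transform command shown above."""
    parser = argparse.ArgumentParser(prog="candidate-transformer")
    sub = parser.add_subparsers(dest="command", required=True)
    transform = sub.add_parser("transform")
    # action="append" collects every repeated --source flag into one list, in order.
    transform.add_argument("--source", action="append", type=parse_source_arg)
    transform.add_argument("--config")
    return parser
```

Calling build_parser().parse_args() on the command line above yields args.source as an ordered list of (connector, Path) tuples, ready to be handed to the matching connectors one by one.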

If you prefer configuration-driven workflows over CLI arguments, you can define sources in your JSON config:

{
  "sources": [
    { "connector": "recruiter_csv", "input": "sample_data/recruiter.csv" },
    { "connector": "ats_json", "input": "sample_data/ats.json" }
  ]
}

Note: If any --source arguments are provided, they override the sources array in your JSON configuration.
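The precedence rule in the note can be expressed as a small function. This is a sketch assuming that "override" means the CLI list replaces the config's sources array wholesale rather than being merged with it; resolve_sources is a hypothetical helper, not part of the documented API.

```python
import json
from typing import Any

# The configuration example from this README, as it might be read from disk.
CONFIG_TEXT = """
{
  "sources": [
    { "connector": "recruiter_csv", "input": "sample_data/recruiter.csv" },
    { "connector": "ats_json", "input": "sample_data/ats.json" }
  ]
}
"""


def resolve_sources(
    config: dict[str, Any], cli_sources: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Pick the sources to run, giving --source flags precedence over the config."""
    if cli_sources:
        # Assumed semantics: CLI pairs replace the config's array; nothing is merged.
        return list(cli_sources)
    return [(entry["connector"], entry["input"]) for entry in config.get("sources", [])]


config = json.loads(CONFIG_TEXT)
```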

Validate the projected output against the schema in your configuration:

candidate-transformer validate \
    --config configs/default.json \
    --input output.json

Configuration

The configuration controls how the internal CanonicalCandidate is reshaped (projected) into the final JSON output, and sets the priorities used to resolve conflicts between sources. See configs/default.json and configs/minimal.json for examples.
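This README does not document the layout of configs/default.json, so the following is a sketch under an assumed shape: a flat mapping from dotted output paths to dotted attribute paths on a simplified, hypothetical CanonicalCandidate. The get_path, set_path, and project helpers are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CanonicalCandidate:
    # Simplified, hypothetical stand-in; the package's real model is not shown here.
    name: Optional[str] = None
    emails: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path through attributes or dict keys; None if any step is missing."""
    current = obj
    for part in path.split("."):
        if current is None:
            return None
        if isinstance(current, dict):
            current = current.get(part)
        else:
            current = getattr(current, part, None)
    return current


def set_path(target: dict[str, Any], path: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def project(candidate: CanonicalCandidate, mapping: dict[str, str]) -> dict[str, Any]:
    """Reshape a candidate using an assumed {output_path: canonical_path} mapping."""
    output: dict[str, Any] = {}
    for out_path, src_path in mapping.items():
        set_path(output, out_path, get_path(candidate, src_path))
    return output
```

Keeping the projection as pure data (a mapping) rather than code is what lets a config file, rather than a code change, decide the output shape.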

Core Features

Entity Resolution

The framework merges candidate records deterministically using a strict priority cascade rather than fragile fuzzy matching or ML-based heuristics. Records are merged if they share any of the following, checked in priority order:

  1. Exact Phone Match
  2. Exact Email Match
  3. Exact Name Match (case-insensitive)
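One way to realize this cascade is sketched below, assuming records have already passed through the Normalizers so that exact comparison is meaningful. It treats the three rules as alternative match keys checked in the order listed and merges transitively with union-find: if A shares a phone with B and B shares a name with C, all three collapse into one identity. The README does not say whether a higher-priority conflict (for example, two different phone numbers) should block a lower-priority name match, so this sketch does not model that. Record, match_keys, and resolve are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical per-source record; every identity field is optional.
    source: str
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


def match_keys(record: Record) -> list[tuple[str, str]]:
    """Match keys in cascade order: exact phone, exact email, case-insensitive name."""
    keys: list[tuple[str, str]] = []
    if record.phone:
        keys.append(("phone", record.phone))
    if record.email:
        keys.append(("email", record.email))
    if record.name:
        keys.append(("name", record.name.casefold()))
    return keys


def resolve(records: list[Record]) -> list[list[Record]]:
    """Group records that share any match key, transitively, using union-find."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    first_seen: dict[tuple[str, str], int] = {}
    for idx, record in enumerate(records):
        for key in match_keys(record):
            if key in first_seen:
                # Union: hang this record's tree under the earlier record's root.
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    # Group by root; dict insertion order keeps the output deterministic.
    groups: dict[int, list[Record]] = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record)
    return list(groups.values())
```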

Confidence Scoring

Each resolved candidate receives a deterministic confidence score (0.0 to 1.0) based on:

  • Completeness: Evaluates the presence of a name, contact details, and experience or education.
  • Source Agreement: Rewards candidates that appear consistently across multiple sources (e.g., found in the ATS, a resume, and a recruiter CSV).
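A deterministic score combining both signals might look like the sketch below. The 0.6/0.4 weights, the three completeness checks, and the default of three expected sources (ATS, resume, CSV) are assumptions for illustration; ResolvedCandidate and the scoring functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed weights for illustration; the package's actual weighting is not documented here.
COMPLETENESS_WEIGHT = 0.6
AGREEMENT_WEIGHT = 0.4


@dataclass
class ResolvedCandidate:
    # Hypothetical merged view of one identity.
    name: Optional[str] = None
    emails: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)
    experience: list[str] = field(default_factory=list)
    education: list[str] = field(default_factory=list)
    sources: set[str] = field(default_factory=set)


def completeness(c: ResolvedCandidate) -> float:
    """Fraction of the three checks satisfied: name, contact, experience/education."""
    checks = [bool(c.name), bool(c.emails or c.phones), bool(c.experience or c.education)]
    return sum(checks) / len(checks)


def source_agreement(c: ResolvedCandidate, expected_sources: int = 3) -> float:
    """Distinct contributing sources, capped at expected_sources, scaled to [0, 1]."""
    if expected_sources <= 0:
        raise ValueError("expected_sources must be positive")
    return min(len(c.sources), expected_sources) / expected_sources


def confidence(c: ResolvedCandidate, expected_sources: int = 3) -> float:
    """Weighted sum of both signals; the weights sum to 1, so the score stays in [0, 1]."""
    score = COMPLETENESS_WEIGHT * completeness(c) + AGREEMENT_WEIGHT * source_agreement(
        c, expected_sources
    )
    return round(score, 4)
```

Because every input is a count or a presence check, the same candidate always receives the same score, which is what makes the result reproducible across runs.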

Output Validation

Projected JSON output is validated strictly against the schema defined in your configuration:

  • Strongly typed fields (string, number, string[])
  • Strict requirement enforcement
  • Deep nesting validation

Use candidate-transformer validate to check that projected output conforms to the configured schema before it reaches downstream systems.
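A minimal recursive validator covering those three properties is sketched below. The schema shape (a dict of field specs with a type, an optional required flag, and an object type with nested fields), the rejection of undeclared keys, and the function names are all assumptions; the real schema format is whatever your configuration defines.

```python
from typing import Any


def type_matches(expected: str, value: Any) -> bool:
    """Check a value against one of the schema type names (plus an assumed 'object')."""
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "string[]":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if expected == "object":
        return isinstance(value, dict)
    raise ValueError(f"unknown schema type: {expected!r}")


def validate(data: Any, fields: dict[str, dict[str, Any]], path: str = "") -> list[str]:
    """Return every violation found; an empty list means the data conforms."""
    if not isinstance(data, dict):
        return [f"{path or '<root>'}: expected object"]
    errors: list[str] = []
    for name, spec in fields.items():
        where = f"{path}.{name}" if path else name
        # A missing key and an explicit null are both treated as absent.
        if data.get(name) is None:
            if spec.get("required", False):
                errors.append(f"{where}: required field missing")
            continue
        value = data[name]
        if not type_matches(spec["type"], value):
            errors.append(f"{where}: expected {spec['type']}")
        elif spec["type"] == "object":
            # Recurse so nested objects are checked to any depth.
            errors.extend(validate(value, spec.get("fields", {}), where))
    # Assumed strictness: keys not declared in the schema are reported too.
    for name in data:
        if name not in fields:
            where = f"{path}.{name}" if path else name
            errors.append(f"{where}: unexpected field")
    return errors
```

Collecting all violations instead of stopping at the first one makes a single validation run enough to fix a malformed file.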

Testing & Quality

To run the test suite and quality checks:

pytest
ruff check .
black --check .
mypy src

Plugin Development

You can register new Connectors, Normalizers, or Strategies dynamically using the registries.

from candidate_transformer.connectors import connector_registry
from candidate_transformer.interfaces.connector import BaseConnector

@connector_registry("my_custom_source")
class CustomConnector(BaseConnector):
    # Implement the methods required by BaseConnector here.
    pass
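The decorator usage above suggests a registry object that is called with a name and returns a class decorator. The following is a hypothetical sketch of that pattern, not the package's implementation: Registry is an illustrative class, and BaseConnector here is a local stand-in for the real interface.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Maps names to classes; calling it with a name yields a class decorator."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: dict[str, type[T]] = {}

    def __call__(self, name: str) -> Callable[[type[T]], type[T]]:
        def decorator(cls: type[T]) -> type[T]:
            if name in self._entries:
                raise KeyError(f"{self.kind} {name!r} is already registered")
            self._entries[name] = cls
            return cls  # return the class unchanged so it remains usable directly

        return decorator

    def get(self, name: str) -> type[T]:
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind}: {name!r}") from None

    def names(self) -> list[str]:
        return sorted(self._entries)


class BaseConnector:
    """Local stand-in for candidate_transformer.interfaces.connector.BaseConnector."""


connector_registry: Registry[BaseConnector] = Registry("connector")


@connector_registry("my_custom_source")
class CustomConnector(BaseConnector):
    pass
```

Rejecting duplicate names is a design choice in this sketch: it turns a silent override of an existing connector into an immediate error at import time.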

Publishing to PyPI

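This package is configured through pyproject.toml and uses hatchling as its build backend. To build and upload a release (requires the build and twine packages):

pip install build twine
python -m build
twine upload dist/*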

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distribution

candidate_transformer-0.1.0-py3-none-any.whl (31.5 kB, Python 3)

File details

Details for the file candidate_transformer-0.1.0-py3-none-any.whl.

File hashes

Hashes for candidate_transformer-0.1.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        a912dc3106043a06ce326b010bfdd469a04af5a21cbe60a0b19a0b513afd76ba
MD5           ee4391ff9301cb51bc75dc6c1736b66d
BLAKE2b-256   303c445f3f2383181b118109b52ccedb68469bc74c7baec696a5d07d2fd91838
