Candidate Transformer
A production-grade Python framework for converting heterogeneous candidate information into unified canonical profiles.
Installation
# Standard installation
pip install candidate-transformer
# Development installation
git clone https://github.com/example/candidate-transformer.git
cd candidate-transformer
pip install -e ".[dev]"
Quick Start
Python API
from candidate_transformer import CandidateTransformer, PipelineConfig
import json
# Initialize the facade (loads default.json automatically if no config provided)
transformer = CandidateTransformer()
# Load heterogeneous data sources
with open('sample_data/recruiter.csv', 'r') as f:
    transformer.load('recruiter_csv', f)
with open('sample_data/ats.json', 'r') as f:
    transformer.load('ats_json', f)
# Execute pipeline and export JSON
output = transformer.export()
print(json.dumps(output, indent=2))
CLI Execution
Transform a single source or multiple heterogeneous sources sequentially:
candidate-transformer transform \
--source recruiter_csv=sample_data/recruiter.csv \
--source ats_json=sample_data/ats.json \
--source resume_text=sample_data/resume.txt \
--config configs/default.json
The CLI can merge several sources in a single run. Each --source takes a connector=file_path pair. The system loads every source provided, deduplicates identities, and merges the records into canonical profiles.
If you prefer configuration-driven workflows over CLI arguments, you can define sources in your JSON config:
{
"sources": [
{ "connector": "recruiter_csv", "input": "sample_data/recruiter.csv" },
{ "connector": "ats_json", "input": "sample_data/ats.json" }
]
}
Note: If any --source arguments are provided on the command line, they override the sources array in your JSON configuration.
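The precedence rule above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: `parse_source_spec` and `resolve_sources` are hypothetical helper names. The only details taken from this README are the connector=file_path pair format, the shape of the sources array, and the rule that CLI sources win over the config.

```python
from typing import Optional


def parse_source_spec(spec: str) -> dict:
    """Split a 'connector=file_path' pair into the config's source shape."""
    # partition() splits on the first '=', so paths containing '=' survive.
    connector, sep, path = spec.partition("=")
    if not sep or not connector or not path:
        raise ValueError(f"expected connector=file_path, got {spec!r}")
    return {"connector": connector, "input": path}


def resolve_sources(cli_specs: Optional[list], config: dict) -> list:
    """Return the CLI sources if any were given, else the config's sources."""
    if cli_specs:
        # Assumption: CLI sources replace the config array; they are not merged.
        return [parse_source_spec(s) for s in cli_specs]
    return list(config.get("sources", []))
```

Under this sketch, a run with no --source arguments falls back to whatever the config lists, and a run with at least one ignores the config's array entirely.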
Validate projected output:
candidate-transformer validate \
--config configs/default.json \
--input output.json
Configuration
Configurations control how the internal CanonicalCandidate is reshaped (projected) into the final JSON output and how conflicts are resolved by priority.
See configs/default.json and configs/minimal.json for examples.
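As a rough illustration of priority-based conflict resolution, the sketch below picks one value for a field when several sources disagree. It is an assumption about how such a rule could work, not a description of the shipped config format: `pick_by_priority` is a hypothetical helper, and the priority list of connector names is invented for the example.

```python
def pick_by_priority(candidates, priority):
    """Return the first non-empty value, walking connectors in priority order.

    candidates: mapping of connector name -> value seen in that source
    priority:   connector names, highest priority first (hypothetical config)
    """
    for connector in priority:
        value = candidates.get(connector)
        # Treat None, empty string, and empty list as "no value".
        if value not in (None, "", []):
            return value
    return None


email = pick_by_priority(
    {"resume_text": "a@example.com", "ats_json": "b@example.com"},
    ["ats_json", "recruiter_csv", "resume_text"],
)
```

Here `email` takes the ATS value because `ats_json` comes first in the priority list; if the ATS value were empty, the resume value would be used instead.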
Core Features
Entity Resolution
The framework deterministically merges candidate profiles based on a strict priority cascade, avoiding fragile fuzzy-matching or ML-based heuristics. Records are merged if they share:
- Exact Phone Match
- Exact Email Match
- Exact Name Match (case-insensitive)
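The matching rules above can be sketched as a simple cascade. This is a minimal sketch under stated assumptions, not the framework's resolver: the record field names (`phone`, `email`, `name`) and the helper names are hypothetical, and the check order phone, then email, then name simply follows the list order above.

```python
def _match_keys(record):
    """Build exact-match keys in cascade order: phone, email, then name."""
    keys = []
    if record.get("phone"):
        keys.append(("phone", record["phone"]))
    if record.get("email"):
        keys.append(("email", record["email"]))
    if record.get("name"):
        # Names are compared case-insensitively; phone and email are exact.
        keys.append(("name", record["name"].casefold()))
    return keys


def resolve_entities(records):
    """Group records that share a key with an earlier group."""
    index = {}   # (kind, value) -> group id
    groups = []  # group id -> list of records
    for record in records:
        keys = _match_keys(record)
        # The first key that hits the index decides the group.
        gid = next((index[k] for k in keys if k in index), None)
        if gid is None:
            gid = len(groups)
            groups.append([])
        groups[gid].append(record)
        for key in keys:
            index.setdefault(key, gid)
    return groups
```

Because each record joins the first group it matches, the result is deterministic for a given input order. A single pass like this does not fuse two groups that were already created separately and are only later linked by a third record; handling that case would need a union-find or a second pass.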
Confidence Scoring
Each resolved candidate receives a deterministic confidence score (0.0 to 1.0) based on:
- Completeness: Evaluates presence of Name, Contact, and Experience/Education.
- Source Agreement: Rewards candidates that appear consistently across multiple sources (e.g. found in ATS, Resume, and CSV).
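One way to combine the two factors above into a single score is a weighted average. The sketch below is illustrative only: the function name, the profile field names, and the 0.6/0.4 weighting are assumptions, and the framework's real formula may differ.

```python
def confidence_score(profile, sources_seen, sources_total, completeness_weight=0.6):
    """Blend completeness and source agreement into a score in [0.0, 1.0]."""
    if sources_total < 1 or not 0 <= sources_seen <= sources_total:
        raise ValueError("need sources_total >= 1 and 0 <= sources_seen <= sources_total")

    # Completeness: one point each for name, any contact, any experience/education.
    checks = [
        bool(profile.get("name")),
        bool(profile.get("email") or profile.get("phone")),
        bool(profile.get("experience") or profile.get("education")),
    ]
    completeness = sum(checks) / len(checks)

    # Source agreement: fraction of loaded sources the candidate appeared in.
    agreement = sources_seen / sources_total

    score = completeness_weight * completeness + (1 - completeness_weight) * agreement
    return min(1.0, max(0.0, score))
```

The score depends only on the profile contents and the source counts, so the same inputs always produce the same value.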
Output Validation
The projected JSON outputs are validated strictly against the dynamic schema defined in your configuration:
- Strongly typed fields (string, number, string[])
- Strict requirement enforcement
- Deep nesting validation
Use candidate-transformer validate to confirm that output conforms to the schema before it reaches downstream systems.
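The three checks listed above (typed fields, required fields, nested objects) can be sketched with a small recursive validator. The schema layout used here, with `type`, `required`, and `fields` keys, is a hypothetical shape chosen for the example and is not taken from configs/default.json; `validate_profile` is likewise an illustrative name.

```python
# Type checks for the three field types named in this README.
_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string[]": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
}


def validate_profile(data, schema, path=""):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    for name, rule in schema.items():
        where = f"{path}.{name}" if path else name
        value = data.get(name)
        if value is None:
            if rule.get("required", False):
                errors.append(f"{where}: missing required field")
            continue
        kind = rule["type"]
        if kind == "object":
            if isinstance(value, dict):
                # Recurse so nested errors carry a dotted path.
                errors.extend(validate_profile(value, rule.get("fields", {}), where))
            else:
                errors.append(f"{where}: expected object")
        elif kind not in _CHECKS:
            errors.append(f"{where}: unknown type {kind!r}")
        elif not _CHECKS[kind](value):
            errors.append(f"{where}: expected {kind}")
    return errors
```

Collecting every error instead of stopping at the first makes it easier to fix a projection config in one pass.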
Testing & Quality
To run the test suite and quality checks:
pytest
ruff check .
black --check .
mypy src
Plugin Development
You can register new Connectors, Normalizers, or Strategies dynamically using the registries.
from candidate_transformer.connectors import connector_registry
from candidate_transformer.interfaces.connector import BaseConnector
@connector_registry("my_custom_source")
class CustomConnector(BaseConnector):
    pass
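For readers unfamiliar with decorator-based registries, the sketch below shows one way such a registry could work. It is a standalone illustration: the `Registry` class and its `get` method are hypothetical, and the real `connector_registry` in this package may be implemented differently.

```python
class Registry:
    """Map string names to classes via a decorator call."""

    def __init__(self, kind):
        self.kind = kind
        self._items = {}

    def __call__(self, name):
        # Calling the registry with a name returns the actual class decorator.
        def decorator(cls):
            if name in self._items:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._items[name] = cls
            return cls  # return the class unchanged so it stays usable
        return decorator

    def get(self, name):
        try:
            return self._items[name]
        except KeyError:
            raise KeyError(f"no {self.kind} registered as {name!r}") from None


demo_registry = Registry("connector")


@demo_registry("my_custom_source")
class DemoConnector:
    pass
```

After the decorator runs at import time, `demo_registry.get("my_custom_source")` returns `DemoConnector`, which is how a name given on the command line could be mapped to a class.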
Publishing to PyPI
This package is configured with a modern pyproject.toml and hatchling.
python -m build
twine upload dist/*
File details
Details for the file candidate_transformer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: candidate_transformer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 31.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a912dc3106043a06ce326b010bfdd469a04af5a21cbe60a0b19a0b513afd76ba |
| MD5 | ee4391ff9301cb51bc75dc6c1736b66d |
| BLAKE2b-256 | 303c445f3f2383181b118109b52ccedb68469bc74c7baec696a5d07d2fd91838 |