A production-grade Python framework for converting heterogeneous candidate information into canonical profiles.

These details have not been verified by PyPI

Project links

Project description

Candidate Transformer

Python License CLI REPL REST API

An enterprise-grade, production-ready Python framework for canonical candidate normalization. Candidate Transformer processes heterogeneous candidate profiles (resumes, ATS exports, recruiter spreadsheets) into a unified, highly structured canonical candidate dataset.

The framework provides an intelligent entity resolution engine, deterministic deduplication, configurable merge strategies, and an advanced projection engine to serve multiple downstream consumers from a single source of truth. It includes both a production-ready batch CLI (candidate-transformer) and a professional interactive REPL workspace (ctsh).

Core Features
Architecture
Project Structure
Interactive Shell (ctsh)
Command-Line Interface (CLI)
Getting Started
Input Files
Runtime Configuration
JSON Server Export
Outputs
Command Reference

Core Features

Data Ingestion: Natively parses CSV, ATS JSON, and unstructured Resume text from dozens of heterogeneous sources simultaneously.
Canonical Candidate Model: A strongly typed, immutable intermediate schema for all candidates.
Deterministic UUIDv5 IDs: Generates stable, highly reproducible candidate IDs across pipeline runs based on intrinsic identity properties.
Entity Resolution: Deterministically merges candidate profiles using strict contact and identity heuristics.
Field Normalization: Automatically normalizes fields including ISO-3166 country codes, E.164 phone numbers, and lowercased emails.
Merge Engine: Employs priority resolution for scalars, union merges for lists, deep dictionary merges, and deduplicates work experiences, education history, and projects.
Confidence Scoring: Rigorously evaluates profile completeness and cross-source corroboration to assign confidence scores (0.0 - 1.0).
Provenance Tracking: Every single merged field inherently tracks the contributing connector, merge strategy, assigned confidence, and a precise timestamp. Skill-level tracking ensures you know exactly where every skill was discovered.
Configurable Projections: Schema adaptability without code changes. Serve different representations to downstream consumers dynamically.
Plugin Architecture: A scalable connector registry makes adding custom connectors trivial.
Persistent Workspaces: Local SQLite/JSON workspaces seamlessly preserve session states across restarts.
JSON Server: Launch a detached, non-blocking background REST API directly from the REPL to serve your canonical dataset.
Human-readable UI: Rich terminal table visualizations for in-depth candidate inspection.

Architecture

flowchart LR
    A[Data Sources] --> B[Connector Registry]
    B --> C[Connectors]
    C --> D[Raw Candidate Records]
    D --> E[Extraction & Normalization]
    E --> F[Entity Resolution]
    F --> G[Merge Engine]
    G --> H[Canonical Candidate Dataset]
    H --> I[Provenance]
    H --> J[Projection Engine]
    J --> K[CLI Output]
    J --> L[ctsh Workspace]
    L --> M[Runtime Config Updates]
    M --> G
    L --> N[Load Additional Sources]
    N --> D

Pipeline Lifecycle

Load Sources
      ↓
Connector Parsing
      ↓
Normalization
      ↓
Entity Resolution
      ↓
Conflict Resolution
      ↓
Canonical Dataset
      ↓
Projection Engine
      ↓
CLI / ctsh / JSON Server

Supported Connectors

Connector	Input Format
recruiter_csv	CSV
ats_json	JSON
resume_text	Plain Text

Project Structure

src/
    candidate_transformer/
        api/                  # Public Facades
        cli/                  # Command definitions and shell REPL
        config/               # Pipeline configurations
        connectors/           # CSV, JSON, Text Connectors and Registry
        domain/               # Core Canonical Models
        exceptions/           # Custom error types
        export/               # JSON Server and export pipeline
        interfaces/           # Base classes and protocols
        normalizers/          # Field normalization (e.g., E.164 phones)
        pipeline/             # Resolution, Normalization, extraction stages
        projection/           # Configurable JSON Projection Engine
        strategies/           # Conflict Resolution Strategies
        utils/                # Utilities
        validation/           # Output validation
configs/                      # Configuration and projection JSONs
sample_data/                  # Example inputs
tests/                        # Test suites

Interactive Shell (ctsh)

ctsh (Candidate Transformer SHell) is the interactive REPL environment bundled with Candidate Transformer. It provides a persistent workspace for loading heterogeneous candidate sources, building canonical datasets, exploring candidates, switching projections, exporting data, managing runtime configuration, and serving the transformed dataset through JSON Server.

It provides an isolated runtime environment for developers and data engineers to experiment with candidate data. With ctsh, you can:

Utilize persistent workspaces to save, reopen, and continue previous transformation sessions seamlessly.
Inspect candidates using beautifully formatted Rich UI tables or verbose JSON output.
Dynamically runtime load additional sources and trigger a runtime rebuild without restarting.
Instantly switch projections for varied output shapes (e.g., swapping to analytics).
Manage pipeline configuration.
Perform one-command HTTP export via JSON Server.
Achieve a pure no restart workflow for enterprise data engineering.

Command-Line Interface (CLI)

The batch CLI (candidate-transformer) is intended for automation, scripting, CI/CD pipelines, scheduled jobs, and headless execution.

It exposes core commands tailored for non-interactive pipelines:

transform: Executes the data ingestion, canonicalization, and mapping workflow. It applies any specified projections or pipeline configuration over the provided sources.
validate: Asserts that a generated JSON profile complies perfectly with a configuration schema.

Getting Started

Installation from GitHub (Development & Source)

git clone https://github.com/suryanandanbabbar/candidate-transformer.git
cd candidate-transformer
python -m venv venv
source venv/bin/activate
pip install -e .

Quick Start

Input Files

Users can place their source datasets anywhere on their local filesystem and reference those absolute or relative paths directly in the CLI or ctsh commands. There is no rigid directory structure required for input data.

Example locations:

sample_data/
├── recruiter.csv
├── ats.json
└── resume.txt

Start the interactive shell:
```
ctsh
```

Create or open a workspace:

ctsh> workspace new recruitment-q3
ctsh> workspace open recruitment-q3

Load the three sources from sample_data:

ctsh> load recruiter_csv sample_data/recruiter.csv
ctsh> load ats_json sample_data/ats.json
ctsh> load resume_text sample_data/resume.txt

Build the canonical dataset:
```
ctsh> build
```
Show the first candidate:
```
ctsh> show 0
```
Switch to the analytics projection:
```
ctsh> project analytics
```

Batch CLI Example

Run a one-shot batch transformation from the command line:

candidate-transformer transform \
  --source recruiter_csv=sample_data/recruiter.csv \
  --source ats_json=sample_data/ats.json \
  --source resume_text=sample_data/resume.txt

--source <connector>=<filepath> defines the parser and file.

Applying a specific projection:

candidate-transformer transform \
  --source recruiter_csv=sample_data/recruiter.csv \
  --projection configs/projections/recruiter.json

--projection <path> overrides the output shape without changing pipeline logic.

Command Reference

Data Ingestion

Command	Description
`load <connector> <file>`	Load a data source through a connector.
`sources`	List loaded sources.
`connectors`	List registered connectors.

Pipeline & Projections

Command	Description
`build`	Build the canonical dataset.
`project <projection_name> [index] [--json]`	Run or preview a projection.
`projections`	List available projections.
`stats`	Show dataset statistics.
`status`	Show current workspace status.

Candidate Inspection

Command	Description
`show <name\|id\|index> [--verbose] [--json]`	Inspect a candidate.

Runtime Configuration

Command	Description
`config show`	Display runtime configuration.
`config begin`	Start a configuration edit session.
`config set <key> <value>`	Modify a configuration value.
`config apply`	Apply pending configuration changes.

Persistence & Export

Command	Description
`save canonical <file>`	Persist the canonical dataset.
`loadcanonical <file>`	Load a saved canonical dataset.
`export <projection_name> <file>`	Export a projection to disk.
`server start`	Launch the background JSON Server.
`server status`	Show JSON Server status.
`server stop`	Stop the background JSON Server.

Workspace Management

Command	Description
`workspace new <name>`	Create a new workspace.
`workspace open <name>`	Open an existing workspace.
`workspace list`	List available workspaces.
`workspace delete <name>`	Delete a workspace.
`reset`	Clear the current workspace state.

Utility

Command	Description
`history`	Show command history.
`clear`	Clear the terminal screen.
`help`	Display command help.
`exit`	Exit ctsh.

JSON Server Export

You can export the canonical dataset and launch a non-blocking, persistent REST API directly from the workspace.

ctsh> server start

This will automatically:

Export the current dataset and analytics payload.
Launch a fully detached background json-server instance.
Expose the data seamlessly as HTTP endpoints.

Manage the server lifecycle natively within ctsh:

server status: View the running PID, port, and endpoint paths.
server stop: Gracefully terminate the background process.

Troubleshooting: If json-server fails to start, ensure it is installed globally via npm install -g json-server.

Runtime Configuration

The interactive ctsh workspace supports true runtime pipeline evolution.

Without restarting the application, users can:

Modify merge priorities, projection selection, and connector behaviors.
Edit configurations (e.g., toggling include_confidence or include_provenance) interactively.
Rebuild the canonical dataset using the updated inputs or settings.
Switch projections instantly to reformat data on the fly.

This mirrors enterprise data engineering workflows where datasets evolve iteratively.

Example

ctsh> config begin
ctsh> config set include_provenance false
ctsh> config set include_confidence false
ctsh> config apply
ctsh> build

Outputs

Candidate Transformer generates a highly structured ecosystem of output formats optimized for different consumption patterns.

Canonical Dataset

The definitive source of truth generated by the pipeline. Includes:

Unified and deeply merged profiles
Deterministic UUIDv5 IDs for stable tracing
Intelligently normalized contacts (E.164 phones, standard emails)
Granular, field-level provenance tracking and conflict resolution audit trails

Built-in Projections

The canonical dataset can be dynamically shaped into distinct schemas:

Minimal: Essential identity fields (Name, Contact).
Recruiter: Tailored with extensive fields for recruitment evaluations.
ATS: Strict schema outputs compatible with Applicant Tracking Systems.
Analytics: Flattened records optimized for downstream data warehouses and BI tools.

Interactive Outputs

Within the ctsh REPL, output is designed for human insight:

Rich Tables: Beautifully formatted terminal UI for pipeline status and statistics.
JSON: Raw, machine-readable output dumps (--json).
Verbose Inspection: Detailed provenance table inspection with localized timestamps.
Statistics: Granular metrics on pipeline performance, dataset quality, and average confidence.

REST API Outputs

After launching server start, data is served securely over HTTP based on available projections. Every projection placed inside configs/projections/ is automatically published as an HTTP endpoint.

For example, if you have:

configs/projections/
analytics.json
minimal.json
crm.json
ats.json

It automatically becomes:

GET /analytics
GET /minimal
GET /crm
GET /ats

In addition, the following built-in endpoints are always available:

GET /canonical (or GET /candidates as an alias)
GET /metadata

Persisted Outputs

Exported projection JSON files
Saved standalone Canonical Dataset snapshot binaries
Persisted SQLite/JSON workspace states across sessions

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.1

Jun 30, 2026

This version

1.0.0

Jun 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

profilefusion-1.0.0.tar.gz (57.4 kB view details)

Uploaded Jun 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

profilefusion-1.0.0-py3-none-any.whl (60.0 kB view details)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file profilefusion-1.0.0.tar.gz.

File metadata

Download URL: profilefusion-1.0.0.tar.gz
Upload date: Jun 30, 2026
Size: 57.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for profilefusion-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`d5e1878cd43e453e92c6d25d27f3920a760dcb6aa826b45da00d36e152a2c7eb`
MD5	`e9f241fc10868c01085e308be862cfa3`
BLAKE2b-256	`4dd29daf2768e519f86729f49fb9a4bb710c51c55661cd941af0881eafca763e`

See more details on using hashes here.

File details

Details for the file profilefusion-1.0.0-py3-none-any.whl.

File metadata

Download URL: profilefusion-1.0.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 60.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for profilefusion-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b04990a962b767a5269cb5ecdbf08ca95f96ded7f0d7a9233e9b9b60797cb663`
MD5	`5f2f506a5a2a1b1a76538635997d8c77`
BLAKE2b-256	`f51fc7f1c8fe1b042019942eba618af5b7919b32bde91553eec67515b74cfc51`

See more details on using hashes here.

profilefusion 1.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Candidate Transformer

Table of Contents

Core Features

Architecture

Pipeline Lifecycle

Supported Connectors

Project Structure

Interactive Shell (ctsh)

Command-Line Interface (CLI)

Getting Started

Installation from GitHub (Development & Source)

Quick Start

Input Files

Batch CLI Example

Command Reference

Data Ingestion

Pipeline & Projections

Candidate Inspection

Runtime Configuration

Persistence & Export

Workspace Management

Utility

JSON Server Export

Runtime Configuration

Example

Outputs

Canonical Dataset

Built-in Projections

Interactive Outputs

REST API Outputs

Persisted Outputs

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes