Smart Recruitment Utility Library — job matching, resume parsing, validation & response formatting

☁️ CloudHire AI Utils

Smart Recruitment Utility Library for the CloudHire Platform

A Python library that packages the recruitment logic behind the CloudHire platform: job matching (weighted, TF-IDF, and Jaccard similarity scoring), resume skill extraction, skill normalisation, resume-to-job matching, job validation, and API response formatting.


🚀 Why This Library is Important

♻️ Reusability

This library is completely independent of the main CloudHire application. It can be installed and used in any Python project that deals with recruitment, job portals, HR systems, or career platforms. Write once, use everywhere.

📈 Scalability

Each module follows the Single Responsibility Principle — it does one thing well. As the platform grows, you can extend individual modules (e.g., add new scoring algorithms to the recommender) without touching unrelated code.

🏗️ Clean Architecture

The library demonstrates separation of concerns — business logic (matching, parsing, validation) lives in the library, while the web application layer (routes, controllers) stays in the main app. This makes the codebase easier to test, debug, and maintain.

🧠 Intelligence

Beyond simple CRUD operations, this library provides scoring logic borrowed from information retrieval: TF-IDF scoring, Jaccard similarity, and weighted recommendations.

📦 Standards-Compliant

Published as a proper pip-installable package following Python packaging standards (setup.py, pyproject.toml, semantic versioning). Production-ready with built-in logging across all modules.


📦 Installation

From PyPI (after publishing)

pip install cloudhire-ai-utils

From source (local development)

git clone https://github.com/venkatsai/cloudhire-ai-utils.git
cd cloudhire-ai-utils
pip install -e .

🏗️ Library Structure

cloudhire-ai-utils/
├── cloudhire_ai_utils/
│   ├── __init__.py          # Package entry point (exports all classes)
│   ├── recommender.py       # 🔥 Job Recommendation Engine (weighted + TF-IDF + similarity)
│   ├── parser.py            # 📝 Resume Skill Extractor
│   ├── normalizer.py        # 🔄 Skill Normalizer (synonym mapping)
│   ├── matcher.py           # 🎯 Resume-to-Job Matcher (high-level pipeline)
│   ├── validator.py         # ✅ Job Validation Module
│   └── formatter.py         # 📦 API Response Formatter
├── setup.py
├── pyproject.toml
├── LICENSE
├── test_library.py
└── README.md

⚙️ Modules & Usage

1️⃣ Job Recommendation Engine (recommender.py) 🔥

Matches candidate skills against job listings using 3 scoring strategies:

  • Weighted scoring — custom skill weights for prioritisation
  • TF-IDF scoring — term frequency-inverse document frequency
  • Jaccard similarity — set-based similarity coefficient

from cloudhire_ai_utils import JobRecommender

recommender = JobRecommender(skill_weights={"python": 3, "aws": 2})

jobs = [
    {"id": 1, "title": "Backend Developer", "skills": ["python", "sql", "aws"]},
    {"id": 2, "title": "Frontend Developer", "skills": ["react", "css", "javascript"]},
    {"id": 3, "title": "Data Scientist", "skills": ["python", "machine learning", "pandas"]},
]

# Weighted scoring (default)
results = recommender.recommend_jobs(["python", "aws"], jobs)
for job, score in results:
    print(f"  {job['title']} — Score: {score}")

# Jaccard Similarity scoring
sim_results = recommender.recommend_by_similarity(["python", "aws"], jobs)
for job, sim in sim_results:
    print(f"  {job['title']} — Similarity: {sim}")

# TF-IDF scoring
tfidf_results = recommender.recommend_by_tfidf(["python", "aws"], jobs)
for job, score in tfidf_results:
    print(f"  {job['title']} — TF-IDF: {score}")

# Detailed recommendation (includes all scores)
detailed = recommender.recommend_jobs_detailed(["python", "aws"], jobs)
for r in detailed:
    print(f"  {r['job']['title']}")
    print(f"    Match: {r['match_pct']}% | Similarity: {r['similarity']} | TF-IDF: {r['tfidf_score']}")
    print(f"    Matched: {r['matched_skills']}")
    print(f"    Missing: {r['missing_skills']}")
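
To make the three strategies concrete, here is a minimal, standard-library sketch of how each score could be computed for a single job. It is not the library's implementation: the function names, the default weight of 1.0, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Iterable, List


def weighted_score(candidate: Iterable[str], job_skills: Iterable[str],
                   weights: Dict[str, float], default_weight: float = 1.0) -> float:
    """Sum the weights of the skills the candidate shares with the job."""
    shared = set(candidate) & set(job_skills)
    # Skills without an explicit weight fall back to default_weight (an assumption)
    return sum(weights.get(skill, default_weight) for skill in shared)


def jaccard_similarity(candidate: Iterable[str], job_skills: Iterable[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    a, b = set(candidate), set(job_skills)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_score(candidate: Iterable[str], job_skills: List[str],
                all_jobs: List[List[str]]) -> float:
    """Sum TF * IDF over the candidate's skills for one job posting."""
    if not job_skills or not all_jobs:
        return 0.0
    score = 0.0
    for skill in set(candidate):
        # Term frequency: share of this job's skill list taken up by the skill
        tf = job_skills.count(skill) / len(job_skills)
        # Document frequency: number of postings that mention the skill
        df = sum(1 for skills in all_jobs if skill in skills)
        # Smoothed IDF stays finite and positive even when df == len(all_jobs)
        idf = math.log((1 + len(all_jobs)) / (1 + df)) + 1
        score += tf * idf
    return score


job_skill_lists = [
    ["python", "sql", "aws"],
    ["react", "css", "javascript"],
    ["python", "machine learning", "pandas"],
]
candidate = ["python", "aws"]

print(weighted_score(candidate, job_skill_lists[0], {"python": 3, "aws": 2}))  # 5
print(jaccard_similarity(candidate, job_skill_lists[0]))  # 2 shared skills out of 3 distinct
print(tfidf_score(candidate, job_skill_lists[0], job_skill_lists))
```

In this sketch a skill that appears in fewer postings ("aws") contributes more to the TF-IDF score than one that appears in many ("python"), which is the usual motivation for TF-IDF over a plain overlap count.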

2️⃣ Skill Normalizer (normalizer.py) 🔄

Converts skill synonyms and abbreviations to canonical forms. Ensures "js" and "javascript" are treated as the same skill.

from cloudhire_ai_utils import SkillNormalizer

normalizer = SkillNormalizer()

# Single skill normalisation
print(normalizer.normalize("js"))        # → "javascript"
print(normalizer.normalize("py"))        # → "python"
print(normalizer.normalize("k8s"))       # → "kubernetes"
print(normalizer.normalize("React.js"))  # → "react"

# Normalise a list
skills = normalizer.normalize_list(["js", "py", "k8s", "node", "ML"])
print(skills)  # ['javascript', 'kubernetes', 'machine learning', 'nodejs', 'python']

# Add custom synonyms at runtime
normalizer.add_synonym("RoR", "ruby on rails")

# Get all aliases for a canonical name
print(normalizer.get_synonyms_for("javascript"))
# ['js', 'javascript']

60+ built-in synonym mappings covering programming languages, frameworks, cloud platforms, databases, DevOps tools, and more.
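
The idea behind synonym mapping can be sketched with a plain dictionary lookup. The alias table below is a small hypothetical sample, not the library's built-in mappings, and the lower-casing, de-duplication, and sorting behaviour are assumptions inferred from the example output above.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table for illustration; the real built-in mappings are not reproduced here
SYNONYMS: Dict[str, str] = {
    "js": "javascript",
    "py": "python",
    "k8s": "kubernetes",
    "react.js": "react",
    "node": "nodejs",
    "ml": "machine learning",
}


def normalize(skill: str, synonyms: Dict[str, str] = SYNONYMS) -> str:
    """Trim and lower-case the skill, then map known aliases to a canonical name."""
    key = skill.strip().lower()
    # Unknown skills pass through unchanged (apart from case and whitespace)
    return synonyms.get(key, key)


def normalize_list(skills: Iterable[str], synonyms: Dict[str, str] = SYNONYMS) -> List[str]:
    """Normalise every skill, drop duplicates, and return a sorted list."""
    return sorted({normalize(s, synonyms) for s in skills})


print(normalize_list(["js", "JavaScript", "py", "K8s"]))
# ['javascript', 'kubernetes', 'python']
```

Normalising both the candidate's skills and the job's skills with the same table is what lets a set-based comparison treat "js" and "javascript" as one skill.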


3️⃣ Resume Skill Extractor (parser.py) 📝

Extracts technical and soft skills from raw resume text.

from cloudhire_ai_utils import ResumeParser

parser = ResumeParser()

resume_text = """
John Doe — Software Engineer
Experienced in Python, AWS, and React development.
Strong background in SQL databases and Docker containerisation.
Excellent leadership and communication skills.
"""

# Extract skills
skills = parser.extract_skills(resume_text)
print("Skills found:", skills)

# Get full summary
summary = parser.extract_summary(resume_text)
print(f"Total skills: {summary['skill_count']}")
print(f"Word count: {summary['word_count']}")

# Skills with mention count
counts = parser.extract_skills_with_count(resume_text)
print(counts)  # {'aws': 1, 'python': 1, 'react': 1, ...}

# Custom skills database
custom_parser = ResumeParser(custom_skills=["flutter", "dart"])
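
One common way to implement this kind of extraction is a vocabulary scan with whole-word matching. The sketch below assumes that approach; the vocabulary, function names, and regular expression are illustrative and are not taken from the library's source.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical skills vocabulary for illustration only
KNOWN_SKILLS = ["python", "aws", "react", "sql", "docker", "leadership", "communication"]


def extract_skills_with_count(text: str,
                              vocabulary: Iterable[str] = KNOWN_SKILLS) -> Dict[str, int]:
    """Count whole-word, case-insensitive mentions of each known skill."""
    lowered = text.lower()
    counts: Dict[str, int] = {}
    for skill in vocabulary:
        # Lookarounds stop "sql" from matching inside "mysql" and also
        # cope with names that end in symbols, such as "c++" or "c#"
        pattern = r"(?<![\w+#])" + re.escape(skill.lower()) + r"(?![\w+#])"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[skill] = hits
    return counts


def extract_skills(text: str, vocabulary: Iterable[str] = KNOWN_SKILLS) -> List[str]:
    """Return the sorted list of distinct skills found in the text."""
    return sorted(extract_skills_with_count(text, vocabulary))


sample = """
Experienced in Python, AWS, and React development.
Strong background in SQL databases and Docker containerisation.
Excellent leadership and communication skills.
"""
print(extract_skills(sample))
```

A vocabulary scan only finds skills it already knows about, which is why a custom skills list (as in the `custom_skills` argument above) matters for niche technologies.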

4️⃣ Resume-to-Job Matcher (matcher.py) 🎯

High-level pipeline that combines Parser + Normalizer + Recommender in one call.

from cloudhire_ai_utils import ResumeJobMatcher

matcher = ResumeJobMatcher(skill_weights={"python": 3, "aws": 2})

resume = "Experienced in py, JS, AWS, and k8s development..."

jobs = [
    {"id": 1, "title": "Backend Dev", "skills": ["python", "aws", "docker"]},
    {"id": 2, "title": "Frontend Dev", "skills": ["react", "javascript"]},
]

# One-call matching (extracts → normalises → matches)
results = matcher.match(resume, jobs)
for r in results:
    print(f"  {r['job']['title']} — Score: {r['score']} | Match: {r['match_pct']}%")

# Analyse resume without matching
analysis = matcher.analyse_resume(resume)
print(f"  Raw skills: {analysis['raw_skills']}")
print(f"  Normalised: {analysis['normalised_skills']}")
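
The extract, normalise, match sequence can be sketched as a single function that takes the two earlier steps as callables. Everything here is an assumption made for illustration: the helper names, the toy tokenizer, the alias table, and the definition of `match_pct` as the share of the job's required skills that the resume covers.

```python
import re
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical alias table and tokenizer standing in for the normalizer and parser
ALIASES = {"py": "python", "js": "javascript", "k8s": "kubernetes"}


def simple_normalize(skill: str) -> str:
    key = skill.strip().lower()
    return ALIASES.get(key, key)


def simple_extract(text: str) -> List[str]:
    # Toy extraction: every lower-cased token is treated as a potential skill
    return re.findall(r"[a-z0-9+#.]+", text.lower())


def match_resume(resume_text: str, jobs: List[Dict[str, Any]],
                 extract: Callable[[str], Iterable[str]],
                 normalize: Callable[[str], str]) -> List[Dict[str, Any]]:
    """Extract skills, normalise both sides, then score each job by overlap."""
    candidate = {normalize(s) for s in extract(resume_text)}
    results = []
    for job in jobs:
        required = {normalize(s) for s in job["skills"]}
        matched = sorted(candidate & required)
        missing = sorted(required - candidate)
        # Percentage of the job's required skills found in the resume
        match_pct = round(100 * len(matched) / len(required), 1) if required else 0.0
        results.append({
            "job": job,
            "score": len(matched),
            "match_pct": match_pct,
            "matched_skills": matched,
            "missing_skills": missing,
        })
    # Best-scoring jobs first
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


jobs = [
    {"id": 1, "title": "Backend Dev", "skills": ["python", "aws", "docker"]},
    {"id": 2, "title": "Frontend Dev", "skills": ["react", "javascript"]},
]
resume = "Experienced in py, JS, AWS, and k8s development..."
for r in match_resume(resume, jobs, simple_extract, simple_normalize):
    print(r["job"]["title"], r["match_pct"], r["missing_skills"])
```

Passing the steps in as callables keeps the pipeline testable in isolation, which mirrors the composition idea described for `ResumeJobMatcher`.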

5️⃣ Job Validation Module (validator.py) ✅

Validates job posting data before saving.

from cloudhire_ai_utils import JobValidator
from cloudhire_ai_utils.validator import ValidationError

validator = JobValidator()

# Valid job
validator.validate({
    "title": "Python Developer",
    "skills": ["python", "django"],
    "salary_min": 40000,
    "salary_max": 60000,
    "status": "Active",
})  # Returns True

# Get errors without exception
errors = validator.get_errors({})
# ["'title' is required...", "'skills' is required..."]

# Add custom rule
validator.add_rule(
    lambda j: len(j.get("skills", [])) <= 10,
    "Maximum 10 skills allowed"
)

# Quick boolean check
print(validator.is_valid({"title": "Dev", "skills": ["python"]}))  # True
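
A rule-based validator of this shape can be sketched as a list of (check, message) pairs. The class below is a hypothetical stand-in, not the library's `JobValidator`: the class name, the exact error messages, and the salary-range rule are assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Job = Dict[str, Any]
Rule = Tuple[Callable[[Job], bool], str]


class ValidationError(Exception):
    """Raised when a job fails validation; keeps the individual messages."""

    def __init__(self, errors: List[str]):
        super().__init__("; ".join(errors))
        self.errors = errors


class SimpleJobValidator:
    def __init__(self) -> None:
        # Each rule returns True when the job passes the check
        self._rules: List[Rule] = [
            (lambda j: bool(str(j.get("title", "")).strip()), "'title' is required"),
            (lambda j: bool(j.get("skills")), "'skills' is required"),
            # Assumed rule: a minimum salary must not exceed the maximum
            (lambda j: j.get("salary_min", 0) <= j.get("salary_max", float("inf")),
             "'salary_min' must not exceed 'salary_max'"),
        ]

    def add_rule(self, check: Callable[[Job], bool], message: str) -> None:
        """Register an extra rule without modifying the class."""
        self._rules.append((check, message))

    def get_errors(self, job: Job) -> List[str]:
        return [message for check, message in self._rules if not check(job)]

    def is_valid(self, job: Job) -> bool:
        return not self.get_errors(job)

    def validate(self, job: Job) -> bool:
        errors = self.get_errors(job)
        if errors:
            raise ValidationError(errors)
        return True


validator = SimpleJobValidator()
validator.add_rule(lambda j: len(j.get("skills", [])) <= 10, "Maximum 10 skills allowed")
print(validator.get_errors({"title": "Dev", "skills": ["python"] * 11}))
# ['Maximum 10 skills allowed']
```

Keeping `get_errors()` as the single source of truth means `is_valid()` and `validate()` cannot drift apart as rules are added.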

6️⃣ API Response Formatter (formatter.py) 📦

Standardises all API responses across the platform.

from cloudhire_ai_utils import ResponseFormatter

# Success / Error
ResponseFormatter.success({"id": 1, "title": "Developer"})
ResponseFormatter.error("Job not found", code=404)

# Pagination
ResponseFormatter.paginated(data=[...], page=1, per_page=10, total=50)

# Shortcuts
ResponseFormatter.created({"id": 5})
ResponseFormatter.deleted(resource_id=5)
ResponseFormatter.not_found("Job")
ResponseFormatter.unauthorized()
ResponseFormatter.validation_error(["Title is required"])
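
The README does not show what the returned payloads look like, so the following is only a sketch of one plausible response envelope. The class name and every field name (`success`, `code`, `message`, `data`, `errors`, `pagination`) are assumptions and may differ from what `ResponseFormatter` actually returns.

```python
import math
from typing import Any, Dict, List, Optional


class SimpleResponseFormatter:
    """Hypothetical envelope builder; all keys are illustrative."""

    @staticmethod
    def success(data: Any = None, message: str = "Success", code: int = 200) -> Dict[str, Any]:
        return {"success": True, "code": code, "message": message, "data": data}

    @staticmethod
    def error(message: str, code: int = 400,
              errors: Optional[List[str]] = None) -> Dict[str, Any]:
        return {"success": False, "code": code, "message": message, "errors": errors or []}

    @staticmethod
    def paginated(data: List[Any], page: int, per_page: int, total: int) -> Dict[str, Any]:
        # Round up so a partially filled last page still counts as a page
        total_pages = math.ceil(total / per_page) if per_page > 0 else 0
        body = SimpleResponseFormatter.success(data)
        body["pagination"] = {
            "page": page,
            "per_page": per_page,
            "total": total,
            "total_pages": total_pages,
            "has_next": page < total_pages,
            "has_prev": page > 1,
        }
        return body

    @staticmethod
    def not_found(resource: str) -> Dict[str, Any]:
        return SimpleResponseFormatter.error(f"{resource} not found", code=404)


print(SimpleResponseFormatter.paginated(data=[], page=1, per_page=10, total=50)["pagination"])
print(SimpleResponseFormatter.not_found("Job"))
```

Building the shortcuts on top of `success()` and `error()` keeps every response in the same shape, so API clients only need to handle one envelope.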

📊 Logging (Production-Ready)

All modules include structured Python logging for production observability.

import logging

# Enable logging to see library activity
logging.basicConfig(level=logging.INFO)

# Now all cloudhire_ai_utils operations will log:
# INFO  | cloudhire_ai_utils.recommender | Found 3 matching jobs (min_score=1)
# INFO  | cloudhire_ai_utils.parser      | Extracted 7 skills from text (255 chars)
# WARNING | cloudhire_ai_utils.validator | Validation failed with 2 errors
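
If the root logger is too noisy at INFO, the library's output can be tuned separately using standard `logging` features. This sketch assumes the modules log under the `cloudhire_ai_utils` namespace, as the sample lines above suggest; the format string is only an approximation of that layout.

```python
import logging

# Keep the rest of the application at WARNING and choose a pipe-separated layout
logging.basicConfig(
    level=logging.WARNING,
    format="%(levelname)-7s | %(name)s | %(message)s",
)

# Assumed parent logger name, inferred from the sample log lines
library_logger = logging.getLogger("cloudhire_ai_utils")
library_logger.setLevel(logging.DEBUG)

# Child loggers such as "cloudhire_ai_utils.parser" inherit the parent's level
parser_logger = logging.getLogger("cloudhire_ai_utils.parser")
print(parser_logger.getEffectiveLevel() == logging.DEBUG)  # True
```

Because handlers attached to the root logger do not filter by the root logger's own level, records from the library still reach the console while other packages stay at WARNING.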

🔗 Integration with CloudHire Backend

Flask Example

from flask import Flask, request, jsonify
from cloudhire_ai_utils import (
    ResumeJobMatcher, JobValidator, ResponseFormatter, SkillNormalizer
)

app = Flask(__name__)
matcher = ResumeJobMatcher()
validator = JobValidator()
normalizer = SkillNormalizer()

# get_jobs_from_db() and save_to_db() are placeholders for your own data layer.

@app.route("/api/recommend", methods=["POST"])
def recommend():
    # silent=True yields None (not an exception) for a missing or invalid JSON body
    payload = request.get_json(silent=True) or {}
    resume_text = payload.get("resume", "")
    jobs = get_jobs_from_db()
    results = matcher.match(resume_text, jobs)
    return jsonify(ResponseFormatter.success(results))

@app.route("/api/parse-resume", methods=["POST"])
def parse_resume():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    analysis = matcher.analyse_resume(text)
    return jsonify(ResponseFormatter.success(analysis))

@app.route("/api/jobs", methods=["POST"])
def create_job():
    job_data = request.get_json(silent=True) or {}
    # Normalise skills before validating and saving
    job_data["skills"] = normalizer.normalize_list(job_data.get("skills", []))
    errors = validator.get_errors(job_data)
    if errors:
        return jsonify(ResponseFormatter.validation_error(errors)), 422
    saved = save_to_db(job_data)
    return jsonify(ResponseFormatter.created(saved)), 201

🧠 OOP Concepts Used

Concept                 Where Used
----------------------  ---------------------------------------------------------------
Encapsulation           Private methods (_normalise, _clean_text, _synonyms)
Abstraction             Simple public API hides complex logic
Inheritance             ValidationError extends Exception
Polymorphism            extract_skills() works on any string input
Composition             ResumeJobMatcher composes Parser + Normalizer + Recommender
Static Methods          ResponseFormatter uses @staticmethod
Facade Pattern          ResumeJobMatcher.match() — single entry point
Strategy Pattern        Multiple scoring algorithms in JobRecommender
Single Responsibility   Each module handles exactly one concern
Open/Closed             Custom rules via add_rule(), add_synonym() without modification

📋 Requirements

  • Python 3.8+
  • No external dependencies (pure Python!)

🚀 Build & Publish to PyPI

# Install build tools
pip install build twine

# Build the package
python -m build

# Upload to PyPI
twine upload dist/*

# Or upload to TestPyPI first
twine upload --repository testpypi dist/*

After publishing:

pip install cloudhire-ai-utils

📄 License

MIT License — see LICENSE for details.


👤 Author

Venkatsai — CloudHire Platform
