A command-line CV Analyzer tool that processes PDF resumes and performs analysis.
Writeup - Software Engineering Project - 22916 2025A
Writeup is a command-line interface (CLI) tool that analyzes CVs (curricula vitae) in PDF format. It lets users, particularly HR professionals, evaluate CVs against specific job descriptions. The tool provides a match score, highlights relevant skills, and generates a summary of the candidate's suitability for the role. Analysis can run on a single CV or in batch mode over all CVs in a directory. The project uses the Gemini API for text analysis and can write results as JSON or PDF, or display them in the terminal.
Authors
| Contributor | GitHub Profile |
|---|---|
| @dolev146 | GitHub |
| @shimshon21 | GitHub |
| @yakov103 | GitHub |
Phase 1: Requirements Engineering
Consult with an LLM to define ONE significant and interesting CV analysis feature.
prompt : "I am a software developer developing a terminal Python app that gets a CV (curriculum vitae) PDF file as input and a text prompt from the user, for example, "Does this CV match the iOS developer?" I want to consult with you about what could be a significant and interesting analysis feature I can implement in my software. Document the feature's requirements clearly. Include acceptance criteria. The users of our software will be Human resources trying to find the best job applicant by role, seniority and relevent skills."
Document the feature's requirements clearly.
Feature Overview
Feature Name
Role Matching & Competency Analysis
Description:
The Role Matching & Competency Analysis feature takes a PDF CV and a text prompt describing the job role (e.g., “iOS Developer”) or a specific job posting. It then analyzes the CV content for skills, experiences, and qualifications relevant to that role. The outputs are:
- Numerical score between 0 - 10
- Years of Experience
- Summary
- Relevant skills
- Other skills
- Pros
- Cons
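The output fields listed above map naturally onto a small data model. The following is a minimal sketch, not the project's actual `Feedback` class: the field names and the `feedback_from_json` helper are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Feedback:
    """Hypothetical container for one analysis result (field names are assumed)."""

    score: float                      # match score, expected to be in [0, 10]
    years_of_experience: float
    summary: str
    relevant_skills: list[str] = field(default_factory=list)
    other_skills: list[str] = field(default_factory=list)
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that violate the documented 0-10 score range.
        if not 0 <= self.score <= 10:
            raise ValueError(f"score must be between 0 and 10, got {self.score}")
        if self.years_of_experience < 0:
            raise ValueError("years_of_experience cannot be negative")


def feedback_from_json(raw: str) -> Feedback:
    """Build a Feedback from a JSON object that uses the field names above."""
    return Feedback(**json.loads(raw))
```

Validating the score at construction time means a malformed model response fails early instead of producing a misleading report.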
User Stories
- As an HR professional, I want to input a CV and a prompt describing the job position so that I can receive a quick assessment of how well the CV matches the role.
- As an HR professional, I want to see a breakdown of relevant skills and experiences extracted from the CV so that I can understand the candidate's strengths and areas where they might fall short.
- As an HR professional, I want a confidence or match score that summarizes the overall fit so that I can compare multiple applicants easily.
Functional Requirements
- File Parsing & Text Extraction
Accept PDF input via the command line and extract text reliably (up to ~10 pages).
- Scoring Mechanism
Generate a 0-10 score based on keyword frequency, critical experience, and years of experience.
- Skills
Generate the relevant skills that will most likely be required for the position, and other skills that may be less relevant to it.
- Pros and Cons
List the candidate's positive aspects and negative points.
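To make the scoring requirement concrete, here is a rough sketch of a keyword-based score. It is an illustration only: the project delegates analysis to the Gemini API, and the function name, the 70/30 weighting, and the `target_years` parameter are all assumptions. It also measures skill coverage rather than raw keyword frequency, which is a simplification.

```python
import re


def keyword_score(cv_text: str, required_skills: list[str],
                  years: float, target_years: float) -> float:
    """Return a 0-10 score from skill coverage (70%) and experience (30%).

    The weights are arbitrary assumptions for this sketch.
    """
    text = cv_text.lower()
    # A skill counts as found if it appears as a whole token in the CV text.
    found = [
        s for s in required_skills
        if re.search(r"(?<!\w)" + re.escape(s.lower()) + r"(?!\w)", text)
    ]
    coverage = len(found) / len(required_skills) if required_skills else 0.0
    # Experience is capped at 1.0 so extra years do not inflate the score.
    experience = min(years / target_years, 1.0) if target_years > 0 else 1.0
    return round(10 * (0.7 * coverage + 0.3 * experience), 1)
```

A CV that mentions none of the required skills and lists no experience scores 0.0, which matches the "low match score" case in the acceptance criteria.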
Non-Functional Requirements
- Performance
The system should provide results for a typical 2–5 page CV within a few seconds. Larger CVs (up to ~10 pages) should be processed within an acceptable time (under 30 seconds, for example).
- Reliability
The system should reliably parse PDFs with standard text. If the PDF is scanned or image-based, the system may return an error or a warning unless Optical Character Recognition (OCR) is implemented.
- Maintainability
The code should be modular, with separate functions for PDF parsing, text analysis, scoring, and summarization, allowing for easy updates or improvements to each component.
- Scalability
The feature should be designed so that new skills or roles can be added easily to the matching algorithm (e.g., by adding or updating a skills dictionary or training an NLP model).
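For the reliability requirement, one simple way to flag a scanned or image-based PDF is to check how much text the extraction step returned. The sketch below assumes the per-page text has already been extracted by some PDF library; the function name and the 20-character threshold are hypothetical.

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Return True when the extracted pages contain almost no text.

    page_texts holds one string per page, as produced by whatever PDF
    text-extraction step the tool uses (not shown here).
    """
    if not page_texts:
        return True
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

A caller could print a warning suggesting OCR when this returns `True` instead of sending near-empty text for analysis.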
Include acceptance criteria
Acceptance Criteria
1. CV Input & Prompt Capture
Given a valid PDF file path and a relevant prompt, when the user runs the software in the terminal, then the software extracts text from the PDF and processes the prompt without error.
2. Analysis & Scoring
Acceptance Criteria 1: Given a valid PDF and a role (e.g., “iOS developer”) and seniority (e.g., Junior), when the user runs the analysis, then the system outputs: a match score between 0 and 10, the years of experience, a short textual summary describing the candidate’s suitability, a list of detected relevant skills (e.g., “Swift,” “Objective-C,” “UIKit”), a list of other skills (e.g., “Java,” “Kotlin,” “Android”), a list of pros, and a list of cons.
Acceptance Criteria 2: Given a valid PDF that lacks any relevant skills for the prompt, when the user runs the analysis, then the system outputs a low match score (e.g., near 0) and indicates that no relevant skills were found.
3. Invalid CV or Prompt
Acceptance Criteria 3 (not implemented): Given an invalid or non-existent PDF file path, when the user runs the software, then the system provides an error message and does not crash. Given an empty or nonsensical prompt (e.g., “asdfghjk”), when the user runs the software, then the system warns the user that the role could not be identified and proceeds with minimal or no matching.
4. Performance
Acceptance Criteria 4: Given a CV of ~5 pages, when the user runs the analysis, then the system should return results within a few seconds (e.g., under 20 seconds).
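The invalid-input criterion is marked as not implemented, so the following is only a sketch of how such checks might look. `InputError`, `validate_cv_path`, and `prompt_warning` are hypothetical names, and detecting a nonsensical (rather than empty) prompt is out of scope here.

```python
from pathlib import Path
from typing import Optional


class InputError(ValueError):
    """User-facing input problem (hypothetical exception type)."""


def validate_cv_path(cv_path: str) -> Path:
    """Return the path if it points to an existing .pdf file, else raise."""
    path = Path(cv_path)
    if not path.is_file():
        raise InputError(f"CV file not found: {cv_path}")
    if path.suffix.lower() != ".pdf":
        raise InputError(f"Expected a .pdf file, got: {path.name}")
    return path


def prompt_warning(prompt: str) -> Optional[str]:
    """Return a warning for an empty prompt, or None if it has content."""
    if not prompt.strip():
        return "The role could not be identified from an empty prompt."
    return None
```

Raising a dedicated exception lets the CLI layer catch it and print a clean error message instead of a traceback.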
- Document LLM interactions (link). chat
Phase 2: Architecture
Define Command-Line Interface Specification (Inline)
CLI Commands
writeup evaluate
Evaluates a single CV PDF against a specific position and seniority level, optionally overriding the default API key. Generates a structured report in either JSON or PDF format.
Usage
writeup evaluate [OPTIONS] FILE
Description
- FILE (Positional Argument, Required)
Path to the CV PDF file that you want to evaluate.
Options
- --position, -p (TEXT, Required)
The name/title of the position you are evaluating for (e.g., "Software Engineer").
- --seniority, -s (TEXT, Required)
The seniority level of the position (e.g., "Mid-Level," "Senior," etc.).
- --output, -o (TEXT)
The output file name for the generated report.
If omitted, a default name is used: evaluation_report.json if --format json, evaluation_report.pdf if --format pdf.
- --format, -f (TEXT, default: json)
The report format. Choose json or pdf.
- --api-key, -t (TEXT)
An optional override for the GEMINI_API_KEY environment variable.
Use this if you do not want to rely on the API key from .env.
- --help
Displays usage information and exits.
Example
writeup evaluate resume.pdf -p "Data Scientist" -s "Senior" --format pdf
This command evaluates resume.pdf for a Senior Data Scientist role, then creates and saves a PDF report (by default, named evaluation_report.pdf).
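The option set described above can be sketched as a parser. Note that this example uses `argparse` from the standard library rather than Typer, so that it runs without extra dependencies; it mirrors the documented arguments but is not the project's actual `cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented `writeup evaluate` options."""
    parser = argparse.ArgumentParser(prog="writeup")
    subcommands = parser.add_subparsers(dest="command", required=True)

    evaluate = subcommands.add_parser("evaluate", help="Evaluate a single CV PDF")
    evaluate.add_argument("file", metavar="FILE", help="Path to the CV PDF file")
    evaluate.add_argument("-p", "--position", required=True, help="Position title")
    evaluate.add_argument("-s", "--seniority", required=True, help="Seniority level")
    evaluate.add_argument("-o", "--output", default=None, help="Output file name")
    evaluate.add_argument("-f", "--format", choices=["json", "pdf"], default="json",
                          help="Report format")
    evaluate.add_argument("-t", "--api-key", default=None,
                          help="Overrides the GEMINI_API_KEY environment variable")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting `--format` with `choices` makes the parser itself reject unsupported formats before any analysis starts.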
Plan file system interactions, i.e., input/output (inline).
Inputs
- The system accepts PDF files as input.
- Input files can be specified using a file path or directory path.
- When evaluating multiple files in a batch, the system will scan the specified directory for files matching the .pdf extension.
- The input file(s) must be accessible and readable by the system.
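The batch directory scan could look roughly like the sketch below. The function name is hypothetical, and two behaviors are assumptions: the extension match is case-insensitive and the scan does not descend into subdirectories.

```python
from pathlib import Path


def find_pdfs(directory: str) -> list[Path]:
    """Return the PDF files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")
    # Only regular files with a .pdf suffix (any letter case) are kept.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

An empty result is a natural place for the caller to report that no PDF files were found.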
Outputs:
- Output files are generated in JSON or PDF format.
- The default output file name follows this pattern:
  - evaluation_report.json – for single-file JSON reports
  - evaluation_report.pdf – for single-file PDF reports
  - batch_evaluation_report.json – for batch JSON reports
  - batch_evaluation_report.pdf – for batch PDF reports
- If an output filename is specified using the --output option, it overrides the default name.
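The naming rules for output files can be expressed in a few lines. This is a sketch with a hypothetical function name, assuming the format has not been validated elsewhere.

```python
from typing import Optional


def resolve_output_name(output: Optional[str], fmt: str, batch: bool = False) -> str:
    """Pick the report file name: an explicit --output wins over the defaults."""
    if fmt not in ("json", "pdf"):
        raise ValueError(f"Unsupported format: {fmt}")
    if output:
        return output
    prefix = "batch_evaluation_report" if batch else "evaluation_report"
    return f"{prefix}.{fmt}"
```

For example, `resolve_output_name(None, "pdf", batch=True)` returns `batch_evaluation_report.pdf`, while any explicit name is returned unchanged.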
Your feature may use additional files for input and output.
A .env file or config file could store default API_KEY and other environment variables.
A logs directory could store logs if needed for debugging or usage reporting.
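One possible way to combine a `.env` file with the `--api-key` override is sketched below. Both helper names are hypothetical, the `.env` parser handles only simple `KEY=VALUE` lines, and the precedence (CLI value first, then environment) is an assumption based on the option description.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(cli_value: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from --api-key if given, else from GEMINI_API_KEY."""
    env = os.environ if env is None else env
    key = cli_value or env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass --api-key or set GEMINI_API_KEY")
    return key
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.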
Identify relevant third-party libraries.
Typer
For easy and intuitive CLI creation.
Poetry
Dependency management and project packaging.
Curl
For HTTP requests.
Google-Genai
For PDF analysis.
Define team member responsibilities.
| User | Responsibility |
|---|---|
| @dolev146 | Research & communication with the Google API |
| @shimshon21 | Export to PDF, third-party library management & documentation |
| @yakov103 | Infrastructure & data flow |
LLM Interactions: chat
Phase 3: Design
🧠 CRC: Classes, Responsibilities, and Collaborations
A structured overview of the main classes, their core responsibilities, and their collaborators in the system.
🧑💻 CLI Interface
| 🔧 Class | 📌 Responsibilities | 🤝 Collaborations |
|---|---|---|
| CLIHandler (cli.py) | Parse command-line arguments (file path, position, seniority, output format); orchestrate the application's flow; handle user interaction and terminal output. | Calls Analyzer; uses JsonExporter and PDFExporter; interacts with Feedback. |
🧰 Core Logic
| 🔧 Class | 📌 Responsibilities | 🤝 Collaborations |
|---|---|---|
| Analyzer (evaluator.py) | Evaluate CVs using the Gemini API; generate structured feedback (score, skills, pros, cons); support batch evaluations. | Uses Feedback; interacts with TextPreprocessor; uses the Gemini API. |
📦 Data Models
| 🔧 Class | 📌 Responsibilities | 🤝 Collaborations |
|---|---|---|
| Feedback (models.py) | Represent the structured output from the LLM (e.g., score, summary, skills); serve as a data model for reports. | Used by Analyzer; passed to JsonExporter and PDFExporter. |
📤 Reporting / Exporters
| 🔧 Class | 📌 Responsibilities | 🤝 Collaborations |
|---|---|---|
| JsonExporter (json_report.py) | Generate and save evaluation results as JSON; support single and batch exports. | Receives Feedback; called by CLIHandler. |
| PDFExporter (pdf_report.py) | Generate a PDF report using LLM output. | Called by CLIHandler; receives Feedback. |
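As an illustration of the exporter role in the CRC tables, here is a minimal JSON exporter sketch. The method names and the choice to pass feedback as plain mappings are assumptions; the real `json_report.py` may be organized differently.

```python
import json
from pathlib import Path
from typing import Any, Mapping


class JsonExporter:
    """Hypothetical exporter that writes feedback mappings to a JSON file."""

    def export(self, feedback: Mapping[str, Any], output: str) -> Path:
        """Write a single evaluation result."""
        return self._write(dict(feedback), output)

    def export_batch(self, results: Mapping[str, Mapping[str, Any]],
                     output: str) -> Path:
        """Write one report whose keys are CV file names."""
        return self._write({name: dict(fb) for name, fb in results.items()}, output)

    @staticmethod
    def _write(payload: dict, output: str) -> Path:
        path = Path(output)
        # ensure_ascii=False keeps non-English names and skills readable.
        path.write_text(json.dumps(payload, indent=2, ensure_ascii=False),
                        encoding="utf-8")
        return path
```

A dataclass-based model can be converted with `dataclasses.asdict` before it is handed to the exporter, which keeps the exporter independent of the model class.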
Phase 4: Coding & Testing
Files table:
| Directory | File Name | Description |
|---|---|---|
| writeup | cli.py | Read user input and produce output in JSON, PDF, or console format |
| writeup -> core | evaluator.py | Fetch the LLM response for the user's prompt (required position and seniority) |
| writeup -> core | models.py | Store the data models returned by the LLM |
| writeup -> reports | json_report.py | Export the LLM response to a JSON file |
| writeup -> reports | pdf_report.py | Export the LLM response to a PDF file |
| writeup -> utils | text_utils.py | UI text utilities for drawing separator lines |
| writeup -> tests | conftest.py | Test configuration file |
| writeup -> tests | test_batch_evaluate.py | Batch evaluation tests |
| writeup -> tests | test_evaluate.py | Evaluation tests for multiple mock PDF files |
Testing
We use pytest as our testing framework. Below are the details of the tests implemented:
Test Configuration (conftest.py)
- gemini_api_key: A fixture that retrieves the GEMINI_API_KEY environment variable. If the key is not provided, it skips tests that require a real API call.
- cv_dir: A fixture that provides the path to the cv directory, which is assumed to be located one level above the tests directory.
Batch Evaluate Tests (test_batch_evaluate.py)
These tests are designed to evaluate the functionality of batch processing multiple CV PDFs in a directory.
- test_batch_evaluate_success: Tests the successful evaluation of multiple CV PDFs in a directory and checks if the batch report is generated correctly.
- test_batch_evaluate_no_files: Tests the scenario where no PDF files are found in the specified directory and ensures the appropriate error message is displayed.
- test_batch_evaluate_invalid_api_key: Tests the scenario where an invalid API key is provided and ensures the appropriate error message is displayed.
Evaluate Tests (test_evaluate.py)
These tests are designed to evaluate the functionality of processing a single CV PDF.
- test_evaluate_success: Tests the successful evaluation of a single CV PDF and checks if the report is generated correctly.
- test_evaluate_file_not_found: Tests the scenario where the specified CV PDF file is not found and ensures the appropriate error message is displayed.
- test_evaluate_invalid_api_key: Tests the scenario where an invalid API key is provided and ensures the appropriate error message is displayed.
- test_evaluate_invalid_format: Tests the scenario where an invalid report format is specified and ensures the appropriate error message is displayed.
Running Tests
To run the tests, use the following command:
poetry run pytest
This command will execute all the tests in the tests directory and provide a summary of the test results.
Phase 5: Documentation
Project Overview
The Writeup project is a CLI tool designed for HR professionals to analyze CVs in PDF format. It evaluates CVs against job descriptions, providing a match score, relevant skills, years of experience, and a summary of the candidate's suitability. The tool supports batch processing, JSON/PDF output, and leverages Gemini APIs for advanced text analysis. It is modular, scalable, and optimized for performance.
Project Structure
.
├── README.md
├── evaluation_report.json
├── poetry.lock
├── pyproject.toml
├── tests
│   ├── conftest.py
│   ├── test_batch_evaluate.py
│   └── test_evaluate.py
└── writeup
    ├── cli.py
    ├── core
    │   ├── evaluator.py
    │   └── models.py
    ├── reports
    │   ├── json_report.py
    │   └── pdf_report.py
    └── utils
        └── text_utils.py
6 directories, 13 files
Demo
Evaluate feature demo:
https://github.com/user-attachments/assets/1fef6e6f-c038-4f56-b554-e2a75cb2e037
Installation
pipx install writeup-cv-cli
Usage
After installing, you can run the CLI command as follows:
writeup evaluate path/to/cv.pdf -p "Data Scientist" -s "Senior" -t <gemini_token>
This command scans and analyzes the provided CV PDF file and outputs an analysis report.
Development
- Run Tests: poetry run pytest
- LLM Interactions: Save all LLM chats in the chats/ directory.
Screenshots
PDF Example:
Analyzer Output:
Analyzer JSON Output: