LLM-powered CV/resume extraction — typed Python dicts from PDF, DOCX, HTML, or TXT
cvxtract
LLM-powered structured extraction from CVs and resumes.
cvxtract loads a CV/resume in any common format (PDF, DOCX, HTML, plain text), sends the text to an LLM of your choice, and deserialises the response directly into typed Rust structs — no regex, no hand-written parsers.
Quick start
use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // No API key needed: the model is downloaded automatically on first run.
    let mut extractor = Extractor::new(Some(Model::from_local()));
    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => {
            println!("Name: {}", resume.name);
            println!("Email: {}", resume.email.as_deref().unwrap_or("-"));
            println!("Jobs: {}", resume.experience.len());
        }
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}
Installation
[dependencies]
cvxtract = "0.1"
tokio = { version = "1", features = ["full"] }
Providers
| Constructor | Backend | Auth |
|---|---|---|
| Model::from_local() | llama-cpp-2 on-device (Qwen3.5-2B) | none (model auto-downloaded) |
| Model::from_openai() | OpenAI API | OPENAI_API_KEY |
| Model::from_openrouter() | OpenRouter | OPENROUTER_API_KEY |
| Model::from_ollama() | Local Ollama | Ollama running on localhost:11434 |
| Model::from_openai_compatible() | Any OpenAI-compatible endpoint | explicit key + URL |
| Model::from_copilot() | GitHub Copilot | COPILOT_TOKEN |
// OpenAI
let model = Model::from_openai("gpt-4o-mini");
// Ollama (local)
let model = Model::from_ollama("llama3.2");
// Any OpenAI-compatible endpoint
let model = Model::from_openai_compatible(
    "https://api.my-provider.com/v1",
    std::env::var("MY_API_KEY").unwrap(),
    "my-model-name",
);
Supported input formats
| Format | Extension |
|---|---|
| PDF | .pdf |
| Word | .docx |
| HTML | .html, .htm |
| Plain text | .txt |
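As a rough illustration of the table above, a caller could pre-check a path before handing it to the extractor. The sketch below maps a file extension to a format using only the standard library. `InputFormat` and `detect_format` are hypothetical names introduced here for illustration; they are not part of cvxtract's API, and the library itself may detect formats differently.

```rust
use std::path::Path;

/// Hypothetical enum mirroring the supported-formats table; not a cvxtract type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum InputFormat {
    Pdf,
    Docx,
    Html,
    PlainText,
}

/// Returns the format implied by the file extension (case-insensitive),
/// or `None` when the extension is missing or not listed in the table.
pub fn detect_format(path: &str) -> Option<InputFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(InputFormat::Pdf),
        "docx" => Some(InputFormat::Docx),
        "html" | "htm" => Some(InputFormat::Html),
        "txt" => Some(InputFormat::PlainText),
        _ => None,
    }
}
```

Calling `detect_format("resume.pdf")` before extraction lets a program reject unsupported files early instead of waiting for a load error.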
Built-in Resume type
extract_resume() populates a comprehensive Resume struct:
pub struct Resume {
    pub name: String,
    pub email: Option<String>,
    pub phone: Option<String>,
    pub location: Option<String>,
    pub linkedin: Option<String>,
    pub github: Option<String>,
    pub website: Option<String>,
    pub summary: Option<String>,
    pub experience: Vec<Experience>, // company, role, dates, highlights
    pub education: Vec<Education>,   // institution, degree, field, dates
    pub skills: Vec<SkillGroup>,     // grouped or flat skill lists
    pub projects: Vec<Project>,      // name, tech stack, URL
    pub certifications: Vec<Certification>,
    pub languages: Vec<Language>,
    pub awards: Vec<Award>,
}
Custom types
Extract any shape by deriving serde::Deserialize and schemars::JsonSchema:
use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name: String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_openai("gpt-4o-mini")));
    let info: ContactInfo = extractor
        .extract::<ContactInfo>("resume.pdf".into())
        .await
        .unwrap();
    println!("{info:#?}");
}
GPU acceleration (local model)
When using Model::from_local(), compile with a feature flag to offload layers to
your GPU. llama.cpp auto-fits what it can into VRAM and falls back to CPU for the
remainder — this is safe even on GPUs with limited memory.
# NVIDIA CUDA
cargo build --release --features cuda
# Apple Silicon (Metal)
cargo build --release --features metal
# AMD / Intel Vulkan
cargo build --release --features vulkan
# Cargo.toml
[dependencies]
cvxtract = { version = "0.1", features = ["cuda"] }
Error handling
All async methods return Result<T, ExtractionError>:
use cvxtract::ExtractionError;

match extractor.extract_resume(path).await {
    Ok(resume) => { /* use resume */ }
    Err(ExtractionError::LoadError(e)) => eprintln!("Could not load file: {e}"),
    Err(ExtractionError::ModelError(m)) => eprintln!("LLM error: {m}"),
    Err(ExtractionError::ParseError(e)) => eprintln!("JSON parse error: {e}"),
}
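One way a caller might act on these variants is to retry failures that could plausibly succeed on a second attempt. The sketch below is an assumption-laden illustration, not cvxtract code: `SketchError` is a local stand-in that only borrows the three variant names shown above (the real payload types are not stated here, so `String` is assumed), and `is_retryable` and `run_with_retries` are hypothetical helpers. Treating model and parse errors as retryable, and load errors as final, is a design choice made for this example. The helper is synchronous to stay free of dependencies; an async version would follow the same shape.

```rust
/// Local stand-in for an error type with the three variants shown above.
/// The `String` payloads are an assumption made for this sketch.
#[derive(Debug)]
pub enum SketchError {
    LoadError(String),
    ModelError(String),
    ParseError(String),
}

/// Assumed policy: an unreadable file will not fix itself, while an LLM
/// failure or malformed JSON response may differ on the next attempt.
pub fn is_retryable(err: &SketchError) -> bool {
    match err {
        SketchError::LoadError(_) => false,
        SketchError::ModelError(_) | SketchError::ParseError(_) => true,
    }
}

/// Calls `attempt` with a 1-based attempt number until it succeeds, returns a
/// non-retryable error, or `max_attempts` is reached. Always makes at least
/// one attempt, even if `max_attempts` is 0.
pub fn run_with_retries<T>(
    max_attempts: u32,
    mut attempt: impl FnMut(u32) -> Result<T, SketchError>,
) -> Result<T, SketchError> {
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e) if n < max_attempts && is_retryable(&e) => n += 1,
            Err(e) => return Err(e),
        }
    }
}
```

With this policy, a missing file fails immediately, while a malformed model response is retried up to the given limit before the last error is returned.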
Raw document loading
Use UnstructuredLoader to extract plain text from a file without any LLM call:
use cvxtract::UnstructuredLoader;
let loader = UnstructuredLoader::new();
let doc = loader.load("resume.pdf")?;
println!("{} characters extracted", doc.content.len());
License
Licensed under the MIT license.