
LLM-powered CV/resume extraction — typed Python dicts from PDF, DOCX, HTML, or TXT

Project description

cvxtract


LLM-powered structured extraction from CVs and resumes.

cvxtract loads a CV/resume in any common format (PDF, DOCX, HTML, plain text), sends the text to an LLM of your choice, and deserialises the response directly into typed Rust structs — no regex, no hand-written parsers.

Quick start

use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // No API key needed — model is downloaded automatically on first run.
    let mut extractor = Extractor::new(Some(Model::from_local()));

    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => {
            println!("Name:  {}", resume.name);
            println!("Email: {}", resume.email.as_deref().unwrap_or("-"));
            println!("Jobs:  {}", resume.experience.len());
        }
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}

Installation

[dependencies]
cvxtract = "0.1"
tokio = { version = "1", features = ["full"] }

Providers

Constructor                        Backend                              Auth
Model::from_local()                llama-cpp-2, on-device (Qwen3.5-2B)  none (model auto-downloaded)
Model::from_openai()               OpenAI API                           OPENAI_API_KEY
Model::from_openrouter()           OpenRouter                           OPENROUTER_API_KEY
Model::from_ollama()               Local Ollama                         Ollama running on localhost:11434
Model::from_openai_compatible()    Any OpenAI-compatible endpoint       explicit key + URL
Model::from_copilot()              GitHub Copilot                       COPILOT_TOKEN

// OpenAI
let model = Model::from_openai("gpt-4o-mini");

// Ollama (local)
let model = Model::from_ollama("llama3.2");

// Any OpenAI-compatible endpoint
let model = Model::from_openai_compatible(
    "https://api.my-provider.com/v1",
    std::env::var("MY_API_KEY").expect("MY_API_KEY is not set"),
    "my-model-name",
);

Supported input formats

Format Extension
PDF .pdf
Word .docx
HTML .html, .htm
Plain text .txt
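
The loader picks a parser based on the file extension. A rough sketch of that dispatch (illustrative only; cvxtract handles this internally):

```rust
use std::path::Path;

/// Map a file path to a supported input format by extension.
/// Illustrative only: cvxtract performs this dispatch internally.
fn detect_format(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some("PDF"),
        "docx" => Some("Word"),
        "html" | "htm" => Some("HTML"),
        "txt" => Some("Plain text"),
        _ => None,
    }
}

fn main() {
    assert_eq!(detect_format("resume.pdf"), Some("PDF"));
    assert_eq!(detect_format("cv.HTM"), Some("HTML"));
    assert_eq!(detect_format("photo.png"), None);
    println!("all format checks passed");
}
```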

Built-in Resume type

extract_resume() populates a comprehensive Resume struct:

pub struct Resume {
    pub name:           String,
    pub email:          Option<String>,
    pub phone:          Option<String>,
    pub location:       Option<String>,
    pub linkedin:       Option<String>,
    pub github:         Option<String>,
    pub website:        Option<String>,
    pub summary:        Option<String>,
    pub experience:     Vec<Experience>,   // company, role, dates, highlights
    pub education:      Vec<Education>,    // institution, degree, field, dates
    pub skills:         Vec<SkillGroup>,   // grouped or flat skill lists
    pub projects:       Vec<Project>,      // name, tech stack, URL
    pub certifications: Vec<Certification>,
    pub languages:      Vec<Language>,
    pub awards:         Vec<Award>,
}
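
Every contact field except name is Option<String>, so printing code tends to repeat the as_deref().unwrap_or("-") pattern from the quick start. A tiny helper (illustrative, not part of cvxtract) keeps that uniform:

```rust
/// Render an optional contact field, falling back to "-" when absent.
fn field(value: &Option<String>) -> &str {
    value.as_deref().unwrap_or("-")
}

fn main() {
    let email: Option<String> = Some("jane@example.com".into());
    let phone: Option<String> = None;
    println!("Email: {}", field(&email)); // Email: jane@example.com
    println!("Phone: {}", field(&phone)); // Phone: -
}
```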

Custom types

Extract any shape by deriving serde::Deserialize and schemars::JsonSchema:

use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name:  String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_openai("gpt-4o-mini")));
    let info: ContactInfo = extractor
        .extract::<ContactInfo>("resume.pdf".into())
        .await
        .unwrap();
    println!("{info:#?}");
}

GPU acceleration (local model)

When using Model::from_local(), compile with a feature flag to offload layers to your GPU. llama.cpp auto-fits what it can into VRAM and falls back to CPU for the remainder — this is safe even on GPUs with limited memory.

# NVIDIA CUDA
cargo build --release --features cuda

# Apple Silicon (Metal)
cargo build --release --features metal

# AMD / Intel Vulkan
cargo build --release --features vulkan

The same features can be enabled in Cargo.toml instead:

# Cargo.toml
[dependencies]
cvxtract = { version = "0.1", features = ["cuda"] }

Error handling

All async methods return Result<T, ExtractionError>:

use cvxtract::ExtractionError;

match extractor.extract_resume(path).await {
    Ok(resume) => { /* use resume */ }
    Err(ExtractionError::LoadError(e))  => eprintln!("Could not load file: {e}"),
    Err(ExtractionError::ModelError(m)) => eprintln!("LLM error: {m}"),
    Err(ExtractionError::ParseError(e)) => eprintln!("JSON parse error: {e}"),
}
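
LLM backends fail transiently (rate limits, timeouts), so retrying ModelError cases is often worthwhile. A generic, std-only retry helper shows the pattern for synchronous closures; for the async extractor methods, the same loop goes inside an async fn with .await (the helper is an illustration, not part of cvxtract):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible operation up to `max_attempts` times with a fixed delay.
/// Generic over the error type, so it composes with `ExtractionError` or any other.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    // A flaky operation that succeeds on its third call.
    let mut calls = 0;
    let result = retry(5, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient") } else { Ok(calls) }
    });
    assert_eq!(result, Ok(3));
    println!("succeeded after {calls} attempts");
}
```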

Raw document loading

Use UnstructuredLoader to extract plain text from a file without any LLM call:

use cvxtract::UnstructuredLoader;

let loader = UnstructuredLoader::new();
let doc = loader.load("resume.pdf")?;
println!("{} characters extracted", doc.content.len());

License

Licensed under the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cvxtract-0.3.3.tar.gz (47.9 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cvxtract-0.3.3-cp39-abi3-win_amd64.whl (4.0 MB view details)

Uploaded CPython 3.9+Windows x86-64

cvxtract-0.3.3-cp39-abi3-manylinux_2_34_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.34+ x86-64

cvxtract-0.3.3-cp39-abi3-manylinux_2_34_aarch64.whl (4.8 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.34+ ARM64

cvxtract-0.3.3-cp39-abi3-macosx_11_0_x86_64.whl (4.5 MB view details)

Uploaded CPython 3.9+macOS 11.0+ x86-64

cvxtract-0.3.3-cp39-abi3-macosx_11_0_arm64.whl (4.3 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file cvxtract-0.3.3.tar.gz.

File metadata

  • Download URL: cvxtract-0.3.3.tar.gz
  • Upload date:
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cvxtract-0.3.3.tar.gz
Algorithm Hash digest
SHA256 5e23520a753a7dcdfcfabc68094df9be30788c120bf69df0ad2f21d4faee2630
MD5 cf1ef9005e3e404eb837f8686d98e171
BLAKE2b-256 31d37d6caaf37d9ac97a794135bc0d0c469a78a42c18604af4a83f7974256dc4

See more details on using hashes here.

Provenance

The following attestation bundles were made for cvxtract-0.3.3.tar.gz:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cvxtract-0.3.3-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: cvxtract-0.3.3-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 4.0 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cvxtract-0.3.3-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 61f22078503fd027b2931ebe8e977829ca66ca68e0b65903d17946aaf9520904
MD5 fee120332f6a326d74b05fbc870ed808
BLAKE2b-256 acd2a3ef86907792ae668c7c6e89d6732ad886329e7607b50f7af8b60234880b


Provenance

The following attestation bundles were made for cvxtract-0.3.3-cp39-abi3-win_amd64.whl:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cvxtract-0.3.3-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for cvxtract-0.3.3-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 f912a4ebca0e20d8fa980a7af4121b1f9d8a5b7887805935b41aa42c79b0e45c
MD5 01e958f2d5cc01af376d1640e165ca7c
BLAKE2b-256 edfdeea694fa5d0e6c2da98eacf9777b83352b38148b11fc21e2d1c85921381a


Provenance

The following attestation bundles were made for cvxtract-0.3.3-cp39-abi3-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cvxtract-0.3.3-cp39-abi3-manylinux_2_34_aarch64.whl.

File hashes

Hashes for cvxtract-0.3.3-cp39-abi3-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 a9d1c7ecac176b1cc365852502fabb2e223a8e3ef988765dba29ffae70cf8ce5
MD5 f0b51584f711a0955b0af9af267bf5b8
BLAKE2b-256 cb53a460cc01e3006707c215d3257f368fddd00ceca10479742154d6ee9c3d30


Provenance

The following attestation bundles were made for cvxtract-0.3.3-cp39-abi3-manylinux_2_34_aarch64.whl:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cvxtract-0.3.3-cp39-abi3-macosx_11_0_x86_64.whl.

File hashes

Hashes for cvxtract-0.3.3-cp39-abi3-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 e97ea41eeb8df8fac6f0539f8bbf8f4022aef21f7ad87dc95097726f902557a5
MD5 35eb2ae102484913c40c8d7f9aa05bb8
BLAKE2b-256 7d407bcb646615cbb5fb4951144f189fe67ca03391d11147cd639a8cef0d092d


Provenance

The following attestation bundles were made for cvxtract-0.3.3-cp39-abi3-macosx_11_0_x86_64.whl:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cvxtract-0.3.3-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for cvxtract-0.3.3-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3e8f9578cd7693e95d26fff6c24a9314710216360a9ba9a365b9abf9d8cd4ab8
MD5 72e86166424c9e6e4c8044fff61d65bc
BLAKE2b-256 ce73d1078f0dec6607684c3b362b75ff75417bf4253077c820d67c798f2eabe8


Provenance

The following attestation bundles were made for cvxtract-0.3.3-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on satadeep3927/cvxtract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
