Kreuzberg document extraction tool for CrewAI agents

kreuzberg-crewai


Kreuzberg document extraction tools for CrewAI agents.

Extract text and metadata from 88+ document formats — PDF, DOCX, XLSX, HTML, images with OCR, and more — directly from your CrewAI agents.

Installation

pip install kreuzberg-crewai

Quick Start

from crewai import Agent, Crew, Task

from kreuzberg_crewai import KreuzbergExtractTool

tool = KreuzbergExtractTool()

agent = Agent(
    role="Document Analyst",
    goal="Extract and analyze document content",
    backstory="You are an expert at reading and understanding documents.",
    tools=[tool],
)

task = Task(
    description="Extract the content from report.pdf and summarize the key findings.",
    expected_output="A summary of the key findings in the report.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()

Tools

KreuzbergExtractTool

Extracts text content from a document file.

Parameters:

Parameter       Type                             Default      Description
file_path       str                              required     Path to the document file
output_format   "plain" | "markdown" | "html"    "markdown"   Output format

from kreuzberg_crewai import KreuzbergExtractTool

tool = KreuzbergExtractTool()

# The agent calls this automatically, but you can also call it directly:
content = tool._run(file_path="report.pdf", output_format="markdown")
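Because `file_path` is required and `output_format` accepts only three values, it can help to check both before calling the tool directly. The following is a minimal sketch of that idea. `extract_checked` and `ALLOWED_FORMATS` are hypothetical names, not part of kreuzberg-crewai, and the wrapper receives the extraction callable as an argument and assumes it returns a string.

```python
from pathlib import Path
from typing import Callable

# Output formats listed in the parameter table above.
ALLOWED_FORMATS = ("plain", "markdown", "html")


def extract_checked(
    run: Callable[..., str],
    file_path: str,
    output_format: str = "markdown",
) -> str:
    """Validate the arguments, then delegate to the extraction callable.

    `run` is whatever performs the extraction, for example
    `KreuzbergExtractTool()._run` (hypothetical wiring, not a library API).
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_format must be one of {ALLOWED_FORMATS}, got {output_format!r}"
        )
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    # Forward the same keyword arguments the tool documents.
    return run(file_path=str(path), output_format=output_format)
```

With the package installed, a call might look like `extract_checked(KreuzbergExtractTool()._run, "report.pdf", "plain")`; a typo in the format name then fails early with a `ValueError` instead of reaching the extractor.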

KreuzbergExtractMetadataTool

Extracts metadata (title, authors, dates, page count, format-specific details) from a document file.

Parameters:

Parameter   Type   Default    Description
file_path   str    required   Path to the document file

from kreuzberg_crewai import KreuzbergExtractMetadataTool

tool = KreuzbergExtractMetadataTool()

metadata = tool._run(file_path="report.pdf")
# title: Annual Report 2025
# authors: ['John Doe']
# page_count: 42
# pdf_version: 1.7
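If the metadata comes back as text with one `key: value` pair per line, as the comments above suggest, it can be turned into a dictionary for downstream code. This is only a sketch under that assumption. The actual return type of the tool is not documented here, and `parse_metadata_text` is a hypothetical helper, not part of kreuzberg-crewai.

```python
from typing import Dict


def parse_metadata_text(text: str) -> Dict[str, str]:
    """Split 'key: value' lines into a dict; values are kept as strings.

    Assumes one pair per line. Lines without a colon or without a key
    are skipped rather than treated as errors.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        fields[key.strip()] = value.strip()
    return fields
```

Values are deliberately left as strings (for example `"42"` rather than `42`), since the sketch makes no assumption about which fields are numeric.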

Agent Example

Using both tools together:

from crewai import Agent, Crew, Task

from kreuzberg_crewai import KreuzbergExtractMetadataTool, KreuzbergExtractTool

extract_tool = KreuzbergExtractTool()
metadata_tool = KreuzbergExtractMetadataTool()

agent = Agent(
    role="Research Assistant",
    goal="Read documents and extract useful information",
    backstory="You help researchers by reading and analyzing documents.",
    tools=[extract_tool, metadata_tool],
)

task = Task(
    description=(
        "First, check the metadata of research-paper.pdf to find the authors and date. "
        "Then extract the full content in markdown format and list the key conclusions."
    ),
    expected_output="Authors, date, and key conclusions from the paper.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
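To run the same two-step instruction against several files, the task description can be generated from the file name. The helper below is a small sketch of that; `build_description` is a hypothetical name, and it only builds the prompt string used in the example above.

```python
def build_description(file_name: str, output_format: str = "markdown") -> str:
    """Return the metadata-then-content instruction for one document."""
    return (
        f"First, check the metadata of {file_name} to find the authors and date. "
        f"Then extract the full content in {output_format} format "
        "and list the key conclusions."
    )
```

The returned string can be passed as `description=` when constructing each `Task`, for example `Task(description=build_description("research-paper.pdf"), ...)`.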

Supported Formats

Kreuzberg supports 88+ file formats, including:

  • Documents: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, ODT, ODS, ODP, RTF, and more
  • Text/Markup: TXT, MD, HTML, XML, JSON, YAML, LaTeX, Jupyter notebooks
  • Images (OCR): PNG, JPEG, TIFF, GIF, BMP, WEBP, SVG
  • Email: EML, MSG (with attachment extraction)
  • eBooks: EPUB
  • Archives: ZIP, RAR, 7Z, TAR, GZIP
  • Data: CSV, DBF
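Before handing a file to an agent, a quick extension check against the categories above can catch obvious mismatches. The sketch below mirrors only the formats named in this list, so a `None` result means "not in this list", not "unsupported by Kreuzberg". `format_group`, `FORMAT_GROUPS`, and `ASSUMED_ALIASES` are hypothetical names, and the alias extensions (`jpg`, `gz`, `tex`, `ipynb`) are assumptions about how the listed formats appear as file suffixes.

```python
from pathlib import Path
from typing import Optional

# Extensions taken directly from the list above (lowercased).
FORMAT_GROUPS = {
    "Documents": {"pdf", "docx", "doc", "xlsx", "xls", "pptx", "ppt",
                  "odt", "ods", "odp", "rtf"},
    "Text/Markup": {"txt", "md", "html", "xml", "json", "yaml"},
    "Images (OCR)": {"png", "jpeg", "tiff", "gif", "bmp", "webp", "svg"},
    "Email": {"eml", "msg"},
    "eBooks": {"epub"},
    "Archives": {"zip", "rar", "7z", "tar"},
    "Data": {"csv", "dbf"},
}

# Assumed suffixes for JPEG, GZIP, LaTeX, and Jupyter notebooks.
ASSUMED_ALIASES = {
    "jpg": "Images (OCR)",
    "gz": "Archives",
    "tex": "Text/Markup",
    "ipynb": "Text/Markup",
}


def format_group(file_path: str) -> Optional[str]:
    """Return the category for a file's extension, or None if not listed."""
    ext = Path(file_path).suffix.lower().lstrip(".")
    for group, extensions in FORMAT_GROUPS.items():
        if ext in extensions:
            return group
    return ASSUMED_ALIASES.get(ext)
```

Only the final suffix is inspected, so `backup.tar.gz` is classified by `gz`.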

Development

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run linting
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/

# Run type checking
uv run mypy src/

License

MIT
