A Python package for document processing and analysis with LLM integration

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent

Project description

aigrok

A Python package for document processing and analysis, with initial support for PDF files and LLM integration.

Why "grok"?

Ever wondered why we chose the name "aigrok"? Well, "grok" is a term coined by Robert A. Heinlein in his 1961 science fiction novel "Stranger in a Strange Land". It means to understand something so thoroughly that you become one with it. Or as a Martian would say, "to drink it all in" (literally in their case - Martians were quite... thorough in their understanding).

We thought it was the perfect name for our tool because:

It's doing deep document analysis (grokking the content)
It's using AI to understand documents (artificial grokking, if you will)
It sounds like a noise a PDF would make if you squeezed it too hard

Plus, let's be honest, "aigrok" is way cooler than "ai_document_analyzer_v2_final_FINAL_really_final.py"

Installation

Using pip

You can install aigrok directly from PyPI:

pip install aigrok

From Source

Clone the repository:

git clone https://github.com/yourusername/aigrok.git
cd aigrok

Install in development mode:

pip install -e .

Install Ollama (required for LLM integration):
- Follow instructions at Ollama's website
- Pull the model you want to use (e.g., ollama pull llama3.2-vision:11b)
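
Before running aigrok, it can help to confirm that Ollama is installed and the model has been pulled. The following is a minimal sketch, not part of aigrok: the helper name `ollama_has_model` is hypothetical, and it assumes that the output of `ollama list` contains the model name as plain text.

```python
import shutil
import subprocess


def ollama_has_model(model: str) -> bool:
    """Return True if the `ollama` CLI is on PATH and `ollama list` mentions `model`.

    Hypothetical helper for illustration; it relies on a plain substring match
    against the listing, which is an assumption about the output format.
    """
    if shutil.which("ollama") is None:
        # The Ollama CLI is not installed or not on PATH.
        return False
    try:
        listing = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return listing.returncode == 0 and model in listing.stdout
```

Calling `ollama_has_model("llama3.2-vision:11b")` returns `False` when the CLI is missing or the model has not been pulled yet.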

Usage

Command Line Interface

Process PDFs directly from the command line with the aigrok command, passing a prompt followed by the path to the PDF:

# Basic PDF text extraction
aigrok "Extract the text" input.pdf

# Analyze PDF with LLM
aigrok "Summarize this document" input.pdf --model llama3.2-vision:11b

# Extract specific information with different output formats
aigrok "Extract author names" input.pdf --format text  # Comma-separated list
aigrok "Extract author names" input.pdf --format json  # JSON array of objects
aigrok "Extract author names" input.pdf --format csv   # CSV with headers

# Extract metadata only
aigrok "Extract metadata" input.pdf --metadata-only

# Save output to file
aigrok "Extract the text" input.pdf -o output.txt

# Enable verbose logging
aigrok "Extract the text" input.pdf -v

Available options:

prompt: Instruction or question to apply to the document
input: Path to the PDF file
--model: Name of the Ollama model to use
--output, -o: Save output to file
--format: Output format (text, json, csv, markdown)
--metadata-only: Only extract metadata
--verbose, -v: Enable verbose logging
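
To make the option list concrete, here is a sketch of how a parser with the same arguments could be declared using argparse from the standard library. This is an illustration only and is not taken from aigrok's cli.py; the function name `build_parser` and the help strings are assumptions, and the `text` default follows the "Text (default)" note in the Output Formats section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options; not aigrok's own code.
    parser = argparse.ArgumentParser(prog="aigrok")
    parser.add_argument("prompt", help="Instruction to apply to the document")
    parser.add_argument("input", help="Path to the PDF file")
    parser.add_argument("--model", help="Name of the Ollama model to use")
    parser.add_argument("--output", "-o", help="Save output to file")
    parser.add_argument(
        "--format",
        choices=["text", "json", "csv", "markdown"],
        default="text",
        help="Output format",
    )
    parser.add_argument(
        "--metadata-only", action="store_true", help="Only extract metadata"
    )
    parser.add_argument(
        "--verbose", "-v", action="store_true", help="Enable verbose logging"
    )
    return parser


# Parse the same arguments as one of the command-line examples above.
args = build_parser().parse_args(
    ["Extract author names", "input.pdf", "--format", "json", "-o", "authors.json"]
)
print(args.prompt, args.input, args.format, args.output, args.metadata_only)
```

Declaring `--format` with `choices` means an unsupported format is rejected at parse time rather than later during processing.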

Output Formats

The tool supports multiple output formats:

Text (default):

Lanxiang Hu, Qiyu Li, Anze Xie, Nan Jiang, Haojian Jin, Hao Zhang

JSON:

[
  {"first_name": "Lanxiang", "last_name": "Hu"},
  {"first_name": "Qiyu", "last_name": "Li"},
  {"first_name": "Anze", "last_name": "Xie"}
]

CSV:

first_name,last_name
Lanxiang,Hu
Qiyu,Li
Anze,Xie

Markdown:

# Document Analysis Results

## Metadata
- Pages: 1
- Author: Example Author

## Extracted Text
[Document text here]

## LLM Analysis
[Analysis results here]
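
The text, JSON, and CSV samples above are three views of the same records. The sketch below shows one way such records could be rendered into each shape with the standard library. The `render` helper is hypothetical and does not reflect how aigrok produces its output internally.

```python
import csv
import io
import json

# Sample records taken from the JSON example above.
authors = [
    {"first_name": "Lanxiang", "last_name": "Hu"},
    {"first_name": "Qiyu", "last_name": "Li"},
    {"first_name": "Anze", "last_name": "Xie"},
]


def render(records: list[dict], fmt: str) -> str:
    """Render name records as text, json, or csv (illustrative helper only)."""
    if fmt == "text":
        # Comma-separated list of full names.
        return ", ".join(f"{r['first_name']} {r['last_name']}" for r in records)
    if fmt == "json":
        # JSON array of objects.
        return json.dumps(records, indent=2)
    if fmt == "csv":
        # CSV with a header row.
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=["first_name", "last_name"], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


for fmt in ("text", "json", "csv"):
    print(render(authors, fmt))
```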

Python API

from aigrok import PDFProcessor

# Initialize the processor
processor = PDFProcessor()

# Process a PDF file with LLM analysis
result = processor.process_file(
    "path/to/your/document.pdf",
    prompt="Extract the author names",
    model="llama3.2-vision:11b"
)

if result.success:
    # Access LLM's analysis
    print(result.llm_response)
    
    # Access other data
    print(result.text)
    print(result.metadata)
    print(f"Document has {result.page_count} pages")
else:
    print(f"Error: {result.error}")
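
The same call can be applied to a whole folder of PDFs. The sketch below is a hypothetical helper, `analyze_folder`, that uses only the `process_file` arguments and result fields shown in the example above. It assumes that `process_file` accepts a string path and reports failures through `result.success` and `result.error` rather than by raising.

```python
from pathlib import Path


def analyze_folder(processor, folder: str, prompt: str, model: str) -> dict[str, str]:
    """Run one prompt over every PDF in `folder`; map file name to answer or error."""
    answers = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = processor.process_file(str(pdf), prompt=prompt, model=model)
        # Only fields shown in the example above are used here.
        if result.success:
            answers[pdf.name] = result.llm_response
        else:
            answers[pdf.name] = f"Error: {result.error}"
    return answers
```

With aigrok installed, a call might look like `analyze_folder(PDFProcessor(), "papers", "Extract the author names", "llama3.2-vision:11b")`, where `papers` is a placeholder directory name. Passing the processor in as an argument keeps the helper easy to test with a stand-in object.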

Development

Running Tests

pytest

Project Structure

aigrok/ - Main package directory
- pdf_processor.py - PDF processing with LLM integration
- cli.py - Command-line interface
tests/ - Test files
requirements.txt - Project dependencies
setup.py - Package installation configuration
pyproject.toml - Build system requirements

Contributing

We welcome contributions! Whether you want to fix a bug, add a feature, or improve documentation, please feel free to:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

And remember, as a wise Martian once said, "thou art god" (that's more Heinlein humor for you).

Postscript

90% of this project was written by AI using Cursor and Claude.

Release history

- 0.3.3 (Dec 21, 2024)
- 0.3.2 (Dec 21, 2024)
- 0.3.1 (Dec 16, 2024)
- 0.2.6 (Dec 15, 2024)
- 0.2.2 (Dec 14, 2024)
- 0.2.1 (Dec 13, 2024), this version

Download files

Source Distribution

aigrok-0.2.1.tar.gz (1.2 MB view details)

Uploaded Dec 13, 2024 Source

Built Distribution

aigrok-0.2.1-py3-none-any.whl (14.1 kB view details)

Uploaded Dec 13, 2024 Python 3

File details

Details for the file aigrok-0.2.1.tar.gz.

File metadata

Download URL: aigrok-0.2.1.tar.gz
Upload date: Dec 13, 2024
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for aigrok-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`805219ad52e742921c2536f21455d31b50870ba40f5cfb294f110ebfe7cd030b`
MD5	`bf4c9a562fbfd992e0138da4cb9bdcbf`
BLAKE2b-256	`1f6300647b0ff52d837b1f9e02e4827bc95aa01f76e90048a0c23d830c9edc4e`
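
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using hashlib from the standard library; the function names are hypothetical, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest published above for aigrok-0.2.1.tar.gz.
EXPECTED_SHA256 = "805219ad52e742921c2536f21455d31b50870ba40f5cfb294f110ebfe7cd030b"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("aigrok-0.2.1.tar.gz")` should return `True` only if the local file is byte-for-byte identical to the published source distribution.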

File details

Details for the file aigrok-0.2.1-py3-none-any.whl.

File metadata

Download URL: aigrok-0.2.1-py3-none-any.whl
Upload date: Dec 13, 2024
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for aigrok-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2018b3d2604f1ef7a52a9fe12b46838d8a98d3a6ba0a1688de6fedc0483351c0`
MD5	`75c0516133529f9520a7db11e28d40ed`
BLAKE2b-256	`ec8e21ce364becf90c2eafe729537f878a8108d72f0d08f8c657fba65d1008ee`
