PDF metadata extraction CLI using PyExifTool
fileKor
Local metadata engine that extracts, summarizes, classifies, and tags files using taxonomy-based labeling.
Quick Start
# Install uv
winget install astral-sh.uv
git clone filekor
cd filekor
# Setup
uv venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
uv pip install -e .
# CLI Usage
filekor extract documento.pdf
filekor sidecar documento.pdf
filekor sidecar ./documentos --dir # Process directory (generates merged.kor by default)
filekor sidecar ./documentos --dir --no-merge # Generate individual .kor files
filekor sidecar ./documentos --dir --db # Use database to regenerate when available
filekor labels documento.pdf
filekor sync documento.kor # Sync existing .kor to database
filekor merge ./directorio # Merge multiple .kor files
filekor delete --path ./doc.pdf # Delete by path
filekor delete --sha <hash> # Delete by SHA256
Library Usage
filekor can be used as a Python library for database-backed queries and search:
from filekor.db import get_db, sync_file, search_files
# Get database instance (lazy singleton)
db = get_db()
# Sync a .kor file to the database
sync_file("./documento.kor")
# Search files by labels and content with scoring
results = search_files(
labels=["finance", "2026"],
query="budget report"
)
# Returns ranked results with relevance scores
Enable auto-sync in config.yaml to automatically update the database when using CLI commands.
Features
Core Features
- Metadata Extraction - Extract metadata from PDF, TXT, and MD files using PyExifTool
- Text Extraction - Extract and summarize text content from supported files
- Sidecar Generation - Generate YAML sidecar files (.kor) with full metadata
- Taxonomy Labels - LLM-based classification with custom taxonomy support
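
To make the sidecar idea concrete, here is a minimal sketch of writing a YAML sidecar next to a source file using only the standard library. It is not filekor's implementation: the field names (`name`, `extension`, `size_bytes`, `modified_at`, `sha256`) and the helper functions are hypothetical, the real .kor schema is likely richer, and the `<stem>.kor` naming is only inferred from the CLI examples above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_sidecar(path: Path) -> dict:
    """Collect basic file facts. Field names are hypothetical, not the real .kor schema."""
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "extension": path.suffix.lstrip(".").lower(),
        "size_bytes": stat.st_size,
        "modified_at": modified.isoformat(),
        "sha256": sha256_of(path),
    }


def to_yaml(record: dict) -> str:
    """Emit a flat mapping as YAML; json.dumps quotes and escapes each scalar,
    and JSON scalars are valid YAML flow scalars."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in record.items())


def write_sidecar(path: Path) -> Path:
    """Write <stem>.kor next to the source file and return the sidecar path."""
    target = path.with_suffix(".kor")
    target.write_text(to_yaml(build_sidecar(path)), encoding="utf-8")
    return target
```

Calling `write_sidecar(Path("documento.pdf"))` would produce `documento.kor` in the same directory. The SHA256 digest computed here is the same kind of value that `filekor delete --sha <hash>` expects, though whether filekor hashes files exactly this way is an assumption.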
LLM Providers
- Google Gemini - Native Gemini API support
- OpenAI - GPT-4o, GPT-4o-mini support
- Groq - Fast inference with Llama models
- OpenRouter - Access to 200+ free models
- Mock Provider - Testing without API calls
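
The value of a mock provider is easiest to see in code. The sketch below shows one way a provider interface and a deterministic mock could fit together, so labeling logic can be tested without API calls. All names here (`LabelProvider`, `MockProvider`, `suggest_labels`, `classify`) are hypothetical and are not taken from filekor's `llm.py` or `labels.py`.

```python
from __future__ import annotations

from typing import Protocol


class LabelProvider(Protocol):
    """Anything that can propose labels for a text (hypothetical interface)."""

    def suggest_labels(self, text: str, taxonomy: list[str]) -> list[str]: ...


class MockProvider:
    """Deterministic stand-in for an LLM: returns taxonomy labels found in the text."""

    def suggest_labels(self, text: str, taxonomy: list[str]) -> list[str]:
        lowered = text.lower()
        return [label for label in taxonomy if label.lower() in lowered]


def classify(
    text: str,
    taxonomy: list[str],
    provider: LabelProvider,
    max_labels: int = 3,
) -> list[str]:
    """Ask the provider for labels, then keep only unique labels from the taxonomy."""
    allowed = set(taxonomy)
    result: list[str] = []
    for label in provider.suggest_labels(text, taxonomy):
        # Real LLM output can drift from the taxonomy, so filter defensively.
        if label in allowed and label not in result:
            result.append(label)
    return result[:max_labels]


if __name__ == "__main__":
    taxonomy = ["finance", "legal", "hr"]
    print(classify("Finance budget and legal review", taxonomy, MockProvider()))
```

Filtering the provider's answer against the taxonomy in `classify` keeps the output constrained even when a real model returns labels that were never defined.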
Database & Search
- SQLite Database - Index all .kor metadata
- Full-Text Search - FTS5 for fast filename/metadata search
- Multi-Label Search - OR logic for filtering by multiple labels
- Relevance Scoring - Configurable weights for search ranking
- Auto-Sync - Automatic database updates from CLI
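
The following sketch illustrates how OR-logic label filtering and weighted relevance scoring can combine. It is an in-memory approximation only: filekor uses SQLite with FTS5, whereas this example swaps in plain substring matching, and the `IndexedFile` class, the `score` and `search` functions, and the default weights are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical in-memory stand-in for one indexed .kor record."""

    name: str
    labels: set[str] = field(default_factory=set)
    summary: str = ""


def score(
    item: IndexedFile,
    labels: list[str],
    query: str,
    label_weight: float = 2.0,
    term_weight: float = 1.0,
) -> float:
    """Weighted sum of matching labels and query terms found in name or summary."""
    label_hits = len(item.labels & set(labels))
    haystack = f"{item.name} {item.summary}".lower()
    term_hits = sum(1 for term in query.lower().split() if term in haystack)
    return label_weight * label_hits + term_weight * term_hits


def search(items: list[IndexedFile], labels: list[str], query: str, **weights: float):
    """OR logic: keep items sharing at least one requested label, ranked by score."""
    wanted = set(labels)
    matches = [item for item in items if not wanted or item.labels & wanted]
    scored = [(score(item, labels, query, **weights), item) for item in matches]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


files = [
    IndexedFile("budget-2026.pdf", {"finance", "2026"}, "Annual budget report"),
    IndexedFile("notes.md", {"2026"}, "Meeting notes"),
    IndexedFile("contract.pdf", {"legal"}, "Supplier contract"),
]

for value, item in search(files, ["finance", "2026"], "budget report"):
    print(f"{value:.1f}  {item.name}")
```

With the default weights, `budget-2026.pdf` ranks first because it matches both labels and both query terms, `notes.md` follows on a single label match, and `contract.pdf` is excluded because it shares no requested label. Passing `label_weight` or `term_weight` as keyword arguments to `search` changes the ranking balance.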
Interfaces
- CLI - Complete command-line interface
- Library API - Python API for integration
- 100 Tests - Comprehensive test coverage
Documentation
| Guide | Description |
|---|---|
| Installation | Setup and installation |
| Usage | CLI commands reference |
| Library | Python Library API with code examples |
| Taxonomy | Labels and taxonomy configuration |
| LLM | LLM provider setup (Gemini, OpenAI, Groq, OpenRouter) |
| Development | Development and testing guide |
Project Structure
fileKor/
├── src/filekor/ # Source code
│ ├── cli.py # CLI interface
│ ├── db.py # Database module (SQLite)
│ ├── models.py # Database models
│ ├── sidecar.py # Sidecar model
│ ├── labels.py # Labels module
│ └── llm.py # LLM providers
├── docs/ # Documentation
├── test-files/ # Test files
├── tests/ # Test suite
└── README.md
License
Apache License 2.0 - See LICENSE file for details.