A Python package for extracting TikZ picture environments from codebases and generating AI context files
Project description
TikZ Extractor CLI
A Python package for extracting TikZ picture environments from codebases and generating AI context files. This tool recursively scans directories, extracts TikZ blocks from various file types, saves them as individual .tex files, and creates a consolidated context file for AI model consumption.
Features
- 🔍 Recursive Directory Scanning: Automatically finds TikZ blocks in specified file types
- 📄 Multiple File Format Support: Works with
.tex,.md,.py, and other text files - 🎯 Smart Extraction: Uses robust regex patterns to extract complete TikZ picture environments
- 📁 Organized Output: Saves each TikZ block as a separate
.texfile with descriptive names - 🤖 AI Context Generation: Creates a consolidated file with all extracted blocks for LLM consumption
- 🖥️ CLI Interface: Easy-to-use command-line interface with flexible options
- 📦 Python Module: Importable package for programmatic usage
- 🔧 Dry Run Mode: Preview extraction results without writing files
- 📝 Verbose Logging: Detailed processing information when needed
Installation
Using pip
pip install tikz-extractor
Using Poetry
poetry add tikz-extractor
Development Installation
git clone https://github.com/vaibhavblayer/tikz-extractor.git
cd tikz-extractor
poetry install --with dev
Usage
Command Line Interface
Basic Usage
Extract TikZ blocks from the current directory:
tikz-extract
Advanced Usage
# Specify source and output directories
tikz-extract --src ./documents --out ./extracted_tikz
# Process specific file types
tikz-extract --ext .tex,.md,.rst
# Generate custom AI context file
tikz-extract --ai-file my_tikz_context.txt
# Dry run to preview results
tikz-extract --dry-run --verbose
# Full example with all options
tikz-extract \
--src ./latex_project \
--out ./tikz_output \
--ext .tex,.md \
--ai-file tikz_for_ai.txt \
--verbose
CLI Parameters
| Parameter | Short | Default | Description |
|---|---|---|---|
--src |
-s |
. |
Source directory to scan recursively |
--out |
-o |
tikz |
Output directory for extracted .tex files |
--ext |
-e |
.tex,.md,.py |
Comma-separated list of file extensions to process |
--ai-file |
-a |
ai_context.txt |
Path for the AI context file |
--dry-run |
-d |
False |
Preview mode - show results without writing files |
--verbose |
-v |
False |
Enable detailed logging output |
--help |
-h |
- | Show help message and exit |
Python Module Interface
Basic Programmatic Usage
from tikz_extractor import extractor
from pathlib import Path
# Extract TikZ blocks from a directory
src_dir = Path("./documents")
out_dir = Path("./extracted")
extensions = [".tex", ".md"]
metadata = extractor.extract_from_directory(src_dir, out_dir, extensions)
# Process the results
for block_info in metadata:
print(f"Extracted from: {block_info['source']}")
print(f"Saved to: {block_info['out_path']}")
print(f"Block index: {block_info['index']}")
Advanced Programmatic Usage
from tikz_extractor.extractor import (
find_files,
extract_tikz_from_text,
write_extracted_blocks,
build_ai_context
)
from pathlib import Path
# Step-by-step extraction process
src_dir = Path("./my_project")
out_dir = Path("./tikz_blocks")
extensions = [".tex", ".md"]
# 1. Find all relevant files
files = find_files(src_dir, extensions)
print(f"Found {len(files)} files to process")
# 2. Process each file
all_metadata = []
for file_path in files:
try:
content = file_path.read_text(encoding='utf-8')
tikz_blocks = extract_tikz_from_text(content)
if tikz_blocks:
metadata = write_extracted_blocks(tikz_blocks, file_path, out_dir)
all_metadata.extend(metadata)
print(f"Extracted {len(tikz_blocks)} blocks from {file_path}")
except Exception as e:
print(f"Error processing {file_path}: {e}")
# 3. Generate AI context file
if all_metadata:
ai_file = Path("tikz_context.txt")
build_ai_context(all_metadata, ai_file)
print(f"AI context file created: {ai_file}")
Individual Function Usage
from tikz_extractor.extractor import extract_tikz_from_text, sanitize_name
from pathlib import Path
# Extract TikZ blocks from text content
latex_content = """
Some text here...
\\begin{tikzpicture}
\\draw (0,0) -- (1,1);
\\end{tikzpicture}
More content...
"""
blocks = extract_tikz_from_text(latex_content)
print(f"Found {len(blocks)} TikZ blocks")
# Generate safe filenames
path = Path("src/diagrams/flow_chart.tex")
safe_name = sanitize_name(path)
print(f"Safe filename: {safe_name}") # Output: src__diagrams__flow_chart.tex
Output Format
Extracted Files
Each TikZ block is saved as a separate .tex file with the naming pattern:
{sanitized_source_path}__tikz{index}.tex
Examples:
src/diagrams/network.tex→src__diagrams__network.tex__tikz1.texdocs/README.md→docs__README.md__tikz1.tex
AI Context File
The AI context file contains all extracted TikZ blocks with structured headers:
### Source: src/diagrams/network.tex
### Snippet: src__diagrams__network.tex__tikz1.tex
\begin{tikzpicture}
\node (A) at (0,0) {Start};
\node (B) at (2,0) {End};
\draw[->] (A) -- (B);
\end{tikzpicture}
---
### Source: docs/README.md
### Snippet: docs__README.md__tikz1.tex
\begin{tikzpicture}
\draw (0,0) circle (1);
\end{tikzpicture}
---
Examples
Example 1: LaTeX Project
# Extract from a LaTeX thesis project
tikz-extract \
--src ./thesis \
--out ./thesis_tikz \
--ext .tex \
--ai-file thesis_diagrams.txt \
--verbose
Example 2: Documentation with Embedded TikZ
# Process Markdown documentation with TikZ diagrams
tikz-extract \
--src ./docs \
--ext .md,.rst \
--out ./doc_diagrams \
--ai-file documentation_tikz.txt
Example 3: Mixed Codebase
# Extract from various file types in a research project
tikz-extract \
--src ./research_project \
--ext .tex,.md,.py,.txt \
--out ./all_tikz \
--dry-run # Preview first
API Reference
Core Functions
extract_from_directory(src: Path, out_dir: Path, exts: List[str]) -> List[Dict]
Orchestrates the complete TikZ extraction workflow.
Parameters:
src: Source directory to scanout_dir: Output directory for extracted filesexts: List of file extensions to process
Returns: List of metadata dictionaries for each extracted block
find_files(src: Path, exts: List[str]) -> List[Path]
Recursively finds files with specified extensions.
Parameters:
src: Directory to searchexts: List of file extensions (with or without leading dots)
Returns: List of Path objects for matching files
extract_tikz_from_text(text: str) -> List[str]
Extracts TikZ picture environments from text content.
Parameters:
text: Text content to search
Returns: List of complete TikZ block strings
write_extracted_blocks(blocks: List[str], src_path: Path, out_dir: Path) -> List[Dict]
Writes TikZ blocks to individual files and generates metadata.
Parameters:
blocks: List of TikZ block stringssrc_path: Original source file pathout_dir: Output directory
Returns: List of metadata dictionaries
build_ai_context(metadata: List[Dict], ai_file: Path) -> None
Creates an AI context file with all extracted blocks.
Parameters:
metadata: List of block metadataai_file: Path for the output context file
Error Handling
The tool is designed to be robust and continue processing even when encountering issues:
- Unreadable files: Skipped with optional logging
- Encoding issues: Attempts UTF-8, skips on failure
- Permission errors: Skipped with warning messages
- Missing directories: Output directories are created automatically
- Malformed TikZ blocks: Extracts what's parseable
- Empty results: Informative message, graceful exit
Development
Setting up Development Environment
git clone https://github.com/vaibhavblayer/tikz-extractor.git
cd tikz-extractor
poetry install --with dev
Running Tests
# Run all tests
poetry run pytest
# Run with coverage
poetry run pytest --cov=tikz_extractor
# Run specific test file
poetry run pytest tests/test_extractor.py -v
Code Quality
# Format code
poetry run black tikz_extractor tests
# Sort imports
poetry run isort tikz_extractor tests
# Lint code
poetry run flake8 tikz_extractor tests
# Type checking
poetry run mypy tikz_extractor
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for version history and changes.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Related Projects
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tikz_extractor-0.2.2.tar.gz.
File metadata
- Download URL: tikz_extractor-0.2.2.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9da94117a352418654804ffebf32c0688d02939b616381b9838bf17309630edb
|
|
| MD5 |
f8a94167fa3a3df3492aca1e7e71e369
|
|
| BLAKE2b-256 |
023dfb11571edefd49910be7ea44d30367eb282dad2b3cb3cabab95a91c4eb7c
|
File details
Details for the file tikz_extractor-0.2.2-py3-none-any.whl.
File metadata
- Download URL: tikz_extractor-0.2.2-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.11.13 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1d392e0c512ff40936b16f124d089bdf11d50d0d3ddd7b94ab2a52333b6b222d
|
|
| MD5 |
52bc6a062602165dcb87b569ce5a2839
|
|
| BLAKE2b-256 |
0506064c7b558a160668b073973ba398c92997f6b2ba0978196ef423fcd25be1
|