kabigon

A URL content loader library that extracts content from various sources (YouTube, Instagram Reels, Twitter/X, Reddit, Truth Social, PTT, BBC, CNN, GitHub files, PDFs, and generic web pages) and converts it to plain text or Markdown.

Features

✨ Multi-Platform Support: YouTube, Twitter/X, Truth Social, Reddit, Instagram Reels, PTT, BBC, CNN, GitHub files, PDFs, and generic web pages

🔄 Async-First Design: Built on async/await so many URLs can be loaded concurrently

🎯 Smart Fallback: Automatically tries multiple extraction strategies until one succeeds

🚀 Simple API: Single-line usage with sensible defaults, or full control with custom loader chains

🔌 Extensible: Easy to add new loaders for additional platforms

Table of Contents

Installation
Usage
Supported Sources
Examples
Troubleshooting
Development
License

Installation

uv tool install kabigon
# or just
uvx kabigon <url>

# Install Playwright browsers
uvx playwright install chromium
# or
uvx playwright install chrome

Usage

CLI

uvx kabigon <url>

# Examples
uvx kabigon --list
uvx kabigon --loader youtube,playwright https://www.youtube.com/watch?v=dQw4w9WgXcQ
uvx kabigon --loader twitter https://x.com/elonmusk/status/123456789
uvx kabigon https://www.youtube.com/watch?v=dQw4w9WgXcQ
uvx kabigon https://truthsocial.com/@realDonaldTrump/posts/123456
uvx kabigon https://reddit.com/r/python/comments/xyz/...
uvx kabigon https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md
uvx kabigon https://example.com/document.pdf

Python API - Sync

import kabigon

url = "https://www.google.com.tw"

# Simplest usage - automatically uses the best loader
content = kabigon.load_url_sync(url)
print(content)

# Or use specific loader
content = kabigon.PlaywrightLoader().load_sync(url)
print(content)

# With multiple loaders (tries each in order)
loader = kabigon.Compose([
    kabigon.TwitterLoader(),
    kabigon.TruthSocialLoader(),
    kabigon.YoutubeLoader(),
    kabigon.RedditLoader(),
    kabigon.PDFLoader(),
    kabigon.PlaywrightLoader(),  # Fallback for generic URLs
])
content = loader.load_sync(url)
print(content)
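Conceptually, a loader chain tries each loader in order and returns the first result that succeeds. The sketch below illustrates that fallback pattern with toy loaders so it runs offline. It is not kabigon's actual Compose implementation: FirstSuccess, FakeLoader, and AllLoadersFailed are hypothetical names, and the sketch assumes a loader signals "cannot handle this URL" by raising an exception.

```python
import asyncio
from typing import Protocol, Sequence


class AsyncLoader(Protocol):
    async def load(self, url: str) -> str: ...


class AllLoadersFailed(Exception):
    """Raised when every loader in the chain fails (hypothetical name)."""


class FirstSuccess:
    """Hypothetical stand-in for a Compose-style loader chain."""

    def __init__(self, loaders: Sequence[AsyncLoader]) -> None:
        self.loaders = list(loaders)

    async def load(self, url: str) -> str:
        errors: list[Exception] = []
        for loader in self.loaders:
            try:
                # Return the first successful result; later loaders never run.
                return await loader.load(url)
            except Exception as exc:  # remember the failure, then try the next loader
                errors.append(exc)
        raise AllLoadersFailed(f"all {len(errors)} loaders failed for {url}: {errors!r}")

    def load_sync(self, url: str) -> str:
        # Convenience wrapper for code that is not already inside an event loop.
        return asyncio.run(self.load(url))


class FakeLoader:
    """Toy loader that only accepts URLs containing `needle`."""

    def __init__(self, needle: str, name: str) -> None:
        self.needle, self.name = needle, name

    async def load(self, url: str) -> str:
        if self.needle not in url:
            raise ValueError(f"{self.name} cannot handle {url}")
        return f"{self.name}: {url}"


# The empty needle matches every URL, so "generic" plays the PlaywrightLoader fallback role.
chain = FirstSuccess([FakeLoader("youtube.com", "youtube"), FakeLoader("", "generic")])
print(chain.load_sync("https://youtube.com/watch?v=abc"))  # youtube: https://youtube.com/watch?v=abc
print(chain.load_sync("https://example.com"))  # generic: https://example.com
```

This is why the order of the list matters: put specific loaders first and a catch-all loader last.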

Python API - Async

import asyncio
import kabigon

async def main():
    url = "https://www.google.com.tw"

    # Simplest usage - automatically uses the best loader
    content = await kabigon.load_url(url)
    print(content)

    # Or use specific loader
    loader = kabigon.PlaywrightLoader()
    content = await loader.load(url)
    print(content)

    # Batch processing multiple URLs in parallel
    urls = [
        "https://x.com/user1/status/123",
        "https://truthsocial.com/@user/posts/456",
        "https://youtube.com/watch?v=abc",
        "https://reddit.com/r/python/comments/xyz",
    ]

    loader = kabigon.Compose([
        kabigon.TwitterLoader(),
        kabigon.TruthSocialLoader(),
        kabigon.YoutubeLoader(),
        kabigon.RedditLoader(),
        kabigon.PlaywrightLoader(),
    ])

    # Load all URLs concurrently with the custom loader chain
    # (use kabigon.load_url instead for automatic loader selection)
    results = await asyncio.gather(*[loader.load(url) for url in urls])
    for url, content in zip(urls, results):
        print(f"{url}: {len(content)} chars")

asyncio.run(main())

API Comparison

Usage	Simplest	Custom Loader Chain
Sync	`kabigon.load_url_sync(url)`	`loader.load_sync(url)`
Async	`await kabigon.load_url(url)`	`await loader.load(url)`
Batch Async	`await asyncio.gather(*[kabigon.load_url(url) for url in urls])`	`await asyncio.gather(*[loader.load(url) for url in urls])`
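In the batch patterns above, asyncio.gather propagates the first exception it sees, so one unreachable URL makes the whole call fail even though the other loads keep running. A minimal sketch that caps how many loads run at once with an asyncio.Semaphore and collects failures instead of raising is shown below. It assumes `load` is any async callable taking a URL, such as kabigon.load_url or loader.load; load_many and max_concurrency are hypothetical names, not part of kabigon, and fake_load stands in for a real loader so the snippet runs offline.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def load_many(
    urls: list[str],
    load: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> dict[str, str | BaseException]:
    """Load every URL, at most `max_concurrency` at a time (hypothetical helper)."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> str:
        async with semaphore:  # waits while max_concurrency loads are in flight
            return await load(url)

    # return_exceptions=True keeps one failing URL from discarding the others' results
    results = await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
    # Note: duplicate URLs collapse into a single dict entry.
    return dict(zip(urls, results))


async def fake_load(url: str) -> str:
    # Stand-in for kabigon.load_url so the sketch runs without network access.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise RuntimeError(f"could not load {url}")
    return f"content of {url}"


results = asyncio.run(load_many(["https://a.example", "https://bad.example"], fake_load))
for url, result in results.items():
    status = "error" if isinstance(result, BaseException) else f"{len(result)} chars"
    print(url, status)
```

With kabigon installed, pass kabigon.load_url or a Compose loader's load method as `load`.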

Supported Sources

Source	Loader	Description
YouTube	`YoutubeLoader`	Extracts video transcripts
YouTube	`YoutubeYtdlpLoader`	Audio transcription via yt-dlp + Whisper
Twitter/X	`TwitterLoader`	Extracts tweet content
Truth Social	`TruthSocialLoader`	Extracts Truth Social posts
Reddit	`RedditLoader`	Extracts Reddit posts and comments
Instagram Reels	`ReelLoader`	Audio transcription + metadata
GitHub	`GitHubLoader`	Fetches GitHub web pages and file content (supports repo URLs + `github.com/.../blob/...`)
BBC	`BBCLoader`	BBC article extraction with article-aware parsing
CNN	`CNNLoader`	CNN article extraction with article-aware parsing
PDF	`PDFLoader`	Extracts text from PDF files (URL or local)
PTT	`PttLoader`	Taiwan PTT forum posts
Generic Web	`PlaywrightLoader`	Browser-based scraping for any website
Generic Web	`HttpxLoader`	Simple HTTP requests with markdown conversion
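The table pairs each source with a loader. As an illustration of the kind of domain check a loader performs, the sketch below picks a loader name from a URL's hostname. The hostname list is an assumption made for illustration (for example, that TwitterLoader covers both x.com and twitter.com), and pick_loader and ROUTES are hypothetical names; this is not kabigon's actual routing logic.

```python
from urllib.parse import urlparse

# Assumed hostname -> loader-name mapping, for illustration only.
ROUTES: dict[str, str] = {
    "youtube.com": "YoutubeLoader",
    "youtu.be": "YoutubeLoader",
    "x.com": "TwitterLoader",
    "twitter.com": "TwitterLoader",
    "truthsocial.com": "TruthSocialLoader",
    "reddit.com": "RedditLoader",
    "instagram.com": "ReelLoader",
    "github.com": "GitHubLoader",
    "bbc.com": "BBCLoader",
    "bbc.co.uk": "BBCLoader",
    "cnn.com": "CNNLoader",
    "ptt.cc": "PttLoader",
}


def pick_loader(url: str) -> str:
    """Return the name of a loader that plausibly handles `url` (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.path.lower().endswith(".pdf"):
        return "PDFLoader"
    host = (parsed.hostname or "").lower()
    # Walk up the domain labels so "www.youtube.com" and "old.reddit.com" also match,
    # stopping before the bare top-level domain.
    labels = host.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in ROUTES:
            return ROUTES[candidate]
    return "PlaywrightLoader"  # generic fallback for any other website


print(pick_loader("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
print(pick_loader("https://www.google.com.tw"))
```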

Examples

See the examples/ directory for more usage examples:

simple_usage.py - Basic single-line usage
async_usage.py - Async usage and parallel batch processing
twitter.py - Twitter/X post extraction
truthsocial.py - Truth Social post extraction
read_reddit.py - Reddit post and comments extraction
ptt.py - PTT forum post extraction
fetch_billgertz_tweet.py - Real-world Twitter scraping example

Troubleshooting

Playwright browser not installed

Error: Executable doesn't exist at /path/to/chromium

Solution: Install Playwright browsers after installing kabigon:

playwright install chromium

FFmpeg not found (for audio transcription)

Error: ffmpeg not found

Solution: Install FFmpeg for your platform:

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Or set custom FFmpeg path:

export FFMPEG_PATH=/path/to/ffmpeg
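To check which FFmpeg binary is visible before running an audio-transcription loader, the small sketch below prefers FFMPEG_PATH and otherwise searches PATH. It mirrors the two options above but is not kabigon's own lookup code; find_ffmpeg is a hypothetical helper.

```python
from __future__ import annotations

import os
import shutil


def find_ffmpeg() -> str | None:
    """Return a usable ffmpeg path, or None if none is found (hypothetical helper)."""
    custom = os.environ.get("FFMPEG_PATH")
    if custom:
        # Honor the override only if it points at an executable file.
        return custom if os.path.isfile(custom) and os.access(custom, os.X_OK) else None
    # shutil.which searches PATH the same way the shell does.
    return shutil.which("ffmpeg")


path = find_ffmpeg()
print(path or "no usable ffmpeg found: install it or set FFMPEG_PATH")
```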

Timeout errors

Error: Timeout 30000ms exceeded

Solution: Increase timeout for slow-loading pages:

import kabigon

url = "https://example.com"

# Increase timeout to 60 seconds (the value is in milliseconds)
loader = kabigon.PlaywrightLoader(timeout=60_000)
content = loader.load_sync(url)
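The timeout parameter above applies to PlaywrightLoader page loads. If you also want a hard upper bound on any loader call in async code, one option is to wrap it in asyncio.wait_for. The sketch below uses a fake slow loader in place of loader.load; load_with_deadline is a hypothetical helper, not part of kabigon.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def load_with_deadline(
    load: Callable[[str], Awaitable[str]], url: str, seconds: float
) -> str | None:
    """Return the loaded text, or None if it takes longer than `seconds` (hypothetical helper)."""
    try:
        # wait_for cancels the pending load once the deadline passes.
        return await asyncio.wait_for(load(url), timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def slow_load(url: str) -> str:
    # Stand-in for loader.load(url): pretends the page takes half a second.
    await asyncio.sleep(0.5)
    return f"content of {url}"


print(asyncio.run(load_with_deadline(slow_load, "https://example.com", seconds=0.05)))  # None
```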

CAPTCHA or rate limiting

Some websites may show CAPTCHAs or block automated access. For Reddit, kabigon automatically uses old.reddit.com to avoid CAPTCHAs. For other sites, you may need to:

Add delays between requests
Use a custom user agent
Implement retry logic with exponential backoff
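Retry with exponential backoff can be sketched as a wrapper around any async load function such as kabigon.load_url or loader.load. load_with_retry, its parameters, and the choice to retry on every exception are assumptions for illustration; in practice you may want to retry only on errors that look transient. The Flaky class is a stand-in loader so the snippet runs offline.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def load_with_retry(
    load: Callable[[str], Awaitable[str]],
    url: str,
    attempts: int = 4,
    base_delay: float = 1.0,
) -> str:
    """Call `load(url)`, retrying with exponential backoff (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            return await load(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% random jitter.
            delay = base_delay * 2**attempt
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be >= 1")  # only reached when attempts < 1


class Flaky:
    """Stand-in loader that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def load(self, url: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("rate limited")
        return f"content of {url}"


flaky = Flaky(failures=2)
result = asyncio.run(load_with_retry(flaky.load, "https://example.com", base_delay=0.01))
print(result, flaky.calls)
```

Here the first two calls fail and the third succeeds, so flaky.calls ends at 3.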

Development

Setup

# Clone the repository
git clone https://github.com/narumiruna/kabigon.git
cd kabigon

# Install dependencies with uv
uv sync

# Install Playwright browsers
uv run playwright install chromium

Testing

# Run all tests with coverage
uv run pytest -v -s --cov=src tests

# Run specific test file
uv run pytest -v -s tests/loaders/test_youtube.py

Current test coverage: 69% (37 tests passing)

Linting and Type Checking

# Run linter
uv run ruff check .

# Run type checker
uv run ty check .

# Auto-fix linting issues
uv run ruff check --fix .

# Format code
uv run ruff format .

Building and Publishing

# Build wheel
uv build --wheel

# Publish to PyPI
uv publish

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

When adding a new loader:

Create a new file in src/kabigon/loaders/
Inherit from the Loader base class
Implement async def load(self, url: str) -> str
Add domain validation
Add tests in tests/loaders/
Update documentation
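The steps above might look like the sketch below. kabigon's real Loader base class is not reproduced here; to keep the snippet self-contained it defines a stand-in Loader with the async load(self, url) -> str method described above plus a load_sync helper. ExampleLoader, the example.com domain check, and fetching with urllib in a worker thread are illustrative assumptions; check the existing loaders in src/kabigon/loaders/ for the actual base class, error types, and HTTP client.

```python
import asyncio
import urllib.request
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class Loader(ABC):
    """Stand-in for kabigon's Loader base class (assumed shape, for illustration)."""

    @abstractmethod
    async def load(self, url: str) -> str: ...

    def load_sync(self, url: str) -> str:
        return asyncio.run(self.load(url))


class ExampleLoader(Loader):
    """Hypothetical loader that only accepts example.com URLs."""

    allowed_hosts = {"example.com", "www.example.com"}

    def check_url(self, url: str) -> None:
        # Domain validation: reject URLs this loader should not handle,
        # so a loader chain can move on to the next loader.
        host = (urlparse(url).hostname or "").lower()
        if host not in self.allowed_hosts:
            raise ValueError(f"ExampleLoader does not support {url}")

    async def load(self, url: str) -> str:
        self.check_url(url)
        # urllib is blocking, so run it in a worker thread to keep the event loop free.
        # A real loader would usually also convert the HTML to Markdown.
        return await asyncio.to_thread(self._fetch, url)

    @staticmethod
    def _fetch(url: str) -> str:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
```

A loader written this way could then sit in a Compose list ahead of the generic PlaywrightLoader() fallback.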

See CLAUDE.md for detailed development guidelines.

License

MIT License - see LICENSE file for details.

Release history

Version	Release date
0.19.2	May 3, 2026
0.19.1	May 3, 2026
0.19.0	May 2, 2026
0.18.2	Apr 5, 2026
0.18.1	Apr 3, 2026
0.18.0	Mar 31, 2026
0.17.6	Mar 30, 2026
0.17.5	Mar 30, 2026
0.17.4 (this version)	Mar 23, 2026
0.17.3	Mar 17, 2026
0.17.2	Mar 17, 2026
0.17.1	Mar 17, 2026
0.17.0	Mar 17, 2026
0.16.4	Feb 12, 2026
0.16.3	Feb 12, 2026
0.16.2	Feb 12, 2026
0.16.1	Feb 5, 2026
0.16.0	Jan 25, 2026
0.15.0	Jan 21, 2026
0.14.3	Jan 15, 2026
0.14.2	Jan 15, 2026
0.14.1	Jan 12, 2026
0.14.0	Jan 4, 2026
0.13.0	Jan 3, 2026
0.12.0	Jan 3, 2026
0.11.0	Jan 3, 2026
0.10.1	Jan 2, 2026
0.10.0	Jan 2, 2026
0.9.4	Oct 29, 2025
0.9.3	Oct 29, 2025
0.9.2	Sep 30, 2025
0.8.15	Sep 9, 2025
0.8.14	Sep 1, 2025
0.8.13	Sep 1, 2025
0.8.12	Aug 26, 2025
0.8.11	Jul 17, 2025
0.8.10	May 13, 2025
0.8.9	May 13, 2025
0.8.8	May 12, 2025
0.8.7	May 10, 2025
0.8.6	May 8, 2025
0.8.5	May 6, 2025
0.8.4	May 3, 2025
0.8.3	May 3, 2025
0.8.2	May 3, 2025
0.8.1	May 1, 2025
0.8.0	May 1, 2025
0.7.0	May 1, 2025
0.6.1	May 1, 2025
0.6.0	Apr 28, 2025
0.5.3	Apr 8, 2025
0.5.2	Mar 23, 2025
0.5.1	Mar 23, 2025
0.5.0	Mar 22, 2025
0.4.2	Mar 21, 2025
0.4.1	Mar 15, 2025
0.4.0	Mar 10, 2025
0.3.1	Feb 17, 2025
0.3.0	Feb 11, 2025
0.2.3	Feb 9, 2025
0.2.2	Feb 9, 2025
0.2.1	Feb 9, 2025
0.2.0	Feb 9, 2025
0.1.0	Feb 9, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

kabigon-0.17.4-py3-none-any.whl (30.0 kB), uploaded Mar 23, 2026, Python 3

File details

Details for the file kabigon-0.17.4-py3-none-any.whl.

File metadata

Upload date: Mar 23, 2026
Size: 30.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.12 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for kabigon-0.17.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dd39ba9eb135c59d44328a8b3f8a1350b7b682ba286d63758b88c67900cd49d9`
MD5	`543538eb330389269c5a9284075fda29`
BLAKE2b-256	`77693d5529e3d20be04504e8d99a8a83ba00324aee080adf25ad16a3dde35edb`
