
Analyst-friendly scrapers with TUI and uvx support

Project description

Inklog

Inklog is a collection of uvx-friendly scrapers for analysts. It provides a terminal user interface (TUI) for browsing and running the available scrapers, and a CLI for automation. The current focus is downloading files (PDF and EPUB). Webpage-to-Markdown scraping is planned but not yet implemented.

All dates in this repository use the ISO 8601 format: YYYY-MM-DD.
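The CLI takes a start date and an end date in this format. Below is a minimal sketch of how such a range can be parsed and sanity-checked with Python's standard library. The helper name `parse_date_range` is hypothetical and not part of Inklog's API.

```python
from datetime import date


def parse_date_range(start: str, end: str) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and check that start <= end.

    Hypothetical helper for illustration; not part of Inklog.
    """
    start_date = date.fromisoformat(start)  # raises ValueError on malformed input
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"start date {start_date} is after end date {end_date}")
    return start_date, end_date


print(parse_date_range("2025-11-01", "2025-11-30"))
```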

Quick start

Run the TUI

uvx inklog

List scrapers

uvx inklog list

Show example URLs

uvx inklog info malaysia_parliament

Run a scraper (CLI)

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30

Run a single document type:

uvx inklog run malaysia_parliament:lower_house_hansard 2025-11-01 2025-11-30
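The target is either a scraper name or `scraper:document_type`. One way to split that argument is sketched below. The function `parse_target` is hypothetical, and Inklog's real argument handling may differ.

```python
from __future__ import annotations


def parse_target(target: str) -> tuple[str, str | None]:
    """Split "scraper[:document_type]" into (scraper, document_type or None).

    Hypothetical helper for illustration; not Inklog's actual parser.
    """
    scraper, sep, doc_type = target.partition(":")
    # Reject an empty scraper name or a trailing ":" with no document type.
    if not scraper or (sep and not doc_type):
        raise ValueError(f"invalid target: {target!r}")
    return scraper, doc_type or None


print(parse_target("malaysia_parliament:lower_house_hansard"))
```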

TUI

The TUI lists the available scrapers and their descriptions, and lets you run a scraper with a date range and options. Each Malaysia Parliament document type appears as its own row, so it can be run on its own.

Key bindings:

  • Enter or Run button: run the selected scraper
  • Example URLs can be opened from the details pane
  • Q: quit

Malaysia Parliament scraper

This scraper downloads PDFs from the Malaysia Parliament site for the following document types:

  • Dewan Rakyat - Jawapan Lisan (Oral Answers)
  • Dewan Rakyat - Jawapan Bukan Lisan (Non-Oral Answers)
  • Dewan Rakyat - Penyata Rasmi (Official Report)
  • Dewan Negara - Jawapan Bertulis (Written Answers)

Filename rules

Some sources change their filename patterns over the years. Inklog models this using date-bound filename rules, so each document type can define a list of templates tied to start/end dates. The Malaysia Parliament scraper ships with the current pattern and a rules table that can be extended when historical patterns are known.
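Inklog's actual rule format is not documented on this page. The sketch below illustrates the idea of date-bound rules: each rule carries a template and an optional start/end date, and the first rule whose window contains the document date produces the filename. The class `FilenameRule`, the function `filename_for`, and both templates are hypothetical and do not reflect the real Parliament filenames.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FilenameRule:
    """A filename template valid between two dates (inclusive).

    A None bound means the rule is open-ended on that side.
    """

    template: str
    start: date | None = None
    end: date | None = None

    def applies_to(self, day: date) -> bool:
        return (self.start is None or self.start <= day) and (
            self.end is None or day <= self.end
        )


def filename_for(rules: list[FilenameRule], day: date) -> str:
    """Render the first rule whose date window contains `day`."""
    for rule in rules:
        if rule.applies_to(day):
            # date.__format__ passes the spec to strftime, e.g. {date:%d%m%Y}.
            return rule.template.format(date=day)
    raise LookupError(f"no filename rule covers {day.isoformat()}")


# Hypothetical patterns for illustration only.
RULES = [
    FilenameRule("DR-{date:%d%m%Y}.pdf", start=date(2020, 1, 1)),
    FilenameRule("DR{date:%d%m%y}.pdf", end=date(2019, 12, 31)),
]

print(filename_for(RULES, date(2025, 11, 3)))  # DR-03112025.pdf
```

Extending support to an older pattern then amounts to appending another rule with the appropriate date window.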

CLI options

inklog run supports overriding boolean options with --set:

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30 \
  --set jawapan_lisan_rakyat=true \
  --set jawapan_bukan_lisan_rakyat=true \
  --set penyata_rasmi_rakyat=true \
  --set jawapan_bertulis_negara=false
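Each `--set` flag carries a `key=value` pair. The sketch below shows one way to turn repeated flags into boolean overrides. The function `parse_set_overrides` is hypothetical, and it assumes only `true`/`false` (in any case) are valid values, as in the example above.

```python
from __future__ import annotations


def parse_set_overrides(pairs: list[str]) -> dict[str, bool]:
    """Turn ["key=true", "other=false"] into {"key": True, "other": False}.

    Hypothetical helper; only "true"/"false" (any case) are accepted here.
    """
    overrides: dict[str, bool] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not key or not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        value = raw.strip().lower()
        if value not in ("true", "false"):
            raise ValueError(f"expected true or false, got {pair!r}")
        # In this sketch, a later pair for the same key overwrites an earlier one.
        overrides[key] = value == "true"
    return overrides


print(parse_set_overrides(["jawapan_lisan_rakyat=true", "jawapan_bertulis_negara=false"]))
```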

Adding a new scraper

  1. Copy src/inklog/scrapers/template.py and rename it.
  2. Update ScraperMeta and implement run().
  3. Ensure the module exports SCRAPER.
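The exact fields of `ScraperMeta` and the signature of `run()` are not documented here. The sketch below is therefore a hypothetical outline of a scraper module. `ScraperMeta` is redefined locally as a stand-in so the example runs on its own, and `MySourceScraper`, its fields, and its `run()` parameters are assumptions.

```python
"""Hypothetical outline of a module such as src/inklog/scrapers/my_source.py."""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class ScraperMeta:  # stand-in for illustration; not Inklog's definition
    name: str
    description: str
    example_urls: list[str] = field(default_factory=list)


class MySourceScraper:
    meta = ScraperMeta(
        name="my_source",
        description="Downloads PDFs from an example site.",
        example_urls=["https://example.com/documents"],
    )

    def run(self, start: date, end: date, out_dir: Path) -> list[Path]:
        """Would download documents dated start..end into out_dir.

        This stub only creates out_dir and returns an empty list.
        """
        out_dir.mkdir(parents=True, exist_ok=True)
        saved: list[Path] = []
        # Placeholder: fetch listings for the date range and save files here.
        return saved


# Module-level export that the steps above require.
SCRAPER = MySourceScraper()
```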

Scrapers are auto-discovered from the inklog.scrapers package.
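Auto-discovery of this kind is commonly built on the standard library's `pkgutil` and `importlib`. The sketch below is not necessarily how Inklog does it. It imports each submodule of a package and keeps the ones that define `SCRAPER`. The function name `discover_scrapers` is hypothetical.

```python
import importlib
import pkgutil
from types import ModuleType


def discover_scrapers(package: ModuleType) -> dict[str, object]:
    """Import every submodule of `package` and collect its SCRAPER attribute."""
    found: dict[str, object] = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        scraper = getattr(module, "SCRAPER", None)
        if scraper is not None:  # modules without SCRAPER (e.g. helpers) are skipped
            found[info.name] = scraper
    return found
```

Under these assumptions, usage would look like `discover_scrapers(inklog.scrapers)` after `import inklog.scrapers`.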

Development

uv sync --extra dev
uv run ruff check --fix .
uv run ruff format .
uv run pytest

Roadmap

  • Add a Markdown scraping mode (TBD).
  • Expand filename rules for historical Malaysia Parliament patterns.

Download files

Source Distribution

inklog-0.1.0.tar.gz (25.7 kB)

Built Distribution

inklog-0.1.0-py3-none-any.whl (18.0 kB, Python 3)

File details

Details for the file inklog-0.1.0.tar.gz.

File metadata

  • Download URL: inklog-0.1.0.tar.gz
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.28 (uv publish, macOS)

File hashes

Hashes for inklog-0.1.0.tar.gz
Algorithm Hash digest
SHA256 621be75131932378f4005873cdd4377d2f9c778207a0d5094bd7862b874dd6d6
MD5 b0be4b770e68854e596e4060c887c2ca
BLAKE2b-256 3173dfaf18bea9931b1ce7fc09ad2b7ef59d3367008b06724696480df6856936
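A downloaded file can be checked against the SHA256 digest listed above. The following minimal sketch uses Python's `hashlib` and assumes the source distribution was saved to the current directory. The helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "621be75131932378f4005873cdd4377d2f9c778207a0d5094bd7862b874dd6d6"
path = Path("inklog-0.1.0.tar.gz")  # assumed download location
if path.exists():
    print("OK" if sha256_of(path) == EXPECTED else "MISMATCH")
```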

File details

Details for the file inklog-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: inklog-0.1.0-py3-none-any.whl
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.9.28 (uv publish, macOS)

File hashes

Hashes for inklog-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94beb0bb8097c00507df4c1f12a8b7757d3461322ecc3b0f83932eb1b4a5adb5
MD5 d405608b16a6b5e47c7943e8f8ca841a
BLAKE2b-256 9af23c1391c306248142cda6e1742823319c0bc9e855f7366f8d1bea5e7980ae
