
Analyst-friendly scrapers with TUI and uvx support

Project description

Inklog

Inklog is a collection of uvx-friendly scrapers for analysts. It provides a CLI for automation and a TUI (work in progress) for browsing and running the available scrapers. The current focus is downloading files (PDF/EPUB); webpage-to-markdown scraping is planned but not implemented yet.

All dates in this repository use the ISO 8601 format YYYY-MM-DD.
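Every command below takes a start and an end date in this form. If you script around the CLI, a minimal Python sketch of validating such a pair may help. The names parse_iso_date and parse_date_range are hypothetical (not part of inklog), and inklog's own validation may differ. The regex is there because date.fromisoformat on Python 3.11+ also accepts compact forms such as 20251101.

```python
import re
from datetime import date

# Strict YYYY-MM-DD; date.fromisoformat alone is more permissive on Python 3.11+.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(value: str) -> date:
    """Parse a YYYY-MM-DD string, rejecting other layouts and impossible dates."""
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # raises ValueError for e.g. 2025-02-30


def parse_date_range(start: str, end: str) -> tuple[date, date]:
    """Hypothetical helper: parse both ends of a range and check their order."""
    start_date, end_date = parse_iso_date(start), parse_iso_date(end)
    if start_date > end_date:
        raise ValueError(f"start {start} is after end {end}")
    return start_date, end_date


if __name__ == "__main__":
    print(parse_date_range("2025-11-01", "2025-11-30"))
```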

Quick start

Run a scraper (CLI)

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30

List scrapers

uvx inklog list

Show example URLs

uvx inklog info malaysia_parliament

Run the TUI

uvx inklog

Run a single document type:

uvx inklog run malaysia_parliament:lower_house_hansard 2025-11-01 2025-11-30
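The scraper:doc_type form narrows a run to one document type. How inklog splits this argument internally is not documented here; a minimal sketch, assuming a single colon separates the scraper name from the document type (parse_target is a hypothetical name):

```python
from typing import Optional, Tuple


def parse_target(target: str) -> Tuple[str, Optional[str]]:
    """Split 'scraper' or 'scraper:doc_type' into (scraper, doc_type or None)."""
    scraper, sep, doc_type = target.partition(":")
    # Reject empty parts and extra colons such as 'a:b:c'.
    if not scraper or (sep and not doc_type) or ":" in doc_type:
        raise ValueError(f"expected 'scraper' or 'scraper:doc_type', got {target!r}")
    return scraper, (doc_type or None)


if __name__ == "__main__":
    print(parse_target("malaysia_parliament"))
    print(parse_target("malaysia_parliament:lower_house_hansard"))
```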

TUI

The TUI is a work in progress. It shows the available scrapers, their descriptions, and lets you run a scraper with date ranges and options. Each Malaysia Parliament document type appears as its own row for quick runs.

Key bindings:

  • Enter or Run button: run the selected scraper
  • Example URLs can be opened from the details pane
  • Q: quit

Malaysia Parliament scraper

This scraper downloads PDFs from the Malaysia Parliament site for the following document types:

  • Dewan Rakyat - Jawapan Lisan (Oral Answers)
  • Dewan Rakyat - Jawapan Bukan Lisan (Non-Oral Answers)
  • Dewan Rakyat - Penyata Rasmi (Official Report)
  • Dewan Negara - Jawapan Bertulis (Written Answers)

Filename rules

Some sources change their filename patterns over the years. Inklog models this using date-bound filename rules, so each document type can define a list of templates tied to start/end dates. The Malaysia Parliament scraper ships with the current pattern and a rules table that can be extended when historical patterns are known.
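A minimal Python sketch of date-bound rules, assuming each rule is a str.format template with optional inclusive start/end dates and that the first matching rule wins. FilenameRule, filename_for and both templates are hypothetical illustrations; they are not inklog's classes or the real Malaysia Parliament filename patterns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class FilenameRule:
    """Hypothetical rule: a filename template valid between two inclusive dates."""

    template: str  # str.format template; {date:...} uses strftime codes
    start: Optional[date] = None  # None means "no lower bound"
    end: Optional[date] = None  # None means "still in use"

    def covers(self, day: date) -> bool:
        return (self.start is None or self.start <= day) and (
            self.end is None or day <= self.end
        )


def filename_for(rules: Sequence[FilenameRule], day: date) -> str:
    """Return the filename from the first rule whose date range covers day."""
    for rule in rules:
        if rule.covers(day):
            return rule.template.format(date=day)
    raise LookupError(f"no filename rule covers {day.isoformat()}")


# Invented patterns for illustration only.
RULES = [
    FilenameRule("DR-{date:%d%m%Y}.pdf", start=date(2020, 1, 1)),
    FilenameRule("DR{date:%Y%m%d}.pdf", end=date(2019, 12, 31)),
]

if __name__ == "__main__":
    print(filename_for(RULES, date(2025, 11, 3)))  # DR-03112025.pdf
    print(filename_for(RULES, date(2018, 3, 5)))  # DR20180305.pdf
```

With this shape, supporting a newly discovered historical pattern means appending one rule with the right bounds, and a gap between rules surfaces as a LookupError rather than a silently wrong URL.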

CLI options

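inklog run accepts --set name=value to override boolean options. For example, to fetch the three Dewan Rakyat document types but skip Dewan Negara written answers: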

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30 \
  --set jawapan_lisan_rakyat=true \
  --set jawapan_bukan_lisan_rakyat=true \
  --set penyata_rasmi_rakyat=true \
  --set jawapan_bertulis_negara=false
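Each override is a plain key=value pair. A minimal sketch of parsing such pairs and merging them over defaults, assuming only the literals true and false are accepted (the examples show no others) and that unknown keys are an error. parse_overrides and the all-true DEFAULTS are hypothetical, not inklog's implementation.

```python
from typing import Dict, Iterable, Mapping

BOOL_LITERALS = {"true": True, "false": False}


def parse_overrides(pairs: Iterable[str], known: Mapping[str, bool]) -> Dict[str, bool]:
    """Turn ['name=true', ...] into {'name': True, ...}, validating each pair."""
    overrides: Dict[str, bool] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key not in known:
            raise ValueError(f"unknown option {key!r}")
        value = raw.strip().lower()
        if value not in BOOL_LITERALS:
            raise ValueError(f"{key} must be true or false, got {raw!r}")
        overrides[key] = BOOL_LITERALS[value]
    return overrides


# Hypothetical defaults: every document type enabled.
DEFAULTS = {
    "jawapan_lisan_rakyat": True,
    "jawapan_bukan_lisan_rakyat": True,
    "penyata_rasmi_rakyat": True,
    "jawapan_bertulis_negara": True,
}

if __name__ == "__main__":
    # Later keys win, so overrides replace the matching defaults.
    options = {**DEFAULTS, **parse_overrides(["jawapan_bertulis_negara=false"], DEFAULTS)}
    print(options)
```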

Adding a new scraper

  1. Copy src/inklog/scrapers/template.py and rename it.
  2. Update ScraperMeta and implement run().
  3. Ensure the module exports SCRAPER.

Scrapers are auto-discovered from the inklog.scrapers package.
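Auto-discovery like this is commonly built on the standard library's pkgutil.iter_modules and importlib.import_module. A minimal sketch, assuming a scraper is any non-private module in the package with a module-level SCRAPER attribute; discover_scrapers is hypothetical, and skipping template.py is an assumption, since it is not stated whether the template itself is registered.

```python
import importlib
import pkgutil
from typing import Any, Dict, Iterable


def discover_scrapers(
    package_name: str, exclude: Iterable[str] = ("template",)
) -> Dict[str, Any]:
    """Import each top-level module of a package and collect its SCRAPER export."""
    package = importlib.import_module(package_name)
    skipped = set(exclude)  # assumption: the copyable template is not a real scraper
    found: Dict[str, Any] = {}
    # iter_modules lists the modules directly inside the package's directory.
    for info in pkgutil.iter_modules(package.__path__):
        if info.ispkg or info.name.startswith("_") or info.name in skipped:
            continue
        module = importlib.import_module(f"{package_name}.{info.name}")
        scraper = getattr(module, "SCRAPER", None)
        if scraper is not None:  # modules without SCRAPER are ignored
            found[info.name] = scraper
    return found
```

With inklog installed, discover_scrapers("inklog.scrapers") would map module names to their SCRAPER objects, which is why step 3 matters: a module that forgets to export SCRAPER is silently skipped.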

Development

uv sync --extra dev
uv run ruff check --fix .
uv run ruff format .
uv run pytest

Roadmap

  • Add markdown scraping mode (TBD).
  • Expand filename rules for historical Malaysia Parliament patterns.


Download files

Download the file for your platform.

Source Distribution

inklog-0.1.2.tar.gz (31.1 kB)

Uploaded Source

Built Distribution


inklog-0.1.2-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file inklog-0.1.2.tar.gz.

File metadata

  • Download URL: inklog-0.1.2.tar.gz
  • Upload date:
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.28 (uv publish, macOS)

File hashes

Hashes for inklog-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       f2caa35307f1d5e8f757d1a7b69965ec71de5b11c77b9746d92f840725a92c6c
MD5          0e5122ae5c59d8d93cb7b72e7bf7d8d4
BLAKE2b-256  a274ca337f487eec11a254fb7495c6707b46938298ff5c7d2d4b60e218783372


File details

Details for the file inklog-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: inklog-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.28 (uv publish, macOS)

File hashes

Hashes for inklog-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       2b05ae085aa0a5849101eababdd50089c63e383df52c99fb3885a04c9284e4c7
MD5          d1edf4e2ce9ab9944ea5bddd1bd0b649
BLAKE2b-256  ff026ad7bff812eb7b8cf20c399f7137fe06b963736440414836cd8856341c9f
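To check a downloaded file against the SHA256 digests listed for inklog 0.1.2, here is a minimal sketch using Python's hashlib. It assumes the files sit in the current directory under their published names. pip can also enforce digests itself in its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables for inklog 0.1.2.
EXPECTED_SHA256 = {
    "inklog-0.1.2.tar.gz": "f2caa35307f1d5e8f757d1a7b69965ec71de5b11c77b9746d92f840725a92c6c",
    "inklog-0.1.2-py3-none-any.whl": "2b05ae085aa0a5849101eababdd50089c63e383df52c99fb3885a04c9284e4c7",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, skipped")
        elif sha256_of(path) == expected:
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH")
```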

