
Analyst-friendly scrapers with TUI and uvx support


Inklog

Inklog is a collection of uvx-friendly scrapers for analysts. It provides a CLI for automation and a TUI (work in progress) for browsing available scrapers. The current focus is downloading files (PDF/EPUB). Webpage-to-markdown scraping is planned but not implemented yet.

All dates in this repository use the ISO 8601 calendar date format: YYYY-MM-DD.
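As an illustration of the date convention, the sketch below parses a start/end pair with the standard library and lists the days in between. The helper names `parse_range` and `days_in_range` are hypothetical and not part of Inklog, and treating the end date as inclusive is an assumption.

```python
from datetime import date, timedelta


def parse_range(start: str, end: str) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and check that start <= end."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"start {start} is after end {end}")
    return start_date, end_date


def days_in_range(start_date: date, end_date: date) -> list[date]:
    """Return every calendar day from start to end (end assumed inclusive)."""
    span = (end_date - start_date).days
    return [start_date + timedelta(days=offset) for offset in range(span + 1)]


start_date, end_date = parse_range("2025-11-01", "2025-11-30")
print(len(days_in_range(start_date, end_date)))  # 30
```

`date.fromisoformat` raises `ValueError` on a malformed date, so a typo such as 2025-13-01 fails before any download starts.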

Quick start

Run a scraper (CLI)

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30

List scrapers

uvx inklog list

Show example URLs

uvx inklog info malaysia_parliament

Run the TUI

uvx inklog

Run a single document type

uvx inklog run malaysia_parliament:lower_house_hansard 2025-11-01 2025-11-30
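The command above addresses a document type as scraper:document_type. A minimal sketch of splitting such a target, assuming a single colon separator as in the example; `split_target` is a hypothetical helper, not Inklog's actual parser.

```python
from typing import Optional


def split_target(target: str) -> tuple[str, Optional[str]]:
    """Split 'scraper:doc_type' into its parts; doc_type is None if absent."""
    scraper, sep, doc_type = target.partition(":")
    if not scraper:
        raise ValueError(f"missing scraper name in {target!r}")
    if sep and not doc_type:
        raise ValueError(f"missing document type after ':' in {target!r}")
    return scraper, (doc_type if sep else None)


print(split_target("malaysia_parliament:lower_house_hansard"))
print(split_target("malaysia_parliament"))
```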

TUI

The TUI is a work in progress. It lists the available scrapers with their descriptions and lets you run a scraper with a date range and options. Each Malaysia Parliament document type appears as its own row for quick runs.

Key bindings:

  • Enter or Run button: run the selected scraper
  • Details pane: open a scraper's example URLs
  • Q: quit

Malaysia Parliament scraper

This scraper downloads PDFs from the Malaysia Parliament site for the following document types:

  • Dewan Rakyat - Jawapan Lisan (Oral Answers)
  • Dewan Rakyat - Jawapan Bukan Lisan (Non-Oral Answers)
  • Dewan Rakyat - Penyata Rasmi (Official Report)
  • Dewan Negara - Jawapan Bertulis (Written Answers)

Filename rules

Some sources change their filename patterns over the years. Inklog models this with date-bound filename rules: each document type defines a list of templates, and each template is tied to a start date and an end date. The Malaysia Parliament scraper ships with the current pattern and a rules table that can be extended when historical patterns are known.
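To make the idea of date-bound rules concrete, here is a minimal sketch. The `FilenameRule` class, the `filename_for` function, and both templates are invented for illustration; they are not Inklog's real data model or the actual Malaysia Parliament filename patterns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FilenameRule:
    template: str                 # str.format template; {d} is the document date
    start: Optional[date] = None  # None means no lower bound
    end: Optional[date] = None    # None means no upper bound

    def applies_to(self, day: date) -> bool:
        after_start = self.start is None or day >= self.start
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def filename_for(rules: list[FilenameRule], day: date) -> str:
    """Return the filename from the first rule whose date window covers day."""
    for rule in rules:
        if rule.applies_to(day):
            return rule.template.format(d=day)
    raise LookupError(f"no filename rule covers {day.isoformat()}")


# Made-up patterns: an older scheme until the end of 2019, a newer one after.
RULES = [
    FilenameRule("DR_{d:%Y%m%d}.pdf", end=date(2019, 12, 31)),
    FilenameRule("DR-{d:%d%m%Y}.pdf", start=date(2020, 1, 1)),
]

print(filename_for(RULES, date(2025, 11, 3)))  # DR-03112025.pdf
print(filename_for(RULES, date(2018, 3, 5)))   # DR_20180305.pdf
```

Keeping the rules in an ordered list means a newly discovered historical pattern is added as one more entry, with no change to the lookup logic.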

CLI options

inklog run supports overriding boolean options with repeated --set key=value flags:

uvx inklog run malaysia_parliament 2025-11-01 2025-11-30 \
  --set jawapan_lisan_rakyat=true \
  --set jawapan_bukan_lisan_rakyat=true \
  --set penyata_rasmi_rakyat=true \
  --set jawapan_bertulis_negara=false
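For readers curious how repeated key=value flags can be collected, the sketch below uses argparse from the standard library. It is not Inklog's actual CLI code: `parse_override` is a hypothetical helper, and accepting only the literal words true and false is an assumption based on the example above.

```python
import argparse


def parse_override(text: str) -> tuple[str, bool]:
    """Turn 'key=true' or 'key=false' into a (key, bool) pair."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {text!r}")
    value = raw.strip().lower()
    if value == "true":
        return key, True
    if value == "false":
        return key, False
    raise argparse.ArgumentTypeError(f"expected true or false, got {raw!r}")


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--set",
    dest="overrides",
    action="append",      # each --set adds one (key, bool) pair
    type=parse_override,
    default=[],
)

args = parser.parse_args(
    ["--set", "jawapan_lisan_rakyat=true", "--set", "jawapan_bertulis_negara=false"]
)
options = dict(args.overrides)
print(options)
# {'jawapan_lisan_rakyat': True, 'jawapan_bertulis_negara': False}
```

Because the pairs are folded into a dict in order, a later --set for the same key overrides an earlier one in this sketch.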

Adding a new scraper

  1. Copy src/inklog/scrapers/template.py and rename it.
  2. Update ScraperMeta and implement run().
  3. Ensure the module exports SCRAPER.

Scrapers are auto-discovered from the inklog.scrapers package.
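One common way to implement this kind of auto-discovery is to import every module in a package and collect its SCRAPER attribute. The sketch below shows that technique on a throwaway package written to a temporary directory; `discover_scrapers`, `build_demo_package`, and the `demo_scrapers` package are hypothetical, and Inklog's real discovery code may work differently.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_scrapers(package_name: str) -> dict[str, object]:
    """Import each module in a package and collect its SCRAPER attribute."""
    package = importlib.import_module(package_name)
    found: dict[str, object] = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{module_info.name}")
        scraper = getattr(module, "SCRAPER", None)
        if scraper is not None:  # modules without SCRAPER are skipped
            found[module_info.name] = scraper
    return found


def build_demo_package(root: Path) -> str:
    """Write a tiny stand-in package with one scraper and one helper module."""
    pkg = root / "demo_scrapers"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "alpha.py").write_text("SCRAPER = {'name': 'alpha'}\n")
    (pkg / "helpers.py").write_text("VALUE = 1\n")
    return "demo_scrapers"


with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        name = build_demo_package(Path(tmp))
        importlib.invalidate_caches()  # make the new package visible to import
        found = discover_scrapers(name)
    finally:
        sys.path.remove(tmp)

print(found)  # {'alpha': {'name': 'alpha'}}
```

Under this approach a module that forgets to export SCRAPER is silently ignored, which is why step 3 above matters.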

Development

uv sync --extra dev
uv run ruff check --fix .
uv run ruff format .
uv run pytest

Roadmap

  • Add markdown scraping mode (TBD).
  • Expand filename rules for historical Malaysia Parliament patterns.

