Analyst-friendly scrapers with TUI and uvx support
Inklog
Inklog is a collection of uvx-friendly scrapers for analysts. It provides a CLI for automation and a TUI (work in progress) for browsing available scrapers. The current focus is downloading files (PDF/EPUB). Webpage-to-markdown scraping is planned but not implemented yet.
All dates in this repository use the ISO 8601 format: YYYY-MM-DD.
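As a rough illustration of the date convention, the sketch below validates YYYY-MM-DD strings and expands a start/end pair into individual days using only the Python standard library. The helper names `parse_iso_date` and `days_between` are hypothetical and not part of Inklog, and treating the end date as inclusive is an assumption.

```python
from datetime import date, timedelta


def parse_iso_date(text: str) -> date:
    """Parse a strict YYYY-MM-DD string; raise ValueError on anything else."""
    # Newer Python versions let date.fromisoformat accept other ISO 8601
    # forms too, so check the YYYY-MM-DD shape explicitly first.
    if len(text) != 10 or text[4] != "-" or text[7] != "-":
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    return date.fromisoformat(text)


def days_between(start: str, end: str) -> list[date]:
    """Return every date from start to end (inclusive is an assumption)."""
    first, last = parse_iso_date(start), parse_iso_date(end)
    if last < first:
        raise ValueError("end date is before start date")
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]
```

Calling `days_between("2025-11-01", "2025-11-30")` yields one `date` per day in November 2025, which is the kind of range the quick-start commands pass on the command line.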
Quick start
Run a scraper (CLI)
uvx inklog run malaysia_parliament 2025-11-01 2025-11-30
List scrapers
uvx inklog list
Show example URLs
uvx inklog info malaysia_parliament
Run the TUI
uvx inklog
Run a single document type:
uvx inklog run malaysia_parliament:lower_house_hansard 2025-11-01 2025-11-30
TUI
The TUI is a work in progress. It shows the available scrapers, their descriptions, and lets you run a scraper with date ranges and options. Each Malaysia Parliament document type appears as its own row for quick runs.
Key bindings:
- Enter or Run button: run the selected scraper
- Q: quit
- Example URLs can be opened from the details pane
Malaysia Parliament scraper
This scraper downloads PDFs from the Malaysia Parliament site for the following document types:
- Dewan Rakyat - Jawapan Lisan (Oral Answers)
- Dewan Rakyat - Jawapan Bukan Lisan (Non-Oral Answers)
- Dewan Rakyat - Penyata Rasmi (Official Report)
- Dewan Negara - Jawapan Bertulis (Written Answers)
Filename rules
Some sources change their filename patterns over the years. Inklog models this using date-bound filename rules, so each document type can define a list of templates tied to start/end dates. The Malaysia Parliament scraper ships with the current pattern and a rules table that can be extended when historical patterns are known.
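One way to picture date-bound filename rules is a list of templates, each with an optional start and end date, where the first rule covering a given day wins. The sketch below is a minimal illustration under that assumption; `FilenameRule`, `filename_for`, and both templates are hypothetical and do not reflect Inklog's actual rules table or the real Malaysia Parliament filename patterns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FilenameRule:
    template: str                 # str.format template with a {day} field
    start: Optional[date] = None  # None means no lower bound
    end: Optional[date] = None    # None means no upper bound

    def applies_to(self, day: date) -> bool:
        after_start = self.start is None or day >= self.start
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def filename_for(rules: list[FilenameRule], day: date) -> str:
    """Return the filename from the first rule whose date range covers day."""
    for rule in rules:
        if rule.applies_to(day):
            return rule.template.format(day=day)
    raise LookupError(f"no filename rule covers {day.isoformat()}")


# Made-up templates for illustration only.
RULES = [
    FilenameRule("example-old-{day:%d%m%Y}.pdf", end=date(2019, 12, 31)),
    FilenameRule("example-new-{day:%Y%m%d}.pdf", start=date(2020, 1, 1)),
]
```

With these sample rules, a date in 2025 resolves through the second template, while a date in 2019 falls back to the older one. Adding a historical pattern then means adding one more entry to the list.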
CLI options
inklog run supports overriding boolean options with --set:
uvx inklog run malaysia_parliament 2025-11-01 2025-11-30 \
--set jawapan_lisan_rakyat=true \
--set jawapan_bukan_lisan_rakyat=true \
--set penyata_rasmi_rakyat=true \
--set jawapan_bertulis_negara=false
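To make the shape of these overrides concrete, here is a small sketch of how repeated `--set key=value` flags could be collected with `argparse` and turned into booleans. This is not Inklog's actual parser: `build_parser` and `parse_set_options` are hypothetical names, and accepting only `true` and `false` (case-insensitively) is an assumption based on the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    # action="append" collects every --set occurrence into one list.
    parser.add_argument("--set", dest="overrides", action="append",
                        default=[], metavar="KEY=VALUE")
    return parser


def parse_set_options(pairs: list[str]) -> dict[str, bool]:
    """Turn ["a=true", "b=false"] into {"a": True, "b": False}."""
    options: dict[str, bool] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        value = raw.strip().lower()
        if value not in ("true", "false"):
            raise ValueError(f"expected true or false for {key!r}, got {raw!r}")
        options[key] = value == "true"
    return options
```

For example, `parse_set_options(build_parser().parse_args(["--set", "penyata_rasmi_rakyat=true"]).overrides)` produces a dictionary with a single `True` entry for that document type.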
Adding a new scraper
- Copy src/inklog/scrapers/template.py and rename it.
- Update ScraperMeta and implement run().
- Ensure the module exports SCRAPER.
Scrapers are auto-discovered from the inklog.scrapers package.
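Auto-discovery of this kind is commonly built on `pkgutil` and `importlib`: walk the modules in a package, import each one, and keep those that export the expected name. The sketch below shows that general technique and is not Inklog's actual loader; `discover_scrapers` is a hypothetical helper, and skipping modules without a `SCRAPER` attribute is an assumption.

```python
import importlib
import pkgutil


def discover_scrapers(package_name: str) -> dict[str, object]:
    """Map module name to its SCRAPER object for each module in a package."""
    package = importlib.import_module(package_name)
    found: dict[str, object] = {}
    # __path__ lists the directories that make up the package.
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        scraper = getattr(module, "SCRAPER", None)
        if scraper is not None:  # assumption: skip modules that export nothing
            found[info.name] = scraper
    return found
```

With Inklog installed, `discover_scrapers("inklog.scrapers")` would be the natural call to try, though the project's own discovery logic may filter or register modules differently.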
Development
uv sync --extra dev
uv run ruff check --fix .
uv run ruff format .
uv run pytest
Roadmap
- Add markdown scraping mode (TBD).
- Expand filename rules for historical Malaysia Parliament patterns.