
A Python library for crawling and retrieving all notices published under Japan’s Furikome Sagi Relief Act, with support for both full data extraction and incremental updates.


sagikoza


A Python library for automatically crawling and retrieving all public notices under Japan’s Furikome Sagi Relief Act. Supports both full and incremental data extraction, returning results as a list of dictionaries.

A Japanese-language description is also available.


Features

  • Automatically retrieves public notices under the Furikome Sagi Relief Act
  • Supports fetching by year or for the latest 3 months
  • Incremental (diff) data retrieval
  • Returns data as a list of dictionaries

Supported Environments

  • Python 3.8 or later

Installation

Install from PyPI:

python -m pip install sagikoza

Latest from GitHub:

git clone https://github.com/new-village/sagikoza
cd sagikoza
python -m pip install .

Usage

Fetch notices for a specific year

Pass a four-digit year as a string (e.g., '2025') to retrieve the notices published in that year. Years from 2008 onward are supported.

import sagikoza
accounts = sagikoza.fetch('2025')
print(accounts)
# [{'doc_id': '12345', 'link': '/pubs_basic_frame.php?...', 'id': '...', ...}, ...]
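Each call covers a single year. To collect several years in one go, a small loop over fetch() is enough. In the sketch below, fetch_years is a hypothetical helper (not part of sagikoza) that takes the fetch function as a parameter, so it can be tried without network access.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def fetch_years(fetch: Callable[[str], List[Record]], start: int, end: int) -> List[Record]:
    """Call fetch(year) for each year from start to end (inclusive) and concatenate the results."""
    if start < 2008:
        raise ValueError("notices are available from 2008 onward")
    if end < start:
        raise ValueError("end must not be earlier than start")
    records: List[Record] = []
    for year in range(start, end + 1):
        records.extend(fetch(str(year)))  # fetch() takes the year as a string
    return records
```

For example, fetch_years(sagikoza.fetch, 2023, 2025) runs three separate crawls, one per year, and returns a single list.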

Fetch notices for the last 3 months

Call without arguments to get notices from the latest 3 months.

import sagikoza
accounts = sagikoza.fetch()
print(accounts)

Save data example

Save the retrieved data in Parquet format (requires pandas and a Parquet engine such as pyarrow).

import pandas as pd
import sagikoza
accounts = sagikoza.fetch()
df = pd.DataFrame(accounts)
df.to_parquet('accounts.parquet', index=False)
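The library advertises incremental (diff) retrieval, but this README does not describe an interface for it. If you keep a local copy, one generic approach is to fetch the latest 3 months on a schedule and append only the records you have not stored yet. The helper below, merge_new_records, is hypothetical, and using ("doc_id", "id") as the identity key is an assumption based on the field names in the example output; check which fields actually identify a record in your data.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def merge_new_records(
    existing: Iterable[Record],
    fetched: Iterable[Record],
    key_fields: Sequence[str] = ("doc_id", "id"),  # assumed identity fields
) -> Tuple[List[Record], List[Record]]:
    """Return (merged, added): existing records plus fetched records with an unseen key."""
    merged = list(existing)
    seen = {tuple(r.get(f) for f in key_fields) for r in merged}
    added: List[Record] = []
    for record in fetched:
        key = tuple(record.get(f) for f in key_fields)
        if key not in seen:  # also drops duplicates within the fetched batch itself
            seen.add(key)
            added.append(record)
    merged.extend(added)
    return merged, added
```

With pandas, existing could come from pd.read_parquet('accounts.parquet').to_dict('records'), and the merged list can be written back with the to_parquet call above.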

Function Specification

  • fetch(year: str = "near3") -> list[dict]
    • Specify a year (YYYY) or "near3" for the latest 3 months
    • Raises FetchError on network, HTTP, or timeout errors
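The accepted values for year can be checked before starting a long crawl. The sketch below, validate_year, is a hypothetical helper: it assumes the valid range is 2008 through the current year, based on the Usage section, and it does not reflect how fetch() itself validates its input.

```python
import datetime
import re
from typing import Optional


def validate_year(year: str = "near3", today: Optional[datetime.date] = None) -> str:
    """Return year unchanged if it is 'near3' or within the assumed range, else raise ValueError."""
    if year == "near3":  # the default: latest 3 months
        return year
    if not re.fullmatch(r"[0-9]{4}", year):
        raise ValueError(f"year must be 'near3' or a four-digit string, got {year!r}")
    current = (today or datetime.date.today()).year
    if not 2008 <= int(year) <= current:  # assumed range: 2008 through the current year
        raise ValueError(f"year must be between 2008 and {current}, got {year}")
    return year
```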

Internal Workflow

  1. Fetch notice list (POST: sel_pubs.php)
  2. Fetch notice details (POST: pubs_dispatcher.php)
  3. Fetch basic info (GET: pubs_basic_frame.php)
  4. Fetch account details (POST: k_pubstype_00_detail.php, etc.)

The parameters each step needs are extracted from the HTML returned by the previous step and used to request the next page.
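As an illustration of that extraction step, the sketch below collects the values of hidden <input> elements, such as the doc_id field returned by sel_pubs.php (see Page Flow under Reference). sagikoza itself depends on BeautifulSoup; this sketch swaps in the standard library's html.parser so it has no dependencies, and extract_hidden_values is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _HiddenInputCollector(HTMLParser):
    """Collect the value attribute of <input type="hidden"> elements."""

    def __init__(self, name: Optional[str]) -> None:
        super().__init__()
        self.name = name
        self.values: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() != "hidden":
            return
        if self.name is None or attr.get("name") == self.name:
            self.values.append(attr.get("value") or "")


def extract_hidden_values(html: str, name: Optional[str] = None) -> List[str]:
    """Return hidden-input values in document order, optionally filtered by name."""
    parser = _HiddenInputCollector(name)
    parser.feed(html)
    parser.close()
    return parser.values


sample = '<table class="sel_pubs_list"><tbody><input type="hidden" name="doc_id" value="15362"></tbody></table>'
print(extract_hidden_values(sample, "doc_id"))  # ['15362']
```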

Logging

sagikoza logs through Python's standard logging module. To see more detailed output, configure logging before calling fetch():

import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(message)s')
import sagikoza
sagikoza.fetch()

By default, only WARNING and above are shown. For more detail, set level=logging.DEBUG.
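If DEBUG output from other libraries is too noisy, you can raise the level only for sagikoza's loggers. This sketch assumes the library names its loggers under the "sagikoza" namespace (the usual logging.getLogger(__name__) convention); the logger name is not documented in this README.

```python
import logging

# The root logger stays at WARNING, so other libraries' DEBUG/INFO messages are suppressed.
logging.basicConfig(level=logging.WARNING, format='%(asctime)s %(levelname)s %(name)s %(message)s')
# Assumed logger name: child loggers such as "sagikoza.<module>" inherit this level.
logging.getLogger("sagikoza").setLevel(logging.DEBUG)
```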

Error Handling

  • Network, HTTP, and timeout errors raise a FetchError exception
  • If no records are found, a WARNING log is output
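Because network and timeout errors surface as FetchError, a caller may want to retry a failed crawl. The sketch below is a generic retry helper with exponential backoff. call_with_retry is hypothetical, and it takes the exception types to retry as a parameter because this README does not say which module FetchError is importable from.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_retry(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)  # with the default base_delay: 1s, 2s, 4s, ...
            logger.warning("attempt %d/%d failed: %s; retrying in %.1fs", attempt, attempts, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")
```

For example, call_with_retry(lambda: sagikoza.fetch("2025"), exceptions=(FetchError,)) once FetchError has been imported from wherever the package exposes it.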

Notes

  • This library scrapes a public website, so changes to that site may break it
  • The accuracy and completeness of the retrieved data are not guaranteed; verify results against the official notices

License

Apache License 2.0

Third-party dependencies:

  • BeautifulSoup (MIT License)

Contribution

Bug reports, feature requests, and pull requests are welcome. Please use GitHub Issues or Pull Requests.

Reference

Page Flow

The pages to be scraped cannot be opened directly by URL. Each page is reached by sending a POST request whose parameters are taken from values hidden in the previous page. The one exception is pubs_basic_frame.php, which can be accessed with a GET request.

Each entry below lists the file to request, the HTTP method, and the payload to send. The response contains the value needed for the next step's payload inside the element shown under parameters, which can be located with the given selector.

  1. notices
    • file: sel_pubs.php
    • method: POST
    • payload: {"search_term": "near3", "search_no": "none", "search_pubs_type": "none", "sort_id": "5"}
    • selector: table.sel_pubs_list > tbody > input
    • parameters: <input type="hidden" name="doc_id" value="15362">
  2. submits
    • file: pubs_dispatcher.php
    • method: POST
    • payload: {"head_line": "", "doc_id": "15362"}
    • selector: table:nth-child(9) > tbody > tr > td.6 > a
    • parameters: <a href="./pubs_basic_frame.php?inst_code=0153&amp;p_id=05&amp;pn=365597&amp;re=0">(別添)</a>
  3. subjects
    • file: pubs_basic_frame.php
    • method: GET
    • payload (query string): inst_code=0153&p_id=05&pn=365597&re=0
    • selector: table:nth-child(12) > tbody > tr > td:nth-child(1) > input[type=submit]
    • parameters: <form method="POST" name="list_form" action="./k_pubstype_04_detail.php" target="_blank"></form><br><input type="submit" name="r_no" value=" 2420-0153-0007 ">
  4. accounts
    • file: k_pubstype_00_detail.php
    • method: POST
    • payload: {"r_no":"+2420-0153-0007+", "pn": "365597", "r_no": "2420-0153-0007", "p_id": "05", "re": "0", "referer": "0"}
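The last two steps can be connected by reading pn, p_id, and re from the pubs_basic_frame.php link and r_no from the submit button. One reading of the accounts payload above is that the first r_no is the button's raw value (" 2420-0153-0007 ") after form encoding, which turns spaces into "+", and the second is the same value with the spaces stripped. The sketch below, build_account_form, is a hypothetical helper built on that assumption. It returns a list of pairs because a Python dict cannot hold the duplicated r_no key, and it copies referer="0" verbatim from the payload above.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_account_form(href: str, r_no_value: str) -> List[Tuple[str, str]]:
    """Build k_pubstype_*_detail.php form fields from a basic-frame link and a button value."""
    # Assumes href was read by an HTML parser, which turns "&amp;" into "&".
    query = parse_qs(urlsplit(href).query, keep_blank_values=True)
    return [
        ("r_no", r_no_value),          # raw button value, surrounding spaces included
        ("pn", query["pn"][0]),
        ("r_no", r_no_value.strip()),  # same value without the surrounding spaces
        ("p_id", query["p_id"][0]),
        ("re", query["re"][0]),
        ("referer", "0"),              # copied as-is from the payload above
    ]


fields = build_account_form("./pubs_basic_frame.php?inst_code=0153&p_id=05&pn=365597&re=0", " 2420-0153-0007 ")
print(urlencode(fields))
# r_no=+2420-0153-0007+&pn=365597&r_no=2420-0153-0007&p_id=05&re=0&referer=0
```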

Note

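  • 支払手続終了(支払い申請がない): payment procedure closed (no payment applications were filed)
  • 支払手続終了(支払該当者決定を受けた者がない): payment procedure closed (no applicant was determined to be an eligible recipient)
  • 支払手続終了(被害回復分配金のすべての支払い等): payment procedure closed (all victim-relief distributions have been paid, etc.)
  • 被害回復分配金が支払われない(債権額が1,000円未満): no victim-relief distribution is paid (claim amount is less than 1,000 yen)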



Download files


Source Distribution

sagikoza-2.1.2.tar.gz (24.6 kB)

Uploaded Source

Built Distribution


sagikoza-2.1.2-py3-none-any.whl (21.9 kB)

Uploaded Python 3

File details

Details for the file sagikoza-2.1.2.tar.gz.

File metadata

  • Download URL: sagikoza-2.1.2.tar.gz
  • Upload date:
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for sagikoza-2.1.2.tar.gz
Algorithm Hash digest
SHA256 2140eee92fab2f3417a75c9d23a4a023a80d8a9afbb9cf0b09d66d3db11a44dd
MD5 c34453445ef4fcbc88dae54cfa252668
BLAKE2b-256 bad8c76d0ae22669378db509e2d3cde16a009b0ee47f74422c30f1936d39dea9


File details

Details for the file sagikoza-2.1.2-py3-none-any.whl.

File metadata

  • Download URL: sagikoza-2.1.2-py3-none-any.whl
  • Upload date:
  • Size: 21.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for sagikoza-2.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cc42bbaa35e407b0246809e13bf926d6d382557e8e77490ce930372a7df40ab9
MD5 d5e03f40bd39a81ffd6dcb6033dd8db3
BLAKE2b-256 1d81af85b49cf9a6ab9a449c212fa06444282498d31183e18eb62ede08004010

