
A Python library for crawling and retrieving all notices published under Japan’s Furikome Sagi Relief Act, with support for both full data extraction and incremental updates.


sagikoza


sagikoza is a Python library that automatically collects all public notices issued under Japan's "Furikome Sagi Relief Act". It retrieves the account information contained in those notices, either for the latest three months or for a given year, and returns it either as published or in a standardized format.

A Japanese version of this description is also available.

Features

  • Fetching by year or for the latest 3 months
  • Parsing of the collected data
  • Results returned as a list of dictionaries
  • Retry support when fetching fails
  • Consistent ID assignment, which makes incremental updates possible
  • Standardization of account name fields and date fields (optional)

Supported Environments

  • Python 3.8 or later

Installation

Install from PyPI:

python -m pip install sagikoza

Latest from GitHub:

git clone https://github.com/new-village/sagikoza
cd sagikoza
python -m pip install .

Usage

Fetch notices for a specific year

Pass a year (YYYY) as a string to fetch the notices for that year. Data may be available for years from 2008 onward.

import sagikoza
accounts = sagikoza.fetch('2025')
print(accounts[:5])
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 

Fetch notices for the last 3 months

Call fetch() without arguments to get the notices from the latest 3 months.

import sagikoza
accounts = sagikoza.fetch()
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 

Fetch raw data

To fetch the raw data as published, set the normalize parameter to False. This skips the normalization process.

import sagikoza
accounts = sagikoza.fetch('near3', normalize=False)
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)', ...}, ...] 

Save data example

To save the data locally, pandas's to_parquet is recommended.

import pandas as pd
import sagikoza
accounts = sagikoza.fetch()
df = pd.DataFrame(accounts)
df.to_parquet('accounts.parquet', index=False)
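Because IDs are assigned consistently, a saved dataset can in principle be updated incrementally by keeping only the records whose uid has not been seen before. The following is a minimal sketch of that idea, not part of the library: merge_by_uid is a hypothetical helper, and the sample records are stand-ins for what sagikoza.fetch() returns.

```python
def merge_by_uid(existing, fetched):
    """Return the existing records plus any fetched records whose 'uid' is new."""
    seen = {record["uid"] for record in existing}
    merged = list(existing)
    for record in fetched:
        if record["uid"] not in seen:
            seen.add(record["uid"])
            merged.append(record)
    return merged


# Stand-in data; in practice `fetched` would come from sagikoza.fetch().
existing = [{"uid": "a1", "bank_name": "Bank A"}]
fetched = [
    {"uid": "a1", "bank_name": "Bank A"},  # already stored, skipped
    {"uid": "b2", "bank_name": "Bank B"},  # new, appended
]
merged = merge_by_uid(existing, fetched)
print(len(merged))  # 2
```

The merged list can then be written back with to_parquet in the same way as above.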

Function Specification

  • fetch(year: str = "near3", normalize: bool = True) -> list[dict]
    • year: a year (YYYY) or "near3" for the latest 3 months
    • normalize: set to False to return the raw data without normalization
    • Raises an exception on failure
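Since fetch raises on failure, a caller collecting several years may want to keep going when one year fails. The sketch below shows one way to do that. fetch_years is a hypothetical helper, and because the exception type raised by the library is not documented here, it catches Exception broadly; narrow it if you know the concrete type.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_years(fetch, years):
    """Call fetch(year) for each year, collecting records and failed years."""
    records, failed = [], []
    for year in years:
        try:
            records.extend(fetch(year))
        except Exception as exc:  # assumption: exact exception type is unknown
            logger.warning("fetch failed for %s: %s", year, exc)
            failed.append(year)
    return records, failed
```

With the library installed, this would be called as fetch_years(sagikoza.fetch, ['2023', '2024']), and the years in the failed list can be retried later.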

Internal Workflow

  1. Fetch the notice list (POST: sel_pubs.php)
  2. Fetch the submissions by financial institution (POST: pubs_dispatcher.php)
  3. Fetch the subjects (GET: pubs_basic_frame.php)
  4. Fetch the accounts involved in financial crime (POST: k_pubstype_01_detail.php, etc.)

Parameters required for each step are extracted from the HTML and used for subsequent page transitions.
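To illustrate the general technique of carrying parameters from one page to the next, the sketch below collects hidden form fields from an HTML fragment using only the standard library. It is not the library's actual implementation (which may differ, for example by using BeautifulSoup), and the field names in the sample are made up.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.params = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            # A missing value attribute is treated as an empty string.
            self.params[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_params(html):
    """Return the hidden form fields found in an HTML string as a dict."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.params


# Hypothetical fragment; the real pages use their own field names.
sample = (
    '<form><input type="hidden" name="page" value="2">'
    '<input type="text" name="q"></form>'
)
print(extract_hidden_params(sample))  # {'page': '2'}
```

The resulting dict is the kind of payload that would be sent with the next POST or GET request in the chain.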

Logging

Uses Python's standard logging module. For detailed logs:

import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(message)s')
import sagikoza
sagikoza.fetch()

By default, only WARNING and above are shown. For more detail, set level=logging.DEBUG.
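If DEBUG output from every library is too noisy, the level can be raised for a single logger instead of globally. The sketch below assumes the library logs under a logger named "sagikoza", which is the usual convention for a package but is not confirmed here.

```python
import logging

# Keep other libraries at WARNING.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

# Assumption: the package's loggers live under the "sagikoza" name.
logging.getLogger("sagikoza").setLevel(logging.DEBUG)
```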

Notes

  • This library retrieves data from public sources. Changes to the source website may affect its functionality.
  • The accuracy and completeness of the retrieved data are not guaranteed. Please verify it against official information.

License

Apache License 2.0

  • BeautifulSoup (MIT License)

Contribution

Bug reports, feature requests, and pull requests are welcome. Please use GitHub Issues or Pull Requests.
