A Python library for crawling and retrieving all notices published under Japan’s Furikome Sagi Relief Act, with support for both full data extraction and incremental updates.

sagikoza

This Python library automatically collects all public notices issued under Japan's “Furikome Sagi Relief Act”. You can obtain the account information contained in public notices from the past three months or from a given year, either as registered in the notices or in a standardized format.

See here for the Japanese description.

Features

  • Fetching by year or for the latest 3 months
  • Parsing of the collected data
  • Data returned as a list of dictionaries
  • Retry support when fetching fails
  • Consistent ID assignment, which supports incremental updates
  • Standardization of account name fields and date fields (optional)

Supported Environments

  • Python 3.8 or later

Installation

Install from PyPI:

python -m pip install sagikoza

Latest from GitHub:

git clone https://github.com/new-village/sagikoza
cd sagikoza
python setup.py install

Usage

Fetch notices for a specific year

Pass a year as the parameter to fetch the notices for that year. Years after 2008 may be available.

import sagikoza
accounts = sagikoza.fetch('2025')
print(accounts[:5])
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 

Fetch notices for the last 3 months

Call fetch() without arguments to get notices from the latest 3 months.

import sagikoza
accounts = sagikoza.fetch()
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 

Fetch raw data

To fetch the raw data, set the normalize parameter to False; this skips the normalization process.

import sagikoza
accounts = sagikoza.fetch('near3', normalize=False)
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)', ...}, ...] 
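Comparing the sample outputs above, the raw name field appears to hold an alias followed by a name in parentheses, which the normalized output exposes as name and name_alias. The following is only a rough sketch of that kind of split. split_raw_name is a hypothetical helper written for illustration, not the library's actual normalization code, and it assumes every raw name follows the "ALIAS (NAME)" pattern shown in the sample.

```python
import re

# Hypothetical helper, not part of sagikoza: splits a raw name of the form
# "ALIAS (NAME)" into the two fields seen in the normalized sample output.
# Both ASCII and full-width parentheses are accepted.
_RAW_NAME = re.compile(r'^(?P<alias>.+?)\s*[((](?P<name>.+)[))]\s*$')

def split_raw_name(raw: str) -> dict:
    match = _RAW_NAME.match(raw)
    if match is None:
        # No parenthesized part: keep the whole string as the name.
        return {'name': raw.strip(), 'name_alias': None}
    return {
        'name': match.group('name').strip(),
        'name_alias': match.group('alias').strip(),
    }

result = split_raw_name('NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)')
print(result)
```

In practice, leaving normalize at its default is likely the simpler route; a helper like this is only useful when post-processing raw data that has already been saved.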

Save data example

To save the data locally, pandas's to_parquet is recommended.

import pandas as pd
import sagikoza
accounts = sagikoza.fetch()
df = pd.DataFrame(accounts)
df.to_parquet('accounts.parquet', index=False)
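Because IDs are assigned consistently, previously saved records can be combined with a newer fetch without duplicating entries. The sketch below shows one possible way to do that in plain Python. merge_by_uid is a hypothetical helper, not part of sagikoza, and it assumes that each record carries a uid key (as in the sample output) whose value stays the same across fetches.

```python
def merge_by_uid(existing, fresh):
    """Return the existing records plus any fresh records whose uid is new."""
    seen = {record['uid'] for record in existing}
    merged = list(existing)
    for record in fresh:
        if record['uid'] not in seen:
            seen.add(record['uid'])
            merged.append(record)
    return merged

# Stand-in data; real records would come from sagikoza.fetch().
saved = [{'uid': 'a1', 'bank_name': 'Bank A'}, {'uid': 'b2', 'bank_name': 'Bank B'}]
latest = [{'uid': 'b2', 'bank_name': 'Bank B'}, {'uid': 'c3', 'bank_name': 'Bank C'}]

merged = merge_by_uid(saved, latest)
print([record['uid'] for record in merged])  # ['a1', 'b2', 'c3']
```

Existing records take precedence here; if a notice can change after publication, replacing the old record with the fresh one may be the better choice.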

Function Specification

  • fetch(year: str = "near3") -> list[dict]
    • Specify a year (YYYY) or "near3" for the latest 3 months
    • Pass normalize=False to skip normalization and return the raw data
    • Raises an exception on failure
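Since fetch raises on failure, a caller collecting several years may want to keep going when one year fails. The following is a minimal sketch of that pattern. fetch_years and fake_fetch are hypothetical names introduced here, and because the exception type is not documented in this description, the sketch catches Exception broadly.

```python
import logging

logger = logging.getLogger(__name__)

def fetch_years(fetch, years):
    """Call fetch(year) for each year; collect records and note failed years."""
    records, failed = [], []
    for year in years:
        try:
            records.extend(fetch(year))
        except Exception as exc:  # exception type is an assumption (undocumented)
            logger.warning('fetch(%r) failed: %s', year, exc)
            failed.append(year)
    return records, failed

# Stub standing in for sagikoza.fetch so the sketch runs offline.
def fake_fetch(year):
    if year == '2007':
        raise RuntimeError('no notices for this year')
    return [{'uid': f'{year}-0001'}]

records, failed = fetch_years(fake_fetch, ['2007', '2024', '2025'])
print(len(records), failed)  # 2 ['2007']
```

With the library installed, the same function could be called as fetch_years(sagikoza.fetch, ['2024', '2025']).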

Internal Workflow

  1. Fetch notice list (POST: sel_pubs.php)
  2. Fetch submissions by financial institution (POST: pubs_dispatcher.php)
  3. Fetch subjects (GET: pubs_basic_frame.php)
  4. Fetch accounts of financial crime (POST: k_pubstype_01_detail.php, etc.)

Parameters required for each step are extracted from the HTML and used for subsequent page transitions.
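To illustrate the idea of carrying parameters from one page to the next, here is a small sketch that collects hidden form fields from an HTML fragment. It is not the library's implementation: it uses the standard library's html.parser rather than BeautifulSoup, it assumes the parameters are held in hidden input elements, and the HTML and field names (example_id, example_page) are made up for the example.

```python
from html.parser import HTMLParser

class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.params = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if attributes.get('type') == 'hidden' and 'name' in attributes:
            self.params[attributes['name']] = attributes.get('value') or ''

# Made-up HTML; the real pages and field names may differ.
html = '''
<form action="pubs_dispatcher.php" method="post">
  <input type="hidden" name="example_id" value="123">
  <input type="hidden" name="example_page" value="1">
  <input type="text" name="ignored" value="x">
</form>
'''

parser = HiddenInputParser()
parser.feed(html)
params = parser.params
print(params)  # {'example_id': '123', 'example_page': '1'}
```

A dictionary like params could then be sent as the form data of the next request in the chain.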

Logging

Uses Python's standard logging module. For detailed logs:

import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(message)s')
import sagikoza
sagikoza.fetch()

By default, only WARNING and above are shown. For more detail, set level=logging.DEBUG.
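If DEBUG output from every library is too noisy, the level can be raised for this library alone. The sketch below assumes that sagikoza's loggers are created under the 'sagikoza' logger namespace, which is the usual convention for logging.getLogger(__name__) but is not confirmed in this description.

```python
import logging

# Keep the root logger at WARNING so other libraries stay quiet.
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s %(levelname)s %(name)s %(message)s',
)

# Assumption: the library's loggers live under the 'sagikoza' namespace.
sagikoza_logger = logging.getLogger('sagikoza')
sagikoza_logger.setLevel(logging.DEBUG)
```

Child loggers such as 'sagikoza.some_module' would inherit this level, and their records would still be written by the handler that basicConfig attaches to the root logger.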

Notes

  • This library retrieves data from public sources. Changes to the source website may affect functionality.
  • The accuracy and completeness of the retrieved data are not guaranteed. Please use it together with official information.

License

Apache License 2.0

  • BeautifulSoup (MIT License)

Contribution

Bug reports, feature requests, and pull requests are welcome. Please use GitHub Issues or Pull Requests.
