A Python library for crawling and retrieving all notices published under Japan’s Furikome Sagi Relief Act, with support for both full data extraction and incremental updates.

sagikoza

This is a Python library that automatically collects all public notices issued under Japan's "Furikome Sagi Relief Act". You can obtain the account information contained in the public notices for the latest three months or for a specified year, either as published or in a standardized format.

For a description in Japanese, please refer to the Japanese README.

Features

  • Fetching by year or for the latest 3 months
  • Parsing of the collected data
  • Results returned as a list of dictionaries
  • Retry support when fetching fails
  • Consistent ID assignment, which supports incremental updates
  • Optional standardization of account name fields and date fields
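Because each record carries a consistent uid, an incremental update can be sketched as merging newly fetched records into a stored set keyed by uid. This is only an illustration and not part of the library: merge_by_uid is a hypothetical helper, and the sample records merely mimic the shape of the output shown in the Usage section.

```python
from typing import Dict, List


def merge_by_uid(existing: List[dict], fetched: List[dict]) -> List[dict]:
    """Return existing records plus any fetched records whose uid is new.

    Hypothetical helper: assumes every record has a 'uid' key.
    """
    merged: Dict[str, dict] = {rec["uid"]: rec for rec in existing}
    for rec in fetched:
        # Keep the stored record when the uid is already known.
        merged.setdefault(rec["uid"], rec)
    return list(merged.values())


# Made-up sample data shaped like the library's output.
stored = [{"uid": "a1", "bank_name": "Bank A"}]
latest = [
    {"uid": "a1", "bank_name": "Bank A"},
    {"uid": "b2", "bank_name": "Bank B"},
]
updated = merge_by_uid(stored, latest)
print(len(updated))
```

In practice, stored would be loaded from a previous run and latest would come from sagikoza.fetch().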

Supported Environments

  • Python 3.8 or later

Installation

Install from PyPI:

python -m pip install sagikoza

Latest from GitHub:

git clone https://github.com/new-village/sagikoza
cd sagikoza
python -m pip install .

Usage

Fetch notices for a specific year

Pass a year as a string to fetch the notices published in that year. Years from 2008 onward may be available.

import sagikoza
accounts = sagikoza.fetch('2025')
print(accounts[:5])
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 

Fetch notices for the last 3 months

Call fetch() without arguments to get the notices from the latest 3 months.

import sagikoza
accounts = sagikoza.fetch()
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...] 
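Since the result is a plain list of dictionaries, it can be summarized with the standard library alone. The sketch below counts records per bank; the records are made up and only reuse the uid and bank_name keys visible in the output above.

```python
from collections import Counter

# Made-up records shaped like the output shown above.
accounts = [
    {"uid": "u1", "bank_name": "Bank A"},
    {"uid": "u2", "bank_name": "Bank B"},
    {"uid": "u3", "bank_name": "Bank A"},
]

# Count how many records each bank has; tolerate a missing key.
per_bank = Counter(rec.get("bank_name", "unknown") for rec in accounts)
for bank, count in per_bank.most_common():
    print(bank, count)
```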

Fetch raw data

If you want to fetch raw data before normalization, set the normalize parameter to False to skip the normalization process.

import sagikoza
accounts = sagikoza.fetch('near3', normalize=False)
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)', ...}, ...] 
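Comparing the raw output with the normalized output suggests that a raw name of the form "ALIAS (NAME)" ends up split into name and name_alias. The following is a rough sketch of such a split, not the library's actual normalization logic: split_name and its regular expression are hypothetical, and the real rules may cover more cases.

```python
import re
from typing import Optional, Tuple

# Matches "ALIAS (NAME)" with ASCII or full-width parentheses.
_PATTERN = re.compile(r"^(?P<alias>.+?)\s*[(（](?P<name>.+?)[)）]\s*$")


def split_name(raw: str) -> Tuple[str, Optional[str]]:
    """Return (name, name_alias); name_alias is None without parentheses."""
    m = _PATTERN.match(raw)
    if m is None:
        return raw.strip(), None
    return m.group("name").strip(), m.group("alias").strip()


name, alias = split_name("NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)")
print(name, "/", alias)
```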

Save data example

To save the data locally, pandas' to_parquet is recommended. Note that to_parquet requires a Parquet engine such as pyarrow or fastparquet to be installed.

import pandas as pd
import sagikoza
accounts = sagikoza.fetch()
df = pd.DataFrame(accounts)
df.to_parquet('accounts.parquet', index=False)
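If pandas is not available, the list of dictionaries can also be stored as JSON Lines using only the standard library. This is an alternative sketch rather than the recommended path: save_jsonl and load_jsonl are hypothetical helpers, and the sample record is made up.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_jsonl(records: List[dict], path: Path) -> None:
    """Write one JSON object per line, keeping Japanese text readable."""
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> List[dict]:
    """Read records back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round trip with a made-up record in a temporary directory.
sample = [{"uid": "u1", "bank_name": "みずほ銀行"}]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "accounts.jsonl"
    save_jsonl(sample, target)
    restored = load_jsonl(target)
print(restored == sample)
```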

Function Specification

  • fetch(year: str = "near3", normalize: bool = True) -> list[dict]
    • year: a year (YYYY) or "near3" for the latest 3 months
    • normalize: set to False to skip normalization and return the raw data
    • Raises an exception on failure
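Because fetch raises on failure, callers that prefer to continue may want to wrap the call. The sketch below is one possible pattern: fetch_or_empty is a hypothetical helper that takes the fetch function as an argument, and it catches Exception broadly because the specific exception types are not documented here.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def fetch_or_empty(fetch_fn: Callable[..., List[dict]], year: str = "near3") -> List[dict]:
    """Call fetch_fn(year) and return an empty list if it raises."""
    try:
        return fetch_fn(year)
    except Exception as exc:  # exception types of fetch are an unknown here
        logger.warning("fetch(%r) failed: %s", year, exc)
        return []


# Stand-in for sagikoza.fetch that always fails, for demonstration only.
def failing_fetch(year: str) -> List[dict]:
    raise RuntimeError("simulated network error")


print(fetch_or_empty(failing_fetch, "2025"))
```

With the real library this would be called as fetch_or_empty(sagikoza.fetch, "2025").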

Internal Workflow

  1. Fetch notice list (POST: sel_pubs.php)
  2. Fetch submissions by financial institutions (POST: pubs_dispatcher.php)
  3. Fetch subjects (GET: pubs_basic_frame.php)
  4. Fetch accounts used in financial crime (POST: k_pubstype_01_detail.php, etc.)

Parameters required for each step are extracted from the HTML and used for subsequent page transitions.
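Extracting parameters from HTML for the next request can be illustrated by collecting hidden form fields. This is a sketch under assumptions, not the library's code: it uses the standard-library html.parser to stay dependency-free (the library itself lists BeautifulSoup), and the HTML snippet and the field name doc_id are made up.

```python
from html.parser import HTMLParser
from typing import Dict


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Only hidden inputs with a name are treated as parameters.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


# Made-up HTML; the real pages and field names may differ.
html = (
    '<form><input type="hidden" name="doc_id" value="123">'
    '<input type="text" name="q"></form>'
)
parser = HiddenInputParser()
parser.feed(html)
print(parser.fields)
```

The collected dictionary could then be sent as the form data of the next POST request.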

Logging

The library uses Python's standard logging module. To see detailed logs:

import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(message)s')
import sagikoza
sagikoza.fetch()

By default, only WARNING and above are shown. For more detail, set level=logging.DEBUG.
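To get debug output from this library without making every other package verbose, the level can be raised on a single logger. The sketch assumes that the library's loggers live under the "sagikoza" name, which is not confirmed here; check the %(name)s field in the log output to verify.

```python
import logging

# Keep the root logger quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

# Assumption: the library logs under the "sagikoza" namespace.
logging.getLogger("sagikoza").setLevel(logging.DEBUG)
```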

Notes

  • This library retrieves data from public sources. Changes to the source website may affect its functionality.
  • The accuracy and completeness of the retrieved data are not guaranteed. Please verify it against official information.

License

Apache License 2.0

  • BeautifulSoup (MIT License)

Contribution

Bug reports, feature requests, and pull requests are welcome. Please use GitHub Issues or Pull Requests.
