A Python library for crawling and retrieving all notices published under Japan’s Furikome Sagi Relief Act, with support for both full data extraction and incremental updates.
sagikoza
This Python library automatically collects all public notices issued under Japan's “Furikome Sagi Relief Act”. It retrieves the account information contained in notices from the latest three months or from a given year, either as published in the notice or in a standardized format.
Features
- Fetching by year or for the latest 3 months
- Parsing of collected data
- Data returned as a list of dictionaries
- Retry support when fetching fails
- Consistent ID assignment, which supports incremental updates
- Standardization of account name fields and date fields (optional)
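Because each record carries a consistent ID, an incremental update can be reduced to a keyed merge. The following is a minimal sketch and not part of the library: `merge_by_uid` is a hypothetical helper, the sample values are made up, and it assumes every record is a dictionary with a `uid` key, as in the sample output on this page.

```python
from typing import Dict, Iterable, List


def merge_by_uid(existing: Iterable[dict], incoming: Iterable[dict]) -> List[dict]:
    """Merge two record lists, keyed on 'uid' (hypothetical helper).

    Records in `incoming` replace records in `existing` that share a uid;
    first-seen order is preserved.
    """
    merged: Dict[str, dict] = {}
    for record in existing:
        merged[record["uid"]] = record
    for record in incoming:
        merged[record["uid"]] = record  # the newer copy wins
    return list(merged.values())


# Made-up records shaped like the library's output.
stored = [{"uid": "a1", "bank_name": "Bank A"}, {"uid": "b2", "bank_name": "Bank B"}]
latest = [{"uid": "b2", "bank_name": "Bank B (updated)"}, {"uid": "c3", "bank_name": "Bank C"}]

combined = merge_by_uid(stored, latest)
print([record["uid"] for record in combined])
```

In practice, `stored` would be the previously saved data and `latest` the result of a fresh fetch for the latest 3 months.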
Supported Environments
- Python 3.8 or later
Installation
Install from PyPI:
python -m pip install sagikoza
Latest from GitHub:
git clone https://github.com/new-village/sagikoza
cd sagikoza
python setup.py install
Usage
Fetch notices for a specific year
Pass a four-digit year as a string to fetch the notices for that year. Years after 2008 may be available.
import sagikoza
accounts = sagikoza.fetch('2025')
print(accounts[:5])
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...]
Fetch notices for the last 3 months
Call fetch() without arguments to get notices from the latest 3 months.
import sagikoza
accounts = sagikoza.fetch()
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'グエン テイ ホアイ ニエン', 'name_alias': 'NGUYEN THI HOAI NHIEN' ...}, ...]
Fetch raw data
To get the raw data as published, set the normalize parameter to False to skip the normalization process.
import sagikoza
accounts = sagikoza.fetch('near3', normalize=False)
print(accounts)
# [{'uid': 'd06beb...', 'bank_name': 'みずほ銀行', 'name': 'NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)', ...}, ...]
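In the sample outputs above, the raw `name` holds both spellings in one string, while the normalized record has separate `name` and `name_alias` fields. The sketch below shows one way such a split could be done. It is not the library's actual normalization: `split_raw_name` is a hypothetical helper, and the layout "alias (name)" is an assumption inferred only from the sample.

```python
import re
from typing import Optional, Tuple

# Assumed raw layout: "<alias> (<name>)"; full-width parentheses are also accepted.
_RAW_NAME = re.compile(r"^(?P<alias>.+?)\s*[((](?P<name>.+)[))]\s*$")


def split_raw_name(raw: str) -> Tuple[str, Optional[str]]:
    """Return (name, name_alias); name_alias is None when no parentheses are found."""
    match = _RAW_NAME.match(raw.strip())
    if match is None:
        return raw.strip(), None
    return match.group("name").strip(), match.group("alias").strip()


name, alias = split_raw_name("NGUYEN THI HOAI NHIEN (グエン テイ ホアイ ニエン)")
print(name)
print(alias)
```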
Save data example
To save the data locally, pandas's to_parquet is recommended. It requires a Parquet engine such as pyarrow or fastparquet to be installed.
import pandas as pd
import sagikoza
accounts = sagikoza.fetch()
df = pd.DataFrame(accounts)
df.to_parquet('accounts.parquet', index=False)
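If pandas is not available, the standard library is enough for a simple dump. This is a sketch under assumptions: `save_accounts` and `load_accounts` are hypothetical helpers, the sample record is made up, and every value is assumed to be JSON-serializable (the sample output shows only strings). Passing `ensure_ascii=False` keeps Japanese text readable in the file.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_accounts(accounts: List[dict], path: Path) -> None:
    """Write records as UTF-8 JSON, keeping Japanese text human-readable."""
    path.write_text(json.dumps(accounts, ensure_ascii=False, indent=2), encoding="utf-8")


def load_accounts(path: Path) -> List[dict]:
    """Read records previously written by save_accounts."""
    return json.loads(path.read_text(encoding="utf-8"))


# Made-up sample shaped like the output of sagikoza.fetch().
sample = [{"uid": "d06beb", "bank_name": "みずほ銀行"}]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "accounts.json"
    save_accounts(sample, target)
    restored = load_accounts(target)

print(restored == sample)
```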
Function Specification
fetch(year: str = "near3") -> list[dict]
- Specify a year (YYYY) or "near3" for the latest 3 months
- Pass normalize=False to return the raw data without normalization
- Raises an exception on failure
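Since fetch raises on failure, a caller may want its own retry layer on top of the library's built-in retries. The sketch below is one possible caller-side wrapper, not part of the library: `call_with_retry` is a hypothetical helper, and it catches `Exception` broadly because the exception types are not specified here. A stand-in function replaces the real network call so the example runs offline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(func: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call func(), retrying on any Exception with exponential backoff.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


# Stand-in for a flaky fetch: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> list:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return [{"uid": "d06beb"}]


result = call_with_retry(flaky_fetch, attempts=3, delay=0.01)
print(result, calls["count"])
```

With the real library, the call would look like `call_with_retry(lambda: sagikoza.fetch('2025'))`.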
Internal Workflow
- Fetch notice list (POST: sel_pubs.php)
- Fetch submissions by financial institutions (POST: pubs_dispatcher.php)
- Fetch subjects (GET: pubs_basic_frame.php)
- Fetch accounts of financial crime (POST: k_pubstype_01_detail.php, etc.)
Parameters required for each step are extracted from the HTML and used for subsequent page transitions.
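The hand-off of parameters between pages can be illustrated with hidden form fields. This is a sketch only: it uses the standard library's `html.parser` so that it runs without dependencies, whereas the library itself relies on BeautifulSoup, and the HTML snippet and field names are invented placeholders, not taken from the real site.

```python
from html.parser import HTMLParser
from typing import Dict


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.params: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.params[attr["name"]] = attr.get("value") or ""


# Invented HTML; the field names are placeholders, not the real site's.
html = """
<form action="next_step.php" method="post">
  <input type="hidden" name="example_id" value="12345">
  <input type="hidden" name="example_page" value="1">
  <input type="text" name="ignored" value="x">
</form>
"""

parser = HiddenInputParser()
parser.feed(html)
print(parser.params)
```

The collected dictionary would then serve as the form data for the next request in the chain.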
Logging
Uses Python's standard logging module. For detailed logs:
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(message)s')
import sagikoza
sagikoza.fetch()
By default, only WARNING and above are shown. For more detail, set level=logging.DEBUG.
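To raise verbosity for this library only, while other packages stay at WARNING, the level can be set on a named logger instead of the root logger. This sketch assumes the library's loggers live under the "sagikoza" namespace, which is the usual result of `logging.getLogger(__name__)` but is not confirmed here.

```python
import logging

# Root logger stays at WARNING so other libraries remain quiet.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

# Assumption: the library's loggers are named "sagikoza" or "sagikoza.<module>".
library_logger = logging.getLogger("sagikoza")
library_logger.setLevel(logging.DEBUG)

print(library_logger.isEnabledFor(logging.DEBUG))
```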
Notes
- This library retrieves data from public sources. Changes to the source website may affect functionality.
- The accuracy and completeness of the retrieved data are not guaranteed. Please use it together with official information.
License
Apache License 2.0
- Depends on BeautifulSoup (MIT License)
Contribution
Bug reports, feature requests, and pull requests are welcome. Please use GitHub Issues or Pull Requests.