jpcorpreg is a Python library that downloads the corporate registry published on the Corporate Number Publication Site and returns it as a data frame.

Installation

jpcorpreg is available on PyPI and can be installed with pip:

$ python -m pip install jpcorpreg

GitHub Install

To install the latest version from GitHub:

$ git clone https://github.com/new-village/jpcorpreg
$ cd jpcorpreg
$ pip install -e .

Usage

This section demonstrates how to use this library to load and process data from the National Tax Agency's Corporate Number Publication Site.

Starting with version 2.0.0, jpcorpreg provides an object-oriented client (CorporateRegistryClient) optimized for reading large datasets and for native Parquet partitioning. Recent updates add chunked streaming, which allows the entire national dataset (all prefectures) to be downloaded and parsed within tight memory bounds (e.g. Cloud Run deployments with less than 1 GB of RAM) without creating large temporary files.
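The memory-bounded behavior described above rests on a general technique: parse the incoming CSV stream in fixed-size batches instead of loading it whole. The following is a minimal sketch of that idea using only the standard library. It is not jpcorpreg's actual implementation; iter_chunks and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_chunks(stream: TextIO, chunk_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most `chunk_size` parsed CSV rows from a text stream."""
    reader = csv.reader(stream)
    while True:
        # islice pulls only the next `chunk_size` rows, so memory use is
        # bounded by the chunk size rather than by the size of the stream.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a large download: five rows with made-up values.
sample = io.StringIO("1,A\n2,B\n3,C\n4,D\n5,E\n")
chunk_sizes = [len(chunk) for chunk in iter_chunks(sample, chunk_size=2)]
```

Each chunk can be processed and discarded before the next one is read, which is what keeps peak memory roughly constant.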

Initializing the Client

First, import and initialize the client:

from jpcorpreg import CorporateRegistryClient
client = CorporateRegistryClient()

Direct Data Loading

To download the data for a specific prefecture as a pandas DataFrame, use the fetch method. Pass the prefecture name as an argument, and the client streams the data from the National Tax Agency site:

>>> df = client.fetch("Shimane")

To download the data for all prefectures in Japan, either omit the argument or pass "All":

>>> df = client.fetch()
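When only a handful of prefectures are needed, calling fetch once per name may be cheaper than downloading the full national dataset. The sketch below assumes nothing beyond the client.fetch(name) call shown above; fetch_prefectures is a hypothetical helper, and "Tottori" is assumed to be accepted in the same romanized form as "Shimane".

```python
from typing import Any, Dict, Iterable


def fetch_prefectures(client: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Call client.fetch() once per prefecture and collect the results by name."""
    results: Dict[str, Any] = {}
    for name in names:
        results[name] = client.fetch(name)
    return results


# Usage (requires network access and the jpcorpreg package):
# from jpcorpreg import CorporateRegistryClient
# frames = fetch_prefectures(CorporateRegistryClient(), ["Shimane", "Tottori"])
```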

Differential Data Loading

To download only the daily differential updates (sabun), use the fetch_diff method. Pass a date in YYYYMMDD format to download the diff for that specific date. If no date is provided, the latest available diff is returned.

>>> df = client.fetch_diff("20260220")
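The YYYYMMDD string can be built from a date object rather than typed by hand. This is a small sketch using the standard library; diff_date_string is a hypothetical helper name, not part of jpcorpreg.

```python
from datetime import date, timedelta


def diff_date_string(day: date) -> str:
    """Format a date as YYYYMMDD, the format described for fetch_diff."""
    return day.strftime("%Y%m%d")


# Example: the string for the day before a given reference date.
reference = date(2026, 2, 21)
previous_day = diff_date_string(reference - timedelta(days=1))
```

The result can be passed straight to the client, for example client.fetch_diff(previous_day). A diff may not be published for every calendar date, so a real job would likely need to handle a missing day.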

Parquet Output and Partitioning

To save the downloaded data to disk, for example for a data lake, pass format="parquet". You can also supply the partition_cols argument so that the dataset is written to partitioned directories on disk. In this mode the method returns the output base directory path.

Partitioning notes:

For fetch() (the full dataset), use something like partition_cols=["prefecture_name"]. Avoid partitioning a full download by "update_date", which fragments the dataset.
For fetch_diff() (daily diff data), use partition_cols=["update_date"] so that each daily update is appended cleanly to your data lake structure.

>>> # Example: Output differential data partitioned by update_date
>>> out_dir = client.fetch_diff(format="parquet", partition_cols=["update_date"])

You can then read the generated Parquet dataset with pandas or PyArrow:

>>> import pandas as pd
>>> df = pd.read_parquet(out_dir)
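To check which partitions have been written so far, the output directory can be inspected directly. The sketch below assumes the Hive-style column=value directory naming that pandas and PyArrow produce for partition_cols; whether jpcorpreg writes exactly this layout is an assumption, and list_partition_values is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_partition_values(base_dir: str, column: str) -> List[str]:
    """Return sorted values of `column` found as `column=value` subdirectories."""
    prefix = column + "="
    return sorted(
        entry.name[len(prefix):]
        for entry in Path(base_dir).iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


# Usage, with out_dir as returned by fetch_diff(format="parquet", ...):
# dates = list_partition_values(out_dir, "update_date")
```

Only the top-level partition column is examined; nested partition columns would need a recursive walk.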

Project details

Release history

Version	Release date
2.0.2 (this version)	Mar 8, 2026
2.0.1	Feb 22, 2026
2.0.0	Feb 22, 2026
1.8.1	Aug 16, 2025
1.8.0	Aug 16, 2025
1.7.0	Aug 12, 2025

Download files

Source Distribution

jpcorpreg-2.0.2.tar.gz (14.3 kB), uploaded Mar 8, 2026, Source

Built Distribution

jpcorpreg-2.0.2-py3-none-any.whl (13.3 kB), uploaded Mar 8, 2026, Python 3

File details

Details for the file jpcorpreg-2.0.2.tar.gz.

File metadata

Download URL: jpcorpreg-2.0.2.tar.gz
Upload date: Mar 8, 2026
Size: 14.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for jpcorpreg-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`b865a7acff1c55c3d84251008422cdb5eafaa05481731b148f4e622f094ea0fd`
MD5	`38aaf7d14b98ae1c9a3785e233ae3924`
BLAKE2b-256	`60e32d55e7c91e089365e1983b0707161091a8fcded7d05992fe049a941a432c`

File details

Details for the file jpcorpreg-2.0.2-py3-none-any.whl.

File metadata

Download URL: jpcorpreg-2.0.2-py3-none-any.whl
Upload date: Mar 8, 2026
Size: 13.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for jpcorpreg-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f733b9f57044e8dc26920dce5546c12b3562dc8de85a23847d801ab44c114a4`
MD5	`63e34778773388c5ee007bb58058878b`
BLAKE2b-256	`5f43d0b7aff50a3802e177d601f9015dab6df3e61c1b6a9eb576ea1f2c1fbc44`
