jpcorpreg

jpcorpreg is a Python library that downloads the corporate registry published on the Corporate Number Publication Site and returns it as a data frame.

Installation


jpcorpreg can be installed with pip.

$ python -m pip install jpcorpreg

GitHub Install

To install the latest version from GitHub:

$ git clone https://github.com/new-village/jpcorpreg
$ cd jpcorpreg
$ pip install -e .

Usage

This section demonstrates how to use this library to load and process data from the National Tax Agency's Corporate Number Publication Site.

Starting with version 2.0.0, jpcorpreg provides an object-oriented client (CorporateRegistryClient) designed for reading large datasets and writing natively partitioned Parquet output.

Initializing the Client

First, import and initialize the client:

from jpcorpreg import CorporateRegistryClient
client = CorporateRegistryClient()

Direct Data Loading

To download data for a specific prefecture as a pandas DataFrame, use the download_prefecture method. Pass the prefecture name as an argument, and the client streams the data from the National Tax Agency site:

>>> df = client.download_prefecture("Shimane")

To download data for all prefectures in Japan, use download_all:

>>> df = client.download_all()
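
When only a handful of prefectures are needed, calling download_prefecture in a loop may be lighter than download_all. The sketch below is one way to do that. download_prefectures is a hypothetical helper, not part of jpcorpreg; the only library call it relies on is client.download_prefecture, as shown above.

```python
from typing import Any, Dict, Iterable


def download_prefectures(client: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Download several prefectures one at a time, keyed by prefecture name.

    Hypothetical helper (not part of jpcorpreg). It only assumes that
    `client` exposes `download_prefecture(name)`.
    """
    frames: Dict[str, Any] = {}
    for name in names:
        # One call per prefecture; each result is stored under its name.
        frames[name] = client.download_prefecture(name)
    return frames
```

For example, download_prefectures(client, ["Shimane", "Tottori"]) would return one DataFrame per name, assuming other prefectures use the same romanized, capitalized form as "Shimane". The results can then be combined with pd.concat(frames.values(), ignore_index=True).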

Differential Data Loading

If you want to download only the daily differential updates (sabun), use the download_diff method. Pass a date in YYYYMMDD format to download the diff for that specific date. If no date is provided, the latest available diff is returned.

>>> df = client.download_diff("20260220")
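
Because download_diff takes the date as a YYYYMMDD string, it can help to build that string from a datetime.date instead of typing it by hand. The following is a minimal sketch; to_yyyymmdd and last_n_days are hypothetical helpers, not part of jpcorpreg.

```python
from datetime import date, timedelta
from typing import List


def to_yyyymmdd(day: date) -> str:
    """Format a date as a YYYYMMDD string, e.g. 2026-02-20 -> "20260220"."""
    return day.strftime("%Y%m%d")


def last_n_days(end: date, n: int) -> List[str]:
    """Return YYYYMMDD strings for the n days ending at `end`, oldest first."""
    return [to_yyyymmdd(end - timedelta(days=offset)) for offset in range(n - 1, -1, -1)]
```

A call such as client.download_diff(to_yyyymmdd(date(2026, 2, 20))) is then equivalent to the example above. Whether a diff is published for every calendar date is not stated here, so a loop over last_n_days should be prepared for dates without data.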

Parquet Output and Partitioning

To save the downloaded data to disk instead, for example for use in a data lake, pass format="parquet". You can also supply the partition_cols argument so that the dataset is written to partitioned directories, for example partitioned by update_date. In this mode the method returns the path of the output base directory.

>>> # Example: Output differential data partitioned by update_date
>>> out_dir = client.download_diff(format="parquet", partition_cols=["update_date"])

You can then read the resulting Parquet dataset efficiently with pandas or PyArrow:

>>> import pandas as pd
>>> df = pd.read_parquet(out_dir)
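
To see which partitions were written before loading anything, the output directory can be inspected with the standard library alone. The sketch below assumes the Hive-style layout that pandas and PyArrow produce for partitioned datasets, with one subdirectory per value named like update_date=<value>. That layout is an assumption about jpcorpreg's output, and list_partition_values is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_partition_values(base_dir: str, column: str = "update_date") -> List[str]:
    """List the partition values found directly under base_dir, sorted.

    Assumes Hive-style directory names of the form "<column>=<value>".
    Entries that are not directories, or that belong to another column,
    are ignored.
    """
    prefix = f"{column}="
    values = [
        entry.name[len(prefix):]
        for entry in Path(base_dir).iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    ]
    return sorted(values)
```

If the layout matches, list_partition_values(out_dir) returns the update_date values present on disk, which can be used to decide which dates still need to be downloaded.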
