cnparser is a parser library for Corporate Number Publication Site data.

cnparser

cnparser is a Python library for loading and enriching Corporate Number Publication Site data provided by the National Tax Agency of Japan. It currently supports parsing only the latest data.

Installation


cnparser can be installed with pip.

$ python -m pip install cnparser

GitHub Install

Installing the latest version from GitHub:

$ git clone https://github.com/new-village/cnparser
$ cd cnparser
$ python setup.py install

Usage

This section demonstrates how to use this library to load and process data from the National Tax Agency's Corporate Number Publication Site.

Direct Data Loading

To download data for a specific prefecture, pass the prefecture name to the load function; it returns a DataFrame containing that prefecture's data. The prefecture name must be written in Roman characters (see the list of supported prefectures).
If you call the load function without any arguments, data for all prefectures in Japan is downloaded.

>>> import cnparser
>>> df = cnparser.load("Shimane")
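
When only a few prefectures are needed, calling load once per name avoids the nationwide download. The following is a minimal sketch, not part of cnparser: load_many is a hypothetical helper, and the loader is passed in as an argument so the example runs without network access. It also assumes "Tottori" is on the list of supported prefectures.

```python
def load_many(prefectures, loader):
    """Call loader once per prefecture name and return the results keyed by name.

    load_many is a hypothetical helper, not part of cnparser.
    """
    return {name: loader(name) for name in prefectures}


# Stand-in loader so the sketch runs offline.
# In real use, pass cnparser.load instead.
def fake_loader(prefecture):
    return f"data for {prefecture}"


frames = load_many(["Shimane", "Tottori"], fake_loader)
print(sorted(frames))
```

With the real library, load_many(["Shimane", "Tottori"], cnparser.load) would be expected to return one DataFrame per prefecture name.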

CSV Data Loading

If you already have a downloaded CSV file, use the read_csv function. Pass the file path as an argument to obtain a DataFrame with headers applied to the CSV data.

>>> import cnparser
>>> df = cnparser.read_csv("path/to/data.csv")
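
To illustrate what "with headers" means, here is a minimal sketch using only the standard library. It assumes the downloaded file has no header row, so column names must be supplied by the reader. The column names and the two rows are made up for the example; the real file has more columns, and the names cnparser assigns may differ.

```python
import csv
import io

# Hypothetical column subset; cnparser's actual header names may differ.
COLUMNS = ["corporate_number", "name", "post_code"]

# Two made-up rows standing in for a downloaded, header-less CSV file.
raw = io.StringIO(
    "1000000000001,Example A,6900887\n"
    "1000000000002,Example B,6900001\n"
)

# Passing fieldnames makes DictReader treat the first line as data, not as a header.
rows = list(csv.DictReader(raw, fieldnames=COLUMNS))
print(rows[0]["name"])
```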

Data Enrichment Functionality

The enrich function standardises and transforms the values of specific fields in the loaded DataFrame.

>>> import cnparser
>>> df = cnparser.enrich(df)

By default, the enrich function applies every process. To apply only specific processes, pass their names as arguments.

>>> import cnparser
>>> df = cnparser.enrich(df, "enrich_kana" ...)

The processes supported by the enrich function are as follows:

  • enrich_kana: Adds a standardized furigana column, furigana, to the DataFrame. If furigana is NaN, the value is filled in by converting name to kana. Currently only kanji and katakana conversions are supported; alphabet conversions are not.
  • enrich_kind: Adds the kind label for the legal_entity.
  • enrich_post_code: Writes the postcode, formatted as XXX-XXXX, to post_code.
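
The selection behaviour and two of the processes can be sketched in plain Python. This is an illustration under stated assumptions, not cnparser's implementation: records are plain dicts rather than DataFrame rows, every sketch_ name and the PROCESSES registry are hypothetical, and a hiragana-to-katakana shift stands in for the real kanji conversion. Japanese postcodes have seven digits, so the sketch formats them as XXX-XXXX.

```python
# Hiragana U+3041..U+3096 map to katakana by adding 0x60 to the code point.
HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def sketch_kana(record):
    """Fill a missing furigana value from name (hiragana stand-in for real conversion)."""
    if not record.get("furigana"):
        record["furigana"] = record["name"].translate(HIRAGANA_TO_KATAKANA)
    return record


def sketch_post_code(record):
    """Format a seven-digit postcode as XXX-XXXX; leave other values unchanged."""
    code = str(record.get("post_code", ""))
    if len(code) == 7 and code.isdigit():
        record["post_code"] = f"{code[:3]}-{code[3:]}"
    return record


# Hypothetical registry keyed by the process names used in this README.
PROCESSES = {"enrich_kana": sketch_kana, "enrich_post_code": sketch_post_code}


def sketch_enrich(records, *names):
    """Apply the named processes to each record, or all of them when none are named."""
    selected = names or tuple(PROCESSES)
    result = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not modified
        for name in selected:
            record = PROCESSES[name](record)
        result.append(record)
    return result


rows = [{"name": "しまね", "furigana": None, "post_code": "6900887"}]
print(sketch_enrich(rows))                      # all processes
print(sketch_enrich(rows, "enrich_post_code"))  # postcode formatting only
```

Keeping the processes in a name-keyed registry is one simple way to let callers choose steps by string, which mirrors how enrich accepts process names as arguments.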

