
Library to provide list of Vietnam administrative divisions (tỉnh thành, quận huyện, phường xã).

Project description


Library to provide list of Vietnam administrative divisions (tỉnh thành, quận huyện, phường xã) with names and codes as defined by the General Statistics Office of Viet Nam (Tổng cục Thống kê).

Example:

{
    "name": "Tỉnh Cà Mau",
    "code": 96,
    "codename": "tinh_ca_mau",
    "division_type": "tỉnh",
    "phone_code": 290,
    "districts": [
        {
            "name": "Huyện Đầm Dơi",
            "code": 970,
            "codename": "huyen_dam_doi",
            "division_type": "huyện",
            "wards": [
                {
                    "name": "Thị trấn Đầm Dơi",
                    "code": 32152,
                    "codename": "thi_tran_dam_doi",
                    "division_type": "thị trấn"
                },
                {
                    "name": "Xã Tạ An Khương",
                    "code": 32155,
                    "codename": "xa_ta_an_khuong",
                    "division_type": "xã"
                }
            ]
        }
    ]
}
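Because each province nests its districts and each district nests its wards, a common first step is to flatten the tree into a lookup table keyed by ward code. The sketch below uses only the standard library and the sample record above; `index_wards` is a hypothetical helper, not part of the library, and it assumes the data file holds a list of such province objects.

```python
import json

# Sample province copied from the example above.
SAMPLE = json.loads("""
{
  "name": "Tỉnh Cà Mau", "code": 96, "codename": "tinh_ca_mau",
  "division_type": "tỉnh", "phone_code": 290,
  "districts": [
    {
      "name": "Huyện Đầm Dơi", "code": 970, "codename": "huyen_dam_doi",
      "division_type": "huyện",
      "wards": [
        {"name": "Thị trấn Đầm Dơi", "code": 32152,
         "codename": "thi_tran_dam_doi", "division_type": "thị trấn"},
        {"name": "Xã Tạ An Khương", "code": 32155,
         "codename": "xa_ta_an_khuong", "division_type": "xã"}
      ]
    }
  ]
}
""")


def index_wards(provinces):
    """Map each ward code to a (ward, district, province) name triple."""
    index = {}
    for province in provinces:
        for district in province["districts"]:
            for ward in district["wards"]:
                index[ward["code"]] = (ward["name"], district["name"], province["name"])
    return index


wards = index_wards([SAMPLE])
print(wards[32155])  # ('Xã Tạ An Khương', 'Huyện Đầm Dơi', 'Tỉnh Cà Mau')
```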

This library provides data in these forms:

  1. JSON

This data is suitable for applications which don’t need to access the data often; for them, loading the JSON and extracting information from it is fine. The JSON files are saved in the data folder. You can get the file path via the vietnam_provinces.NESTED_DIVISIONS_JSON_PATH variable.

Note that this variable holds only the path of the file, not its content. It is up to the application developer to choose a method to parse the JSON. For example:

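import orjson
import rapidjson
from vietnam_provinces import NESTED_DIVISIONS_JSON_PATH

# With python-rapidjson, reading from an open file
with NESTED_DIVISIONS_JSON_PATH.open(encoding='utf-8') as f:
    divisions = rapidjson.load(f)

# Or, with orjson, parsing the raw bytes
divisions = orjson.loads(NESTED_DIVISIONS_JSON_PATH.read_bytes())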

Due to the large amount of data (10767 wards across Viet Nam), this loading is slow.
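If an application does use the JSON form, it can at least avoid paying the parsing cost more than once per process. A minimal sketch, assuming the standard-library `json` parser and `functools.lru_cache`; `load_divisions` is a hypothetical helper, not a library function.

```python
import functools
import json
from pathlib import Path


@functools.lru_cache(maxsize=None)
def load_divisions(path: str):
    """Parse the JSON file once; later calls with the same path reuse the result.

    Note: every caller receives the same (mutable) object, so treat it as read-only.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

In an application this would be called as `load_divisions(str(NESTED_DIVISIONS_JSON_PATH))`; only the first call reads the file.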

  2. Python data types

This form is useful for applications which need to access the data more often. The data is built as Enums, which you can import in Python code:

>>> from vietnam_provinces.enums import ProvinceEnum, ProvinceDEnum, DistrictEnum, DistrictDEnum

>>> ProvinceEnum.P_77
<ProvinceEnum.P_77: Province(name='Tỉnh Bà Rịa - Vũng Tàu', code=77, division_type=<VietNamDivisionType.TINH: 'tỉnh'>, codename='tinh_ba_ria_vung_tau', phone_code=254)>

>>> ProvinceDEnum.BA_RIA_VUNG_TAU
<ProvinceDEnum.BA_RIA_VUNG_TAU: Province(name='Tỉnh Bà Rịa - Vũng Tàu', code=77, division_type=<VietNamDivisionType.TINH: 'tỉnh'>, codename='tinh_ba_ria_vung_tau', phone_code=254)>

>>> DistrictEnum.D_624
<DistrictEnum.D_624: District(name='Thị xã Ayun Pa', code=624, division_type=<VietNamDivisionType.THI_XA: 'thị xã'>, codename='thi_xa_ayun_pa', province_code=64)>

>>> DistrictDEnum.AYUN_PA_GL
<DistrictDEnum.AYUN_PA_GL: District(name='Thị xã Ayun Pa', code=624, division_type=<VietNamDivisionType.THI_XA: 'thị xã'>, codename='thi_xa_ayun_pa', province_code=64)>

>>> from vietnam_provinces.enums.wards import WardEnum, WardDEnum

>>> WardEnum.W_7450
<WardEnum.W_7450: Ward(name='Xã Đông Hưng', code=7450, division_type=<VietNamDivisionType.XA: 'xã'>, codename='xa_dong_hung', district_code=218)>

>>> WardDEnum.BG_DONG_HUNG_7450
<WardDEnum.BG_DONG_HUNG_7450: Ward(name='Xã Đông Hưng', code=7450, division_type=<VietNamDivisionType.XA: 'xã'>, codename='xa_dong_hung', district_code=218)>

Loading wards this way is much faster than the JSON option.

They are made as Enums so that library users can take advantage of the auto-complete feature of IDEs and code editors during development. This prevents mistakes caused by mistyped names.

The Ward Enum has two variants:

  • WardEnum: Has member names in the form of the numeric ward code (W_28912). It helps look up a ward by its code (which is the most common use case).
  • WardDEnum: Has more readable member names (D means “descriptive”), to make application code easier to reason about. For example, looking at WardDEnum.BT_PHAN_RI_CUA_22972, the programmer can guess that this ward is “Phan Rí Cửa”, of “Bình Thuận” province.

Similarly, other levels (District, Province) also have two variants of Enum.
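The two variants are two names for the same records. The sketch below is a toy re-creation of that pattern using the ward shown earlier (Xã Đông Hưng, code 7450); the simplified `Ward` class and both enums are illustrative stand-ins, not the library's actual definitions (for example, `division_type` is a plain string here).

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Ward:
    # Simplified stand-in for the library's Ward type.
    name: str
    code: int
    division_type: str
    codename: str
    district_code: int


_DONG_HUNG = Ward("Xã Đông Hưng", 7450, "xã", "xa_dong_hung", 218)


class WardEnum(Enum):
    # Member name derived from the numeric code: easy to build from a stored code.
    W_7450 = _DONG_HUNG


class WardDEnum(Enum):
    # Descriptive member name: province abbreviation + ward name + code.
    BG_DONG_HUNG_7450 = _DONG_HUNG


# Both members wrap the very same Ward object.
print(WardEnum.W_7450.value is WardDEnum.BG_DONG_HUNG_7450.value)  # True
```

Code that looks up by code uses the first enum; code a human reads and reviews can use the second, without duplicating the data.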

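Example of looking up a Ward, District and Province by their numeric codes:

from vietnam_provinces.enums import ProvinceEnum, DistrictEnum
from vietnam_provinces.enums.wards import WardEnum

# Assume that you are loading user info from your database
# (load_user_info() and the dict keys below are placeholders for your own code)
user_info = load_user_info()

province = ProvinceEnum[f'P_{user_info["province_code"]}'].value
district = DistrictEnum[f'D_{user_info["district_code"]}'].value
ward = WardEnum[f'W_{user_info["ward_code"]}'].value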
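Indexing an Enum with an unknown name raises `KeyError`, which can happen if a stored code predates a reorganization. A small generic helper can turn that into `None` instead. This is a sketch only: `lookup_by_code` is hypothetical, and `ToyProvinceEnum` is a tiny stand-in for `ProvinceEnum` so the example runs without the library.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


def lookup_by_code(enum_cls: Type[E], prefix: str, code: int) -> Optional[E]:
    """Return the member named f'{prefix}{code}', or None if no such member exists."""
    try:
        return enum_cls[f"{prefix}{code}"]
    except KeyError:
        return None


class ToyProvinceEnum(Enum):
    # Tiny stand-in for ProvinceEnum; values are plain strings here.
    P_77 = "Tỉnh Bà Rịa - Vũng Tàu"
    P_96 = "Tỉnh Cà Mau"


print(lookup_by_code(ToyProvinceEnum, "P_", 96))   # ToyProvinceEnum.P_96
print(lookup_by_code(ToyProvinceEnum, "P_", 999))  # None
```

With the real enums the same call would read `lookup_by_code(ProvinceEnum, 'P_', province_code)`, followed by `.value` when the result is not `None`.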

Unlike ProvinceDEnum and DistrictDEnum, WardDEnum includes the ward code in its member names. This is because too many Vietnamese wards share the same name: there is no way to build a unique identifier for each ward from pure Latin letters (with Vietnamese diacritics stripped), even if we add district and province information to the identifier. Take “Xã Đông Thành” and “Xã Đông Thạnh” as an example. Both belong to “Huyện Bình Minh” of “Vĩnh Long”, and both produce the identifier “DONG_THANH”. Although Python allows Unicode identifiers such as “ĐÔNG_THẠNH”, this is not yet practical, because the code formatter (Black) still normalizes them to Latin form.
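The collision is easy to reproduce. The sketch strips Vietnamese diacritics with the standard `unicodedata` module; it illustrates the problem and is not necessarily how the library generates its member names (`strip_diacritics` and `to_member_name` are hypothetical helpers).

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    # Đ/đ is a distinct letter, not D plus a combining mark, so NFKD won't split it.
    text = text.replace("Đ", "D").replace("đ", "d")
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (tone marks, circumflex, horn, breve, dot below...).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_member_name(name: str) -> str:
    """Upper-case ASCII identifier, with runs of other characters turned into '_'."""
    ascii_name = strip_diacritics(name).upper()
    return re.sub(r"[^A-Z0-9]+", "_", ascii_name).strip("_")


print(to_member_name("Đông Thành"))  # DONG_THANH
print(to_member_name("Đông Thạnh"))  # DONG_THANH
```

Both names map to the same identifier, which is why WardDEnum appends the numeric ward code to keep its member names unique.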

Because WardEnum has many members (10767 at the time of writing, February 2020) and may not be needed in some applications, I moved it to a separate module so that it is not loaded automatically into the application.

The values of these enum members, of types Province, District and Ward, are all immutable. These types can be imported from the top level of vietnam_provinces.

While Province and District are namedtuples, Ward is a frozen dataclass. This is a workaround for a difficult situation: the standard Enum is too slow to load when it has very many members, and its faster alternative, fast-enum, has compatibility issues with namedtuple.
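Both kinds of type reject attribute assignment, and both offer a way to derive a modified copy instead. The classes below are simplified stand-ins that mirror the field names shown above; they are not the library's actual definitions.

```python
import dataclasses
from collections import namedtuple

# Simplified stand-ins, not the library's real District and Ward classes.
District = namedtuple("District", "name code division_type codename province_code")


@dataclasses.dataclass(frozen=True)
class Ward:
    name: str
    code: int
    division_type: str
    codename: str
    district_code: int


district = District("Thị xã Ayun Pa", 624, "thị xã", "thi_xa_ayun_pa", 64)
ward = Ward("Xã Đông Hưng", 7450, "xã", "xa_dong_hung", 218)

try:
    district.code = 1  # namedtuple fields cannot be reassigned
except AttributeError as exc:
    print("District:", type(exc).__name__)

try:
    ward.code = 1  # a frozen dataclass rejects assignment
except dataclasses.FrozenInstanceError as exc:
    print("Ward:", type(exc).__name__)

# To get a modified record, build a new object; the original stays unchanged.
district2 = district._replace(code=1)
ward2 = dataclasses.replace(ward, code=1)
```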

Install

pip3 install vietnam-provinces

This library is compatible with Python 3.7+ (because it uses dataclasses).

Development

During development, this project uses a tool to convert data from government sources.

The tool doesn’t crawl data directly from government websites, because the data rarely changes (it isn’t worth developing a feature you only need to use once every ten years), and because those websites provide the data in unfriendly Microsoft Office formats.

Update data

In the future, when the authorities reorganize the administrative divisions, we will need to collect the data again from the GSOVN website. To do so:

  • Go to: https://www.gso.gov.vn/dmhc2015/ (this URL may change when GSOVN replaces their software).
  • Find the button “Xuất Excel”.
  • Tick the “Quận Huyện Phường Xã” checkbox.
  • Click the button to export and download the list of units as an Excel (XLS) file.
  • Use LibreOffice to convert the Excel file to a CSV file. For example, we name it Xa_2020-02-25.csv.
  • Run this tool to convert the data to JSON format:
python3 -m dev -i dev/seed-data/Xa_2020-02-25.csv -o data/nested-divisions.json

You can run

python3 -m dev --help

to see more options of that tool.

Note that this tool is only available in the source folder (cloned from Git). It is not included in the distributable Python package.
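The core of that conversion is grouping a flat table (one row per ward) into the nested province, district, ward structure shown at the top of this page. The sketch below illustrates only that grouping step; the column names are hypothetical (the real GSO export uses its own, likely Vietnamese, headers), and fields such as codename and division_type are left out for brevity. It is not the dev tool's actual code.

```python
import csv
import io
import json

# Hypothetical column names: adjust them to the headers of the real CSV export.
PROVINCE_CODE, PROVINCE_NAME = "province_code", "province_name"
DISTRICT_CODE, DISTRICT_NAME = "district_code", "district_name"
WARD_CODE, WARD_NAME = "ward_code", "ward_name"


def nest_rows(rows):
    """Group flat rows (one per ward) into provinces -> districts -> wards."""
    provinces = {}
    for row in rows:
        p_code, d_code = int(row[PROVINCE_CODE]), int(row[DISTRICT_CODE])
        province = provinces.setdefault(
            p_code, {"name": row[PROVINCE_NAME], "code": p_code, "districts": {}}
        )
        district = province["districts"].setdefault(
            d_code, {"name": row[DISTRICT_NAME], "code": d_code, "wards": []}
        )
        district["wards"].append({"name": row[WARD_NAME], "code": int(row[WARD_CODE])})
    # Replace the helper dicts (keyed by code) with lists, as in the JSON example.
    result = []
    for province in provinces.values():
        province["districts"] = list(province["districts"].values())
        result.append(province)
    return result


SAMPLE_CSV = """province_code,province_name,district_code,district_name,ward_code,ward_name
96,Tỉnh Cà Mau,970,Huyện Đầm Dơi,32152,Thị trấn Đầm Dơi
96,Tỉnh Cà Mau,970,Huyện Đầm Dơi,32155,Xã Tạ An Khương
"""

nested = nest_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(json.dumps(nested, ensure_ascii=False, indent=2))
```

Using dicts keyed by code while grouping keeps each row's lookup constant-time, and since Python 3.7 dicts preserve insertion order, the output lists follow the order of the input rows.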

Generate Python code

python3 -m dev -i dev/seed-data/Xa_2020-02-25.csv -f python

Data source

Credit

Given to you by Nguyễn Hồng Quân, after nights and weekends.


Download files


Files for vietnam-provinces, version 0.2.1:

  • vietnam_provinces-0.2.1-py3-none-any.whl (Wheel, Python py3, 726.8 kB)
  • vietnam-provinces-0.2.1.tar.gz (Source, 690.5 kB)
