Find Taiwan ZIP code by address fuzzily.
This package lets you find the ZIP code for an address in Taiwan.
The main features:
- Fast. It builds a ZIP code index by tokenization.
- Gradual. It returns a partial ZIP code rather than nothing when the address is not detailed enough.
- Stand-alone. It depends on nothing.
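To give a feel for what "tokenization" can mean here, the following is a minimal sketch that splits an address into (name, unit) pairs. It is not the package's actual tokenizer: the `tokenize` function, the `TOKEN_RE` pattern, and the `UNITS` list are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical list of address unit characters (an assumption for this
# sketch; the unit list used by zipcodetw itself may differ).
UNITS = '縣市鄉鎮區村里路街段巷弄號樓'

# A token is either a number with an optional unit, or a non-digit name
# followed by a unit. The name is matched lazily so it stops at the first
# unit character that follows at least one name character.
TOKEN_RE = re.compile(r'(\d+)([%s])?|(\D+?)([%s])' % (UNITS, UNITS))

def tokenize(address):
    """Split an address string into a list of (name, unit) tuples."""
    tokens = []
    for match in TOKEN_RE.finditer(address):
        number, number_unit, name, name_unit = match.groups()
        if number is not None:
            # A number token may end without a unit.
            tokens.append((number, number_unit or ''))
        else:
            tokens.append((name, name_unit))
    return tokens

print(tokenize('臺北市信義區市府路1號'))
```

An index keyed on such tokens can be looked up level by level, which is one plausible reason a tokenized index is fast to query.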
Usage
Find ZIP code gradually:
>>> import zipcodetw
>>> zipcodetw.find('臺北市')
u'1'
>>> zipcodetw.find('臺北市信義區')
u'110'
>>> zipcodetw.find('臺北市信義區市府路')
u'110'
>>> zipcodetw.find('臺北市信義區市府路1號')
u'11008'
Since v0.3, you can even find a ZIP code from a partial address:
>>> zipcodetw.find('松山區')
u'105'
>>> zipcodetw.find('秀山街')
u''
>>> zipcodetw.find('台北市秀山街')
u'10042'
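The results above are consistent with returning the longest common prefix of every ZIP code that matches the given address. The sketch below illustrates that idea only; it is an assumption about the behavior, not the package's implementation. The `gradual_zipcode` helper is hypothetical and the candidate lists are made up for the example.

```python
import os.path

def gradual_zipcode(candidates):
    """Return the longest prefix shared by all candidate ZIP codes.

    An empty candidate list, or candidates with no shared leading digit,
    yields an empty string.
    """
    # os.path.commonprefix compares strings character by character,
    # so it works on plain digit strings as well as on paths.
    return os.path.commonprefix(list(candidates))

# Made-up candidates that share only the first three digits.
print(gradual_zipcode(['11008', '11060']))

# Made-up candidates from different areas share no leading digit.
print(gradual_zipcode(['10042', '20241']))
```

Under this reading, a street name that exists in several cities can produce an empty result, while adding the city narrows the candidates enough to give a full code.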
Installation
It is available on PyPI:
$ sudo pip install zipcodetw
Just install it and have fun. :)
Build Index Manually
If you install it with pip or python setup.py install, a ZIP code index is built automatically. If you want to use it directly from the source code, you have to build the index manually:
$ python -m zipcodetw.builder
Data
The ZIP code directory is provided by Chunghwa Post, and is available from: http://www.post.gov.tw/post/internet/Download/all_list.jsp?ID=2201#dl_txt_s_A0206
Changelog
v0.6.1
- Fixed Python 2 and Python 3 compatibility. Thanks to Poren Chiang and Ryan for their contributions.
v0.6
- Updated the data to 2014/12.
v0.5.7
- Fixed a rare issue that caused an IndexError.
v0.5.6
- Reverted the removal of insignificant tokens introduced in v0.5.4.
- It now handles insignificant tokens; and
- redundant units in the finding logic (directory.find).
- Allowed a number token to end without a unit.
- Now address.tokens is a list.
v0.5.5
- Fixed a gradual matching issue causing some wrong results.
v0.5.4
- Automatically removed tokens whose unit is insignificant.
v0.5.3
- Fixed and simplified the matching logic for the address tail.
- Refined the index building logic.
v0.5.2
- Fixed an issue that occurred when running in a multi-threaded environment.
- Added a new argument, keep_alive, for the Directory class.
v0.5.1
- Refined the code slightly.
v0.5
- It now builds a ZIP code index when you install it; so
- the package size is 12.5x smaller.
- The internal API is better now.
v0.4
- It now ships with an index compiled in SQLite; so
- initialization time is ~680x faster, i.e. ~30ms per import; and
- zipcodetw.find is ~1.9x slower, i.e. ~2ms per call; and
- the package size is bigger.
- All code was moved into zipcodetw package.
- zipcodetw.find now returns unicode instead of string.
v0.3
- It builds a full index for middle tokens; and
- also normalizes Chinese numerals now!
- zipcodetw.find is ~1.06x faster.
- But initialization time increases to ~1.7x.
v0.2
- zipcodetw.find is 8x faster now!
- It has a better tokenizing logic; and
- a better matching logic for sub-number now.
- zipcodetw.find_zipcodes was removed.
- Internal API was changed a lot.
- The tests are better now.