Chinese Province, City and Area Recognition Utilities

Project description

chinese_province_city_area_mapper

chinese_province_city_area_mapper: a Python module that recognizes the province, city and area (district) in Simplified Chinese strings. It automatically maps each area to its city and each city to its province, and it also supports validation and simple plotting.

For example:

 ["徐汇区虹漕路461号58号楼5楼", "泉州市洛江区万安塘西工业区"]
         ↓ transform
|省    |市   |区    |地址                 |
|上海市|上海市|徐汇区|虹漕路461号58号楼5楼  |
|福建省|泉州市|洛江区|万安塘西工业区        |

The columns are 省 (province), 市 (city), 区 (area) and 地址 (the remaining address).
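
The area-to-city and city-to-province mapping can be pictured as two dictionary lookups. What follows is only a minimal sketch, not the module's implementation: AREA_TO_CITY, CITY_TO_PROVINCE and split_address are hypothetical names, the tables hold just the two entries from the example, and plain substring search stands in for the module's jieba-based matching.

```python
# Hypothetical lookup tables holding only the entries from the example above.
AREA_TO_CITY = {"徐汇区": "上海市", "洛江区": "泉州市"}
CITY_TO_PROVINCE = {"上海市": "上海市", "泉州市": "福建省"}


def split_address(text):
    """Return (province, city, area, rest) for one address string."""
    # Find the first known city and area name that occur in the text.
    city = next((c for c in CITY_TO_PROVINCE if c in text), "")
    area = next((a for a in AREA_TO_CITY if a in text), "")
    # Fill in a missing city from the area, then the province from the city.
    if area and not city:
        city = AREA_TO_CITY[area]
    province = CITY_TO_PROVINCE.get(city, "")
    # Strip the recognized names so only the detailed address remains.
    rest = text
    for name in (province, city, area):
        if name:
            rest = rest.replace(name, "", 1)
    return province, city, area, rest


for address in ["徐汇区虹漕路461号58号楼5楼", "泉州市洛江区万安塘西工业区"]:
    print(split_address(address))
```

With the two sample addresses, this toy version yields the same province, city, area and address values as the table above.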

The full documentation is on the module's GitHub page: https://github.com/DQinYuan/chinese_province_city_area_mapper

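Features

  • Matching is based on jieba word segmentation, with some additional matching logic to keep accuracy high

  • If the address data is dirty, do not expect this module to be 100% accurate. It only extracts as much information as it can; if you need 100% accuracy, verify the results manually after matching

  • Ships with complete three-level (province, city, area) place-name data, including latitude and longitude

  • Supports custom province, city and area mappings

  • The output is a pandas DataFrame, which is easy to understand and use

  • Includes simple plotting helpers for quick data visualization

  • MIT license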
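
A custom mapping matters because one area name can exist in more than one city: 朝阳区, for instance, is a district of both 北京市 and 长春市. The sketch below shows one way a user-supplied override could take precedence over a default table. It is not the module's API: DEFAULT_AREA_TO_CITY, resolve_city and the user_map parameter are hypothetical names chosen for illustration, and returning None for an ambiguous name is a design choice of this sketch only. See the GitHub documentation for the option the module actually provides.

```python
# Hypothetical default table: one area name can belong to several cities.
DEFAULT_AREA_TO_CITY = {
    "徐汇区": ["上海市"],
    "朝阳区": ["北京市", "长春市"],
}


def resolve_city(area, user_map=None):
    """Pick the city for an area, letting user_map override the defaults."""
    if user_map and area in user_map:
        return user_map[area]
    candidates = DEFAULT_AREA_TO_CITY.get(area, [])
    # Only answer when the default table is unambiguous.
    return candidates[0] if len(candidates) == 1 else None


print(resolve_city("徐汇区"))
print(resolve_city("朝阳区"))
print(resolve_city("朝阳区", user_map={"朝阳区": "北京市"}))
```

In this sketch an ambiguous name stays unresolved until the caller supplies an explicit choice, which keeps wrong guesses out of the result.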

Installation

The code currently supports Python 3 only.

pip install cpca

Get Started

The main function in this module is cpca.transform. It accepts any iterable (such as a list or a pandas Series) and converts it into a DataFrame. The simplest usage looks like this:

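import cpca

# Addresses to parse; any iterable of strings works
location_str = ["徐汇区虹漕路461号58号楼5楼", "泉州市洛江区万安塘西工业区", "朝阳区北苑华贸城"]
df = cpca.transform(location_str)
df  # display the resulting DataFrame (in a REPL or notebook)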

The output is:

     区    市      省         地址
0  徐汇区  上海市  上海市   虹漕路461号58号楼5楼
1  洛江区  泉州市  福建省   万安塘西工业区
2  朝阳区  北京市  北京市   北苑华贸城

For more details, visit the module's GitHub page at https://github.com/DQinYuan/chinese_province_city_area_mapper, where I have documented the module more thoroughly.

Download files

Source Distribution

cpca-0.3.tar.gz (69.6 kB view details)

Uploaded Source

Built Distribution

cpca-0.3-py3-none-any.whl (73.4 kB view details)

Uploaded Python 3

File details

Details for the file cpca-0.3.tar.gz.

File metadata

  • Download URL: cpca-0.3.tar.gz
  • Size: 69.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.14.2 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.1

File hashes

Hashes for cpca-0.3.tar.gz
Algorithm Hash digest
SHA256 e8871963e65ead42e63e34ff210d43e2944e71ef95bb321051c2792fb20a049c
MD5 1563b6ec8211dc2d405dd3439092512c
BLAKE2b-256 33057bfc71e2973fb81f0e85d5ae81c4e1f70c54654f5be7be52f30fd4640d28

File details

Details for the file cpca-0.3-py3-none-any.whl.

File metadata

  • Download URL: cpca-0.3-py3-none-any.whl
  • Size: 73.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.14.2 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.1

File hashes

Hashes for cpca-0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 649eebd93806133ca940eabf83763bd69fde4fa37027fc9ad7b0ed32fa976db9
MD5 6df55d9d39fd6e34cbd265aefee977ff
BLAKE2b-256 47634e58203340ac939a7eb518a23350bd4fe9637a7d35fdf0203b6c5418474a
