A library for parsing multinational street addresses using deep learning.
Here is deepparse.
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning.
Use deepparse to:
- Parse multinational addresses with the pre-trained models.
- Retrain the pre-trained models on new data to parse multinational addresses.
Read the documentation at deepparse.org.
Deepparse is compatible with the latest version of PyTorch and Python >= 3.6.
Countries and Results
The following table reports the accuracy of both our models on the 20 countries used during training.
| Country | Fasttext (%) | BPEmb (%) | Country | Fasttext (%) | BPEmb (%) |
|---|---|---|---|---|---|
| Italy | 99.66 | 99.73 | United States | 99.56 | 99.53 |
| Germany | 99.72 | 99.84 | Austria | 99.19 | 99.03 |
| South Korea | 99.96 | 100.00 | Canada | 99.76 | 99.80 |
| Mexico | 99.54 | 99.60 | Australia | 99.62 | 99.74 |
| Finland | 99.75 | 99.87 | Netherlands | 99.50 | 99.84 |
| France | 99.54 | 99.50 | United Kingdom | 99.54 | 99.62 |
| Russia | 98.71 | 99.49 | Norway | 99.40 | 98.71 |
| Switzerland | 99.48 | 99.61 | Poland | 99.64 | 99.83 |
| Brazil | 99.33 | 99.24 | Denmark | 99.65 | 99.84 |
| Spain | 99.70 | 99.79 | Czechia | 99.46 | 99.83 |
We also ran a zero-shot evaluation of our models on data from 41 other countries, none of which were seen during training; the results are shown in the next table.
| Country | Fasttext (%) | BPEmb (%) | Country | Fasttext (%) | BPEmb (%) |
|---|---|---|---|---|---|
| Philippines | 81.56 | 83.73 | South Africa | 92.69 | 95.03 |
| Colombia | 85.92 | 87.50 | Venezuela | 95.36 | 89.67 |
| Bermuda | 91.30 | 93.66 | Lithuania | 89.21 | 76.60 |
| Moldova | 88.51 | 89.13 | India | 66.91 | 77.26 |
| Malaysia | 81.31 | 92.78 | Bosnia | 88.91 | 84.33 |
| Belgium | 89.57 | 86.41 | Ukraine | 91.80 | 92.73 |
| Greece | 83.42 | 39.82 | Algeria | 86.93 | 80.62 |
| Slovakia | 81.00 | 91.28 | Bangladesh | 74.49 | 79.29 |
| Latvia | 93.80 | 80.18 | Reunion | 96.48 | 93.40 |
| Romania | 93.23 | 91.83 | Singapore | 84.55 | 81.68 |
| Indonesia | 63.15 | 67.97 | Cyprus | 97.69 | 98.30 |
| Portugal | 93.39 | 93.20 | Serbia | 95.62 | 94.69 |
| Croatia | 96.63 | 86.24 | Japan | 44.33 | 35.77 |
| New Caledonia | 99.42 | 99.01 | New Zealand | 97.04 | 98.86 |
| Uzbekistan | 87.63 | 71.93 | Faroe Islands | 71.73 | 85.46 |
| Hungary | 47.00 | 24.05 | Slovenia | 96.27 | 97.28 |
| Paraguay | 97.00 | 97.15 | Iceland | 95.76 | 98.01 |
| Estonia | 90.61 | 76.45 | Argentina | 89.47 | 88.55 |
| Bulgaria | 92.70 | 95.87 | Sweden | 77.29 | 87.77 |
| Belarus | 88.77 | 93.00 | Kazakhstan | 87.24 | 91.23 |
| Ireland | 86.35 | 87.49 | | | |
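The zero-shot results differ noticeably between the two models for some countries, so one way to read the table is to pick, per country, the model with the higher reported accuracy. The following is a minimal sketch of that lookup, using a handful of rows copied from the table. The helper `best_model_type` and the dictionary `ZERO_SHOT_ACCURACY` are hypothetical names for illustration and are not part of deepparse.

```python
# A few (Fasttext, BPEmb) zero-shot accuracies copied from the table above.
# ZERO_SHOT_ACCURACY and best_model_type are hypothetical, not part of deepparse.
ZERO_SHOT_ACCURACY = {
    "Greece": (83.42, 39.82),
    "India": (66.91, 77.26),
    "Japan": (44.33, 35.77),
    "Malaysia": (81.31, 92.78),
    "Hungary": (47.00, 24.05),
}


def best_model_type(country):
    """Return the model_type string whose reported zero-shot accuracy is higher."""
    fasttext_acc, bpemb_acc = ZERO_SHOT_ACCURACY[country]
    return "fasttext" if fasttext_acc >= bpemb_acc else "bpemb"


for country in ZERO_SHOT_ACCURACY:
    print(country, best_model_type(country))
```

The returned strings match the `model_type` values used when constructing an `AddressParser`, so the result could be passed to it directly.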
Getting started
from deepparse.parser import AddressParser
address_parser = AddressParser(model_type="bpemb", device=0)
# you can parse one address
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")
# or multiple addresses
parsed_address = address_parser(
    ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "350 rue des Lilas Ouest Québec Québec G1L 1B6"])
# you can also get the probability of the predicted tags
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6", with_prob=True)
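The snippet above passes `device=0`, which refers to the first GPU. On a machine without a GPU, a different value is needed. The sketch below picks a device value at run time. It assumes that `AddressParser` also accepts the string `"cpu"` for its `device` argument; the helper `pick_device` is a hypothetical name, not part of deepparse.

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise the string "cpu".

    Hypothetical helper; assumes AddressParser accepts device="cpu".
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU support to query.
        return "cpu"
    return 0 if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Possible usage (requires deepparse to be installed):
# address_parser = AddressParser(model_type="bpemb", device=device)
```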
Retrain a model
See here for a complete example.
# We will retrain the fasttext version of our pretrained model.
address_parser = AddressParser(model_type="fasttext", device=0)
# training_container holds the training data; see the complete example for how to build it.
address_parser.retrain(training_container, 0.8, epochs=5, batch_size=8)
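The second argument to `retrain` above is `0.8`. Assuming it is the proportion of the data used for training, with the remainder held out for validation, the split can be pictured with the plain-Python sketch below. This only illustrates the ratio, not how deepparse splits its data internally; `split_train_valid` is a hypothetical helper.

```python
import random


def split_train_valid(examples, train_ratio, seed=42):
    """Shuffle a copy of the examples and split it by train_ratio (hypothetical helper)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for tagged addresses.
examples = [f"address {i}" for i in range(10)]
train, valid = split_train_valid(examples, 0.8)
print(len(train), len(valid))  # 8 2
```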
Installation
Before installing deepparse, you must have the latest version of PyTorch in your environment.
- Install the stable version of deepparse:
pip install deepparse
- Install the latest development version of deepparse:
pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev
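After installing, it can help to confirm that the environment meets the requirements stated above (Python >= 3.6, PyTorch present, deepparse importable). The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and it checks only that the packages can be found, not their versions.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the interpreter and packages needed by deepparse are present."""
    return {
        "python_ok": sys.version_info >= (3, 6),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "deepparse_installed": importlib.util.find_spec("deepparse") is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{name}: {ok}")
```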
Cite
Use the following to cite the article:
@misc{yassine2020leveraging,
title={{Leveraging Subword Embeddings for Multinational Address Parsing}},
author={Marouane Yassine and David Beauchemin and François Laviolette and Luc Lamontagne},
year={2020},
eprint={2006.16152},
archivePrefix={arXiv}
}
and the following to cite the package:
@misc{deepparse,
author = {Marouane Yassine and David Beauchemin},
title = {{Deepparse: A state-of-the-art deep learning multinational addresses parser}},
year = {2020},
note = {\url{https://deepparse.org}}
}
Contributing to Deepparse
We welcome user input, whether it concerns bugs found in the library or feature proposals! Have a look at our contributing guidelines for more details.
License
Deepparse is LGPLv3 licensed, as found in the LICENSE file.