A library for parsing multinational street addresses using deep learning.

Here is deepparse.

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning.

Use deepparse to:

  • Use the pre-trained models to parse multinational addresses.
  • Retrain our pre-trained models on new data to parse multinational addresses.

Read the documentation at deepparse.org.

Deepparse is compatible with the latest version of PyTorch and Python >= 3.6.

Countries and Results

The following table presents the accuracy of both our models on the 20 countries used during training.

Country         Fasttext (%)  BPEmb (%)  Country         Fasttext (%)  BPEmb (%)
Italy           99.66         99.73      United States   99.56         99.53
Germany         99.72         99.84      Austria         99.19         99.03
South Korea     99.96         100.00     Canada          99.76         99.80
Mexico          99.54         99.60      Australia       99.62         99.74
Finland         99.75         99.87      Netherlands     99.50         99.84
France          99.54         99.50      United Kingdom  99.54         99.62
Russia          98.71         99.49      Norway          99.40         98.71
Switzerland     99.48         99.61      Poland          99.64         99.83
Brazil          99.33         99.24      Denmark         99.65         99.84
Spain           99.70         99.79      Czechia         99.46         99.83

We also ran a zero-shot evaluation of both models on data from 41 other countries that were not seen during training; the results are shown in the next table.

Country         Fasttext (%)  BPEmb (%)  Country         Fasttext (%)  BPEmb (%)
Philippines     81.56         83.73      South Africa    92.69         95.03
Colombia        85.92         87.50      Venezuela       95.36         89.67
Bermuda         91.30         93.66      Lithuania       89.21         76.60
Moldova         88.51         89.13      India           66.91         77.26
Malaysia        81.31         92.78      Bosnia          88.91         84.33
Belgium         89.57         86.41      Ukraine         91.80         92.73
Greece          83.42         39.82      Algeria         86.93         80.62
Slovakia        81.00         91.28      Bangladesh      74.49         79.29
Latvia          93.80         80.18      Reunion         96.48         93.40
Romania         93.23         91.83      Singapore       84.55         81.68
Indonesia       63.15         67.97      Cyprus          97.69         98.30
Portugal        93.39         93.20      Serbia          95.62         94.69
Croatia         96.63         86.24      Japan           44.33         35.77
New Caledonia   99.42         99.01      New Zealand     97.04         98.86
Uzbekistan      87.63         71.93      Faroe Islands   71.73         85.46
Hungary         47.00         24.05      Slovenia        96.27         97.28
Paraguay        97.00         97.15      Iceland         95.76         98.01
Estonia         90.61         76.45      Argentina       89.47         88.55
Bulgaria        92.70         95.87      Sweden          77.29         87.77
Belarus         88.77         93.00      Kazakhstan      87.24         91.23
Ireland         86.35         87.49
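
The zero-shot results differ a lot from one country to the next, so it can help to check which model does better for the countries you care about. The sketch below uses a few rows copied from the zero-shot table and reports, for each one, which model scores higher and by how many points. `ROWS` and `compare` are illustrative names and are not part of deepparse.

```python
# A few (country, fasttext %, bpemb %) rows copied from the zero-shot table above.
ROWS = [
    ("Greece", 83.42, 39.82),
    ("Malaysia", 81.31, 92.78),
    ("Hungary", 47.00, 24.05),
    ("Faroe Islands", 71.73, 85.46),
]


def compare(rows):
    """Return (country, better_model, gap_in_points) for each row."""
    results = []
    for country, fasttext_acc, bpemb_acc in rows:
        better = "fasttext" if fasttext_acc >= bpemb_acc else "bpemb"
        # Round to two decimals to match the precision of the table.
        gap = round(abs(fasttext_acc - bpemb_acc), 2)
        results.append((country, better, gap))
    return results


for country, better, gap in compare(ROWS):
    print(f"{country}: {better} is ahead by {gap:.2f} points")
```

For these rows neither model is uniformly better, which suggests trying both on a sample of your own addresses before choosing one.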

Getting started

from deepparse.parser import AddressParser

# Load the pre-trained BPEmb model on GPU 0
address_parser = AddressParser(model_type="bpemb", device=0)

# You can parse one address
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")

# or multiple addresses at once
parsed_addresses = address_parser(
    ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "350 rue des Lilas Ouest Québec Québec G1L 1B6"])

# You can also get the probability of the predicted tags
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6", with_prob=True)

Retrain a model

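See here for a complete example.

# We will retrain the fasttext version of our pre-trained model.
address_parser = AddressParser(model_type="fasttext", device=0)

# training_container is a dataset container holding your labelled addresses;
# it must be created beforehand (see the complete example).
# 0.8 is the train ratio: the share of the data used for training.
address_parser.retrain(training_container, 0.8, epochs=5, batch_size=8)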
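
As we understand it, the second positional argument in the call above (0.8) is the share of the data used for training, with the remainder held out for validation. The sketch below only illustrates what such a ratio means on a toy list. `split_by_ratio` is a hypothetical helper, and this is not necessarily how deepparse performs the split internally.

```python
import random


def split_by_ratio(examples, train_ratio, seed=42):
    """Shuffle a copy of `examples` and split it into (train, validation)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy data standing in for labelled addresses.
examples = [f"address {i}" for i in range(10)]
train, valid = split_by_ratio(examples, 0.8)
print(len(train), len(valid))
```

With ten examples and a ratio of 0.8, eight end up in the training part and two in the validation part.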

Installation

Before installing deepparse, you must have the latest version of PyTorch in your environment.

  • Install the stable version of deepparse:
pip install deepparse
  • Install the latest development version of deepparse:
pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev
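
Before running either command, you may want to confirm that the environment meets the two stated prerequisites (Python >= 3.6 and an installed PyTorch). The sketch below is one possible check using only the standard library. The helper names are hypothetical, and it only detects whether a `torch` package is present, not whether it is the latest version.

```python
import importlib.util
import sys


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


def torch_is_installed():
    """True when a `torch` package can be found, without importing it."""
    return importlib.util.find_spec("torch") is not None


if not python_is_supported():
    print("deepparse needs Python >= 3.6")
elif not torch_is_installed():
    print("install PyTorch before installing deepparse")
else:
    print("environment looks ready for: pip install deepparse")
```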

Cite

Use the following to cite the article:

@misc{yassine2020leveraging,
    title={{Leveraging Subword Embeddings for Multinational Address Parsing}},
    author={Marouane Yassine and David Beauchemin and François Laviolette and Luc Lamontagne},
    year={2020},
    eprint={2006.16152},
    archivePrefix={arXiv}
}

and the following to cite the package:

@misc{deepparse,
    author = {Marouane Yassine and David Beauchemin},
    title  = {{Deepparse: A state-of-the-art deep learning multinational addresses parser}},
    year   = {2020},
    note   = {\url{https://deepparse.org}}
}

Contributing to Deepparse

We welcome user input, whether it concerns bugs found in the library or feature proposals! Have a look at our contributing guidelines for more details.

License

Deepparse is LGPLv3 licensed, as found in the LICENSE file.

