deepparse

A library for parsing multinational street addresses using deep learning.

These details have not been verified by PyPI

Project links

Project description

Here is deepparse.

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning.

Use deepparse to

Use the pre-trained models to parse multinational addresses,
retrain our pre-trained models on new data to parse multinational addresses,
retrain our pre-trained models with your own prediction tags easily.

Read the documentation at deepparse.org.

Deepparse is compatible with the latest version of PyTorch and Python >= 3.7.

Countries and Results

We evaluate our models on two forms of address data

clean data which refers to addresses containing elements from four categories, namely a street name, a municipality, a province and a postal code,
incomplete data which is made up of addresses missing at least one category amongst the aforementioned ones.

You can get our dataset here.

Clean Data

The following table presents the accuracy (using clean data) on the 20 countries we used during training for both our models.

Country	Fasttext (%)	BPEmb (%)	Country	Fasttext (%)	BPEmb (%)
Norway	99.06	98.3	Austria	99.21	97.82
Italy	99.65	98.93	Mexico	99.49	98.9
United Kingdom	99.58	97.62	Switzerland	98.9	98.38
Germany	99.72	99.4	Denmark	99.71	99.55
France	99.6	98.18	Brazil	99.31	97.69
Netherlands	99.47	99.54	Australia	99.68	98.44
Poland	99.64	99.52	Czechia	99.48	99.03
United States	99.56	97.69	Canada	99.76	99.03
South Korea	99.97	99.99	Russia	98.9	96.97
Spain	99.73	99.4	Finland	99.77	99.76

We have also made a zero-shot evaluation of our models using clean data from 41 other countries; the results are shown in the next table.

Country	Fasttext (%)	BPEmb (%)	Country	Fasttext (%)	BPEmb (%)
Latvia	89.29	68.31	Faroe Islands	71.22	64.74
Colombia	85.96	68.09	Singapore	86.03	67.19
Réunion	84.3	78.65	Indonesia	62.38	63.04
Japan	36.26	34.97	Portugal	93.09	72.01
Algeria	86.32	70.59	Belgium	93.14	86.06
Malaysia	83.14	89.64	Ukraine	93.34	89.42
Estonia	87.62	70.08	Bangladesh	72.28	65.63
Slovenia	89.01	83.96	Hungary	51.52	37.87
Bermuda	83.19	59.16	Romania	90.04	82.9
Philippines	63.91	57.36	Belarus	93.25	78.59
Bosnia	88.54	67.46	Moldova	89.22	57.48
Lithuania	93.28	69.97	Paraguay	96.02	87.07
Croatia	95.8	81.76	Argentina	81.68	71.2
Ireland	80.16	54.44	Kazakhstan	89.04	76.13
Greece	87.08	38.95	Bulgaria	91.16	65.76
Serbia	92.87	76.79	New Caledonia	94.45	94.46
Sweden	73.13	86.85	Venezuela	79.23	70.88
New Zealand	91.25	75.57	Iceland	83.7	77.09
India	70.3	63.68	Uzbekistan	85.85	70.1
Cyprus	89.64	89.47	Slovakia	78.34	68.96
South Africa	95.68	74.82

Incomplete Data

The following table presents the accuracy on the 20 countries we used during training for both our models but for incomplete data. We didn't test on the other 41 countries since we did not train on them and therefore do not expect to achieve an interesting performance.

Country	Fasttext (%)	BPEmb (%)	Country	Fasttext (%)	BPEmb (%)
Norway	99.52	99.75	Austria	99.55	98.94
Italy	99.16	98.88	Mexico	97.24	95.93
United Kingdom	97.85	95.2	Switzerland	99.2	99.47
Germany	99.41	99.38	Denmark	97.86	97.9
France	99.51	98.49	Brazil	98.96	97.12
Netherlands	98.74	99.46	Australia	99.34	98.7
Poland	99.43	99.41	Czechia	98.78	98.88
United States	98.49	96.5	Canada	98.96	96.98
South Korea	91.1	99.89	Russia	97.18	96.01
Spain	99.07	98.35	Finland	99.04	99.52

Getting Started:

from deepparse.parser import AddressParser

address_parser = AddressParser(model_type="bpemb", device=0)

# you can parse one address
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")

# or multiple addresses
parsed_address = address_parser(
    ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "350 rue des Lilas Ouest Québec Québec G1L 1B6"])

# or multinational addresses
# Canada, US, Germany, UK and South Korea
parsed_address = address_parser(
    ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "777 Brockton Avenue, Abington MA 2351",
     "Ansgarstr. 4, Wallenhorst, 49134", "221 B Baker Street", "서울특별시 종로구 사직로3길 23"])

# you can also get the probability of the predicted tags
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6", with_prob=True)

The predictions tags are the following

"StreetNumber": for the street number,
"StreetName": for the name of the street,
"Unit": for the unit (such as apartment),
"Municipality": for the municipality,
"Province": for the province or local region,
"PostalCode": for the postal code,
"Orientation": for the street orientation (e.g. west, east),
"GeneralDelivery": for other delivery information.

Retrain a Model

see here for a complete example.

# We will retrain the fasttext version of our pretrained model.
address_parser = AddressParser(model_type="fasttext", device=0)

address_parser.retrain(training_container, 0.8, epochs=5, batch_size=8)

Retrain a Model With New Tags

See here for a complete example.

address_components = {"ATag":0, "AnotherTag": 1, "EOS": 2}
address_parser.retrain(training_container, 0.8, epochs=1, batch_size=128, prediction_tags=address_components)

Download our Models

Here are the URLs to download our pre-trained models directly

Installation

Before installing deepparse, you must have the latest version of PyTorch in your environment.

Install the stable version of deepparse:

pip install deepparse

Install the latest development version of deepparse:

pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev

Cite

Use the following for the article;

@misc{yassine2020leveraging,
    title={{Leveraging Subword Embeddings for Multinational Address Parsing}},
    author={Marouane Yassine and David Beauchemin and François Laviolette and Luc Lamontagne},
    year={2020},
    eprint={2006.16152},
    archivePrefix={arXiv}
}

and this one for the package;

@misc{deepparse,
    author = {Marouane Yassine and David Beauchemin},
    title  = {{Deepparse: A state-of-the-art deep learning multinational addresses parser}},
    year   = {2020},
    note   = {\url{https://deepparse.org}}
}

Contributing to Deepparse

We welcome user input, whether it is regarding bugs found in the library or feature propositions ! Make sure to have a look at our contributing guidelines for more details on this matter.

License

Deepparse is LGPLv3 licensed, as found in the LICENSE file.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.9.13

Sep 12, 2024

0.9.12

Jul 9, 2024

0.9.11

Jul 9, 2024

0.9.10

Jun 23, 2024

0.9.9

Aug 11, 2023

0.9.8

May 25, 2023

0.9.7

May 22, 2023

0.9.6

Mar 31, 2023

0.9.5

Feb 23, 2023

0.9.4

Feb 20, 2023

0.9.3

Nov 24, 2022

0.9.2

Sep 23, 2022

0.9.1

Aug 19, 2022

0.9

Aug 19, 2022

0.8.2

Jul 27, 2022

0.8.1

Jul 26, 2022

0.8

Jul 6, 2022

0.7.6

Jun 9, 2022

0.7.4

May 12, 2022

0.7.3

Apr 8, 2022

0.7.2

Mar 20, 2022

0.7.1

Mar 16, 2022

0.7

Feb 11, 2022

0.6.7

Feb 10, 2022

0.6.6

Feb 9, 2022

0.6.5

Feb 9, 2022

0.6.4

Jan 21, 2022

0.6.3

Dec 21, 2021

0.6.2

Dec 13, 2021

0.6.1

Dec 8, 2021

0.5.1

Nov 1, 2021

0.5

Oct 21, 2021

This version

0.4.4

Oct 4, 2021

0.4.3

Oct 1, 2021

0.4.2

Jul 23, 2021

0.4.1

Jun 15, 2021

0.4

Jun 9, 2021

0.3.6

May 28, 2021

0.3.5

May 11, 2021

0.3.4

Apr 4, 2021

0.3.3

Apr 2, 2021

0.3.2

Feb 9, 2021

0.3.1

Feb 9, 2021

0.3

Feb 9, 2021

0.2.3

Dec 27, 2020

0.2.2

Dec 25, 2020

0.2.1

Dec 24, 2020

0.2

Dec 24, 2020

0.1.3.1

Dec 9, 2020

0.1.3

Nov 14, 2020

0.1.2

Oct 19, 2020

0.1

Oct 16, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepparse-0.4.4.tar.gz (84.6 kB view hashes)

Uploaded Oct 4, 2021 Source

Built Distribution

deepparse-0.4.4-py3-none-any.whl (136.2 kB view hashes)

Uploaded Oct 4, 2021 Python 3

Hashes for deepparse-0.4.4.tar.gz

Hashes for deepparse-0.4.4.tar.gz
Algorithm	Hash digest
SHA256	`702aa4aeed0cb1abd4730d8c68c798b328633cc65c453c8d5d1e50681754b6b3`
MD5	`04336b689d6742e578a8b4336d34b319`
BLAKE2b-256	`c128b11c3803d5fe016dd7324af3f135505582365ee1981d8353f2bbffc5a4c2`

Hashes for deepparse-0.4.4-py3-none-any.whl

Hashes for deepparse-0.4.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`46bff30c9c2b2e79862b1545a2b1ae8806e1ce6dd5b565f0d36595b975277af6`
MD5	`8c0383f2c6049e5c61da35641c95976f`
BLAKE2b-256	`c95289cbbd03de1b47df6716f0eb34043d41cbd7607cf58c67b5ca4503eda58b`