Radius searches on Canadian FSA codes, location data

postalcodes-ca

postalcodes-ca is a fork of Scott Rodkey's pypostalcode package, which is itself a fork of Nathan Van Gheem's pyzipcode package. The zipcode database has been replaced with Canadian cities and their postal codes. The general usage is similar.

Install

To install:

pip install postalcodes-ca

A1A 1A1 - a primer on Canadian postal codes

Canadian postal codes are six characters in the format A1A 1A1, where A is a letter and 1 is a digit, with a space separating the third and fourth characters. Postal codes do not use the 6 letters D, F, I, O, Q or U. Additionally, W and Z are not used as the first letter of any postal code.

The first three characters are the forward sortation area (FSA), and the last three are the local delivery unit (LDU). The first letter, called the postal district, tells you the province. Quebec and Ontario have multiple postal districts, and X is used for both Nunavut and the Northwest Territories. The second character (a digit) tells you whether the postal code is urban or rural: a 0 (e.g. A0A) means it's rural, and any other digit means it's urban. See Postal codes in Canada on Wikipedia for details.
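
The format rules above can be sketched with a regular expression. This is only an illustration of the rules described in this section, not the package's own parser; the names `split_postal_code` and `is_rural` are hypothetical.

```python
import re

# Letters allowed anywhere in a postal code (D, F, I, O, Q and U are never used).
LETTERS = "ABCEGHJKLMNPRSTVWXYZ"
# W and Z are additionally excluded from the first position.
FIRST_LETTERS = "ABCEGHJKLMNPRSTVXY"

POSTAL_CODE_RE = re.compile(
    rf"([{FIRST_LETTERS}]\d[{LETTERS}]) (\d[{LETTERS}]\d)"
)


def split_postal_code(code: str) -> tuple[str, str]:
    """Return (FSA, LDU) for a strictly formatted code such as 'M5V 3L9'."""
    match = POSTAL_CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not in A1A 1A1 format: {code!r}")
    return match.group(1), match.group(2)


def is_rural(fsa: str) -> bool:
    """An FSA whose second character is 0 denotes a rural area."""
    return fsa[1] == "0"
```

For example, `split_postal_code("M5V 3L9")` gives `("M5V", "3L9")`, and `is_rural("A0A")` is true while `is_rural("V5K")` is false.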

This module supports looking up both full postal codes and FSA codes. There are 1,651 (+1 for Santa) FSA codes and 877,409 (+1 for Santa) FSA+LDU combinations (out of a maximum possible 18*10*20 = 3,600 FSA codes and 18*10*20*10*20*10 = 7,200,000 postal codes).
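
The maximum counts quoted above follow directly from the letter restrictions. A small sketch of the arithmetic (variable names are illustrative only):

```python
import string

# D, F, I, O, Q and U never appear in a postal code, which leaves 20 letters.
letters = [c for c in string.ascii_uppercase if c not in "DFIOQU"]
# W and Z cannot be the first letter, which leaves 18 choices there.
first_letters = [c for c in letters if c not in "WZ"]
digits = 10

# FSA = letter, digit, letter; LDU = digit, letter, digit.
max_fsa_codes = len(first_letters) * digits * len(letters)
max_postal_codes = max_fsa_codes * digits * len(letters) * digits

print(max_fsa_codes)     # 3600
print(max_postal_codes)  # 7200000
```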

Usage

>>> from postalcodes_ca import fsa_codes
>>> fsa_codes['V5K']
FSA(code='V5K', name='Vancouver (North Hastings-Sunrise)', province='British Columbia', latitude=49.2807, longitude=-123.0397, accuracy=6)
>>> fsa_codes.get('V5K')
FSA(code='V5K', name='Vancouver (North Hastings-Sunrise)', province='British Columbia', latitude=49.2807, longitude=-123.0397, accuracy=6)
>>> fsa_codes.get('v5k')
[...]
ValueError: invalid FSA, must start with one of ABCEGHJKLMNPRSTVXY: 'v5k'
>>> fsa_codes.get('v5kblahblah', strict=False)  # only the first 3 characters are used
FSA(code='V5K', name='Vancouver (North Hastings-Sunrise)', province='British Columbia', latitude=49.2807, longitude=-123.0397, accuracy=6)

Get a list of FSA codes within a radius given in kilometers (if you have miles, multiply them by 1.609344 to get kilometers). Note that this actually searches a square area around the starting point, not a circle of the given radius:

>>> results = fsa_codes.get_nearby('V5K', radius=4)
>>> for r in results:
...     print(f"{r.code}: {r.name}, {r.province}")
... 
V5K: Vancouver (North Hastings-Sunrise), British Columbia
V5L: Vancouver (North Grandview-Woodlands), British Columbia
V5M: Vancouver (South Hastings-Sunrise / North Renfrew-Collingwood), British Columbia
V5N: Vancouver (South Grandview-Woodlands / NE Kensington), British Columbia
V7L: North Vancouver South Central, British Columbia
V5C: Burnaby (Burnaby Heights / Willingdon Heights / West Central Valley), British Columbia
V5G: Burnaby (Cascade-Schou / Douglas-Gilpin), British Columbia
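
Because the search area is a square, results near its corners can be farther away than the requested radius. If a true circular search matters, one option is to filter the results afterwards by great-circle distance. The sketch below is not part of the package: `bounding_box`, `haversine_km` and `within_radius` are hypothetical helpers, and the assumption that the square has a side of twice the radius is mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180  # roughly 111 km


def bounding_box(lat, lon, radius_km):
    """Square of side 2 * radius_km centred on (lat, lon), in degrees."""
    dlat = radius_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Keep only the (lat, lon) points within radius_km of center."""
    return [p for p in points if haversine_km(*center, *p) <= radius_km]
```

With the coordinates of V5K, `(49.2807, -123.0397)`, and a 4 km radius, the corner of the square lies roughly 5.7 km from the centre, so `within_radius` would drop a point located there.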

Search by code, city name or province name using SQL wildcard syntax (% matches any sequence of characters):

>>> fsa_codes.search(name='Calgary')  # exact match
[FSA(code='T3S', name='Calgary', province='Alberta', latitude=50.9153, longitude=-113.8932, accuracy=4)]
>>> len(fsa_codes.search(name='Calgary%'))
35
>>> len(fsa_codes.search(code='T2%'))
20
>>> len(fsa_codes.search(province='Alberta'))
154
>>> fsa_codes.search(province='California')  # returns None
>>> 
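
To illustrate how `%` patterns behave, here is a self-contained sketch using SQLite's `LIKE` operator on an in-memory table. The table layout and the `search` function are assumptions made for this example and are not necessarily how postalcodes-ca stores or queries its data. Note that this sketch returns an empty list when nothing matches, whereas the package returns None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, just for this illustration.
conn.execute("CREATE TABLE fsa (code TEXT PRIMARY KEY, name TEXT, province TEXT)")
conn.executemany(
    "INSERT INTO fsa VALUES (?, ?, ?)",
    [
        ("T3S", "Calgary", "Alberta"),
        ("V5K", "Vancouver (North Hastings-Sunrise)", "British Columbia"),
        ("V5L", "Vancouver (North Grandview-Woodlands)", "British Columbia"),
        ("V7L", "North Vancouver South Central", "British Columbia"),
    ],
)

ALLOWED_COLUMNS = {"code", "name", "province"}


def search(**filters):
    """Return the codes of rows matching every column=pattern filter."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if not filters or unknown:
        raise ValueError(f"expected filters on {sorted(ALLOWED_COLUMNS)}")
    # Column names come from the whitelist; patterns are bound as parameters.
    where = " AND ".join(f"{column} LIKE ?" for column in filters)
    sql = f"SELECT code FROM fsa WHERE {where} ORDER BY code"
    return [row[0] for row in conn.execute(sql, list(filters.values()))]


print(search(name="Vancouver"))     # []
print(search(name="Vancouver%"))    # ['V5K', 'V5L']
print(search(name="%Vancouver%"))   # ['V5K', 'V5L', 'V7L']
```

A pattern without `%` only matches the whole value, which is why `name='Calgary'` above finds a single FSA while `name='Calgary%'` finds many.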

There's an identical API for postal codes, but keep in mind that the data is of a lower quality (see below):

>>> from postalcodes_ca import postal_codes
>>> postal_codes['M5V 3L9']
PostalCode(code='M5V 3L9', name='Toronto', province='Ontario', latitude=43.642, longitude=-79.386)
>>> postal_codes.get('M5V 3L9')
PostalCode(code='M5V 3L9', name='Toronto', province='Ontario', latitude=43.642, longitude=-79.386)
>>> postal_codes.get('m5v3l9')
[...]
ValueError: invalid postal code, must be 7 characters: 'm5v3l9'
>>> postal_codes.get('m5v3l9blahblah', strict=False)  # only the first 6 or 7 characters are used
PostalCode(code='M5V 3L9', name='Toronto', province='Ontario', latitude=43.642, longitude=-79.386)

Check if a string matches the format of a postal code or FSA code:

>>> from postalcodes_ca import parse_postal_code, parse_fsa
>>> parse_postal_code('m5v3l9 blah  ')
'M5V 3L9'
>>> parse_postal_code('m5v3l9 blah  ', strict=True)
[...]
ValueError: invalid postal code, must be 7 characters: 'm5v3l9 blah  '
>>> parse_fsa('M5V')
'M5V'
>>> parse_fsa('M5V 3L9')
'M5V'
>>> parse_fsa('M5V 3L9', strict=True)
[...]
ValueError: invalid FSA, must be 3 characters: 'M5V 3L9'
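
The non-strict behaviour shown above (ignore case and whitespace, keep only the leading characters) could be approximated as follows. This is a guess at the general idea, not the package's implementation; `lenient_postal_code` is a hypothetical name.

```python
import re

# Strict A1A 1A1 format: no D, F, I, O, Q, U anywhere; no W or Z up front.
STRICT_RE = re.compile(
    r"[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ] \d[ABCEGHJKLMNPRSTVWXYZ]\d"
)


def lenient_postal_code(text: str) -> str:
    """Normalize loosely formatted input to the 'A1A 1A1' form."""
    # Drop all whitespace and uppercase what is left.
    compact = "".join(text.split()).upper()
    # Only the first six characters are considered; the rest is ignored.
    candidate = f"{compact[:3]} {compact[3:6]}"
    if STRICT_RE.fullmatch(candidate) is None:
        raise ValueError(f"invalid postal code: {text!r}")
    return candidate


print(lenient_postal_code("m5v3l9 blah  "))  # M5V 3L9
```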

Notes

H0H 0H0 - Santa's postal code

There is a special postal code for Santa Claus which looks like this:

>>> postal_codes["H0H 0H0"]
PostalCode(code='H0H 0H0', name='Reserved (Santa Claus)', province='Quebec', latitude=90.0, longitude=0.0)
>>> fsa_codes['H0H']
FSA(code='H0H', name='Reserved (Santa Claus)', province='Quebec', latitude=90.0, longitude=0.0, accuracy=None)

Even though Santa lives at the North Pole, the province is given as "Quebec" because H starts a Quebec postal code.
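
The province lookup by first letter can be sketched as a plain mapping. The table below reflects the standard postal district assignments; the dictionary and the `province_for` helper are illustrative, and how the package itself labels X codes is not shown here.

```python
# Postal district (first letter) to province or territory.
PROVINCE_BY_DISTRICT = {
    "A": "Newfoundland and Labrador",
    "B": "Nova Scotia",
    "C": "Prince Edward Island",
    "E": "New Brunswick",
    "G": "Quebec", "H": "Quebec", "J": "Quebec",
    "K": "Ontario", "L": "Ontario", "M": "Ontario",
    "N": "Ontario", "P": "Ontario",
    "R": "Manitoba",
    "S": "Saskatchewan",
    "T": "Alberta",
    "V": "British Columbia",
    # X is shared, so the letter alone cannot tell the two apart.
    "X": "Nunavut / Northwest Territories",
    "Y": "Yukon",
}


def province_for(code: str) -> str:
    """Look up the province from the first letter of an FSA or postal code."""
    return PROVINCE_BY_DISTRICT[code[0].upper()]


print(province_for("H0H 0H0"))  # Quebec
```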

postalcodes-ca treats H0H 0H0 like any other postal code because it's a legitimate postal code that gets a million letters each year.

Differences between data in postal_codes and fsa_codes

PostalCode names never have accents but some FSA names do:

>>> fsa_codes["G4X"].name
'Gaspé'
>>> postal_codes["G4X 6T9"].name
'Gaspe'

FSA codes' names can be more descriptive:

>>> fsa_codes["V5K"].name
'Vancouver (North Hastings-Sunrise)'
>>> postal_codes["V5K 5G9"].name
'Vancouver'

FSA codes have an accuracy property, which is either None or an integer from 1 to 6 (inclusive) representing the accuracy of their lat/lng coordinates, where "1=estimated, 4=geonameid, 6=centroid of addresses or shape":

>>> fsa_codes["G4X"].accuracy
4

About a dozen FSA codes have None as their .accuracy.

For PostalCodes, .accuracy is always 6.

Postal code location data isn't always accurate

There are at least 92 PostalCodes whose latitude/longitude coordinates are completely outside Canada. I found this using basic sanity checking (see import.py), which probably means that there are more datapoints which are wrong. See this post on the GeoNames mailing list for details.
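
A basic sanity check of this kind might look like the sketch below. It is not the check in import.py; the bounding box is a rough, hand-picked rectangle around Canada, so it can only catch gross errors, and it also flags the deliberately placed Santa code.

```python
# Rough rectangle around Canada (an approximation chosen for this example).
CANADA_LAT_RANGE = (41.0, 84.0)
CANADA_LON_RANGE = (-141.5, -52.0)


def roughly_in_canada(latitude: float, longitude: float) -> bool:
    """True if the point falls inside the rough bounding rectangle."""
    lat_ok = CANADA_LAT_RANGE[0] <= latitude <= CANADA_LAT_RANGE[1]
    lon_ok = CANADA_LON_RANGE[0] <= longitude <= CANADA_LON_RANGE[1]
    return lat_ok and lon_ok


print(roughly_in_canada(43.642, -79.386))  # True  (M5V 3L9, Toronto)
print(roughly_in_canada(90.0, 0.0))        # False (H0H 0H0, Santa)
```

A point can pass this test and still be wrong, for instance if it lands in the northern United States or in the wrong province.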

The data has multiple entries for some postal codes

In the original data there are 4 duplicate entries for FSA codes and 842 duplicate entries for postal codes. Usually those contain extra names for the postal code (for codes that cover multiple places), but sometimes the lat/long coordinates can be different as well. postalcodes-ca just uses the first entry for each code to appear in the data file.
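
The "first entry wins" rule can be sketched as follows, assuming the GeoNames tab-separated layout in which the postal code is the second column and the place name the third. The sample rows are illustrative: the second T3S row is invented to show a duplicate and is not copied from CA.tsv.

```python
import csv
import io

# Columns: country, code, name, admin name/code 1-3, latitude, longitude, accuracy.
SAMPLE_TSV = (
    "CA\tT3S\tCalgary\tAlberta\tAB\t\t\t\t\t50.9153\t-113.8932\t4\n"
    "CA\tT3S\tCalgary (invented duplicate)\tAlberta\tAB\t\t\t\t\t50.9\t-113.9\t1\n"
    "CA\tV5K\tVancouver (North Hastings-Sunrise)\tBritish Columbia\tBC"
    "\t\t\t\t\t49.2807\t-123.0397\t6\n"
)

first_seen = {}
reader = csv.reader(io.StringIO(SAMPLE_TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
for row in reader:
    code = row[1]
    # setdefault keeps the existing row, so later duplicates are ignored.
    first_seen.setdefault(code, row)

for code, row in first_seen.items():
    print(code, row[2])
```

This prints one line per code, with `T3S Calgary` coming from the first row rather than the duplicate.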

Internally reserved codes are not included

Some FSA codes, such as A9X, are "reserved for internal testing". Those are not in the data:

>>> fsa_codes['A9X']
[...]
KeyError: 'A9X'

Postal codes and FSAs are not actually points

While this package associates postal codes and FSA codes to points, these codes actually represent areas, as you can see from this map of FSA regions:

Map of Canada split into FSA postal regions as of 2016

(data from the 2016 census visualized using QGIS, see https://github.com/inkjet/pypostalcode/issues/6 for details)

Data is CC BY 4.0

https://download.geonames.org/export/zip/

The data is from GeoNames. It's distributed under a CC BY 4.0 license. Please respect the license if you use this module.

Development

How to contribute to the data

If you notice an issue with the data, you can report it by creating a GitHub account and creating a new issue.

If you want to fix the issue yourself, then look at CA.tsv, figure out what needs to be changed and report the issue to the GeoNames project on their mailing list. Once it is fixed you can create an issue on postalcodes-ca to tell us to update the data.

How to update the vendored data

cd postalcodes-ca/
bash update_data.sh

or you can do it manually:

CA.tsv - FSA codes

  1. cd into the same directory as this readme file
  2. go to https://download.geonames.org/export/zip/
  3. download CA.zip (not CA_full.csv.zip)
  4. unzip the file into this directory with unzip CA.zip CA.txt
  5. compare the file you just downloaded against the one that's already used with diff CA.tsv CA.txt. If that command produces no output, there's nothing more to do
  6. rename CA.txt to CA.tsv with mv CA.txt CA.tsv (we rename the file so that it renders nicely on GitHub)
  7. run python3 postalcodes-ca/import.py to update the postalcodes-ca/postalcodes.db file

CA_full.tsv - postal codes

  1. cd into the same directory as this readme file
  2. go to https://download.geonames.org/export/zip/
  3. download CA_full.csv.zip (not CA.zip)
  4. unzip the file into this directory with unzip CA_full.csv.zip CA_full.txt
  5. run python3 postalcodes-ca/import.py to update the postalcodes-ca/postalcodes.db file

Package size

The database of FSA codes alone (CA.txt/CA.tsv) is negligible in size: the original data is 40KB zipped, 124KB unzipped and 250KB as sqlite (with indices).

The full postal codes database CA_full.txt (downloaded as CA_full.csv.zip) is 6MB zipped, 48MB unzipped. The sqlite .db file with only the 4 important fields (without indices) is 37MB. With a province field it grows to 46MB, and with indices further to 95MB. When uploading to PyPI the package is zipped down to 36MB, which is below PyPI's 60MB limit, but this might cause issues in the future.
