Toolbox on French cities: set vintage, find departments, find cities...
french-cities
This repo contains the documentation of the Python french-cities package, a package aimed at improving the referencing of municipalities in French 🇫🇷 datasets.
Documentation
A full documentation with use cases is available at https://tgrandje.github.io/french-cities/. For now, it is only available in French. Any help is welcome to build a multilingual documentation website.
Until then, a basic English documentation will remain available in the present README.
Why french-cities?
Do you have any data:
- where municipal locations are provided through approximate addresses, or via geographical 🗺️ coordinates?
- where municipalities are referenced by their postal codes and their labels 😮?
- where departments are written in full text 🔡?
- where spellings are dubious (for instance, torturing Loir-et-Cher into "Loire-et-Cher") or obsolete (for instance, referencing Templeuve, a city named Templeuve-en-Pévèle since 2015)?
- or compiled over the years, where cities' codes are a patchwork of multiple 🤯 vintages?
Then 'french-cities' is for you 🫵!
Installation
pip
pip install french-cities
conda
Configuration
Setting INSEE's API keys
french-cities uses pynsee under the hood. Starting from pynsee 0.2.0 (and french-cities 1.1.0),
an API key is not necessary anymore.
Note that pynsee does far more than just retrieve information on cities:
by default, it will warn you about missing SIRENE API keys.
french-cities should silence those warnings (as they are not relevant to
the present use cases). If they pop up anyway, please get in touch.
Working behind a corporate proxy
Please set these (usual) environment variables to allow working behind a proxy:
- http_proxy (if accessing web behind a corporate proxy)
- https_proxy (if accessing web behind a corporate proxy)
If you can't set those variables directly, you can either have a look at python-dotenv or set them from Python:
import os
os.environ["https_proxy"] = "http://my_proxy_server:port"
os.environ["http_proxy"] = "http://my_proxy_server:port"
Session management
Note that pynsee and geopy (both used under the hood) use their own web sessions.
Any Session object you pass to french-cities will be shared with neither
pynsee nor geopy.
This is why french-cities functions accept a session argument even though you
may still have to configure the corporate proxy through environment variables
(which will also affect pynsee and geopy).
Basic usage
Retrieve departements' codes
french-cities can retrieve departement codes from postal codes, official
(COG/INSEE) codes or labels.
Working from postal codes relies on the BAN (Base Adresse Nationale) and should return correct results. "Cedex" codes are only partially covered by the BAN, so OpenDataSoft's API, built upon Christian Quest's work, is used to complete them. This consumes the freemium API and no authentication is included: users of the present package should check the API's current legal terms directly on OpenDataSoft's website.
Working from official codes may sometimes give empty results (when working on an old dataset with cities that have since changed departements, which is rare). This is deliberate: the function mostly relies on the first characters of the cities' codes (a fast process, 99% accurate) rather than on an API (slower, though foolproof).
from french_cities import find_departements
import pandas as pd
df = pd.DataFrame(
{
"code_postal": ["59800", "97133", "20000"],
"code_commune": ["59350", "97701", "2A004"],
"communes": ["Lille", "Saint-Barthélémy", "Ajaccio"],
"deps": ["59", "977", "2A"],
}
)
df = find_departements(df, source="code_postal", alias="dep_A", type_field="postcode")
df = find_departements(df, source="code_commune", alias="dep_B", type_field="insee")
df = find_departements(df, source="communes", alias="dep_C", type_field="label")
print(df)
For a complete documentation on find_departements, please type help(find_departements).
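The "first characters" heuristic used for official codes can be illustrated with the minimal sketch below. It is written for this README and is not french-cities' actual implementation: the hypothetical helper `dep_from_insee_code` keeps the first two characters of a commune code (which also covers Corsica's "2A" and "2B"), or the first three for overseas codes starting with "97". Like the heuristic itself, it cannot detect communes that have changed departement over time.

```python
def dep_from_insee_code(code: str) -> str:
    """Guess a departement code from a commune's official (COG/INSEE) code.

    Hypothetical helper illustrating the prefix heuristic only; it knows
    nothing about communes that changed departement since the data was built.
    """
    code = code.strip().upper()
    if code.startswith("97"):
        # Overseas codes use a three-character departement prefix
        return code[:3]
    # Metropolitan departements, including Corsica's "2A" and "2B"
    return code[:2]


for commune in ["59350", "97701", "2A004"]:
    print(commune, "->", dep_from_insee_code(commune))
```

This gives "59", "977" and "2A", the same values as the "deps" column of the example above.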
Retrieve cities' codes
french-cities can retrieve cities' codes from multiple fields. It can correct
basic mistakes (up to a point).
The columns used by the algorithm can be (in the order of precedence used by the algorithm):
- 'x' and 'y' (in that case, epsg must be explicitly given);
- 'postcode' and 'city';
- 'address', 'postcode' and 'city';
- 'department' and 'city'.
Note that the algorithm can (and will) make errors when using xy coordinates on an older vintage (i.e. different from the current one) in the case of historical splits of cities, as the geographic files are not vintaged yet.
The lexical recognition (postcode, city, address, departement) is based on Python fuzzy matching, the BAN API (Base Adresse Nationale) or OSM's Nominatim API (if activated). The algorithm discards low-scoring results, but failures may still occur.
from french_cities import find_city
import numpy as np  # needed for np.nan below
import pandas as pd
df = pd.DataFrame(
[
{
"x": 2.294694,
"y": 48.858093,
"location": "Tour Eiffel",
"dep": "75",
"city": "Paris",
"address": "5 Avenue Anatole France",
"postcode": "75007",
"target": "75056",
},
{
"x": 8.738962,
"y": 41.919216,
"location": "mairie",
"dep": "2A",
"city": "Ajaccio",
"address": "Antoine Sérafini",
"postcode": "20000",
"target": "2A004",
},
{
"x": -52.334990,
"y": 4.938194,
"location": "mairie",
"dep": "973",
"city": "Cayenne",
"address": "1 rue de Rémire",
"postcode": "97300",
"target": "97302",
},
{
"x": np.nan,
"y": np.nan,
"location": "Erreur code postal Lille/Lyon",
"dep": "59",
"city": "Lille",
"address": "1 rue Faidherbe",
"postcode": "69000",
"target": "59350",
},
]
)
df = find_city(df, epsg=4326)
print(df)
For a complete documentation on find_city, please type
help(find_city).
Note: to use geopy (Nominatim API from OpenStreetMap) as a last resort,
you will need to pass the argument use_nominatim_backend=True.
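The fuzzy matching used by find_city is not detailed here. As a stand-in, the sketch below uses the standard library's difflib to show the general idea of matching a city label while discarding low-scoring candidates. The normalization rules, the 0.85 cutoff and the helper names `normalize` and `fuzzy_city` are assumptions made for illustration, not french-cities' settings.

```python
import difflib
import unicodedata
from typing import Iterable, Optional


def normalize(label: str) -> str:
    """Lowercase, strip accents and turn hyphens/apostrophes into spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    for sep in ("-", "'", "\u2019"):
        no_accents = no_accents.replace(sep, " ")
    return " ".join(no_accents.lower().split())


def fuzzy_city(label: str, candidates: Iterable[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the closest candidate label, or None if no score reaches cutoff."""
    by_key = {normalize(c): c for c in candidates}
    best = difflib.get_close_matches(normalize(label), list(by_key), n=1, cutoff=cutoff)
    return by_key[best[0]] if best else None


cities = ["Lille", "Lyon", "Templeuve-en-Pévèle"]
print(fuzzy_city("Lile", cities))                  # Lille
print(fuzzy_city("templeuve en pevele", cities))   # Templeuve-en-Pévèle
print(fuzzy_city("Paris", cities))                 # None
```

Raising the cutoff trades recall for precision: fewer typos are recovered, but fewer wrong cities are accepted.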
Set vintage to cities' codes
french-cities can try to project a given dataframe into a set vintage,
starting from an unknown vintage (or even a non-vintaged dataset, which is
often the case).
Errors may occur for split cities, as the starting vintage is unknown (or nonexistent).
If the starting vintage is known, you can use INSEE's projection API
(through pynsee). Note that this may prove slower, as each row requires a
request to the API (which allows up to 30 requests per minute).
Basically, the algorithm of french-cities will check whether a given city
code exists in the desired vintage:
- if yes, it is kept (with the aforementioned approximation regarding restored, i.e. split, cities);
- if not, it looks in older vintages and makes use of INSEE's projection API.
This algorithm will also:
- convert municipal districts' codes into their cities' codes;
- convert delegated or associated cities' codes into their parent city's code.
from french_cities import set_vintage
import pandas as pd
df = pd.DataFrame(
[
["07180", "Fusion"],
["02077", "Commune déléguée"],
["02564", "Commune nouvelle"],
["75101", "Arrondissement municipal"],
["59298", "Commune associée"],
["99999", "Code erroné"],
["14472", "Oudon"],
],
columns=["A", "Test"],
index=["A", "B", "C", "D", 1, 2, 3],
)
df = set_vintage(df, 2023, field="A")
print(df)
For a complete documentation on set_vintage, please type
help(set_vintage).
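The "keep the code if it exists in the target vintage, otherwise follow its history" logic, plus the conversion of municipal districts, can be sketched as below. This is an illustration under stated assumptions, not set_vintage's implementation: `valid_codes` (codes existing in the target vintage) and `history` (an old code to new code mapping) are hypothetical inputs the caller would have to build, whereas french-cities relies on INSEE data through pynsee. Only the municipal districts of Paris, Lyon and Marseille are hard-coded, and delegated or associated communes are not handled.

```python
from typing import Mapping, Optional, Set

# (first district code, last district code, parent city code)
MUNICIPAL_DISTRICTS = (
    ("75101", "75120", "75056"),  # Paris
    ("69381", "69389", "69123"),  # Lyon
    ("13201", "13216", "13055"),  # Marseille
)


def district_to_city(code: str) -> str:
    """Replace a municipal district code by its city's code, if applicable."""
    for first, last, city in MUNICIPAL_DISTRICTS:
        # Same-length digit strings compare like the numbers they encode
        if len(code) == len(first) and first <= code <= last:
            return city
    return code


def project_code(code: str, valid_codes: Set[str], history: Mapping[str, str]) -> Optional[str]:
    """Follow a (hypothetical) old -> new code history until a valid code is found."""
    code = district_to_city(code)
    seen = set()
    while code not in valid_codes:
        if code in seen or code not in history:
            return None  # unknown code, or a loop in the history table
        seen.add(code)
        code = history[code]
    return code


print(project_code("75101", {"75056"}, {}))  # 75056
print(project_code("99999", {"75056"}, {}))  # None
```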
External documentation
french-cities makes use of multiple APIs. Please read:
- documentation (in french) on API Adresse
- documentation (in french) on OpenDataSoft API
- Nominatim Usage Policy
Support
In case of bugs, please open an issue on the repo.
Contribution
Any help is welcome. Please refer to the CONTRIBUTING file.
Author
Thomas GRANDJEAN (DREAL Hauts-de-France, service Information, Développement Durable et Évaluation Environnementale, pôle Promotion de la Connaissance).
Licence
Project Status
Stable.