A library to validate and extract information from national id numbers
A library for validating national id numbers and extracting any embedded data from them.
Supports multiple countries; each validator can validate format/checksum and (where applicable) extract embedded data (DOB, gender, region codes, etc.).
Installation
From PyPI (end users)
pip install id-validation
Local Development
# Clone the repository
git clone https://github.com/adieyal/id_validation.git
cd id_validation
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in editable mode with dev dependencies
make install
# Or manually:
pip install -e ".[dev]"
# Run tests to verify
pytest
Usage
from id_validation import ValidatorFactory
validator = ValidatorFactory.get_validator("ZW")
# validate() checks a number against the country-specific rules and returns True or False
assert validator.validate("50-025544-Q-12")
# extract_data() returns any data encoded in the ID number; the available fields are country-specific
data = validator.extract_data("50-025544-Q-12")
assert data["registration_region"] == "Mutasa"
assert data["district"] == "Chivi"
assert data["sequence_number"] == "025544"
Countries
The following codes are available:
BW - Botswana
NG - Nigeria
ZA - South Africa
ZA_OLD - South Africa (Apartheid-era). See the note below for more information.
ZW - Zimbabwe
BE - Belgium (NRN)
BG - Bulgaria (EGN)
CZ - Czech Republic (rodné číslo)
DK - Denmark (CPR)
EE - Estonia (isikukood)
FI - Finland (HETU)
FR - France (NIR / Numéro de sécurité sociale)
IT - Italy (Codice Fiscale)
LT - Lithuania (Asmens kodas)
LV - Latvia (personas kods)
NO - Norway (Fødselsnummer)
PL - Poland (PESEL)
RO - Romania (CNP)
SK - Slovakia (rodné číslo)
ES - Spain (DNI/NIE)
SE - Sweden (Personnummer)
TR - Turkey (T.C. Kimlik No)
BR - Brazil (CPF)
CL - Chile (RUT/RUN)
HR - Croatia (OIB)
MX - Mexico (CURP)
NL - Netherlands (BSN)
PT - Portugal (NIF)
SI - Slovenia (EMŠO)
AR - Argentina (CUIT/CUIL)
CA - Canada (SIN)
CO - Colombia (NIT)
EC - Ecuador (cédula)
Supported countries & extracted fields
| Code | Country / ID | Extracted fields (when valid) |
|---|---|---|
| BW | Botswana | gender |
| NG | Nigeria | (none – format only) |
| ZA | South Africa (post-apartheid) | dob, gender, checksum, citizenship |
| ZA_OLD | South Africa (apartheid-era) | dob, gender, checksum, citizenship, race |
| ZW | Zimbabwe | registration_region, district, sequence_number |
| BE | Belgium (NRN) | dob, gender, sequence, checksum |
| BG | Bulgaria (EGN) | dob, gender, birth_order, checksum |
| CZ | Czech Republic (rodné číslo) | dob, gender, century, month_raw, special_series, extension, checksum |
| DK | Denmark (CPR) | dob, gender, century, sequence, checksum_valid (lenient by default) |
| EE | Estonia (isikukood) | dob, gender, serial, checksum |
| FI | Finland (HETU) | dob, gender, century, individual_number, checksum |
| FR | France (NIR) | dob (month-level; day not encoded), gender, department, commune, order, key, year, month |
| IT | Italy (Codice Fiscale) | dob, gender, municipality_code, checksum |
| LT | Lithuania (Asmens kodas) | dob, gender, century, serial, checksum |
| LV | Latvia (personas kods) | dob (legacy only), century, century_digit, serial (legacy only) |
| NO | Norway (fødselsnummer) | dob, gender, individual_number, control_digits |
| PL | Poland (PESEL) | dob, gender, serial, checksum |
| RO | Romania (CNP) | dob, gender, county_code, county_name (best-effort), serial, checksum |
| SK | Slovakia (rodné číslo) | dob, gender, century, month_raw, special_series, extension, checksum |
| ES | Spain (DNI/NIE) | type (DNI/NIE), plus number, letter (and prefix for NIE) |
| SE | Sweden (personnummer) | dob, gender, coordination_number, individual_number, checksum |
| TR | Turkey (TCKN) | checksum10, checksum11 (no DOB/gender encoded) |
| BR | Brazil (CPF) | check_digits |
| CL | Chile (RUT/RUN) | number, dv |
| HR | Croatia (OIB) | checksum |
| MX | Mexico (CURP) | dob, gender, state_code, state_name, homonym, checksum |
| NL | Netherlands (BSN) | (none) |
| PT | Portugal (NIF) | checksum |
| SI | Slovenia (EMŠO) | dob, gender, region_code, serial, checksum |
| AR | Argentina (CUIT/CUIL) | prefix, dni, category, checksum |
| CA | Canada (SIN) | (none) |
| CO | Colombia (NIT) | base, dv, checksum |
| EC | Ecuador (cédula) | province_code, province_name, third_digit, serial, checksum |
References
See docs/references/*.md for per-country reference links and implementation notes.
Botswana (BW)
Note: the validation logic is based on anecdotal information available online, not on official documentation.
>>> import id_validation
>>> from id_validation import ValidatorFactory
>>> validator = ValidatorFactory.get_validator("BW")
>>> validator.validate("379219515")
True
>>> validator.extract_data("379219515")
{'gender': 'Male'}
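To make the extraction concrete, here is a minimal sketch of the rule as it is commonly described online: the number is nine digits long and the fifth digit encodes gender (1 for male, 2 for female). Both the rule and the function name `botswana_gender` are assumptions for illustration, not the library's implementation.

```python
import re
from typing import Optional

# Assumed mapping for the fifth digit, taken from publicly available descriptions.
_GENDER_BY_DIGIT = {"1": "Male", "2": "Female"}


def botswana_gender(id_number: str) -> Optional[str]:
    """Return 'Male' or 'Female', or None if the number does not fit the assumed layout."""
    if not re.fullmatch(r"[0-9]{9}", id_number):
        return None
    # Index 4 is the fifth digit; any value other than 1 or 2 yields None.
    return _GENDER_BY_DIGIT.get(id_number[4])


print(botswana_gender("379219515"))  # Male
```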
Nigeria
Nigerian ID numbers consist of 11 randomly assigned digits, so only the format can be checked and no data can be extracted. Find the regulations here.
>>> import id_validation
>>> from id_validation import ValidatorFactory
>>> validator = ValidatorFactory.get_validator("NG")
>>> validator.validate("35765421356")
True
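Since the digits carry no meaning, a format-only check amounts to confirming that the input is exactly 11 digits. A minimal sketch, where `looks_like_nigerian_id` is a hypothetical name and not part of the library:

```python
import re

# Exactly 11 ASCII digits; separators and whitespace are rejected.
_NG_PATTERN = re.compile(r"[0-9]{11}")


def looks_like_nigerian_id(id_number: str) -> bool:
    """Format-only check: there is no checksum or embedded data to verify."""
    return _NG_PATTERN.fullmatch(id_number) is not None


print(looks_like_nigerian_id("35765421356"))  # True
print(looks_like_nigerian_id("3576542135"))   # False (only 10 digits)
```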
South Africa (ZA)
South African ids contain the following information:
- Date of birth
- Gender
- Citizenship (citizen or permanent resident)
>>> import id_validation
>>> from id_validation import ValidatorFactory
>>> validator = ValidatorFactory.get_validator("ZA")
>>> validator.validate("7106245929185")
True
>>> validator.extract_data("7106245929185")
{'dob': datetime.datetime(1971, 6, 24, 0, 0), 'gender': <GENDER.MALE: 1>, 'checksum': 5, 'citizenship': <CITIZENSHIP_TYPE.PERMANENT_RESIDENT: 1>}
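The fields above can be read directly from the digits. The sketch below assumes the commonly documented 13-digit layout `YYMMDD SSSS C A Z`: birth date, a sequence number where 5000 and above means male, a citizenship digit, one further digit, and a Luhn check digit. The function names, the string return values, and the century rule are assumptions for illustration; the library itself returns enums, as shown in the output above.

```python
from datetime import date
from typing import Optional


def luhn_is_valid(number: str) -> bool:
    """Standard Luhn check over a string of digits."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def decode_za_id(id_number: str, reference_year: Optional[int] = None) -> Optional[dict]:
    """Decode a number laid out as YYMMDD SSSS C A Z; return None if it does not fit."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        return None
    if not luhn_is_valid(id_number):
        return None
    if reference_year is None:
        reference_year = date.today().year
    # Assumption: a two-digit year that would lie in the future belongs to the 1900s.
    year = 2000 + int(id_number[0:2])
    if year > reference_year:
        year -= 100
    try:
        dob = date(year, int(id_number[2:4]), int(id_number[4:6]))
    except ValueError:
        return None  # not a real calendar date
    citizenship = {"0": "citizen", "1": "permanent_resident"}.get(id_number[10], "unknown")
    return {
        "dob": dob,
        "gender": "male" if int(id_number[6:10]) >= 5000 else "female",
        "citizenship": citizenship,
        "checksum": int(id_number[12]),
    }


print(decode_za_id("7106245929185", reference_year=2024))
```

The two-digit year is inherently ambiguous, so the century rule here is only one possible choice; passing `reference_year` explicitly keeps the result reproducible.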
South Africa - Apartheid-era (ZA_OLD)
Apartheid-era South African ids contain the following information:
- Date of birth
- Gender
- Race
>>> import id_validation
>>> from id_validation import ValidatorFactory
>>> validator = ValidatorFactory.get_validator("ZA_OLD")
>>> validator.validate("7106245929185")
True
>>> validator.extract_data("7106245929185")
{'dob': datetime.datetime(1971, 6, 24, 0, 0), 'gender': <GENDER.MALE: 1>, 'checksum': 5, 'race': <RACE.CAPE_COLOURED: 1>}
Note
These ID numbers were used during the Apartheid era and encoded the race of the ID holder. The 1986 Identification Act removed this identifier, and all ID numbers were changed to the modern version, which encodes citizenship instead of race. This validator is included for completeness. I have never seen an old ID number in any dataset I have worked with, so avoid using it unless you are sure that your IDs are pre-1986. More information can be found here.
Zimbabwe (ZW)
Zimbabwe IDs contain the following information:
- Registration region
- Father's district
>>> import id_validation
>>> from id_validation import ValidatorFactory
>>> validator = ValidatorFactory.get_validator("ZW")
>>> validator.validate("50-025544-Q-12")
True
>>> validator.extract_data("50-025544-Q-12")
{'registration_region': 'Mutasa', 'district': 'Chivi', 'sequence_number': '025544'}
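The example number splits into four hyphen-separated parts. The sketch below only separates those parts; it assumes a layout of a two-digit registration code, a six- or seven-digit sequence number, a check letter, and a two-digit district code. The function name `split_zw_id`, the key names ending in `_code`, and the exact lengths are assumptions, and mapping the codes to place names such as Mutasa or Chivi is left to the library.

```python
import re
from typing import Optional

# Assumed layout: registration code, sequence number, check letter, district code.
# Hyphens are treated as optional separators.
_ZW_PATTERN = re.compile(r"([0-9]{2})-?([0-9]{6,7})-?([A-Z])-?([0-9]{2})")


def split_zw_id(id_number: str) -> Optional[dict]:
    """Split a Zimbabwean ID into its parts, or return None if the shape is wrong."""
    match = _ZW_PATTERN.fullmatch(id_number.strip().upper())
    if match is None:
        return None
    registration_code, sequence_number, check_letter, district_code = match.groups()
    return {
        "registration_code": registration_code,
        "sequence_number": sequence_number,
        "check_letter": check_letter,
        "district_code": district_code,
    }


print(split_zw_id("50-025544-Q-12"))
```

The check letter is captured but not verified here, so this is a shape check only and is weaker than the library's `validate`.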
File details
Details for the file id_validation-0.6.1.tar.gz.
File metadata
- Download URL: id_validation-0.6.1.tar.gz
- Upload date:
- Size: 31.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8597f2127b07a93e74e5c874d86420cead4d5e7e5b88e8c72f19c2e5f170a92 |
| MD5 | 2be508401f98bfd2eb6dc23842e2e0b1 |
| BLAKE2b-256 | 8f4965ee58af4eab5d53b3de788420807477deb86218bb849cd5daeac8612f85 |
File details
Details for the file id_validation-0.6.1-py3-none-any.whl.
File metadata
- Download URL: id_validation-0.6.1-py3-none-any.whl
- Upload date:
- Size: 45.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c8d2063bee24478102db92a4805ba4794f29e132ba55e5c71d16fe99c2dea6e |
| MD5 | ffcdeb23bf5c2db4a1f42e4e64170290 |
| BLAKE2b-256 | 8808c7cf05f3cc6d3b12ce37a7ac408cf6b1b410d0ec242cdfca25c299207731 |