
A Python package to parse UK postcodes from text. Useful in applications such as OCR and IDP.

Project description

UK Postcodes Parsing


Extract UK postcodes from text and get rich geographic data. This library combines intelligent text parsing with a comprehensive postcode database lookup.

Perfect for document processing, OCR applications, address validation, and location services.

🚀 Lightweight & Fast: Core text parsing and ONSPD validation require no database. Rich geographic data requires a small one-time download (about 40MB).


Quick Start

pip install uk-postcodes-parsing

30-second example - Extract postcodes from text and get enhanced data:

import uk_postcodes_parsing as ukp

# Extract postcodes from any text (emails, documents, OCR results)
text = "Please send the report to our London office at SW1A 1AA or Manchester at M1 1AD"
postcodes = ukp.parse_from_corpus(text)

# Get rich geographic data for each postcode found
for pc in postcodes:
    enhanced = ukp.lookup_postcode(pc.postcode)
    if enhanced:
        print(f"{pc.postcode}: {enhanced.district}, {enhanced.region}")
        print(f"  📍 {enhanced.latitude:.3f}, {enhanced.longitude:.3f}")
        print(f"  🏛️ {enhanced.constituency}")

# Output:
# SW1A 1AA: Westminster, London
#   📍 51.501, -0.142
#   🏛️ Cities of London and Westminster
# M1 1AD: Manchester, North West
#   📍 53.484, -2.245
#   🏛️ Manchester Central

✨ Key Features

🔍 Intelligent Text Parsing

  • Extract postcodes from any text: emails, documents, OCR results
  • OCR error correction: Automatically fixes common mistakes (O↔0, I↔1, etc.)
  • Accurate parsing: Handles all UK postcode formats and variations
  • Confidence scoring: Know how reliable each extracted postcode is
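The OCR correction idea can be illustrated with a position-aware fix: in a UK postcode the inward code is always a digit followed by two letters, and the outward code always starts with a letter, so a misread character in one of those positions can be swapped for its look-alike. The sketch below is a simplified, hypothetical illustration; the `fix_ocr_postcode` function and the confusion tables are assumptions, not the library's implementation, and only those fixed positions are handled.

```python
# Hypothetical sketch: NOT the library's actual correction algorithm.
# Assumed look-alike tables for common OCR letter/digit confusions.
LETTER_TO_DIGIT = {"O": "0", "I": "1", "L": "1", "S": "5", "B": "8", "Z": "2"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z"}


def fix_ocr_postcode(raw: str) -> str:
    """Apply position-aware OCR fixes to a single candidate postcode."""
    compact = raw.replace(" ", "").upper()
    if not 5 <= len(compact) <= 7:
        return compact  # not postcode-shaped; leave untouched
    outcode, incode = compact[:-3], compact[-3:]

    # The inward code is always digit + letter + letter.
    fixed_incode = (
        LETTER_TO_DIGIT.get(incode[0], incode[0])
        + DIGIT_TO_LETTER.get(incode[1], incode[1])
        + DIGIT_TO_LETTER.get(incode[2], incode[2])
    )
    # The outward code always starts with a letter.
    fixed_outcode = DIGIT_TO_LETTER.get(outcode[0], outcode[0]) + outcode[1:]
    return f"{fixed_outcode} {fixed_incode}"


print(fix_ocr_postcode("SW1A OAA"))  # SW1A 0AA
print(fix_ocr_postcode("M1 IAD"))    # M1 1AD
print(fix_ocr_postcode("0X1 1AA"))   # OX1 1AA
```

This naive mapping can still produce postcodes that do not exist: the last two letters of an inward code never use C, I, K, M, O or V, so a 0 or 1 in those positions needs a different guess. Generating several candidates and checking each against ONSPD, which is what `try_all_fix_options=True` appears to be for, is one way to resolve that.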

🗺️ Rich Geographic Database (1.8M Postcodes, Feb 2025)

  • 1.8M active UK postcodes with comprehensive metadata
  • 99.3% coordinate coverage - latitude/longitude for nearly all postcodes
  • 25+ data fields per postcode: administrative, political, healthcare, statistical
  • Smart download: 40MB compressed download, expands to ~700MB with optimized indices for fast queries

📍 Spatial Queries & Analysis

  • Find nearest postcodes to any coordinates
  • Reverse geocoding: coordinates → nearest postcode
  • Distance calculations between postcodes using the Haversine formula
  • Area searches: get all postcodes in districts, constituencies, etc.
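Distance calculations are described as using the Haversine formula. A minimal, self-contained sketch of that formula plus a brute-force radius search is shown here; `haversine_km` and `nearest` are hypothetical names, the 6371 km mean Earth radius is an assumed constant, and a database-backed implementation would typically pre-filter candidates with an index rather than scanning every row.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(points, lat, lon, radius_km, limit):
    """Brute-force nearest search over (postcode, lat, lon) tuples."""
    hits = []
    for postcode, plat, plon in points:
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            hits.append((postcode, d))
    hits.sort(key=lambda hit: hit[1])  # closest first
    return hits[:limit]


# Rounded coordinates taken from the Quick Start output.
SAMPLE = [("SW1A 1AA", 51.501, -0.142), ("M1 1AD", 53.484, -2.245)]

print(f"{haversine_km(51.501, -0.142, 53.484, -2.245):.0f} km")
print(nearest(SAMPLE, 51.5014, -0.1419, radius_km=1, limit=5))
```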

Zero Dependencies & High Performance

  • Pure Python: Uses only the standard library, no external dependencies
  • Fast validation: Basic postcode validation without database dependency
  • Cross-platform: Windows, macOS, Linux support
  • Thread-safe: Concurrent access supported

Setup

Both the full database and a compressed copy are available in each release.

Smart Database Download:

  • Interactive environments (terminal, Jupyter): Prompts before downloading
  • Non-interactive environments: Set UK_POSTCODES_AUTO_DOWNLOAD=1 for automatic downloads (scripts, CI/CD)

Storage Locations:

  • Windows: %APPDATA%\uk_postcodes_parsing\postcodes.db
  • macOS/Linux: ~/.uk_postcodes_parsing/postcodes.db

Using Custom Database:

import uk_postcodes_parsing as ukp

# Use a locally-built database instead of downloading
ukp.setup_database(local_db_path='/path/to/your/postcodes.db')

# Or set an environment variable for the database path (in your shell)
export UK_POSTCODES_DB_PATH=/path/to/your/postcodes.db

# Enable automatic downloads (for CI/CD, scripts)
export UK_POSTCODES_AUTO_DOWNLOAD=1

Usage Examples

🔍 Text Parsing → Enhanced Lookup (Complete Workflow)

The most powerful feature - extract postcodes from messy text and get rich data:

import uk_postcodes_parsing as ukp

# Real-world example: Extract from email/document
document = """
Dear Customer,

Your orders will be shipped to:
- London Office: SW1A 1AA (near Buckingham Palace)
- Manchester Branch: M1 1AD
- Edinburgh Office: EH1 1AD (city center)

For OCR'd text with errors: "Please send to SW1A OAA" (O instead of 0)

Advanced OCR with multiple fixes: "Send to EH16 50Y or M1 IAD"
"""

# Extract all postcodes
postcodes = ukp.parse_from_corpus(document, attempt_fix=True)
print(f"Found {len(postcodes)} postcodes:\n")

# Get comprehensive data for each
for pc in postcodes:
    enhanced = ukp.lookup_postcode(pc.postcode)
    if enhanced:
        print(f"🏠 {pc.postcode}")
        print(f"   📍 Location: {enhanced.district}, {enhanced.region}")
        print(f"   🗺️ Coordinates: {enhanced.latitude:.3f}, {enhanced.longitude:.3f}")
        print(f"   🏛️ Constituency: {enhanced.constituency}")
        print(f"   🏥 Healthcare: {enhanced.healthcare_region}")
        if pc.fix_distance < 0:  # Was corrected
            print(f"   ⚠️  Fixed from: {pc.original}")
        print()

# Advanced OCR: Get all possible corrections for uncertain text
uncertain_postcodes = ukp.parse_from_corpus("OOO 4SS", attempt_fix=True, try_all_fix_options=True)
print(f"Possible corrections: {[p.postcode for p in uncertain_postcodes]}")
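The workflow above relies on `parse_from_corpus` to find postcode-shaped substrings in free text. A much-simplified sketch of that first extraction step, using only the standard `re` module, is shown here; the pattern and the `extract_postcodes` helper are hypothetical, and unlike the library this does no OCR correction, confidence scoring, or ONSPD check.

```python
import re

# Simplified UK postcode pattern (hypothetical, not the library's exact regex):
# outward code = 1-2 letters, a digit, optional letter/digit;
# inward code = a digit followed by two letters. The space is optional.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b", re.IGNORECASE)


def extract_postcodes(text):
    """Return normalised postcodes in the order they appear in the text."""
    return [f"{out.upper()} {inc.upper()}" for out, inc in POSTCODE_RE.findall(text)]


text = "Please send the report to our London office at SW1A 1AA or Manchester at M1 1AD"
print(extract_postcodes(text))  # ['SW1A 1AA', 'M1 1AD']
```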

🗺️ Direct Postcode Lookup

Get comprehensive data for known postcodes:

import uk_postcodes_parsing as ukp

result = ukp.lookup_postcode("SW1A 1AA")
if result:
    print(f"Postcode: {result.postcode}")
    print(f"Coordinates: {result.latitude}, {result.longitude}")
    print(f"District: {result.district}")
    print(f"County: {result.county}")
    print(f"Region: {result.region}")
    print(f"Country: {result.country}")
    print(f"Constituency: {result.constituency}")
    print(f"Healthcare Region: {result.healthcare_region}")

    # Convert to dictionary for APIs/JSON (only when a result was found)
    data = result.to_dict()
    print(f"API Response: {data}")

📍 Spatial Queries & Distance

Find postcodes near coordinates or other postcodes:

import uk_postcodes_parsing as ukp

# Find nearest postcodes to coordinates (e.g., GPS location)
lat, lon = 51.5014, -0.1419  # near Buckingham Palace, London
nearest = ukp.find_nearest(lat, lon, radius_km=1, limit=5)

print("Nearest postcodes:")
for postcode, distance in nearest:
    print(f"{postcode.postcode}: {distance:.2f}km - {postcode.district}")

# Reverse geocoding - coordinates to postcode
postcode = ukp.reverse_geocode(lat, lon)
if postcode:
    print(f"Closest postcode: {postcode.postcode}")

# Distance between postcodes
london = ukp.lookup_postcode("SW1A 1AA")  # Buckingham Palace
edinburgh = ukp.lookup_postcode("EH16 5AY")  # Edinburgh
if london and edinburgh:
    distance = london.distance_to(edinburgh)
    print(f"London to Edinburgh: {distance:.1f}km")

🔎 Search & Area Queries

Search and filter postcodes by various criteria:

import uk_postcodes_parsing as ukp

# Search postcodes by prefix
results = ukp.search_postcodes("SW1A", limit=5)
print(f"Found {len(results)} postcodes starting with SW1A")

# Get all postcodes in administrative areas
westminster = ukp.get_area_postcodes("district", "Westminster", limit=1_000_000)
print(f"Westminster district has {len(westminster)} postcodes")

# Search by constituency
constituency = ukp.get_area_postcodes("constituency", "Cities of London and Westminster")
print(f"Constituency has {len(constituency)} postcodes")

# Get all postcodes in a specific outcode
sw1a_postcodes = ukp.get_outcode_postcodes("SW1A")
print(f"SW1A outcode has {len(sw1a_postcodes)} postcodes")
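The downloaded database is described as shipping with optimized indices for fast queries. A sketch of how prefix and area searches can be served from indexed columns, assuming an SQLite file and using the standard `sqlite3` module, follows; the table layout, column names, sample rows and helper functions are hypothetical and are not the schema of the shipped `postcodes.db`.

```python
import sqlite3

# Hypothetical schema for illustration only; the real database layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postcodes (postcode TEXT PRIMARY KEY, outcode TEXT, district TEXT)")
conn.execute("CREATE INDEX idx_outcode ON postcodes (outcode)")
conn.execute("CREATE INDEX idx_district ON postcodes (district)")
conn.executemany(
    "INSERT INTO postcodes VALUES (?, ?, ?)",
    [
        ("SW1A 1AA", "SW1A", "Westminster"),
        ("SW1A 2AA", "SW1A", "Westminster"),
        ("M1 1AD", "M1", "Manchester"),
    ],
)

ALLOWED_AREA_COLUMNS = {"outcode", "district"}  # whitelist, since column names can't be bound


def search_prefix(prefix, limit=10):
    """Index-friendly prefix search: postcode >= 'SW1A' AND postcode < 'SW1B'."""
    prefix = prefix.upper()
    if not prefix:
        raise ValueError("prefix must not be empty")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # smallest string after every match
    rows = conn.execute(
        "SELECT postcode FROM postcodes WHERE postcode >= ? AND postcode < ? "
        "ORDER BY postcode LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]


def area_postcodes(column, value):
    """Exact-match lookup on an indexed area column."""
    if column not in ALLOWED_AREA_COLUMNS:
        raise ValueError(f"unsupported area type: {column}")
    rows = conn.execute(f"SELECT postcode FROM postcodes WHERE {column} = ? ORDER BY postcode", (value,))
    return [row[0] for row in rows]


print(search_prefix("SW1A"))
print(area_postcodes("district", "Manchester"))
```

The range comparison is used instead of `LIKE 'SW1A%'` because a plain range on a text column can use the index directly under SQLite's default binary collation.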

🔧 Regex-Based Validation Utilities

For lightweight validation without database dependency, use the postcode_utils module:

from uk_postcodes_parsing.postcode_utils import (
    is_valid, to_normalised, to_outcode, to_incode,
    to_area, to_district, to_sector, to_unit
)

# Basic validation (regex-only, no database needed)
print(is_valid("SW1A 1AA"))  # True
print(is_valid("INVALID"))   # False

# Extract postcode components
postcode = "SW1A 1AA"
print(to_outcode(postcode))    # "SW1A"
print(to_incode(postcode))     # "1AA"
print(to_area(postcode))       # "SW"
print(to_district(postcode))   # "SW1"
print(to_sector(postcode))     # "SW1A 1"
print(to_unit(postcode))       # "AA"

# Normalize formatting
print(to_normalised("sw1a1aa"))  # "SW1A 1AA"
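The component helpers follow the standard split of a postcode into an outward code and an inward code. A minimal stand-alone sketch of the same decomposition with one regular expression is shown here; `normalise`, `components` and the pattern are hypothetical, and the pattern is deliberately looser than the official format rules, so it accepts some strings that `is_valid` would reject.

```python
import re

# Simplified pattern (hypothetical); official rules also restrict which letters
# may appear in each position.
FULL_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})$")


def normalise(raw):
    """Uppercase, drop whitespace, then re-insert the single space before the incode."""
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) >= 5 else compact


def components(raw):
    """Split a postcode into its parts, or return None if it doesn't match."""
    match = FULL_RE.match(normalise(raw))
    if match is None:
        return None
    outcode, incode = match.groups()
    return {
        "postcode": f"{outcode} {incode}",
        "outcode": outcode,
        "incode": incode,
        "area": re.match(r"[A-Z]+", outcode).group(),             # leading letters
        "district": re.match(r"[A-Z]+[0-9]+", outcode).group(),   # letters + digits
        "sector": f"{outcode} {incode[0]}",
        "unit": incode[1:],
    }


print(components("sw1a1aa"))
print(components("INVALID"))
```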

📊 Database Management & Info

Control database setup and get statistics:

import uk_postcodes_parsing as ukp

# Get database information
info = ukp.get_database_info()
print(f"Database has {info['record_count']:,} postcodes")
print(f"Database size: {info['size_mb']:.1f} MB")
print(f"Source: {info['metadata']['source_date']}")

# Explicit database setup (usually automatic)
success = ukp.setup_database()
if success:
    print("Database ready!")

# Force redownload if needed (rare)
ukp.setup_database(force_redownload=True)

# Get detailed statistics
from uk_postcodes_parsing.postcode_database import PostcodeDatabase
db = PostcodeDatabase()
stats = db.get_statistics()

print(f"Total postcodes: {stats['total_postcodes']:,}")
print(f"With coordinates: {stats['with_coordinates']:,}")
print(f"Coverage: {stats['coordinate_coverage_percent']}%")
print(f"Countries: {stats['countries']}")

API Reference

  • Text Parsing: parse_from_corpus(), parse(), is_in_ons_postcode_directory()
  • Rich Lookup: lookup_postcode(), search_postcodes(), get_area_postcodes()
  • Spatial Queries: find_nearest(), reverse_geocode(), get_outcode_postcodes()
  • Database: setup_database(), get_database_info()

Data Fields

Each PostcodeResult contains 25+ fields:

  • Geographic: latitude, longitude, eastings, northings (99.3% coverage)
  • Administrative: district, county, region, country, constituency
  • Healthcare: healthcare_region, nhs_health_authority
  • Statistical: lower_output_area, middle_output_area
  • Postal: postcode, incode, outcode

Environment Configuration

Environment Variables

UK_POSTCODES_AUTO_DOWNLOAD

  • Purpose: Enable automatic database downloads without prompts
  • Values: 1, true, yes (case-insensitive) to enable
  • Use case: CI/CD pipelines, automated scripts, serverless functions
export UK_POSTCODES_AUTO_DOWNLOAD=1

UK_POSTCODES_DB_PATH

  • Purpose: Use custom database file instead of downloading
  • Value: Absolute path to your .db file
  • Use case: Custom-built databases, offline environments
export UK_POSTCODES_DB_PATH=/path/to/custom/postcodes.db
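The documented storage locations and the `UK_POSTCODES_DB_PATH` override can be combined into a single lookup order. This sketch assumes the environment variable takes precedence over the per-platform default and falls back to the home directory if `APPDATA` is unset on Windows; `resolve_db_path` is a hypothetical helper, not part of the package.

```python
import os
import sys
from pathlib import Path


def resolve_db_path(environ=None, platform=None):
    """Pick the database location: env override first, then the per-OS default."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    custom = environ.get("UK_POSTCODES_DB_PATH")
    if custom:
        return Path(custom)
    if platform.startswith("win"):
        # %APPDATA%\uk_postcodes_parsing\postcodes.db (home dir fallback is an assumption)
        return Path(environ.get("APPDATA", Path.home())) / "uk_postcodes_parsing" / "postcodes.db"
    # ~/.uk_postcodes_parsing/postcodes.db on macOS and Linux
    return Path.home() / ".uk_postcodes_parsing" / "postcodes.db"


print(resolve_db_path())
```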

Download Behavior

Interactive Environments (Terminal, Jupyter):

  • Prompts user before downloading: "Download 40MB database? [y/N]"
  • Shows download progress and setup time
  • One-time setup, cached locally

Non-Interactive Environments (Scripts, CI/CD):

  • Fails with a clear error message and setup instructions instead of downloading
  • Set UK_POSTCODES_AUTO_DOWNLOAD=1 to allow the download
  • Prevents unexpected bandwidth usage
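The download behaviour reduces to a small decision: download if `UK_POSTCODES_AUTO_DOWNLOAD` holds one of the accepted values (1, true, yes, case-insensitive), otherwise prompt in an interactive session and fail in a non-interactive one. In this sketch the helper names are hypothetical, the auto-download variable is assumed to win even in interactive sessions, and `sys.stdin.isatty()` is only a rough interactivity check (it usually returns False inside Jupyter, so real detection needs more than this).

```python
import os
import sys

TRUTHY = {"1", "true", "yes"}  # accepted values, compared case-insensitively


def auto_download_enabled(environ=None):
    environ = os.environ if environ is None else environ
    return environ.get("UK_POSTCODES_AUTO_DOWNLOAD", "").strip().lower() in TRUTHY


def download_decision(environ=None, interactive=None):
    """Return 'download', 'prompt' or 'error' following the documented behaviour."""
    if auto_download_enabled(environ):
        return "download"
    if interactive is None:
        interactive = sys.stdin.isatty()  # rough check; see caveat above
    return "prompt" if interactive else "error"


print(download_decision({"UK_POSTCODES_AUTO_DOWNLOAD": "Yes"}))  # download
```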

Contributing & Development

# Install in development mode
pip install -e .

# Run tests
pip install pytest && pytest tests/ -v

# Install the pre-commit hooks (optional)
pre-commit install

Database Creation: ONSPD Usage Guide | Technical Guide

Data Source & Updates

  • Source: ONS Postcode Directory (ONSPD) - February 2025
  • Coverage: All active UK postcodes including Channel Islands, Isle of Man
  • Methodology: Data processed using the postcodes.io extraction methodology (MIT License)
  • Updates: Database can be regenerated with newer ONSPD releases using included tools

Acknowledgments

postcodes.io

This library was originally inspired by the excellent work at postcodes.io by Ideal Postcodes. While postcodes.io focuses on providing a comprehensive REST API service, this library evolved to specialize in text parsing and document processing use cases.

Key contributions from postcodes.io:

  • Database processing logic: Our ONSPD data processing pipeline is based on their proven methodology
  • Test data: Reference test cases adapted from their validation suite (MIT License)
  • Field mappings: Administrative area mappings and data structure insights

How this library differs:

  • Python-native: Pure Python implementation with no external dependencies
  • Text extraction focus: Built to find and parse postcodes inside free-text corpora
  • Offline-first: Local database with automatic setup, no API dependencies
  • Document processing: Optimized for batch text processing and document digitization

ONS (Office for National Statistics)

All postcode data is derived from the ONS Postcode Directory under the Open Government Licence v3.0.

License

Software License

This software is released under the MIT License. Free for commercial and non-commercial use.

See LICENSE file for full terms.

Data License

This library uses the ONS Postcode Directory (ONSPD) dataset, which carries different licensing terms:

Great Britain Postcodes

  • License: UK Open Government Licence v3.0
  • Usage: ✅ Free for both commercial and non-commercial use
  • Requirement: Must acknowledge ONS as data source

Northern Ireland Postcodes (BT postcodes)

  • Non-commercial use: ✅ Free under ONSPD licence terms
  • Commercial use: ✅ Permitted for "Internal Business Use" under End User Licence
  • Other commercial use: Requires separate licence from Land and Property Services NI

Summary for Most Users

  • Personal/Research: ✅ All data free to use
  • Internal Business: ✅ All data free for internal company use
  • Public-facing Commercial: ✅ Great Britain data free, Northern Ireland may require licence

⚠️ Important: This is a best-effort summary. For authoritative licensing information and compliance with your specific use case, please consult the official ONS licensing documentation and seek legal advice if needed.

Data is provided "as is", without warranty.


Download files


Source Distribution

uk_postcodes_parsing-2.1.0.tar.gz (4.1 MB, source)

Built Distribution


uk_postcodes_parsing-2.1.0-py3-none-any.whl (4.9 MB, Python 3)

File details

Details for the file uk_postcodes_parsing-2.1.0.tar.gz.

File metadata

  • Download URL: uk_postcodes_parsing-2.1.0.tar.gz
  • Size: 4.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.6

File hashes

Hashes for uk_postcodes_parsing-2.1.0.tar.gz
Algorithm Hash digest
SHA256 7af6a9ef9f0caeb3e04a0b42e234ce8bfdec40ff91c92b53166e0e077be19376
MD5 341b761540a02bcfc59a5df94d0d1fab
BLAKE2b-256 8d25ee7a585e46f02c744f941e9016244314029c726e648a423855cde23da38a


File details

Details for the file uk_postcodes_parsing-2.1.0-py3-none-any.whl.

File hashes

Hashes for uk_postcodes_parsing-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e60ceee06a7fbb85d37a7ffce35895fbea7ef558477b2512940af0596312baa6
MD5 3ac5b443c145de31bf7d072d7f37bbf5
BLAKE2b-256 d7dbac4456ca1a34d447cf4f4c73ec98a10449fb3939b52316010bcc28d4f683

