Internacia Python SDK
A Python SDK for accessing internacia-db data with support for countries, international blocks, and fuzzy search across multiple languages.
Features
- 🌍 Countries Data: Access comprehensive country information including codes, names, currencies, languages, and more
- 🏛️ International Blocks: Query international organizations, alliances, and unions
- 🔍 Fuzzy Search: Multi-language search supporting partial matches across names, codes, acronyms, and translations
- 🚀 Thread-Safe: Built on DuckDB with thread-safe connection handling
- 🌐 Multi-Language: Search works across all languages supported in the dataset
- ⚡ Fast: Leverages DuckDB for high-performance queries
- ✅ Well-Tested: Comprehensive test suite with 100% coverage of all public APIs
Installation
pip install internacia
For download progress bars, install with the progress extra:
pip install internacia[progress]
Or install from source:
git clone https://github.com/commondataio/internacia-python.git
cd internacia-python
pip install -e .
Prerequisites
Before using the SDK, you need to have the internacia-db dataset built. The SDK expects a DuckDB database file at:
../internacia-db/data/datasets/internacia.duckdb
To build the database, navigate to the internacia-db repository and run:
python3 scripts/builder.py build --formats duckdb
Downloading the Database
Alternatively, you can download a pre-built database directly from GitHub releases (similar to NLTK's download mechanism):
from internacia import download_database
# Download the latest version
db_path = download_database()
# Download a specific version
db_path = download_database(version="v1.0.0")
# Force re-download even if already cached
db_path = download_database(force=True)
# Download with progress bar (requires tqdm: pip install internacia[progress])
db_path = download_database(show_progress=True)
# Download without progress bar
db_path = download_database(show_progress=False)
The database will be cached in ~/.internacia/internacia.duckdb by default. You can customize the cache location using the INTERNACIA_CACHE_DIR environment variable:
export INTERNACIA_CACHE_DIR="/path/to/cache"
The SDK will automatically use the cached database if found. You can also check for updates:
from internacia import check_for_updates, get_latest_version
# Check if updates are available
update_info = check_for_updates()
if update_info["has_update"]:
    print(f"Latest version: {update_info['latest_version']}")
    download_database()
# Get latest version without downloading
latest = get_latest_version()
print(f"Latest version: {latest}")
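An update check of this kind comes down to comparing a local version tag with the latest release tag. The following is a minimal sketch of such a comparison, assuming tags follow the "v1.0.0" pattern shown in the download example. `parse_version_tag` and `has_update` are hypothetical helpers written for illustration; they are not part of the SDK, whose own comparison logic may differ.

```python
def parse_version_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as "v1.2.0" into a comparable tuple (1, 2, 0)."""
    cleaned = tag.strip().lstrip("vV")
    return tuple(int(part) for part in cleaned.split("."))


def has_update(local_tag: str, latest_tag: str) -> bool:
    """Return True when latest_tag is strictly newer than local_tag."""
    return parse_version_tag(latest_tag) > parse_version_tag(local_tag)


# Numeric comparison avoids the string-ordering trap where "v1.10.0" < "v1.9.0".
newer_available = has_update("v1.0.0", "v1.2.0")
older_remote = has_update("v1.10.0", "v1.9.0")
```

Comparing tuples of integers rather than raw strings keeps multi-digit components such as "10" ordered correctly.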
Keeping the Database Up to Date
To keep your SDK database current with the latest data from internacia-db:
1. Check for updates in the internacia-db repository:
   cd ../internacia-db
   git pull origin main
2. Rebuild the database with the latest data:
   python3 scripts/builder.py build --formats duckdb
3. Verify the database was updated successfully:
   ls -lh data/datasets/internacia.duckdb
Helper Script: For convenience, you can use the provided script to automate this process:
./scripts/update_database.sh
This script will automatically pull the latest changes, rebuild the database, and verify the update.
The SDK will automatically use the updated database file on the next client initialization. No code changes are required.
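For readers who prefer to drive the same steps from Python, here is a rough sketch that mirrors the manual pull, rebuild, and verify commands. The contents of update_database.sh are not shown in this document, so this is not a port of that script; `build_update_commands` and `update_database` are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_update_commands() -> list[list[str]]:
    """Return the three manual steps as argument lists, in order."""
    return [
        ["git", "pull", "origin", "main"],
        ["python3", "scripts/builder.py", "build", "--formats", "duckdb"],
        ["ls", "-lh", "data/datasets/internacia.duckdb"],
    ]


def update_database(repo_dir: Path) -> None:
    """Run each step inside repo_dir, stopping at the first failure."""
    for command in build_update_commands():
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `update_database(Path("../internacia-db"))` would run the steps against a local checkout; nothing is executed until that call is made.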
When to update:
- After pulling new changes from the internacia-db repository
- When you need the latest country or international block data
- If you notice data inconsistencies or missing information
Download Method: You can also update using the download function:
from internacia import download_database
download_database() # Downloads latest version
Quick Start
from internacia import InternaciaClient
# Initialize the client
client = InternaciaClient()
# Get a country by code
country = client.countries.get_by_code("US")
print(country["name"]) # "United States"
# Search for countries or blocks
results = client.search.fuzzy("United States")
for result in results:
    print(f"{result['type']}: {result['name']}")
# Get an international block
block = client.intblocks.get_by_id("EU")
print(block["name"]) # "European Union"
Usage
Countries
Get Country by Code
# By ISO 3166-1 alpha-2 code
country = client.countries.get_by_code("US")
# By ISO 3166-1 alpha-3 code
country = client.countries.get_by_iso3("USA")
# By numeric code
country = client.countries.get_by_numeric_code("840")
Query Countries
# Get all countries
all_countries = client.countries.get_all()
# Get UN members
un_members = client.countries.get_un_members()
# Get independent countries
independent = client.countries.get_independent()
# Get countries by region
countries = client.countries.get_by_region("NAC") # North America
# Get countries by income level
countries = client.countries.get_by_income_level("OEC") # High income: OECD
# Get countries by continent
countries = client.countries.get_by_continent("Europe")
# Get countries by currency
countries = client.countries.get_by_currency("EUR")
# Get countries by language
countries = client.countries.get_by_language("eng")
# Get total count
count = client.countries.count()
International Blocks
Get Block by ID
block = client.intblocks.get_by_id("EU")
Query Blocks
# Get all blocks
all_blocks = client.intblocks.get_all()
# Get blocks by type
blocks = client.intblocks.get_by_blocktype("economic")
# Get blocks by status
blocks = client.intblocks.get_by_status("formal")
# Get blocks by geographic scope
blocks = client.intblocks.get_by_geographic_scope("regional")
# Get blocks containing a specific country
blocks = client.intblocks.get_by_member("US")
# Get blocks by acronym
blocks = client.intblocks.get_by_acronym("EU")
# Get blocks by tag
blocks = client.intblocks.get_by_tag("trade")
# Get blocks by topic
blocks = client.intblocks.get_by_topic("economy")
# Get blocks founded in a specific year
blocks = client.intblocks.get_by_founded_year(1993)
# Get total count
count = client.intblocks.count()
Search
Fuzzy Search
The fuzzy search function searches across countries and international blocks, supporting:
- Country names (in any language via native_names)
- Country codes (ISO 3166-1 alpha-2, alpha-3, numeric)
- International block names
- Block translations (in any language)
- Block acronyms (in any language)
- Block IDs
- Block tags
# Search across both countries and blocks
results = client.search.fuzzy("United States")
# Search only countries
countries = client.search.search_countries("United")
# Search only blocks
blocks = client.search.search_intblocks("EU")
# Search with custom limit
results = client.search.fuzzy("Europe", limit=5)
# Control search scope
results = client.search.fuzzy("EU", search_countries=False, search_intblocks=True)
# Multi-language search
results = client.search.fuzzy("Европа") # Russian
results = client.search.fuzzy("欧盟") # Chinese
results = client.search.fuzzy("UE") # French acronym
The fuzzy() method supports the following parameters:
- query: Search query string (required)
- limit: Maximum number of results to return (default: 10)
- search_countries: Whether to search countries (default: True)
- search_intblocks: Whether to search international blocks (default: True)
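To make the roles of these parameters concrete, here is a toy in-memory illustration of partial matching across names in several languages. It is not the SDK's algorithm, which runs against the DuckDB dataset and is not described here. The sample records, the "country" and "intblock" type labels, and the `partial_match` function are all assumptions made for the sketch.

```python
SAMPLE_RECORDS = [
    {"type": "country", "id": "US", "names": ["United States", "États-Unis"]},
    {"type": "country", "id": "GB", "names": ["United Kingdom", "Royaume-Uni"]},
    {"type": "intblock", "id": "EU", "names": ["European Union", "UE", "欧盟"]},
]


def partial_match(query, limit=10, search_countries=True, search_intblocks=True):
    """Case-insensitive substring match over record ids and names."""
    needle = query.casefold().strip()
    if not needle:
        raise ValueError("query must not be empty")

    # Build the set of record types the caller asked for.
    wanted = set()
    if search_countries:
        wanted.add("country")
    if search_intblocks:
        wanted.add("intblock")

    hits = []
    for record in SAMPLE_RECORDS:
        if record["type"] not in wanted:
            continue
        candidates = [record["id"], *record["names"]]
        if any(needle in candidate.casefold() for candidate in candidates):
            hits.append(record)
    return hits[:limit]


united_ids = [record["id"] for record in partial_match("united")]
chinese_ids = [record["id"] for record in partial_match("欧盟")]
```

The sketch uses `str.casefold()` rather than `lower()` because it handles case-insensitive comparison more consistently for non-ASCII text.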
Custom Database Path
If your database is in a different location:
from pathlib import Path
from internacia import InternaciaClient
client = InternaciaClient(
    db_path=Path("/path/to/internacia.duckdb")
)
Configuration
The SDK supports multiple ways to configure database paths and logging:
Environment Variables
# Set custom database path
export INTERNACIA_DB_PATH="/path/to/internacia.duckdb"
# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
export INTERNACIA_LOG_LEVEL="DEBUG"
# Set custom cache directory for downloaded databases
export INTERNACIA_CACHE_DIR="/path/to/cache"
The SDK will automatically use these environment variables if set. Priority order for database path resolution:
1. Explicit db_path parameter
2. INTERNACIA_DB_PATH environment variable
3. Default relative path (../internacia-db/data/datasets/internacia.duckdb)
4. Cached database from ~/.internacia/internacia.duckdb
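The priority order can be sketched as a small resolver function. This is an illustration under stated assumptions, not the SDK's code: `resolve_db_path` is a hypothetical name, and the sketch assumes that explicit and environment paths are returned without an existence check while the two fallbacks are used only if the file exists.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

DEFAULT_RELATIVE_PATH = Path("../internacia-db/data/datasets/internacia.duckdb")
CACHED_PATH = Path.home() / ".internacia" / "internacia.duckdb"


def resolve_db_path(
    db_path: Optional[Path] = None,
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the first database path that applies, or None if none does."""
    # 1. An explicit argument always wins.
    if db_path is not None:
        return Path(db_path)
    # 2. Then the INTERNACIA_DB_PATH environment variable.
    env_value = env.get("INTERNACIA_DB_PATH")
    if env_value:
        return Path(env_value)
    # 3 and 4. Finally the default relative path, then the cached copy.
    for candidate in (DEFAULT_RELATIVE_PATH, CACHED_PATH):
        if exists(candidate):
            return candidate
    return None
```

Passing `env` and `exists` as parameters keeps the function easy to exercise without touching the real environment or file system.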
Logging
The SDK includes comprehensive logging for debugging and monitoring:
import logging
import os
# Set log level via environment variable
os.environ["INTERNACIA_LOG_LEVEL"] = "DEBUG"
from internacia import InternaciaClient
# Logging is automatically configured on client initialization
client = InternaciaClient()
# Or configure logging programmatically
logging.getLogger("internacia").setLevel(logging.DEBUG)
Log levels:
- DEBUG: Detailed information for debugging (queries, results)
- INFO: General informational messages
- WARNING: Warning messages (default)
- ERROR: Error messages only
- CRITICAL: Critical errors only
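As an illustration of how an environment variable can be mapped onto these levels, here is a short standard-library sketch. `level_from_env` is a hypothetical helper. Treating the value case-insensitively and falling back to WARNING for unrecognized values are assumptions, not documented SDK behavior.

```python
import logging
import os
from typing import Mapping

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def level_from_env(env: Mapping[str, str] = os.environ) -> int:
    """Translate INTERNACIA_LOG_LEVEL into a logging module constant."""
    name = env.get("INTERNACIA_LOG_LEVEL", "WARNING").strip().upper()
    if name not in VALID_LEVELS:
        # Assumed fallback: use the documented default level.
        name = "WARNING"
    return getattr(logging, name)


# Apply the resolved level to the "internacia" logger.
logging.getLogger("internacia").setLevel(level_from_env())
```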
Type Hints
The SDK provides comprehensive type hints using TypedDict models for better IDE support and type checking:
from internacia import InternaciaClient, Country, Intblock, SearchResult
client = InternaciaClient()
# All methods return properly typed dictionaries
country: Country = client.countries.get_by_code("US")
block: Intblock = client.intblocks.get_by_id("EU")
results: list[SearchResult] = client.search.fuzzy("United States")
Available types:
- Country: Country data structure
- Intblock: International block data structure
- SearchResult: Search result with type indicator
- Supporting types: CapitalCity, Region, IncomeLevel, Language, Currency, NativeName, Translation, Acronym, Topic, Member
Error Handling
The SDK provides a custom exception hierarchy for better error handling:
from internacia import InternaciaClient
from internacia.exceptions import (
InternaciaError,
DatabaseError,
NotFoundError,
ValidationError,
DownloadError,
VersionError,
)
client = InternaciaClient()
try:
    country = client.countries.get_by_code("XX")  # Invalid code
except ValidationError as e:
    print(f"Invalid input: {e}")
except NotFoundError as e:
    print(f"Not found: {e}")
except DatabaseError as e:
    print(f"Database error: {e}")
Exception types:
- InternaciaError: Base exception for all SDK errors
- DatabaseError: Database connection or query errors
- NotFoundError: Resource not found (e.g., country code doesn't exist)
- ValidationError: Input validation errors (e.g., empty query string)
- DownloadError: Database download failures
- VersionError: Version-related errors
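Because every SDK error shares one base class, callers can catch the base to handle all of them in one place. The sketch below uses local stand-in classes named after those listed above so that it runs without the SDK installed; the real classes live in internacia.exceptions. Whether each one derives directly from the base is an assumption, and `lookup` and `safe_lookup` are hypothetical helpers.

```python
class InternaciaError(Exception):
    """Stand-in for the SDK's base exception."""


class NotFoundError(InternaciaError):
    """Stand-in: requested resource does not exist."""


class ValidationError(InternaciaError):
    """Stand-in: input failed validation."""


def lookup(code, table):
    """Toy lookup that raises the stand-in exceptions."""
    if not code:
        raise ValidationError("code must not be empty")
    try:
        return table[code]
    except KeyError:
        raise NotFoundError(f"no country with code {code!r}") from None


def safe_lookup(code, table):
    """Catch the base class once instead of listing every subclass."""
    try:
        return lookup(code, table)
    except InternaciaError as exc:
        return f"{type(exc).__name__}: {exc}"


names = {"US": "United States"}
found = safe_lookup("US", names)
missing = safe_lookup("XX", names)
```

Catching specific subclasses first, as in the example above, still works when a case needs different handling.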
Data Structure
Country Fields
- code: ISO 3166-1 alpha-2 code (e.g., "US")
- name: Common name
- iso3code: ISO 3166-1 alpha-3 code
- numeric_code: ISO 3166-1 numeric code
- official_name: Official full name
- capital_city: {name, lng, lat}
- region: World Bank region {id, value}
- incomeLevel: World Bank income level {id, value}
- languages: List of {code, name, official}
- currencies: List of {code, name, symbol}
- un_member: Boolean
- independent: Boolean
- continents: List of continent names
- borders: List of bordering country codes
- native_names: Map of language code -> {official, common}
- And more...
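To show how the nested fields might be read, here is a hand-written sample record that follows the field shapes listed above. The values are illustrative and were not copied from the dataset, so the actual record for France may differ in detail. The helper functions are hypothetical.

```python
# Hand-written sample following the documented field shapes (not dataset output).
sample_country = {
    "code": "FR",
    "name": "France",
    "iso3code": "FRA",
    "numeric_code": "250",
    "official_name": "French Republic",
    "languages": [{"code": "fra", "name": "French", "official": True}],
    "currencies": [{"code": "EUR", "name": "Euro", "symbol": "€"}],
    "un_member": True,
    "independent": True,
    "continents": ["Europe"],
    "native_names": {
        "fra": {"official": "République française", "common": "France"}
    },
}


def official_language_codes(country):
    """Codes of the languages flagged as official."""
    return [lang["code"] for lang in country["languages"] if lang["official"]]


def currency_codes(country):
    """Codes of all currencies listed for the country."""
    return [currency["code"] for currency in country["currencies"]]


def native_common_name(country, lang_code):
    """Common name in the given language, or None if it is not listed."""
    entry = country["native_names"].get(lang_code)
    return entry["common"] if entry else None
```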
International Block Fields
- id: Unique identifier
- name: Name of the block
- blocktype: List of types (e.g., ["economic", "political"])
- status: Status ("formal", "informal", "de-facto")
- translations: List of {lang, name}
- acronyms: List of {lang, value}
- includes: List of member countries {id, name, type, status, ...}
- founded: Foundation year/date
- geographic_scope: Scope ("global", "regional", "sub-regional")
- tags: List of classification tags
- topics: List of {key, name}
- And more...
Thread Safety
The SDK is designed to be thread-safe. Each query uses its own read-only DuckDB connection, which is automatically closed after use. This allows safe concurrent access from multiple threads.
import threading
from internacia import InternaciaClient
client = InternaciaClient()
def query_country(code):
    country = client.countries.get_by_code(code)
    print(f"{code}: {country['name']}")
# Safe to use from multiple threads
threads = []
for code in ["US", "FR", "DE", "GB"]:
    t = threading.Thread(target=query_country, args=(code,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
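The connection-per-query pattern described in this section can be illustrated with a self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies; it does not show the SDK's internals. The table layout and the `build_sample_db` and `get_name` helpers are hypothetical.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from pathlib import Path


def build_sample_db(path):
    """Create a tiny database file to query against."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO countries VALUES (?, ?)",
            [("US", "United States"), ("FR", "France"), ("DE", "Germany")],
        )
        conn.commit()


def get_name(path, code):
    """Open a fresh read-only connection, run one query, then close it."""
    uri = f"{Path(path).as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        row = conn.execute(
            "SELECT name FROM countries WHERE code = ?", (code,)
        ).fetchone()
    return row[0] if row else None


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "sample.db"
    build_sample_db(db_file)
    # No connection object is shared between threads.
    with ThreadPoolExecutor(max_workers=3) as pool:
        names = list(pool.map(lambda c: get_name(db_file, c), ["US", "FR", "DE"]))
```

Since each call owns its connection for the duration of a single query, threads never share connection state, which is the property the section relies on.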
Examples
See the examples/ directory for more detailed usage examples:
- basic_usage.py: Basic operations
- search_examples.py: Search functionality
- advanced_queries.py: Complex queries
- multi_language.py: Multi-language search examples
Development
Setup
git clone https://github.com/commondataio/internacia-python.git
cd internacia-python
pip install -e ".[dev]"
Running Tests
The SDK includes a comprehensive test suite with 104+ tests covering all public APIs, error handling, validation, and edge cases.
# Run all tests
pytest
# Run with verbose output
pytest -v
# Run specific test file
pytest tests/test_countries.py
# Run with coverage report
pytest --cov=internacia --cov-report=html
Test Coverage:
- ✅ All public methods (100% coverage)
- ✅ Error handling and exception scenarios
- ✅ Input validation and edge cases
- ✅ Database connection management
- ✅ Configuration and environment handling
See TEST_COVERAGE_ANALYSIS.md for detailed coverage information.
Code Formatting
# Format code
black internacia/
# Lint code
ruff check internacia/
# Type checking
mypy internacia/
Pre-commit Hooks
The project includes pre-commit hooks for automated code quality checks:
# Install pre-commit hooks
pre-commit install
# Run hooks manually on all files
pre-commit run --all-files
The hooks automatically check for:
- Code formatting (black)
- Linting (ruff)
- Type checking (mypy)
- Test execution (pytest)
- Trailing whitespace and other code quality issues
Alternative Implementation Approaches for Database Updates
While the current SDK uses a manual update process, here are several alternative approaches that could be implemented to automate database updates:
1. Version Metadata Checking
Approach: Store version/timestamp metadata in the database and provide SDK methods to check for updates.
Implementation:
- Add a metadata table to the database with version, build date, and checksum
- Provide client.get_database_version() and client.check_for_updates() methods
- Compare local version with remote version (from API or file)
Pros: Simple, lightweight, no external dependencies
Cons: Requires coordination with internacia-db to maintain version info
Complexity: Low
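A rough sketch of the checksum and comparison parts of this approach, using only the standard library. The metadata keys ("version", "checksum") and the `file_sha256` and `needs_update` helpers are assumptions for illustration; this proposal does not fix a schema.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large databases are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_update(local_meta: dict, remote_meta: dict) -> bool:
    """An update is needed when the version or the checksum differs."""
    return (
        local_meta["version"] != remote_meta["version"]
        or local_meta["checksum"] != remote_meta["checksum"]
    )


# Demonstrate the checksum on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_checksum = file_sha256(sample)
```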
2. CLI Update Command
Approach: Create a CLI tool (e.g., internacia update) that automates the rebuild process.
Implementation:
- Add a CLI entry point using click or argparse
- Command navigates to internacia-db, pulls updates, and rebuilds
- Can be run manually or via cron/scheduled tasks
Pros: User-friendly, automates the manual process
Cons: Requires internacia-db to be accessible locally
Complexity: Medium
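One possible shape for such a command, sketched with argparse. The `internacia update` subcommand is the one proposed in this approach, but the `--repo` option and the `build_parser` and `plan_update` functions are hypothetical and no such CLI is confirmed to exist.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical `internacia update` command."""
    parser = argparse.ArgumentParser(prog="internacia")
    subcommands = parser.add_subparsers(dest="command", required=True)
    update = subcommands.add_parser(
        "update", help="Pull internacia-db and rebuild the database"
    )
    update.add_argument(
        "--repo",
        default="../internacia-db",
        help="Path to a local internacia-db checkout",
    )
    return parser


def plan_update(args: argparse.Namespace) -> list[str]:
    """List the shell steps the command would carry out, in order."""
    return [
        f"cd {args.repo}",
        "git pull origin main",
        "python3 scripts/builder.py build --formats duckdb",
    ]


args = build_parser().parse_args(["update", "--repo", "/tmp/internacia-db"])
steps = plan_update(args)
```

Separating the plan from its execution makes the command easy to preview before it touches the repository.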
3. Auto-Download from Releases
Approach: Automatically download pre-built database files from GitHub releases.
Implementation:
- Publish database files as GitHub release assets in internacia-db
- SDK checks for new releases and downloads the database file
- Cache downloaded files locally
Pros: No build step required, works for all users
Cons: Requires release management, larger repository size
Complexity: Medium-High
4. Database Version API
Approach: Check against an API endpoint for the latest database version.
Implementation:
- Create a simple API endpoint (or use internacia-api) that returns latest version info
- SDK queries API on initialization or via a method call
- Compare versions and prompt user to update
Pros: Centralized version management, can provide update notifications
Cons: Requires API infrastructure, network dependency
Complexity: Medium-High
5. Watch Mode / File Monitoring
Approach: Monitor the database file for changes and reload automatically.
Implementation:
- Use file system watchers (e.g., watchdog library) to detect database file changes
- Automatically reload database connections when file is updated
- Useful for development environments
Pros: Seamless updates during development
Cons: File watching overhead, primarily useful for development
Complexity: Medium
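The change-detection idea can be sketched without a watcher library by polling the file's modification time and size. This is a deliberately simpler stand-in for watchdog, not an example of its API, and `FileChangeDetector` is a hypothetical class.

```python
import tempfile
from pathlib import Path


class FileChangeDetector:
    """Report whether a file changed since the previous check."""

    def __init__(self, path):
        self.path = Path(path)
        self._last_seen = self._signature()

    def _signature(self):
        # Modification time plus size; None when the file is missing.
        try:
            stat = self.path.stat()
        except FileNotFoundError:
            return None
        return (stat.st_mtime_ns, stat.st_size)

    def changed(self) -> bool:
        current = self._signature()
        if current != self._last_seen:
            self._last_seen = current
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "internacia.duckdb"
    db_file.write_bytes(b"v1")
    detector = FileChangeDetector(db_file)
    observed = [detector.changed()]          # nothing changed yet
    db_file.write_bytes(b"v2 with more data")
    observed.append(detector.changed())      # rewrite is detected
    observed.append(detector.changed())      # no further change
```

A client could call `changed()` before each query or on a timer and reopen its connections when it returns True.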
6. Package Integration with Version Checking
Approach: Include database version checking in SDK initialization with warnings.
Implementation:
- Check database version on client initialization
- Compare against expected/known versions
- Log warnings if database is outdated
- Optionally provide update instructions in the warning
Pros: Proactive user notification, minimal implementation
Cons: Requires version metadata in database
Complexity: Low-Medium
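A minimal sketch of the warning step, assuming the local and expected versions are already available as strings. `warn_if_outdated` and the logger name are hypothetical, and the check only detects a mismatch rather than ordering the two versions.

```python
import logging

logger = logging.getLogger("internacia.sketch")


def warn_if_outdated(local_version: str, expected_version: str) -> bool:
    """Log a warning with update instructions when the versions differ."""
    if local_version == expected_version:
        return False
    logger.warning(
        "Database version %s differs from expected %s; "
        "consider calling download_database() to refresh it.",
        local_version,
        expected_version,
    )
    return True


up_to_date = warn_if_outdated("v1.0.0", "v1.0.0")
outdated = warn_if_outdated("v1.0.0", "v1.1.0")
```

Returning a boolean alongside the log message lets callers decide whether to act on the warning programmatically.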
Recommendation: Start with Version Metadata Checking (#1) combined with Package Integration (#6) for a lightweight solution that provides user feedback. For more advanced use cases, consider CLI Update Command (#2) or Auto-Download from Releases (#3).
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Related Projects
- internacia-db: The data repository
- internacia-api: REST API for internacia-db
Changelog
See CHANGELOG.md for a list of changes.