High-level Python client for the INE Portugal (Statistics Portugal) API

pyptine - INE Portugal API Client

Python 3.9+ | License: MIT

High-level Python client for the Statistics Portugal (INE) API. Query and download statistical data from INE Portugal through a simple, intuitive interface.

Features

  • 🎯 High-level Convenience API: Simple interface for common data retrieval and analysis tasks.
  • Async Support: Non-blocking I/O with AsyncINE for concurrent requests using httpx.
  • 📊 Multiple Output Formats: Export data to pandas DataFrames, JSON, or CSV with ease.
  • 📈 Data Visualization: Interactive plotly charts (line, bar, area, scatter) created directly from a DataResponse.
  • 🔬 Statistical Analysis: Built-in YoY growth, MoM changes, moving averages, and EMA calculations.
  • 💾 Smart Caching: Disk-based caching reduces redundant API calls, speeding up repeated queries.
  • 🔍 Metadata Browsing: Search and discover indicators, themes, and dimensions.
  • 🖥️ Enhanced CLI: Rich formatting with progress bars, tables, and colored output.
  • 📑 True Pagination: Efficient streaming of large datasets with get_all_data().
  • 📖 Modern Python: Fully type-annotated for better developer experience and IDE support.
  • Well-Tested: Comprehensive test suite with 81% code coverage (239 tests).
  • 🔄 API Compatible: Supports both old and new INE API response formats seamlessly.

Installation

pip install pyptine

For development, install with all extra dependencies:

pip install "pyptine[dev,docs]"

Quick Start

from pyptine import INE

# Initialize the client
ine = INE(language="EN")

# 1. Search for an indicator
print("Searching for 'gdp' indicators...")
results = ine.search("gdp")
for indicator in results[:5]:  # Print top 5 results
    print(f"- {indicator.varcd}: {indicator.title}")

# 2. Get data for a specific indicator
varcd = "0004167"  # Resident population
print(f"\nFetching data for indicator {varcd}...")
response = ine.get_data(varcd)

# 3. Convert to a pandas DataFrame
df = response.to_dataframe()
print("\nData as DataFrame:")
print(df.head())

# 4. Export data to a CSV file
output_file = "population_data.csv"
print(f"\nExporting data to {output_file}...")
ine.export_csv(varcd, output_file)
print("Done!")

Async API

For concurrent requests and non-blocking I/O, use the AsyncINE client:

import asyncio
from pyptine import AsyncINE

async def main():
    async with AsyncINE(language="EN") as ine:
        # Fetch single indicator
        response = await ine.get_data("0004167")
        df = response.to_dataframe()
        print(df.head())

        # Fetch multiple indicators concurrently (asyncio is imported at the top)
        responses = await asyncio.gather(
            ine.get_data("0004167"),
            ine.get_data("0004127"),
            ine.get_data("0008074"),
        )
        for resp in responses:
            print(f"Fetched {len(resp.to_dataframe())} rows")

        # Stream large datasets
        async for chunk in ine.get_all_data("0004127", chunk_size=40000):
            df_chunk = chunk.to_dataframe()
            print(f"Processing {len(df_chunk)} rows...")

asyncio.run(main())

AsyncINE Features:

  • Non-blocking I/O for faster concurrent requests
  • Async iterator for memory-efficient pagination
  • Same API as the synchronous INE client
  • Automatic connection pooling and retries
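The asyncio.gather call in the example above starts every request at the same time. When fetching many indicators, a common asyncio pattern for capping the number of requests in flight is an asyncio.Semaphore. This sketch does not call pyptine: `fake_get_data` is a hypothetical stand-in for `ine.get_data` that only sleeps, and `max_concurrency` is an illustrative parameter of the sketch, not a documented AsyncINE option.

```python
import asyncio
from typing import List


class InFlightCounter:
    """Tracks how many simulated requests are running at once, and the peak."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0


counter = InFlightCounter()


async def fake_get_data(varcd: str) -> str:
    """Hypothetical stand-in for `await ine.get_data(varcd)`; it only sleeps."""
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    try:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"data for {varcd}"
    finally:
        counter.current -= 1


async def fetch_all(varcds: List[str], max_concurrency: int = 2) -> List[str]:
    # At most `max_concurrency` coroutines can hold the semaphore at once;
    # the others wait at `async with` until a slot is released.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(varcd: str) -> str:
        async with semaphore:
            return await fake_get_data(varcd)

    # gather() returns results in input order, whichever request finishes first.
    return list(await asyncio.gather(*(fetch_one(v) for v in varcds)))


if __name__ == "__main__":
    for line in asyncio.run(fetch_all(["0004167", "0004127", "0008074"])):
        print(line)
    print("peak concurrent calls:", counter.peak)
```

Whether AsyncINE already limits concurrency internally (for example through httpx connection-pool limits) is not stated here, so a cap like this may or may not be needed in practice.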

Command-Line Usage

The pyptine CLI provides a convenient way to access INE data from your terminal, with rich formatting and progress indicators for a better user experience.

# Search for indicators related to "pib" (GDP in Portuguese)
pyptine search "pib"

# Get detailed information about a specific indicator
pyptine info 0004127

# Download data for an indicator to a CSV file (with progress bar)
pyptine download 0004127 --output data.csv

# Download data and filter by dimensions
pyptine download 0004167 --output filtered_data.csv -d Dim1=S7A2023 -d Dim2=PT

# List all available statistical themes (in formatted table)
pyptine list-commands themes

# List all indicators (with pagination support)
pyptine list-commands indicators --limit 50

# View available dimensions for an indicator
pyptine dimensions 0004167

# Clear the local cache
pyptine cache clear

CLI Features:

  • Rich Formatting - Tables, panels, and colored output for better readability
  • Progress Indicators - Spinners and progress bars for long-running operations
  • Error Handling - Centralized, user-friendly error messages with context
  • Better Organization - Data displayed in well-formatted tables rather than plain text
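The repeated `-d KEY=VALUE` options in the download example appear to express the same filter as the `dimensions` dictionary that `get_data()` accepts in the Python API. As an illustration only (this is not pyptine's CLI code), a hypothetical helper `parse_dimension_options` that turns such options into that dictionary could look like this:

```python
def parse_dimension_options(options: list[str]) -> dict[str, str]:
    """Turn ["Dim1=S7A2023", "Dim2=PT"] into {"Dim1": "S7A2023", "Dim2": "PT"}.

    Hypothetical helper for illustration; not part of pyptine.
    """
    dimensions: dict[str, str] = {}
    for option in options:
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = option.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected KEY=VALUE, got {option!r}")
        # A later option for the same key overwrites an earlier one.
        dimensions[key.strip()] = value.strip()
    return dimensions


if __name__ == "__main__":
    print(parse_dimension_options(["Dim1=S7A2023", "Dim2=PT"]))
    # {'Dim1': 'S7A2023', 'Dim2': 'PT'}
```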

Documentation

Initializing the Client

The INE class is the main entry point.

from pyptine import INE
from pathlib import Path

# Default client (language='EN', cache=True)
ine = INE()

# Client with Portuguese language
ine_pt = INE(language="PT")

# Disable caching
ine_no_cache = INE(cache=False)

# Use a custom cache directory
ine_custom_cache = INE(cache_dir=Path("/path/to/custom/cache"))
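How the disk cache works internally is not described here. As a minimal sketch of the general technique, assuming the `cache_ttl` parameter from the API reference is measured in seconds (so the default 86400 would be 24 hours), a disk cache can key each request by indicator code plus dimension filters and treat files older than the TTL as misses. `DiskTTLCache` and its methods are hypothetical names, not pyptine classes, and hashing the request parameters into a file name is just one common design, assumed here.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Optional


class DiskTTLCache:
    """Hypothetical disk cache: one JSON file per request, expired by file age."""

    def __init__(self, cache_dir: Path, ttl_seconds: int = 86400) -> None:
        self.cache_dir = cache_dir
        self.ttl_seconds = ttl_seconds
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, varcd: str, dimensions: Optional[dict] = None) -> Path:
        # sort_keys=True makes the key independent of the order filters are given in.
        key = json.dumps({"varcd": varcd, "dims": dimensions or {}}, sort_keys=True)
        return self.cache_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")

    def get(self, varcd: str, dimensions: Optional[dict] = None) -> Optional[Any]:
        path = self._path(varcd, dimensions)
        if not path.exists():
            return None  # never cached
        if time.time() - path.stat().st_mtime > self.ttl_seconds:
            return None  # older than the TTL: treat as a miss
        return json.loads(path.read_text(encoding="utf-8"))

    def set(self, varcd: str, payload: Any, dimensions: Optional[dict] = None) -> None:
        self._path(varcd, dimensions).write_text(json.dumps(payload), encoding="utf-8")

    def clear(self) -> None:
        for path in self.cache_dir.glob("*.json"):
            path.unlink()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = DiskTTLCache(Path(tmp), ttl_seconds=86400)
        # {"rows": 3} is a placeholder payload, not real INE data.
        cache.set("0004167", {"rows": 3}, {"Dim1": "S7A2023"})
        print(cache.get("0004167", {"Dim1": "S7A2023"}))  # {'rows': 3}
        print(cache.get("0004167"))  # None (no filters means a different key)
```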

Working with Indicators

Searching for Indicators

You can search for indicators by keyword and filter by theme or sub-theme.

# Basic search
results = ine.search("unemployment rate")

# Search within a specific theme
results = ine.search("employment", theme="Labour market")

Getting Indicator Metadata

Retrieve detailed information about an indicator, including its dimensions.

metadata = ine.get_metadata("0004167")
print(f"Title: {metadata.title}")
print(f"Unit: {metadata.unit}")
print(f"Source: {metadata.source}")

# List available dimensions
dimensions = ine.get_dimensions("0004167")
for dim in dimensions:
    print(f"\nDimension: {dim.name}")
    for value in dim.values[:5]:  # Show first 5 values
        print(f"- {value.code}: {value.label}")
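Because dimension values are identified by codes such as `S7A2023` and `PT`, it can be handy to turn the output of get_dimensions() into a code-to-label lookup for display. The sketch below uses small dataclasses as hypothetical stand-ins that mirror only the attributes used above (`name`, `values`, `code`, `label`); the real pyptine objects may carry more fields, and whether `dim.name` holds a key like "Dim1" or a descriptive title is not stated. The labels are the ones this README attaches to these codes (year 2023, Portugal).

```python
from dataclasses import dataclass, field


@dataclass
class DimensionValue:
    """Hypothetical stand-in for one entry of `dim.values` (has .code and .label)."""

    code: str
    label: str


@dataclass
class Dimension:
    """Hypothetical stand-in for one item returned by `get_dimensions()`."""

    name: str
    values: list[DimensionValue] = field(default_factory=list)


def build_label_lookup(dimensions: list[Dimension]) -> dict[str, dict[str, str]]:
    """Map each dimension name to a {code: label} dictionary."""
    return {dim.name: {v.code: v.label for v in dim.values} for dim in dimensions}


if __name__ == "__main__":
    dims = [
        Dimension("Dim1", [DimensionValue("S7A2023", "2023")]),
        Dimension("Dim2", [DimensionValue("PT", "Portugal")]),
    ]
    lookup = build_label_lookup(dims)
    print(lookup["Dim1"]["S7A2023"])  # 2023
    print(lookup["Dim2"].get("XX", "unknown code"))  # unknown code
```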

Fetching and Exporting Data

Getting Data

The get_data method returns a DataResponse object, which can be easily converted to different formats.

response = ine.get_data("0004127")

# Convert to pandas DataFrame
df = response.to_dataframe()

# Convert to a dictionary
data_dict = response.to_dict()

# Get data as a JSON string
json_str = response.to_json()

Filtering Data with Dimensions

Use the dimensions parameter to filter data before downloading.

# Get data for the year 2023 and region "Portugal"
# Note: Dimension values use specific codes (e.g., 'S7A2023' for year 2023)
filtered_response = ine.get_data(
    "0004167",
    dimensions={
        "Dim1": "S7A2023",  # Year 2023
        "Dim2": "PT"        # Geographic region 'Portugal'
    }
)
df_filtered = filtered_response.to_dataframe()
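It can help to see what such a filter amounts to as an HTTP request. The sketch below assumes INE's public JSON indicator endpoint (`https://www.ine.pt/ine/json_indicador/pindica.jsp` with `op=2`, `varcd`, `DimN` and `lang` query parameters). That URL layout is an assumption to check against INE's own API documentation; it is not taken from pyptine's source, and `build_data_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# ASSUMPTION: INE's public JSON data endpoint; verify against INE's API documentation.
INE_DATA_ENDPOINT = "https://www.ine.pt/ine/json_indicador/pindica.jsp"


def build_data_url(varcd: str, dimensions: dict[str, str], language: str = "EN") -> str:
    """Hypothetical helper: encode an indicator code plus DimN filters as a query string."""
    params = {"op": "2", "varcd": varcd}
    params.update(dimensions)  # e.g. Dim1=S7A2023, Dim2=PT, in the order given
    params["lang"] = language
    return f"{INE_DATA_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_data_url("0004167", {"Dim1": "S7A2023", "Dim2": "PT"}))
    # https://www.ine.pt/ine/json_indicador/pindica.jsp?op=2&varcd=0004167&Dim1=S7A2023&Dim2=PT&lang=EN
```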

Exporting Data

You can export data directly to CSV or JSON files.

# Export to CSV
ine.export_csv("0004127", "output.csv")

# Export to JSON with pretty printing
ine.export_json("0004127", "output.json", pretty=True)

# Export filtered data
ine.export_csv(
    "0004167",
    "filtered_output.csv",
    dimensions={"Dim1": "S7A2023"}
)

Working with Large Datasets

For large datasets that exceed the default 40,000 data point limit, use the get_all_data() method, which handles pagination automatically. The example below calls it on a DataClient:

from pyptine.client.data import DataClient

client = DataClient(language="EN")

# Fetch data in chunks (default chunk_size=40,000)
for chunk in client.get_all_data("0004127"):
    df = chunk.to_dataframe()
    print(f"Processed {len(df)} rows")
    # Process each chunk

# Custom chunk size
for chunk in client.get_all_data("0004127", chunk_size=5000):
    # Process smaller chunks
    pass

# Combine all chunks into a single dataset
all_chunks = list(client.get_all_data("0004127"))
all_data = [point for chunk in all_chunks for point in chunk.data]
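The chunking that get_all_data() performs can be pictured as a generator that requests one page at a time and stops when a page comes back short. This is a sketch of the general pattern, not pyptine's implementation: `fetch_page(offset, limit)` is a hypothetical callable standing in for a single API request, and offset-based paging is an assumption.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(
    fetch_page: Callable[[int, int], List[T]], chunk_size: int = 40_000
) -> Iterator[List[T]]:
    """Yield successive pages from `fetch_page(offset, limit)` until exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if page:
            yield page
        # A short (or empty) page means there is nothing left to request.
        if len(page) < chunk_size:
            return
        offset += chunk_size


if __name__ == "__main__":
    # Hypothetical in-memory "API" holding 12 data points.
    source = list(range(12))

    def fetch_page(offset: int, limit: int) -> List[int]:
        return source[offset : offset + limit]

    print([len(chunk) for chunk in iter_chunks(fetch_page, chunk_size=5)])  # [5, 5, 2]
```

Because this is a generator, only one page is held in memory at a time unless you collect the chunks, as `list(client.get_all_data("0004127"))` does in the example above.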

Visualizing Data

Create interactive visualizations directly from a DataResponse, without first converting it to a DataFrame:

# Get data and create interactive line chart
response = ine.get_data("0004127")
fig = response.plot(chart_type="line")
fig.show()

# Different chart types
fig_bar = response.plot_bar()
fig_area = response.plot_area()
fig_scatter = response.plot_scatter()

# Customize visualization
fig = response.plot_line(
    markers=True,
    x_column="Period",
    y_column="value"
)

# Color by dimensions (if data has dimension columns)
fig = response.plot_line(color_column="region")

# Further customization with plotly
fig.update_layout(height=600, width=1200, title="Custom Title")

# Save to HTML for sharing (after customizing, so the file includes the changes)
fig.write_html("indicator_plot.html")
fig.show()

Available Visualization Methods:

  • plot(chart_type) - Generic plot with selectable chart type
  • plot_line() - Interactive line chart with optional markers
  • plot_bar() - Bar chart for categorical comparison
  • plot_area() - Stacked area chart for trends
  • plot_scatter() - Scatter plot with optional size and color dimensions

All methods support:

  • Interactive plotly charts with hover, zoom, and pan
  • Custom column selection for x/y axes
  • Color coding by dimension columns
  • Export to HTML, PNG, or other formats (static image formats such as PNG require plotly's kaleido package)

Advanced Data Analysis

Perform statistical calculations on indicator data directly within the library:

# Get data and calculate year-over-year growth
response = ine.get_data("0004127")
yoy_response = response.calculate_yoy_growth()
df_yoy = yoy_response.to_dataframe()
print(df_yoy[['Period', 'value', 'yoy_growth']])

# Calculate month-over-month changes
mom_response = response.calculate_mom_change()
df_mom = mom_response.to_dataframe()

# Calculate simple moving average (3-period)
ma_response = response.calculate_moving_average(window=3)
df_ma = ma_response.to_dataframe()

# Calculate exponential moving average
ema_response = response.calculate_exponential_moving_average(span=5)
df_ema = ema_response.to_dataframe()

# Chain multiple analyses
result = response.calculate_yoy_growth().calculate_moving_average(window=2)
df = result.to_dataframe()
print(df[['Period', 'value', 'yoy_growth', 'moving_avg']])

Available analysis methods on DataResponse:

  • calculate_yoy_growth() - Year-over-year percentage change
  • calculate_mom_change() - Month-over-month percentage change
  • calculate_moving_average(window) - Simple moving average
  • calculate_exponential_moving_average(span) - Exponential weighted moving average

All methods support custom value_column and period_column parameters to work with different data structures.
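To sanity-check results, the four calculations can be written out in plain Python. These are textbook definitions, not pyptine's code: a percentage change with a configurable `lag` (lag 1 on an annual series or lag 12 on a monthly series for year-over-year, lag 1 on a monthly series for month-over-month), a trailing simple moving average, and an exponential moving average with smoothing factor α = 2 / (span + 1), seeded with the first value as in pandas `ewm(span=..., adjust=False)`. Whether pyptine uses the same lag rules or EMA convention is an assumption.

```python
from typing import List, Optional, Sequence


def pct_change(values: Sequence[float], lag: int = 1) -> List[Optional[float]]:
    """Percentage change versus the value `lag` periods earlier.

    Entries with no earlier value (or an earlier value of 0) are None.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    out: List[Optional[float]] = [None] * min(lag, len(values))
    for i in range(lag, len(values)):
        prev = values[i - lag]
        out.append(None if prev == 0 else (values[i] - prev) / prev * 100)
    return out


def moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; the first window - 1 entries are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def exponential_moving_average(values: Sequence[float], span: int) -> List[float]:
    """EMA seeded with the first value, alpha = 2 / (span + 1) (pandas adjust=False style)."""
    if span < 1:
        raise ValueError("span must be at least 1")
    alpha = 2 / (span + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0]
    print(pct_change(series, lag=1))  # use lag=12 for YoY on monthly data
    print(moving_average(series, window=2))  # [None, 1.5, 2.5, 3.5]
    print(exponential_moving_average(series, span=3))  # [1.0, 1.5, 2.25, 3.125]
```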

API Reference

INE Class

The main class for interacting with the INE API.

INE(language: str = "EN", cache: bool = True, cache_dir: Optional[Path] = None, cache_ttl: int = 86400)

Method                      Description
search(query, ...)          Search for indicators.
get_data(varcd, ...)        Get data for an indicator as a DataResponse object.
get_metadata(varcd)         Get detailed metadata for an indicator.
get_dimensions(varcd)       Get available dimensions for an indicator.
get_indicator(varcd)        Get catalogue information for a single indicator.
validate_indicator(varcd)   Check if an indicator code is valid.
list_themes()               Get a list of all available themes.
export_csv(varcd, ...)      Export indicator data to a CSV file.
export_json(varcd, ...)     Export indicator data to a JSON file.
clear_cache()               Clear all cached data.
get_cache_info()            Get statistics about the cache.

Development

Setup

To set up the development environment:

# Clone the repository
git clone https://github.com/nigelrandsley/pyptine.git
cd pyptine

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks to ensure code quality
pre-commit install

Running Tests

# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/pyptine --cov-report=term-missing

Code Quality

This project uses black for formatting, ruff for linting, and mypy for type checking.

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/amazing-feature).
  3. Commit your changes (git commit -m 'Add amazing feature').
  4. Push to the branch (git push origin feature/amazing-feature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

pyptine-0.3.0.tar.gz (55.0 kB)

Uploaded Source

Built Distribution

pyptine-0.3.0-py3-none-any.whl (63.5 kB)

Uploaded Python 3

File details

Details for the file pyptine-0.3.0.tar.gz.

File metadata

  • Download URL: pyptine-0.3.0.tar.gz
  • Size: 55.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pyptine-0.3.0.tar.gz
Algorithm Hash digest
SHA256 c06383626dd3ea5447dea84e0ffe173e5d14ba125e2de540c59ba6222c4ea07c
MD5 65c3af4a15261b9471804ddbf9faceda
BLAKE2b-256 6712feb6a50ef0a36caeef7307490e36f1cef472558913b9ed1cd64d8d802aef

File details

Details for the file pyptine-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: pyptine-0.3.0-py3-none-any.whl
  • Size: 63.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pyptine-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 29904bd0cce1594277f16a622d3b900b4d5e8617cc703f6229e389a92c809c51
MD5 24fc437281fde36eb9a35dfacaeafe0c
BLAKE2b-256 c587d70409e86bb3e578606e69ce4d1fd354211780bff03b1d0dc412732345c8
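To check a downloaded file against the SHA256 digests above, hash it locally and compare. A minimal standard-library sketch follows (`sha256_of` and `matches` are illustrative helper names). pip can also enforce digests on its own through its hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pyptine-0.3.0.tar.gz.
EXPECTED_SDIST_SHA256 = "c06383626dd3ea5447dea84e0ffe173e5d14ba125e2de540c59ba6222c4ea07c"


def sha256_of(path: Path, block_size: int = 1 << 16) -> str:
    """Hash a file in blocks so a large download never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lower-case; normalise the expected value before comparing.
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("pyptine-0.3.0.tar.gz")
    if sdist.is_file():
        print("OK" if matches(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found in the current directory")
```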
