High-level Python client for INE Portugal (Statistics Portugal) API

pyptine - INE Portugal API Client

Python 3.8+ · License: MIT

High-level Python client for Statistics Portugal (INE) API. Query and download statistical data from INE Portugal with a simple, intuitive interface.

Features

  • 🎯 High-level Convenience API: Simple interface for common data retrieval and analysis tasks.
  • 📊 Multiple Output Formats: Export data to pandas DataFrames, JSON, or CSV with ease.
  • 💾 Smart Caching: Disk-based caching reduces redundant API calls, speeding up repeated queries.
  • 🔍 Metadata Browsing: Search and discover indicators, themes, and dimensions.
  • 🖥️ Command-Line Interface: A powerful CLI for quick data access and scripting.
  • 📖 Modern Python: Fully type-annotated for better developer experience and IDE support.
  • Well-Tested: Test suite with 73% code coverage.
  • 🔄 API Compatible: Supports both old and new INE API response formats seamlessly.

Installation

pip install pyptine

For development, install with all extra dependencies:

pip install "pyptine[dev,docs]"

Quick Start

from pyptine import INE

# Initialize the client
ine = INE(language="EN")

# 1. Search for an indicator
print("Searching for 'gdp' indicators...")
results = ine.search("gdp")
for indicator in results[:5]:  # Print top 5 results
    print(f"- {indicator.varcd}: {indicator.title}")

# 2. Get data for a specific indicator
varcd = "0004167"  # Resident population
print(f"\nFetching data for indicator {varcd}...")
response = ine.get_data(varcd)

# 3. Convert to a pandas DataFrame
df = response.to_dataframe()
print("\nData as DataFrame:")
print(df.head())

# 4. Export data to a CSV file
output_file = "population_data.csv"
print(f"\nExporting data to {output_file}...")
ine.export_csv(varcd, output_file)
print("Done!")

Command-Line Usage

The pyptine CLI provides a convenient way to access INE data from your terminal, with rich formatting and progress indicators for a better user experience.

# Search for indicators related to "pib" (GDP in Portuguese)
pyptine search "pib"

# Get detailed information about a specific indicator
pyptine info 0004127

# Download data for an indicator to a CSV file (with progress bar)
pyptine download 0004127 --output data.csv

# Download data and filter by dimensions
pyptine download 0004167 --output filtered_data.csv -d Dim1=S7A2023 -d Dim2=PT

# List all available statistical themes (in formatted table)
pyptine list-commands themes

# List all indicators (with pagination support)
pyptine list-commands indicators --limit 50

# View available dimensions for an indicator
pyptine dimensions 0004167

# Clear the local cache
pyptine cache clear

CLI Features:

  • Rich Formatting - Tables, panels, and colored output for better readability
  • Progress Indicators - Spinners and progress bars for long-running operations
  • Error Handling - Centralized, user-friendly error messages with context
  • Better Organization - Data displayed in well-formatted tables rather than plain text

Documentation

Initializing the Client

The INE class is the main entry point.

from pyptine import INE
from pathlib import Path

# Default client (language="EN", cache=True)
ine = INE()

# Client with Portuguese language
ine_pt = INE(language="PT")

# Disable caching
ine_no_cache = INE(cache=False)

# Use a custom cache directory
ine_custom_cache = INE(cache_dir=Path("/path/to/custom/cache"))
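The idea behind a disk cache with a time-to-live can be pictured with a small, generic sketch. This is not pyptine's implementation: the class name TTLDiskCache, the one-JSON-file-per-key layout, and the key hashing are all assumptions made for illustration. The ttl argument plays the same role as the cache_ttl parameter (86400 seconds by default) listed in the constructor signature under API Reference.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Optional


class TTLDiskCache:
    """Hypothetical disk cache: one JSON file per key, stale after `ttl` seconds."""

    def __init__(self, cache_dir: Path, ttl: int = 86400) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl

    def _path(self, key: str) -> Path:
        # Hash the key so that any request string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def set(self, key: str, value: Any) -> None:
        # Store the value together with the time it was written.
        payload = {"stored_at": time.time(), "value": value}
        self._path(key).write_text(json.dumps(payload), encoding="utf-8")

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        payload = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - payload["stored_at"] > self.ttl:
            path.unlink()  # Entry is stale: drop it and report a miss.
            return None
        return payload["value"]
```

A repeated query within the TTL window is answered from disk, which is where the saving in API calls would come from.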

Working with Indicators

Searching for Indicators

You can search for indicators by keyword and filter by theme or sub-theme.

# Basic search
results = ine.search("unemployment rate")

# Search within a specific theme
results = ine.search("employment", theme="Labour market")

Getting Indicator Metadata

Retrieve detailed information about an indicator, including its dimensions.

metadata = ine.get_metadata("0004167")
print(f"Title: {metadata.title}")
print(f"Unit: {metadata.unit}")
print(f"Source: {metadata.source}")

# List available dimensions
dimensions = ine.get_dimensions("0004167")
for dim in dimensions:
    print(f"\nDimension: {dim.name}")
    for value in dim.values[:5]:  # Show first 5 values
        print(f"- {value.code}: {value.label}")
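Because dimension filters are expressed as codes rather than labels, it can help to turn the dimension list into a lookup table first. The helper below is a sketch, not part of pyptine: build_code_lookup is a hypothetical name, and it relies only on the attributes used in the loop above (name, values, code, label).

```python
from typing import Any, Dict, Iterable


def build_code_lookup(dimensions: Iterable[Any]) -> Dict[str, Dict[str, str]]:
    """Map each dimension name to a {code: label} dictionary.

    Assumes every dimension has `.name` and `.values`, and every value has
    `.code` and `.label`, as in the loop shown above.
    """
    lookup: Dict[str, Dict[str, str]] = {}
    for dim in dimensions:
        lookup[dim.name] = {value.code: value.label for value in dim.values}
    return lookup
```

Calling build_code_lookup(ine.get_dimensions("0004167")) would then give one dictionary per dimension, which is convenient for checking that a code exists before using it as a filter.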

Fetching and Exporting Data

Getting Data

The get_data method returns a DataResponse object, which can be easily converted to different formats.

response = ine.get_data("0004127")

# Convert to pandas DataFrame
df = response.to_dataframe()

# Convert to a dictionary
data_dict = response.to_dict()

# Get data as a JSON string
json_str = response.to_json()

Filtering Data with Dimensions

Use the dimensions parameter to filter data before downloading.

# Get data for the year 2023 and region "Portugal"
# Note: Dimension values use specific codes (e.g., 'S7A2023' for year 2023)
filtered_response = ine.get_data(
    "0004167",
    dimensions={
        "Dim1": "S7A2023",  # Year 2023
        "Dim2": "PT"        # Geographic region 'Portugal'
    }
)
df_filtered = filtered_response.to_dataframe()
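The dimensions dictionary carries the same information as the repeated -d Dim1=S7A2023 flags in the CLI example. A minimal sketch of that correspondence follows; parse_dimension_args is a hypothetical helper written for illustration and is not taken from pyptine's CLI code.

```python
from typing import Dict, Iterable


def parse_dimension_args(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["Dim1=S7A2023", "Dim2=PT"] into {"Dim1": "S7A2023", "Dim2": "PT"}."""
    dimensions: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"Expected KEY=VALUE, got {pair!r}")
        dimensions[key] = value
    return dimensions
```

The resulting dictionary has the shape expected by the dimensions parameter shown above.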

Exporting Data

You can export data directly to CSV or JSON files.

# Export to CSV
ine.export_csv("0004127", "output.csv")

# Export to JSON with pretty printing
ine.export_json("0004127", "output.json", pretty=True)

# Export filtered data
ine.export_csv(
    "0004167",
    "filtered_output.csv",
    dimensions={"Dim1": "S7A2023"}
)

Working with Large Datasets

For datasets that exceed the default limit of 40,000 data points, use the get_all_data() method, which handles pagination automatically:

from pyptine.client.data import DataClient

client = DataClient(language="EN")

# Fetch data in chunks (default chunk_size=40,000)
for chunk in client.get_all_data("0004127"):
    df = chunk.to_dataframe()
    print(f"Processed {len(df)} rows")
    # Process each chunk

# Custom chunk size
for chunk in client.get_all_data("0004127", chunk_size=5000):
    # Process smaller chunks
    pass

# Combine all chunks into a single dataset
all_chunks = list(client.get_all_data("0004127"))
all_data = [point for chunk in all_chunks for point in chunk.data]
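The pagination described here can be pictured as a generator that requests fixed-size pages until one comes back short or empty. The sketch below is generic and is not pyptine's implementation: iter_chunks and the fetch_page(offset, limit) callable are hypothetical stand-ins, since this README does not show how the underlying requests are made.

```python
from typing import Callable, Iterator, List, Sequence


def iter_chunks(
    fetch_page: Callable[[int, int], Sequence[dict]],
    chunk_size: int = 40_000,
) -> Iterator[List[dict]]:
    """Yield pages of at most `chunk_size` items until the source is exhausted.

    `fetch_page(offset, limit)` is a hypothetical callable standing in for
    a single API request.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        page = list(fetch_page(offset, chunk_size))
        if not page:
            break
        yield page
        if len(page) < chunk_size:
            break  # A short page means there is nothing left to fetch.
        offset += len(page)


# Usage with an in-memory list standing in for the remote API.
rows = [{"value": i} for i in range(12)]
chunks = list(iter_chunks(lambda offset, limit: rows[offset:offset + limit], chunk_size=5))
print([len(chunk) for chunk in chunks])  # [5, 5, 2]
```

Yielding one page at a time keeps memory use bounded by chunk_size, which is the reason to prefer the loop form over collecting every chunk into a list when the dataset is large.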

Visualizing Data

Create interactive visualizations directly from indicator data, without first converting it to a DataFrame:

# Get data and create interactive line chart
response = ine.get_data("0004127")
fig = response.plot(chart_type="line")
fig.show()

# Different chart types
fig_bar = response.plot_bar()
fig_area = response.plot_area()
fig_scatter = response.plot_scatter()

# Customize visualization
fig = response.plot_line(
    markers=True,
    x_column="Period",
    y_column="value"
)

# Color by dimensions (if data has dimension columns)
fig = response.plot_line(color_column="region")

# Save to HTML for sharing
fig.write_html("indicator_plot.html")

# Further customization with plotly
fig.update_layout(height=600, width=1200, title="Custom Title")
fig.show()

Available Visualization Methods:

  • plot(chart_type) - Generic plot with selectable chart type
  • plot_line() - Interactive line chart with optional markers
  • plot_bar() - Bar chart for categorical comparison
  • plot_area() - Stacked area chart for trends
  • plot_scatter() - Scatter plot with optional size and color dimensions

All methods support:

  • Interactive plotly charts with hover, zoom, and pan
  • Custom column selection for x/y axes
  • Color coding by dimension columns
  • Export to HTML, PNG, or other formats

Advanced Data Analysis

Perform statistical calculations on indicator data directly within the library:

# Get data and calculate year-over-year growth
response = ine.get_data("0004127")
yoy_response = response.calculate_yoy_growth()
df_yoy = yoy_response.to_dataframe()
print(df_yoy[['Period', 'value', 'yoy_growth']])

# Calculate month-over-month changes
mom_response = response.calculate_mom_change()
df_mom = mom_response.to_dataframe()

# Calculate simple moving average (3-period)
ma_response = response.calculate_moving_average(window=3)
df_ma = ma_response.to_dataframe()

# Calculate exponential moving average
ema_response = response.calculate_exponential_moving_average(span=5)
df_ema = ema_response.to_dataframe()

# Chain multiple analyses
result = response.calculate_yoy_growth().calculate_moving_average(window=2)
df = result.to_dataframe()
print(df[['Period', 'value', 'yoy_growth', 'moving_avg']])

Available analysis methods on DataResponse:

  • calculate_yoy_growth() - Year-over-year percentage change
  • calculate_mom_change() - Month-over-month percentage change
  • calculate_moving_average(window) - Simple moving average
  • calculate_exponential_moving_average(span) - Exponential weighted moving average

All methods support custom value_column and period_column parameters to work with different data structures.
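To make these definitions concrete, here is a pure-Python sketch of the arithmetic behind them. It is not the library's code, and the function names are hypothetical. Two assumptions are worth stating: values are in chronological order, so a lag of 1 on annual data (or 12 on monthly data) gives a year-over-year change, and the exponential average uses the recursive form with alpha = 2 / (span + 1) seeded with the first value, which may differ from how pyptine weights or initializes it.

```python
from typing import List, Optional


def pct_change(values: List[float], lag: int = 1) -> List[Optional[float]]:
    """Percentage change versus the value `lag` periods earlier."""
    out: List[Optional[float]] = []
    for i, current in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)  # No earlier value, or division by zero.
        else:
            previous = values[i - lag]
            out.append((current - previous) / previous * 100.0)
    return out


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average over the last `window` values."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # Not enough history yet.
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def exponential_moving_average(values: List[float], span: int) -> List[float]:
    """Recursive EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for value in values:
        out.append(value if not out else alpha * value + (1 - alpha) * out[-1])
    return out
```

For example, moving_average([1.0, 2.0, 3.0, 4.0], window=2) leaves the first position empty and then averages each adjacent pair, while pct_change with lag=1 on monthly data corresponds to a month-over-month change.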

API Reference

INE Class

The main class for interacting with the INE API.

INE(language: str = "EN", cache: bool = True, cache_dir: Optional[Path] = None, cache_ttl: int = 86400)

Method                       Description
search(query, ...)           Search for indicators.
get_data(varcd, ...)         Get data for an indicator as a DataResponse object.
get_metadata(varcd)          Get detailed metadata for an indicator.
get_dimensions(varcd)        Get available dimensions for an indicator.
get_indicator(varcd)         Get catalogue information for a single indicator.
validate_indicator(varcd)    Check if an indicator code is valid.
list_themes()                Get a list of all available themes.
export_csv(varcd, ...)       Export indicator data to a CSV file.
export_json(varcd, ...)      Export indicator data to a JSON file.
clear_cache()                Clear all cached data.
get_cache_info()             Get statistics about the cache.
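Several of these methods combine naturally. As a sketch, a code can be checked with validate_indicator before any data is requested. The helper name fetch_if_valid is hypothetical, and the code assumes that validate_indicator returns a truthy value for valid codes, which this README does not state explicitly.

```python
from typing import Any, Optional


def fetch_if_valid(client: Any, varcd: str) -> Optional[Any]:
    """Return client.get_data(varcd), or None when the code fails validation.

    Assumes client.validate_indicator() returns a truthy value for valid
    indicator codes.
    """
    if not client.validate_indicator(varcd):
        return None
    return client.get_data(varcd)
```

With a real client this would be called as fetch_if_valid(ine, "0004167"), which avoids requesting data for a mistyped code.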

Development

Setup

To set up the development environment:

# Clone the repository
git clone https://github.com/nigelrandsley/pyptine.git
cd pyptine

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks to ensure code quality
pre-commit install

Running Tests

# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/pyptine --cov-report=term-missing

Code Quality

This project uses black for formatting, ruff for linting, and mypy for type checking.

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/amazing-feature).
  3. Commit your changes (git commit -m 'Add amazing feature').
  4. Push to the branch (git push origin feature/amazing-feature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
