SEDIA API Fetchers

Python classes for fetching data from the European Commission's SEDIA API endpoints.

Overview

This package includes 5 specialized fetchers for different types of EU data:

Fetcher	Purpose	API Key	Data Type
`SEDIA_GET_PROJECTS`	EU funded projects	`SEDIA_NONH2020_PROD`	Project details, metadata, participants
`SEDIA_GET_PARTICIPANTS`	Organizations & persons	`SEDIA_PERSON`	Participant profiles, collaborations
`SEDIA_GET_FUNDING_TENDERS`	Calls & tenders	`SEDIA`	Grant opportunities, tender notices
`SEDIA_GET_TOPICS`	Topic details	`SEDIA`	Research topic specifications
`SEDIA_GET_FAQ`	FAQ system	`SEDIA_FAQ`	Frequently asked questions

Installation & Setup

Prerequisites

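pip install requests pandas numpy tqdm urllib3

pathlib is part of the Python standard library. Installing the `pathlib` package from PyPI pulls in an obsolete backport and is not needed.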

Directory Structure

src/EUFT_retrieve/
├── EUFT_retrieve_projects.py          # Projects fetcher
├── EUFT_retrieve_participants.py      # Participants fetcher  
├── EUFT_retrieve_funding_tenders.py   # Funding & tenders fetcher
├── EUFT_retrieve_topics.py            # Topics fetcher
├── EUFT_retrieve_faq.py               # FAQ fetcher
├── demo_all_fetchers.py               # Comprehensive demo
├── helpers/
│   └── functions.py                   # Utility functions
└── README.md                          # This file

Architecture

Base Classes

The fetchers use an inheritance-based architecture:

SEDIABaseFetcher (Abstract): Common functionality for all fetchers
SEDIAPaginatedFetcher: For POST-based endpoints with pagination
SEDIASimpleFetcher: For GET-based endpoints
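The package source isn't reproduced in this README, so the following is only a minimal sketch of how such a hierarchy could be laid out with Python's `abc` module. The three class names come from the list above. The hook methods `_fetch`, `_post_page` and `_get_one`, the "stop on a short page" rule and the `DemoPaginatedFetcher` toy subclass are assumptions made for illustration. They are not the package's actual internals.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Record = Dict[str, Any]


class SEDIABaseFetcher(ABC):
    """Shared options and the public get() entry point (hypothetical layout)."""

    def __init__(self, flatten_metadata: bool = True) -> None:
        self.flatten_metadata = flatten_metadata

    def get(self, programmes: Any, **kwargs: Any) -> List[Record]:
        # Every fetcher exposes the same get(); subclasses decide how to fetch.
        return self._fetch(programmes, **kwargs)

    @abstractmethod
    def _fetch(self, programmes: Any, **kwargs: Any) -> List[Record]:
        ...


class SEDIAPaginatedFetcher(SEDIABaseFetcher):
    """POST-style search endpoints: request pages until a short page arrives."""

    page_size = 100

    def _fetch(self, programmes: Any, **kwargs: Any) -> List[Record]:
        records: List[Record] = []
        page = 1
        while True:
            batch = self._post_page(programmes, page, **kwargs)
            records.extend(batch)
            if len(batch) < self.page_size:  # last (possibly empty) page
                return records
            page += 1

    @abstractmethod
    def _post_page(self, programmes: Any, page: int, **kwargs: Any) -> List[Record]:
        ...


class SEDIASimpleFetcher(SEDIABaseFetcher):
    """GET-style endpoints: one request per requested item."""

    def _fetch(self, items: Any, **kwargs: Any) -> List[Record]:
        if isinstance(items, (str, int)):
            items = [items]
        return [self._get_one(item, **kwargs) for item in items]

    @abstractmethod
    def _get_one(self, item: Any, **kwargs: Any) -> Record:
        ...


# Usage: a toy subclass that serves five in-memory records two at a time.
class DemoPaginatedFetcher(SEDIAPaginatedFetcher):
    page_size = 2

    def _post_page(self, programmes: Any, page: int, **kwargs: Any) -> List[Record]:
        data = [{"id": i} for i in range(5)]
        start = (page - 1) * self.page_size
        return data[start:start + self.page_size]


print(len(DemoPaginatedFetcher().get("edf")))  # 5
```

Keeping `get()` in the abstract base means callers see one signature, while pagination (POST) and per-item lookups (GET) live in the two intermediate classes.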

Common Features

All fetchers inherit these capabilities:

Flexible Programme Input

# Single programme by name
data = fetcher.get('h2020')

# Single programme by ID  
data = fetcher.get(31045243)

# Multiple programmes
data = fetcher.get(['h2020', 'horizon'])

# Mixed input
data = fetcher.get(['h2020', 43108390])
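One plausible way to accept a name, an ID, or a mix of both is to normalise every input to a numeric programme ID using the mapping in the Programme IDs Reference table. `normalize_programmes` is a hypothetical helper, not part of the package. It raises `ValueError` for unknown names, which mirrors the exception caught in the Error Handling example.

```python
from typing import Iterable, List, Union

# Name -> ID mapping copied from the "Programme IDs Reference" table.
PROGRAMME_IDS = {
    "h2020": 31045243,
    "horizon": 43108390,
    "digital": 43152860,
    "edf": 44181033,
}

ProgrammeInput = Union[str, int, Iterable[Union[str, int]]]


def normalize_programmes(programmes: ProgrammeInput) -> List[int]:
    """Turn a name, an ID, or a mixed list of both into a list of numeric IDs."""
    if isinstance(programmes, (str, int)):
        programmes = [programmes]  # wrap a single value so we can iterate
    ids: List[int] = []
    for item in programmes:
        if isinstance(item, int):
            ids.append(item)
        elif isinstance(item, str) and item.isdigit():
            ids.append(int(item))  # numeric ID passed as a string
        elif isinstance(item, str) and item.lower() in PROGRAMME_IDS:
            ids.append(PROGRAMME_IDS[item.lower()])
        else:
            raise ValueError(f"Unknown programme: {item!r}")
    return ids


print(normalize_programmes(["h2020", 43108390]))  # [31045243, 43108390]
```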

Configuration Options

fetcher = SEDIA_GET_PROJECTS(
    flatten_metadata=True,      # Flatten nested JSON structures
    enrich_with_details=False   # Fetch detailed info (projects only)
)

Data Management

Automatic timestamping: output files are named with a YYYYMMDD_HHMMSS timestamp
Progress tracking: real-time progress bars
Error handling: failed requests are retried automatically
Memory efficient: large datasets are processed in chunks

Consistent API Pattern

# Basic usage
data = fetcher.get(programmes, save=True)

# Advanced usage with filters
data = fetcher.get(
    programmes=['h2020', 'horizon'],
    additional_filters='value',  # placeholder: any extra keyword is passed on as an API query parameter
    save=True
)

Detailed Usage Guide

1. Projects Fetcher (`SEDIA_GET_PROJECTS`)

Fetches project data with optional enrichment.

from EUFT_retrieve_projects import SEDIA_GET_PROJECTS

# Basic usage
fetcher = SEDIA_GET_PROJECTS(flatten_metadata=True)
data = fetcher.get('edf', save=True)

# With detailed project enrichment
fetcher = SEDIA_GET_PROJECTS(
    flatten_metadata=True,
    enrich_with_details=True  # Fetches detailed project info
)
data = fetcher.get(['h2020', 'horizon'], save=True)

Features:

Handles more than 10,000 records by partitioning the query into date ranges
Optional project detail enrichment
Metadata flattening
Automatic duplicate handling

Architecture: Inherits from SEDIAPaginatedFetcher
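The README does not say which date field is partitioned or how the ranges are split. One minimal sketch assumes a callable that returns the number of hits for a date range. The range is bisected recursively until every piece fits under the 10,000-result cap, and each piece is then fetched separately. `partition_date_range`, `count_records` and the one-record-per-day counter are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

API_RESULT_LIMIT = 10_000  # per-query result cap mentioned in this README

DateRange = Tuple[date, date]


def partition_date_range(
    start: date,
    end: date,
    count_records: Callable[[date, date], int],
    limit: int = API_RESULT_LIMIT,
) -> List[DateRange]:
    """Bisect [start, end] (inclusive) until each sub-range has <= limit records."""
    if start == end or count_records(start, end) <= limit:
        # Either small enough, or a single day that cannot be split further.
        return [(start, end)]
    midpoint = start + (end - start) // 2
    left = partition_date_range(start, midpoint, count_records, limit)
    right = partition_date_range(midpoint + timedelta(days=1), end, count_records, limit)
    return left + right


# Usage with a fake counter: one record per day, limit of 10 per query.
def one_per_day(s: date, e: date) -> int:
    return (e - s).days + 1


ranges = partition_date_range(date(2020, 1, 1), date(2020, 1, 31), one_per_day, limit=10)
for s, e in ranges:
    print(s, e, one_per_day(s, e))
```

The sub-ranges are inclusive and non-overlapping, so each record falls into exactly one query.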

2. Participants Fetcher (`SEDIA_GET_PARTICIPANTS`)

Fetches organization and person data from EU programmes.

from EUFT_retrieve_participants import SEDIA_GET_PARTICIPANTS

fetcher = SEDIA_GET_PARTICIPANTS(flatten_metadata=True)

# Fetch all participants for EDF programme
data = fetcher.get('edf', save=True)

# Multiple programmes
data = fetcher.get(['h2020', 'horizon'], save=True)

Features:

Fetches ORGANISATION and PERSON types
Participant metadata flattening
Programme-specific filtering
Collaboration network data

Architecture: Inherits from SEDIAPaginatedFetcher

3. Funding & Tenders Fetcher (`SEDIA_GET_FUNDING_TENDERS`)

Fetches grant opportunities and tender notices.

from EUFT_retrieve_funding_tenders import SEDIA_GET_FUNDING_TENDERS

fetcher = SEDIA_GET_FUNDING_TENDERS(flatten_metadata=True)

# Open grants for Horizon Europe
data = fetcher.get(
    programmes='horizon',
    funding_type='grants',    # 'grants', 'tenders', 'all'
    status='open',           # 'open', 'closed', 'all'
    save=True
)

# All tenders regardless of programme
data = fetcher.get(
    programmes=None,         # All programmes
    funding_type='tenders',
    status='all',
    save=True
)

# With additional filters
data = fetcher.get(
    programmes='h2020',
    programmePeriod='2014 - 2020',
    crossCuttingPriorities=['OCEAN'],
    save=True
)

Available Options:

Funding types: grants, tenders, all
Status: open, closed, all
Additional filters: Any valid API parameter

Architecture: Inherits from SEDIAPaginatedFetcher

4. Topics Fetcher (`SEDIA_GET_TOPICS`)

Fetches detailed information about specific research topics.

from EUFT_retrieve_topics import SEDIA_GET_TOPICS

fetcher = SEDIA_GET_TOPICS(flatten_metadata=True)

# Single topic
data = fetcher.get('HORIZON-CL3-2022-BM-01-01', save=True)

# Multiple topics
topics = [
    'HORIZON-CL3-2022-BM-01-01',
    'HORIZON-CL4-2022-RESILIENCE-01-08'
]
data = fetcher.get(topics, save=True)

Features:

Topic-specific detailed information
Batch processing for multiple topics
Missing topic tracking
Research area categorization

Architecture: Inherits from SEDIASimpleFetcher (uses GET requests)
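Batch processing with missing-topic tracking could work roughly as follows. Each ID is requested once, and IDs that come back empty are collected separately. `fetch_topics` and `fetch_one` are hypothetical names. The dictionary stands in for the real topic-details request, whose handling of unknown topics isn't documented here.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def fetch_topics(
    topic_ids: Iterable[str],
    fetch_one: Callable[[str], Optional[Dict]],
) -> Tuple[List[Dict], List[str]]:
    """Fetch each topic once; return (found records, IDs that returned nothing)."""
    found: List[Dict] = []
    missing: List[str] = []
    for topic_id in dict.fromkeys(topic_ids):  # de-duplicate, keep input order
        record = fetch_one(topic_id)
        if record is None:
            missing.append(topic_id)
        else:
            found.append(record)
    return found, missing


# Usage with an in-memory stand-in for the topic-details endpoint.
fake_api = {"HORIZON-CL3-2022-BM-01-01": {"identifier": "HORIZON-CL3-2022-BM-01-01"}}
found, missing = fetch_topics(
    [
        "HORIZON-CL3-2022-BM-01-01",
        "HORIZON-CL4-2022-RESILIENCE-01-08",
        "HORIZON-CL3-2022-BM-01-01",
    ],
    fake_api.get,  # dict.get returns None for unknown topics
)
print(len(found), missing)  # 1 ['HORIZON-CL4-2022-RESILIENCE-01-08']
```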

5. FAQ Fetcher (`SEDIA_GET_FAQ`)

Fetches FAQ index and detailed FAQ content.

from EUFT_retrieve_faq import SEDIA_GET_FAQ

fetcher = SEDIA_GET_FAQ(flatten_metadata=True)

# FAQ index for specific programme
data = fetcher.get(
    programmes='h2020',
    faq_type='all',          # 'active', 'archived', 'all'
    status='all',            # 'active', 'archived', 'all'
    save=True
)

# FAQ index with detailed content
data = fetcher.get(
    programmes='horizon',
    fetch_details=True,      # Fetch full FAQ content
    save=True
)

# Specific FAQ details by NID
data = fetcher.get(
    nid_list=['755', '12350'],
    save=True
)

Available Options:

FAQ types: active, archived, all
Status: active, archived, all
Details: fetch_details=True for complete content

Architecture: Intended to use SEDIAPaginatedFetcher once migrated to the shared base classes

Quick Start Demo

Run the demo:

cd src/EUFT_retrieve
python demo_all_fetchers.py

Demonstrates:

All 5 fetchers with example usage
Flexible input handling
Data processing capabilities
Error handling features
Advanced usage patterns

Programme IDs Reference

Programme	Name	ID
`h2020`	Horizon 2020	31045243
`horizon`	Horizon Europe	43108390
`digital`	Digital Europe	43152860
`edf`	European Defence Fund	44181033

Advanced Usage

Custom Query Parameters

All fetchers support additional query parameters via **kwargs:

# Funding & Tenders with custom filters
data = fetcher.get(
    programmes='horizon',
    funding_type='grants',
    programmePeriod='2021 - 2027',
    crossCuttingPriorities=['CLIMATE'],
    destination=['43650651'],
    save=True
)
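This README does not document how the extra keyword arguments are encoded. One sketch assumes that the search endpoint accepts an Elasticsearch-style `bool`/`must`/`terms` JSON query. That format is an assumption and is not confirmed here. Under it, each keyword becomes one `terms` clause. `build_query` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def build_query(**filters: Any) -> Dict[str, Any]:
    """Turn keyword filters into a bool/must query with one terms clause each."""
    must: List[Dict[str, Any]] = []
    for field, value in filters.items():
        if value is None:
            continue  # unset filters are simply omitted
        values = value if isinstance(value, (list, tuple)) else [value]
        must.append({"terms": {field: [str(v) for v in values]}})
    return {"bool": {"must": must}}


query = build_query(
    programmePeriod="2021 - 2027",
    crossCuttingPriorities=["CLIMATE"],
    destination=["43650651"],
)
print(json.dumps(query, indent=2))
```

Wrapping scalars in a list gives every clause the same shape, so `programmePeriod='2021 - 2027'` and `crossCuttingPriorities=['CLIMATE']` are handled identically.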

Error Handling

try:
    data = fetcher.get('invalid_programme')
except ValueError as e:
    print(f"Invalid programme: {e}")
except Exception as e:
    print(f"API error: {e}")

Memory Management

For large datasets:

# Disable metadata flattening for faster processing
fetcher = SEDIA_GET_PROJECTS(flatten_metadata=False)

# Large result sets are chunked automatically; no extra arguments are needed
data = fetcher.get('h2020', save=True)

Data Processing Pipeline

from helpers.functions import Functions

# Load and process cached data
df = Functions.load_cached_dataframe('cache/my_data.feather')

# Apply custom flattening
df_flat = Functions.flatten_dataframe_metadata(df)

# Clean empty containers
df_clean = Functions.clean_empty_containers(df_flat)
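The helpers in `helpers/functions.py` are not shown in this README. A rough plain-pandas equivalent might flatten a nested `metadata` column and clear out empty lists and dicts. The column name `metadata` and both function names below are assumptions, and the package's own helpers may behave differently.

```python
import pandas as pd


def flatten_metadata(df: pd.DataFrame, column: str = "metadata") -> pd.DataFrame:
    """Expand a column of nested dicts into dotted columns, e.g. metadata.title."""
    nested = pd.json_normalize(df[column].tolist()).add_prefix(f"{column}.")
    rest = df.drop(columns=[column]).reset_index(drop=True)
    return pd.concat([rest, nested], axis=1)


def clean_empty_containers(df: pd.DataFrame) -> pd.DataFrame:
    """Replace empty lists and dicts with None so they read as missing values."""

    def _clean(value):
        if isinstance(value, (list, dict)) and len(value) == 0:
            return None
        return value

    # Apply the cleaner element-wise, column by column.
    return df.apply(lambda col: col.map(_clean))


df = pd.DataFrame(
    {
        "id": [1, 2],
        "metadata": [{"title": "A", "tags": []}, {"title": "B", "tags": ["x"]}],
    }
)
df_clean = clean_empty_containers(flatten_metadata(df))
print(df_clean)
```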

Output Files

All fetchers generate timestamped CSV files:

data/
├── project_data_44181033_20241201_143022.csv
├── participant_data_44181033_20241201_143155.csv
├── funding_tenders_data_horizon_grants_open_20241201_143301.csv
├── topic_details_HORIZON-CL3-2022-BM-01-01_20241201_143445.csv
└── faq_data_31045243_all_all_20241201_143612.csv
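The file names above appear to follow the pattern `<prefix>_<parts>_<YYYYMMDD>_<HHMMSS>.csv`. The sketch below reproduces that pattern with `datetime.strftime`. `timestamped_csv_path` is a hypothetical helper, and the package's own naming code may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def timestamped_csv_path(
    prefix: str,
    *parts: Union[str, int],
    directory: str = "data",
    now: Optional[datetime] = None,
) -> Path:
    """Build <directory>/<prefix>_<part1>_..._<YYYYMMDD_HHMMSS>.csv."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    name = "_".join([prefix, *map(str, parts), stamp]) + ".csv"
    return Path(directory) / name


path = timestamped_csv_path("project_data", 44181033, now=datetime(2024, 12, 1, 14, 30, 22))
print(path.as_posix())  # data/project_data_44181033_20241201_143022.csv
# Saving would then be: path.parent.mkdir(parents=True, exist_ok=True); df.to_csv(path, index=False)
```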

Important Notes

API Rate Limits

Retry mechanisms handle rate limiting
Automatic backoff for server errors
Session management
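The README does not document the retry settings. In a `requests`-based client, a common way to get retries with exponential backoff is to mount an `HTTPAdapter` that uses urllib3's `Retry` on a shared `Session`. The values below are illustrative assumptions (5 retries, backoff factor 1, and the listed status codes). By default urllib3 does not retry POST, so it is added to `allowed_methods`. That is only safe if the search requests are read-only, which is also an assumption.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 5, backoff: float = 1.0) -> requests.Session:
    """Session that retries rate-limited (429) and 5xx responses with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # sleeps grow exponentially between attempts
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET", "POST"}),  # POST is not retried by default
        respect_retry_after_header=True,  # honour Retry-After on 429/503
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session()
# Reuse `session` for every request so connections are pooled across calls.
```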

Data Size Considerations

Projects fetcher handles >10K records via partitioning
Other fetchers may be truncated at the API's 10,000-result limit
Use programme filters to reduce dataset size

Memory Usage

Metadata flattening increases memory usage
Disable flattening for very large datasets
Use chunked processing

Contributing

To extend the fetchers:

Follow the existing class structure
Implement consistent API patterns
Add error handling
Include progress tracking
Update this README

License

This project is part of the EU thesis research toolkit. Please refer to the main project license.

Support

For issues or questions:

Check the demo script for usage examples
Review error messages for specific issues
Verify programme IDs and API parameters
Ensure network connectivity to EU APIs

Project details

sedia-api-fetchers 1.0.0, released Jul 6, 2025.

Source distribution: sedia_api_fetchers-1.0.0.tar.gz (34.4 kB, uploaded via twine/6.1.0 CPython/3.13.2)

Algorithm	Hash digest
SHA256	`7322475f5a691c72db5758180c65df9550b813c2773eb7ba2bdc23b6011894f2`
MD5	`6aae4780c8e176f101816814c0e038d1`
BLAKE2b-256	`c31be704fb20338bfef0bc24bdbc4d149a4adc398d29a42042a3ca606758433e`

Built distribution: sedia_api_fetchers-1.0.0-py3-none-any.whl (39.1 kB, Python 3, uploaded via twine/6.1.0 CPython/3.13.2)

Algorithm	Hash digest
SHA256	`5b3c9dd34775ee10cad32fa3bbbe782a26c55580aded548754a3a358abcdd4a8`
MD5	`24f7d5a34e0d2172a496839452bcc90c`
BLAKE2b-256	`fd0ba418011b0aeec505647efed53892610f15a75340d16266d893af2861a914`
