CLI to extract state aids data from public sources and produce CSV files
Description
eu-state-aids is a package that imports state-aid data from individual countries' sources and produces CSV files according to a common data structure.
The tool provides both a command-line interface (the eu-state-aids command) and an API. See the Usage section.
The common CSV format used for the export:
Name | Type | Meaning |
---|---|---|
Name of the beneficiary | String | The name of the aid's beneficiary |
ID of the beneficiary | Long Integer | The unique ID of the aid's beneficiary |
European operation program (ID) | String | The unique CCI code of the European program, see details here |
Amounts (€) | Float with 2 digits precision | Total amount of the project (in Euro) |
Date | Date YYYY[-MM-DD] | Date of the beginning of the aid program (at least the year) |
State aid Scheme | String | The aid scheme code. The format is SA.XXXXX, where the Xs are digits. |
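As an illustration of the common structure, here is a minimal sketch that formats one record and writes it as CSV using only the standard library. The helper names (format_row, to_csv) and the sample values are invented for this example and are not part of the package; the scheme check assumes "SA." followed by any number of digits.

```python
import csv
import io
import re

# Column headers copied from the table above.
COLUMNS = [
    "Name of the beneficiary",
    "ID of the beneficiary",
    "European operation program (ID)",
    "Amounts (€)",
    "Date",
    "State aid Scheme",
]

# Assumption: "SA." followed by digits only, as the table describes.
SCHEME_RE = re.compile(r"^SA\.\d+$")


def format_row(name, beneficiary_id, program_cci, amount, date, scheme):
    """Return one record as a list of strings, in column order (hypothetical helper)."""
    if not SCHEME_RE.match(scheme):
        raise ValueError(f"unexpected scheme code: {scheme!r}")
    # Amounts are rendered with two decimal digits, IDs as integers.
    return [name, str(int(beneficiary_id)), program_cci, f"{amount:.2f}", date, scheme]


def to_csv(rows):
    """Serialize the header plus the given rows to a CSV string (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Sample values are made up for the example.
row = format_row("ACME Ltd", 123456789, "2014BG16RFOP002", 1500.5, "2015", "SA.12345")
csv_text = to_csv([row])
```

The files actually produced by the export commands may differ in quoting or column order; this only mirrors the table.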
Installation
Python 3.7 and later versions are supported.
The package depends on these Python packages:
- typer
- openpyxl
- pandas
- requests
- validators
It is therefore advisable to create a virtualenv before installation.
The package is hosted on PyPI and can be installed, for example, using pip:
pip install eu-state-aids
Usage
Command Line Interface
The eu-state-aids command will be available after installation.
It offers help with:
eu-state-aids --help
The eu-state-aids command can be used to extract the data from the official sources and populate the CSV files.
For each country, data files are first fetched and stored locally, and then used to export CSV files.
This two-step procedure is useful because it is not always possible to download source files (Excel, XML, ...) from the BI systems of nation states: they tend to time out when the number of records is high.
The logic of these two phases can vary for each European state, so each country has a dedicated module, executable as a sub-command.
Bulgaria
To retrieve data and produce a CSV file for Bulgaria (bg), 2015:
eu-state-aids bg fetch 2015
eu-state-aids bg export 2015
To launch the scripts for all years for Bulgaria (bg):
# download all years' Excel files into local storage
for Y in $(seq 2014 2022)
do
  eu-state-aids bg fetch $Y
done
# process all years' Excel files and export CSV records into local storage
# (./data/bg/$Y.csv files)
for Y in $(seq 2014 2022)
do
  eu-state-aids bg export $Y
done
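The same loops can be driven from Python. The sketch below only builds and runs the command lines shown above through subprocess; build_commands and run_all are hypothetical helper names, and it assumes the eu-state-aids command is on the PATH.

```python
import subprocess


def build_commands(country, years):
    """Build the fetch command lines first, then the export ones, for each year."""
    commands = []
    for year in years:
        commands.append(["eu-state-aids", country, "fetch", str(year)])
    for year in years:
        commands.append(["eu-state-aids", country, "export", str(year)])
    return commands


def run_all(country, years):
    """Run every command in order, stopping at the first failure."""
    for command in build_commands(country, years):
        subprocess.run(command, check=True)


# seq 2014 2022 is inclusive, hence range(2014, 2023).
commands = build_commands("bg", range(2014, 2023))
```

Calling run_all("bg", range(2014, 2023)) would then execute the eighteen commands one after the other.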
Italy
Italy needs a slightly different procedure: before invoking the fetch/export commands, a misure.csv file needs to be generated, so that all aid records found in the XML files can be compared with the CE_CODE values found and filtered.
eu-state-aids it generate_measures
To retrieve data and produce a CSV file for Italy (it), 2015, there is no need to fetch the files separately, as they have been copied to a reliable source.
eu-state-aids it export 2015 --delete-processed
This loops over all months of 2015, fetches the files that have not already been fetched, extracts, transforms and filters the records for each month, and emits a CSV file with all the records found.
The amount of money is summed for each beneficiary (over all records in that year). The fetched files are deleted after the procedure if the --delete-processed option is given.
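The per-beneficiary summation described above can be sketched with the standard library alone. This is not the package's implementation: sum_by_beneficiary is a hypothetical helper, the sample records are invented, and amounts are assumed to arrive as decimal strings.

```python
from collections import defaultdict
from decimal import Decimal


def sum_by_beneficiary(records):
    """Sum amounts per beneficiary ID over an iterable of (id, amount) pairs."""
    totals = defaultdict(Decimal)  # Decimal() is zero, so new IDs start at 0
    for beneficiary_id, amount in records:
        totals[beneficiary_id] += Decimal(amount)
    return dict(totals)


# Invented records, as they might be collected across several months of a year.
monthly_records = [
    ("111", "1000.00"),  # January
    ("222", "250.50"),   # January
    ("111", "499.99"),   # February
]
totals = sum_by_beneficiary(monthly_records)
```

Decimal is used here instead of float so that two-digit amounts add up without rounding drift.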
To launch the scripts for all years for Italy (it):
# fetch (if needed), process and export CSV records for all years
for Y in $(seq 2014 2022)
do
  eu-state-aids it export $Y --delete-processed
done
API
The fetch and export logic can be used from within a Python program by importing the package. All option values must be passed explicitly in API calls.
from eu_state_aids import bg

for year in ['2015', '2016', '2017']:
    bg.fetch(year, local_path='./data/bg')
    bg.export(
        year, local_path='./data/bg',
        stateaid_url="https://stateaid.minfin.bg/document/860",
        program_start_year="2014"
    )
Note on Italian data
Italian government sources suffer from two issues.
- XML files cannot be downloaded automatically from individual dedicated URLs; they must be downloaded manually, because the software adopted for the open data section of the website does not allow such individual downloads. They have been mirrored on a public AWS resource and are fetched from there.
- XML files are not compressed, and the OpenData_Aiuto_*.xml files are huge (~1 GB). Once compressed, they shrink to about 1/25th of their original size, so they are stored on the AWS mirror in zipped format.
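Large zipped XML files of this kind can be read record by record without unpacking them to disk. The sketch below illustrates that idea with zipfile and ElementTree.iterparse on a tiny archive built in memory; the tag names (AIUTI, AIUTO, CODE), the member name and iter_codes are invented for the example and do not reflect the real file schema or the package's code.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Build a tiny zipped XML in memory; tag names are invented for this sketch.
xml_bytes = (
    b"<AIUTI>"
    b"<AIUTO><CODE>SA.12345</CODE></AIUTO>"
    b"<AIUTO><CODE>SA.67890</CODE></AIUTO>"
    b"</AIUTI>"
)
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("OpenData_Aiuto_example.xml", xml_bytes)


def iter_codes(zipped, member):
    """Yield the CODE text of each AIUTO element, streaming from the zip member."""
    with zipfile.ZipFile(zipped) as archive:
        with archive.open(member) as stream:
            for _event, element in ET.iterparse(stream, events=("end",)):
                if element.tag == "AIUTO":
                    yield element.findtext("CODE")
                    element.clear()  # drop the children of records already processed


archive_bytes.seek(0)
codes = list(iter_codes(archive_bytes, "OpenData_Aiuto_example.xml"))
```

Because the member is decompressed as a stream and each record is cleared after use, memory stays bounded even when the uncompressed file is around 1 GB.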
Support
There is no guaranteed support available, but the authors will try to keep up with issues and merge proposed solutions into the code base.
Project Status
This project is funded by the European Commission and is currently (2021) under active development.
Contributing
In order to contribute to this project:
- verify that Python 3.7+ is being used (or use pyenv)
- verify or install poetry, to handle packages and dependencies in a leaner way than pip and requirements files
- clone the project: git clone git@github.com:openpolis/eu-state-aids.git
- install the dependencies in the virtualenv with poetry install (this will also install the dev dependencies)
- develop wildly, running tests and coverage with coverage run -m pytest
- create a pull request
- wait for the maintainers to review and eventually merge your pull request into the main repository
Testing
Tests are under the tests folder. requests-mock is used to mock requests to remote data files, in order to avoid slow remote connections during tests.
Authors
Guglielmo Celata - guglielmo@openpolis.it
Licensing
This package is released under an MIT License, see details in the LICENSE.txt file.