Eurostat Python Package

Tools to read data from Eurostat website.

Features

Read Eurostat data and metadata as a list of tuples or as a pandas dataframe.
MIT license.

Documentation

Getting started:

Requires Python 3.6+

pip install eurostat

In case you need to use a proxy (new in v.0.1.4):

Before doing anything else, you must configure the proxies.

eurostat.setproxy(proxyinfo)

It takes proxyinfo, a dictionary with two keys ('http' and 'https') whose values are lists holding the connection parameters: username, password and 'url:port'.
If authentication is not needed, set username and password to None.

Example:

>>> import eurostat
>>> proxyinfo = {'http': ['myuser', 'mypassword', '123.456.789.012:8012'],
...              'https': ['myuser', 'mypassword', 'url:port']}
>>> eurostat.setproxy(proxyinfo)

It always returns None.
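
The dictionary can also be assembled by a small helper. The sketch below is not part of the package: make_proxyinfo is a hypothetical name, and it assumes that the same credentials and the same 'host:port' endpoint apply to both protocols.

```python
def make_proxyinfo(endpoint, username=None, password=None):
    """Build the dictionary expected by eurostat.setproxy (hypothetical helper).

    endpoint is a 'host:port' string; username and password stay None
    when the proxy needs no authentication.
    """
    entry = [username, password, endpoint]
    # One list per key, so editing one entry later does not change the other.
    return {'http': list(entry), 'https': list(entry)}


# 'proxy.example.org:8012' is a placeholder endpoint.
proxyinfo = make_proxyinfo('proxy.example.org:8012')
print(proxyinfo)
```

The result can then be passed to eurostat.setproxy(proxyinfo) before any other call.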

Read the table of contents of the main database:

As a list of tuples:

eurostat.get_toc()

Read the table of contents and return a list of tuples. The first element of the list contains the header line. Dates are represented as strings.

Example:

>>> import eurostat
>>> toc = eurostat.get_toc()
>>> toc[0]
('title', 'code', 'type', 'last update of data', 'last table structure change', 'data start', 'data end')
>>> toc[10:13]
[('Industry - quarterly data', 'ei_bsin_q_r2', 'dataset', '30.10.2019', '30.10.2019', '1980Q1', '2019Q4'),
 ('Construction - monthly data', 'ei_bsbu_m_r2', 'dataset', '30.10.2019', '30.10.2019', '1980M01', '2019M10'),
 ('Construction - quarterly data', 'ei_bsbu_q_r2', 'dataset', '30.10.2019', '30.10.2019', '1981Q1', '2019Q4')]
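
Because the header is the first element, each row can be paired with its column names. The following is a minimal sketch using rows copied from the example above instead of a live call; parse_date is a hypothetical helper, and the day.month.year date format is an assumption based on those sample rows.

```python
from datetime import datetime

# Sample rows copied from the example above; a real call would be
# toc = eurostat.get_toc()
toc = [
    ('title', 'code', 'type', 'last update of data',
     'last table structure change', 'data start', 'data end'),
    ('Industry - quarterly data', 'ei_bsin_q_r2', 'dataset',
     '30.10.2019', '30.10.2019', '1980Q1', '2019Q4'),
    ('Construction - monthly data', 'ei_bsbu_m_r2', 'dataset',
     '30.10.2019', '30.10.2019', '1980M01', '2019M10'),
]


def parse_date(text):
    """Turn 'DD.MM.YYYY' into a date; return None for empty strings."""
    return datetime.strptime(text, '%d.%m.%Y').date() if text else None


header, rows = toc[0], toc[1:]
# Pair each value with its column name.
records = [dict(zip(header, row)) for row in rows]
for rec in records:
    rec['last update of data'] = parse_date(rec['last update of data'])

codes = [rec['code'] for rec in records if rec['type'] == 'dataset']
print(codes)
```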

As a pandas dataframe:

eurostat.get_toc_df()

Read the table of contents of the main database and return a dataframe. Dates are represented as strings.

Example:

>>> import eurostat
>>> toc_df = eurostat.get_toc_df()
>>> toc_df
                                                  title  ... data end
0                                    Database by themes  ...         
1                       General and regional statistics  ...         
2     European and national indicators for short-ter...  ...         
3     Business and consumer surveys (source: DG ECFIN)   ...         
4                   Consumer surveys (source: DG ECFIN)  ...         
                                                ...  ...      ...
9860  Enterprises that provided training to develop/...  ...     2018
9861  Participation in education and training - cont...  ...         
9862  Enterprises providing training by type of trai...  ...     2015
9863  Participants in CVT courses by sex and size cl...  ...     2015
9864  Main skills targeted by CVT courses by type of...  ...     2015

You may also want to extract the datasets that pertain to a topic. In that case, you can use:

eurostat.subset_toc_df(toc_df, keyword)

Extract from toc_df the rows where 'title' contains 'keyword' (case-insensitive).

Example:

>>> f = eurostat.subset_toc_df(toc_df, 'fleet')
>>> f
title, code, type, last update of data, last table structure change, data start, data end
                                               title              code       type  ... data end
5631                                   Fishing fleet        fish_fleet     folder  ...         
5632  Fishing fleet by age, length and gross tonnage    fish_fleet_alt    dataset  ...     2018
5633  Fishing fleet by type of gear and engine power     fish_fleet_gp    dataset  ...     2018
6246   Commercial aircraft fleet by type of aircraft   avia_eq_arc_typ    dataset  ...     2017
6247    Commercial aircraft fleet by age of aircraft   avia_eq_arc_age    dataset  ...     2017
7849                    Fishing fleet, total tonnage          tag00083      table  ...     2018
7850                Fishing Fleet, Number of Vessels          tag00116      table  ...     2018

Note that, in the above example, the first returned row represents a folder, not a dataset.
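
The same kind of filter can be applied to the list of tuples returned by get_toc(). The sketch below is an illustration only: subset_toc and its only_type parameter are hypothetical names, not functions of the package, and the sample rows are shortened to three columns.

```python
def subset_toc(toc, keyword, only_type=None):
    """Case-insensitive title filter on a get_toc()-style list (hypothetical helper)."""
    header, rows = toc[0], toc[1:]
    title_pos = header.index('title')
    type_pos = header.index('type')
    needle = keyword.lower()
    kept = [
        row for row in rows
        if needle in row[title_pos].lower()
        and (only_type is None or row[type_pos] == only_type)
    ]
    # Keep the header as first element, as get_toc() does.
    return [header] + kept


# Shortened sample rows; a real call would be toc = eurostat.get_toc()
toc = [
    ('title', 'code', 'type'),
    ('Fishing fleet', 'fish_fleet', 'folder'),
    ('Fishing fleet by age, length and gross tonnage', 'fish_fleet_alt', 'dataset'),
    ('Fishing Fleet, Number of Vessels', 'tag00116', 'table'),
    ('Industry - quarterly data', 'ei_bsin_q_r2', 'dataset'),
]

# Drop folders and tables by asking for datasets only.
datasets = subset_toc(toc, 'FLEET', only_type='dataset')
print(datasets[1:])
```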

Read a dataset from the main database:

As a list of tuples:

eurostat.get_data(code, flags=False)

Read a dataset from the main database (available from the bulk download facility) and return it as a list of tuples. The first element of the list ("the first row") is the data header. Note that the layout depends on flags: with flags=True, each period is split into a value column and a flag column. Flag meanings can be found here.

Example:

>>> import eurostat
>>> data = eurostat.get_data('demo_r_d2jan')
>>> data
[('unit', 'sex', 'age', 'geo\\time', 2018, 2017, 2016, 2015, 2014, ...),
 ('NR', 'F', 'TOTAL', 'AL', 1431715.0, None, 1417141.0, 1424597.0, 1430827.0, ...),
  ...]
>>> data = eurostat.get_data('demo_r_d2jan', True)
>>> data
[('unit', 'sex', 'age', 'geo\\time', '2018_value', '2018_flag', '2017_value', '2017_flag', '2016_value', '2016_flag', ...),
 ('NR', 'F', 'TOTAL', 'AL', 1431715.0, '', 1423050.0, 'c', 1417141.0, '', 1424597.0, '', ...),
  ...]
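
When flags are requested, values and flags alternate in each row. The following sketch regroups them per period. It is not part of the package: split_flags is a hypothetical helper, and it assumes that the first four columns are dimensions and that the remaining headers end in '_value' or '_flag'.

```python
def split_flags(header, row, n_dims=4):
    """Return (dims, observations) for one flagged row (hypothetical helper).

    Assumes the first n_dims columns are dimensions and the rest are
    '<period>_value' / '<period>_flag' columns.
    """
    dims = dict(zip(header[:n_dims], row[:n_dims]))
    obs = {}
    for name, cell in zip(header[n_dims:], row[n_dims:]):
        # '2018_value' -> period '2018', kind 'value'
        period, _, kind = name.rpartition('_')
        obs.setdefault(period, {})[kind] = cell
    return dims, obs


# Shortened sample row; a real call would be
# data = eurostat.get_data('demo_r_d2jan', True)
header = ('unit', 'sex', 'age', 'geo\\time',
          '2018_value', '2018_flag', '2017_value', '2017_flag')
row = ('NR', 'F', 'TOTAL', 'AL', 1431715.0, '', 1423050.0, 'c')

dims, obs = split_flags(header, row)
print(obs['2017'])
```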

As a pandas dataframe:

eurostat.get_data_df(code, flags=False)

Read a dataset from the main database (available from the bulk download facility) and return it as a pandas dataframe. Flag meanings can be found here.

Example:

>>> import eurostat
>>> df = eurostat.get_data_df('demo_r_d2jan')
>>> df
       unit sex     age geo\time  ...     1993     1992  1991  1990
0        NR   F   TOTAL       AL  ...      NaN      NaN   NaN   NaN
1        NR   F   TOTAL      AL0  ...      NaN      NaN   NaN   NaN
2        NR   F   TOTAL     AL01  ...      NaN      NaN   NaN   NaN
3        NR   F   TOTAL     AL02  ...      NaN      NaN   NaN   NaN
4        NR   F   TOTAL     AL03  ...      NaN      NaN   NaN   NaN
    ...  ..     ...      ...  ...      ...      ...   ...   ...
168607   NR   T  Y_OPEN     UKM7  ...      NaN      NaN   NaN   NaN
168608   NR   T  Y_OPEN     UKM8  ...      NaN      NaN   NaN   NaN
168609   NR   T  Y_OPEN     UKM9  ...      NaN      NaN   NaN   NaN
168610   NR   T  Y_OPEN      UKN  ...  17934.0  17566.0   NaN   NaN
168611   NR   T  Y_OPEN     UKN0  ...  17934.0  17566.0   NaN   NaN
>>> df = eurostat.get_data_df('demo_r_d2jan', True)
>>> df
       unit sex     age geo\time  ...  1992_value 1992_flag  1991_value 1991_flag  1990_value 1990_flag
0        NR   F   TOTAL       AL  ...        NaN         :         NaN         :         NaN         :
1        NR   F   TOTAL      AL0  ...        NaN         :         NaN         :         NaN         :
2        NR   F   TOTAL     AL01  ...        NaN         :         NaN         :         NaN         :
3        NR   F   TOTAL     AL02  ...        NaN         :         NaN         :         NaN         :
4        NR   F   TOTAL     AL03  ...        NaN         :         NaN         :         NaN         :
    ...  ..     ...      ...  ...         ...       ...       ...         ...       ...
168607   NR   T  Y_OPEN     UKM7  ...        NaN         :         NaN         :         NaN         :
168608   NR   T  Y_OPEN     UKM8  ...        NaN         :         NaN         :         NaN         :
168609   NR   T  Y_OPEN     UKM9  ...        NaN         :         NaN         :         NaN         :
168610   NR   T  Y_OPEN      UKN  ...    17566.0                   NaN         :         NaN         :
168611   NR   T  Y_OPEN     UKN0  ...    17566.0                   NaN         :         NaN         :

Get a Eurostat dictionary:

eurostat.get_dic(code)

Read the metadata related to a particular code. Return a list of tuples, where the first element of each tuple is the code value and the second one is its description.

Example:

>>> import eurostat
>>> dic = eurostat.get_dic('sex')
>>> dic
[('T', 'Total'),
 ('M', 'Males'),
 ('F', 'Females'),
 ('DIFF', 'Absolute difference between males and females'),
 ('NAP', 'Not applicable'),
 ('NRP', 'No response'),
 ('UNK', 'Unknown')]
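
Since each tuple is a (code, description) pair, the list converts directly into a Python dict for lookups. A minimal sketch, using a shortened copy of the output above instead of a live call:

```python
# Shortened sample; a real call would be dic = eurostat.get_dic('sex')
dic = [('T', 'Total'), ('M', 'Males'), ('F', 'Females')]

labels = dict(dic)  # code -> description

row = ('NR', 'F', 'TOTAL', 'AL')
# Fall back to the raw code when no description is known.
sex_label = labels.get(row[1], row[1])
print(sex_label)
```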

Check what datasets are available via SDMX:

As a list of tuples:

eurostat.get_avail_sdmx()

Return a list of tuples. The first element of the list contains the header line. Example:

>>> avail_sdmx = eurostat.get_avail_sdmx()
>>> avail_sdmx
[('dataflow', 'name'),
 ('DS-008573', 'Sold production, exports and imports for steel by PRODCOM list (NACE Rev. 1.1) - monthly data'),
 ('DS-016890', 'EU trade since 1988 by CN8'),
 ('DS-016893', 'EU trade since 1988 by HS6'),
 ...]

As a pandas dataframe:

eurostat.get_avail_sdmx_df()

Return a dataframe with one column. Dataflow (i.e. dataset) codes are in the dataframe index. Example:

>>> avail_sdmx_df = eurostat.get_avail_sdmx_df()
>>> avail_sdmx_df
                                                             name
dataflow                                                         
DS-008573       Sold production, exports and imports for steel...
DS-016890                              EU trade since 1988 by CN8
DS-016893                              EU trade since 1988 by HS6
DS-016894                          EU trade since 1988 by HS2-HS4
DS-018995                             EU trade since 1988 by SITC
                                                          ...
yth_incl_120    Young people living in households with very lo...
yth_part_010    Frequency of getting together with relatives o...
yth_part_020    Frequency of contacts with relatives or friend...
yth_part_030    Participation of young people in activities of...
yth_volunt_010  Participation of young people in informal volu...

You may also want to find the datasets that pertain to a topic. In that case, you can use:

eurostat.subset_avail_sdmx_df(avail_sdmx_df, keyword)

Extract the rows where 'name' contains 'keyword' (case-insensitive). Example:

>>> keyword = 'fleet'
>>> subset = eurostat.subset_avail_sdmx_df(avail_sdmx_df, keyword)
>>> subset
                                                           name
dataflow                                                       
avia_eq_arc_age    Commercial aircraft fleet by age of aircraft
avia_eq_arc_typ   Commercial aircraft fleet by type of aircraft
fish_fleet_alt   Fishing fleet by age, length and gross tonnage
fish_fleet_gp    Fishing fleet by type of gear and engine power
tag00083                           Fishing fleet, total tonnage
tag00116                       Fishing Fleet, Number of Vessels

Read the Eurostat dimensions of a dataset that is available via SDMX service:

eurostat.get_sdmx_dims(code)

Read the dimension names of a dataset that is provided via SDMX service. Require the dataset code and return a list.

Example:

>>> import eurostat
>>> dims = eurostat.get_sdmx_dims('DS-066341')
>>> dims
['DECL', 'FREQ', 'INDICATORS', 'PERIOD', 'PRCCODE']

Read a Eurostat dictionary for a given SDMX dimension:

eurostat.get_sdmx_dic(code, dim)

Read the Eurostat dimension values with their meaning for a dataset provided via SDMX service. Return them as a dictionary.

Example:

>>> import eurostat
>>> dic = eurostat.get_sdmx_dic('DS-066341', 'FREQ')
>>> dic
{'A': 'Annual',
 'D': 'Daily',
 'H': 'Half-year',
 'M': 'Monthly',
 'Q': 'Quarterly',
 'S': 'Semi-annual',
 'W': 'Weekly'}
>>> flags = eurostat.get_sdmx_dic('DS-066341', 'OBS_STATUS')
>>> flags
{'-': 'not applicable or real zero or zero by default',
 '0': 'less than half of the unit used',
 'na': 'not available'}

Read a dataset from the SDMX service:

As a list of tuples:

eurostat.get_sdmx_data(code, StartPeriod, EndPeriod, filter_pars, flags=False, verbose=True)

Read a dataset from SDMX service, with or without the flags. Return a list of tuples. The first tuple (row) contains the header.
It can download some datasets that are not available from the main database (e.g., Comext).
This service is slow, so it is better to request only the subset you need by setting filter_pars (a dictionary whose keys are dimension names and whose values are lists).
To see a rough progress status, set verbose = True.

Example:

>>> import eurostat
>>> StartPeriod = 2007
>>> EndPeriod = 2008
>>> filter_pars = {'FREQ': ['A',], 'PRCCODE': ['08111250','08111150']}
>>> data = eurostat.get_sdmx_data('DS-066341', StartPeriod, EndPeriod, filter_pars, flags = False, verbose=True)
Progress: 0.0%
Progress:50.0%
Progress:100.0%
>>> data
[('INDICATORS', 'DECL', 'PRCCODE', 'FREQ', 2007, 2008),
 ('EXPQNT', '001', '08111250', 'A', 10219200.0, 16082600.0),
 ('EXPVAL', '001', '08111250', 'A', 1697160.0, 1875920.0),
 ...]
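
The returned list has one column per period. For further processing, one record per period is sometimes easier to handle. The sketch below shows such a reshape under stated assumptions: to_long is a hypothetical helper, the data were read with flags=False, the first four columns are dimensions, and the column names 'PERIOD' and 'VALUE' are choices made here.

```python
def to_long(data, n_dims=4):
    """Reshape a get_sdmx_data()-style list into one record per period.

    Assumes flags=False and that the first n_dims columns are dimensions.
    """
    header, rows = data[0], data[1:]
    dim_names, periods = header[:n_dims], header[n_dims:]
    out = []
    for row in rows:
        dims = row[:n_dims]
        for period, value in zip(periods, row[n_dims:]):
            out.append(dims + (period, value))
    # The header stays the first element, as in the original list.
    return [tuple(dim_names) + ('PERIOD', 'VALUE')] + out


# Rows copied from the example above.
data = [
    ('INDICATORS', 'DECL', 'PRCCODE', 'FREQ', 2007, 2008),
    ('EXPQNT', '001', '08111250', 'A', 10219200.0, 16082600.0),
    ('EXPVAL', '001', '08111250', 'A', 1697160.0, 1875920.0),
]

long_data = to_long(data)
for record in long_data:
    print(record)
```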

As a pandas dataframe:

eurostat.get_sdmx_data_df(code, StartPeriod, EndPeriod, filter_pars, flags=False, verbose=True)

Read a dataset from SDMX service, with or without the flags. Return a pandas dataframe.
It can download some datasets that are not available from the main database (e.g., Comext).
This service is slow, so it is better to request only the subset you need by setting filter_pars (a dictionary whose keys are dimension names and whose values are lists).
To see a rough progress status, set verbose = True.

Example:

>>> import eurostat
>>> StartPeriod = 2007
>>> EndPeriod = 2008
>>> filter_pars = {'FREQ': ['A',], 'PRCCODE': ['08111250','08111150']}
>>> df = eurostat.get_sdmx_data_df('DS-066341', StartPeriod, EndPeriod, filter_pars, flags = True, verbose=True)
Progress: 0.0%
Progress:50.0%
Progress:100.0%
>>> df
    INDICATORS DECL   PRCCODE FREQ        2007 2007_OBS_STATUS        2008 2008_OBS_STATUS
0       EXPQNT  001  08111250    A  10219200.0                  16082600.0                
1       EXPVAL  001  08111250    A   1697160.0                   1875920.0                
2       IMPQNT  001  08111250    A   7526000.0                   4272200.0                
3       IMPVAL  001  08111250    A   1802940.0                   1208030.0                
4     PQNTBASE  001  08111250    A         0.0                         0.0                
..         ...  ...       ...  ...         ...             ...         ...             ...
875    PRODQNT  600  08111150    A         0.0                         0.0                
876    PRODVAL  600  08111150    A         0.0                         0.0                
877   PVALBASE  600  08111150    A         0.0                         0.0                
878   PVALFLAG  600  08111150    A         NaN              na         NaN              na
879    QNTUNIT  600  08111150    A         NaN                         NaN
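
Since the service is slow, it may be worth checking filter_pars before sending a request. The sketch below is one possible check, not a feature of the package: check_filter_pars is a hypothetical helper, and the dimension list is copied from the get_sdmx_dims example instead of being fetched.

```python
def check_filter_pars(filter_pars, dims):
    """Raise ValueError if filter_pars is malformed (hypothetical helper).

    dims is the list of dimension names, e.g. from eurostat.get_sdmx_dims(code).
    """
    for dim, values in filter_pars.items():
        if dim not in dims:
            raise ValueError('unknown dimension: %r' % dim)
        if not isinstance(values, list):
            raise ValueError('values for %r must be a list' % dim)
    return filter_pars


# Copied from the get_sdmx_dims example for 'DS-066341'.
dims = ['DECL', 'FREQ', 'INDICATORS', 'PERIOD', 'PRCCODE']

filter_pars = check_filter_pars(
    {'FREQ': ['A'], 'PRCCODE': ['08111250', '08111150']}, dims)
print(filter_pars)
```

A misspelled dimension or a bare string such as {'FREQ': 'A'} raises ValueError before any request is made.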

Bug reports and feature requests:

Please open an issue or send a message to noemi.cazzaniga [at] polimi.it .

Disclaimer:

Download and usage of Eurostat data is subject to Eurostat's general copyright notice and licence policy (see Policies). Please also be aware of the European Commission's general conditions.

Data sources:

Eurostat database: online catalog and bulk download facility.
Eurostat nomenclatures: RAMON metadata.
Eurostat Interactive Data Explorer: Data Explorer.
Eurostat Interactive Tool for Comext Data: Easy Comext.
Eurostat acronyms: Symbols and abbreviations.

References:

R package eurostat: R Tools for Eurostat Open Data.
Python package pandaSDMX: Statistical Data and Metadata eXchange.
Python package pandas: Python Data Analysis Library.

History:

version 0.1.5 (08 Jan. 2020):

Bug fix (proxy info).
get_avail_sdmx, get_avail_sdmx_df, subset_avail_sdmx_df added.

version 0.1.4 (20 Dec. 2019):

Added support to proxy.

version 0.1.3 (17 Dec. 2019):

Bug fix (non-annual data headers).

version 0.1.2 (25 Nov. 2019):

Possibility to download flags introduced.
get_toc_df, subset_toc_df added.

version 0.1.1 (21 Nov. 2019):

First official release.
