Api for getting data from dane.gov.pl
danegovpl
Tool for getting data from dane.gov.pl
Installation
pip install danegovpl
Usage
CLI
usage: __main__.py [-h] [-v] [-d DIR] [-t NUM] [-l LVL] [-f FORMAT] [-w TIME]
[-W TIME] [-r NUM] [--retry-delay TIME]
[--retry-all-errors] [-m TIMEOUT] [-k] [-L]
[--max-redirs NUM] [-A UA] [-x PROXY] [-H HEADER]
[-b COOKIE] [-B BROWSER]
[RESOURCE ...]
Tool for getting data from dane.gov.pl
positional arguments:
RESOURCE starting point for getting resources i.e.
institutions, institution.{ID}, datasets,
dataset.{ID}, resources, resource.{ID}
General:
-h, --help Show this help message and exit
-v, --version Print program version and exit
Files:
-d, --directory DIR Change directory to DIR
Settings:
-t, --threads NUM use NUM of threads
-l, --lvl LVL Get resources metadata up to level
-f, --format FORMAT Download files in specified format preference i.e.
all; jsonld; csv; xlsx, csv,jsonld,xls (if not set,
files are not downloaded)
Request settings:
-w, --wait TIME Set waiting time for each request
-W, --wait-random TIME
Set random waiting time for each request to be from 0
to TIME
-r, --retry NUM Set number of retries for failed request to NUM
--retry-delay TIME Set interval between each retry
--retry-all-errors Retry no matter the error
-m, --timeout TIMEOUT
Set request timeout, if in TIME format it'll be set
for the whole request. If in TIME,TIME format first
TIME will specify connection timeout, the second read
timeout. If set to '-' timeout is disabled
-k, --insecure Ignore ssl errors
-L, --location Allow for redirections, can be dangerous if
credentials are passed in headers
--max-redirs NUM Set the maximum number of redirections to follow
-A, --user-agent UA Sets custom user agent
-x, --proxy PROXY Use the specified proxy, can be used multiple times.
If set to URL it'll be used for all protocols, if in
PROTOCOL URL format it'll be set only for given
protocol, if in URL URL format it'll be set only for
given path. If first character is '@' then proxies are
read from file
-H, --header HEADER Set curl style header, can be used multiple times e.g.
-H 'User: Admin' -H 'Pass: 12345', if first character
is '@' then headers are read from file e.g. -H @file
-b, --cookie COOKIE Set curl style cookie, can be used multiple times e.g.
-b 'auth=8f82ab' -b 'PHPSESSID=qw3r8an829', without
'=' character argument is read as a file
-B, --browser BROWSER
Get cookies from specified browser e.g. -B firefox
dane.gov.pl groups its data as a tree whose levels are, from the root down: institution, dataset, resource.
Get metadata for all institutions and the datasets and resources published by them
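The depth limit set with `--lvl` can be pictured as slicing this three-level tree from the chosen starting point. The sketch below is only a model of that behaviour as the usage examples describe it, not code from danegovpl; `LEVELS` and `levels_fetched` are hypothetical names.

```python
from typing import List, Optional

# Node types of the dane.gov.pl tree, from the root downwards.
LEVELS = ["institution", "dataset", "resource"]


def levels_fetched(start: str, lvl: Optional[int] = None) -> List[str]:
    """Return the node types visited when starting at `start`.

    `lvl` counts levels from the starting point; None means "go to the leaves".
    """
    remaining = LEVELS[LEVELS.index(start):]
    return remaining if lvl is None else remaining[:lvl]


print(levels_fetched("institution"))         # all three levels
print(levels_fetched("institution", lvl=2))  # institutions and their datasets
print(levels_fetched("dataset", lvl=1))      # datasets only
```

In this model the level is counted relative to the starting point, which matches `danegovpl datasets --lvl 1` returning datasets without their resources.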
danegovpl institutions
This is also equivalent to
danegovpl institutions --lvl 3
Get metadata using 8 threads
danegovpl institutions -t 8
Get metadata for all institutions
danegovpl institutions --lvl 1
Get metadata for all institutions and the datasets published by them
danegovpl institutions --lvl 2
Get metadata for a specific institution and the datasets and resources published by it
danegovpl institution.2522
Get metadata for all datasets and the resources under them
danegovpl datasets
Get metadata for a specific dataset
danegovpl dataset.6935
Get metadata for all datasets
danegovpl datasets --lvl 1
Get metadata for all resources
danegovpl resources
Get metadata for a specific resource
danegovpl resource.3814
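A RESOURCE argument is either a plural collection name or a singular name followed by `.{ID}`. As a rough illustration of that grammar (not the parser danegovpl actually uses; `parse_resource` is a hypothetical helper), the split could look like this:

```python
from typing import Optional, Tuple

COLLECTIONS = {"institutions", "datasets", "resources"}
ITEMS = {"institution", "dataset", "resource"}


def parse_resource(spec: str) -> Tuple[str, Optional[int]]:
    """Split a RESOURCE argument into (kind, id); id is None for collections."""
    if spec in COLLECTIONS:
        return spec, None
    kind, sep, raw_id = spec.partition(".")
    if sep and kind in ITEMS and raw_id.isdecimal():
        return kind, int(raw_id)
    raise ValueError(f"unrecognised RESOURCE: {spec!r}")


print(parse_resource("institutions"))      # ('institutions', None)
print(parse_resource("institution.2522"))  # ('institution', 2522)
```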
Get all metadata and download all resource files using 8 threads
danegovpl institutions -t 8 -f all
Get metadata for all resources and download only csv files using 8 threads
danegovpl institutions -t 8 -f csv
Get metadata for all resources and download csv files or jsonld files if csv files aren't available
danegovpl institutions -t 8 -f csv,jsonld
Get metadata for all resources and download csv files or jsonld files or xlsx files, while compressing csv and jsonld files with zstd
danegovpl institutions -t 8 -f csv,jsonld,xlsx
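The comma-separated value of `-f` is described above as an ordered preference: the first listed format that a resource offers is the one downloaded. A minimal sketch of that selection rule follows; `pick_formats` is a hypothetical helper, and treating `all` as "every available format" is an assumption based on the help text rather than on the tool's source.

```python
from typing import List


def pick_formats(available: List[str], preference: str) -> List[str]:
    """Choose which of a resource's file formats to download."""
    if preference == "all":
        return list(available)
    # Walk the preference list in order and stop at the first match.
    for fmt in preference.split(","):
        if fmt in available:
            return [fmt]
    return []  # nothing the caller asked for is offered


print(pick_formats(["xlsx", "jsonld"], "csv,jsonld"))  # ['jsonld']
print(pick_formats(["csv", "jsonld"], "csv,jsonld"))   # ['csv']
```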
Output example
Output examples can be found in the examples directory. They are an excerpt taken from running
danegovpl institutions
This illustrates all provided formats. Starting from datasets or resources instead would create a single directory with thousands of subdirectories in it.
Library
Code
from danegovpl import Api, Error, ArgError, RequestError

api = Api(timeout=30)  # arguments for treerequests can be passed

try:
    # Iterate over result pages, starting from page 2
    for datasets in api.datasets(page=2, params=[("title[prefix]", "imiona")]):
        for dataset in datasets['data']:
            print(dataset['id'])
except RequestError as e:
    print(repr(e))
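Because the paging methods yield one page per iteration, a loop over them can request far more than is needed. A small helper like the hypothetical `iter_ids` below, which assumes only the page shape used above (a dict with a `data` list of items carrying an `id`), flattens pages into items so that the total can be capped. The stand-in pages keep the example runnable offline.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def iter_ids(pages: Iterable[Dict[str, Any]]) -> Iterator[Any]:
    """Yield the id of every item on every page, in order."""
    for page in pages:
        for item in page["data"]:
            yield item["id"]


# Stand-in for api.datasets(...); real pages come from the network.
fake_pages = iter([
    {"data": [{"id": 1}, {"id": 2}]},
    {"data": [{"id": 3}]},
])

first_two = list(islice(iter_ids(fake_pages), 2))
print(first_two)  # [1, 2]
```

If the iterator returned by the library is lazy, `islice` stops consuming it once the cap is reached, so later pages are never requested.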
Exceptions
All exceptions raised by this library are derived from Error. ArgError is raised when functions are called with incorrect arguments, and RequestError is raised for errors that occur while handling requests.
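Since both specific exceptions share the Error base, callers can handle them separately or together. The classes below are local stand-ins that only mirror the hierarchy described above, so that the snippet runs without the package installed; with danegovpl available they would be imported from it instead. `describe_failure` is a hypothetical helper.

```python
# Local stand-ins mirroring the documented hierarchy (assumption: the real
# classes live in danegovpl and behave like ordinary exception classes).
class Error(Exception):
    pass


class ArgError(Error):
    pass


class RequestError(Error):
    pass


def describe_failure(exc: Exception) -> str:
    """Map an exception to a short label, most specific class first."""
    if isinstance(exc, ArgError):
        return "bad arguments"
    if isinstance(exc, RequestError):
        return "request failed"
    if isinstance(exc, Error):
        return "other library error"
    return "unrelated error"


for exc in (ArgError("page must be >= 1"), RequestError("timeout"), KeyError("id")):
    try:
        raise exc
    except Error as caught:      # catches ArgError and RequestError alike
        print(describe_failure(caught))
    except Exception as caught:  # anything not raised by the library
        print(describe_failure(caught))
```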
Api
The Api class provides methods for interacting with dane.gov.pl. At initialization it accepts parameters for the treerequests session.
Methods
Methods are named similarly to the endpoints; some names were changed from plural to singular form to denote an operation on a single item.
All of them accept an optional argument params: List[Tuple[str, str]], which represents the parameters passed in the URL query. It is done this way because the parameters aren't always consistent and allow for expressions that are not easily representable in Python code. If you know what you need, you can add them manually (protip: the https://dane.gov.pl/ site uses its own API for its requests, so the params can be taken from requests made by it, e.g. in searches).
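A list of tuples maps naturally onto a query string: keys may repeat and may contain characters such as brackets that are not valid in Python keyword names. The sketch below uses only the standard library to show what such a list looks like once encoded. How danegovpl serialises `params` internally is an assumption, and the `sort` entry is illustrative rather than taken from the API documentation.

```python
from urllib.parse import parse_qsl, urlencode

params = [
    ("title[prefix]", "imiona"),  # same filter as in the library example
    ("sort", "-created"),         # illustrative only
]

# Brackets are percent-encoded; the order of the tuples is preserved.
query = urlencode(params)
print(query)  # title%5Bprefix%5D=imiona&sort=-created

# Decoding a query string copied from the browser gives the list back.
print(parse_qsl(query) == params)  # True
```

Going the other way, `parse_qsl` is a convenient way to turn a query string observed in the browser's network tab into a `params` list.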
dga_aggregated(self, i_id: int, params: List[Tuple[str, str]] = []) -> dict
Returns data about the aggregated DGA resource, in particular resource_id and dataset_id
Methods for items
The following methods take i_id: int, the ID of the element
institution(self, i_id: int, params: List[Tuple[str, str]] = []) -> dict
Returns institution with given ID
dataset(self, i_id: int, params: List[Tuple[str, str]] = []) -> dict
Returns dataset with given ID
resource(self, i_id: int, params: List[Tuple[str, str]] = []) -> dict
Returns resource with given ID
resource_data_row(self, i_id: int, row_id: int, params: List[Tuple[str]] = []) -> str
Returns single row
showcase(self, i_id: int, params: List[Tuple[str]] = []) -> dict
Returns showcase with given ID
history(self, i_id: int, params: List[Tuple[str]] = []) -> dict
Returns history item with given ID
Methods for pages
The following methods take page: int = 1 and per_page: int = 100, denoting the starting page and the number of results per page, and return an iterator yielding pages starting from page
institutions(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search for institutions
institution_datasets(self, i_id: int, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search for datasets of given institution
datasets(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search datasets
dataset_resources(self, i_id: int, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search for resources of given dataset
dataset_showcases(self, i_id: int, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search for showcases of given dataset
resources(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search resources
resource_data(self, i_id: int, params: List[Tuple[str, str]] = [], page=1, per_page=100) -> Iterator[dict]
Returns list of rows
search(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to filter and search objects of various types: articles, datasets, institutions, resources, showcases
showcases(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search showcases
histories(self, params: List[Tuple[str, str]] = [], page: int = 1, per_page: int = 100) -> Iterator[dict]
Gives the ability to browse, filter and search histories
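The paging methods can be thought of as generators that keep requesting the next page until the results run out. The following is a minimal sketch of that pattern, not the library's implementation: `fetch_page` is a hypothetical stand-in for a single HTTP request, and stopping on the first empty `data` list is an assumption about the termination condition.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def paginate(fetch_page: Callable[[int, int], Page],
             page: int = 1, per_page: int = 100) -> Iterator[Page]:
    """Yield pages starting from `page` until an empty one is returned."""
    while True:
        result = fetch_page(page, per_page)
        if not result["data"]:
            return
        yield result
        page += 1


# In-memory stand-in for the remote endpoint: 250 items in total.
ITEMS: List[int] = list(range(250))


def fake_fetch(page: int, per_page: int) -> Page:
    start = (page - 1) * per_page
    return {"data": ITEMS[start:start + per_page]}


sizes = [len(p["data"]) for p in paginate(fake_fetch, page=2)]
print(sizes)  # [100, 50]
```

Starting at page 2 skips the first 100 items, which is why only two pages are produced here.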