apibackuper: a command-line tool and Python library for backing up APIs
apibackuper is a command-line tool to archive/back up API calls. Its goal is to download all data behind a REST API and archive it to local storage. The tool is designed to make backing up API data as simple as possible.
1 History
This tool was developed to optimize backup/archival procedures for Russian government information from the E-Budget portal budget.gov.ru and some other government IT systems. Examples of tool usage can be found in the “examples” directory.
2 Main features
Supports any iterative GET/POST API
Estimates the time required to back up an API
Stores data inside a ZIP container
Supports export of backup data as a JSON lines file
Documentation
Test coverage
3 Installation
3.1 Linux
Most Linux distributions provide a package that can be installed using the system package manager, for example:
# Debian, Ubuntu, etc.
$ apt install apibackuper
# Fedora
$ dnf install apibackuper
# CentOS, RHEL, ...
$ yum install apibackuper
# Arch Linux
$ pacman -S apibackuper
3.2 Windows, etc.
A universal installation method (that works on Windows, Mac OS X, Linux, …, and always provides the latest version) is to use pip:
# Make sure we have an up-to-date version of pip and setuptools:
$ pip install --upgrade pip setuptools
$ pip install --upgrade apibackuper
(If pip installation fails for some reason, you can try easy_install apibackuper as a fallback.)
3.3 Python version
Python version 3.6 or greater is required.
4 Quickstart
This example backs up the list of Russian certificate authorities. The list is published at e-trust.gosuslugi.ru and is available via an undocumented API.
$ apibackuper create etrust
$ cd etrust
Edit apibackuper.cfg as follows:
[settings]
initialized = True
name = etrust
[project]
description = E-Trust UC list
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page
[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page
[data]
total_number_key = total
data_key = data
item_key = РеестровыйНомер
change_key = СтатусАккредитации.ДействуетС
[storage]
storage_type = zip
Add a file params.json with the parameters sent with POST requests:
{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,"searchString":null,"cities":null,"software":null,"cryptToolClasses":null,"statuses":null}
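To make the role of the [params] section and params.json more concrete, here is a minimal Python sketch of how one POST body per page could be derived from them. It is an illustration under assumptions, not apibackuper's actual code: the helper name page_payloads is hypothetical, and the sketch assumes that the page number and page size fields of params.json are overridden on every request and that pages are numbered from 1.

```python
import json
import math

# Contents of params.json from the quickstart.
BASE_PARAMS = json.loads(
    '{"page":1,"orderBy":"id","ascending":false,"recordsOnPage":100,'
    '"searchString":null,"cities":null,"software":null,'
    '"cryptToolClasses":null,"statuses":null}'
)


def page_payloads(base, total, page_number_param="page",
                  page_size_param="recordsOnPage", page_size=100):
    """Yield one POST body per page (hypothetical helper, pages start at 1)."""
    pages = math.ceil(total / page_size)
    for number in range(1, pages + 1):
        body = dict(base)  # copy so the template stays untouched
        body[page_number_param] = number
        body[page_size_param] = page_size
        yield body


# 502 is the record count reported in the quickstart's estimate output.
payloads = list(page_payloads(BASE_PARAMS, total=502))
print(len(payloads), payloads[-1]["page"])
```

The parameter names passed to the helper mirror page_number_param and page_size_param from the config, so a different API only needs different values there.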
Execute the “estimate” command to see how long data collection will take and how much space is needed:
$ apibackuper estimate full
Output:
Total records: 502
Records per request: 100
Total requests: 6
Average record size 32277.96 bytes
Estimated size (json lines) 16.20 MB
Avg request time, seconds 66.9260
Estimated all requests time, seconds 402.8947
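The figures in this output can be roughly reproduced by hand. The Python sketch below is inferred from the sample output only, not taken from the tool's source: it assumes the request count is the record count divided by the page size, rounded up, that the size is records times average record size, and that 1 MB means 1,000,000 bytes. The function name estimate is hypothetical.

```python
import math


def estimate(total_records, page_size, avg_record_bytes, avg_request_seconds):
    """Return (requests, size in MB, total seconds) for a full backup."""
    requests = math.ceil(total_records / page_size)
    size_mb = total_records * avg_record_bytes / 1_000_000
    total_seconds = requests * avg_request_seconds
    return requests, size_mb, total_seconds


# Numbers copied from the output above.
requests, size_mb, total_seconds = estimate(502, 100, 32277.96, 66.9260)
print(requests, round(size_mb, 2), round(total_seconds, 1))
```

The request count and size match the output above. The simple product for the total time comes out slightly below the reported 402.8947 seconds, so the tool presumably computes that figure a little differently.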
Execute the “run” command to collect the data. The result is stored in “storage.zip”:
$ apibackuper run full
Export the data from storage and save it as a JSON lines file called “etrust.jsonl”:
$ apibackuper export jsonl etrust.jsonl
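A JSON lines file holds one JSON document per line, so the export can be read back with the standard library alone. The sketch below is a minimal reader, not part of apibackuper; the helper name read_jsonl is hypothetical, and the demo uses two made-up records written to a temporary file so that it runs without a real export.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one decoded record per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with made-up records; point read_jsonl at "etrust.jsonl" for real data.
sample = [{"РеестровыйНомер": "1"}, {"РеестровыйНомер": "2"}]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False,
                                 encoding="utf-8") as tmp:
    for record in sample:
        tmp.write(json.dumps(record, ensure_ascii=False) + "\n")

records = list(read_jsonl(tmp.name))
os.remove(tmp.name)
print(len(records))
```

Because the reader is a generator, a large export can be processed record by record without loading the whole file into memory.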
5 Config options
Example config file
[settings]
initialized = True
name = <name>
[project]
description = <description>
url = <url>
http_mode = GET
work_modes = full,incremental,update
iterate_by = page
[params]
page_size_param = <page size param>
page_size_limit = <page size limit>
page_number_param = <page number>
[data]
total_number_key = <total number key>
data_key = <data key>
item_key = <item key>
change_key = <change key>
[storage]
storage_type = zip
compression = True
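The config file uses INI-style sections and key = value pairs, so it can be inspected with Python's standard configparser module. The sketch below only shows how the values can be read from a script. Whether apibackuper itself parses the file this way is an assumption, and the inline sample combines values from the quickstart with the compression option shown above.

```python
import configparser

SAMPLE_CFG = """
[settings]
initialized = True
name = etrust

[project]
url = https://e-trust.gosuslugi.ru/app/scc/portal/api/v1/portal/ca/list
http_mode = POST
work_modes = full,incremental,update
iterate_by = page

[params]
page_size_param = recordsOnPage
page_size_limit = 100
page_number_param = page

[storage]
storage_type = zip
compression = True
"""

# interpolation=None keeps any '%' characters in URLs literal.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE_CFG)

http_mode = config.get("project", "http_mode")
work_modes = config.get("project", "work_modes").split(",")
page_size = config.getint("params", "page_size_limit")
compression = config.getboolean("storage", "compression", fallback=False)
print(http_mode, work_modes, page_size, compression)
```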
5.1 settings
name - short name of the project
5.2 project
description - text that explains what this project is for
url - API endpoint url
http_mode - one of HTTP modes: GET or POST
work_modes - supported types of operation: full - archive everything, incremental - add new records only, update - collect changed data only
iterate_by - how records are iterated: ‘page’ (the default) goes page by page, ‘range’ is used if a skip value is provided
5.3 params
page_size_param - parameter with page size
page_size_limit - limit of records provided by API
page_number_param - parameter with page number
5.4 data
total_number_key - key in data with total number of records
data_key - key in data with list of records
item_key - key in data with the unique identifier of the record. Can be a group of keys separated by commas
change_key - key in data that indicates that the record has changed. Can be a group of keys separated by commas
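To illustrate how the four [data] keys relate to an API response, here is a small Python sketch. It rests on assumptions rather than on the tool's source: the response is taken to be a JSON object holding a total and a list of records, a dotted key such as the quickstart's change_key is taken to address a nested field, and the helper names get_path and classify as well as the sample records are made up.

```python
def get_path(record, dotted_key):
    """Follow a dotted key such as 'a.b' through nested dicts; None if missing."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def classify(response, known, data_key="data", item_key="id",
             change_key="status.since"):
    """Split record identifiers into new and changed ones.

    `known` maps an item identifier to the change value seen last time.
    """
    new, changed = [], []
    for record in response[data_key]:
        ident = get_path(record, item_key)
        stamp = get_path(record, change_key)
        if ident not in known:
            new.append(ident)
        elif known[ident] != stamp:
            changed.append(ident)
    return new, changed


# Made-up response: "total" plays the role of total_number_key.
response = {"total": 3, "data": [
    {"id": "A", "status": {"since": "2020-01-01"}},
    {"id": "B", "status": {"since": "2021-06-01"}},
    {"id": "C", "status": {"since": "2022-03-15"}},
]}
known = {"A": "2020-01-01", "B": "2019-01-01"}
new, changed = classify(response, known)
print(response["total"], new, changed)
```

In this picture, an incremental run would care about the new identifiers and an update run about the changed ones.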
5.5 storage
storage_type - type of local storage. ‘zip’, a local ZIP file, is the default
compression - if True, a compressed ZIP file is used: less disk space, but more CPU time spent processing data
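The space versus CPU trade-off behind the compression option can be seen with Python's standard zipfile module. This sketch does not reproduce apibackuper's storage layout: the member name page_1.json and the sample payload are made up, and it simply compares a stored (uncompressed) archive with a deflated one.

```python
import io
import json
import zipfile


def archive_size(payload, compression):
    """Return the size in bytes of an in-memory ZIP holding `payload`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr("page_1.json", payload)  # hypothetical member name
    return len(buffer.getvalue())


# Repetitive JSON, typical of API listings, compresses well.
payload = json.dumps([{"id": i, "status": "active"} for i in range(1000)])
stored = archive_size(payload, zipfile.ZIP_STORED)
deflated = archive_size(payload, zipfile.ZIP_DEFLATED)
print(stored, deflated)
```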
6 Usage
Synopsis:
$ apibackuper [flags] [command] inputfile
See also apibackuper --help.
6.1 Examples
Create project “budgettofk”:
$ apibackuper create budgettofk
Estimate execution time for the ‘budgettofk’ project. The command should be run in the project directory, or the project directory should be passed via the -p parameter:
$ apibackuper estimate full -p budgettofk
Output:
Total records: 12282
Records per request: 500
Total requests: 25
Average record size 1293.60 bytes
Estimated size (json lines) 15.89 MB
Avg request time, seconds 1.8015
Estimated all requests time, seconds 46.0536
Run the project. The command should be run in the project directory, or the project directory should be passed via the -p parameter:
$ apibackuper run full
Export data from a project. The command should be run in the project directory, or the project directory should be passed via the -p parameter:
$ apibackuper export jsonl hhemployers.jsonl -p hhemployers
7 Advanced
TBD