Skip to main content

Keeps a local data repository up to date with different remote data sources.

Project description


Periodically retrieve data from different sources.

The databird package only provides a framework to plan and run the tasks needed to keep a local data-file-store up do date with various remote sources. The remote sources can be anything (e.g. FTP Server, ECMWF, HTTP Api, SQL database, ...), as long as there is a databird-driver available for the specific source.


Databird is configured with configuration files and invoked by

$ databird retrieve -c /etc/databird/databird.conf

# or (as the above is the default)
$ databird retrieve

You can store the configuration files anywhere and for example run the above command periodically as cron job.

Also, some rq workers are required:

$ rq worker databird

This will start one worker. You should use a supervisor to start multiple workers.


The following example configuration defines a repository, which is populated with daily GNSS data from

The main configuration file (usually databird.conf) could look like that:

  root: /data/repos # root path for data repositories
  num-workers: 16   # max number of async workers
  include: "databird.conf.d/*.conf"  # include config files

Generally you can configure anything in any file, as all configuration files are merged to one configuration tree. The include option is an exception, as it can only be declared in the top config file.

Then in databird.conf.d/cddis.conf you can configure a profile and a repository:

    driver: standard.FtpDriver
      user: anonymous
      password: ""
      tls: False

    description: Data from NASAs Archive of Space Geodesy Data
    profile: nasa_cddis
    period: 1 day
    delay: 2 days
    start: 2019-01-01
      status: "{time:%Y}/cddis_gnss_{iso_date}.status"
      user: anonymous  # this could override 'user' from profile
      root: "/gnss/data/daily"
        status: "{time:%Y}/{time:%j}/{time:%y%j}.status"

When calling databird with this configuration the following is achieved:

  • A repository in the folder /data/repos/nasa_gnss/ is created
  • For every day, a file like 2019/nasa_gnss_2019-01-20.status is expected
  • If that file is missing, retrieve it from
  • If there are many files missing, the data is retrieved asynchronously

This example used the standard.FTPDriver.


Use databird webmonitor [PORT] to start the web interface.

Since databird uses RQ for managing jobs, you also check the options at RQ/docs/monitoring.


Anyone can write drivers (see below). Currently, the following drivers are available:


  • standard.FilesystemDriver: Retrieve data from the local filesystem
  • standard.CommandDriver: Run an arbitrary shell command
  • standard.FtpDriver: Retrieve data from an FTP server


  • climate.EcmwfDriver: Retrieve data from the European Centre for Medium-Range Weather Forecasts (ECMWF) via their API
  • climate.C3SDriver: Retrieve data from the Copernicus Climate Change Service (C3S) via their API
  • climate.GesDiscDriver: Retrieve data from the NASA EarthData GES DISC service.


  1. Create a Python environment and activate it
    $ python3 -m venv . && source bin/activate
  2. Install the development environment:
    (databird) $ pip install -r requirements-dev.txt

Writing a new driver

Drivers are published in a namespace package databird-drivers. Everyone can develop drivers and share them.

Install databird and run mr.bob to create a new driver package:

(databird) $ cd $HOME/projects
(databird) $ python -m mrbob.cli databird.blueprints:driver

After answering some questions, a new directory databird-driver-<chosen_name> is created. Lets asume <chosen_name> = foo, then your driver is usually implemented in databird/drivers/foo/ in a class named FooDriver(). Until more documentation is available, you have to look at the code to figure out how to write a driver.

Other people will be able to use it with driver: foo.FooDriver.

Tell me if you wrote a new driver, so I can include it in the list.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for databird, version 0.7.1
Filename, size File type Python version Upload date Hashes
Filename, size databird-0.7.1.tar.gz (13.5 kB) File type Source Python version None Upload date Hashes View
Filename, size databird-0.7.1-py3-none-any.whl (16.3 kB) File type Wheel Python version py3 Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page