Skip to main content

Tool to manipulate and aggregate data

Project description

Tomate

Tool to Manipulate and Aggregate data

PyPI version Release status

Tomate is a Python package that provides ways to manipulate data under the form of multi-dimensional arrays. It manages multiples variables, as well as the coordinates along which those variables vary. It also provides multiple convenience functions to retrieve subparts of the data, do simple computations, or plot the data.

The data can be retrieved from disk, where it can be arranged in multiple ways and formats. Information on the data, such as variable attributes, or coordinates values can be retrieved automatically.

Features

For data in memory:

  • Keep information about the data, the variables, the coordinates. All this information is in sync with the data.
  • Select subparts of data easily, by index or by value.
  • Support for date & time dimensions.
  • Use and create convenience function for analysis, plotting,...

For data on disk:

  • Load data that spans multiple files and comes from different sources easily. Different file format ? different structure: rows or columns first ? indexing origin lower or upper ? a varying number of time steps for each file ? This is now all a breeze !
  • Scan the files automatically to find values of coordinates, variables attributes, data indexing,...
  • Load only subparts of data.
  • Logs will ensure you are loading what you want to load.

And in general:

  • Highly modulable, can be tailored to your needs.
  • Fully documented.

Get started up with a couple of lines and a NetCDF file:

from tomate.scan_library.nc import scan_file
db = scan_file("/your_file.nc")

print(db)
db.load()

For a simple showcase of some of Tomate capabilities, take a look at the get_started notebook.

Documentation

Documentation is available online at ReadTheDocs

Warning

As of now, this only supports NetCDF files out of the box. But the package can be easily extended for other file formats. See the section 'Expanding Tomate' of the documentation.

Only tested for linux.

The code has not been extensively tested for all the possible use cases it supports, and is evolving quickly. I recommend you check thorougly in the logs that the correct files are opened, and that the correct slices of data are taken from thoses files. See the documentation on logging for more information.

Features supplied in 'data_write', that allow to save a database information in a json file to avoid re-scanning it each time, is to be considered very experimental, (and is currently heavily out-of-date).

Requirements

Tomate requires python >=3.7. Tomate requires the following python packages:

  • numpy

Optional dependencies:

  • [time] cftime>=1.1.3 - To manage dates in time coordinates
  • [netcdf] netcdf4-python - To open netCDF4 files
  • [plot] matplotlib - To create plots easily
  • [compute] scipy - To do various computation on the data

Install

The package is distributed through PyPI. To install, run:

pip install tomate-data

To add optional dependencies:

pip install tomate-data [feature name]

Feature name can be Time, NetCDF, Plot, Compute.

The code is evolving quickly, it is recommended to upgrade it regurlarly:

pip install --upgrade tomate-data

Or to even install it directly from the development branch. This will place the package files in ./src, from where you just have to do a git pull to update from the latest commit:

pip install -e git+https://github.com/Descanonge/tomate.git@develop#egg=tomate-data

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tomate-data-2.2.2.tar.gz (85.2 kB view hashes)

Uploaded Source

Built Distribution

tomate_data-2.2.2-py3-none-any.whl (105.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page