A PyCOMPSs library for Big Data scenarios.
Project description
The Distributed DataFrame Library (DDF) provides distributed algorithms and operations, ready to use as a library implemented over the PyCOMPSs programming model. Currently, it focuses on ETL (extract-transform-load) and machine learning algorithms for data science tasks. DDF is greatly inspired by Spark's DataFrame and its operators.
Currently, an operation can be of two types: transformation or action. Actions are operations that produce a final result (for example, saving to a file or displaying on screen). Transformations are operations that turn an input DDF into another output DDF. Besides this classification, operations are either single-stage or multi-stage: those with two or more processing stages are the ones that need to exchange information between partitions.
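The distinction between these kinds of operation can be illustrated with a small toy model. This is only a sketch in plain Python and does not use the DDF or PyCOMPSs APIs: the dataset is assumed to be a list of partitions (ordinary lists), and the function names `one_stage_filter`, `two_stage_sort` and `count_action` are hypothetical, chosen for this example.

```python
import heapq

# Toy model (not the DDF API): a dataset is a list of partitions.
partitions = [[3, 1, 2], [6, 5, 4]]


def one_stage_filter(parts, predicate):
    """One-stage transformation: each partition is processed on its own."""
    return [[x for x in part if predicate(x)] for part in parts]


def two_stage_sort(parts):
    """Two-stage transformation: partitions must exchange information.

    Stage 1 sorts each partition locally. Stage 2 merges the sorted runs
    and splits them again so the result is ordered across partitions.
    """
    local = [sorted(part) for part in parts]           # stage 1: local work
    merged = list(heapq.merge(*local))                 # stage 2: exchange
    size = max(1, -(-len(merged) // max(1, len(parts))))  # ceiling division
    return [merged[i:i + size] for i in range(0, len(merged), size)]


def count_action(parts):
    """Action: returns a final value instead of another dataset."""
    return sum(len(part) for part in parts)


evens = one_stage_filter(partitions, lambda x: x % 2 == 0)
ordered = two_stage_sort(evens)
total = count_action(ordered)
```

In this sketch the filter never looks outside its own partition, while the sort cannot produce a globally ordered result without combining data from every partition, which is what makes it a multi-stage operation.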
When running DDF operations or algorithms, a context variable (COMPSs Context) checks for possible optimizations during the scheduling of COMPSs tasks. These optimizations are of two kinds: grouping one-stage operations into a single COMPSs task, and stacking operations until an action operation is found.
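The two optimizations described here (stacking operations until an action, and grouping one-stage operations into a single task) can be sketched with a toy lazy pipeline. This is an assumption-laden illustration in plain Python, not DDF's actual implementation: the class `LazyPipeline`, its methods, and the `tasks_launched` counter are hypothetical, and a "task" is simulated by an ordinary function call per partition.

```python
class LazyPipeline:
    """Toy lazy pipeline (hypothetical, not the DDF API)."""

    tasks_launched = 0  # simulated count of scheduled tasks

    def __init__(self, partitions, pending=()):
        self.partitions = partitions
        self.pending = tuple(pending)  # stacked one-stage operations

    def map(self, fn):
        # Transformation: only recorded, nothing is executed yet.
        return LazyPipeline(self.partitions, self.pending + (("map", fn),))

    def filter(self, fn):
        # Transformation: only recorded, nothing is executed yet.
        return LazyPipeline(self.partitions, self.pending + (("filter", fn),))

    def _fused_task(self, part):
        # All stacked one-stage operations run inside one simulated task.
        LazyPipeline.tasks_launched += 1
        out = part
        for kind, fn in self.pending:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def collect(self):
        # Action: triggers execution of everything stacked so far.
        results = [self._fused_task(part) for part in self.partitions]
        return [x for part in results for x in part]


pipeline = (
    LazyPipeline([[1, 2, 3], [4, 5, 6]])
    .map(lambda x: x * 10)
    .filter(lambda x: x > 20)
)
tasks_before_action = LazyPipeline.tasks_launched
result = pipeline.collect()
tasks_after_action = LazyPipeline.tasks_launched
```

With two partitions and two stacked transformations, the sketch launches two simulated tasks rather than four, because the map and the filter are grouped into one task per partition and nothing runs until `collect` is called.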