Generic framework for running data pipelines
image::./logo/logo.png[Sisyphus silhouette]
== Sísifo - Task runner
Sísifo is the Spanish form of Sisyphus, in ancient Greek: Σίσυφος. This poor
guy was punished for his self-aggrandizing craftiness and deceitfulness by
being forced to roll an immense boulder up a hill only for it to roll down
every time it neared the top, repeating this action for eternity. More
information on https://en.wikipedia.org/wiki/Sisyphus[Wikipedia].
This poor library is doomed to an eternity of performing tasks with no other
purpose in its pitiful and miserable life. I hope you won't make fun of this
insignificant library; our own existence is not much more encouraging...
=== How does it work?
Essentially, Sísifo is just a library that allows you to run tasks on a data
collection. Therefore, the most important classes of the library are:
* `sisifo.DataCollection`. A DataCollection is like a dictionary. Use a key to
store/retrieve any kind of value from a data collection. The values stored in a
data collection are called **entities**.
* `sisifo.Task`. A task is a class with a `run(data_collection)` method that,
usually, modifies the entities in a data collection.
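To make the two concepts concrete without depending on the library, here is a
minimal stand-in sketch. `MiniCollection`, `MiniTask` and `Double` are
hypothetical names invented for this illustration; they are not sisifo's
classes and say nothing about how sisifo implements them. The sketch only
assumes what is described above: a dictionary-like container and a class
exposing `run(data_collection)`.

```python
from collections import UserDict


class MiniCollection(UserDict):
    """Hypothetical stand-in for a data collection: a dict of named entities."""


class MiniTask:
    """Hypothetical stand-in for a task: anything with a run(data_collection) method."""

    def run(self, data_collection):
        raise NotImplementedError


class Double(MiniTask):
    """Example task that doubles every entity in the collection."""

    def run(self, data_collection):
        # Only existing keys are reassigned, so iterating while updating is safe.
        for key in data_collection:
            data_collection[key] *= 2


data = MiniCollection(a=1, b=5)
Double().run(data)
print(dict(data))  # {'a': 2, 'b': 10}
```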
Let's dive into an example. The first step is to import the core of the library.
It's as simple as:
[source,python]
----
import sisifo
----
You can access all the relevant classes from the core of sisifo just with one
import. Everything else is optional, an extension of the core.
Let's create our first data collection with a couple of entities.
[source,python]
----
data = sisifo.DataCollection()
data["entity1"] = 1
data["entity2"] = 2
----
As you can see, a data collection has the same interface as a dictionary.
Try to use `keys()`, `items()` or `<str> in data`:
[source,python]
----
data.keys() # KeysView({'entity1': 1, 'entity2': 2})
data.items() # ItemsView({'entity1': 1, 'entity2': 2})
"entity1" in data # True
"entity3" in data # False
----
Nothing fancy so far, huh? Just a dictionary.
Imagine you want to add 1 to `entity1`. You can do something like
`data["entity1"] += 1`, or you can use a `sisifo.Task` for this.
[source,python]
----
class AddOne(sisifo.Task):
    def run(self, data):
        data["entity1"] += 1
----
On the one hand we have the data (the `data` variable) and, on the other hand,
we have an operation defined inside a class (the `AddOne` class). To run the
task over a concrete data collection, we call the `run` method on an instance
of the class:
[source,python]
----
task = AddOne()
print(data) # {'entity1': 1, 'entity2': 2}
task.run(data)
print(data) # {'entity1': 2, 'entity2': 2}
----
We wrote a really specific transformation: it only works for one entity name,
`entity1`. What if we also want to reuse the task for adding one to `entity2`?
Instead of hard-coding the entity name in the `run` method, we can pass it to
the constructor and store it as an attribute of the `AddOne` instance.
[source,python]
----
class AddOne(sisifo.Task):
    def __init__(self, entity, **kwargs):
        # This is needed to initialize the superclass sisifo.Task.
        super().__init__(**kwargs)
        self.entity = entity

    def run(self, data):
        data[self.entity] += 1
----
Now we can reuse the same task on different entities:
[source,python]
----
data = sisifo.DataCollection()
data["entity1"] = 1
data["entity2"] = 2
task1 = AddOne("entity1")
task2 = AddOne("entity2")
print(data) # {'entity1': 1, 'entity2': 2}
task1.run(data)
task2.run(data)
print(data) # {'entity1': 2, 'entity2': 3}
----
Instead of running all tasks one by one we can use a pipeline. A pipeline is
an extension of the core sisifo code, so we need to import the common namespace
and create the task from there:
[source,python]
----
import sisifo.namespaces.common as common_tasks
data = sisifo.DataCollection()
data["entity1"] = 1
data["entity2"] = 2
pipe = common_tasks.Pipeline([
    AddOne("entity1"),
    AddOne("entity2"),
])
print(data) # {'entity1': 1, 'entity2': 2}
pipe.run(data)
print(data) # {'entity1': 2, 'entity2': 3}
----
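Conceptually, a pipeline is itself a task whose `run` calls each child task in
order on the same data collection. The following stand-in illustrates that idea
in plain Python. `MiniPipeline` and this `AddOne` are hypothetical and do not
depend on sisifo; the real `Pipeline` may differ in its details.

```python
class AddOne:
    """Hypothetical task: adds one to a single named entity."""

    def __init__(self, entity):
        self.entity = entity

    def run(self, data):
        data[self.entity] += 1


class MiniPipeline:
    """Hypothetical stand-in for a pipeline: runs its tasks in order."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def run(self, data):
        for task in self.tasks:
            task.run(data)


data = {"entity1": 1, "entity2": 2}  # a plain dict stands in for a data collection
pipe = MiniPipeline([
    AddOne("entity1"),
    AddOne("entity2"),
    MiniPipeline([AddOne("entity1")]),  # in this sketch a pipeline can nest, since it has run() too
])
pipe.run(data)
print(data)  # {'entity1': 3, 'entity2': 3}
```

Because the stand-in pipeline exposes the same `run(data)` method as any other
task, it can be placed wherever a task is expected.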
Can we easily read the definition of this pipeline from a configuration file? Yes!
sisifo has a decorator that registers a class in the sisifo task registry. Once
a class is registered, you can create instances of it dynamically, that is, by
reading the name of the class from a string instead of calling the specific
class, like: `sisifo.create_task(dict(task="AddOne", entity="entity1"))`.
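The registration mechanism can be pictured as a dictionary from class names to
classes, filled in by a decorator. The sketch below is only an illustration
under that assumption: `register_task`, `TASK_REGISTRY` and this `create_task`
are hypothetical stand-ins written for this example, not sisifo's actual
decorator or function. Only the shape of the configuration,
`dict(task="AddOne", entity="entity1")`, is taken from the text above.

```python
TASK_REGISTRY = {}  # hypothetical registry: class name -> class


def register_task(cls):
    """Hypothetical decorator: record the class under its own name."""
    TASK_REGISTRY[cls.__name__] = cls
    return cls


def create_task(config):
    """Build a task from a dict: 'task' picks the class, the rest are kwargs."""
    params = dict(config)  # copy so the caller's dict is not modified
    name = params.pop("task")
    try:
        cls = TASK_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown task: {name!r}") from None
    return cls(**params)


@register_task
class AddOne:
    def __init__(self, entity):
        self.entity = entity

    def run(self, data):
        data[self.entity] += 1


data = {"entity1": 1, "entity2": 2}
task = create_task(dict(task="AddOne", entity="entity1"))
task.run(data)
print(data)  # {'entity1': 2, 'entity2': 2}
```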
explain concepts of namespaces... TO BE CONTINUED.
== Can Sísifo do X?
If you are wondering whether or not Sísifo can do something, think about the
answer to this other question: *Can Python do X?* If you can do it with Python,
it can be done using Sísifo. Maybe not out of the box with the existing tasks,
but you can certainly do it after some development.
Sísifo is just a way of calling functions one after another. I think the main
advantages of the Sísifo approach are that you are more or less forced to split
your code into small reusable pieces (tasks) and that you can run those tasks
just by reading them from a configuration file.
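As a rough picture of what "running tasks from a configuration file" can look
like, here is a plain-Python sketch that does not use sisifo at all. The JSON
layout, the `AVAILABLE_TASKS` table, the `Multiply` task and the `build_tasks`
helper are all made up for this example; sisifo's own configuration format may
look different.

```python
import json


class AddOne:
    def __init__(self, entity):
        self.entity = entity

    def run(self, data):
        data[self.entity] += 1


class Multiply:
    def __init__(self, entity, factor):
        self.entity = entity
        self.factor = factor

    def run(self, data):
        data[self.entity] *= self.factor


# Hypothetical lookup table from the names used in the config to task classes.
AVAILABLE_TASKS = {"AddOne": AddOne, "Multiply": Multiply}

# In practice this text would come from a file; the layout is made up for this sketch.
CONFIG_TEXT = """
[
    {"task": "AddOne", "entity": "entity1"},
    {"task": "Multiply", "entity": "entity1", "factor": 10},
    {"task": "AddOne", "entity": "entity2"}
]
"""


def build_tasks(config_text):
    """Turn the JSON list into task objects, one per entry."""
    tasks = []
    for entry in json.loads(config_text):
        params = dict(entry)
        cls = AVAILABLE_TASKS[params.pop("task")]
        tasks.append(cls(**params))
    return tasks


data = {"entity1": 1, "entity2": 2}
for task in build_tasks(CONFIG_TEXT):
    task.run(data)
print(data)  # {'entity1': 20, 'entity2': 3}
```

Each task stays small and reusable, and changing the order or the parameters
only means editing the configuration text, not the Python code.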