Pythagoras

Planet-scale distributed computing in Python.

!!! RESEARCH PREVIEW !!!

What is it?

Pythagoras is a super-scalable, easy-to-use, and low-maintenance framework for (1) massive algorithm parallelization and (2) hardware usage optimization in Python. It simplifies and speeds up data science, machine learning, and AI workflows.

Pythagoras excels at complex, long-running, resource-demanding computations. It’s not recommended for real-time, latency-sensitive workflows.

For a comprehensive list of terms and definitions, see the Glossary.

Tutorials

Pythagoras elevates two popular techniques — memoization and parallelization — to a global scale and then fuses them, unlocking performance and scalability that were previously out of reach.

Drawing from many years of functional-programming practice, Pythagoras extends these proven ideas to the next level. In a Pythagoras environment, you can seamlessly employ your preferred functional patterns, augmented by new capabilities.

!!! BOOKMARK THIS PAGE AND COME BACK LATER, WE WILL PUBLISH MORE TUTORIALS SOON !!!

Usage Examples

Importing Pythagoras:

from pythagoras.core import *
import pythagoras as pth

Creating a portal based on a (shared) folder:

my_portal = get_portal("./my_local_folder")

Checking the state of a portal:

my_portal.describe()

Decorating a function:

@pure()
def my_long_running_function(a: float, b: float) -> float:
    from time import sleep  # imports must be placed inside a pure function
    sleep(5)
    return a + 10 * b

Using a decorated function synchronously:

result = my_long_running_function(a=1, b=2) # only named arguments are allowed

Using a decorated function asynchronously:

future_result_address = my_long_running_function.swarm(a=10, b=20)
if ready(future_result_address):
    result = get(future_result_address)
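
The submit / check / fetch shape of this example resembles futures in Python's standard library. The sketch below is a local analogy only, not Pythagoras code: `pool.submit` stands in for `.swarm(...)`, `Future.done()` for `ready(...)`, and `Future.result()` for `get(...)`. The names `slow_sum` and `run_in_background` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_sum(a: float, b: float) -> float:
    """Stand-in for a long-running computation (hypothetical example)."""
    time.sleep(0.1)
    return a + 10 * b


def run_in_background(a: float, b: float) -> float:
    """Submit work, poll until it is done, then fetch the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_sum, a=a, b=b)  # analogous to .swarm(...)
        while not future.done():                  # analogous to ready(...)
            time.sleep(0.01)                      # other work could happen here
        return future.result()                    # analogous to get(...)


print(run_in_background(a=10, b=20))  # 210
```

One difference matters: a standard-library future lives only inside the current process, whereas the address returned by `.swarm(...)` is described under Core Concepts as remaining valid across restarts and machines.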

Pre-conditions for executing a function:

@pure(pre_validators=[
    unused_ram(Gb=5),
    installed_packages("scikit-learn", "pandas"),
    unused_cpu(cores=10)])
def my_long_running_function(a: float, b: float) -> float:
    from time import sleep
    sleep(5)
    return a + 10 * b
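
To show the general idea of a pre-condition, here is a minimal sketch in plain Python that runs a list of checks before calling a function. It is not how Pythagoras implements validators: `with_pre_checks` and `packages_available` are hypothetical names, and the sketch simply raises when a check fails, which may differ from what Pythagoras does.

```python
import functools
import importlib.util
from typing import Callable


def packages_available(*names: str) -> Callable[[], bool]:
    """Build a check that passes only if every named top-level module is importable."""
    def check() -> bool:
        return all(importlib.util.find_spec(name) is not None for name in names)
    return check


def with_pre_checks(*checks: Callable[[], bool]):
    """Hypothetical decorator: run every check before calling the function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):  # keyword arguments only, mirroring the examples above
            for check in checks:
                if not check():
                    raise RuntimeError(f"pre-condition failed for {fn.__name__}")
            return fn(**kwargs)
        return wrapper
    return decorator


@with_pre_checks(packages_available("json", "math"))
def weighted_sum(a: float, b: float) -> float:
    return a + 10 * b


print(weighted_sum(a=1, b=2))  # 21
```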

Recursion:

@pure(pre_validators=[recursive_parameters("n")])
def factorial(n: int) -> int:
    if n <= 1:  # also covers n == 0, so the recursion always terminates
        return 1
    else:
        return n * factorial(n=n-1)  # only named arguments are allowed

Partial function application:

@pure()
def my_map(input_list: list, transformer: PureFn) -> list:
    result = []
    for element in input_list:
        transformed_element = transformer(x=element)
        result.append(transformed_element)
    return result

@pure()
def my_square(x):
    return x * x

result = my_map(input_list=[1,2,3,4,5], transformer=my_square)

my_square_map = my_map.fix_kwargs(transformer = my_square)

result = my_square_map(input_list=[1,2,3,4,5])
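
For readers who know the standard library, `fix_kwargs` looks similar in spirit to `functools.partial` used with keyword arguments. The sketch below assumes that similarity and uses plain, undecorated functions, so it shows only the argument-fixing behaviour and none of the caching or distribution.

```python
from functools import partial


def my_map(input_list: list, transformer) -> list:
    """Apply transformer to every element, passing it as the keyword x."""
    return [transformer(x=element) for element in input_list]


def my_square(x):
    return x * x


# Fix the transformer argument once; only input_list remains to be supplied.
my_square_map = partial(my_map, transformer=my_square)

print(my_square_map(input_list=[1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```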

Mutually recursive functions:

@pure(pre_validators=recursive_parameters("n"))
def is_even(n: int, is_odd, is_even) -> bool:
    if n in {0, 2}:
        return True
    else:
        return is_odd(n=n-1, is_even=is_even, is_odd=is_odd)

@pure(pre_validators=recursive_parameters("n"))
def is_odd(n: int, is_even, is_odd) -> bool:
    if n in {0, 2}:
        return False
    else:
        return is_even(n=n-1, is_odd=is_odd, is_even=is_even)

(is_even, is_odd) = (
    is_even.fix_kwargs(is_odd=is_odd, is_even=is_even),
    is_odd.fix_kwargs(is_odd=is_odd, is_even=is_even),
)

assert is_even(n=10)
assert is_odd(n=11)

Core Concepts

  • Portal: A persistent gateway that connects your application to the world beyond the current execution session. Portals link runtime state to persistent storage that survives across multiple runs and machines. They provide a unified interface for data persistence, caching, and state management, abstracting away storage backend complexities (local filesystem, cloud storage, etc.) and handling serialization transparently. A program can use multiple portals, each with its own storage backend, and each portal can serve multiple applications. Portals define the execution context for pure functions, enabling result caching and retrieval.

  • Autonomous Function: A self-contained function with no external dependencies. All imports must be done inside the function body. These functions cannot use global objects (except built-ins), yield statements, or nonlocal variables, and must be called with keyword arguments only. This design ensures complete isolation and portability, making autonomous functions ideal building blocks for distributed computing—they carry all dependencies with them and maintain clear interfaces.

  • Pure Function: An autonomous function that has no side effects and always returns the same result for the same arguments. Pythagoras caches pure function results using content-based addressing: if a function is called multiple times with identical arguments, it executes only once, and cached results are returned for subsequent calls. This memoization works seamlessly across machines in a distributed system, enabling significant performance improvements for computationally intensive workflows.

  • Validator: An autonomous function that checks conditions before or after executing a pure function. Pre-validators run before execution, post-validators run after. Validators can be passive (e.g., check available RAM) or active (e.g., install a missing library). They help ensure reliable distributed execution by validating requirements and system state. Multiple validators can be combined using standard decorator syntax.

  • Value Address: A globally unique, content-derived address for an immutable value. It consists of a human-readable descriptor (based on type and shape/length) and a hash signature (SHA-256) split into parts for storage efficiency. Creating a ValueAddr(data) computes the content hash and stores the value in the active portal's storage, allowing later retrieval via the address. Value addresses identify stored results and reference inputs/outputs across distributed systems.

  • Execution Result Address: A Value Address representing the result of a pure function execution. It combines the function's signature with input parameters to create a unique identifier. In swarm mode, functions immediately return an Execution Result Address that acts as a "future" reference for checking execution status and retrieving results. These addresses remain valid across application restarts and can be shared between machines.

  • Swarming: An asynchronous execution model where you don't know when, where, or how many times your function will execute. Pythagoras guarantees eventual execution at least once but offers no further guarantees. This model maximizes flexibility by decoupling function calls from execution—functions can be queued, load-balanced, retried on failure, and parallelized automatically. The trade-off is reduced control over timing and location in exchange for improved scalability, fault tolerance, and resource utilization.

For a complete list of terms and detailed definitions, see the Glossary.
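
The caching idea behind pure functions and content-derived addresses can be illustrated with a small standard-library sketch. It is an illustration under simplifying assumptions, not the Pythagoras implementation: an in-memory dict stands in for a portal's storage, the function's qualified name stands in for its full signature, and `address_of`, `memoized`, and `slow_square` are hypothetical names.

```python
import functools
import hashlib
import pickle
from typing import Any, Callable

_store: dict = {}   # stand-in for a portal's persistent storage
executions: list = []  # records each real execution, for demonstration


def address_of(fn: Callable, kwargs: dict) -> str:
    """Derive a SHA-256 address from the function name and its keyword arguments."""
    payload = pickle.dumps((fn.__qualname__, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def memoized(fn: Callable) -> Callable:
    """Execute fn only when no result is stored under the call's address."""
    @functools.wraps(fn)
    def wrapper(**kwargs: Any):
        addr = address_of(fn, kwargs)
        if addr not in _store:
            _store[addr] = pickle.dumps(fn(**kwargs))
        return pickle.loads(_store[addr])
    return wrapper


@memoized
def slow_square(x: int) -> int:
    executions.append(x)  # side effect used only to count real executions
    return x * x


first = slow_square(x=4)
second = slow_square(x=4)  # served from the store, not recomputed
print(first, second, len(executions))  # 16 16 1
```

Because the address depends only on the function and its arguments, any process that can reach the same storage could look up the result without running the function again. That property is what makes repeated or duplicated execution harmless for pure functions.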

How to get it?

The source code is hosted on GitHub at: https://github.com/pythagoras-dev/pythagoras

Installers for the latest released version are available on the Python Package Index (PyPI): https://pypi.org/project/pythagoras

Using uv:

uv add pythagoras

Using pip (legacy alternative to uv):

pip install pythagoras
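
One way to confirm that the installation worked is to query the installed distribution's metadata from Python. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if pythagoras is installed, otherwise None.
print(installed_version("pythagoras"))
```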

Project Statistics

Metric                         Main code   Unit Tests    Total
Lines Of Code (LOC)                11181        15736    26917
Source Lines Of Code (SLOC)         4524         9785    14309
Classes                               53           48      101
Functions / Methods                  479         1286     1765
Files                                 62          224      286

Contributing

Interested in contributing to Pythagoras? Please see our Contributing Guidelines.

About The Name

Pythagoras of Samos was a famous ancient Greek thinker and scientist who, according to tradition, was the first person to call himself a philosopher ("lover of wisdom"). He is best known for the mathematical results attributed to him, including the Pythagorean theorem.

Less well known is that, in antiquity, Pythagoras was also credited with major astronomical discoveries, such as the sphericity of the Earth and the identity of the morning and evening stars as the planet Venus.
