pyfora - Compiled, parallel python
pyfora is the client package for Ufora_ - a compiled, automatically parallel Python for data science
and numerical computing.
Ufora achieves speed and scale by reasoning about your python code to compile
it to machine code (so it's fast) and find parallelism in it (so that it scales). The Ufora
runtime is fully fault tolerant, and handles all the details of data
management and task scheduling transparently to the user.
The Ufora runtime is invoked by enclosing code in a "ufora.remote" block. Code
and objects are shipped to a Ufora cluster and executed in parallel across
those machines. Results are then injected back into the host python
environment either as native python objects, or as handles (in case the
objects are very large). This allows you to pick the subset of your code that
will benefit from running in Ufora - the remainder can run in your regular
For all of this to work properly, Ufora places one major restriction on
the code that it runs: it must be "pure", meaning that it cannot modify data
structures or have side effects. This restriction allows the Ufora runtime to
agressively reorder calculations, which is crucial for
parallelism, and allows it to perform compile-time
optimizations than would not be possible otherwise. For more on the subset of python
that Ufora supports, see `python restrictions`_.
.. _python restrictions: https://ufora.github.io/ufora/documentation/python-restrictions.html
The pyfora client is a pure python package and can be installed by running:
pip install pyfora
Getting Started with Ufora
The ufora backend is available as a docker image that can be run locally on your machine, or in a
cluster of machines on a local network or in the cloud.
- `Getting started with local Ufora`_
- `Getting started with Ufora on AWS`_
- `Running Ufora on a local cluster`_
.. _Getting started with local Ufora: https://ufora.github.io/ufora/tutorials/getting-started-local.html
.. _Getting started with Ufora on AWS: https://ufora.github.io/ufora/tutorials/getting-started-aws.html
.. _Running Ufora on a local cluster: https://ufora.github.io/ufora/tutorials/getting-started-cluster.html
Pyfora is developed and maintained by the Ufora_ team. Find us on Github_.
.. _Distribute: http://pypi.python.org/pypi/distribute
.. _Ufora: https://ufora.github.io/ufora
.. _Github: https://github.com/ufora/ufora
* Release date Sep-09-2016
* [bug #279] Line number collisions can create PyObject rehydration errors
* [feature] Added pyfora.aws.Cluster for programmatically managing pyfora clusters
* Release date Aug-30-2016
* [bug] Fixed a sorting stability issue for objects that contain vectors
* [bug] Fixed a memory-management bug during large-scale sorting
* [bug] Fix a bug causing comparison of bools to produce invalid large-scale sort results
* [bug #275] Sorting doesn't do well on vectors with many equal values
* [feature] "print" statements now flow through pyfora
* [feature] Added support for time.time
* [bug #273] pyfora_aws fails to add/stop on-demand instances
* Release date Aug-22-2016
* [bug] fix a bug in pyfora sorting
* Release date Aug-15-2016
* [feature] use DistributedDataTasks sorting for large lists
* Release date June-27-2016
* [feature] add method to return number of workers in cluster
* [feature] separate ufora-worker log into main and "core"
* [feature] reduce verbosity of main ufora-worker log
* Release date June-3-2016
* [feature] add pyfora support for numpy.random.mtrand.RandomState
* [feature] rewrite the scheduler
* (various other backend bugfixes and stability improvements)
* Release date May-7-2016
* [feature] Added log rotation to ufora/service docker image
* [bug #201] `pyfora_aws stop` cancels unfulfilled spot instance requests
* [feature] Added --name option to pyfora_aws
* [bug #263] pyfora.connect hangs when given invalid host/port
* [feature] Added iPython/jupyter support
* Release date Apr-25-2016
* [bug #258] can't call base class __init__ if we have no data members.
* [bug #255] implementing __getitem__() doesn't make a class iterable
* [bug] defer more unconvertible values to runtime.
* Release date Apr-14-2016
* [feature] implement various numpy functions: abs, eigh, inv, eye,
* [feature] implement PyTuple `__lt__()`, `__gt__()`
* [feature #23] implement named-arg calling in pyfora
* [feature] improve performance of string -> int in multithreaded contexts
* [feature] speed up DistributedDataTasks
* [feature] various compiler optimizations
* [features] various pyfora_aws features:
* add a command to cycle all the workers, managers, and their logs
* tooling to extract expressions from ufora-worker.log on remote machines
* tooling to run htop on all pyfora workers
* [feature] Change pyfora download stream to memoize individual python objects
* [feature] make TrustRegionConjugateGradient solver the default logistic regression solver
* [feature] Ensure that lapack routines can't kill the process when they error out
* [bug #253] `pyfora_aws add` command is broken
* [bug] Fix a bug in VectorAxioms causing bad vector->string data
* [bug] Add some exception handlers to linalgModule.fora so that we catch fortran exceptions
* [bug] Fix a bug in syev.fora causing invalid FORTRAN calls
* Release date Mar-14-2016
* [feature] hook up numpy.all
* [feature] implement __int__(), __float__() for PyBool
* [feature] hook up numpy.isfinite
* [feature] add more support for numpy.isnan
* [feature] adding support for GPU math funtions: sin, cos, exp
* [bug] fixup class-specific overrides to __int__() and __float__()
* Release date Mar-10-2016
* [bug #229]: we assume that base class expressions in classDefs are just names
* [feature #197]: Implement PYFORAPATH environment variable
* [feature]: Supporting 64-bit logarithms on GPU
* [feature]: implementing matrix exponential for diagonalizable matrices
* [feature]: partial support for numpy.norm
* [feature]: add numpy.lstsq
* [bug #187]: pyfora `max`, `min` don't work on lists, tuples, or iterables
* [feature]: improved performance of string indexing and comparison
* [feature]: Migrate docs from gh-pages to sphinx docs.
* Release date Mar-4-2016
* [bug #203]: Avoid socketIO exception we were hitting
* [bug #205]: Ensure that we can pass Futures into with and submit blocks naturally
* [bug #206]: Ensure that we propagate S3 errors correctly in pyfora
* [bug #234]: Ensure we visit paged vectors in correct order
* [bug #221]: Class instances have consistent ordering for their members
* [bug #228]: Ensure mutually recursive objects have stable definitions
* [bug]: Fix some bugs moving large lists from server to client
* [feature]: Implement trust region congugate gradient solver for logistic regression
* [feature]: Bring back if(`split) model for dynamic parallelism
* [feature]: Starting a compiler cache
* [feature]: Preliminary features for GPU computing
* [feature]: Adding 'pyfora_aws deploy' command
* [feature]: Adding vpc, subnet, and security-group args for all pyfora_aws commands
* [enhancement]: Improved compiler performance
* [enhancement]: Improve error messages for accessing nonexistent S3 buckets
* [enhancement]: Improve withBlockExecutor behavior when passed futures containing exceptions
* [enhancement]: Raise the right kind of exception when we try to convert a "with" block
* [enhancement]: Ensure that hashes of Pyfora list objects are stable.
* Release date: Feb-24-2016
* [feature]: Supporting member initialization in base-class __init__ functions
* [feature]: Adding support for numpy.linalg.svd
* [bug #208]: Can't convert bound instance methods from base classes
*Release date: Feb-17-2016
* [feature #78]: Improved error reporting for untranslatable code
* [feature #133]: Initial support for object inheritance
* [enhancement]: New compiler implementation produces much more efficient code
* [enhancement]: Implementation of beta function better matches scipy
* speed up fora compiler
* speed up pyfora data upload time
* fix bug in hyp2f1
* hook up many more scipy/numpy special (math) functions
*Release date: Jan-27-2016
* Make scipy optional
*Release date: Jan-26-2016
* Add support for scipy.special.gamma and scipy.special.hyp2f1
*Release date: Jan-22-2016
* [bug #17]: Can’t call static methods on instances in fora, can in python
* [bug #83]: Possibly Uninitialized Variable Analysis cannot deal with complex data-flow
* [bug #107]: Bad error message when non-bound function gets too many call args
* [feature #124]: Implement `assert`
* [bug #134]: PyInt.fora doesn't have an implementation of __mod__
* [bug #138]: Dictionary comprehensions don't work
* [feature #153]: Read files from local file-system
* [feature #154]: Logistic regression in pyfora
* [feature #155]: Gradient-boosted trees in pyfora
* [feature #159]: Add 'add worker' command to pyfora_aws
* [bug #163]: pyfora_aws has problems if "ufora" security group is already created
* [feature #168]: No feedback in pyfora_aws when things go wrong on an instance
* [bug #170]: Confusing error message when client and server versions don't match
* [feature #172]: Operator Coalescing
* [bug #176]: `isinstance` bug
* [feature #179]: Inline fora in pyfora
*Release date: Dec-10-2015
* [feature] provide pyfora wrapper for scipy.special.beta
* [feature] provide pyfora wrapper for math.log
* [feature] perf improvements for mixin binding calculations.
*Release date: Dec-08-2015
* [bug #165]: Set good default value for EXTERNAL_DATASET_LOADER_SERVICE_THREADS.
* [bug #162]: pyfora_aws docs indicate that ec2 region is optional, but parameter is in fact required.
* [feature]: pyfora_aws should propagate AWS credentials.
* [bug #145]: Cannot access data in S3.
* [bug #144]: pyfora_aws raises exception when --num-instances is 1.
* [bug #140]: ufora-worker launched with pyfora_aws only uses 8GB of memory.
* [bug #136]: Collisions with pandas and numpy on case-insensitive file-systems.
* [bug #127]: Correctly propegating communication errors up to Executor.
* [feature]: Support @property decorator.
* [feature]: Improved download performance of large lists of small objects.
* [bug #122]: Wrong exception type from `list + non_list`.
* [bug #120]: Failure when trying to convert a list of mapped functions.
* [bug #119]: Can't convert bound instance methods.
* [bug #116]: Builtin "reduce" function is not parallelizable when applied over lists, xrange, etc.
* [bug #115]: Fixing __getitem__ for strings and tuples
* [bug #111]: Wrong exception when accessing unbound variables.
* [bug #110]: Incorrect conversion of class functions in user-defined classes.
* [bug #109]: list __getitem__ doesn't throw with step 0
* [feature]: Implement `map` builtin
* [feature]: Support `isinstance` on user-defined classes.
* [feature]: Add versioning scheme to socket.io protocol.
* [feature]: Add support for the python REPL.
* [bug #90]: Improved error message for unbound free variables.
* [bug #89]: Ctrl+C doesn't break out of `with` block.
* [bug #68]: Disallow `return` statements in pyfora `with` blocks.
* [bug #67]: tuple unpacking doesn't work
* [feature]: basic linear regression on data-frames
* [feature]: basic CSV parsing
* [feature]: basic data-frames
* [bug #59]: `sequence(0)` not iterable
* [bug #47]: int/float mismatch in `**` operator
* [bug #21]: certain python variables "survive" longer than fora values
* `def` order is important in non-module function definition (closures). If functions
`g()` and `h()` are defined inside of function `f` and `g()` calls `h()`, then `def h():` must
appear BEFORE `def g():`.
This also implies that mutually-recursive functions are only possible at module or class level.
* Class static methods cannot be used as values. They can be invoked, but it's not possible
to pass a class static method as an argument to another function.
* Named argument calls are not supported. If you have a function `def f(x):...` you can call it as
`f(42)` but you can't use `f(x=42)`.
* Keyword arguments are not supported.
* Class members can only be initialized inside of `__init__`. If `__init__` calls another function
that initializes members, those members will not be seen by pyfora.
* `return` statements not allowed in `__init__()`
* @classmethod decorator is not supported.
* No support for `*args`.
* `assert` is not implemented.
* Bad error message when using `self` inside of `__init__` for things other than setting or getting
members. For example, calling `str(self)` inside of `__init__` results in
"PythonToForaConversionError: An internal error occurred: we didn't provide a definition for the following variables: ['self'].
Most likely, there is a mismatch between our analysis of the python code and the generated FORA code underneath. Please file a bug report."
* No support for object inheritance.
*Release date: Nov-06-2015
* Initial release of pyfora!
* Includes support for core language features and builtin types.
* Some support for builtin functions like all, any, sum, etc.
* pyfora.aws module and pyfora_aws script help setup a Ufora cluster in EC2.
TODO: Brief introduction on what you do with files - including link to relevant help section.