MARS: a tensor-based unified framework for large-scale data computation.

Project description


Mars is a tensor-based unified framework for large-scale data computation that scales NumPy, pandas, and scikit-learn.

Documentation (English), Documentation (Chinese, 中文文档)

Installation

Mars is easy to install with pip:

pip install pymars

To install the dependencies needed by the distributed version, use the command below:

pip install 'pymars[distributed]'

For now, the distributed version is only available on Linux and macOS.

Developer Install

To contribute code to Mars, follow the instructions below to install Mars for development:

git clone https://github.com/mars-project/mars.git
cd mars
pip install -e ".[dev]"

More details about installing Mars can be found in the getting started section of the Mars documentation.

Mars tensor

Mars tensor provides a familiar interface like NumPy.

Numpy

import numpy as np
N = 200_000_000
a = np.random.uniform(-1, 1, size=(N, 2))
print((np.linalg.norm(a, axis=1) < 1)
      .sum() * 4 / N)

3.14151712
CPU times: user 12.5 s, sys: 7.16 s, total: 19.7 s
Wall time: 21.8 s

Mars tensor

import mars.tensor as mt
N = 200_000_000
a = mt.random.uniform(-1, 1, size=(N, 2))
print(((mt.linalg.norm(a, axis=1) < 1)
        .sum() * 4 / N).execute())

3.14161908
CPU times: user 17.5 s, sys: 3.56 s, total: 21.1 s
Wall time: 5.59 s

Mars can leverage multiple cores, even on a laptop, and can be even faster in a distributed setting.
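Under the hood, Mars splits a large computation into chunks and schedules them in parallel. As a rough illustration of that idea, the Monte Carlo π estimate above can be chunked by hand; this is a pure-Python sketch using only the standard library, not Mars internals:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_inside(n, seed):
    # Count how many of n random points fall inside the unit circle.
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.uniform(-1, 1) ** 2 + rng.uniform(-1, 1) ** 2 < 1)

def approx_pi(total, chunks=8):
    # Split the work into independent chunks and combine the partial counts.
    n = total // chunks
    with ThreadPoolExecutor() as pool:
        hits = sum(pool.map(count_inside, [n] * chunks, range(chunks)))
    return hits * 4 / (n * chunks)

print(approx_pi(400_000))  # roughly 3.14
```

Mars automates this kind of chunking for tensors (the chunk_size argument shown later controls the partitioning) and can place the chunks on multiple cores or machines.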

Mars DataFrame

Mars DataFrame provides a familiar interface like pandas.

Pandas

import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum())

CPU times: user 10.9 s, sys: 2.69 s, total: 13.6 s
Wall time: 11 s

Mars DataFrame

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(
    mt.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum().execute())

CPU times: user 16.5 s, sys: 3.52 s, total: 20 s
Wall time: 3.6 s

Mars learn

Mars learn provides a familiar interface like scikit-learn.

Scikit-learn

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars learn

from mars.learn.datasets import make_blobs
from mars.learn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars remote

Mars remote allows users to execute functions in parallel.

Normal function calls

import numpy as np

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [calc_chunk(n, i)
      for i in range(N // n)]
pi = calc_pi(fs, N)
print(pi)

3.1416312
CPU times: user 32.2 s, sys: 4.86 s, total: 37.1 s
Wall time: 12.4 s

Mars remote

import numpy as np
import mars.remote as mr

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [mr.spawn(calc_chunk, args=(n, i))
      for i in range(N // n)]
pi = mr.spawn(calc_pi, args=(fs, N))
print(pi.execute().fetch())

3.1416312
CPU times: user 16.9 s, sys: 5.46 s, total: 22.3 s
Wall time: 4.83 s

Eager Mode

Mars supports an eager mode, which makes it friendly for development and easy to debug.

Users can enable eager mode through options; set the option at the beginning of the program or console session.

>>> from mars.config import options
>>> options.eager_mode = True

Or use a context.

>>> from mars.config import option_context
>>> with option_context() as options:
>>>     options.eager_mode = True
>>>     # the eager mode is on only for the with statement
>>>     ...

If eager mode is on, a tensor, DataFrame, etc. is executed immediately by the default session once it is created.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> from mars.config import options
>>> options.eager_mode = True
>>> t = mt.arange(6).reshape((2, 3))
>>> t
array([[0, 1, 2],
       [3, 4, 5]])
>>> df = md.DataFrame(t)
>>> df.sum()
0    3
1    5
2    7
dtype: int64
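When eager mode is off (the default), operations only build an expression graph, and nothing is computed until execute() is called. The following toy sketch of this deferred-evaluation pattern is illustrative only, not Mars internals:

```python
class Deferred:
    """A node in a tiny expression graph; nothing runs until execute()."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs

    def __add__(self, other):
        # Building the expression records the operation but does not run it.
        return Deferred(lambda a, b: a + b, self, other)

    def execute(self):
        # Recursively evaluate the graph, then apply this node's operation.
        args = [node.execute() for node in self.inputs]
        return self.func(*args)

def constant(value):
    return Deferred(lambda: value)

expr = constant(1) + constant(2) + constant(3)
# The graph was built lazily; evaluation happens only here:
print(expr.execute())  # 6
```

Deferring execution this way is what lets Mars optimize and parallelize the whole graph at once; eager mode simply triggers execution right after each operation is created.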

Easy to scale in and scale out

Mars can scale in to a single machine and scale out to a cluster with thousands of machines. The local and distributed versions share the same code, so it is fairly simple to migrate from a single machine to a cluster as data grows.

Running on a single machine includes thread-based scheduling and local cluster scheduling, which bundles all of the distributed components. Mars is also easy to scale out to a cluster by starting the different components of the Mars distributed runtime on different machines in the cluster.

Threaded

The execute method will, by default, run on the thread-based scheduler on a single machine.

>>> import mars.tensor as mt
>>> a = mt.ones((10, 10))
>>> a.execute()

Users can create a session explicitly.

>>> from mars.session import new_session
>>> session = new_session()
>>> (a * 2).execute(session=session)
>>> # session2 is released when the with block exits
>>> with new_session() as session2:
>>>     (a / 3).execute(session=session2)

Local cluster

Users can start a local cluster bundled with the distributed runtime on a single machine. Local cluster mode requires the Mars distributed version.

>>> from mars.deploy.local import new_cluster

>>> # cluster will create a session and set it as default
>>> cluster = new_cluster()

>>> # run on the local cluster
>>> (a + 1).execute()

>>> # create a session explicitly by specifying the cluster's endpoint
>>> session = new_session(cluster.endpoint)
>>> (a * 3).execute(session=session)

Distributed

After installing the distributed version on every node in the cluster, a node can be selected as the scheduler and another as the web service, leaving the other nodes as workers. The scheduler can be started with the following command:

mars-scheduler -a <scheduler_ip> -p <scheduler_port>

The web service can be started with the following command:

mars-web -a <web_ip> -s <scheduler_endpoint> --ui-port <ui_port_exposed_to_user>

Workers can be started with the following command:

mars-worker -a <worker_ip> -p <worker_port> -s <scheduler_endpoint>

After all Mars processes are started, users can run:

>>> sess = new_session('http://<web_ip>:<ui_port>')
>>> a = mt.ones((2000, 2000), chunk_size=200)
>>> b = mt.inner(a, a)
>>> b.execute(session=sess)

Getting involved

Thank you in advance for your contributions!
