MARS: a tensor-based unified framework for large-scale data computation.

https://raw.githubusercontent.com/mars-project/mars/master/docs/source/images/mars-logo-title.png


Mars is a tensor-based unified framework for large-scale data computation that scales NumPy, pandas and scikit-learn.

Documentation, 中文文档 (Chinese documentation)

Installation

Mars can be installed with pip:

pip install pymars

To install the dependencies required by the distributed version, run:

pip install 'pymars[distributed]'

For now, the distributed version is only available on Linux and macOS.

Developer Install

If you want to contribute code to Mars, follow the instructions below to install Mars for development:

git clone https://github.com/mars-project/mars.git
cd mars
pip install -e ".[dev]"

More details about installing Mars can be found in the getting started section of the Mars documentation.

Mars tensor

Mars tensor provides a familiar interface like NumPy.

NumPy:

import numpy as np

N = 200_000_000
a = np.random.uniform(-1, 1, size=(N, 2))
print((np.linalg.norm(a, axis=1) < 1)
      .sum() * 4 / N)

3.14151712
CPU times: user 12.5 s, sys: 7.16 s, total: 19.7 s
Wall time: 21.8 s

Mars tensor:

import mars.tensor as mt

N = 200_000_000
a = mt.random.uniform(-1, 1, size=(N, 2))
print(((mt.linalg.norm(a, axis=1) < 1)
        .sum() * 4 / N).execute())

3.14161908
CPU times: user 17.5 s, sys: 3.56 s, total: 21.1 s
Wall time: 5.59 s

Mars can leverage multiple cores, even on a laptop, and can be even faster in a distributed setting.
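For readers unfamiliar with the trick above: the snippet estimates π by Monte Carlo sampling. Points are drawn uniformly from the square [-1, 1] × [-1, 1], and the fraction whose norm is below 1 lands inside the unit circle, so that fraction approaches π / 4. A tiny pure-Python sketch of the same idea (illustrative only, not part of Mars):

```python
import random

# Monte Carlo estimate of pi, as in the NumPy/Mars snippets above:
# sample points uniformly in [-1, 1] x [-1, 1]; the share landing
# inside the unit circle (x^2 + y^2 < 1) converges to pi / 4.
rng = random.Random(42)
n = 100_000
inside = sum(
    1 for _ in range(n)
    if rng.uniform(-1, 1) ** 2 + rng.uniform(-1, 1) ** 2 < 1
)
estimate = inside * 4 / n
print(estimate)  # close to 3.14
```

The NumPy and Mars tensor versions vectorize exactly this computation over 200 million points at once.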

Mars DataFrame

Mars DataFrame provides a familiar interface like pandas.

pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum())

CPU times: user 10.9 s, sys: 2.69 s, total: 13.6 s
Wall time: 11 s

Mars DataFrame:

import mars.tensor as mt
import mars.dataframe as md

df = md.DataFrame(
    mt.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum().execute())

CPU times: user 16.5 s, sys: 3.52 s, total: 20 s
Wall time: 3.6 s

Mars learn

Mars learn provides a familiar interface like scikit-learn.

scikit-learn:

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars learn:

from mars.learn.datasets import make_blobs
from mars.learn.decomposition import PCA

X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars remote

Mars remote allows users to execute functions in parallel.

Normal function calls:

import numpy as np

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [calc_chunk(n, i)
      for i in range(N // n)]
pi = calc_pi(fs, N)
print(pi)

3.1416312
CPU times: user 32.2 s, sys: 4.86 s, total: 37.1 s
Wall time: 12.4 s

Mars remote:

import numpy as np
import mars.remote as mr

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [mr.spawn(calc_chunk, args=(n, i))
      for i in range(N // n)]
pi = mr.spawn(calc_pi, args=(fs, N))
print(pi.execute().fetch())

3.1416312
CPU times: user 16.9 s, sys: 5.46 s, total: 22.3 s
Wall time: 4.83 s
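To clarify what mr.spawn is doing, the same chunked estimate can be parallelized with nothing but the standard library: each chunk becomes an independent task, and the partial counts are combined at the end. A minimal sketch using concurrent.futures (pure Python instead of NumPy, with a much smaller N, purely for illustration):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def calc_chunk(n, seed):
    # One independent task: count points falling inside the unit circle.
    rng = random.Random(seed)
    return sum(
        1 for _ in range(n)
        if rng.uniform(-1, 1) ** 2 + rng.uniform(-1, 1) ** 2 < 1
    )

N, n = 200_000, 10_000

with ThreadPoolExecutor() as pool:
    # Submit one task per chunk, then combine the partial counts.
    futures = [pool.submit(calc_chunk, n, i) for i in range(N // n)]
    pi = sum(f.result() for f in futures) * 4 / N

print(pi)  # close to 3.14
```

Unlike this local sketch, mr.spawn builds a lazy task graph that Mars can ship to remote workers, which is why the results are only materialized by execute().fetch().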

Eager Mode

Mars supports eager mode, which makes it friendly for development and easy to debug.

Users can enable eager mode through options; set the options at the beginning of the program or console session.

>>> from mars.config import options
>>> options.eager_mode = True

Or use a context.

>>> from mars.config import option_context
>>> with option_context() as options:
>>>     options.eager_mode = True
>>>     # the eager mode is on only for the with statement
>>>     ...

If eager mode is on, tensors, DataFrames, etc. will be executed immediately by the default session once they are created.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> from mars.config import options
>>> options.eager_mode = True
>>> t = mt.arange(6).reshape((2, 3))
>>> t
array([[0, 1, 2],
       [3, 4, 5]])
>>> df = md.DataFrame(t)
>>> df.sum()
0    3
1    5
2    7
dtype: int64

Easy to scale in and scale out

Mars can scale in to a single machine, and scale out to a cluster with thousands of machines. Both the local and distributed versions share the same piece of code, so it is fairly simple to migrate from a single machine to a cluster as data grows.

Running on a single machine includes thread-based scheduling and local cluster scheduling, which bundles the whole set of distributed components. Mars is also easy to scale out to a cluster by starting different components of the Mars distributed runtime on different machines in the cluster.
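The chunking idea behind both schedulers can be sketched in a few lines of plain Python (a toy model, not Mars internals): a large computation is split into fixed-size chunks, each chunk yields a partial result, and the partials are combined. Mars builds a graph of such chunk operations and hands it to whichever scheduler is in use.

```python
# Toy model of chunked execution: sum the integers below N by splitting
# the range into chunks, computing a partial sum per chunk, and combining
# the partials.  Each partial could run on a different thread or worker.
N, chunk_size = 1_000_000, 100_000
partials = [
    sum(range(start, min(start + chunk_size, N)))
    for start in range(0, N, chunk_size)
]
total = sum(partials)
print(total)  # equals sum(range(N))
```

In Mars, chunk_size plays the same role: it controls how a tensor or DataFrame is partitioned, and thus how much parallelism the scheduler can exploit.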

Threaded

The execute method will by default run on the thread-based scheduler on a single machine.

>>> import mars.tensor as mt
>>> a = mt.ones((10, 10))
>>> a.execute()

Users can create a session explicitly.

>>> from mars.session import new_session
>>> session = new_session()
>>> (a * 2).execute(session=session)
>>> # session2 will be released when the with statement exits
>>> with new_session() as session2:
>>>     (a / 3).execute(session=session2)

Local cluster

Users can start a local cluster bundled with the distributed runtime on a single machine. Local cluster mode requires the Mars distributed version.

>>> from mars.deploy.local import new_cluster

>>> # cluster will create a session and set it as default
>>> cluster = new_cluster()

>>> # run on the local cluster
>>> (a + 1).execute()

>>> # create a session explicitly by specifying the cluster's endpoint
>>> session = new_session(cluster.endpoint)
>>> (a * 3).execute(session=session)

Distributed

After installing the distributed version on every node in the cluster, a node can be selected as the scheduler and another as the web service, leaving the other nodes as workers. The scheduler can be started with the following command:

mars-scheduler -a <scheduler_ip> -p <scheduler_port>

The web service can be started with the following command:

mars-web -a <web_ip> -s <scheduler_endpoint> --ui-port <ui_port_exposed_to_user>

Workers can be started with the following command:

mars-worker -a <worker_ip> -p <worker_port> -s <scheduler_endpoint>

After all Mars processes are started, users can run:

>>> from mars.session import new_session
>>> import mars.tensor as mt
>>> sess = new_session('http://<web_ip>:<ui_port>')
>>> a = mt.ones((2000, 2000), chunk_size=200)
>>> b = mt.inner(a, a)
>>> b.execute(session=sess)

Getting involved

Thank you in advance for your contributions!
