
MARS: a tensor-based unified framework for large-scale data computation.

Project description


Mars is a tensor-based unified framework for large-scale data computation that scales NumPy, pandas, and scikit-learn.

Documentation (also available in Chinese: 中文文档)

Installation

Mars is easy to install with pip:

pip install pymars

To install the dependencies needed by the distributed version, use the command below:

pip install 'pymars[distributed]'

For now, the distributed version is only available on Linux and macOS.

Developer Install

To contribute code to Mars, follow the instructions below to install Mars for development:

git clone https://github.com/mars-project/mars.git
cd mars
pip install -e ".[dev]"

More details about installing Mars can be found in the getting started section of the Mars documentation.

Mars tensor

Mars tensor provides a familiar, NumPy-like interface.

Numpy

import numpy as np
N = 200_000_000
a = np.random.uniform(-1, 1, size=(N, 2))
print((np.linalg.norm(a, axis=1) < 1)
      .sum() * 4 / N)

3.14151712
CPU times: user 12.5 s, sys: 7.16 s, total: 19.7 s
Wall time: 21.8 s

Mars tensor

import mars.tensor as mt
N = 200_000_000
a = mt.random.uniform(-1, 1, size=(N, 2))
print(((mt.linalg.norm(a, axis=1) < 1)
       .sum() * 4 / N).execute())

3.14161908
CPU times: user 17.5 s, sys: 3.56 s, total: 21.1 s
Wall time: 5.59 s

Mars can leverage multiple cores, even on a laptop, and can be even faster in a distributed setting.
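Why this estimates π: points drawn uniformly from the square [-1, 1]² land inside the unit circle with probability π/4, so the hit fraction times 4 converges to π. A small-scale, NumPy-only sketch of the same idea (illustrative N and fixed seed, not from the original example):

```python
import numpy as np

# Sample points uniformly in the square [-1, 1] x [-1, 1]
rng = np.random.RandomState(42)
N = 1_000_000
a = rng.uniform(-1, 1, size=(N, 2))

# The fraction of points inside the unit circle approximates pi / 4
inside = (np.linalg.norm(a, axis=1) < 1).sum()
pi_estimate = inside * 4 / N
print(pi_estimate)  # close to 3.1416
```
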

Mars DataFrame

Mars DataFrame provides a familiar, pandas-like interface.

Pandas

import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum())

CPU times: user 10.9 s, sys: 2.69 s, total: 13.6 s
Wall time: 11 s

Mars DataFrame

import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(
    mt.random.rand(100000000, 4),
    columns=list('abcd'))
print(df.sum().execute())

CPU times: user 16.5 s, sys: 3.52 s, total: 20 s
Wall time: 3.6 s

Mars learn

Mars learn provides a familiar, scikit-learn-like interface.

Scikit-learn

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars learn

from mars.learn.datasets import make_blobs
from mars.learn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
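As a reminder of what the printed attributes mean: explained_variance_ is the variance of the data along each principal component, and explained_variance_ratio_ is each component's share of the total. A NumPy-only sketch of the same quantities via SVD (hypothetical toy data, not the make_blobs setup above):

```python
import numpy as np

# Anisotropic toy data: per-axis standard deviations 3.0, 1.0, 0.3
rng = np.random.RandomState(0)
X = rng.randn(1000, 3) * np.array([3.0, 1.0, 0.3])

# PCA via SVD of the centered data; singular values come out sorted descending
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_variance = s ** 2 / (len(X) - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()
print(explained_variance_ratio)  # the first component carries most of the variance
```
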

Mars remote

Mars remote allows users to execute functions in parallel.

Normal function calls

import numpy as np

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [calc_chunk(n, i)
      for i in range(N // n)]
pi = calc_pi(fs, N)
print(pi)

3.1416312
CPU times: user 32.2 s, sys: 4.86 s, total: 37.1 s
Wall time: 12.4 s

Mars remote

import numpy as np
import mars.remote as mr

def calc_chunk(n, i):
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    d = np.linalg.norm(a, axis=1)
    return (d < 1).sum()

def calc_pi(fs, N):
    return sum(fs) * 4 / N

N = 200_000_000
n = 10_000_000

fs = [mr.spawn(calc_chunk, args=(n, i))
      for i in range(N // n)]
pi = mr.spawn(calc_pi, args=(fs, N))
print(pi.execute().fetch())

3.1416312
CPU times: user 16.9 s, sys: 5.46 s, total: 22.3 s
Wall time: 4.83 s
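The spawn pattern of submitting chunked work and combining the results parallels the standard library's concurrent.futures. For comparison, here is the same chunked computation with a thread pool (a plain-Python sketch with a smaller N, not the Mars API):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def calc_chunk(n, i):
    # NumPy releases the GIL inside these calls, so threads can overlap
    rs = np.random.RandomState(i)
    a = rs.uniform(-1, 1, size=(n, 2))
    return (np.linalg.norm(a, axis=1) < 1).sum()

N = 2_000_000
n = 200_000

with ThreadPoolExecutor() as pool:
    counts = list(pool.map(calc_chunk, [n] * (N // n), range(N // n)))
pi = sum(counts) * 4 / N
print(pi)  # approximately 3.14
```
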

Eager Mode

Mars supports an eager mode that makes it friendly for development and easy to debug.

Users can enable eager mode through options, set at the beginning of the program or console session.

>>> from mars.config import options
>>> options.eager_mode = True

Or use a context.

>>> from mars.config import option_context
>>> with option_context() as options:
>>>     options.eager_mode = True
>>>     # the eager mode is on only for the with statement
>>>     ...

If eager mode is on, tensors, DataFrames, and so on will be executed immediately in the default session as soon as they are created.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> from mars.config import options
>>> options.eager_mode = True
>>> t = mt.arange(6).reshape((2, 3))
>>> t
array([[0, 1, 2],
       [3, 4, 5]])
>>> df = md.DataFrame(t)
>>> df.sum()
0    3
1    5
2    7
dtype: int64

Easy to scale in and scale out

Mars can scale in to a single machine, and scale out to a cluster with thousands of machines. The local and distributed versions share the same code, so it is fairly simple to migrate from a single machine to a cluster as data grows.

On a single machine, Mars offers both thread-based scheduling and local cluster scheduling, which bundles all of the distributed components. Mars is also easy to scale out to a cluster by starting the different components of the Mars distributed runtime on different machines in the cluster.

Threaded

The execute method will, by default, run on the thread-based scheduler on a single machine.

>>> import mars.tensor as mt
>>> a = mt.ones((10, 10))
>>> a.execute()

Users can create a session explicitly.

>>> from mars.session import new_session
>>> session = new_session()
>>> (a * 2).execute(session=session)
>>> # the session below will be released when the with block exits
>>> with new_session() as session2:
>>>     (a / 3).execute(session=session2)

Local cluster

Users can start a local cluster bundled with the distributed runtime on a single machine. Local cluster mode requires the Mars distributed version.

>>> from mars.deploy.local import new_cluster

>>> # cluster will create a session and set it as default
>>> cluster = new_cluster()

>>> # run on the local cluster
>>> (a + 1).execute()

>>> # create a session explicitly by specifying the cluster's endpoint
>>> session = new_session(cluster.endpoint)
>>> (a * 3).execute(session=session)

Distributed

After installing the distributed version on every node in the cluster, one node can be selected as the scheduler and another as the web service, leaving the other nodes as workers. The scheduler can be started with the following command:

mars-scheduler -a <scheduler_ip> -p <scheduler_port>

Web service can be started with the following command:

mars-web -a <web_ip> -s <scheduler_endpoint> --ui-port <ui_port_exposed_to_user>

Workers can be started with the following command:

mars-worker -a <worker_ip> -p <worker_port> -s <scheduler_endpoint>

After all Mars processes are started, users can run:

>>> sess = new_session('http://<web_ip>:<ui_port>')
>>> a = mt.ones((2000, 2000), chunk_size=200)
>>> b = mt.inner(a, a)
>>> b.execute(session=sess)

Getting involved

Thank you in advance for your contributions!
