Spark extension for processing large-scale 3D data sets

Project description

The package is under active development!

Latest News

  • [05/2018] GSoC 2018: spark3D has been selected for the Google Summer of Code (GSoC) 2018. Congratulations to @mayurdb, who will work on the project this year!
  • [06/2018] Release: version 0.1.0, 0.1.1
  • [07/2018] New location: spark3D is an official project of AstroLab Software!
  • [07/2018] Release: version 0.1.3, 0.1.4, 0.1.5
  • [08/2018] Release: version 0.2.0, 0.2.1 (pyspark3d)
  • [09/2018] Release: version 0.2.2
  • [11/2018] Release: version 0.3.0, 0.3.1 (new DataFrame API)

Rationale

spark3D should be viewed as an extension of the Apache Spark framework, and more specifically the Spark SQL module, focusing on the manipulation of three*-dimensional data sets.

Why would you use spark3D? If you often need to repartition large spatial 3D data sets, or perform spatial queries (neighbour search, window queries, cross-match, clustering, ...), spark3D is for you. It contains optimised classes and methods to do so, and it spares you the implementation time! In addition, these extensions make it possible to visualise large data sets efficiently, by quickly building a representation of your data set (see more here).
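To make the vocabulary concrete, here is a minimal single-machine sketch of two of the queries mentioned above: a window query (here a spherical window) and a k-nearest-neighbour search. It is plain Python, not the spark3D or pyspark3d API; the function names `window_query` and `k_nearest` are hypothetical and chosen for illustration only. The brute-force scans below are exactly the kind of work that becomes impractical on large data sets without partitioning.

```python
import math

def window_query(points, center, radius):
    """Return the points lying within `radius` of `center` (spherical window)."""
    return [p for p in points if math.dist(p, center) <= radius]

def k_nearest(points, target, k):
    """Return the k points closest to `target`, nearest first (brute force)."""
    return sorted(points, key=lambda p: math.dist(p, target))[:k]

# Toy 3D data set: (x, y, z) tuples.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 3.0)]

inside = window_query(points, center=(0.0, 0.0, 0.0), radius=1.5)
nearest = k_nearest(points, target=(0.9, 0.1, 0.0), k=2)

print(inside)
print(nearest)
```

Both functions scan every point, so their cost grows linearly (or worse, with the sort) in the size of the data set.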

spark3D exposes two APIs: Scala (spark3D) and Python (pyspark3d). The core developments are done in Scala, and interfaced with Python using the great py4j package. This means pyspark3d might not contain all the features present in spark3D. In addition, due to differences between Scala and Python, there might be subtle differences between the two APIs.

While we try to stick to the latest Apache Spark developments, spark3D started with the RDD API and slowly migrated to the DataFrame API. This process left a huge imprint on the code structure, and low-level layers in spark3D often still use RDDs to manipulate the data. Do not be surprised if things are moving: the package is under active development, but we try to keep the user interface as stable as possible!

Last but not least: spark3D is by no means complete, and you are welcome to suggest changes, report bugs or inconsistent implementations, and contribute directly to the package!

Cheers, Julien

* Why 3? Because there are already plenty of very good packages dealing with 2D data sets (e.g. geospark, geomesa, magellan, GeoTrellis, and others), but they are not suitable for many applications, such as those in astronomy!

Installation and tutorials

Scala

You can link spark3D to your project (either spark-shell or spark-submit) by specifying the coordinates:

spark-submit --packages "com.github.astrolabsoftware:spark3d_2.11:0.3.0"
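The coordinates follow the usual Maven layout `groupId:artifactId:version`, and the `_2.11` suffix on the artifact name is the Scala binary version the artifact was built for, so it should match the Scala version of your Spark installation. The short sketch below only takes the string apart to show its pieces; `parse_coordinates` is a hypothetical helper written for this illustration, not part of spark3D or Spark.

```python
def parse_coordinates(coordinates):
    """Split 'groupId:artifactId:version' into its labelled parts."""
    group_id, artifact_id, version = coordinates.split(":")
    # Scala artifacts conventionally end with '_<scala binary version>'.
    name, sep, scala_version = artifact_id.rpartition("_")
    if not sep:  # no suffix: plain (non cross-built) artifact name
        name, scala_version = artifact_id, None
    return {
        "group_id": group_id,
        "name": name,
        "scala_version": scala_version,
        "version": version,
    }

parts = parse_coordinates("com.github.astrolabsoftware:spark3d_2.11:0.3.0")
print(parts)
```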

Python

Just run

pip install pyspark3d

Note that we release the assembly JAR with it.
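After installing, you may want to confirm which version ended up in your environment. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8) and does not import pyspark3d itself; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

found = installed_version("pyspark3d")
print(f"pyspark3d {found}" if found else "pyspark3d is not installed")
```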

More information

See our website!

Contributors

  • Julien Peloton (peloton at lal.in2p3.fr)
  • Christian Arnault (arnault at lal.in2p3.fr)
  • Mayur Bhosale (mayurdb31 at gmail.com) -- GSoC 2018.

Contributing to spark3D: see CONTRIBUTING.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyspark3d-0.3.1.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

pyspark3d-0.3.1-py3-none-any.whl (28.1 kB view details)

Uploaded Python 3

File details

Details for the file pyspark3d-0.3.1.tar.gz.

File metadata

  • Download URL: pyspark3d-0.3.1.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pyspark3d-0.3.1.tar.gz
Algorithm Hash digest
SHA256 e5e1bd6a99451348a9116f3d93ca65269532795112706588586ceb0f2a08094a
MD5 d0d44532f92b784302658e2c1b071717
BLAKE2b-256 79544c3c0ade75aedf54d981e012b70e70b325cc799cf50df2b04758ea32542b
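If you download the archive by hand, you can compare its digest with the SHA256 value listed above. This is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of` and `matches_listing` are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for pyspark3d-0.3.1.tar.gz.
EXPECTED_SHA256 = "e5e1bd6a99451348a9116f3d93ca65269532795112706588586ceb0f2a08094a"

def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_listing("pyspark3d-0.3.1.tar.gz")` should return `True` for an intact download of that file.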

File details

Details for the file pyspark3d-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: pyspark3d-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 28.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for pyspark3d-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9e970dc31ed09715595675ea43c194807039bfd864b238a928fbd89495a9f011
MD5 e78af121c2965b59183c454769393c56
BLAKE2b-256 879d33da571a507ee99a3f2518f7f5bb96a1b10ed2abe5d2db8969ca84b42637
