pyspark-sampling

sparksampling is a PySpark-based gRPC service for data sampling and data quality assessment. It supports containerized deployments and Spark on K8s.

Features

  • Common sampling methods: random, stratified, simple
  • Relationship sampling based on a DAG and topological sorting
  • Cloud-native deployment and Spark on K8s support
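
To make the idea of relationship sampling over a DAG concrete, here is a minimal plain-Python sketch. It is not the project's implementation: the table names, the `DEPENDENCIES` mapping, and the helpers `sampling_order` and `relationship_sample` are all hypothetical, and rows are plain dicts rather than Spark DataFrames. It assumes that parent tables are sampled first and that child rows are kept only when they reference a sampled parent row, which is what a topological order makes possible.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the set of parent tables it references.
DEPENDENCIES = {
    "customers": set(),
    "orders": {"customers"},
    "order_items": {"orders"},
}


def sampling_order(dependencies):
    """Return tables ordered so that every parent comes before its children."""
    return list(TopologicalSorter(dependencies).static_order())


def relationship_sample(tables, dependencies, foreign_keys, root_keys):
    """Keep only child rows that reference already-sampled parent rows.

    tables:       {table: list of row dicts}, each row has an "id" key
    foreign_keys: {(child, parent): column in child holding the parent id}
    root_keys:    {root table: set of ids picked by an earlier sampling step}
    """
    kept = {}
    for table in sampling_order(dependencies):
        parents = dependencies[table]
        if not parents:
            # Root table: keep the rows selected by the earlier sampling step.
            kept[table] = [r for r in tables[table] if r["id"] in root_keys[table]]
            continue
        rows = tables[table]
        for parent in parents:
            parent_ids = {r["id"] for r in kept[parent]}
            fk = foreign_keys[(table, parent)]
            # Drop rows whose foreign key points at a parent that was not sampled.
            rows = [r for r in rows if r[fk] in parent_ids]
        kept[table] = rows
    return kept
```

In a Spark setting the per-table filter would more likely be a join against the sampled parent keys, but the ordering logic stays the same.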

Quick Start

Installation

To try it out, install the package directly from PyPI:

pip install sparksampling

Then start the service:

sparksampling

The service starts and listens on port 8530.
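
A quick way to confirm that something is listening on that port is a plain TCP connection check. The sketch below uses only the Python standard library. The helper name `is_port_open` and the `localhost` default are assumptions for illustration, and the check only shows that the port accepts connections, not that the gRPC service behind it is healthy.

```python
import socket


def is_port_open(host="localhost", port=8530, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("service reachable:", is_port_open())
```

The same check applies when the service runs in Docker with port 8530 published to the host.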

Docker

docker run -p 8530:8530 wh1isper/pysparksampling:latest

Development

Install the package in editable mode with the test extras, then set up the pre-commit hooks:

pip install -e .[test]
pre-commit install

Run the tests:

pytest -v
