pyspark-sampling
sparksampling is a PySpark-based gRPC service for sampling and data quality assessment. It supports containerized deployments and Spark on Kubernetes (K8s).
Features
- Common sampling methods: Random, Stratified, Simple
- Relationship sampling based on a DAG and topological sorting
- Cloud-native, with Spark on K8s support
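The idea behind relationship sampling can be illustrated in plain Python. The sketch below is not sparksampling's implementation or API: the function name `relationship_sample`, the in-memory table layout, and the foreign-key mapping are all assumptions made for illustration. It orders tables with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) so that parent tables are sampled first, then keeps only the child rows that still reference a sampled parent row.

```python
import random
from graphlib import TopologicalSorter


def relationship_sample(tables, foreign_keys, fraction, seed=0):
    """Sample related tables so that foreign-key references stay valid.

    Hypothetical helper, not part of sparksampling.
    tables:       {table_name: list of row dicts}
    foreign_keys: {child_table: {child_column: (parent_table, parent_column)}}
    """
    rng = random.Random(seed)

    # Build the dependency DAG: each table maps to the set of tables it references.
    graph = {name: set() for name in tables}
    for child, refs in foreign_keys.items():
        for parent, _ in refs.values():
            graph[child].add(parent)

    sampled = {}
    # static_order() yields every parent table before the tables that depend on it.
    for name in TopologicalSorter(graph).static_order():
        rows = tables[name]
        refs = foreign_keys.get(name, {})
        if refs:
            # Child table: keep rows whose references all survived sampling.
            for column, (parent, parent_column) in refs.items():
                kept = {row[parent_column] for row in sampled[parent]}
                rows = [row for row in rows if row[column] in kept]
        else:
            # Root table: simple random sampling of rows.
            rows = [row for row in rows if rng.random() < fraction]
        sampled[name] = rows
    return sampled
```

For example, calling it with a `users` table, an `orders` table, and `{"orders": {"user_id": ("users", "id")}}` returns a sample in which every remaining order points at a user that is also in the sample. A Spark-based service would presumably express the same steps with DataFrame sampling and joins rather than Python lists.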
QUICK START
Installation
To try it out, install the package directly from PyPI:
pip install sparksampling
Then start the service:
sparksampling
The service starts and listens on port 8530.
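A quick way to confirm the service is up is to check that something accepts TCP connections on that port. The helper below is a minimal sketch using only the standard library; the name `is_listening` and the default host `127.0.0.1` are assumptions for a local run. It only tests that the port is open and does not speak gRPC to the service.

```python
import socket


def is_listening(host="127.0.0.1", port=8530, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it checks that the port is open,
    not that a gRPC server is answering on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `is_listening()` after starting `sparksampling` locally should return `True` once the server has bound the port.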
Docker
docker run -p 8530:8530 wh1isper/pysparksampling:latest
Development
Install in development mode:
pip install -e .[test]
pre-commit install
Run the tests:
pytest -v
Download files
Source Distribution
sparksampling-0.2.4.tar.gz (1.9 MB)
Built Distribution
Hashes for sparksampling-0.2.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5e4dc9e0907f6a684709e1f6f1eb16cfec6a056f9f741908aed2ce71e095f207
MD5 | 9785e85581a0a38e5fb4b5ace7586fca
BLAKE2b-256 | 461c0b2e6643136ce19628b3e1a119ec6eadcc1c10082572fce9ed82f1670c9b
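A downloaded file can be checked against the published SHA256 digest with Python's `hashlib`. This is a minimal sketch; the helper name `sha256_matches` and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the SHA256 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For instance, `sha256_matches("sparksampling-0.2.4-py3-none-any.whl", "5e4dc9e0907f6a684709e1f6f1eb16cfec6a056f9f741908aed2ce71e095f207")` should return `True` for an intact copy of the wheel, assuming it was saved under that name in the current directory.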