An arrow interface for PySpark RDDs

Project description

Sparrow

Sparrow (a combination of Spark and arrow) is a Python mini library that enhances Spark with an arrow API.

The intent is to make mapping and filtering over RDDs a bit more elegant and exciting. The author also feels that the API developed here is more consistent.

Consider an example of a few operations on an RDD in native PySpark:

...
rdd = spark.sparkContext.parallelize(
    [
        (1, 2.0, ["a", "b", "c"]),
        (2, 3.0, ["b", "c", "d"]),
        (3, 4.0, ["c", "d", "e"]),
        (4, 5.0, ["d", "e", "f"]),
        (5, 6.0, ["e", "f", "g"]),
    ]
)

res = rdd.map(lambda x: x[2]).flatMap(lambda x: x).filter(lambda x: x == 'b')

and then the same operations on an RDD extended with Sparrow:

rdd = spark.sparkContext.parallelize(
    [
        (1, 2.0, ["a", "b", "c"]),
        (2, 3.0, ["b", "c", "d"]),
        (3, 4.0, ["c", "d", "e"]),
        (4, 5.0, ["d", "e", "f"]),
        (5, 6.0, ["e", "f", "g"]),
    ]
)

res = (
    SparrowRDD(rdd)
    >> (lambda x: x[2])
    >> Flatten(lambda x: x)
    >> Filter(lambda x: x == 'b')
)
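
To give a feel for how an arrow pipeline like the one above can work, here is a minimal sketch that runs without Spark. It is not Sparrow's actual implementation: `LocalRDD`, `ArrowRDD`, and the local `Flatten` and `Filter` classes are hypothetical stand-ins written for this illustration, and the assumption is simply that `>>` is routed through Python's `__rshift__` to `map`, `flatMap`, or `filter`.

```python
from itertools import chain


class LocalRDD:
    """In-memory stand-in for a PySpark RDD (hypothetical, illustration only)."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, f):
        return LocalRDD(f(x) for x in self._items)

    def flatMap(self, f):
        return LocalRDD(chain.from_iterable(f(x) for x in self._items))

    def filter(self, f):
        return LocalRDD(x for x in self._items if f(x))

    def collect(self):
        return list(self._items)


class Flatten:
    """Local stand-in: marks a function to be applied with flatMap."""

    def __init__(self, f):
        self.f = f


class Filter:
    """Local stand-in: marks a function to be applied with filter."""

    def __init__(self, f):
        self.f = f


class ArrowRDD:
    """Wraps any object exposing map/flatMap/filter and dispatches `>>` to them."""

    def __init__(self, rdd):
        self.rdd = rdd

    def __rshift__(self, step):
        if isinstance(step, Flatten):
            return ArrowRDD(self.rdd.flatMap(step.f))
        if isinstance(step, Filter):
            return ArrowRDD(self.rdd.filter(step.f))
        if callable(step):
            # A bare callable is treated as a mapper.
            return ArrowRDD(self.rdd.map(step))
        raise TypeError(f"unsupported pipeline step: {step!r}")


rows = [
    (1, 2.0, ["a", "b", "c"]),
    (2, 3.0, ["b", "c", "d"]),
    (3, 4.0, ["c", "d", "e"]),
    (4, 5.0, ["d", "e", "f"]),
    (5, 6.0, ["e", "f", "g"]),
]

res = (
    ArrowRDD(LocalRDD(rows))
    >> (lambda x: x[2])
    >> Flatten(lambda x: x)
    >> Filter(lambda x: x == "b")
)

print(res.rdd.collect())  # ['b', 'b']
```

Only the first two rows contain "b", so the pipeline yields two elements. The lambdas are parenthesised because a bare `lambda` would otherwise swallow the rest of the `>>` chain into its body.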

Download files

Source Distribution: pysparrow-1.0.4.tar.gz (2.9 kB)

Built Distribution: pysparrow-1.0.4-py3-none-any.whl (2.8 kB, Python 3)
