
Apache Accumulo and Apache HDFS Python Connector

Project description


Sharkbite is an HDFS and native client for Apache Accumulo, with design liberties that make it usable across other key/value stores.

As of version 1.2:

  • Works with Accumulo 1.6.x, 1.7.x, 1.8.x, 1.9.x and 2.x
  • The package import is now sharkbite, not pysharkbite
  • Supports torch IterableDatasets using batch scanners
  • Read/write: reading and writing data to Accumulo is supported
  • Fixes a scanner bug that occurred when Value was used across multiple threads

About the name

The name Sharkbite comes from the connector's design: it abstracts components that tightly couple to, and grip, the interfaces of the underlying datastore. Access goes through an abstraction layer built on cross-compatible objects, while the interfaces beneath that layer remain heavily coupled to each database. The name fits because these interfaces exist to abstract away the tight coupling found within implementations of the API.

Python Support

The Python client can be installed with pip:

  pip install sharkbite

A Python example is included; it is the primary example of using the Python-bound sharkbite library.

Features

Hedged Reads (BETA)

Sharkbite supports hedged reads: when the underlying RFiles can be accessed directly, it scans them concurrently with the regular Accumulo RPC scan, and your results come from whichever executor completes first. This feature is in beta and is not recommended for production environments.

Enable it with the following option:


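  import sharkbite

  # user, zk, table and auths are assumed to be defined beforehand:
  # the user credentials, the ZooKeeper instance, the table name and
  # the scan authorizations.
  connector = sharkbite.AccumuloConnector(user, zk)
  table_operations = connector.tableOps(table)

  scanner = table_operations.createScanner(auths, 2)

  # scan a single row; named row_range to avoid shadowing the built-in range()
  row_range = sharkbite.Range("myrow")
  scanner.addRange(row_range)

  # enable the beta option of hedged reads
  scanner.setOption(sharkbite.ScannerOptions.HedgedReads)

  resultset = scanner.getResultSet()

  for keyvalue in resultset:
      key = keyvalue.getKey()
      value = keyvalue.getValue()
      # process key and value here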
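The racing behaviour behind hedged reads can be pictured in plain Python. The sketch below is not sharkbite's implementation, which lives in the native client; it is a minimal illustration that assumes two hypothetical zero-argument readers, rfile_scan and rpc_scan, standing in for the RFile scan and the Accumulo RPC scan. It runs every reader on a thread pool and returns the first result that completes without raising.

```python
import concurrent.futures
import time


def hedged_read(readers):
    """Run all readers concurrently; return the first result that completes
    without raising. Re-raise the last error if every reader fails."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(readers))
    try:
        futures = [pool.submit(reader) for reader in readers]
        last_error = None
        # as_completed yields futures in the order they finish
        for future in concurrent.futures.as_completed(futures):
            try:
                return future.result()
            except Exception as exc:  # this reader failed; wait for the next one
                last_error = exc
        raise last_error
    finally:
        # do not wait for slower readers; they finish in the background
        pool.shutdown(wait=False)


def rfile_scan():
    # hypothetical direct RFile read, made fast for this illustration
    time.sleep(0.05)
    return [("myrow", "cf", "cq", "from-rfile")]


def rpc_scan():
    # hypothetical tablet-server RPC scan, made slower for this illustration
    time.sleep(0.5)
    return [("myrow", "cf", "cq", "from-rpc")]


if __name__ == "__main__":
    print(hedged_read([rfile_scan, rpc_scan]))
```

Returning the first successful result, rather than simply the first finished one, means a failing RFile read falls back to the RPC scan in this sketch; how sharkbite itself handles a failed hedged read is not described here.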

Python Iterators

Python iterators are supported in beta. Building with the CMake option PYTHON_ITERATOR_SUPPORT (cmake -DPYTHON_ITERATOR_SUPPORT=ON) builds the infrastructure needed to run them.

Iterators can be defined either as single-function lambdas or as classes that implement the seek and onNext methods.

The first example implements the seek and onNext methods. Implementing seek is optional; it is only needed if you wish to adjust the range. While keys are being iterated, onNext can read the top key and value. You may call iterator.next() yourself afterwards; otherwise the infrastructure will do it for you.


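# Range and KeyValue are assumed to be supplied by the sharkbite
# iterator environment when this text is loaded.
class myIterator:
  def seek(iterator, soughtRange):
    # optional: replace the sought range with Range("a")
    new_range = Range("a")
    iterator.seek(new_range)

  def onNext(iterator):
    if iterator.hasTop():
      key = iterator.getTopKey()
      cf = key.getColumnFamily()
      value = iterator.getTopValue()
      # prefix the column family of the current key
      key.setColumnFamily("oh changed " + cf)
      # advance explicitly; the infrastructure would otherwise do this
      iterator.next()
      return KeyValue(key, value)
    else:
      return None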
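To see how the seek/onNext contract fits together without a cluster, the following pure-Python sketch drives the same onNext logic over an in-memory source. Key, KeyValue, MockIterator and drive are hypothetical stand-ins written for this illustration, not sharkbite classes, and the way drive advances the source when the callback does not is an assumption modelled on the description above.

```python
class Key:
    """Hypothetical stand-in for a sharkbite Key (only the methods used here)."""

    def __init__(self, row, cf, cq):
        self._row, self._cf, self._cq = row, cf, cq

    def getRow(self):
        return self._row

    def getColumnFamily(self):
        return self._cf

    def setColumnFamily(self, cf):
        self._cf = cf

    def getColumnQualifier(self):
        return self._cq


class KeyValue:
    """Hypothetical stand-in pairing a Key with a value."""

    def __init__(self, key, value=""):
        self._key, self._value = key, value

    def getKey(self):
        return self._key

    def getValue(self):
        return self._value


class MockIterator:
    """Hypothetical source over an in-memory, sorted list of (row, cf, cq, value)."""

    def __init__(self, entries):
        self._entries = entries
        self.pos = 0

    def hasTop(self):
        return self.pos < len(self._entries)

    def getTopKey(self):
        row, cf, cq, _ = self._entries[self.pos]
        return Key(row, cf, cq)  # fresh copy, so edits do not touch the source

    def getTopValue(self):
        return self._entries[self.pos][3]

    def next(self):
        self.pos += 1


def on_next(iterator):
    # same logic as onNext above: rewrite the column family of the top entry
    if iterator.hasTop():
        key = iterator.getTopKey()
        value = iterator.getTopValue()
        key.setColumnFamily("oh changed " + key.getColumnFamily())
        iterator.next()
        return KeyValue(key, value)
    return None


def drive(on_next_fn, iterator):
    """Call on_next_fn until it returns None, advancing the source whenever
    the callback did not (an assumed model of the real infrastructure)."""
    results = []
    while True:
        before = iterator.pos
        kv = on_next_fn(iterator)
        if kv is None:
            return results
        results.append(kv)
        if iterator.pos == before:
            iterator.next()
```

For example, drive(on_next, MockIterator([("a", "cf1", "cq", "v1")])) yields one KeyValue whose column family is "oh changed cf1".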

If the iterator is defined in a separate file, you can load it with the following snippet:

with open('test.iter', 'r') as file:
  iterator_text = file.read()
## arguments: name, iterator text, priority
iterator = sharkbite.PythonIterator("PythonIterator", iterator_text, 100)
scanner.addIterator(iterator)

Alternatively, you may use a lambda. The lambda is passed a KeyValue, whose getKey() and getValue() return its constituent parts, and may return either a Key or a KeyValue object; if it returns a Key, an empty value is returned with it. A partial example of setting it up follows.

## define only the name and priority
iterator = sharkbite.PythonIterator("PythonIterator", 100)
## define a lambda to adjust the column family
iterator = iterator.onNext("lambda x : Key( x.getKey().getRow(), 'new cf', x.getKey().getColumnQualifier()) ")

scanner.addIterator(iterator)
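The lambda form can be illustrated the same way. In this sketch, Key, KeyValue and apply_lambda are hypothetical stand-ins rather than sharkbite's own types; apply_lambda applies the same lambda body as the snippet above to each entry and pairs a returned bare Key with an empty value (assumed here to be the empty string), as described for the real iterator.

```python
class Key:
    """Hypothetical stand-in for a sharkbite Key (row, family, qualifier only)."""

    def __init__(self, row, cf, cq):
        self._row, self._cf, self._cq = row, cf, cq

    def getRow(self):
        return self._row

    def getColumnFamily(self):
        return self._cf

    def getColumnQualifier(self):
        return self._cq


class KeyValue:
    """Hypothetical stand-in pairing a Key with a value."""

    def __init__(self, key, value):
        self._key, self._value = key, value

    def getKey(self):
        return self._key

    def getValue(self):
        return self._value


# same body as the lambda string passed to onNext above
adjust_cf = lambda x: Key(x.getKey().getRow(), "new cf", x.getKey().getColumnQualifier())


def apply_lambda(fn, entries):
    """Apply fn to each KeyValue; a bare Key result is paired with an empty
    value, mirroring the behaviour described for sharkbite's lambdas."""
    results = []
    for kv in entries:
        result = fn(kv)
        if isinstance(result, Key):
            result = KeyValue(result, "")
        results.append(result)
    return results
```

Returning a Key keeps the lambda short when only the key changes; returning a KeyValue is the way to keep or rewrite the value.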

You may define a Python iterator either as a text implementation or as a lambda, but not both at the same time.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • sharkbite-1.2.0.3-cp39-cp39-manylinux1_x86_64.whl (79.5 MB), uploaded for CPython 3.9
  • sharkbite-1.2.0.3-cp36-cp36m-manylinux1_x86_64.whl (4.3 MB), uploaded for CPython 3.6m

File details

Details for the file sharkbite-1.2.0.3-cp39-cp39-manylinux1_x86_64.whl.

File metadata

  • Download URL: sharkbite-1.2.0.3-cp39-cp39-manylinux1_x86_64.whl
  • Upload date:
  • Size: 79.5 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/24.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.2 tqdm/4.57.0 importlib-metadata/4.10.1 keyring/22.2.0 rfc3986/1.4.0 colorama/0.4.4 CPython/3.9.5

File hashes

Hashes for sharkbite-1.2.0.3-cp39-cp39-manylinux1_x86_64.whl
  • SHA256: a4a9bb291e080bccc18ea69c190e19972d5a41cb700f3692230ba002eddd901e
  • MD5: adcd0d32f6ef362f0ea6ca4fb5c35981
  • BLAKE2b-256: 628b8aaf857019c0ec31bba8ae4249278f205451ed05919d69534a4e8189de78


File details

Details for the file sharkbite-1.2.0.3-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: sharkbite-1.2.0.3-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 4.3 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.8

File hashes

Hashes for sharkbite-1.2.0.3-cp36-cp36m-manylinux1_x86_64.whl
  • SHA256: d32f703601f895014116565b8bfdb430bbcbb5c95fa3b78cc8dcc0d2ef6bfa59
  • MD5: 09bfadf997825426e9f191877c7fbc8b
  • BLAKE2b-256: 6ec81fde4d3aea68fa37cc0d62b471a116df28801e966e510a26b56117f21165
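To check a downloaded wheel against the SHA256 digests listed above, a minimal sketch using Python's standard hashlib module follows. The function names sha256_of and verify, and the script name verify_wheel.py used below, are illustrative and not part of sharkbite.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks so that large
    wheels (such as the 79.5 MB cp39 build) need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python verify_wheel.py <wheel-file> <expected-sha256>
    wheel, expected = sys.argv[1], sys.argv[2]
    print("OK" if verify(wheel, expected) else "MISMATCH")
```

For example, python verify_wheel.py sharkbite-1.2.0.3-cp36-cp36m-manylinux1_x86_64.whl d32f703601f895014116565b8bfdb430bbcbb5c95fa3b78cc8dcc0d2ef6bfa59 should print OK for an intact download. pip can also enforce digests itself in hash-checking mode, using --hash entries in a requirements file together with --require-hashes.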

