
A simple package to let you Sqoop into HDFS/Hive/HBase with python


sqoop-it

A Python package that lets you import data from an RDBMS into HDFS, Hive, or HBase using Sqoop.


To install the package via pip, run

pip install sqoopit

You can then use the package as follows:

from sqoopit.SqoopImport import Sqoop 
sqoop = Sqoop(help=True)
code = sqoop.perform_import()

This will print the output of the command

sqoop --help

to your stdout, for example:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.3.0-235/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.3.0-235/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/08/13 20:25:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC
                                                              connect
                                                              string
   --connection-manager <class-name>                          Specify
                                                              connection
                                                              manager
                                                              class name
   ...


A more concrete example

The following code

from sqoopit.SqoopImport import Sqoop

# Import sample_table from Oracle into a Hive warehouse directory on a remote cluster
sqoop = Sqoop(fs='hdfs://remote-cluster:8020', hive_drop_import_delims=True, fields_terminated_by='\\;',
              enclosed_by='\'"\'', escaped_by='\\\\', null_string='\'\'', null_non_string='\'\'',
              table='sample_table', target_dir='hdfs://remote-cluster/user/hive/warehouse/db/sample_table',
              delete_target_dir=True, connect='jdbc:oracle:thin:@//your_ip:your_port/your_schema',
              username='user', password='pwd', num_mappers=2,
              bindir='/path/to/bindir/folder')

sqoop.perform_import()

will execute the following command

sqoop import -fs hdfs://remote-cluster:8020 --hive-drop-import-delims --fields-terminated-by \; --enclosed-by \'\"\' --escaped-by \\\\ --null-string \'\' --null-non-string \'\' --table sample_table --target-dir hdfs://remote-cluster/user/hive/warehouse/db/sample_table --delete-target-dir --connect jdbc:oracle:thin:@//your_ip:your_port/your_schema --username user --password pwd --num-mappers 2 --bindir /path/to/bindir/folder
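The mapping from keyword arguments to command-line flags can be pictured with a minimal sketch. This is not the package's actual implementation: the helper name `build_sqoop_args` and the `GENERIC_OPTIONS` set are hypothetical, and the single-dash rule for `fs` is an assumption inferred from the command shown above. Quoting and escaping of values are left out.

```python
# Assumption: generic Hadoop options (such as `fs`) take a single dash,
# while Sqoop tool options take a double dash.
GENERIC_OPTIONS = {"fs"}


def build_sqoop_args(**params):
    """Translate keyword arguments into a list of Sqoop CLI tokens (illustrative only)."""
    args = ["sqoop", "import"]
    for name, value in params.items():
        if value is None or value is False:
            continue  # options that are unset or switched off are skipped
        prefix = "-" if name in GENERIC_OPTIONS else "--"
        args.append(prefix + name.replace("_", "-"))
        if value is not True:
            # Boolean switches carry no value; everything else is stringified
            args.append(str(value))
    return args


tokens = build_sqoop_args(
    fs="hdfs://remote-cluster:8020",
    hive_drop_import_delims=True,
    table="sample_table",
    num_mappers=2,
)
print(" ".join(tokens))
```

Underscores become hyphens, `True` values become bare switches, and every other value follows its flag as a separate token.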

Conditional Building

Use the set_param and unset_param functions to build Sqoop imports conditionally.

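from sqoopit.SqoopImport import Sqoop

target_is_hbase = True  # example flag; set it according to your import target

sqoop = Sqoop(table="MyTable")

sqoop.set_param(param="--connect", value="jdbc:a_valid_string")

if target_is_hbase:
    # The return values are checked to confirm that both parameters were added
    added_table = sqoop.set_param(param="--hbase-table", value="MyTable")
    added_key = sqoop.set_param(param="--hbase-row-key", value="Id_MyTable")
    if added_table and added_key:
        print("all params added :D")

sqoop.perform_import()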
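The set/unset pattern can be tried without a Sqoop installation using a small stand-in. The `ParamSet` class below is hypothetical and is not sqoopit's own class: the signature of `unset_param`, the boolean return values, and the flag validation are all assumptions made for illustration.

```python
_MISSING = object()  # sentinel to tell "flag absent" apart from "flag set to None"


class ParamSet:
    """Hypothetical stand-in for a command builder; not sqoopit's actual class."""

    def __init__(self):
        self._params = {}

    def set_param(self, param, value=None):
        # Assumption: anything that does not look like a CLI flag is rejected
        if not param.startswith("-"):
            return False
        self._params[param] = value
        return True

    def unset_param(self, param):
        # True only if the flag was present and has now been removed
        return self._params.pop(param, _MISSING) is not _MISSING

    def tokens(self):
        out = []
        for flag, value in self._params.items():
            out.append(flag)
            if value is not None:
                out.append(str(value))
        return out


params = ParamSet()
params.set_param("--table", "MyTable")
params.set_param("--hbase-table", "MyTable")
params.unset_param("--hbase-table")  # e.g. the target turned out not to be HBase
print(params.tokens())
```

Returning a boolean from each call lets the caller confirm that a conditional branch actually changed the command before running the import.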

Doing

  • handle sqoop jobs
  • more test coverage

TODOs

  • add missing parameters

Original Idea By Luca Fontanili
