A simple package to let you Sqoop into HDFS/Hive/HBase with python

A Python package that lets you import data from an RDBMS into HDFS/Hive/HBase using Sqoop.

License: MIT

To install the package via pip, run

pip install sqoopit

You can then use the package as follows:

from sqoopit.SqoopImport import Sqoop 
sqoop = Sqoop(help=True)
code = sqoop.perform_import()

This will print the output of the command

sqoop --help

to your stdout, e.g.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/08/13 20:25:13 INFO sqoop.Sqoop: Running Sqoop version:
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC
   --connection-manager <class-name>                          Specify
                                                              class name

A more concrete example

The following code

sqoop = Sqoop(fs='hdfs://remote-cluster:8020', hive_drop_import_delims=True, fields_terminated_by='\\;',
              enclosed_by='\'"\'', escaped_by='\\\\', null_string='\'\'', null_non_string='\'\'',
              table='sample_table', target_dir='hdfs://remote-cluster/user/hive/warehouse/db/sample_table',
              delete_target_dir=True, connect='jdbc:oracle:thin:@//your_ip:your_port/your_schema',
              username='user', password='pwd', num_mappers=2,
              bindir='/path/to/bindir/folder')  # bindir value taken from the generated command
sqoop.perform_import()

will execute the following command

sqoop import -fs hdfs://remote-cluster:8020 --hive-drop-import-delims --fields-terminated-by \; --enclosed-by \'\"\' --escaped-by \\\\ --null-string \'\' --null-non-string \'\' --table sample_table --target-dir hdfs://remote-cluster/user/hive/warehouse/db/sample_table --delete-target-dir --connect jdbc:oracle:thin:@//your_ip:your_port/your_schema --username user --password pwd --num-mappers 2 --bindir /path/to/bindir/folder
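
The pairing above suggests a simple convention: each keyword argument becomes a flag with underscores turned into dashes, and boolean arguments become bare switches. The sketch below only illustrates that convention; it is not sqoopit's actual implementation. The function name `build_sqoop_command` is hypothetical, the special case for `-fs` is inferred from the command shown above, and shell quoting of values is deliberately left out.

```python
def build_sqoop_command(**kwargs):
    """Illustrative only: map keyword arguments to a sqoop import argument list."""
    args = ["sqoop", "import"]
    for name, value in kwargs.items():
        # Skip parameters that are unset or explicitly disabled.
        if value is None or value is False:
            continue
        # Assumption: "fs" is the one single-dash option, as in the command above.
        flag = "-fs" if name == "fs" else "--" + name.replace("_", "-")
        args.append(flag)
        # Boolean switches take no value; everything else is passed as text.
        if value is not True:
            args.append(str(value))
    return args


cmd = build_sqoop_command(fs="hdfs://remote-cluster:8020", table="sample_table",
                          delete_target_dir=True, num_mappers=2)
print(" ".join(cmd))
# sqoop import -fs hdfs://remote-cluster:8020 --table sample_table --delete-target-dir --num-mappers 2
```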

Conditional Building

Use the set_param and unset_param functions to build conditional Sqoop imports.

sqoop = Sqoop(table="MyTable")

sqoop.set_param(param="--connect", value="jdbc:a_valid_string")

target_is_hbase = True  # example flag; set it from your own configuration

if target_is_hbase:
    # the return values are checked to confirm that both parameters were added
    added_table = sqoop.set_param(param="--hbase-table", value="MyTable")
    added_key = sqoop.set_param(param="--hbase-row-key", value="Id_MyTable")
    if added_table and added_key:
        print("all params added :D")
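
When several parameters depend on the same condition, the repeated set_param calls can be wrapped in a small helper. This is a minimal sketch, assuming only what the example above shows: that set_param accepts `param` and `value` keywords and returns a truthy value on success. The helper name `apply_params` is hypothetical and not part of sqoopit.

```python
def apply_params(sqoop, params):
    """Call sqoop.set_param for each flag/value pair and return the flags that were not added."""
    rejected = []
    for flag, value in params.items():
        # Assumption: a falsy return value means the parameter was not accepted.
        if not sqoop.set_param(param=flag, value=value):
            rejected.append(flag)
    return rejected


# Intended usage with an existing Sqoop object (not executed here):
# hbase_params = {"--hbase-table": "MyTable", "--hbase-row-key": "Id_MyTable"}
# rejected = apply_params(sqoop, hbase_params)
# if not rejected:
#     print("all params added :D")
```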



  • handle sqoop jobs
  • more test coverage


  • add missing parameters

Original Idea By Luca Fontanili

