
Hadoop URI utility


LindexURI

BigData URI utility

The idea is that every piece of stored information is addressable by a URI; here we work with HDFS and Hive URIs.

LindexURI.isValid(uri): returns True if the URI is valid. It is a static method, so it can be called directly on the class as a quick check, without creating an instance.
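A minimal Python 3 sketch of what such a check could look like, using only the standard library's `urllib.parse.urlsplit`. It is not LindexURI's implementation: the helper name `is_valid_uri` and the rule "supported scheme plus non-empty authority part" are assumptions.

```python
from urllib.parse import urlsplit

# Assumption: only these two schemes count as valid; LindexURI's real rules may differ.
SUPPORTED_SCHEMES = {"hive", "hdfs"}


def is_valid_uri(uri):
    """Hypothetical check: supported scheme and a non-empty authority part."""
    if not isinstance(uri, str):
        return False
    try:
        parts = urlsplit(uri)
    except ValueError:
        # urlsplit raises ValueError for some malformed input (e.g. unbalanced IPv6 brackets)
        return False
    # netloc holds the database name for hive:// and the host for hdfs://
    return parts.scheme in SUPPORTED_SCHEMES and bool(parts.netloc)


print(is_valid_uri("hive://databasename/tablename?dt=201212"))  # True
print(is_valid_uri("ftp://example.com/file"))                   # False
```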

The remaining methods are called on an instance: luri = LindexURI(uri)

luri.isPartitioned()

returns True if the Hive URI defines a partitioned table

if uri == "hive://databasename/tablename?dt=201212" luri.isPartitioned() returns True.
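One way to picture this check, as a Python 3 sketch rather than the library's code: treat a hive:// URI as partitioned when its query string carries at least one key=value pair. The helper name `is_partitioned` and that rule are assumptions.

```python
from urllib.parse import parse_qsl, urlsplit


def is_partitioned(uri):
    """Hypothetical equivalent of isPartitioned() for hive:// URIs.

    Assumption: a URI is partitioned when its query string contains at
    least one key=value partition coordinate.
    """
    return len(parse_qsl(urlsplit(uri).query)) > 0


print(is_partitioned("hive://databasename/tablename?dt=201212"))  # True
print(is_partitioned("hive://databasename/tablename"))            # False
```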

luri.getPartitions()

returns an OrderedDict that describes the Hive partition

if uri == "hive://databasename/tablename?dt=201212" luri.getPartitions() returns

OrderedDict([('dt', '201212')])
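A Python 3 sketch of how the query string could be turned into ordered partition coordinates; `get_partitions` is a hypothetical helper, not LindexURI's API. An ordered mapping fits here because Hive partition columns have a defined order.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlsplit


def get_partitions(uri):
    """Hypothetical equivalent of getPartitions(): query string -> OrderedDict.

    parse_qsl returns the pairs in the order they appear, so the partition
    column order written in the URI is preserved.
    """
    return OrderedDict(parse_qsl(urlsplit(uri).query))


parts = get_partitions("hive://databasename/tablename?dt=201212&country=AU")
print(list(parts.items()))  # [('dt', '201212'), ('country', 'AU')]
```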

luri.getDatabase()

returns the database name from the Hive URI (this could be adapted to work with HDFS paths as well)

if uri == "hive://databasename/tablename?dt=201212" luri.getDatabase() returns 'databasename'
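In a URI shaped like hive://databasename/tablename, the database name occupies the authority (netloc) slot, so a standard URI parser can extract it. A Python 3 sketch; `get_database` is a hypothetical name.

```python
from urllib.parse import urlsplit


def get_database(uri):
    """Hypothetical equivalent of getDatabase() for hive://database/table URIs.

    The database name sits in the authority (netloc) part of the URI.
    """
    return urlsplit(uri).netloc


print(get_database("hive://databasename/tablename?dt=201212"))  # databasename
```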

luri.getTable()

returns the table name from the Hive URI (this could be adapted to work with HDFS paths as well)

if uri == "hive://databasename/tablename?dt=201212" luri.getTable() returns 'tablename'
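Correspondingly, the table name is the first path segment once the query string has been split off. A Python 3 sketch; the helper name `get_table` and returning None when no table is present are assumptions.

```python
from urllib.parse import urlsplit


def get_table(uri):
    """Hypothetical equivalent of getTable(): first path segment of a hive:// URI."""
    # urlsplit has already separated the ?query part, so path is "/tablename"
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    # Assumption: no table segment means no table (None)
    return segments[0] if segments else None


print(get_table("hive://databasename/tablename?dt=201212"))  # tablename
```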

luri.getHDFSHostName()

returns the hostname from the HDFS URI

if uri == "hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212" luri.getHDFSHostName() returns 'hdfs-prod'
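A Python 3 sketch of hostname extraction with `urllib.parse`. Note that the standard `hostname` attribute drops any `:port` suffix and lower-cases the name; whether LindexURI behaves the same way is an assumption, and `get_hdfs_hostname` is a hypothetical name.

```python
from urllib.parse import urlsplit


def get_hdfs_hostname(uri):
    """Hypothetical equivalent of getHDFSHostName().

    .hostname strips a ":port" suffix and lower-cases the result.
    """
    return urlsplit(uri).hostname


uri = "hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212"
print(get_hdfs_hostname(uri))                          # hdfs-prod
print(get_hdfs_hostname("hdfs://namenode:8020/data"))  # namenode
```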

luri.getHDFSPath()

returns the path from the HDFS URI

if uri == "hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212" luri.getHDFSPath() returns 'warehouse/databasename.db/tablename.db/dt=201212'
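A Python 3 sketch that reproduces the example output above. The example returns the path without a leading "/", so the sketch strips it; that reading of the intended format, and the name `get_hdfs_path`, are assumptions.

```python
from urllib.parse import urlsplit


def get_hdfs_path(uri):
    """Hypothetical equivalent of getHDFSPath(): the URI path without its leading '/'."""
    return urlsplit(uri).path.lstrip("/")


uri = "hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212"
print(get_hdfs_path(uri))  # warehouse/databasename.db/tablename.db/dt=201212
```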

luri.getSchema()

returns the URI scheme

if uri == "hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212" luri.getSchema() returns 'hdfs'
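The scheme is what lets a caller tell hdfs:// and hive:// URIs apart. A Python 3 sketch with a hypothetical `get_scheme` helper; `urlsplit` lower-cases the scheme, so an upper-case "HDFS://" would also yield "hdfs".

```python
from urllib.parse import urlsplit


def get_scheme(uri):
    """Hypothetical equivalent of getSchema(): the part before '://', lower-cased."""
    return urlsplit(uri).scheme


for u in ("hdfs://hdfs-prod/warehouse/databasename.db", "hive://databasename/tablename"):
    print(get_scheme(u))
# hdfs
# hive
```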

luri.getPartitionsAsHDFSPath()

converts the partition coordinates into an HDFS path

for partition coordinates p = OrderedDict([('dt', '201212'), ('country', 'AU')]) the result is dt=201212&country=AU
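A Python 3 sketch of joining ordered coordinates into key=value form. The default separator "&" reproduces the example above; Hive itself lays partition directories out as key=value/key=value, which corresponds to `sep="/"`. The helper name `partitions_as_path` and its `sep` parameter are hypothetical.

```python
from collections import OrderedDict


def partitions_as_path(partitions, sep="&"):
    """Hypothetical helper: join ordered partition coordinates as key=value pairs."""
    return sep.join("{}={}".format(key, value) for key, value in partitions.items())


p = OrderedDict([("dt", "201212"), ("country", "AU")])
print(partitions_as_path(p))           # dt=201212&country=AU
print(partitions_as_path(p, sep="/"))  # dt=201212/country=AU
```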

luri.getHDFSPathAsPartition()

converts an HDFS path into a root path and a dictionary of partition coordinates

if uri == "hdfs://hdfs-production/Vault/Docomodigital/Production/Newton/events/prod/year=2018/month=08/day=07/hour=09" the path is split into:

    root path : "/Vault/Docomodigital/Production/Newton/events/prod/"

    partitions : {
        "year" : "2018",
        "month" : "08",
        "day" : "07",
        "hour" : "09"
    }
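A Python 3 sketch of the split described above: the trailing run of key=value segments becomes the partition coordinates, and everything before it is the root path. The helper name `split_partition_path` and the "trailing run" rule are assumptions, not LindexURI's documented behaviour; the sketch does reproduce the example values.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def split_partition_path(uri):
    """Hypothetical helper: return (root_path, partitions) for an HDFS URI."""
    segments = urlsplit(uri).path.split("/")
    # Walk back from the end to find where the trailing key=value run begins.
    start = len(segments)
    while start > 0 and "=" in segments[start - 1]:
        start -= 1
    root = "/".join(segments[:start]) + "/"
    partitions = OrderedDict(seg.split("=", 1) for seg in segments[start:])
    return root, partitions


root, parts = split_partition_path(
    "hdfs://hdfs-production/Vault/Docomodigital/Production/Newton/events/prod/"
    "year=2018/month=08/day=07/hour=09"
)
print(root)         # /Vault/Docomodigital/Production/Newton/events/prod/
print(dict(parts))  # {'year': '2018', 'month': '08', 'day': '07', 'hour': '09'}
```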

luri.looksPartitioned()

returns True if the HDFS path has the shape of a partition path, such as .../dt=201212
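A Python 3 sketch of such a heuristic, assuming "looks partitioned" means the last non-empty path segment has the Hive-style key=value form. The helper name `looks_partitioned` and that rule are assumptions.

```python
from urllib.parse import urlsplit


def looks_partitioned(uri):
    """Hypothetical equivalent of looksPartitioned().

    Assumption: the last non-empty path segment must be key=value with a non-empty key.
    """
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    if not segments:
        return False
    key, sep, _value = segments[-1].partition("=")
    return bool(key and sep)


print(looks_partitioned("hdfs://hdfs-prod/warehouse/databasename.db/tablename.db/dt=201212"))  # True
print(looks_partitioned("hdfs://hdfs-prod/warehouse/databasename.db"))                         # False
```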


Download files


Files for LindexURI, version 0.1.2

Filename                               Size    File type  Python version
LindexURI-0.1.2-py2.py3-none-any.whl   4.5 kB  Wheel      py2.py3
LindexURI-0.1.2.tar.gz                 3.3 kB  Source     n/a
