# Convert CSV object files to Apache Parquet with IBM Cloud Object Storage

This tool helps IBM Cloud users convert CSV objects in IBM Cloud Object Storage (COS) to Apache Parquet objects. It was developed with Python 3.6.6 and works with Python 3 versions up to 3.6.6.

### Installation

To install the tool, run pip with:

` pip install csvtoparquet `

After the tool is installed, you need an IBM Cloud API Key and an IBM COS service instance for the command line tool to work. You can find the API Key in your IBM Cloud management panel: Manage > Security > Platform API Keys. If you don’t have IBM COS as a service, you can find it in the cloud Catalog under Object Storage, which has a lite (free) tier.

If you already have the COS service, you’ll need the name of the bucket where your CSV objects are located. Right now, the tool doesn’t support multiple buckets, so you can’t convert objects from one bucket and store them in another. You can, however, give your converted objects new names that include prefixes, such as:

  • [object name] - mycsvfile.csv

  • [renamed object stored as parquet] - new/prefix/mycsvfile.parquet

The file extension .parquet will be automatically added to your new object name.
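The naming rule above can be illustrated with a minimal Python sketch. The helper `parquet_object_name` is hypothetical and not part of csvtoparquet. It assumes the extension is always appended to whatever name you supply; how the tool treats a name that already ends in `.parquet` is not stated here.

```python
def parquet_object_name(new_name: str) -> str:
    """Return the object key a converted file would get (hypothetical helper)."""
    # A prefix such as "new/prefix/" is simply part of the object key,
    # so it is kept unchanged and only the extension is appended.
    return f"{new_name}.parquet"


print(parquet_object_name("new/prefix/mycsvfile"))
```

Because the extension is added for you, the name passed with -n should not include `.parquet` itself.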

### Usage

Run csvtoparquet on the command line using the following required flags:

` csvtoparquet -a <IBM_CLOUD_API_KEY> -e <IBM_CLOUD_COS_ENDPOINT> -b <IBM_COS_BUCKET> `

  • -a or --apikey - IBM Cloud API Key

  • -e or --endpoint - COS bucket endpoint

  • -b or --bucket - COS bucket name where the CSV objects are stored

After the required flags, you can append any of the following flags to the command:

  • -l or --list - Lists all the objects in the bucket

  • -c or --csv - Lists all CSV objects in the bucket

  • -cn or --csv-names - Lists only the names of CSV objects in the bucket

  • -f or --file - Name of the CSV object you want to convert - used with -n

  • -n or --name - Name of the new object - can include prefixes - used with -f
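To show how the flags above fit together, here is a rough `argparse` sketch that mirrors them. It is an illustration only, not the tool's actual source. The function name `build_parser`, the help strings, and the choice to let -f and -n accept several values are assumptions based on the descriptions and examples in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="csvtoparquet")
    # Required flags
    parser.add_argument("-a", "--apikey", required=True, help="IBM Cloud API Key")
    parser.add_argument("-e", "--endpoint", required=True, help="COS bucket endpoint")
    parser.add_argument("-b", "--bucket", required=True, help="COS bucket name")
    # Optional listing flags
    parser.add_argument("-l", "--list", action="store_true", help="list all objects")
    parser.add_argument("-c", "--csv", action="store_true", help="list CSV objects")
    parser.add_argument("-cn", "--csv-names", action="store_true",
                        help="list only the names of CSV objects")
    # Conversion flags: assumed to take one or more values, used together
    parser.add_argument("-f", "--file", nargs="+", default=[],
                        help="CSV object(s) to convert")
    parser.add_argument("-n", "--name", nargs="+", default=[],
                        help="new object name(s), may include prefixes")
    return parser


args = build_parser().parse_args(
    ["-a", "KEY", "-e", "ENDPOINT", "-b", "BUCKET",
     "-f", "csvfile.csv", "-n", "csvfile"]
)
print(args.file, args.name)
```

The values `KEY`, `ENDPOINT`, and `BUCKET` are placeholders for your own API Key, endpoint, and bucket name.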

#### Converting objects

##### Convert one object

Input:

` csvtoparquet -a <IBM_CLOUD_API_KEY> -e <IBM_CLOUD_COS_ENDPOINT> -b <IBM_COS_BUCKET> -f csvfile.csv -n csvfile `

Output:

` Now Converting: csvfile.csv --> csvfile.parquet `

##### Convert more than one object

Input:

` csvtoparquet -a <IBM_CLOUD_API_KEY> -e <IBM_CLOUD_COS_ENDPOINT> -b <IBM_COS_BUCKET> -f csvfile.csv anothercsvfile.csv -n csvfile new/csvfile `

Output:

` Now Converting: csvfile.csv --> csvfile.parquet `

` Now Converting: anothercsvfile.csv --> new/csvfile.parquet `
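The output above suggests that the values given to -f and -n are paired by position: the first CSV object takes the first new name, the second takes the second, and so on. The Python sketch below illustrates that pairing. The function `plan_conversions` is hypothetical, and the length check is an assumption, since this README does not say how the tool handles a mismatched number of files and names.

```python
from typing import List, Tuple


def plan_conversions(files: List[str], names: List[str]) -> List[Tuple[str, str]]:
    """Pair each CSV object with its new Parquet object name, in order."""
    # Assumption: every file needs exactly one new name.
    if len(files) != len(names):
        raise ValueError("-f and -n must be given the same number of values")
    # The .parquet extension is appended to each new name.
    return [(src, f"{dst}.parquet") for src, dst in zip(files, names)]


pairs = plan_conversions(
    ["csvfile.csv", "anothercsvfile.csv"],
    ["csvfile", "new/csvfile"],
)
for src, dst in pairs:
    print(f"Now Converting: {src} --> {dst}")
```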
