Project Description

esimport is a tool for indexing the contents of a CSV file into ElasticSearch.

Use

To index the contents of a CSV file (where the first row contains field names) into an ElasticSearch server, you must supply the -s, -f, -i and -t arguments. If the specified index or type does not exist on the ElasticSearch server, it will be created when the module runs. The following example will index data from data.file into ElasticSearch at http://myserver:9200/myindex/mytype:

python -m esimport -s myserver:9200 -f /path/to/import/data.file -i myindex -t mytype
  • -s server (may either be a hostname:port or fully qualified, e.g. https://servername:port)
  • -f filepath (location of the delimited file to import data from)
  • -i indexname (name of the target index within ElasticSearch)
  • -t typename (name of the target document type within ElasticSearch)
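For illustration, a minimal comma-delimited data.file might look like the following; the first row supplies the field names and every subsequent row becomes one document (the column names and values here are purely hypothetical examples):

genre,album,ISRC,recordCompany,artist,songToken,durationInSeconds,title
Rock,Example Album,USX9P0000001,Example Records,Example Artist,1001,215,Example Song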

Further help is available via the script:

python -m esimport --help
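For a sense of what the import does, the command above is conceptually equivalent to the rough Python sketch below. This is not the esimport source; it assumes the requests library and the old, type-based bulk API that was current when this tool was released.

import csv
import json
import requests

SERVER = "http://myserver:9200"
INDEX, DOC_TYPE = "myindex", "mytype"

# The first row of the file supplies the field names for every document.
with open("/path/to/import/data.file", newline="") as handle:
    reader = csv.DictReader(handle, delimiter=",")
    lines = []
    for row in reader:
        # Old-style bulk format: an action line naming index/type, then the document.
        lines.append(json.dumps({"index": {"_index": INDEX, "_type": DOC_TYPE}}))
        lines.append(json.dumps(row))

# One bulk request; esimport sends these in batches (see the -bc argument below).
response = requests.post(SERVER + "/_bulk", data="\n".join(lines) + "\n",
                         headers={"Content-Type": "application/x-ndjson"})
response.raise_for_status()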

Additional Options

Custom Delimiter

The default delimiter is a comma (“,”), but you may specify a different delimiter via the -d argument.

python -m esimport -s myserver:9200 -f /path/to/import/data.file -i myindex -t mytype -d '|'
  • -d delimiter (the delimiter separating columns within the CSV)

Clear Existing Data First

When indexing data, you may optionally clear the existing ElasticSearch type before new data is added by including the -rm argument.

python -m esimport -s myserver:9200 -f /path/to/import/data.file -i myindex -t mytype -rm
  • -rm (removes all documents of given type within the specified index on ElasticSearch)
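Conceptually, -rm removes every existing document of the target type before the import begins. On the old, type-based ElasticSearch versions this tool targets, one way to do the same thing by hand would be the legacy delete-by-query endpoint; the snippet below is a hypothetical illustration only, not what esimport necessarily does internally.

import json
import requests

# Legacy (pre-2.0) delete-by-query URL; esimport handles the equivalent
# clean-up itself when -rm is passed.
requests.delete("http://myserver:9200/myindex/mytype/_query",
                data=json.dumps({"query": {"match_all": {}}}),
                headers={"Content-Type": "application/json"})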

Mapping

Mapping is the definition of the field names and value types of a document indexed within ElasticSearch. Characteristics such as field type, whether a field is searchable, and whether to include a timestamp may all be defined in a map.

Please note: If the specified mapping does not match the field names in the import file, you should supply a field translation file (described below).

A mapping file can be specified by adding the -m parameter to the command:

python -m esimport -s myserver:9200 -f /path/to/import/data.file -i myindex -t mytype -m /path/to/mapping.json
  • -m mappingfilepath (location of JSON formatted mapping file for specified type)

Maps should be provided in JSON format, as seen below and on the ElasticSearch website.

Sample mapping for a document type “tracks”
{
    "tracks" : {
        "properties" : {
            "genre" : {
                "type" : "string"
            },
            "album" : {
                "type" : "string"
            },
            "ISRC" : {
                "type" : "string"
            },
            "recordCompany" : {
                "type" : "string"
            },
            "artist" : {
                "type" : "string"
            },
            "songToken" : {
                "type" : "integer"
            },
            "durationInSeconds" : {
                "type" : "integer"
            },
            "title" : {
                "type" : "string"
            }
        }
    }
}

More details on mapping may be found in ElasticSearch’s mapping reference.
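For reference, applying a mapping file like the one above by hand (outside of esimport) could look roughly like the sketch below on the old, type-based API, where the mapping is attached when the index is created. esimport takes care of the equivalent step itself when -m is supplied; this is only an illustration.

import json
import requests

with open("/path/to/mapping.json") as handle:
    mapping = json.load(handle)   # e.g. {"tracks": {"properties": {...}}}

# Create the index with the type mapping attached (type-based, pre-2.x shape).
requests.put("http://myserver:9200/myindex",
             data=json.dumps({"mappings": mapping}),
             headers={"Content-Type": "application/json"})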

Field Translations

If the original field names from the CSV file need to be altered or filtered during the import, you may provide a field translations file. The first row of this file should consist of the original field names, separated by the specified delimiter; the second, the new names, again separated by the specified delimiter.

Please note: Any original field names omitted from the first row will be omitted from the indexed data as well; this is a handy way to filter columns at indexing time if necessary.

python -m esimport -s myserver:9200 -f /path/to/import/data.file -i myindex -t mytype -rm -m /path/to/mapping.json -n /path/to/field/name/translations.file
  • -n fieldtranslation_filepath (location of CSV field translation file for specified type)
Sample Field Name Translations File
Album_Category,Album_ID,ISRC,Record_Co,Song_Artist,Song_ID,Song_Secs,Song_Title
genre,albumID,ISRC,recordCompany,artist,songID,duration,title
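Conceptually, the translation file defines a rename-and-filter map from original column names to new ones. The sketch below shows one way such a file could be interpreted; it is illustrative only, not the tool's actual code, and the sample row is hypothetical.

import csv

with open("/path/to/field/name/translations.file", newline="") as handle:
    reader = csv.reader(handle, delimiter=",")
    original_names = next(reader)   # first row: original CSV field names
    new_names = next(reader)        # second row: replacement names
rename = dict(zip(original_names, new_names))

# Columns absent from the translation file are dropped from the indexed data.
row = {"Album_Category": "Rock", "Song_Title": "Example", "Extra_Column": "ignored"}
translated = {rename[key]: value for key, value in row.items() if key in rename}
print(translated)   # {'genre': 'Rock', 'title': 'Example'}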

Basic Auth

If login credentials are required, add the -user and -pass arguments:

python -m esimport -s https://myserver.com -f /path/to/import/data.file -i myindex -t mytype -user exampleuser -pass examplepassword

SSL Verification

If the ElasticSearch server is accessed over SSL, the -nv flag can be used to skip certificate verification if necessary.

python -m esimport -s https://myserver.com -f /path/to/import/data.file -i myindex -t mytype -user exampleuser -pass examplepassword -nv
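Together, -user/-pass and -nv amount to HTTP Basic Auth credentials plus disabled certificate verification on each request to the server. A hypothetical illustration with the requests library (the _count endpoint is used here only as a convenient request to demonstrate with):

import requests

response = requests.get("https://myserver.com/myindex/_count",
                        auth=("exampleuser", "examplepassword"),  # -user / -pass
                        verify=False)                             # -nv
print(response.json())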

Bulk Index Count

You may specify the maximum number of records to index at a time (defaults to 1000) by using the -bc argument.

python -m esimport -s https://myserver.com -f /path/to/import/data.file -i myindex -t mytype -user exampleuser -pass examplepassword -bc 500
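The bulk count simply caps how many rows go into each bulk request. The following is a small illustrative sketch of that batching logic, not esimport's own implementation:

def batches(rows, bulk_count=1000):
    # Yield rows in groups of at most bulk_count, the way -bc limits each
    # bulk request sent to ElasticSearch.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == bulk_count:
            yield batch
            batch = []
    if batch:
        yield batch

for batch in batches(range(1200), bulk_count=500):
    print(len(batch))   # 500, 500, 200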

Timeout

You may specify the timeout (defaults to 60) for communication with ElasticSearch using the -T argument.

python -m esimport -s https://myserver.com -f /path/to/import/data.file -i myindex -t mytype -user exampleuser -pass examplepassword -T 30

Release History

0.1.9 (this version)
0.1.8
0.1.7
0.1.6
0.1.5
0.1.4
0.1.3

Download Files

File Name: esimport-0.1.9.tar.gz (7.2 kB)
File Type: Source
Upload Date: Oct 3, 2013
