
An extension module to send data to Elasticsearch in bulk format.


### scrapy-elasticsearch-extension

A Scrapy extension with the following functionality:

- bulk export of data to Elasticsearch
- deletion of outdated documents

### required modules

[pyes](http://pyes.readthedocs.org/en/latest/)


### installation

General information can be found in the [Scrapy Extensions installation guide](http://doc.scrapy.org/en/latest/topics/extensions.html).

Add the following entry to the **EXTENSIONS** setting in your Scrapy settings:

```
'scrapyes.Sender': 1000
```

### configuration

The module can be configured per project in your Scrapy settings using the following options:

```
ELASTICSEARCH_SERVER = "localhost"
ELASTICSEARCH_PORT = 9200
ELASTICSEARCH_INDEX = "sixx"
ELASTICSEARCH_TYPE = "text"
ELASTICSEARCH_BULK_SIZE = 10
SCRAPYES_ENABLED = True
```
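To illustrate what a setting like `ELASTICSEARCH_BULK_SIZE` typically controls, here is a minimal sketch of a buffer that collects items and sends them in batches. It is not the extension's actual code: `BulkBuffer` and the `send` callable are hypothetical names, and the real extension delegates the request to pyes.

```python
class BulkBuffer:
    """Collect items and hand them to `send` in batches of `bulk_size`."""

    def __init__(self, bulk_size, send):
        self.bulk_size = bulk_size
        self.send = send      # hypothetical callable performing the bulk request
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        # Send as soon as a full batch has accumulated.
        if len(self.pending) >= self.bulk_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, even if the batch is not full.
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


sent_batches = []
buffer = BulkBuffer(bulk_size=3, send=sent_batches.append)
for n in range(7):
    buffer.add({"n": n})
buffer.flush()  # e.g. when the spider closes
```

With a bulk size of 3 and seven items, the sketch sends two full batches and one final partial batch on the explicit `flush()`.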

### index configuration

The index used for Elasticsearch insertion can be configured per spider [by initializing an attribute on the spider](http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments) named `index` and passing the desired value when the spider job is scheduled.

Example:
```
curl http://192.168.33.10:6800/schedule.json \
  -d project=psd_search_crawler \
  -d spider=sixx_spider \
  -d index=my_index
```
If the index is not configured on the running spider, the value of **ELASTICSEARCH_INDEX** from the crawler settings is used.
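The fallback described above can be sketched as follows. This is only an illustration of the lookup order, not the extension's source: `resolve_index` and `DemoSpider` are hypothetical names, and a plain dict stands in for the Scrapy settings object.

```python
def resolve_index(spider, settings):
    """Return the spider's own index if set, else ELASTICSEARCH_INDEX."""
    index = getattr(spider, "index", None)
    if index:
        return index
    return settings["ELASTICSEARCH_INDEX"]


class DemoSpider:
    """Stand-in for a Scrapy spider; spider arguments become attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


settings = {"ELASTICSEARCH_INDEX": "sixx"}  # plain dict standing in for Scrapy settings
index_from_argument = resolve_index(DemoSpider(index="my_index"), settings)
index_from_settings = resolve_index(DemoSpider(), settings)
```

Here a spider scheduled with `index=my_index` resolves to `my_index`, while a spider scheduled without the argument falls back to the project-wide setting.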

If the item declares an `id` field, its value is used to update the matching document in Elasticsearch.
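As a rough illustration of how an `id` field can drive updates, the sketch below builds the two newline-delimited JSON lines that the Elasticsearch bulk API expects for one `index` action. The helper name `bulk_lines` is hypothetical, and the extension itself builds its requests through pyes, so treat this as a picture of the wire format rather than the actual implementation. The `_type` key applies to the older Elasticsearch versions that this package targets.

```python
import json


def bulk_lines(item, index, doc_type):
    """Build the action line and source line for one bulk `index` action."""
    meta = {"_index": index, "_type": doc_type}
    # Assumption: the item's `id` value is passed through as the document ID.
    if item.get("id") is not None:
        meta["_id"] = item["id"]
    return json.dumps({"index": meta}) + "\n" + json.dumps(item) + "\n"


with_id = bulk_lines({"id": "42", "title": "hello"}, "sixx", "text")
without_id = bulk_lines({"title": "hello"}, "sixx", "text")
```

An `index` action with an explicit `_id` replaces any existing document with that ID, which is why re-crawled items update rather than duplicate. Without an `_id`, Elasticsearch generates one and every run creates new documents.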


### deleting outdated documents

If documents have been indexed with the fields 'spider_name' and 'last_indexed',
documents indexed before the latest run of the spider
are removed when the spider closes, provided the spider has
finished its task.
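The cleanup rule can be pictured as a query that selects documents from the same spider whose 'last_indexed' value is older than the start of the current run. The sketch below is an assumption-laden illustration: the function names are hypothetical, the query uses the `bool`/`filter` form of the query DSL from more recent Elasticsearch releases (the pyes-era syntax may differ), and it assumes 'spider_name' is indexed as an exact, non-analyzed value. The `"finished"` string is the close reason Scrapy reports when a spider completes normally.

```python
def stale_documents_query(spider_name, run_started_at):
    """Select documents of this spider that were indexed before the current run."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"spider_name": spider_name}},
                    {"range": {"last_indexed": {"lt": run_started_at}}},
                ]
            }
        }
    }


def should_delete_outdated(close_reason):
    """Only clean up when the spider completed its crawl normally."""
    return close_reason == "finished"


query = stale_documents_query("sixx_spider", "2019-11-01T08:00:00")
```

Skipping the cleanup for any other close reason avoids deleting documents that an interrupted crawl simply had not reached yet.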


Files for ScrapyEs, version 0.23: ScrapyEs-0.23.tar.gz (2.4 kB, source)
