
An extension module to send data to Elasticsearch in bulk format.

### scrapy-elasticsearch-extension

A Scrapy extension with the following functionality:

- bulk export data to Elasticsearch
- delete outdated documents

### required modules

[pyes](http://pyes.readthedocs.org/en/latest/)


### installation

General information can be found in the [Scrapy Extensions installation guide](http://doc.scrapy.org/en/latest/topics/extensions.html).

Add the following entry to the **EXTENSIONS** setting in your Scrapy settings:

```
EXTENSIONS = {
    'scrapyes.Sender': 1000,
}
```

### configuration

The module can be configured per project in your Scrapy settings using the following options:

```
ELASTICSEARCH_SERVER = "localhost"
ELASTICSEARCH_PORT = 9200
ELASTICSEARCH_INDEX = "sixx"
ELASTICSEARCH_TYPE = "text"
ELASTICSEARCH_BULK_SIZE = 10
SCRAPYES_ENABLED = True
```

### index configuration

The index used for Elasticsearch insertion can be configured per spider by [setting an attribute on the spider](http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments) named `index` and passing the desired value when the spider job is scheduled.

For example:
```
curl http://192.168.33.10:6800/schedule.json -d project=psd_search_crawler \
-d spider=sixx_spider \
-d index=my_index
```
If the index is not configured on the running spider, the value of the **ELASTICSEARCH_INDEX** setting will be used.
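
For illustration, the fallback can be thought of along these lines (a minimal sketch, not the module's actual code; `resolve_index` is a hypothetical helper name):

```
def resolve_index(spider, settings):
    # a per-spider 'index' attribute (set via a spider argument) takes
    # precedence; otherwise fall back to the ELASTICSEARCH_INDEX setting
    return getattr(spider, 'index', None) or settings.get('ELASTICSEARCH_INDEX')
```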

If the item declares an `id` field, it will be used to update the corresponding document in Elasticsearch.
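
For example, an item carrying an `id` field could be declared like this (a sketch; the field names besides `id` are purely illustrative):

```
import scrapy

class TextItem(scrapy.Item):
    # 'id' is used as the Elasticsearch document id, so re-indexing
    # the same item updates the existing document instead of adding a new one
    id = scrapy.Field()
    title = scrapy.Field()
    body = scrapy.Field()
```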


### deleting outdated documents

If documents have been indexed with the fields `spider_name` and `last_indexed`, documents indexed before the latest run of the spider will be removed when the spider closes, provided the spider has finished its task.
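
As a rough illustration, the cleanup amounts to a delete-by-query of the following shape (field names come from the description above; the spider name and timestamp are placeholders, and the exact query the extension issues may differ):

```
delete_outdated = {
    "query": {
        "bool": {
            "must": [
                {"term": {"spider_name": "sixx_spider"}},
                # anything last indexed before the current run started
                {"range": {"last_indexed": {"lt": "2013-10-18T00:00:00"}}},
            ]
        }
    }
}
```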