
An extension module to send data to Elasticsearch in bulk format.


### scrapy-elasticsearch-extension

A Scrapy extension that exports data to Elasticsearch in bulk

### required modules

[pyes](http://pyes.readthedocs.org/en/latest/)


### installation

general information can be found in the [Scrapy Extensions installation guide](http://doc.scrapy.org/en/latest/topics/extensions.html)

add the following line to the **EXTENSIONS** setting in your Scrapy settings:

```
'scrapyes.Sender' : 1000
```
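
for context, here is a minimal sketch of how that entry might sit in a project's `settings.py`. The dictionary form of **EXTENSIONS** (import path mapped to an order value) is how Scrapy expects the setting; only the single entry comes from this README, and a real project may list other extensions alongside it.

```python
# settings.py (sketch): enable the extension by listing it in EXTENSIONS.
# Keys are import paths, values are order numbers.
EXTENSIONS = {
    'scrapyes.Sender': 1000,
}
```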

### configuration

the module can be configured per project in your Scrapy settings using the following options:

```
ELASTICSEARCH_SERVER = "localhost"
ELASTICSEARCH_PORT = 9200
ELASTICSEARCH_INDEX = "sixx"
ELASTICSEARCH_TYPE = "text"
ELASTICSEARCH_BULK_SIZE = 10
```
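
to illustrate what a bulk size of 10 means in practice, the sketch below buffers items and hands them off in batches. It is not the extension's actual code: the `BulkBuffer` class and its `send` callback are hypothetical names, and the assumption is that **ELASTICSEARCH_BULK_SIZE** is the number of items collected before one bulk request is sent.

```python
class BulkBuffer:
    """Collects items and hands them off in batches of `bulk_size` (hypothetical helper)."""

    def __init__(self, bulk_size, send):
        self.bulk_size = bulk_size
        self.send = send        # callable that receives a list of items
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        # Flush as soon as the batch is full.
        if len(self.pending) >= self.bulk_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered and start a fresh batch.
        if self.pending:
            self.send(self.pending)
            self.pending = []


sent_batches = []
buffer = BulkBuffer(bulk_size=10, send=sent_batches.append)
for n in range(25):
    buffer.add({"n": n})
buffer.flush()  # send the remainder, for example when the spider closes
```

with 25 items and a bulk size of 10, the callback receives two full batches followed by one batch holding the remaining 5 items.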

### index configuration

the index used for Elasticsearch insertion can be configured per spider [by initializing an attribute on the spider](http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments) named `index` and passing the desired value when the spider job is scheduled.

example:

```
curl http://192.168.33.10:6800/schedule.json -d project=psd_search_crawler \
    -d spider=sixx_spider \
    -d index=my_index
```

if the running spider does not define an `index` attribute, the value of the **ELASTICSEARCH_INDEX** setting is used instead.
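
the fallback rule can be sketched as follows. This is an illustration of the described behaviour, not the extension's source: `resolve_index` and `FakeSpider` are hypothetical names, and a plain dict stands in for the Scrapy settings object.

```python
def resolve_index(spider, settings):
    """Return the spider's `index` attribute if set, else the settings value."""
    index = getattr(spider, "index", None)
    if index:
        return index
    return settings["ELASTICSEARCH_INDEX"]


class FakeSpider:
    """Stand-in for a Scrapy spider that may receive `index` as an argument."""

    def __init__(self, index=None):
        if index is not None:
            self.index = index


settings = {"ELASTICSEARCH_INDEX": "sixx"}
print(resolve_index(FakeSpider(index="my_index"), settings))  # my_index
print(resolve_index(FakeSpider(), settings))                  # sixx
```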

if the item declares an `id` field, that value is used to update the corresponding document in Elasticsearch.
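
as a rough picture of what this could look like on the wire, the sketch below builds one entry in the newline-delimited format of the Elasticsearch bulk API and sets `_id` only when the item carries an `id`. The `bulk_lines` helper is a hypothetical name, and the assumption that the item's `id` maps to the document `_id` is an interpretation of the sentence above; the extension itself talks to Elasticsearch through pyes.

```python
import json


def bulk_lines(item, index, doc_type):
    """Build the action line and source line for one item (hypothetical helper)."""
    meta = {"_index": index, "_type": doc_type}
    # Assumption: an item-level `id` becomes the document `_id`, so exporting
    # the same id again replaces the existing document instead of adding one.
    if item.get("id") is not None:
        meta["_id"] = item["id"]
    return json.dumps({"index": meta}) + "\n" + json.dumps(item) + "\n"


with_id = bulk_lines({"id": "42", "title": "hello"}, "sixx", "text")
without_id = bulk_lines({"title": "hello"}, "sixx", "text")
print(with_id, end="")
print(without_id, end="")
```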
