An extension module to send data to elasticsearch in bulk format.
### scrapy-elasticsearch-extension
A Scrapy Extension to bulk export data to elasticsearch
### required modules
[pyes](http://pyes.readthedocs.org/en/latest/)
### installation
General information can be found in the [Scrapy Extensions installation guide](http://doc.scrapy.org/en/latest/topics/extensions.html).
Add the following entry to the **EXTENSIONS** setting in your Scrapy settings:
```
EXTENSIONS = {
    'scrapyes.Sender': 1000,
}
```
### configuration
The module can be configured per project in your Scrapy settings using the following options:
```
ELASTICSEARCH_SERVER = "localhost"   # elasticsearch host
ELASTICSEARCH_PORT = 9200            # elasticsearch HTTP port
ELASTICSEARCH_INDEX = "sixx"         # default index to write to
ELASTICSEARCH_TYPE = "text"          # document type for inserted items
ELASTICSEARCH_BULK_SIZE = 10         # items buffered before a bulk request
```
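To illustrate what **ELASTICSEARCH_BULK_SIZE** controls, here is a hypothetical sketch of the buffering logic behind a bulk sender. This is not the actual `scrapyes.Sender` source; the `send` callable stands in for the pyes bulk request, and the real extension wires this into Scrapy's `item_scraped` and `spider_closed` signals:

```python
# Hypothetical sketch: buffer scraped items and flush them to
# elasticsearch in batches of ELASTICSEARCH_BULK_SIZE.

class BulkBuffer:
    def __init__(self, bulk_size, send):
        self.bulk_size = bulk_size  # e.g. ELASTICSEARCH_BULK_SIZE
        self.send = send            # callable performing the bulk request
        self.items = []

    def item_scraped(self, item):
        """Called for every scraped item; flush when the buffer is full."""
        self.items.append(item)
        if len(self.items) >= self.bulk_size:
            self.flush()

    def flush(self):
        """Send any buffered items in a single bulk request."""
        if self.items:
            self.send(list(self.items))
            self.items = []

sent = []
buf = BulkBuffer(bulk_size=3, send=sent.append)
for i in range(7):
    buf.item_scraped({"id": i})
buf.flush()  # spider_closed would trigger this final flush
```

With seven items and a bulk size of 3, this produces two full batches and one final partial batch.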
### index configuration
The index used for Elasticsearch insertion can be configured per spider [by initializing an attribute on the spider](http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments), named `index`, and passing the desired value when the spider job is scheduled.
Example:
```
curl http://192.168.33.10:6800/schedule.json -d project=psd_search_crawler \
-d spider=sixx_spider \
-d index=my_index
```
If the index is not configured on the running spider, the value of the **ELASTICSEARCH_INDEX** setting will be used.
If the item declares an `id` field, it will be used as the document id when updating Elasticsearch.
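The fallback can be sketched as follows (`resolve_index` is a hypothetical helper, not part of the extension's public API; it mirrors how a spider attribute set via `-d index=...` would take precedence over the setting):

```python
# Hypothetical sketch of the index fallback: prefer the spider's
# `index` attribute, otherwise use the ELASTICSEARCH_INDEX setting.

def resolve_index(spider, settings):
    return getattr(spider, "index", None) or settings.get("ELASTICSEARCH_INDEX")

class DummySpider:
    pass

settings = {"ELASTICSEARCH_INDEX": "sixx"}

spider = DummySpider()
print(resolve_index(spider, settings))   # no attribute -> the setting wins

spider.index = "my_index"                # set via -d index=my_index
print(resolve_index(spider, settings))   # the spider attribute wins
```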
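For example, an item carrying an `id` field lets the extension reuse that value as the document id, so re-crawling updates the existing document rather than creating a duplicate (the items and the `document_id` helper below are illustrative, not part of the extension):

```python
# Illustrative items: only the first declares an `id` field.
item_with_id = {"id": "page-42", "title": "hello"}
item_without_id = {"title": "hello"}

def document_id(item):
    """Hypothetical helper mirroring the behaviour described above."""
    return item.get("id")  # None -> elasticsearch assigns its own id

print(document_id(item_with_id))
print(document_id(item_without_id))
```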