Welcome to Scrapy-IPFS-Filecoin
Scrapy is a popular open-source and collaborative Python framework for extracting the data you need from websites. scrapy-ipfs-filecoin provides Scrapy item pipelines and feed exports to store scraped items and files on IPFS and Filecoin using services such as Web3.Storage, LightHouse.Storage, Estuary, Pinata, and Moralis.
🏠 Homepage
Install
```sh
npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git
pip install scrapy-ipfs-filecoin
```
Example
Usage
- Install ipfs-only-hash and scrapy-ipfs-filecoin:

  ```sh
  npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git
  pip install scrapy-ipfs-filecoin
  ```
- Add 'scrapy_ipfs_filecoin.pipelines.ImagesPipeline' and/or 'scrapy_ipfs_filecoin.pipelines.FilesPipeline' to the ITEM_PIPELINES setting in your Scrapy project if you need to store images or other files on IPFS and Filecoin:

  ```python
  ITEM_PIPELINES = {
      'scrapy_ipfs_filecoin.pipelines.ImagesPipeline': 1,
      'scrapy_ipfs_filecoin.pipelines.FilesPipeline': 2,
  }
  ```
  Set the store path of files or images for Web3Storage, LightHouse, Estuary, Pinata, or Moralis as required:

  ```python
  IMAGES_STORE = 'w3s://images'  # For Web3Storage
  IMAGES_STORE = 'es://images'   # For Estuary
  IMAGES_STORE = 'lh://images'   # For LightHouse
  IMAGES_STORE = 'pn://images'   # For Pinata
  IMAGES_STORE = 'ms://images'   # For Moralis

  FILES_STORE = 'w3s://files'    # For Web3Storage
  FILES_STORE = 'es://files'     # For Estuary
  FILES_STORE = 'lh://files'     # For LightHouse
  FILES_STORE = 'pn://files'     # For Pinata
  FILES_STORE = 'ms://files'     # For Moralis
  ```
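For the files and images pipelines to download anything, each yielded item needs the standard Scrapy file_urls and/or image_urls fields; a plain dict works as an item. A minimal sketch, with placeholder URLs:

```python
# A Scrapy item can be a plain dict. The files/images pipelines look for
# these standard fields (the URLs below are placeholders, not real files).
item = {
    "title": "Example page",
    "file_urls": ["https://example.com/report.pdf"],   # fetched by the files pipeline
    "image_urls": ["https://example.com/cover.png"],   # fetched by the images pipeline
}

# After the pipelines run, download results are added back onto the item
# under the "files" and "images" keys.
```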
- For feed storage, to store the scraping output as json, jsonlines, jsonl, jl, csv, xml, marshal, pickle, etc., set FEED_STORAGES as follows for the desired output format:

  ```python
  from scrapy_ipfs_filecoin.feedexport import get_feed_storages

  FEED_STORAGES = get_feed_storages()
  ```
  Then set the API key for one of the services, i.e. Web3Storage, LightHouse, Estuary, Pinata, or Moralis, and set FEEDS as follows to store the scraped data.
For Web3Storage:
  ```python
  W3S_API_KEY = "<W3S_API_KEY>"

  FEEDS = {
      'w3s://house.json': {
          "format": "json",
      },
  }
  ```
For LightHouse:
  ```python
  LH_API_KEY = "<LH_API_KEY>"

  FEEDS = {
      'lh://house.json': {
          "format": "json",
      },
  }
  ```
For Estuary:
  ```python
  ES_API_KEY = "<ES_API_KEY>"

  FEEDS = {
      'es://house.json': {
          "format": "json",
      },
  }
  ```
For Pinata:
  ```python
  PN_JWT_TOKEN = "<PN_JWT_TOKEN>"

  FEEDS = {
      'pn://house.json': {
          "format": "json",
      },
  }
  ```
For Moralis:
  ```python
  MS_API_KEY = "<MS_API_KEY>"

  FEEDS = {
      'ms://house.json': {
          "format": "json",
      },
  }
  ```
  See more on FEEDS in the Scrapy documentation.
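Putting the steps above together, a single settings.py that uses Web3Storage for both downloaded images and the item feed might look like this (the API key is a placeholder and the store paths are illustrative, not required names):

```python
# settings.py -- an illustrative combination of the settings above,
# using Web3Storage for both downloaded images and the JSON feed.
from scrapy_ipfs_filecoin.feedexport import get_feed_storages

ITEM_PIPELINES = {
    'scrapy_ipfs_filecoin.pipelines.ImagesPipeline': 1,
}
IMAGES_STORE = 'w3s://images'    # downloaded images are stored under this path

FEED_STORAGES = get_feed_storages()
W3S_API_KEY = '<W3S_API_KEY>'    # placeholder -- your Web3.Storage API token
FEEDS = {
    'w3s://house.json': {'format': 'json'},
}
```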
- Now perform the scraping as you would normally.
Author
👤 Pawan Paudel
- Github: @pawanpaudel93
🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Show your support
Give a ⭐️ if this project helped you!
Copyright © 2022 Pawan Paudel.
Hashes for scrapy_ipfs_filecoin-0.0.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e7bc3d5dfdb1d6e3dea67479b0940dd322009ca309c501bb60407477503e4cd |
| MD5 | ec2f0ba05a8c22d06de03c2e7bc497b7 |
| BLAKE2b-256 | 7257cb44906046ee8ee9eb08fec7c2d9513b8405fb5d3b0d22915fce01445c07 |
Hashes for scrapy_ipfs_filecoin-0.0.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 305a7c4e6268589f1f0a89c75dd07273d1c960bc4ae16e03fbceb0482257d5c3 |
| MD5 | d6c156cc37a680aeff9f2a6789021d44 |
| BLAKE2b-256 | 16205418850482ae0a438669d3a0694f839549c46fdb82b0de0827affd83e936 |