Welcome to Scrapy-IPFS-Filecoin
Scrapy is a popular open-source and collaborative Python framework for extracting the data you need from websites. scrapy-ipfs-filecoin provides Scrapy item pipelines and feed exports to store items on IPFS and Filecoin using services such as Web3.Storage, LightHouse.Storage, Estuary, Pinata, and Moralis.
🏠 Homepage
Install
```sh
npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git
pip install scrapy-ipfs-filecoin
```
Example
Usage
- Install ipfs-only-hash and scrapy-ipfs-filecoin:

```sh
npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git
pip install scrapy-ipfs-filecoin
```
- Add 'scrapy_ipfs_filecoin.pipelines.ImagesPipeline' and/or 'scrapy_ipfs_filecoin.pipelines.FilesPipeline' to the ITEM_PIPELINES setting in your Scrapy project if you need to store images or other files on IPFS and Filecoin:

```python
ITEM_PIPELINES = {
    'scrapy_ipfs_filecoin.pipelines.ImagesPipeline': 1,
    'scrapy_ipfs_filecoin.pipelines.FilesPipeline': 2,
}
```
Add the store path for files or images for Web3.Storage, LightHouse, Estuary, Pinata, or Moralis, as required:

```python
IMAGES_STORE = 'w3s://images'  # For Web3.Storage
IMAGES_STORE = 'es://images'   # For Estuary
IMAGES_STORE = 'lh://images'   # For LightHouse
IMAGES_STORE = 'pn://images'   # For Pinata
IMAGES_STORE = 'ms://images'   # For Moralis

FILES_STORE = 'w3s://files'    # For Web3.Storage
FILES_STORE = 'es://files'     # For Estuary
FILES_STORE = 'lh://files'     # For LightHouse
FILES_STORE = 'pn://files'     # For Pinata
FILES_STORE = 'ms://files'     # For Moralis
```
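The store URIs above are ordinary URIs whose scheme selects the backing service and whose host part names the folder. The helper below is purely illustrative (the function name and mapping are ours, not part of scrapy-ipfs-filecoin's API) and shows how such a URI decomposes:

```python
from urllib.parse import urlparse

# Illustrative only: maps the scheme prefixes documented above to the
# services they select. Not part of scrapy-ipfs-filecoin's public API.
SCHEME_TO_SERVICE = {
    "w3s": "Web3.Storage",
    "lh": "LightHouse.Storage",
    "es": "Estuary",
    "pn": "Pinata",
    "ms": "Moralis",
}

def describe_store_uri(uri):
    """Split a store URI like 'w3s://images' into (service, folder)."""
    parsed = urlparse(uri)
    return SCHEME_TO_SERVICE[parsed.scheme], parsed.netloc

print(describe_store_uri("w3s://images"))  # ('Web3.Storage', 'images')
```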
- To store the scraped output as json, jsonlines, jsonl, jl, csv, xml, marshal, pickle, etc., set FEED_STORAGES as follows for the desired output format:

```python
from scrapy_ipfs_filecoin.feedexport import get_feed_storages

FEED_STORAGES = get_feed_storages()
```
Then set the API key for the storage service you use, i.e. Web3.Storage, LightHouse, Estuary, Pinata, or Moralis, and set FEEDS as follows to store the scraped data.
For Web3Storage:
```python
W3S_API_KEY = "<W3S_API_KEY>"

FEEDS = {
    'w3s://house.json': {
        "format": "json",
    },
}
```
For LightHouse:
```python
LH_API_KEY = "<LH_API_KEY>"

FEEDS = {
    'lh://house.json': {
        "format": "json",
    },
}
```
For Estuary:
```python
ES_API_KEY = "<ES_API_KEY>"

FEEDS = {
    'es://house.json': {
        "format": "json",
    },
}
```
For Pinata:
```python
PN_JWT_TOKEN = "<PN_JWT_TOKEN>"

FEEDS = {
    'pn://house.json': {
        "format": "json",
    },
}
```
For Moralis:
```python
MS_API_KEY = "<MS_API_KEY>"

FEEDS = {
    'ms://house.json': {
        "format": "json",
    },
}
```
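Putting the pieces together, a complete settings.py for, say, Web3.Storage might look like the sketch below (the API key is a placeholder; swap the scheme and key variable for whichever service you use):

```python
# settings.py -- sketch combining the settings above for Web3.Storage
from scrapy_ipfs_filecoin.feedexport import get_feed_storages

ITEM_PIPELINES = {
    'scrapy_ipfs_filecoin.pipelines.ImagesPipeline': 1,
}
IMAGES_STORE = 'w3s://images'

FEED_STORAGES = get_feed_storages()
W3S_API_KEY = "<W3S_API_KEY>"
FEEDS = {
    'w3s://house.json': {
        "format": "json",
    },
}
```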
See more on the FEEDS setting in the Scrapy documentation.
- Now perform the scraping as you would normally.
Author
👤 Pawan Paudel
- GitHub: @pawanpaudel93
🤝 Contributing
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.
Show your support
Give a ⭐️ if this project helped you!
Copyright © 2022 Pawan Paudel.
Hashes for scrapy_ipfs_filecoin-0.0.1.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6a6116a61a02343ce05247e62673e2720cc2fbcc85b63ab73abfafe712c09344 |
| MD5 | 3e504a098539a82b9f250b92c8c49281 |
| BLAKE2b-256 | 1e78de39893f3101aa2cf870d9ccbeb75140fc41fc7ea27cf9fc5ee48b55825c |
Hashes for scrapy_ipfs_filecoin-0.0.1-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 53d557bc3a326fbff1ed6052ef192010a6d2d063f7cf05b06cc9e2604579cd75 |
| MD5 | 055d190595264ce4afcbb931d9630aa9 |
| BLAKE2b-256 | 60a37598c2004a4da2589a19c4eba689baa4687c8251195318ce0a15c7d3d474 |