Welcome to Scrapy-IPFS-Filecoin

Scrapy is a popular open-source and collaborative Python framework for extracting the data you need from websites. scrapy-ipfs-filecoin provides Scrapy pipelines and feed exports to store items on IPFS and Filecoin using services like Web3.Storage, LightHouse.Storage, Estuary, Pinata and Moralis.

🏠 Homepage

Install

npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git

pip install scrapy-ipfs-filecoin

Example

Usage

  1. Install ipfs-only-hash and scrapy-ipfs-filecoin.

    npm install -g https://github.com/pawanpaudel93/ipfs-only-hash.git

    pip install scrapy-ipfs-filecoin

  2. Add 'scrapy_ipfs_filecoin.pipelines.ImagesPipeline' and/or 'scrapy_ipfs_filecoin.pipelines.FilesPipeline' to the ITEM_PIPELINES setting in your Scrapy project if you need to store images or other files on IPFS and Filecoin.

    ITEM_PIPELINES = {
        'scrapy_ipfs_filecoin.pipelines.ImagesPipeline': 1,
        'scrapy_ipfs_filecoin.pipelines.FilesPipeline': 2,
    }
    

    Then set the store path for files or images for Web3Storage, LightHouse, Estuary, Pinata or Moralis as required.

    IMAGES_STORE = 'w3s://images' # For Web3Storage
    IMAGES_STORE = 'es://images' # For Estuary
    IMAGES_STORE = 'lh://images' # For LightHouse
    IMAGES_STORE = 'pn://images' # For Pinata
    IMAGES_STORE = 'ms://images' # For Moralis
    
    FILES_STORE = 'w3s://files' # For Web3Storage
    FILES_STORE = 'es://files' # For Estuary
    FILES_STORE = 'lh://files' # For LightHouse
    FILES_STORE = 'pn://files' # For Pinata
    FILES_STORE = 'ms://files' # For Moralis
    
  3. To use feed storage to store the scraped output as json, jsonlines, jsonl, jl, csv, xml, marshal, pickle, etc., set FEED_STORAGES as follows for the desired output format:

    from scrapy_ipfs_filecoin.feedexport import get_feed_storages
    FEED_STORAGES = get_feed_storages()
    

    Then set the API key for one of the storage services, i.e. Web3Storage, LightHouse, Estuary, Pinata or Moralis, and set FEEDS as follows to store the scraped data.

    For Web3Storage:

    W3S_API_KEY="<W3S_API_KEY>"
    FEEDS={
    	'w3s://house.json': {
    		"format": "json"
    	},
    }
    

    For LightHouse:

    LH_API_KEY="<LH_API_KEY>"
    FEEDS={
    	'lh://house.json': {
    		"format": "json"
    	},
    }
    

    For Estuary:

    ES_API_KEY="<ES_API_KEY>"
    FEEDS={
    	'es://house.json': {
    		"format": "json"
    	},
    }
    

    For Pinata:

    PN_JWT_TOKEN="<PN_JWT_TOKEN>"
    FEEDS={
    	'pn://house.json': {
    		"format": "json"
    	},
    }
    

    For Moralis:

    MS_API_KEY="<MS_API_KEY>"
    FEEDS={
    	'ms://house.json': {
    		"format": "json"
    	},
    }
    

    See more on FEEDS in the Scrapy feed exports documentation.

  4. Now perform the scraping as you would normally. A minimal end-to-end sketch follows below.
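
    The following is a minimal, self-contained sketch assuming the Web3.Storage backend; the spider name, target site, selectors and output paths are illustrative only, and Scrapy's image handling additionally requires Pillow. Replace <W3S_API_KEY> with your own key.

    # books_spider.py -- a hypothetical end-to-end example, not shipped with scrapy-ipfs-filecoin
    import scrapy

    from scrapy_ipfs_filecoin.feedexport import get_feed_storages


    class BooksSpider(scrapy.Spider):
        name = "books"
        start_urls = ["http://books.toscrape.com/"]

        custom_settings = {
            # Step 2: store downloaded images on IPFS/Filecoin via Web3.Storage
            "ITEM_PIPELINES": {"scrapy_ipfs_filecoin.pipelines.ImagesPipeline": 1},
            "IMAGES_STORE": "w3s://images",
            "W3S_API_KEY": "<W3S_API_KEY>",
            # Step 3: store the scraped feed itself as JSON on IPFS/Filecoin
            "FEED_STORAGES": get_feed_storages(),
            "FEEDS": {"w3s://books.json": {"format": "json"}},
        }

        def parse(self, response):
            for book in response.css("article.product_pod"):
                yield {
                    "title": book.css("h3 a::attr(title)").get(),
                    # "image_urls" is the field the ImagesPipeline reads by default
                    "image_urls": [response.urljoin(book.css("img::attr(src)").get())],
                }

    Running scrapy runspider books_spider.py should then pin the downloaded images under the configured IMAGES_STORE and upload books.json through the configured feed storage.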

Author

👤 Pawan Paudel

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

Show your support

Give a ⭐️ if this project helped you!

Copyright © 2022 Pawan Paudel.
