A tiny web scraping library
Truelle
Truelle ("trowel" in French) is a tiny web scraping library inspired by the great Scrapy framework. It depends only on the Requests and Parsel libraries.
Truelle offers sequential request processing only and returns items directly to the caller. It is intended to be embedded in tiny scripts. Spiders aim to be compatible with Scrapy spiders, so that a script can easily be switched to Scrapy later.
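To illustrate what "sequential processing" and "returning items directly" mean in practice, here is a minimal sketch of such a crawl loop. It is not Truelle's actual implementation: the names `crawl_sequentially`, `fake_fetch` and `fake_parse` are hypothetical, and the network is replaced by an in-memory dictionary so the example runs on its own.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Union

# Hypothetical types for illustration only; these are not Truelle's API.
Item = dict
Url = str


def crawl_sequentially(
    start_urls: Iterable[Url],
    fetch: Callable[[Url], str],
    parse: Callable[[Url, str], Iterable[Union[Item, Url]]],
) -> Iterator[Item]:
    """Fetch one page at a time and yield items as soon as they are parsed."""
    queue = deque(start_urls)
    seen = set(queue)
    while queue:
        url = queue.popleft()
        body = fetch(url)  # blocking call: the next request waits for this one
        for result in parse(url, body):
            if isinstance(result, dict):
                yield result  # items go straight back to the caller
            elif result not in seen:
                seen.add(result)
                queue.append(result)  # follow-up URL, crawled later


# Stand-in for the network: a tiny in-memory "site" (title|next link).
PAGES = {
    "https://example.com/": "Home|https://example.com/about",
    "https://example.com/about": "About|",
}


def fake_fetch(url: Url) -> str:
    return PAGES[url]


def fake_parse(url: Url, body: str) -> Iterator[Union[Item, Url]]:
    title, link = body.split("|")
    yield {"url": url, "title": title}
    if link:
        yield link


items = list(crawl_sequentially(["https://example.com/"], fake_fetch, fake_parse))
```

Because the loop is a generator, the caller receives each item as soon as its page has been parsed, with no pipeline or scheduler in between.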
Install
pip install truelle
Get started
- Create a Spider
from truelle import Spider

class MySpider(Spider):
    start_urls = ["https://truelle.io"]

    # `response` supports Parsel-style selectors such as .css()
    def parse(self, response):
        for title in response.css("h1::text").getall():
            yield {"title": title}

spider = MySpider()
- Then get your items back...
... in vanilla Python:
for item in spider.crawl():
    do_something(item)
... in a Pandas dataframe:
import pandas as pd
my_df = pd.DataFrame(spider.crawl())
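If Pandas is not available, the same items can be written out with the standard library alone. The sketch below assumes that items are plain dictionaries, as in the `parse` method above. `fake_crawl` is a hypothetical stand-in for `spider.crawl()` so that the example runs without a network connection, and `items_to_csv` is an illustrative helper, not part of Truelle.

```python
import csv
import io


def fake_crawl():
    """Stand-in for spider.crawl(): yields plain dict items."""
    yield {"title": "First heading"}
    yield {"title": "Second heading"}


def items_to_csv(items, fieldnames) -> str:
    """Serialise an iterable of dict items to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(item)  # rows are written one by one as items arrive
    return buffer.getvalue()


csv_text = items_to_csv(fake_crawl(), fieldnames=["title"])
```

In a real script, `fake_crawl()` would be replaced by `spider.crawl()` and the `StringIO` buffer by an open file.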
Custom settings
def custom_fingerprint(request):
    return "test"

custom_settings = {
    "HTTP_CACHE_ENABLED": True,
    "REQUEST_FINGERPRINTER": custom_fingerprint,
    "HTTP_PROXY": "http://myproxy:8080",
    "HTTPS_PROXY": "http://myproxy:8080",
    "DOWNLOAD_DELAY": 2
}

spider.crawl(settings=custom_settings)
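The `custom_fingerprint` above returns the same string for every request, which is only useful as a placeholder. A more realistic fingerprinter would presumably map equivalent requests to the same value and different requests to different values. The sketch below is one possible approach, assuming that the `request` object exposes `method` and `url` attributes (this is an assumption, not confirmed by the Truelle documentation). The name `url_fingerprint` is hypothetical, and `SimpleNamespace` objects stand in for real requests.

```python
import hashlib
from types import SimpleNamespace
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_fingerprint(request) -> str:
    """Return a stable hash of the request method and a normalised URL.

    Assumption: `request` has `method` and `url` attributes.
    """
    parts = urlsplit(request.url)
    # Sort query parameters so that ?a=1&b=2 and ?b=2&a=1 match.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Lower-case the host and drop the fragment, which is never sent to the server.
    canonical = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, ""))
    raw = f"{request.method.upper()} {canonical}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


# Stand-in request objects for demonstration.
a = SimpleNamespace(method="GET", url="https://truelle.io/?b=2&a=1")
b = SimpleNamespace(method="get", url="https://TRUELLE.io/?a=1&b=2#top")
c = SimpleNamespace(method="GET", url="https://truelle.io/?a=1&b=3")
```

Here `a` and `b` produce the same fingerprint, while `c` differs because one query value changed. Such a function could then be passed as the `"REQUEST_FINGERPRINTER"` setting in place of `custom_fingerprint`, provided the request object matches the assumption above.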