
A tiny web scraping library

Project description

Truelle

Truelle ("trowel" in French) is a tiny web scraping library inspired by the great Scrapy framework. It depends only on the Requests and Parsel libraries.

Truelle offers only sequential request processing and returns items directly. It is intended to be embedded in tiny scripts. Spiders aim to be compatible with Scrapy spiders, so a script can be switched to Scrapy easily.
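
To make "sequential request processing" concrete, here is a minimal sketch of the idea, not Truelle's actual implementation. The names `crawl_sequentially`, `fake_fetch`, `fake_parse` and `FAKE_PAGES` are hypothetical, and the fetch step is a stand-in so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def crawl_sequentially(
    start_urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str, str], Iterable[dict]],
) -> Iterator[dict]:
    """Fetch one URL at a time and yield its items before moving on."""
    pending = deque(start_urls)
    while pending:
        url = pending.popleft()
        body = fetch(url)  # blocks until this single response is available
        yield from parse(url, body)  # items go straight back to the caller


# Hypothetical stand-ins for the network and parsing layers.
FAKE_PAGES = {
    "https://example.com/a": "Page A",
    "https://example.com/b": "Page B",
}


def fake_fetch(url: str) -> str:
    return FAKE_PAGES[url]


def fake_parse(url: str, body: str) -> Iterator[dict]:
    yield {"url": url, "title": body}


items = list(crawl_sequentially(FAKE_PAGES, fake_fetch, fake_parse))
```

Because only one request is in flight at a time, items arrive in the same order as the URLs were queued, which keeps small scripts easy to reason about.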

Install

pip install truelle

Get started

  1. Create a Spider

from truelle import Spider

class MySpider(Spider):
    # URLs requested when the crawl starts
    start_urls = ["https://truelle.io"]

    def parse(self, response):
        # Yield one item per <h1> text node found with a CSS selector
        for title in response.css("h1::text").getall():
            yield {"title": title}

spider = MySpider()

  2. Then get your items back...

... in vanilla Python:

for item in spider.crawl():
    do_something(item)
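
Since the loop above receives one item at a time, the items can be streamed to a file with the standard library alone. The sketch below assumes each item is a plain, JSON-serializable dict; `write_jsonl` and `fake_crawl` are hypothetical names, and `fake_crawl` stands in for `spider.crawl()`.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_jsonl(items: Iterable[dict], stream: TextIO) -> int:
    """Write each item as one JSON line and return how many were written."""
    count = 0
    for item in items:
        stream.write(json.dumps(item, ensure_ascii=False) + "\n")
        count += 1
    return count


def fake_crawl() -> Iterator[dict]:
    # Stand-in for spider.crawl(); yields items one by one.
    yield {"title": "Hello"}
    yield {"title": "Truelle"}


buffer = io.StringIO()  # an open file object would work the same way
written = write_jsonl(fake_crawl(), buffer)
```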

... in a pandas DataFrame:

import pandas as pd
my_df = pd.DataFrame(spider.crawl())

Custom settings

def custom_fingerprint(request):
    return "test"

custom_settings = {
    "HTTP_CACHE_ENABLED": True,
    "REQUEST_FINGERPRINTER": custom_fingerprint,
    "HTTP_PROXY": "http://myproxy:8080",
    "HTTPS_PROXY": "http://myproxy:8080",
    "DOWNLOAD_DELAY": 2
}

spider.crawl(settings=custom_settings)
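
The `custom_fingerprint` function above returns the same string for every request, which is fine as a placeholder. A fingerprinter that tells requests apart might look like the sketch below. It assumes the request object exposes `.method` and `.url` attributes, which the Truelle documentation does not confirm; `url_fingerprint` and the stand-in request objects are hypothetical.

```python
import hashlib
from types import SimpleNamespace


def url_fingerprint(request) -> str:
    """Return a stable hex digest built from the request method and URL.

    Assumption: `request` has `.method` and `.url` attributes.
    """
    key = f"{request.method.upper()} {request.url}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Stand-in request objects, for illustration only.
first = SimpleNamespace(method="get", url="https://truelle.io/")
same = SimpleNamespace(method="GET", url="https://truelle.io/")
other = SimpleNamespace(method="GET", url="https://truelle.io/about")

print(url_fingerprint(first) == url_fingerprint(same))   # identical requests match
print(url_fingerprint(first) == url_fingerprint(other))  # different URLs differ
```

If that attribute assumption holds, the function could be passed as the "REQUEST_FINGERPRINTER" value in place of `custom_fingerprint`.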


