Make HTTP requests exactly like a browser.

Project description

Stay Undetected While Scraping the Web.

The All-In-One Solution to Web Scraping:

  • Realistic HTTP Requests:
    • Mimics browser headers for undetected scraping, adapting to the requested file type
    • Tracks dynamic headers such as Referer and Host
    • Masks the TLS fingerprint of HTTP requests using the curl_cffi package
  • Faster and Easier Parsing:
    • Automatically extracts metadata (title, description, author, etc.) from HTML-based responses
    • Methods to extract all webpage and image URLs
    • Seamlessly converts responses into Lxml and BeautifulSoup objects

Install

$ pip install stealth_requests

Sending Requests

Stealth-Requests mimics the API of the requests package, allowing you to use it in nearly the same way.

You can send one-off requests like this:

import stealth_requests as requests

resp = requests.get('https://link-here.com')

Or you can use a StealthSession object which will keep track of certain headers for you between requests such as the Referer header.

from stealth_requests import StealthSession

with StealthSession() as session:
    resp = session.get('https://link-here.com')

When sending a request, or creating a StealthSession, you can specify which browser to mimic: either chrome (the default) or safari. To change the browser, set the impersonate argument to 'safari' or 'chrome', either in requests.get or when initializing a StealthSession.

Sending Requests With Asyncio

This package also supports Asyncio, with an API that mirrors the synchronous one:

from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='safari') as session:
    resp = await session.get('https://link-here.com')

Or, for a one-off request:

import stealth_requests as requests

resp = await requests.get('https://link-here.com', impersonate='safari')

Getting Response Metadata

The response returned from this package is a StealthResponse, which has all of the same methods and attributes as a standard requests Response object, plus a few added features. One of these is automatic parsing of header metadata from HTML-based responses, accessible through the meta property:

  • title: str | None
  • author: str | None
  • description: str | None
  • thumbnail: str | None
  • canonical: str | None
  • twitter_handle: str | None
  • keywords: tuple[str] | None
  • robots: tuple[str] | None

Here's an example of how to get the title of a page:

import stealth_requests as requests

resp = requests.get('https://link-here.com')
print(resp.meta.title)

Parsing Responses

To make parsing HTML faster, I've also added two popular parsing packages to Stealth-Requests - Lxml and BeautifulSoup4. To use these add-ons, you need to install the parsers extra:

$ pip install stealth_requests[parsers]

To get an Lxml tree, use the resp.tree() method; to get a BeautifulSoup object, use the resp.soup() method.

For simple parsing, I've also added the following convenience methods, from the Lxml package, right into the StealthResponse object:

  • text_content(): Get all of the text content in a response
  • xpath(): Run XPath expressions directly on the response, without building your own Lxml tree

Get All Image and Page Links From a Response

If you would like to get all of the webpage URLs (a tags) from an HTML-based response, you can use the links property. If you'd like to get all image URLs (img tags) you can use the images property from a response object.

import stealth_requests as requests

resp = requests.get('https://link-here.com')
for image_url in resp.images:
    print(image_url)

for link_url in resp.links:
    print(link_url)

Getting HTML Responses in Markdown Format

In some cases, it’s easier to work with a webpage in Markdown format rather than HTML. After making a GET request that returns HTML, you can use the resp.markdown() method to convert the response into a Markdown string, a simplified and readable version of the page content.

markdown() has two optional parameters:

  1. content_xpath: An XPath expression, in the form of a string, which can be used to narrow down what text is converted to Markdown. This can be useful if you don't want the header and footer of a webpage to be turned into Markdown.
  2. ignore_links: A boolean value that tells Html2Text whether to include links in the Markdown output.


Source Distribution

stealth_requests-1.2.1.tar.gz (9.9 kB)

Built Distribution

stealth_requests-1.2.1-py3-none-any.whl (8.3 kB)
