Universal content extraction library with tiered fetching strategies and anti-bot bypass

OmniFetch Python Library

Python implementation of OmniFetch - a universal content extraction library.

Features

  • Universal Extraction: Fetches content from arbitrary URLs, including standard sites, SPAs, and paywalled pages.
  • Tiered System:
    1. Light Fetch: Fast, standard HTTP request.
    2. Headless Browser: Handles dynamic, JS-heavy sites (requires a Netlify endpoint).
    3. Search Fallback: Finds alternative sources for paywalled or blocked content.
  • Smart Parsing: Converts HTML to clean Markdown or JSON.
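
The tiered system can be pictured as a chain of fetchers tried in order until one returns usable content. The sketch below is only an illustration of that idea, not OmniFetch's internal code: `fetch_with_tiers` and the three stand-in tier functions are hypothetical names, and the stand-ins return canned values instead of touching the network.

```python
from typing import Callable, Optional, Sequence

# A tier is any callable that returns extracted text, or None if it could not
# get usable content. These names are illustrative, not OmniFetch internals.
Tier = Callable[[str], Optional[str]]


def fetch_with_tiers(url: str, tiers: Sequence[Tier]) -> Optional[str]:
    """Try each tier in order and return the first non-empty result."""
    for tier in tiers:
        try:
            content = tier(url)
        except Exception:
            # A failing tier should not abort the chain; move to the next one.
            continue
        if content:
            return content
    return None


# Stand-in tiers for demonstration only (no network access).
def light_fetch(url: str) -> Optional[str]:
    return None  # pretend the plain HTTP request returned an empty shell


def headless_fetch(url: str) -> Optional[str]:
    raise RuntimeError("no headless endpoint configured")


def search_fallback(url: str) -> Optional[str]:
    return f"alternative source for {url}"


result = fetch_with_tiers(
    "https://example.com/article",
    [light_fetch, headless_fetch, search_fallback],
)
print(result)
```

Here the first tier yields nothing and the second raises, so the chain falls through to the third. Ordering the tiers from cheapest to most expensive keeps the common case fast.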

Installation

pip install omnifetch-lib

Quick Start

from omnifetch import omni_fetch

# Text extraction (Markdown)
result = omni_fetch('https://example.com', mode='TEXT')
print(result.content)

# JSON extraction (Structured Data)
json_result = omni_fetch('https://example.com', mode='JSON')
print(json_result.content['title'])

Configuration

def omni_fetch(
    url: str,
    mode: str = 'TEXT',           # 'JSON' for structured, 'TEXT' for markdown
    timeout: int = 30,            # Request timeout in seconds
    netlify_endpoint: str = None, # Headless browser endpoint (Tier 2)
    headers: dict = None,         # Custom headers
    skip_headless: bool = False,  # Skip Tier 2
    skip_search: bool = False,    # Skip Tier 3
    force_title: str = None       # Override title for search fallback
) -> OmniFetchResult
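
As a rough example of combining these parameters, the helper below builds keyword arguments that restrict a call to Tier 1 by setting both skip flags. `light_only_options` is a hypothetical convenience function, not part of the library; only the keyword names (`mode`, `timeout`, `headers`, `skip_headless`, `skip_search`) come from the signature above.

```python
from typing import Any, Dict, Optional


def light_only_options(user_agent: Optional[str] = None, timeout: int = 10) -> Dict[str, Any]:
    """Build keyword arguments that restrict a call to Tier 1 (Light Fetch)."""
    options: Dict[str, Any] = {
        "mode": "TEXT",
        "timeout": timeout,
        "skip_headless": True,  # skip Tier 2
        "skip_search": True,    # skip Tier 3
    }
    if user_agent:
        # Custom headers are optional; only add them when a value is given.
        options["headers"] = {"User-Agent": user_agent}
    return options


options = light_only_options(user_agent="my-crawler/0.1")
print(options)

# With the package installed, the call would look like:
# from omnifetch import omni_fetch
# result = omni_fetch("https://example.com", **options)
```

Building the options separately makes it easy to reuse one configuration across many URLs.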

Advanced Usage

Handling Blocked Domains (e.g., X/Twitter)

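Some domains block direct scraping. OmniFetch handles this automatically by falling back to search (Tier 3). For opaque URLs, where the path says little about the content (such as a numeric status ID), you can pass force_title to give the search fallback a meaningful title to look for.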

result = omni_fetch(
    'https://x.com/someuser/status/12345',
    mode='TEXT',
    force_title='Specific Tweet Content Title' # Helps find the content via search
)

Headless Browser Support

To enable Tier 2 (Headless Browser) for dynamic sites, deploy the provided Netlify function and pass its URL as netlify_endpoint.

result = omni_fetch(
    'https://dynamic-site.com',
    netlify_endpoint='https://your-site.netlify.app/.netlify/functions/headless-fetch'
)
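
One way to avoid hard-coding the endpoint is to read it from the environment and skip Tier 2 when it is not set. This is a sketch under assumptions: the variable name `OMNIFETCH_NETLIFY_ENDPOINT` and the helper `headless_options` are made up for illustration, and the library itself is not known to read any environment variable.

```python
import os
from typing import Any, Dict, Mapping


def headless_options(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return omni_fetch keyword arguments for Tier 2 based on the environment.

    OMNIFETCH_NETLIFY_ENDPOINT is a hypothetical variable name chosen for
    this example.
    """
    endpoint = env.get("OMNIFETCH_NETLIFY_ENDPOINT", "").strip()
    if endpoint:
        return {"netlify_endpoint": endpoint}
    # No endpoint available: explicitly skip the headless tier.
    return {"skip_headless": True}


configured = headless_options(
    {"OMNIFETCH_NETLIFY_ENDPOINT": "https://your-site.netlify.app/.netlify/functions/headless-fetch"}
)
unconfigured = headless_options({})
print(configured)
print(unconfigured)

# With the package installed, the call would look like:
# from omnifetch import omni_fetch
# result = omni_fetch("https://dynamic-site.com", **headless_options())
```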

Development Installation

pip install -e .

Running Tests

pip install -e ".[dev]"
pytest

See the main README.md for full documentation.
