Skip to main content

A flexible browser automation library with support for multiple drivers

Project description

Silk

PyPI version Python versions License Code style: black Type check: mypy

Silk is a functional web scraping framework for Python that reimagines how web automation should work. Built around composable "Actions" and the Expression library, Silk enables you to write elegant, maintainable, and resilient web scrapers with true functional programming patterns.

Unlike traditional scraping libraries, Silk embraces Railway-Oriented Programming for robust error handling, uses immutable data structures for predictability, and provides an expressive, composable API that makes even complex scraping workflows readable and maintainable.

Why Silk?

Traditional web scraping approaches in Python often lead to complex, brittle code that's difficult to maintain. Silk solves these common challenges:

  • No More Callback Hell: Replace nested try/except blocks with elegant Railway-Oriented Programming
  • Resilient Scraping: Built-in retry mechanisms, fallback selectors, and error recovery
  • Composable Actions: Chain operations with intuitive operators (>>, &, |) for cleaner code
  • Type-Safe: Full typing support with Mypy and Pydantic for fewer runtime errors
  • Browser Agnostic: Same API for Playwright, Selenium, or any other browser automation tool
  • Parallelization Made Easy: Run operations concurrently with the & operator

Whether you're building a small data collection script or a large-scale scraping system, Silk's functional approach scales with your needs while keeping your codebase clean and maintainable.

Features

  • Purely Functional Design: Built on Expression library for robust functional programming in Python
  • Immutable Data Structures: Uses immutable collections for thread-safety and predictability
  • Railway-Oriented Programming: Elegant error handling with Result types
  • Functional & Composable API: Build pipelines with intuitive operators (>>, &, |)
  • Browser Abstraction: Works with Playwright, Selenium, or any other browser automation tool
  • Resilient Selectors: Fallback mechanisms to handle changing website structures
  • Type Safety: Leverages Pydantic, Mypy and Python's type hints for static type checking
  • Parallel Execution: Easy concurrent scraping with functional composition

Installation

You can install Silk with your preferred browser driver:

# Base installation (no drivers)
pip install silk-scraper

# With Playwright support
pip install silk-scraper[playwright]

# With Selenium support
pip install silk-scraper[selenium]

# With Puppeteer support
pip install silk-scraper[puppeteer]

# With all drivers
pip install silk-scraper[all]

Quick Start

Basic Example

Here's a minimal example to get you started with Silk:

import asyncio
from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText
from silk.browsers.manager import BrowserManager

async def main():
    # Create a browser manager (defaults to Playwright)
    async with BrowserManager() as manager:
        # Define a simple scraping pipeline
        pipeline = (
            Navigate("https://example.com") 
            >> GetText("h1")
        )
        
        # Execute the pipeline
        result = await manager.execute_action(pipeline)
        
        if result.is_ok():
            print(f"Page title: {result.unwrap()}")
        else:
            print(f"Error: {result.error}")

if __name__ == "__main__":
    asyncio.run(main())

Configuring the Browser

Silk supports different browser drivers. You can configure them like this:

from silk.models.browser import BrowserOptions
from silk.browsers.manager import BrowserManager

# Configure browser options
options = BrowserOptions(
    headless=False,  # Set to False to see the browser UI
    browser_name="chromium",  # Choose "chromium", "firefox", or "webkit"
    slow_mo=50,  # Slow down operations by 50ms (useful for debugging)
    viewport={"width": 1280, "height": 800}
)

# Create a manager with specific driver and options
manager = BrowserManager(driver_type="playwright", default_options=options)

Creating Custom Actions

You can easily create your own actions for reusable scraping logic:

from silk.actions.base import Action
from silk.actions.decorators import action
from expression.core import Ok, Error
from silk.models.browser import ActionContext

@action()
async def extract_price(context, selector):
    """Extract and parse a price from the page"""
    page_result = await context.get_page()
    if page_result.is_error():
        return page_result
        
    page = page_result.unwrap()
    element_result = await page.query_selector(selector)
    
    if element_result.is_error():
        return Error(f"Element not found: {selector}")
        
    element = element_result.unwrap()
    text_result = await element.get_text()
    
    if text_result.is_error():
        return text_result
        
    text = text_result.unwrap()
    
    try:
        # Remove currency symbol and convert to float
        price = float(text.replace('$', '').strip())
        return Ok(price)
    except ValueError:
        return Error(f"Failed to parse price from: {text}")

Core Concepts

Actions

The fundamental building block in Silk is the Action. An Action represents a pure operation that can be composed with other actions using functional programming patterns. Each Action takes an ActionContext and returns a Result containing either the operation's result or an error.

class FindElement(Action[ElementHandle]):
    """Action to find an element on the page"""
    
    def __init__(self, selector: str):
        self.selector = selector
        
    async def execute(self, context: ActionContext) -> Result[ElementHandle, Exception]:
        try:
            page_result = await context.get_page()
            if page_result.is_error():
                return page_result
                
            page = page_result.unwrap()
            return await page.query_selector(self.selector)
        except Exception as e:
            return Error(e)

ActionContext

The ActionContext carries references to the browser, page, and other execution context information. Actions use this context to interact with the browser.

Result Type

Silk uses the Result[T, E] type from the Expression library for error handling. Rather than relying on exceptions, actions return Ok(value) for success or Error(exception) for failures.

Composition Operators

Silk provides powerful operators for composing actions:

  • >> (then): Chain actions sequentially
  • & (and): Run actions in parallel
  • | (or): Try one action, fall back to another if it fails

These operators make it easy to build complex scraping workflows with clear, readable code.

Detailed Examples

Handling Complex Selectors

Silk provides robust ways to handle changing website structures with selector groups. Selector groups are a collection of selectors that are tried in order until one succeeds.

from silk.selectors.selector import SelectorGroup, css, xpath

# Create a selector group with fallback options
product_price = SelectorGroup.create(
    "product_price",
    css(".current-price"),             # Try this first
    css(".product-price .amount"),     # Fall back to this
    xpath("//div[contains(@class, 'price')]//span")  # Last resort
)

# Use it in an extraction action
extract_price = GetText(product_price)

Resilient Scraping with Retry and Fallbacks

from silk.actions.flow import retry, fallback
from silk.actions.extraction import GetText
from silk.actions.navigation import Navigate

# Retry navigation up to 3 times with 2s delay
resilient_navigation = retry(
    Navigate("https://example.com"),
    max_attempts=3,
    delay_ms=2000
)

# Try multiple selectors for extracting data
extract_title = fallback(
    GetText(".main-title"),
    GetText("h1.title"),
    GetText("#product-name")
)

# Combine into a pipeline
pipeline = resilient_navigation >> extract_title

Parallel Extraction

Extract multiple pieces of information at once:

from silk.actions.composition import parallel
from silk.actions.extraction import GetText, GetAttribute

# Extract product details in parallel
product_details = parallel(
    GetText(".product-name"),
    GetText(".product-price"),
    GetAttribute(".product-image", "src"),
    GetText(".product-description")
)

# Use in a pipeline
pipeline = Navigate(product_url) >> product_details

# Results come back as a Block collection
result = await manager.execute_action(pipeline)
if result.is_ok():
    [name, price, image_url, description] = result.unwrap()
    print(f"Product: {name}, Price: {price}")

Form Filling and Submission

from silk.actions.input import Fill, Click
from silk.actions.flow import compose

login_action = compose(
    Navigate("https://example.com/login"),
    Fill("#username", "user@example.com"),
    Fill("#password", "password123"),
    Click("button[type='submit']")
)

Handling Dynamic Content

from silk.actions.flow import wait, loop_until
from silk.actions.conditions import ElementExists

# Wait for dynamic content to load
wait_for_results = wait(1000) >> ElementExists(".search-results-item")

# Loop until a condition is met
load_all_results = loop_until(
    condition=ElementExists(".no-more-results"),
    body=Click(".load-more-button"),
    max_iterations=10,
    delay_ms=1000
)

# Use in a pipeline
search_pipeline = (
    Navigate("https://example.com/search?q=example")
    >> wait_for_results
    >> load_all_results
    >> GetText(".search-results-count")
)

Action Decorator for Custom Functions

Easily convert any function into a composable Action using the @action decorator:

from silk import action, Ok, Error

@action
async def scroll_to_element(driver, selector, smooth=True):
    """Scrolls the page to bring the element into view"""
    try:
        element = await driver.query_selector(selector)
        await element.scroll_into_view({"behavior": "smooth" if smooth else "auto"})
        return "Element scrolled into view"
    except Exception as e:
        raise e

# Use it in a pipeline - the function is now a composable Action!
pipeline = (
    Navigate(url)
    >> scroll_to_element("#my-element")
    >> extract_text("#my-element")
)

result = await pipeline(browser)
if result.is_ok():
    print(f"Extracted text after scrolling: {result.unwrap()}")

Composable Operations

Silk provides intuitive operators for composable scraping:

Sequential Operations (>>)

# Navigate to a page, then extract the title
Navigate(url) >> Click(title_selector)

Parallel Operations (&)

# Extract name, price, and description in parallel
# Each action is executed in a new context when using the & operator
Navigate(url) & Navigate(url2) & Navigate(url3)
# Combining parallel and sequential operations
# Each parallel branch can contain its own chain of sequential actions
(
    # First website: Get product details
    (Navigate("https://site1.com/product") 
     >> Wait(1000)
     >> GetText(".product-name"))
    &
    # Second website: Search and extract first result
    (Navigate("https://site2.com") 
     >> Fill("#search-input", "smartphone")
     >> Click("#search-button")
     >> Wait(2000)
     >> GetText(".first-result .name"))
    &
    # Third website: Login and get account info
    (Navigate("https://site3.com/login")
     >> Fill("#username", "user@example.com")
     >> Fill("#password", "password123")
     >> Click(".login-button")
     >> Wait(1500)
     >> GetText(".account-info"))
)
# Results are collected as a Block of 3 items, one from each parallel branch

Fallback Operations (|)

# Try to extract with one selector, fall back to another if it fails
GetText(primary_selector) | GetText(fallback_selector)

API Reference

Core Modules

  • silk.actions: Core action classes for browser automation

    • silk.actions.base: Base Action class and core utilities
    • silk.actions.navigation: Actions for navigating between pages
    • silk.actions.extraction: Actions for extracting data from pages
    • silk.actions.input: Actions for interacting with forms and elements
    • silk.actions.flow: Control flow actions like branch, retry, and loop
    • silk.actions.composition: Utilities for composing actions (sequence, parallel, pipe)
    • silk.actions.decorators: Decorators like @action for creating custom actions
  • silk.browsers: Browser management and abstraction layer

    • silk.browsers.manager: BrowserManager for session handling
    • silk.browsers.driver: Abstract BrowserDriver interface
    • silk.browsers.element: ElementHandle for working with DOM elements
  • silk.selectors: Selector utilities

    • silk.selectors.selector: Selector and SelectorGroup classes
  • silk.models: Data models using Pydantic

    • silk.models.browser: BrowserOptions, ActionContext, etc.

Common Action Classes

  • Navigation

    • Navigate(url): Navigate to a URL
    • Reload(): Reload the current page
    • GoBack(): Navigate back in history
    • GoForward(): Navigate forward in history
  • Extraction

    • Query(selector): Find an element
    • QueryAll(selector): Find all matching elements
    • GetText(selector): Extract text from an element
    • GetAttribute(selector, attribute): Get an attribute value
    • GetHtml(selector, outer=True): Get element HTML
    • ExtractTable(table_selector): Extract data from an HTML table
  • Input

    • Click(target): Click an element
    • DoubleClick(target): Double-click an element
    • Fill(target, text): Fill a form field
    • Type(target, text): Type text (alias for Fill)
    • Select(target, value/text): Select an option from a dropdown
    • MouseMove(target): Move the mouse to an element
    • KeyPress(key, modifiers): Press a key or key combination
  • Flow Control

    • branch(condition, if_true, if_false): Conditional branching
    • loop_until(condition, body, max_iterations): Loop until condition is met
    • retry(action, max_attempts, delay_ms): Retry an action on failure
    • retry_with_backoff(action): Retry with exponential backoff
    • with_timeout(action, timeout_ms): Apply a timeout to an action
  • Composition

    • sequence(*actions): Run actions in sequence, collect all results
    • parallel(*actions): Run actions in parallel, collect all results
    • pipe(*actions): Create a pipeline where each action uses the previous result
    • fallback(*actions): Try actions in sequence until one succeeds
    • compose(*actions): Compose actions sequentially, return only the last result

For a complete API reference, please see the API documentation.

Best Practices

Error Handling

Silk uses Railway-Oriented Programming for error handling. Instead of using try/except, leverage the Result type:

result = await manager.execute_action(pipeline)
if result.is_ok():
    data = result.unwrap()
    # Process the data
else:
    # Handle the error
    error = result.error
    logger.error(f"Scraping failed: {error}")

Browser Resources

Always use context managers to ensure browser resources are properly cleaned up:

async with BrowserManager() as manager:
    # Your scraping code here
    pass  # Resources automatically cleaned up

Selector Resilience

Use selector groups for resilient scraping that can handle UI changes:

# Instead of a single brittle selector:
extract_price = GetText(".price-box .price")

# Use a group with fallbacks:
price_selector = SelectorGroup.create(
    "price",
    css(".price-box .price"),
    css(".product-price"),
    xpath("//span[contains(@class, 'price')]")
)
extract_price = GetText(price_selector)

Action Composition

Build reusable pipelines through composition instead of large monolithic functions:

# Define reusable components
navigate_to_product = Navigate("https://example.com/product")
extract_product_info = parallel(
    GetText(".product-name"),
    GetText(".product-price"),
    GetText(".product-description")
)
extract_related_products = QueryAll(".related-product") >> extract_text_from_elements

# Compose them in different ways
full_scraper = navigate_to_product >> extract_product_info >> extract_related_products
minimal_scraper = navigate_to_product >> extract_product_info

Logging

Enable logging to better debug your scraping pipelines:

import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("silk").setLevel(logging.DEBUG)

Contributing

Contributions to Silk are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository

    git clone https://github.com/yourusername/silk.git
    cd silk
    
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install development dependencies

    pip install -e ".[dev,test,all]"
    
  4. Run tests

    pytest
    

Guidelines

  • Follow PEP 8 and use Black for code formatting
  • Write tests for new features
  • Keep the functional programming paradigm in mind
  • Update documentation with new features

Acknowledgements

Silk builds upon several excellent libraries:

Roadmap

  • Initial release with Playwright support
  • Selenium integration
  • Puppeteer integration
  • Data extraction DSL for declarative scraping
  • Enhanced caching mechanisms
  • Distributed scraping support
  • Rate limiting and polite scraping utilities
  • Integration with popular data processing libraries (Pandas, etc.)
  • CLI tool for quick scraping tasks

FAQ

How does Silk compare to other scraping libraries?

Silk differs from traditional scraping libraries like Scrapy, Beautiful Soup, or plain Selenium/Playwright in its functional approach. While these tools focus on imperative code with callbacks and exceptions, Silk embraces functional composition, immutable data structures, and Railway-Oriented Programming for cleaner, more maintainable code.

Can I use Silk with my existing Playwright/Selenium code?

Yes, Silk is designed to work alongside existing browser automation code. You can gradually adopt Silk's patterns while keeping your existing code.

Is Silk suitable for large-scale scraping?

Absolutely. Silk's composable nature makes it excellent for large-scale scraping projects. Its built-in error handling, retries, and parallel execution capabilities are particularly valuable for robust production systems.

How can I handle authentication in Silk?

You can handle authentication like any other browser interaction:

login_action = compose(
    Navigate("https://example.com/login"),
    Fill("#username", "user@example.com"),
    Fill("#password", "password123"),
    Click("button[type='submit']"),
    wait(1000)  # Wait for login to complete
)

# Then use the authenticated context for further actions
pipeline = login_action >> Navigate("https://example.com/protected-content") >> GetText("#protected-data")

You can also save and reuse authentication state with browser context options.

License

Silk is released under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

silk_scraper-0.1.0.tar.gz (51.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

silk_scraper-0.1.0-py3-none-any.whl (55.6 kB view details)

Uploaded Python 3

File details

Details for the file silk_scraper-0.1.0.tar.gz.

File metadata

  • Download URL: silk_scraper-0.1.0.tar.gz
  • Upload date:
  • Size: 51.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for silk_scraper-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b22bb36fa9af3fa13c1eca0fe1f81294b4b61079f19d1585cf72915ce9ef966b
MD5 c968b400c12a71e1338b0060a72dd33f
BLAKE2b-256 a354a4b2aef6825673b0f067926cc1d6cfd63b508d29708210a7859a9a3e9f03

See more details on using hashes here.

File details

Details for the file silk_scraper-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: silk_scraper-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 55.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for silk_scraper-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1907023a220488941ae03ac64384640bcaafdb5464e7314220602ec25e064d8f
MD5 227bb003c83a78786be43c757172f811
BLAKE2b-256 8c69501b31413e80264c058b84c6d640f8fd34a6d14533c59c6e5fa5de55006f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page