Silk

A flexible browser automation library with support for multiple drivers

Silk is a functional web scraping framework for Python that reimagines how web automation should work. Built around composable "Actions" and the Expression library, Silk enables you to write elegant, maintainable, and resilient web scrapers with true functional programming patterns.

Unlike traditional scraping libraries, Silk embraces Railway-Oriented Programming for robust error handling, uses immutable data structures for predictability, and provides an expressive, composable API that makes even complex scraping workflows readable and maintainable.

Why Silk?

Traditional web scraping approaches in Python often lead to complex, brittle code that's difficult to maintain. Silk solves these common challenges:

  • No More Callback Hell: Replace nested try/except blocks with elegant Railway-Oriented Programming
  • Resilient Scraping: Built-in retry mechanisms, fallback selectors, and error recovery
  • Composable Actions: Chain operations with intuitive operators (>>, &, |) for cleaner code
  • Type-Safe: Full typing support with Mypy and Pydantic for fewer runtime errors
  • Browser Agnostic: Same API for Playwright, Selenium, or any other browser automation tool
  • Parallelization Made Easy: Run operations concurrently with the & operator

Whether you're building a small data collection script or a large-scale scraping system, Silk's functional approach scales with your needs while keeping your codebase clean and maintainable.

Features

  • Purely Functional Design: Built on Expression library for robust functional programming in Python
  • Immutable Data Structures: Uses immutable collections for thread-safety and predictability
  • Railway-Oriented Programming: Elegant error handling with Result types
  • Functional & Composable API: Build pipelines with intuitive operators (>>, &, |)
  • Browser Abstraction: Works with Playwright, Selenium, or any other browser automation tool
  • Resilient Selectors: Fallback mechanisms to handle changing website structures
  • Type Safety: Leverages Pydantic, Mypy and Python's type hints for static type checking
  • Parallel Execution: Easy concurrent scraping with functional composition

Installation

You can install Silk with your preferred browser driver:

# Base installation (no drivers)
pip install silk-scraper

# With Playwright support
pip install silk-scraper[playwright]

# With Selenium support
pip install silk-scraper[selenium]

# With Puppeteer support
pip install silk-scraper[puppeteer]

# With all drivers
pip install silk-scraper[all]

Quick Start

Basic Example

Here's a minimal example to get you started with Silk:

import asyncio
from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText
from silk.browsers.manager import BrowserManager

async def main():
    # Create a browser manager (defaults to Playwright)
    async with BrowserManager() as manager:
        # Define a simple scraping pipeline
        pipeline = (
            Navigate("https://example.com") 
            >> GetText("h1")
        )
        
        # Execute the pipeline
        result = await pipeline(manager)
        
        if result.is_ok():
            print(f"Page title: {result.default_value(None)}")
        else:
            print(f"Error: {result.error}")

if __name__ == "__main__":
    asyncio.run(main())

Configuring the Browser

Silk supports different browser drivers. You can configure them like this:

from silk.models.browser import BrowserOptions
from silk.browsers.manager import BrowserManager

# Configure browser options
options = BrowserOptions(
    headless=False,  # Set to False to see the browser UI
    browser_name="chromium",  # Choose "chromium", "firefox", or "webkit"
    slow_mo=50,  # Slow down operations by 50ms (useful for debugging)
    viewport={"width": 1280, "height": 800}
)

# Create a manager with specific driver and options
manager = BrowserManager(driver_type="playwright", default_options=options)
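
The configured manager drives pipelines the same way as the default one in the Quick Start. A brief sketch reusing the actions shown above:

from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText

async def run():
    async with BrowserManager(driver_type="playwright", default_options=options) as manager:
        # Call the pipeline with the manager, just as in the Quick Start example
        result = await (Navigate("https://example.com") >> GetText("h1"))(manager)
        print(result.default_value(None))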

Creating Custom Actions

You can easily create your own actions for reusable scraping logic:

from silk.actions.base import Action
from silk.actions.decorators import action
from expression.core import Ok, Error
from silk.models.browser import ActionContext

@action()
async def extract_price(context, selector):
    """Extract and parse a price from the page"""
    page_result = await context.get_page()
    if page_result.is_error():
        return page_result
        
    page = page_result.default_value(None)
    if page is None:
        return Error("No page found")   
    
    element_result = await page.query_selector(selector)
    
    if element_result.is_error():
        return Error(f"Element not found: {selector}")
        
    element = element_result.default_value(None)
    if element is None:
        return Error("No element found")
    
    text_result = await element.get_text()
    
    if text_result.is_error():
        return text_result
        
    text = text_result.default_value(None)
    if text is None:
        return Error("No text found")
    
    try:
        # Remove currency symbol and convert to float
        price = float(text.replace('$', '').strip())
        return Ok(price)
    except ValueError:
        return Error(f"Failed to parse price from: {text}")

Core Concepts

Actions

The fundamental building block in Silk is the Action. An Action represents a pure operation that can be composed with other actions using functional programming patterns. Each Action takes an ActionContext and returns a Result containing either the operation's result or an error.

class FindElement(Action[ElementHandle]):
    """Action to find an element on the page"""
    
    def __init__(self, selector: str):
        self.selector = selector
        
    async def execute(self, context: ActionContext) -> Result[ElementHandle, Exception]:
        try:
            page_result = await context.get_page()
            if page_result.is_error():
                return page_result
                
            page = page_result.default_value(None)
            if page is None:
                return Error("No page found")
            
            return await page.query_selector(self.selector)
        except Exception as e:
            return Error(e)

ActionContext

The ActionContext carries references to the browser, page, and other execution context information. Actions use this context to interact with the browser.
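
Every example in this README follows the same pattern: fetch the current page from the context, then act on it. A condensed sketch of those calls:

# Inside an Action's execute() or an @action-decorated function:
page_result = await context.get_page()  # returns a Result wrapping the page
if page_result.is_ok():
    page = page_result.default_value(None)
    element_result = await page.query_selector("h1")  # also returns a Result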

Result Type

Silk uses the Result[T, E] type from the Expression library for error handling. Rather than relying on exceptions, actions return Ok(value) for success or Error(exception) for failures.
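
A minimal sketch, using only the Result operations that appear in this README:

from expression.core import Ok, Error

def parse_quantity(text):
    try:
        return Ok(int(text))
    except ValueError:
        return Error(f"Not a number: {text!r}")

result = parse_quantity("42")
if result.is_ok():
    print(result.default_value(None))  # 42
else:
    print(result.error)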

Composition Operators

Silk provides powerful operators for composing actions:

  • >> (then): Chain actions sequentially
  • & (and): Run actions in parallel
  • | (or): Try one action, fall back to another if it fails

These operators make it easy to build complex scraping workflows with clear, readable code.
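
As a quick illustration, here is a sketch built only from actions used elsewhere in this README, combining all three operators:

# For each of two pages fetched in parallel, try a primary selector
# and fall back to the page heading if it fails
(
    Navigate("https://example.com/a") >> (GetText(".title") | GetText("h1"))
) & (
    Navigate("https://example.com/b") >> (GetText(".title") | GetText("h1"))
)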

Detailed Examples

Handling Complex Selectors

Silk provides robust ways to handle changing website structures through selector groups. A selector group is a collection of selectors that are tried in order until one succeeds.

from silk.selectors.selector import SelectorGroup, css, xpath

# Create a selector group with fallback options
product_price = SelectorGroup(
    "product_price",
    css(".current-price"),             # Try this first
    css(".product-price .amount"),     # Fall back to this
    xpath("//div[contains(@class, 'price')]//span")  # Last resort
)

# Use it in an extraction action
extract_price = GetText(product_price)

Resilient Scraping with Retry and Fallbacks

from silk.actions.flow import retry, fallback
from silk.actions.extraction import GetText
from silk.actions.navigation import Navigate

# Retry navigation up to 3 times with 2s delay
resilient_navigation = retry(
    Navigate("https://example.com"),
    max_attempts=3,
    delay_ms=2000
)

# Try multiple selectors for extracting data
extract_title = fallback(
    GetText(".main-title"),
    GetText("h1.title"),
    GetText("#product-name")
)

# Combine into a pipeline
pipeline = resilient_navigation >> extract_title

Parallel Extraction

Extract multiple pieces of information at once:

from silk.actions.composition import parallel
from silk.actions.extraction import GetText, GetAttribute

# Extract product details in parallel
product_details = parallel(
    GetText(".product-name"),
    GetText(".product-price"),
    GetAttribute(".product-image", "src"),
    GetText(".product-description")
)

# Use in a pipeline
pipeline = Navigate(product_url) >> product_details

# Results come back as a collection
result = await pipeline(manager)
if result.is_ok():
    details = result.default_value(None)
    if details is None:
        print("No product details found")
    else:
        name, price, image_url, description = details
        print(f"Product: {name}, Price: {price}")

Form Filling and Submission

from silk.actions.input import Fill, Click
from silk.actions.flow import compose

login_action = compose(
    Navigate("https://example.com/login"),
    Fill("#username", "user@example.com"),
    Fill("#password", "password123"),
    Click("button[type='submit']")
)

Handling Dynamic Content

from silk.actions.flow import wait, loop_until
from silk.actions.conditions import ElementExists
from silk.actions.input import Click
from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText

# Wait for dynamic content to load
wait_for_results = wait(1000) >> ElementExists(".search-results-item")

# Loop until a condition is met
load_all_results = loop_until(
    condition=ElementExists(".no-more-results"),
    body=Click(".load-more-button"),
    max_iterations=10,
    delay_ms=1000
)

# Use in a pipeline
search_pipeline = (
    Navigate("https://example.com/search?q=example")
    >> wait_for_results
    >> load_all_results
    >> GetText(".search-results-count")
)

Action Decorator for Custom Functions

Easily convert any function into a composable Action using the @action decorator:

from silk import action, Ok, Error

@action
async def scroll_to_element(driver, selector, smooth=True):
    """Scrolls the page to bring the element into view"""
    try:
        element = await driver.query_selector(selector)
        await element.scroll_into_view({"behavior": "smooth" if smooth else "auto"})
        return Ok("Element scrolled into view")
    except Exception as e:
        return Error(e)

# Use it in a pipeline - the function is now a composable Action!
pipeline = (
    Navigate(url)
    >> scroll_to_element("#my-element")
    >> GetText("#my-element")
)

result = await pipeline(browser)
if result.is_ok():
    print(f"Extracted text after scrolling: {result.default_value(None)}")

Composable Operations

Silk provides intuitive operators for composable scraping:

Sequential Operations (>>)

# Navigate to a page, then click the title element
Navigate(url) >> Click(title_selector)

Parallel Operations (&)

# Navigate to three pages in parallel
# Each action is executed in a new context when using the & operator
Navigate(url) & Navigate(url2) & Navigate(url3)
# Combining parallel and sequential operations
# Each parallel branch can contain its own chain of sequential actions
(
    # First website: Get product details
    (Navigate("https://site1.com/product") 
     >> wait(1000)
     >> GetText(".product-name"))
    &
    # Second website: Search and extract first result
    (Navigate("https://site2.com") 
     >> Fill("#search-input", "smartphone")
     >> Click("#search-button")
     >> wait(2000)
     >> GetText(".first-result .name"))
    &
    # Third website: Login and get account info
    (Navigate("https://site3.com/login")
     >> Fill("#username", "user@example.com")
     >> Fill("#password", "password123")
     >> Click(".login-button")
     >> wait(1500)
     >> GetText(".account-info"))
)
# Results are collected as a Block of 3 items, one from each parallel branch

Fallback Operations (|)

# Try to extract with one selector, fall back to another if it fails
GetText(primary_selector) | GetText(fallback_selector)

Fallback operations are a powerful tool for building resilient scraping pipelines: they try multiple scraping strategies in order and return the first successful result. In combination with SelectorGroups, you can build very robust scraping pipelines.

from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText, GetAttribute, QueryAll, ExtractTable
from silk.actions.input import Click
from silk.actions.flow import wait, retry, fallback
from silk.selectors.selector import SelectorGroup, css, xpath

# Example: Advanced product information scraping with multiple strategies
async def scrape_product(url, manager):
    # Strategy 1: Direct extraction using primary selectors
    primary_strategy = (
        Navigate(url)
        >> GetText(".product-title")
    )
    
    # Strategy 2: Click on a tab first, then extract from revealed content
    secondary_strategy = (
        Navigate(url)
        >> Click(".details-tab")
        >> wait(500)  # Wait for tab content to load
        >> GetText(".tab-content h1")
    )
    
    # Strategy 3: Extract from structured JSON data in script tag
    json_strategy = (
        Navigate(url)
        >> GetAttribute('script[type="application/ld+json"]', "textContent")
        # Additional processing would parse the JSON and extract title
    )
    
    # Combine all strategies with fallback operator
    product_title_pipeline = (
        primary_strategy | secondary_strategy | json_strategy
    )
    
    # Multiple fallback approaches for price extraction
    price_pipeline = (
        # Try special sale price first
        (Navigate(url) >> GetText(".special-price .price-amount"))
        |
        # Then try regular price
        (Navigate(url) >> GetText(".regular-price"))
        |
        # Then try to extract from a pricing table
        (Navigate(url) 
         >> ExtractTable("#pricing-table")
         # Additional processing would extract price from table data
        )
        |
        # Last resort: Try to find price in any element containing "$"
        (Navigate(url)
         >> QueryAll("*:contains('$')")
         # Additional processing would filter and extract price
        )
    )
    
    # Execute both pipelines
    title_result = await product_title_pipeline(manager)
    price_result = await price_pipeline(manager)
    
    return {
        "title": title_result.default_value("Unknown Title"),
        "price": price_result.default_value("Price Unavailable")
    }

# Example with SelectorGroups for even more resilience
def build_robust_product_scraper(url):
    # Create selector groups with multiple options
    title_selectors = SelectorGroup(
        "product_title",
        css(".product-title"),
        css("h1.title"),
        xpath("//div[@class='product-info']//h1"),
        css(".pdp-title")
    )
    
    price_selectors = SelectorGroup(
        "product_price",
        css(".special-price .amount"),
        css(".product-price"),
        xpath("//span[contains(@class, 'price')]"),
        css(".price-info .price")
    )
    
    image_selectors = SelectorGroup(
        "product_image",
        css(".product-image-gallery img"),
        css(".main-image"),
        xpath("//div[contains(@class, 'gallery')]//img")
    )
    
    # Use these groups in a pipeline with retries
    return (
        Navigate(url)
        >> retry(GetText(title_selectors), max_attempts=3, delay_ms=1000)
        >> retry(GetText(price_selectors), max_attempts=3, delay_ms=1000)
        >> retry(GetAttribute(image_selectors, "src"), max_attempts=3, delay_ms=1000)
    )

API Reference

Core Modules

  • silk.actions: Core action classes for browser automation

    • silk.actions.base: Base Action class and core utilities
    • silk.actions.navigation: Actions for navigating between pages
    • silk.actions.extraction: Actions for extracting data from pages
    • silk.actions.input: Actions for interacting with forms and elements
    • silk.actions.flow: Control flow actions like branch, retry, and loop
    • silk.actions.composition: Utilities for composing actions (sequence, parallel, pipe)
    • silk.actions.decorators: Decorators like @action for creating custom actions
  • silk.browsers: Browser management and abstraction layer

    • silk.browsers.manager: BrowserManager for session handling
    • silk.browsers.driver: Abstract BrowserDriver interface
    • silk.browsers.element: ElementHandle for working with DOM elements
  • silk.selectors: Selector utilities

    • silk.selectors.selector: Selector and SelectorGroup classes
  • silk.models: Data models using Pydantic

    • silk.models.browser: BrowserOptions, ActionContext, etc.

Common Action Classes

  • Navigation

    • Navigate(url): Navigate to a URL
    • Reload(): Reload the current page
    • GoBack(): Navigate back in history
    • GoForward(): Navigate forward in history
  • Extraction

    • Query(selector): Find an element
    • QueryAll(selector): Find all matching elements
    • GetText(selector): Extract text from an element
    • GetAttribute(selector, attribute): Get an attribute value
    • GetHtml(selector, outer=True): Get element HTML
    • ExtractTable(table_selector): Extract data from an HTML table
  • Input

    • Click(target): Click an element
    • DoubleClick(target): Double-click an element
    • Fill(target, text): Fill a form field
    • Type(target, text): Type text (alias for Fill)
    • Select(target, value/text): Select an option from a dropdown
    • MouseMove(target): Move the mouse to an element
    • KeyPress(key, modifiers): Press a key or key combination
  • Flow Control

    • branch(condition, if_true, if_false): Conditional branching
    • loop_until(condition, body, max_iterations): Loop until condition is met
    • retry(action, max_attempts, delay_ms): Retry an action on failure
    • retry_with_backoff(action): Retry with exponential backoff
    • with_timeout(action, timeout_ms): Apply a timeout to an action
  • Composition

    • sequence(*actions): Run actions in sequence, collect all results
    • parallel(*actions): Run actions in parallel, collect all results
    • pipe(*actions): Create a pipeline where each action uses the previous result
    • fallback(*actions): Try actions in sequence until one succeeds
    • compose(*actions): Compose actions sequentially, return only the last result
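
To make the differences between the sequential combinators concrete, here is an illustrative sketch based on the descriptions above (return shapes are indicative, not exact):

from silk.actions.composition import sequence, parallel
from silk.actions.flow import compose

steps = (GetText(".name"), GetText(".price"))

sequence(*steps)  # runs in order, result collects both texts
parallel(*steps)  # runs concurrently, result collects both texts
compose(Navigate(url), GetText(".name"))  # runs in order, keeps only the last result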

For a complete API reference, please see the API documentation.

Best Practices

Error Handling

Silk uses Railway-Oriented Programming for error handling. Instead of using try/except, leverage the Result type:

result = await pipeline(manager)
if result.is_ok():
    data = result.default_value(None)
    # Process the data
else:
    # Handle the error
    error = result.error
    logger.error(f"Scraping failed: {error}")

Browser Resources

Always use context managers to ensure browser resources are properly cleaned up:

async with BrowserManager() as manager:
    # Your scraping code here
    pass  # Resources automatically cleaned up

Selector Resilience

Use selector groups for resilient scraping that can handle UI changes:

# Instead of a single brittle selector:
extract_price = GetText(".price-box .price")

# Use a group with fallbacks:
price_selector = SelectorGroup(
    "price",
    css(".price-box .price"),
    css(".product-price"),
    xpath("//span[contains(@class, 'price')]")
)
extract_price = GetText(price_selector)

Action Composition

Build reusable pipelines through composition instead of large monolithic functions:

# Define reusable components
navigate_to_product = Navigate("https://example.com/product")
extract_product_info = parallel(
    GetText(".product-name"),
    GetText(".product-price"),
    GetText(".product-description")
)
extract_related_products = QueryAll(".related-product") >> extract_text_from_elements

# Compose them in different ways
full_scraper = navigate_to_product >> extract_product_info >> extract_related_products
minimal_scraper = navigate_to_product >> extract_product_info

Logging

Enable logging to better debug your scraping pipelines:

import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("silk").setLevel(logging.DEBUG)

Contributing

Contributions to Silk are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository

    git clone https://github.com/galaddirie/silk.git
    cd silk
    
  2. Install Poetry (if not already installed)

    curl -sSL https://install.python-poetry.org | python3 -
    
  3. Install dependencies

    poetry install --all-extras
    
  4. Activate the virtual environment

    poetry shell
    
  5. Run tests

    poetry run pytest
    

Guidelines

  • Follow PEP 8 and use Black for code formatting
  • Write tests for new features
  • Keep the functional programming paradigm in mind
  • Update documentation with new features

Acknowledgements

Silk builds upon several excellent libraries, including Expression (functional programming and Result types), Pydantic (data models and validation), and the Playwright, Selenium, and Puppeteer browser automation tools.

Roadmap

  • Initial release with Playwright support
  • Improve parallel execution
  • Support multiple actions in parallel in the same context/page, e.g. (GetText & GetAttribute & GetHtml), in an ergonomic way
  • Selenium integration
  • Puppeteer integration
  • Add examples
  • Support mapped tasks similar to Airflow tasks, e.g. (QueryAll >> GetText[]) where GetText is applied to each element in the collection
  • Add proxy options
  • Explore stealth options for browser automation (implement Patchright, nodriver, driverless, etc.)
  • Add dependency review
  • Support for task dependencies
  • Action signature validation
  • Data extraction DSL for declarative scraping
  • Support computer-use agents (browser-use, OpenAI CUA, Claude computer use)
  • Enhanced caching mechanisms
  • Distributed scraping support
  • Rate limiting and polite scraping utilities
  • Integration with popular data processing libraries (Pandas, etc.)
  • CLI tool for quick scraping tasks

FAQ

How does Silk compare to other scraping libraries?

Silk differs from traditional scraping libraries like Scrapy, Beautiful Soup, or plain Selenium/Playwright in its functional approach. While these tools focus on imperative code with callbacks and exceptions, Silk embraces functional composition, immutable data structures, and Railway-Oriented Programming for cleaner, more maintainable code.

Can I use Silk with my existing Playwright/Selenium code?

Yes, Silk is designed to work alongside existing browser automation code. You can gradually adopt Silk's patterns while keeping your existing code.

Is Silk suitable for large-scale scraping?

Absolutely. Silk's composable nature makes it excellent for large-scale scraping projects. Its built-in error handling, retries, and parallel execution capabilities are particularly valuable for robust production systems.

How can I handle authentication in Silk?

You can handle authentication like any other browser interaction:

login_action = compose(
    Navigate("https://example.com/login"),
    Fill("#username", "user@example.com"),
    Fill("#password", "password123"),
    Click("button[type='submit']"),
    wait(1000)  # Wait for login to complete
)

# Then use the authenticated context for further actions
pipeline = login_action >> Navigate("https://example.com/protected-content") >> GetText("#protected-data")

You can also save and reuse authentication state with browser context options.

License

Silk is released under the MIT License. See the LICENSE file for details.
