A flexible browser automation library with support for multiple drivers
Project description
Silk
Silk is a functional web scraping framework for Python that reimagines how web automation should work. Built around composable "Actions" and the Expression library, Silk enables you to write elegant, maintainable, and resilient web scrapers with true functional programming patterns.
Unlike traditional scraping libraries, Silk embraces Railway-Oriented Programming for robust error handling, uses immutable data structures for predictability, and provides an expressive, composable API that makes even complex scraping workflows readable and maintainable.
Why Silk?
Traditional web scraping approaches in Python often lead to complex, brittle code that's difficult to maintain. Silk solves these common challenges:
- No More Callback Hell: Replace nested try/except blocks with elegant Railway-Oriented Programming
- Resilient Scraping: Built-in retry mechanisms, fallback selectors, and error recovery
- Composable Actions: Chain operations with intuitive operators (
>>,&,|) for cleaner code - Type-Safe: Full typing support with Mypy and Pydantic for fewer runtime errors
- Browser Agnostic: Same API for Playwright, Selenium, or any other browser automation tool
- Parallelization Made Easy: Run operations concurrently with the
&operator
Whether you're building a small data collection script or a large-scale scraping system, Silk's functional approach scales with your needs while keeping your codebase clean and maintainable.
Features
- Purely Functional Design: Built on Expression library for robust functional programming in Python
- Immutable Data Structures: Uses immutable collections for thread-safety and predictability
- Railway-Oriented Programming: Elegant error handling with Result types
- Functional & Composable API: Build pipelines with intuitive operators (
>>,&,|) - Browser Abstraction: Works with Playwright, Selenium, or any other browser automation tool
- Resilient Selectors: Fallback mechanisms to handle changing website structures
- Type Safety: Leverages Pydantic, Mypy and Python's type hints for static type checking
- Parallel Execution: Easy concurrent scraping with functional composition
Installation
You can install Silk with your preferred browser driver:
# Base installation (no drivers)
pip install silk-scraper
# With Playwright support
pip install silk-scraper[playwright]
# With Selenium support
pip install silk-scraper[selenium]
# With Puppeteer support
pip install silk-scraper[puppeteer]
# With all drivers
pip install silk-scraper[all]
Quick Start
Basic Example
Here's a minimal example to get you started with Silk:
import asyncio
from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText
from silk.browsers.manager import BrowserManager
async def main():
# Create a browser manager (defaults to Playwright)
async with BrowserManager() as manager:
# Define a simple scraping pipeline
pipeline = (
Navigate("https://example.com")
>> GetText("h1")
)
# Execute the pipeline
result = await pipeline(manager)
if result.is_ok():
print(f"Page title: {result.default_value(None)}")
else:
print(f"Error: {result.error}")
if __name__ == "__main__":
asyncio.run(main())
Configuring the Browser
Silk supports different browser drivers. You can configure them like this:
from silk.models.browser import BrowserOptions
from silk.browsers.manager import BrowserManager
# Configure browser options
options = BrowserOptions(
headless=False, # Set to False to see the browser UI
browser_name="chromium", # Choose "chromium", "firefox", or "webkit"
slow_mo=50, # Slow down operations by 50ms (useful for debugging)
viewport={"width": 1280, "height": 800}
)
# Create a manager with specific driver and options
manager = BrowserManager(driver_type="playwright", default_options=options)
Creating Custom Actions
You can easily create your own actions for reusable scraping logic:
from silk.actions.base import Action
from silk.actions.decorators import action
from expression.core import Ok, Error
from silk.models.browser import ActionContext
@action()
async def extract_price(context, selector):
"""Extract and parse a price from the page"""
page_result = await context.get_page()
if page_result.is_error():
return page_result
page = page_result.default_value(None)
if page is None:
return Error("No page found")
element_result = await page.query_selector(selector)
if element_result.is_error():
return Error(f"Element not found: {selector}")
element = element_result.default_value(None)
if element is None:
return Error("No element found")
text_result = await element.get_text()
if text_result.is_error():
return text_result
text = text_result.default_value(None)
if text is None:
return Error("No text found")
try:
# Remove currency symbol and convert to float
price = float(text.replace('$', '').strip())
return Ok(price)
except ValueError:
return Error(f"Failed to parse price from: {text}")
Core Concepts
Actions
The fundamental building block in Silk is the Action. An Action represents a pure operation that can be composed with other actions using functional programming patterns. Each Action takes an ActionContext and returns a Result containing either the operation's result or an error.
class FindElement(Action[ElementHandle]):
"""Action to find an element on the page"""
def __init__(self, selector: str):
self.selector = selector
async def execute(self, context: ActionContext) -> Result[ElementHandle, Exception]:
try:
page_result = await context.get_page()
if page_result.is_error():
return page_result
page = page_result.default_value(None)
if page is None:
return Error("No page found")
return await page.query_selector(self.selector)
except Exception as e:
return Error(e)
ActionContext
The ActionContext carries references to the browser, page, and other execution context information. Actions use this context to interact with the browser.
Result Type
Silk uses the Result[T, E] type from the Expression library for error handling. Rather than relying on exceptions, actions return Ok(value) for success or Error(exception) for failures.
Composition Operators
Silk provides powerful operators for composing actions:
>>(then): Chain actions sequentially&(and): Run actions in parallel|(or): Try one action, fall back to another if it fails
These operators make it easy to build complex scraping workflows with clear, readable code.
Detailed Examples
Handling Complex Selectors
Silk provides robust ways to handle changing website structures with selector groups. Selector groups are a collection of selectors that are tried in order until one succeeds.
from silk.selectors.selector import SelectorGroup, css, xpath
# Create a selector group with fallback options
product_price = SelectorGroup(
"product_price",
css(".current-price"), # Try this first
css(".product-price .amount"), # Fall back to this
xpath("//div[contains(@class, 'price')]//span") # Last resort
)
# Use it in an extraction action
extract_price = GetText(product_price)
Resilient Scraping with Retry and Fallbacks
from silk.actions.flow import retry, fallback
from silk.actions.extraction import GetText
from silk.actions.navigation import Navigate
# Retry navigation up to 3 times with 2s delay
resilient_navigation = retry(
Navigate("https://example.com"),
max_attempts=3,
delay_ms=2000
)
# Try multiple selectors for extracting data
extract_title = fallback(
GetText(".main-title"),
GetText("h1.title"),
GetText("#product-name")
)
# Combine into a pipeline
pipeline = resilient_navigation >> extract_title
Parallel Extraction
Extract multiple pieces of information at once:
from silk.actions.composition import parallel
from silk.actions.extraction import GetText, GetAttribute
# Extract product details in parallel
product_details = parallel(
GetText(".product-name"),
GetText(".product-price"),
GetAttribute(".product-image", "src"),
GetText(".product-description")
)
# Use in a pipeline
pipeline = Navigate(product_url) >> product_details
# Results come back as a collection
result = await pipeline(manager)
if result.is_ok():
product_details = result.default_value(None)
if product_details is None:
print("No product details found")
else:
[name, price, image_url, description] = product_details
print(f"Product: {name}, Price: {price}")
Form Filling and Submission
from silk.actions.input import Fill, Click
from silk.actions.flow import compose
login_action = compose(
Navigate("https://example.com/login"),
Fill("#username", "user@example.com"),
Fill("#password", "password123"),
Click("button[type='submit']")
)
Handling Dynamic Content
from silk.actions.flow import wait, loop_until
from silk.actions.conditions import ElementExists
# Wait for dynamic content to load
wait_for_results = wait(1000) >> ElementExists(".search-results-item")
# Loop until a condition is met
load_all_results = loop_until(
condition=ElementExists(".no-more-results"),
body=Click(".load-more-button"),
max_iterations=10,
delay_ms=1000
)
# Use in a pipeline
search_pipeline = (
Navigate("https://example.com/search?q=example")
>> wait_for_results
>> load_all_results
>> GetText(".search-results-count")
)
Action Decorator for Custom Functions
Easily convert any function into a composable Action using the @action decorator:
from silk import action, Ok, Error
@action
async def scroll_to_element(driver, selector, smooth=True):
"""Scrolls the page to bring the element into view"""
try:
element = await driver.query_selector(selector)
await element.scroll_into_view({"behavior": "smooth" if smooth else "auto"})
return "Element scrolled into view"
except Exception as e:
raise e
# Use it in a pipeline - the function is now a composable Action!
pipeline = (
Navigate(url)
>> scroll_to_element("#my-element")
>> extract_text("#my-element")
)
result = await pipeline(browser)
if result.is_ok():
print(f"Extracted text after scrolling: {result.default_value(None)}")
Composable Operations
Silk provides intuitive operators for composable scraping:
Sequential Operations (>>)
# Navigate to a page, then extract the title
Navigate(url) >> Click(title_selector)
Parallel Operations (&)
# Extract name, price, and description in parallel
# Each action is executed in a new context when using the & operator
Navigate(url) & Navigate(url2) & Navigate(url3)
# Combining parallel and sequential operations
# Each parallel branch can contain its own chain of sequential actions
(
# First website: Get product details
(Navigate("https://site1.com/product")
>> Wait(1000)
>> GetText(".product-name"))
&
# Second website: Search and extract first result
(Navigate("https://site2.com")
>> Fill("#search-input", "smartphone")
>> Click("#search-button")
>> Wait(2000)
>> GetText(".first-result .name"))
&
# Third website: Login and get account info
(Navigate("https://site3.com/login")
>> Fill("#username", "user@example.com")
>> Fill("#password", "password123")
>> Click(".login-button")
>> Wait(1500)
>> GetText(".account-info"))
)
# Results are collected as a Block of 3 items, one from each parallel branch
Fallback Operations (|)
# Try to extract with one selector, fall back to another if it fails
GetText(primary_selector) | GetText(fallback_selector)
Fallback operations are powerful tools for building resilient scraping pipelines. They allow you to try multiple scraping strategies and return the first successful result. in combination with SelectorGroups, you can create very robust scraping pipelines.
from silk.actions.navigation import Navigate
from silk.actions.extraction import GetText, GetAttribute, QueryAll, ExtractTable
from silk.actions.input import Click
from silk.actions.flow import wait, retry, fallback
from silk.selectors.selector import SelectorGroup, css, xpath
# Example: Advanced product information scraping with multiple strategies
async def scrape_product(url, manager):
# Strategy 1: Direct extraction using primary selectors
primary_strategy = (
Navigate(url)
>> GetText(".product-title")
)
# Strategy 2: Click on a tab first, then extract from revealed content
secondary_strategy = (
Navigate(url)
>> Click(".details-tab")
>> wait(500) # Wait for tab content to load
>> GetText(".tab-content h1")
)
# Strategy 3: Extract from structured JSON data in script tag
json_strategy = (
Navigate(url)
>> GetAttribute('script[type="application/ld+json"]', "textContent")
# Additional processing would parse the JSON and extract title
)
# Combine all strategies with fallback operator
product_title_pipeline = (
primary_strategy | secondary_strategy | json_strategy
)
# Multiple fallback approaches for price extraction
price_pipeline = (
# Try special sale price first
(Navigate(url) >> GetText(".special-price .price-amount"))
|
# Then try regular price
(Navigate(url) >> GetText(".regular-price"))
|
# Then try to extract from a pricing table
(Navigate(url)
>> ExtractTable("#pricing-table")
# Additional processing would extract price from table data
)
|
# Last resort: Try to find price in any element containing "$"
(Navigate(url)
>> QueryAll("*:contains('$')")
# Additional processing would filter and extract price
)
)
# Execute both pipelines
title_result = await product_title_pipeline(manager)
price_result = await price_pipeline(manager)
return {
"title": title_result.default_value("Unknown Title"),
"price": price_result.default_value("Price Unavailable")
}
# Example with SelectorGroups for even more resilience
def build_robust_product_scraper(url):
# Create selector groups with multiple options
title_selectors = SelectorGroup(
"product_title",
css(".product-title"),
css("h1.title"),
xpath("//div[@class='product-info']//h1"),
css(".pdp-title")
)
price_selectors = SelectorGroup(
"product_price",
css(".special-price .amount"),
css(".product-price"),
xpath("//span[contains(@class, 'price')]"),
css(".price-info .price")
)
image_selectors = SelectorGroup(
"product_image",
css(".product-image-gallery img"),
css(".main-image"),
xpath("//div[contains(@class, 'gallery')]//img")
)
# Use these groups in a pipeline with retries
return (
Navigate(url)
>> retry(GetText(title_selectors), max_attempts=3, delay_ms=1000)
>> retry(GetText(price_selectors), max_attempts=3, delay_ms=1000)
>> retry(GetAttribute(image_selectors, "src"), max_attempts=3, delay_ms=1000)
)
API Reference
Core Modules
-
silk.actions: Core action classes for browser automationsilk.actions.base: Base Action class and core utilitiessilk.actions.navigation: Actions for navigating between pagessilk.actions.extraction: Actions for extracting data from pagessilk.actions.input: Actions for interacting with forms and elementssilk.actions.flow: Control flow actions like branch, retry, and loopsilk.actions.composition: Utilities for composing actions (sequence, parallel, pipe)silk.actions.decorators: Decorators like @action for creating custom actions
-
silk.browsers: Browser management and abstraction layersilk.browsers.manager: BrowserManager for session handlingsilk.browsers.driver: Abstract BrowserDriver interfacesilk.browsers.element: ElementHandle for working with DOM elements
-
silk.selectors: Selector utilitiessilk.selectors.selector: Selector and SelectorGroup classes
-
silk.models: Data models using Pydanticsilk.models.browser: BrowserOptions, ActionContext, etc.
Common Action Classes
-
Navigation
Navigate(url): Navigate to a URLReload(): Reload the current pageGoBack(): Navigate back in historyGoForward(): Navigate forward in history
-
Extraction
Query(selector): Find an elementQueryAll(selector): Find all matching elementsGetText(selector): Extract text from an elementGetAttribute(selector, attribute): Get an attribute valueGetHtml(selector, outer=True): Get element HTMLExtractTable(table_selector): Extract data from an HTML table
-
Input
Click(target): Click an elementDoubleClick(target): Double-click an elementFill(target, text): Fill a form fieldType(target, text): Type text (alias for Fill)Select(target, value/text): Select an option from a dropdownMouseMove(target): Move the mouse to an elementKeyPress(key, modifiers): Press a key or key combination
-
Flow Control
branch(condition, if_true, if_false): Conditional branchingloop_until(condition, body, max_iterations): Loop until condition is metretry(action, max_attempts, delay_ms): Retry an action on failureretry_with_backoff(action): Retry with exponential backoffwith_timeout(action, timeout_ms): Apply a timeout to an action
-
Composition
sequence(*actions): Run actions in sequence, collect all resultsparallel(*actions): Run actions in parallel, collect all resultspipe(*actions): Create a pipeline where each action uses the previous resultfallback(*actions): Try actions in sequence until one succeedscompose(*actions): Compose actions sequentially, return only the last result
For a complete API reference, please see the API documentation.
Best Practices
Error Handling
Silk uses Railway-Oriented Programming for error handling. Instead of using try/except, leverage the Result type:
result = await pipeline(manager)
if result.is_ok():
data = result.default_value(None)
# Process the data
else:
# Handle the error
error = result.error
logger.error(f"Scraping failed: {error}")
Browser Resources
Always use context managers to ensure browser resources are properly cleaned up:
async with BrowserManager() as manager:
# Your scraping code here
pass # Resources automatically cleaned up
Selector Resilience
Use selector groups for resilient scraping that can handle UI changes:
# Instead of a single brittle selector:
extract_price = GetText(".price-box .price")
# Use a group with fallbacks:
price_selector = SelectorGroup(
"price",
css(".price-box .price"),
css(".product-price"),
xpath("//span[contains(@class, 'price')]")
)
extract_price = GetText(price_selector)
Action Composition
Build reusable pipelines through composition instead of large monolithic functions:
# Define reusable components
navigate_to_product = Navigate("https://example.com/product")
extract_product_info = parallel(
GetText(".product-name"),
GetText(".product-price"),
GetText(".product-description")
)
extract_related_products = QueryAll(".related-product") >> extract_text_from_elements
# Compose them in different ways
full_scraper = navigate_to_product >> extract_product_info >> extract_related_products
minimal_scraper = navigate_to_product >> extract_product_info
Logging
Enable logging to better debug your scraping pipelines:
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("silk").setLevel(logging.DEBUG)
Contributing
Contributions to Silk are welcome! Please feel free to submit a Pull Request.
Development Setup
-
Clone the repository
git clone https://github.com/galaddirie/silk.git cd silk
-
Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
-
Install dependencies
poetry install --all-extras
-
Activate the virtual environment
poetry shell -
Run tests
poetry run pytest
Guidelines
- Follow PEP 8 and use Black for code formatting
- Write tests for new features
- Keep the functional programming paradigm in mind
- Update documentation with new features
Acknowledgements
Silk builds upon several excellent libraries:
- Expression for functional programming patterns in Python
- Playwright and Selenium for browser automation
- Pydantic for data validation and settings management
Roadmap
- Initial release with Playwright support
- Improve parallel execution
- Support multiple actions in parallel in the same context/page eg. (GetText & GetAttribute & GetHtml) in an ergonomic way
- Selenium integration
- Puppeteer integration
- Add examples
- Support Mapped tasks similar to airflow tasks eg. (QueryAll >> GetText[]) where get text is applied to each element in the collection
- Add proxy options
- Explore stealth options for browser automation ( implement patchwright, no-driver, driverless, etc.)
- add dependency review
- Support for task dependencies
- action signature validation
- Data extraction DSL for declarative scraping
- Support computer using agentds (browser-use, openai cua, claude computer-use)
- Enhanced caching mechanisms
- Distributed scraping support
- Rate limiting and polite scraping utilities
- Integration with popular data processing libraries (Pandas, etc.)
- CLI tool for quick scraping tasks
FAQ
How does Silk compare to other scraping libraries?
Silk differs from traditional scraping libraries like Scrapy, Beautiful Soup, or plain Selenium/Playwright in its functional approach. While these tools focus on imperative code with callbacks and exceptions, Silk embraces functional composition, immutable data structures, and Railway-Oriented Programming for cleaner, more maintainable code.
Can I use Silk with my existing Playwright/Selenium code?
Yes, Silk is designed to work alongside existing browser automation code. You can gradually adopt Silk's patterns while keeping your existing code.
Is Silk suitable for large-scale scraping?
Absolutely. Silk's composable nature makes it excellent for large-scale scraping projects. Its built-in error handling, retries, and parallel execution capabilities are particularly valuable for robust production systems.
How can I handle authentication in Silk?
You can handle authentication like any other browser interaction:
login_action = compose(
Navigate("https://example.com/login"),
Fill("#username", "user@example.com"),
Fill("#password", "password123"),
Click("button[type='submit']"),
wait(1000) # Wait for login to complete
)
# Then use the authenticated context for further actions
pipeline = login_action >> Navigate("https://example.com/protected-content") >> GetText("#protected-data")
You can also save and reuse authentication state with browser context options.
License
Silk is released under the MIT License. See the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file silk_scraper-0.1.1.tar.gz.
File metadata
- Download URL: silk_scraper-0.1.1.tar.gz
- Upload date:
- Size: 53.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5cd48803fe4a4f71a886002bfb3a0325daa7b63bfb7eb43182c88496f242812d
|
|
| MD5 |
7cee475b08be3c75d078725a6fe1a763
|
|
| BLAKE2b-256 |
d969a25ca9667570c996b240108030f655e9fadbf6b5718a9321cf36d32c9eb0
|
File details
Details for the file silk_scraper-0.1.1-py3-none-any.whl.
File metadata
- Download URL: silk_scraper-0.1.1-py3-none-any.whl
- Upload date:
- Size: 56.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6a9c583c6ae881aaa645fbe6616e955bd40dc6ebbd8b63f6f406ce6beedd932f
|
|
| MD5 |
ea0b42dcdd2c28202e1dffa26f414669
|
|
| BLAKE2b-256 |
ab417d86dda470ea85e2b0ad649fbd9ea0a93c14b1ef6a5786229e7adb7df0c0
|