Manage Screaming Frog configs programmatically

These details have not been verified by PyPI

Project links

Project description

sf-config-builder

Manage Screaming Frog .seospiderconfig files programmatically.

Installation

pip install sf-config-builder

Requirements

Screaming Frog SEO Spider must be installed (provides JARs for deserialization)
Python 3.8+

Quick Start

from sfconfig import SFConfig

# Load existing config (auto-detects SF installation)
config = SFConfig.load("base.seospiderconfig")

# Or specify custom SF path
config = SFConfig.load("base.seospiderconfig", sf_path="D:/Apps/Screaming Frog SEO Spider")

# Configure for e-commerce audit
config.max_urls = 100000
config.rendering_mode = "JAVASCRIPT"

# Add custom extractions
config.add_extraction("Price", "//span[@class='price']")
config.add_extraction("SKU", "//span[@itemprop='sku']")
config.add_extraction("Stock", ".availability", selector_type="CSS")

# Add exclude patterns
config.add_exclude(r".*\.pdf$")
config.add_exclude(r".*/admin/.*")

# Save and run
config.save("client-audit.seospiderconfig")
config.run_crawl("https://example.com", output_folder="./results")

Features

Inspect Configs

config = SFConfig.load("my.seospiderconfig")

# Get specific field
max_urls = config.get("mCrawlConfig.mMaxUrls")

# List all fields
for field in config.fields():
    print(f"{field['path']}: {field['value']}")

# Filter by prefix
crawl_fields = config.fields(prefix="mCrawlConfig")

Modify Configs

# Direct field access
config.set("mCrawlConfig.mMaxUrls", 100000)

# Convenience properties
config.max_urls = 100000
config.max_depth = 10
config.rendering_mode = "JAVASCRIPT"  # STATIC | JAVASCRIPT
config.robots_mode = "IGNORE"         # RESPECT | IGNORE
config.crawl_delay = 0.5
config.user_agent = "MyBot/1.0"

Custom Extractions

# Add extraction rules
config.add_extraction(
    name="Price",
    selector="//span[@class='price']",
    selector_type="XPATH",      # XPATH | CSS | REGEX
    extract_mode="TEXT"         # TEXT | HTML_ELEMENT | INNER_HTML
)

# List extractions
for ext in config.extractions:
    print(f"{ext['name']}: {ext['selector']}")

# Remove by name
config.remove_extraction("Price")

# Clear all
config.clear_extractions()

Custom Searches

# Add custom search filters
config.add_custom_search(
    name="Filter 1",
    query=".*",
    mode="CONTAINS",
    data_type="REGEX",
    scope="HTML",
    case_sensitive=False
)

# Remove by name
config.remove_custom_search("Filter 1")

# Clear all
config.clear_custom_searches()

Custom JavaScript

# Add custom JavaScript extraction
config.add_custom_javascript(
    name="Extractor 1",
    javascript="return document.title;",
    script_type="EXTRACTION",
    timeout_secs=10,
    content_types="text/html"
)

# Remove by name
config.remove_custom_javascript("Extractor 1")

# Clear all
config.clear_custom_javascript()

Exclude/Include Patterns

# Excludes (URLs matching these patterns are skipped)
config.add_exclude(r".*\.pdf$")
config.add_exclude(r".*/admin/.*")

# Includes (only URLs matching these are crawled)
config.add_include(r".*/products/.*")

# List patterns
print(config.excludes)
print(config.includes)

Compare Configs

from sfconfig import SFConfig

diff = SFConfig.diff("old.seospiderconfig", "new.seospiderconfig")

if diff.has_changes:
    print(f"Found {diff.change_count} differences:")
    print(diff)

# Filter by prefix
crawl_changes = diff.changes_for("mCrawlConfig")

Test Extractions

# Test selector against live URL before full crawl
result = config.test_extraction(
    url="https://example.com/product",
    selector="//span[@class='price']",
    selector_type="XPATH"
)

if result["match_count"] > 0:
    print(f"Found: {result['matches']}")
else:
    print("Selector didn't match - fix before crawling")

Run Crawls

# Blocking crawl
config.run_crawl(
    url="https://example.com",
    output_folder="./results",
    export_tabs=["Internal:All", "Response Codes:All"],
    export_format="csv",
    timeout=3600
)

# Async crawl
process = config.run_crawl_async(
    url="https://example.com",
    output_folder="./results"
)
# Do other work...
process.wait()  # Block until complete

Multi-Client Workflow

from sfconfig import SFConfig

clients = [
    {"domain": "client1.com", "max_urls": 50000},
    {"domain": "client2.com", "max_urls": 100000},
]

for client in clients:
    config = SFConfig.load("agency-base.seospiderconfig")
    config.max_urls = client["max_urls"]
    config.add_extraction("Price", "//span[@class='price']")

    config.save(f"/tmp/{client['domain']}.seospiderconfig")
    config.run_crawl(
        url=f"https://{client['domain']}",
        output_folder=f"./results/{client['domain']}"
    )

Error Handling

from sfconfig import (
    SFConfig,
    SFNotFoundError,
    SFValidationError,
    SFParseError,
    SFCrawlError
)

try:
    config = SFConfig.load("my.seospiderconfig")
    config.set("mInvalidField", 123)
    config.save()
except SFNotFoundError:
    print("Install Screaming Frog first")
except SFValidationError as e:
    print(f"Invalid field: {e}")
except SFParseError as e:
    print(f"Could not parse config: {e}")
except SFCrawlError as e:
    print(f"Crawl failed: {e}")

Environment Variables

Variable	Description
`SF_PATH`	Custom path to SF's JAR directory
`SF_CLI_PATH`	Custom path to SF CLI executable
`JAVA_HOME`	Custom Java installation path

Architecture

User Python code
       |
       v
+------------------+
|  sfconfig        |  (Python wrapper)
|  - SFConfig      |
|  - SFDiff        |
+--------+---------+
         | subprocess.run()
         v
+------------------+
|  ConfigBuilder   |  (Java CLI, bundled ~50KB)
|  .jar            |
+--------+---------+
         | classpath includes
         v
+------------------+
|  SF's JARs       |  (from user's local SF install, NOT bundled)
+------------------+

At runtime, the library builds a classpath combining:

ConfigBuilder.jar (bundled with this package)
{SF_INSTALL_PATH}/* (user's local Screaming Frog JARs)

This means:

Only our small JAR is distributed (no licensing issues)
SF's proprietary JARs are used from the user's existing installation
Compatibility is maintained across SF versions

Development

Building the Java CLI

The Java CLI lives in a separate repo (sf-config-builder). To build:

cd /path/to/sf-config-builder

# Compile against SF's JARs (as compile-time dependency)
javac -cp "C:/Program Files/Screaming Frog SEO Spider/*" \
      -d bin src/ConfigBuilder.java

# Package into JAR
cd bin
jar cfe ConfigBuilder.jar ConfigBuilder *.class

# Copy to Python package
cp ConfigBuilder.jar /path/to/sf-config-tool/sfconfig/java/

Important: Only bundle ConfigBuilder.jar. Do NOT bundle any JARs from SF's install directory - those are proprietary and already on the user's machine.

Installing for Development

cd sf-config-tool
pip install -e ".[dev]"
pytest tests/

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.6

Feb 21, 2026

0.1.5

Jan 23, 2026

This version

0.1.4

Jan 23, 2026

0.1.3

Jan 22, 2026

0.1.2

Jan 22, 2026

0.1.1

Jan 22, 2026

0.1.0

Jan 22, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sf_config_builder-0.1.4.tar.gz (48.1 kB view details)

Uploaded Jan 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sf_config_builder-0.1.4-py3-none-any.whl (43.3 kB view details)

Uploaded Jan 23, 2026 Python 3

File details

Details for the file sf_config_builder-0.1.4.tar.gz.

File metadata

Download URL: sf_config_builder-0.1.4.tar.gz
Upload date: Jan 23, 2026
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sf_config_builder-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`578a79be10460762b328d9ce60336049b1847a0a9b2f1e199d4e30e2c82e71ea`
MD5	`f538669bbc80407667156ca8487f3d92`
BLAKE2b-256	`00b93767115543118b28f34e79297be6af48ebaf42decdf9e566d1ab92d85e51`

See more details on using hashes here.

File details

Details for the file sf_config_builder-0.1.4-py3-none-any.whl.

File metadata

Download URL: sf_config_builder-0.1.4-py3-none-any.whl
Upload date: Jan 23, 2026
Size: 43.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sf_config_builder-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`47b70cb69f1361b96b10fed7057a0ffb465fba2303160fc50c8953552d202928`
MD5	`0c1e2b1afe54f075bea0ae6e4992f5b2`
BLAKE2b-256	`2616ac958c05d724d149343c59eaab0d824b2b4f6ccfbaedb88e82460924bb4d`

See more details on using hashes here.

sf-config-builder 0.1.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

sf-config-builder

Installation

Requirements

Quick Start

Features

Inspect Configs

Modify Configs

Custom Extractions

Custom Searches

Custom JavaScript

Exclude/Include Patterns

Compare Configs

Test Extractions

Run Crawls

Multi-Client Workflow

Error Handling

Environment Variables

Architecture

Development

Building the Java CLI

Installing for Development

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes