Fast Python web crawler for AI & RAG ingestion – crawl, extract, and embed website content with one tool.
MarkCrawl by iD8
Turn any webpage or website into clean Markdown for LLM pipelines – in one command.
pip install markcrawl
markcrawl --base https://docs.example.com --out ./output --show-progress
MarkCrawl is a crawl-and-structure engine. It fetches one page or crawls an entire website, strips navigation/scripts/boilerplate, and writes clean Markdown files with a structured JSONL index. Every page includes a citation with the access date. No API keys needed.
Everything else (LLM extraction, Supabase upload, MCP server, LangChain tools) is optional and installed separately.
Want a hosted API instead of running locally? Join the waitlist – we're gauging interest.
LLM agents: Load docs/LLM_PROMPT.md as a system prompt to generate correct MarkCrawl commands automatically.
Quickstart (2 minutes)
pip install markcrawl
markcrawl --base https://quotes.toscrape.com --out ./demo --max-pages 5 --show-progress
Your ./demo folder now contains:
demo/
├── index__a4f3b2c1d0.md    ← clean Markdown of the page
├── page-2__b7e2d1f0a3.md
├── ...
└── pages.jsonl             ← structured index (one JSON line per page)
Each line in pages.jsonl:
{
"url": "https://quotes.toscrape.com/",
"title": "Quotes to Scrape",
"crawled_at": "2026-04-04T12:30:00Z",
"citation": "Quotes to Scrape. quotes.toscrape.com. Available at: https://quotes.toscrape.com/ [Accessed April 04, 2026].",
"tool": "markcrawl",
"text": "# Quotes to Scrape\n\n> \"The world as we have created it is a process of our thinking...\" – Albert Einstein\n\nTags: change, deep-thoughts, thinking, world..."
}
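Because each line is a standalone JSON object, the index streams easily into downstream tooling. A minimal Python sketch (field names taken from the example above; `load_pages` is an illustrative helper, not part of MarkCrawl):

```python
import json

def load_pages(index_file):
    """Yield one dict per crawled page from a pages.jsonl index."""
    with open(index_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)

# Each row carries the page text plus its citation metadata
sample = '{"url": "https://quotes.toscrape.com/", "title": "Quotes to Scrape", "tool": "markcrawl"}'
page = json.loads(sample)
print(page["title"])  # Quotes to Scrape
```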
Common Recipes
Scrape a single page:
markcrawl --base https://example.com/pricing --no-sitemap --max-pages 1
Scrape a single JS-rendered page (React, Vue, YouTube, etc.):
markcrawl --base "https://www.youtube.com/@channel/videos" \
--no-sitemap --max-pages 1 --render-js
# → outputs one .md file with video titles, view counts, and dates
For infinite-scroll pages like YouTube, this captures the first ~28 videos from the initial render.
Crawl a docs site:
markcrawl --base https://docs.example.com --max-pages 500 --concurrency 5 --show-progress
Crawl a subsection without sitemap wandering:
Large sites (YouTube, GitHub, etc.) have sitemaps with thousands of unrelated pages.
Use --no-sitemap to crawl only from your target URL:
markcrawl --base https://docs.example.com/guides \
--no-sitemap --max-pages 50 --show-progress
Competitive analysis (crawl 3 competitors, extract pricing):
markcrawl --base https://competitor-one.com/pricing --no-sitemap --max-pages 1 --out ./comp1
markcrawl --base https://competitor-two.com/pricing --no-sitemap --max-pages 1 --out ./comp2
markcrawl --base https://competitor-three.com/pricing --no-sitemap --max-pages 1 --out ./comp3
markcrawl-extract \
--jsonl ./comp1/pages.jsonl ./comp2/pages.jsonl ./comp3/pages.jsonl \
--fields pricing_tiers features free_trial --show-progress
# → extracted.jsonl with structured pricing data across all three
Docs site โ RAG chatbot (full pipeline: crawl, embed, query):
markcrawl --base https://docs.example.com --out ./docs --max-pages 500 --concurrency 5 --show-progress
markcrawl-upload --jsonl ./docs/pages.jsonl --show-progress
# → pages are chunked, embedded, and uploaded to Supabase/pgvector
# Wire your chatbot to query the vector table – see docs/SUPABASE.md
API docs โ code generation prompt:
markcrawl --base https://api.example.com/docs --out ./api-docs --max-pages 200 --show-progress
# Feed the output to an LLM:
# "Using the API documentation in ./api-docs/pages.jsonl, generate a
# typed Python client with methods for each endpoint."
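One way to assemble such a prompt programmatically from the index. This is an illustrative stdlib-only sketch; `build_prompt` and the character cap are assumptions, not a MarkCrawl API:

```python
import json

def build_prompt(index_file, instruction, max_chars=50_000):
    """Concatenate crawled page text under an instruction, capped by size."""
    parts = [instruction]
    total = len(instruction)
    with open(index_file, encoding="utf-8") as f:
        for line in f:
            page = json.loads(line)
            snippet = f"## {page['title']}\n{page['text']}"
            if total + len(snippet) > max_chars:
                break  # stay within the model's context budget
            parts.append(snippet)
            total += len(snippet)
    return "\n\n".join(parts)
```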
Back up a blog before it shuts down:
markcrawl --base https://engineering.example.com/blog \
--no-sitemap --max-pages 1000 --concurrency 5 --out ./blog-archive --show-progress
# → every post saved as clean Markdown with citations and access dates
Skip junk pages (job listings, login walls, SEO spam):
markcrawl --base https://example.com \
--exclude-path "/job/*" --exclude-path "/careers/*" --exclude-path "/login" \
--max-pages 500 --out ./output --show-progress
Preview URLs before committing to a long crawl:
markcrawl --base https://example.com --dry-run
# → prints every URL that would be crawled (from sitemap), then exits
# Pipe to wc -l to get a count, or grep to check for junk patterns
markcrawl --base https://example.com --dry-run | wc -l
markcrawl --base https://example.com --dry-run | grep "/job/"
Only crawl specific sections (blog + pricing, ignore everything else):
markcrawl --base https://example.com \
--include-path "/blog/*" --include-path "/pricing" \
--max-pages 200 --out ./output --show-progress
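These patterns behave like ordinary path globs. A simplified illustration of the matching logic (not MarkCrawl's actual implementation):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include=(), exclude=()):
    """Return True if the URL's path passes include/exclude glob filters."""
    path = urlparse(url).path or "/"
    if any(fnmatch(path, pat) for pat in exclude):
        return False  # excluded paths always lose
    if include and not any(fnmatch(path, pat) for pat in include):
        return False  # with include patterns, only matches pass
    return True

print(path_allowed("https://example.com/blog/post-1", include=["/blog/*", "/pricing"]))  # True
print(path_allowed("https://example.com/careers/x", exclude=["/careers/*"]))             # False
```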
Safe crawl of a job board (dry-run + exclude):
# Step 1: see what you'd get
markcrawl --base https://tealhq.com --dry-run | head -50
# Step 2: exclude the job listings, crawl just the content pages
markcrawl --base https://tealhq.com \
--exclude-path "/job/*" --exclude-path "/resume-examples/*" \
--max-pages 200 --out ./tealhq --show-progress
Choose an extraction backend:
# Default (BS4 + markdownify) – fastest, good for most sites
markcrawl --base https://docs.example.com --out ./output --show-progress
# Ensemble – runs default + trafilatura, picks best per page
markcrawl --base https://docs.example.com --out ./output --extractor ensemble --show-progress
# ReaderLM-v2 – ML-based extraction (requires: pip install markcrawl[ml])
markcrawl --base https://docs.example.com --out ./output --extractor readerlm --show-progress
Skip pages you've already crawled (cross-crawl dedup):
# First crawl
markcrawl --base https://docs.example.com --out ./docs --show-progress
# Later – only fetches new/changed pages
markcrawl --base https://docs.example.com --out ./docs --cross-dedup --show-progress
Crawl high-value pages first (link prioritization):
markcrawl --base https://docs.example.com --out ./docs \
--prioritize-links --max-pages 100 --show-progress
# Prioritizes content-rich pages (guides, docs) over low-value ones (legal, login)
Smart-sample a large site (e-commerce, job boards, real estate):
# Preview the pattern clusters first
markcrawl --base https://bigsite.com --dry-run --smart-sample --show-progress
# Crawl with sampling – 5 pages per templated cluster instead of thousands
markcrawl --base https://bigsite.com --out ./bigsite \
--smart-sample --sample-size 5 --sample-threshold 20 --show-progress
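The idea behind smart sampling can be shown in a few lines: collapse ID-like path segments into a template, then crawl only a handful of URLs per template. A simplified sketch (MarkCrawl's real clustering may differ):

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def url_template(url):
    """Collapse ID-like path segments so templated URLs cluster together."""
    segments = urlparse(url).path.strip("/").split("/")
    return "/" + "/".join(
        "{id}" if re.fullmatch(r"[\w-]*\d[\w-]*", seg) else seg
        for seg in segments
    )

urls = [
    "https://bigsite.com/job/1021",
    "https://bigsite.com/job/1022",
    "https://bigsite.com/job/1023",
    "https://bigsite.com/about",
]
clusters = defaultdict(list)
for u in urls:
    clusters[url_template(u)].append(u)
print(sorted(clusters))  # ['/about', '/job/{id}']
```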
Download images alongside content (photography blogs, product pages):
# Crawl a photography blog and save images from the content area
markcrawl --base https://photography-blog.example.com --out ./photos \
--download-images --max-pages 50 --show-progress
# Output:
# ./photos/assets/mountain-abc123.jpg
# ./photos/assets/sunset-def456.png
# ./photos/post-1__a1b2c3.md – Markdown with local image refs
# ./photos/pages.jsonl – index includes "images" array per page
# Adjust minimum image size to skip thumbnails (default: 5000 bytes)
markcrawl --base https://example.com/gallery --out ./gallery \
--download-images --min-image-size 20000 --show-progress
Resume an interrupted crawl:
markcrawl --base https://docs.example.com --out ./docs --resume --show-progress
How it compares to other crawlers
Different tools make different tradeoffs. This table summarizes the main differences:
| | MarkCrawl | FireCrawl | Crawl4AI | Scrapy |
|---|---|---|---|---|
| License | MIT | AGPL-3.0 | Apache-2.0 | BSD-3 |
| Install | `pip install markcrawl` | SaaS or self-host | pip + Playwright | pip + framework |
| Output | Markdown + JSONL | Markdown + JSON | Markdown | Custom pipelines |
| JS rendering | Optional (`--render-js`) | Built-in | Built-in | Plugin |
| LLM extraction | Optional add-on | Via API | Built-in | None |
| Best for | Single-site crawl → Markdown | Hosted scraping API | AI-native crawling | Large-scale distributed |
Each tool has strengths: FireCrawl excels as a hosted API, Crawl4AI has deep browser automation, and Scrapy handles massive distributed workloads. MarkCrawl focuses on simple local crawls that produce LLM-ready Markdown.
Benchmark results (7 tools, April 2026, v2 methodology)
Speed: markcrawl is fastest (12.1 pages/sec), scrapy+md second (9.5). Playwright-based tools (crawlee, playwright, crawl4ai) average 1.5–2.2 pages/sec.
Content signal: markcrawl leads at 99% (ratio of answer-bearing tokens to total output); almost no navigation, footer, or boilerplate makes it into your embeddings.
RAG quality: markcrawl scores 4.52/5 on LLM-judged answer quality (tied #2; the leader at 4.53 is within noise) and 0.698 MRR (3rd; leader crawlee at 0.733), with 2.1x fewer chunks than crawlee, keeping embedding costs low.
| Tool | Speed (p/s) | Content Signal | MRR | Answer (/5) | Annual cost (100K pages) |
|---|---|---|---|---|---|
| markcrawl | 12.1 | 99% | 0.698 | 4.52 | $4,505 |
| scrapy+md | 9.5 | 93% | 0.459 | 4.03 | $5,464 |
| colly+md | 4.2 | 67% | 0.677 | 4.53 | $7,213 |
| playwright | 2.2 | 64% | 0.727 | 4.42 | $7,320 |
| crawlee | 1.7 | 63% | 0.733 | 4.52 | $7,467 |
| crawl4ai | 1.5 | 83% | 0.694 | 4.43 | $6,960 |
Full benchmark data: docs/BENCHMARKS.md | Methodology: llm-crawler-benchmarks
RAG-optimized recipe (v0.6.0): With --i18n-filter --title-at-top and the opt-in chunker flags (auto_extract_title=True, prepend_first_paragraph=True, strip_markdown_links=True on chunk_markdown), markcrawl reaches 0.8148 MRR on the same 57-query benchmark: a +0.18 jump over the default config and +0.08 over the next best tool (crawlee at 0.733).
Installation
The core crawler is the only thing you need. Everything else is optional.
pip install markcrawl # Core crawler (free, no API keys)
Optional add-ons:
pip install markcrawl[extract] # + LLM extraction (OpenAI, Claude, Gemini, Grok)
pip install markcrawl[js] # + JavaScript rendering (Playwright)
pip install markcrawl[upload] # + Supabase upload with embeddings
pip install markcrawl[ml] # + ReaderLM-v2 extraction backend
pip install markcrawl[mcp] # + MCP server for AI agents
pip install markcrawl[langchain] # + LangChain tool wrappers
pip install markcrawl[all] # Everything
For Playwright, also run playwright install chromium after installing.
Install from source (for development)
git clone https://github.com/AIMLPM/markcrawl.git
cd markcrawl
python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
Crawling
markcrawl --base https://www.example.com --out ./output --show-progress
Add flags as needed:
markcrawl \
  --base https://www.example.com \
  --out ./output \
  --include-subdomains \
  --render-js \
  --concurrency 5 \
  --proxy http://proxy:8080 \
  --max-pages 200 \
  --format markdown \
  --show-progress
--include-subdomains crawls sub.example.com too; --render-js renders JavaScript (React, Vue, etc.); --concurrency 5 fetches five pages in parallel; --proxy routes requests through a proxy; --max-pages 200 stops after 200 pages; --format accepts markdown or text.
Resume an interrupted crawl:
markcrawl --base https://www.example.com --out ./output --resume --show-progress
Output
Each page becomes a .md file with a citation header:
# Getting Started
> URL: https://docs.example.com/getting-started
> Crawled: April 04, 2026
> Citation: Getting Started. docs.example.com. Available at: https://docs.example.com/getting-started [Accessed April 04, 2026].
Welcome to the platform. This guide walks you through installation...
Navigation, footer, cookie banners, and scripts are stripped. Only the main content remains.
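The header lines are plain Markdown blockquotes, so they are easy to parse back out when you need the provenance programmatically. An illustrative helper (regex based on the header shown above; not a MarkCrawl API):

```python
import re

HEADER_RE = re.compile(r"^> (URL|Crawled|Citation): (.+)$", re.MULTILINE)

def parse_header(markdown_text):
    """Pull the URL / Crawled / Citation lines out of a crawled .md file."""
    return {key.lower(): value for key, value in HEADER_RE.findall(markdown_text)}

sample = (
    "# Getting Started\n"
    "> URL: https://docs.example.com/getting-started\n"
    "> Crawled: April 04, 2026\n"
)
print(parse_header(sample)["crawled"])  # April 04, 2026
```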
All crawler CLI arguments
| Argument | Description |
|---|---|
| `--base` | Base site URL to crawl |
| `--out` | Output directory |
| `--format` | `markdown` or `text` (default: `markdown`) |
| `--show-progress` | Print progress and crawl events |
| `--render-js` | Render JavaScript with Playwright before extracting |
| `--concurrency` | Pages to fetch in parallel (default: 1) |
| `--proxy` | HTTP/HTTPS proxy URL |
| `--resume` | Resume from saved state |
| `--include-subdomains` | Include subdomains under the base domain |
| `--max-pages` | Max pages to save; 0 = unlimited (default: 500) |
| `--delay` | Minimum delay between requests in seconds (default: 0; adaptive throttle adjusts automatically) |
| `--timeout` | Per-request timeout in seconds (default: 15) |
| `--min-words` | Skip pages with fewer words (default: 20) |
| `--user-agent` | Override the default user agent |
| `--use-sitemap` / `--no-sitemap` | Enable/disable sitemap discovery. Use `--no-sitemap` to scrape a specific page or subsection; without it, large sites (YouTube, GitHub) may discover thousands of unrelated pages via their sitemap |
| `--exclude-path` | Glob pattern to exclude URL paths (e.g. `/job/*`). Can be repeated |
| `--include-path` | Glob pattern to include URL paths (e.g. `/blog/*`). Only matching paths are crawled. Can be repeated |
| `--dry-run` | Discover URLs (via sitemap/links) and print them without fetching content |
| `--smart-sample` | Auto-detect templated URL patterns and sample from large clusters instead of crawling every page |
| `--sample-size` | Pages to sample per templated cluster (default: 5; used with `--smart-sample`) |
| `--sample-threshold` | Clusters larger than this are sampled (default: 20; used with `--smart-sample`) |
| `--auto-resume` | Automatically resume if saved state exists, otherwise start fresh |
| `--cross-dedup` | Skip pages already seen in previous crawls to the same output directory |
| `--prioritize-links` | Score discovered links by predicted content yield and crawl high-value pages first |
| `--extractor` | Content extraction backend: `default`, `trafilatura`, `ensemble`, or `readerlm` |
| `--download-images` | Download images from the content area to `assets/` and use local paths in Markdown |
| `--min-image-size` | Minimum image file size in bytes to keep (default: 5000). Smaller images are skipped |
| `--i18n-filter` | Skip URLs under locale path segments (`/fr/`, `/de-DE/`, `/zh-Hans/`, ...); generic, no per-domain config |
| `--title-at-top` | Prepend `# {title}` to the `text` field of every JSONL row when not already present; part of the top-MRR RAG recipe |
Optional: structured extraction
If you need structured data (not just text), the extraction add-on uses an LLM to pull specific fields from each page.
pip install markcrawl[extract]
markcrawl-extract \
--jsonl ./output/pages.jsonl \
--fields company_name pricing features \
--show-progress
Auto-discover fields across multiple crawled sites:
markcrawl-extract \
--jsonl ./comp1/pages.jsonl ./comp2/pages.jsonl ./comp3/pages.jsonl \
--auto-fields \
--context "competitor pricing analysis" \
--show-progress
Supports OpenAI, Anthropic (Claude), Google Gemini, and xAI (Grok) via --provider.
Extraction details
Provider and model selection
markcrawl-extract --jsonl ... --fields pricing --provider openai # default
markcrawl-extract --jsonl ... --fields pricing --provider anthropic # Claude
markcrawl-extract --jsonl ... --fields pricing --provider gemini # Gemini
markcrawl-extract --jsonl ... --fields pricing --provider grok # Grok
markcrawl-extract --jsonl ... --fields pricing --model gpt-4o # override model
| Provider | API key env var | Default model |
|---|---|---|
| OpenAI | `OPENAI_API_KEY` | `gpt-4o-mini` |
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| Google Gemini | `GEMINI_API_KEY` | `gemini-2.0-flash` |
| xAI (Grok) | `XAI_API_KEY` | `grok-3-mini-fast` |
All extraction CLI arguments
| Argument | Description |
|---|---|
| `--jsonl` | Path(s) to `pages.jsonl`; pass multiple for cross-site analysis |
| `--fields` | Field names to extract (space-separated) |
| `--auto-fields` | Auto-discover fields by sampling pages |
| `--context` | Describe your goal for auto-discovery |
| `--sample-size` | Pages to sample for auto-discovery (default: 3) |
| `--provider` | `openai`, `anthropic`, `gemini`, or `grok` |
| `--model` | Override the default model |
| `--output` | Output path (default: `extracted.jsonl`) |
| `--delay` | Delay between LLM calls in seconds (default: 0.25) |
| `--show-progress` | Print progress |
Output format
Extracted rows include LLM attribution:
{
"url": "https://competitor.com/pricing",
"citation": "Pricing. competitor.com. Available at: ... [Accessed April 04, 2026].",
"pricing_tiers": "Starter ($29/mo), Pro ($99/mo), Enterprise (contact sales)",
"extracted_by": "gpt-4o-mini (openai)",
"extraction_note": "Field values were extracted by an LLM and may be interpreted, not verbatim."
}
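Extracted rows are plain JSONL too, so flattening them into a spreadsheet takes a few stdlib lines (illustrative helper; field names from the example above):

```python
import csv
import json

def jsonl_to_csv(jsonl_path, csv_path, fields):
    """Flatten extracted.jsonl rows into a CSV for side-by-side review."""
    with open(jsonl_path, encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(json.loads(line))  # missing fields become empty cells
```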
Optional: Supabase vector search (RAG)
Chunk pages, generate embeddings, and upload to Supabase with pgvector:
pip install markcrawl[upload]
markcrawl --base https://docs.example.com --out ./output --show-progress
markcrawl-upload --jsonl ./output/pages.jsonl --show-progress
Requires SUPABASE_URL, SUPABASE_KEY, and OPENAI_API_KEY. See docs/SUPABASE.md for table setup, query examples, and recommendations.
Optional: agent integrations
MarkCrawl includes integrations for AI agents. Each is an optional add-on.
MCP Server (Claude Desktop, Cursor, Windsurf)
pip install markcrawl[mcp]
Then register the server in your MCP client's configuration (for Claude Desktop, claude_desktop_config.json):
{
"mcpServers": {
"markcrawl": {
"command": "python",
"args": ["-m", "markcrawl.mcp_server"]
}
}
}
Tools: crawl_site, list_pages, read_page, search_pages, extract_data
LangChain Tool
pip install markcrawl[langchain]
from markcrawl.langchain import all_tools
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

agent = initialize_agent(
    tools=all_tools,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
)
agent.run("Crawl docs.example.com and summarize their auth guide")
OpenClaw Skill (WhatsApp, Telegram, Slack)
npx clawhub install markcrawl-skill
LLM assistant prompt
Copy the system prompt from docs/LLM_PROMPT.md into any LLM to get an assistant that generates correct MarkCrawl commands.
When NOT to use MarkCrawl
- Sites behind login/auth – no cookie or session support
- Aggressive bot protection (Cloudflare, Akamai) – no anti-bot evasion
- Millions of pages – designed for hundreds to low thousands; use Scrapy for scale
- PDF content – HTML only (PDF support is on the roadmap)
- JavaScript SPAs – add markcrawl[js] and use --render-js for React/Vue/Angular
- Infinite-scroll pages – --render-js renders the initial page load but does not scroll; you'll get the first screenful of content (e.g., ~28 of 82 YouTube videos). For complete listings, combine with the platform's API or RSS feed (e.g., YouTube's /feeds/videos.xml?channel_id=...)
Architecture
MarkCrawl is a web crawler. The optional layers (extraction, upload, agents) are separate add-ons that work with the crawler's output.
CORE (free, no API keys)           OPTIONAL ADD-ONS
┌────────────────────────────┐
│ 1. Discover URLs           │   markcrawl[extract]   → LLM field extraction
│    (sitemap or links)      │   markcrawl[upload]    → Supabase/pgvector RAG
│ 2. Fetch & clean HTML      │   markcrawl[js]        → Playwright JS rendering
│ 3. Write Markdown + JSONL  │   markcrawl[mcp]       → MCP server for agents
│    + auto-citation         │   markcrawl[langchain] → LangChain tools
└────────────────────────────┘
For internals, see docs/ARCHITECTURE.md.
Extending MarkCrawl
from markcrawl import crawl
result = crawl("https://example.com", out_dir="./output")
print(f"Saved {result.pages_saved} pages")
# Process output in your own pipeline
import json
with open(result.index_file) as f:
for line in f:
page = json.loads(line)
your_db.insert(page) # Pinecone, Weaviate, Elasticsearch, etc.
# Use individual components
from markcrawl import chunk_text
from markcrawl.extract import LLMClient, extract_fields
See docs/ARCHITECTURE.md for the full module map and extensibility guide.
Cost
The core crawler is free. Two optional features have API costs:
| Feature | Cost | When |
|---|---|---|
| Structured extraction | ~$0.01-0.03 per page | markcrawl-extract |
| Supabase upload | ~$0.0001 per page | markcrawl-upload |
Setting up API keys
Only needed for extraction and upload. The core crawler requires no keys.
# .env – in your working directory
OPENAI_API_KEY="sk-..." # extraction (--provider openai) + upload
ANTHROPIC_API_KEY="sk-ant-..." # extraction (--provider anthropic)
GEMINI_API_KEY="AI..." # extraction (--provider gemini)
XAI_API_KEY="xai-..." # extraction (--provider grok)
SUPABASE_URL="https://..." # upload
SUPABASE_KEY="eyJ..." # upload (service-role key)
set -a && source .env && set +a   # export the variables so child processes see them
Project structure
.
├── README.md
├── LICENSE
├── PRIVACY.md
├── SECURITY.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── Dockerfile
├── Makefile
├── glama.json
├── pyproject.toml
├── requirements.txt
├── .github/
│   ├── pull_request_template.md
│   └── workflows/
│       ├── ci.yml
│       └── publish.yml
├── docs/
│   ├── ARCHITECTURE.md
│   ├── LLM_PROMPT.md
│   ├── MCP_SUBMISSION.md
│   ├── RAG_RETRIEVAL_RESEARCH.md
│   └── SUPABASE.md
├── tests/
│   ├── __init__.py
│   ├── test_chunker.py
│   ├── test_core.py
│   ├── test_extract.py
│   └── test_upload.py
└── markcrawl/
    ├── __init__.py
    ├── cli.py
    ├── core.py             # orchestrator
    ├── fetch.py            # HTTP/Playwright fetching
    ├── robots.py           # robots.txt parsing
    ├── throttle.py         # adaptive rate limiting
    ├── state.py            # crawl state & resume
    ├── urls.py             # URL normalization & filtering
    ├── extract_content.py  # HTML → Markdown conversion
    ├── dedup.py            # cross-crawl deduplication
    ├── link_scorer.py      # link prioritization
    ├── chunker.py
    ├── exceptions.py
    ├── utils.py
    ├── extract.py          # LLM field extraction
    ├── extract_cli.py
    ├── upload.py
    ├── upload_cli.py
    ├── langchain.py
    └── mcp_server.py
Roadmap
- Canonical URL support
- PDF support
- Authenticated crawling
- Multi-provider embeddings
Shipped features
- pip install markcrawl on PyPI
- 200 automated tests + GitHub Actions CI (Python 3.10–3.13) + ruff linting
- Markdown and plain text output with auto-citation
- Sitemap-first crawling with robots.txt compliance
- Text chunking with configurable overlap + semantic chunking
- Supabase/pgvector upload for RAG
- JavaScript rendering via Playwright
- Concurrent fetching and proxy support
- Resume interrupted crawls + auto-resume
- LLM extraction (OpenAI, Claude, Gemini, Grok) with auto-field discovery
- MCP server, LangChain tools, OpenClaw skill
- Image alt text preservation
- Python API (result.pages)
- Page-type extraction and content-region heuristics
- Multiple extraction backends (default, trafilatura, ensemble, ReaderLM-v2)
- Cross-crawl deduplication (--cross-dedup)
- Link prioritization by predicted content yield (--prioritize-links)
- Smart sampling of templated URL clusters (--smart-sample)
- URL path filtering (--include-path, --exclude-path) and dry-run preview
Contributing
See CONTRIBUTING.md. If you used an LLM to generate code, include the prompt in your PR.
Security
See SECURITY.md.
Privacy
MarkCrawl runs locally. No telemetry, no analytics, no data sent anywhere. See PRIVACY.md.
License
MIT. See LICENSE.