Intelligent Market Monitoring

fraudcrawler

Fraudcrawler is an intelligent market monitoring tool that searches the web for products, extracts product details, and classifies them using LLMs. It combines search APIs, web scraping, and AI to automate product discovery and relevance assessment.

Features

- Asynchronous pipeline: products move through search, extraction, and classification stages independently
- Multiple search engines: Google Search, Google Shopping, and more
- Search term enrichment: automatically find related terms and expand your search
- Product extraction: get structured product data via Zyte API
- LLM classification: assess product relevance using OpenAI API with custom prompts
- Marketplace filtering: focus searches on specific domains
- Deduplication: avoid reprocessing previously collected URLs
- CSV export: results saved with timestamps for easy tracking

Prerequisites

Python 3.11 or higher
API keys for:
- SerpAPI - Google search results
- Zyte API - Product data extraction
- OpenAI API - Product classification
- DataForSEO (optional) - Search term enrichment

Installation

python3.11 -m venv .venv
source .venv/bin/activate
pip install fraudcrawler

Or, when working from a clone of the repository, using Poetry:

poetry install

Configuration

Create a .env file with your API credentials (see .env.example for template):

SERPAPI_KEY=your_serpapi_key
ZYTEAPI_KEY=your_zyte_key
OPENAIAPI_KEY=your_openai_key
DATAFORSEO_USER=your_user  # optional
DATAFORSEO_PWD=your_pwd    # optional
REDIS_URL=redis://localhost:6379/0  # optional, for response caching
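
A quick pre-flight check can catch a missing credential before any paid API call is made. The following is a minimal sketch using only the standard library; `missing_keys` is a hypothetical helper and not part of fraudcrawler, and it assumes the three non-optional variables listed above are the required ones.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_KEYS = ("SERPAPI_KEY", "ZYTEAPI_KEY", "OPENAIAPI_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required API keys are set.")
```

The check reads `os.environ`, so the .env file must already have been loaded into the process environment by whatever mechanism you use.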

Caching

Fraudcrawler uses Redis-backed caching to avoid duplicate expensive API calls when re-running pipelines during debugging. External API responses (OpenAI, Zyte, SerpAPI, DataForSEO) are automatically cached with a default 24-hour TTL.

Setup:

- Run Redis locally via Docker (docker run -d -p 6379:6379 redis:8) or use a cloud Redis instance
- Set REDIS_USE_CACHE in your .env file (defaults to true; set it to false if you do not want to use the cache)
- Set REDIS_URL in your .env file (defaults to redis://localhost:6379/0 if not set)
- Set REDIS_CACHE_TTL in your .env file, in seconds (defaults to 86400, which is 24 hours, if not set)
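
The three settings and their documented defaults can be summarised in a few lines of Python. This is only an illustrative sketch: `read_cache_settings` is a hypothetical helper, and how fraudcrawler itself parses the boolean is not stated here, so the sketch assumes that only the string "true" (case-insensitive) enables the cache.

```python
import os


def read_cache_settings(env=None):
    """Read the three cache settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: anything other than "true" disables the cache.
    use_cache = env.get("REDIS_USE_CACHE", "true").strip().lower() == "true"
    url = env.get("REDIS_URL", "redis://localhost:6379/0")
    ttl_seconds = int(env.get("REDIS_CACHE_TTL", "86400"))  # 24 * 60 * 60
    return use_cache, url, ttl_seconds
```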

Benefits:

- Prevents re-paying for identical API calls during development
- Supports multiple workers/processes with shared cache
- Automatic stampede protection prevents duplicate requests
- Gracefully degrades if Redis is unavailable
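
To make the stampede protection point concrete: when several tasks ask for the same uncached response at once, only one of them should hit the external API while the others wait for its result. The sketch below shows that idea inside a single process with asyncio. `SingleFlight` is a hypothetical name, this is not fraudcrawler's implementation, and coordinating across several worker processes would additionally need a lock held in Redis.

```python
import asyncio


class SingleFlight:
    """Share one in-flight call among concurrent callers with the same key."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key, fetch):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key: start the real request.
            task = asyncio.ensure_future(fetch())
            self._inflight[key] = task
            # Forget the task once it finishes so later calls fetch again.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Every caller, first or not, waits on the same task.
        return await task
```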

The cache is automatically invalidated when request parameters change, ensuring you always get fresh results for new queries.
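
One common way to get this behaviour is to derive the cache key from a hash of the request parameters, so that any change in parameters produces a different key and therefore a cache miss. The sketch below illustrates the idea with an in-memory stand-in for Redis. The names `cache_key` and `TTLCache` and the example parameter names are assumptions made for illustration; the key scheme fraudcrawler actually uses is not described here.

```python
import hashlib
import json
import time


def cache_key(service, params):
    """Build a deterministic key from the service name and request parameters."""
    # sort_keys makes the key independent of dictionary ordering.
    payload = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{service}:{digest}"


class TTLCache:
    """In-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)
```

With this scheme nothing has to be deleted when a query changes: the old entry simply stops being looked up and disappears when its TTL runs out.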

Usage

Basic Configuration

For a complete working example, see fraudcrawler/launch_demo_pipeline.py. After setting up the necessary parameters, you can launch the pipeline and analyse the results with:

# Run pipeline
await client.run(
    search_term=search_term,
    search_engines=search_engines,
    language=language,
    location=location,
    deepness=deepness,
    excluded_urls=excluded_urls,
)

# Load results
df = client.load_results()
print(df.head())
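
Because `client.run` is awaited, the call has to live inside a coroutine that is started with `asyncio.run` when used from a plain script. The sketch below shows only that wrapping pattern. `StubClient` is a hypothetical stand-in that records its arguments, and the string arguments are placeholders; for the real client setup and parameter types, refer to fraudcrawler/launch_demo_pipeline.py.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for the real client; it only records its arguments."""

    def __init__(self):
        self.calls = []

    async def run(self, **kwargs):
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.calls.append(kwargs)


async def main(client):
    # `await` is only valid inside a coroutine such as this one.
    await client.run(search_term="sildenafil", language="de", location="ch")


client = StubClient()
asyncio.run(main(client))  # start the event loop and run main() to completion
```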

Advanced Configuration

Search term enrichment - Find and search related terms:

from fraudcrawler import Enrichment

deepness.enrichment = Enrichment(
    additional_terms=5,
    additional_urls_per_term=10
)

Marketplace filtering - Focus on specific domains:

from fraudcrawler import Host

marketplaces = [
    Host(name="International", domains="zavamed.com,apomeds.com"),
    Host(name="National", domains="netdoktor.ch,nobelpharma.ch"),
]

await client.run(..., marketplaces=marketplaces)

Exclude domains - Exclude specific domains from your results:

excluded_urls = [
    Host(name="Compendium", domains="compendium.ch"),
]

await client.run(..., excluded_urls=excluded_urls)
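
Both marketplace filtering and exclusion come down to deciding whether a URL belongs to one of the listed domains. The following sketch shows one plausible way to make that decision with the standard library. `split_domains` and `url_matches` are hypothetical helpers, and treating subdomains such as www as matches is an assumption; the exact matching rules fraudcrawler applies are not documented here.

```python
from urllib.parse import urlparse


def split_domains(domains):
    """Turn a comma-separated domain string into a list of lowercase domains."""
    return [d.strip().lower() for d in domains.split(",") if d.strip()]


def url_matches(url, domains):
    """True if the URL's host equals a domain or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # The leading dot prevents "notzavamed.com" from matching "zavamed.com".
    return any(host == d or host.endswith("." + d) for d in domains)
```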

Skip previously collected URLs:

previously_collected_urls = [
    "https://example.com/product1",
    "https://example.com/product2",
]

await client.run(..., previously_collected_urls=previously_collected_urls)
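
Conceptually, skipping previously collected URLs is a set-membership filter over the search results. A minimal sketch, assuming exact string comparison (whether fraudcrawler normalises URLs before comparing is not stated here); `skip_collected` is a hypothetical helper:

```python
def skip_collected(candidates, previously_collected):
    """Return candidates not seen before, keeping their original order."""
    seen = set(previously_collected)
    fresh = []
    for url in candidates:
        if url not in seen:
            seen.add(url)  # also drops repeats within this run
            fresh.append(url)
    return fresh
```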

View all results from a client instance:

client.print_available_results()

Output

Results are saved as CSV files in data/results/ with the naming pattern:

<search_term>_<language_code>_<location_code>_<timestamp>.csv

Example: sildenafil_de_ch_20250115143022.csv

The CSV includes product details, URLs, and classification scores from your workflows.
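
If you post-process many result files, it can help to build and parse these names programmatically. The sketch below assumes the timestamp is formatted as YYYYMMDDHHMMSS, which is inferred from the example above rather than taken from the fraudcrawler source, and both function names are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumption inferred from the example file name


def result_filename(search_term, language_code, location_code, when):
    """Build a file name following the documented pattern."""
    stamp = when.strftime(STAMP_FORMAT)
    return f"{search_term}_{language_code}_{location_code}_{stamp}.csv"


def parse_result_filename(name):
    """Split a result file name back into its four parts."""
    stem = name.removesuffix(".csv")
    # Split from the right so underscores inside the search term survive.
    search_term, language_code, location_code, stamp = stem.rsplit("_", 3)
    return search_term, language_code, location_code, datetime.strptime(stamp, STAMP_FORMAT)
```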

Development

For detailed contribution guidelines, see CONTRIBUTING.md.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Architecture

Fraudcrawler uses an asynchronous pipeline where products can be at different processing stages simultaneously. Product A might be in classification while Product B is still being scraped. This is enabled by async workers for each stage (Search, Context Extraction, Processing) using httpx.AsyncClient.
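
The general shape of such a pipeline can be sketched with asyncio queues connecting one worker per stage. This is an illustration of the pattern only, not fraudcrawler's code: the stage bodies are placeholders, the names `stage` and `run_pipeline` are hypothetical, and real stages would perform HTTP requests where the sketch merely yields control.

```python
import asyncio

STOP = object()  # sentinel that tells a stage to shut down


async def stage(name, inbox, outbox, log):
    """Take items from inbox, 'process' them, and pass them downstream."""
    while True:
        item = await inbox.get()
        if item is STOP:
            if outbox is not None:
                await outbox.put(STOP)  # propagate shutdown downstream
            return
        await asyncio.sleep(0)  # placeholder for real I/O
        log.append((name, item))
        if outbox is not None:
            await outbox.put(item)


async def run_pipeline(urls):
    """Feed URLs through two downstream stages and return the processing log."""
    log = []
    to_extract, to_process = asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(stage("context_extraction", to_extract, to_process, log)),
        asyncio.create_task(stage("processing", to_process, None, log)),
    ]
    for url in urls:  # stands in for the search stage emitting results
        await to_extract.put(url)
    await to_extract.put(STOP)
    await asyncio.gather(*workers)
    return log
```

Because each stage only waits on its own queue, a later item can still be in extraction while an earlier one is already being processed, which is the behaviour described above.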

Async Setup

For more details on the async design, see the httpx documentation.
