
Desync Search — "API to the Internet"

Motto: The easiest way to scrape and retrieve web data without aggressive rate limits or heavy detection.


Key Features

  • No Rate Limiting: We allow you to scale concurrency without punishing usage. You can open many parallel searches; we’ll only throttle if the underlying cloud providers themselves are saturated.
  • Extremely Low Detection Rates: Our “stealth_search” uses advanced methods for a “human-like” page visit. While we cannot guarantee 100% evasion, most websites pass under the radar, and CAPTCHAs—when they do appear—are often circumvented by a second pass.
  • Competitive, Pay-as-You-Go Pricing: No forced subscriptions or huge minimum monthly costs. You pick how much you spend. Our per-search cost is typically half of what big competitors charge (who often require $1,000+ per month).
  • First 1,000 Searches Free: Not convinced? Try it yourself, risk-free. We’ll spot you 1,000 searches when you sign up. Check out desync.ai for more info.

Installation

Install via PyPI using:

pip install desync_search

Because we update often, you may want to run:

pip install --upgrade desync_search

This library requires Python 3.6+ and requests (installed automatically).


Basic Usage

You’ll need a user API key (e.g. "totallynotarealapikeywithactualcreditsonit"). A best practice is to store that key in an environment variable (e.g., DESYNC_API_KEY) to avoid embedding secrets in code:

export DESYNC_API_KEY="YOUR_ACTUAL_KEY"

Then in your Python code:

import os
from desync_search import DesyncClient

user_api_key = os.environ.get("DESYNC_API_KEY", "")
client = DesyncClient(user_api_key)

Here, the client automatically targets our production endpoint:

https://nycv5sx75joaxnzdkgvpx5mcme0butbo.lambda-url.us-east-1.on.aws/

Tip: Pass developer_mode=True to DesyncClient(...) if you want to use a testing endpoint (e.g. staging environment).
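
For example, a minimal sketch (assuming developer_mode is accepted as a keyword argument alongside your API key):

import os
from desync_search import DesyncClient

# Point the client at the testing/staging endpoint instead of production.
staging_client = DesyncClient(
    os.environ.get("DESYNC_API_KEY", ""),
    developer_mode=True
)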


Searching for Data

1) Single-URL Search

By default, client.search(...) does a stealth search (cost: 10 credits). If you want a test search (cost: 1 credit), pass search_type="test_search".

# Stealth Search (default)
page_data = client.search("https://www.137ventures.com/portfolio")

print("URL:", page_data.url)
print("Text length:", len(page_data.text_content))

# Test Search
test_response = client.search(
    "https://www.python.org",
    search_type="test_search"
)
print("Test search type:", test_response.search_type)

Both calls return a PageData object. For stealth, you’ll typically see fields like .text_content, .internal_links, and .external_links.

print(page_data)
# <PageData url=https://www.137ventures.com/portfolio search_type=stealth_search timestamp=... complete=True>

print(page_data.text_content[:200])  # first 200 chars
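
You can also inspect the link fields on the same object:

# Links discovered on the page, split into internal and external
print(len(page_data.internal_links), "internal links")
print(len(page_data.external_links), "external links")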

Pass scrape_full_html=True to get the entire HTML, or remove_link_duplicates=False to keep duplicates:

stealth_response = client.search(
    "https://www.137ventures.com/portfolio",
    scrape_full_html=True,
    remove_link_duplicates=False
)
print(len(stealth_response.html_content), "HTML chars")

2) Bulk Searching Multiple URLs

If you have a large list of URLs, use bulk_search. This creates an asynchronous job on the server side that processes each URL in parallel. Each URL typically costs 10 credits (stealth). Example:

from desync_search import DesyncClient

# Suppose you have a list of URLs:
example_urls = [
    "https://www.137ventures.com/", 
    "https://www.137ventures.com/portfolio"
]

client = DesyncClient("YOUR_API_KEY")
response = client.bulk_search(
    target_list=example_urls,
    extract_html=False  # If True, returns HTML in each record
)

print(response)
# e.g. {
#   "message": "Bulk search triggered successfully.",
#   "bulk_search_id": "123e4567-e89b-12d3-a456-426614174000",
#   "total_links": 2,
#   "cost_charged": 20,
#   "execution_arn": "arn:aws:states:..."
# }

bulk_id = response["bulk_search_id"]
print("Bulk job started with ID:", bulk_id)

Because the scraping happens asynchronously, you’ll typically wait a few seconds (or more) before the results are fully ready.

2a) Manual Polling

You can do a manual check using list_available to see if those pages appear (and are marked complete=True). For example:

import time

# Wait ~8 seconds, then see if the results are ready
time.sleep(8)
found_records = client.list_available(
    url_list=example_urls,
    bulk_search_id=bulk_id
)

for record in found_records:
    print(record.url, record.complete)

Once the pages are ready, you can call pull_data(bulk_search_id=bulk_id) to retrieve the full text/HTML, etc.
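
For example, once the records show complete=True, a minimal follow-up looks like:

# Retrieve the full records (text, links, etc.) for this bulk job
bulk_pages = client.pull_data(bulk_search_id=bulk_id)

for page in bulk_pages:
    print(page.url, len(page.text_content), "chars of text")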

2b) Automated Polling with collect_results

We provide a collect_results method to automate this polling. It will:

  1. Check periodically (e.g., every 2 seconds) how many of your URLs are “complete.”
  2. If 97.5% (or another fraction you choose) are done, or a certain max wait time expires, it retrieves the full data.

bulk_search_resp = client.bulk_search(
    target_list=example_urls,
    extract_html=False
)
bulk_id = bulk_search_resp["bulk_search_id"]

# We'll collect results once ~97.5% are done or 30 seconds pass (whichever comes first).
records = client.collect_results(
    bulk_search_id=bulk_id,
    target_links=example_urls,  # so it knows how many links to expect
    wait_time=30,               # max seconds to wait
    poll_interval=2,            # check every 2 seconds
    completion_fraction=0.975    # 97.5%
)

print(f"Got {len(records)} pages in final result.")
for page in records:
    print(page.url, page.complete)

Note: If the job is very large (hundreds of URLs), consider further chunking or splitting. The library and the API can handle up to 1,000 links at once, but results might take longer.
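
If you do need to go beyond that limit, one approach (a rough sketch, not a built-in helper; chunked and very_large_url_list are illustrative names) is to split the list into batches of at most 1,000 and run one bulk_search per batch:

def chunked(urls, size=1000):
    # Yield consecutive slices of at most `size` URLs.
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

all_records = []
for chunk in chunked(very_large_url_list):  # very_large_url_list: your full URL list
    resp = client.bulk_search(target_list=chunk)
    all_records.extend(
        client.collect_results(
            bulk_search_id=resp["bulk_search_id"],
            target_links=chunk,
            wait_time=60
        )
    )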


Retrieving Past Results

3) Listing Minimal Data

Use list_available() to get minimal data (like IDs, URLs, timestamps) for all or a subset of your past searches:

all_records = client.list_available()
print("Found", len(all_records), "total records in the database.")

# Or just for certain URLs or a bulk_search_id:
subset_records = client.list_available(
    url_list=["https://www.137ventures.com/"],
    bulk_search_id="123e4567-e89b-12d3-a456-426614174000"
)
for r in subset_records:
    print(r.id, r.url, r.search_type, r.complete)

Each returned item is a PageData with limited fields (no large text or HTML) to save bandwidth.
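
One workflow this enables (a sketch built from the same calls shown in this README) is to list minimal records for a bulk job first, then pull full details only for the records marked complete:

minimal = client.list_available(bulk_search_id=bulk_id)  # bulk_id from an earlier bulk_search
complete_ids = [r.id for r in minimal if r.complete]

for record_id in complete_ids:
    for page in client.pull_data(record_id=record_id):
        print(page.url, len(page.text_content), "chars of text")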

4) Pulling Full Details

If you want all fields (including text_content, html_content, etc.), call pull_data(...). You can filter by various parameters such as record_id, url, or bulk_search_id.

# Pull by record_id:
detailed_list = client.pull_data(record_id="your_record_id_here")

# Or by bulk_search_id:
bulk_details = client.pull_data(bulk_search_id="123e4567-e89b-12d3-a456-426614174000")

# Now each item in `detailed_list` or `bulk_details` can have text_content, html_content, etc.
for page in detailed_list:
    print(page.url, len(page.text_content), "chars of text")

Checking Your Credits Balance

Use pull_credits_balance() to see how many credits remain on your account:

balance_info = client.pull_credits_balance()
print("Credits left:", balance_info.get("credits_balance"))

Typical response:

{
  "success": True,
  "credits_balance": 240
}
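
Since a stealth search typically costs 10 credits per URL, you can do a rough pre-flight check before launching a bulk job (a sketch; the 10-credit figure is the typical stealth cost mentioned above):

urls_to_crawl = [
    "https://www.137ventures.com/",
    "https://www.137ventures.com/portfolio"
]

# Estimate cost at ~10 credits per stealth search and compare to the balance.
balance = client.pull_credits_balance().get("credits_balance", 0)
estimated_cost = 10 * len(urls_to_crawl)

if balance >= estimated_cost:
    response = client.bulk_search(target_list=urls_to_crawl)
else:
    print(f"Only {balance} credits left; this job needs about {estimated_cost}.")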

Example: Combine Bulk Search + collect_results

Here’s a short end-to-end script you could run:

from desync_search import DesyncClient

def run_bulk_search_example():
    # 1) Provide your user API key
    my_api_key = "YOUR_ACTUAL_KEY"
    client = DesyncClient(my_api_key)

    # 2) Some URLs to crawl
    example_urls = [
        "https://www.137ventures.com/", 
        "https://www.137ventures.com/portfolio"
    ]

    # 3) Trigger bulk search
    resp = client.bulk_search(target_list=example_urls)
    bulk_id = resp["bulk_search_id"]
    print("Bulk Search Response:", resp)

    # 4) Collect results automatically (poll until 97.5% done or 30s)
    results = client.collect_results(bulk_search_id=bulk_id, target_links=example_urls)
    for page in results:
        print(page.url, page.complete)

    print(f"Total pages retrieved = {len(results)}")

if __name__ == "__main__":
    run_bulk_search_example()

This snippet:

  1. Initiates a bulk job.
  2. Waits for it to become “mostly done” (default completion_fraction=0.975).
  3. Pulls the full data for all those pages in a single retrieval call.

Additional Notes

  • Attribution: Relies on open-source libraries such as requests.
  • Rate Limits: We do not impose strict concurrency throttles, but large-scale usage could be slowed if the underlying cloud environment is heavily utilized.
  • First 1,000 Searches: New accounts start with 1,000 free searches. If you do large-scale crawling, keep an eye on your credit usage.
  • Large Bulk: For >1,000 URLs, break them into multiple bulk_search calls, or contact support for special accommodations.

License

This project is licensed under the MIT License.


Happy scraping with Desync Search—the next-level “API to the Internet”! Let us know how it goes, and feel free to file issues or pull requests.
