Tokopedia scraper
Tokopaedi
Tokopaedi is a Python library for extracting e-commerce data from Tokopedia, including product search, detailed product information, and product reviews.
Features
- search() – search products by keyword with support for filters
- get_product() – fetch rich product details including variants and media
- get_reviews() – retrieve product reviews with ratings and timestamps
- Dataclass-based results with a .json() method for easy conversion
- SearchResults container for iterable and serializable product search results
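To illustrate what "dataclass-based results with a .json() method" and an "iterable and serializable" container can look like, here is a minimal sketch. The classes `DemoShop`, `DemoProduct`, and `DemoResults` and all of their fields are hypothetical stand-ins invented for this example; they are not the library's actual classes, and the real implementation may differ.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DemoShop:
    # Hypothetical nested record; field names are illustrative only
    name: str
    city: str


@dataclass
class DemoProduct:
    # Hypothetical product record; not the library's real schema
    product_id: int
    name: str
    price: int
    shop: DemoShop

    def json(self) -> dict:
        # asdict() recurses into nested dataclasses and returns plain dicts
        return asdict(self)


class DemoResults(list):
    """List-like container that can serialize every item it holds."""

    def json(self) -> List[dict]:
        return [item.json() for item in self]


demo = DemoResults([DemoProduct(1, "Mouse", 150000, DemoShop("Toko A", "Jakarta"))])

# The container is iterable like a list and serializable as a whole
for product in demo:
    print(product.name)
print(json.dumps(demo.json(), indent=4))
```

Subclassing `list` keeps iteration, indexing, and `len()` for free, while the extra `.json()` method gives a single call for exporting everything.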
Installation
pip install tokopaedi
Quick Start
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
import json

# Narrow the search: extra free shipping, price from IDR 15,000,000 to 30,000,000, rating of at least 4.5
filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmin=15000000,
    pmax=30000000,
    rt=4.5
)

# Pass the filters to search(); without filters=filters they are never applied
results = search("Zenbook 14 32GB", max_result=100, filters=filters, debug=False)

# Attach product details and reviews to each search result
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=True),
        get_reviews(product_id=result.product_id, max_result=20, debug=True)
    )

# Save the enriched results to a file, then print them
with open('log.json', 'w') as f:
    f.write(json.dumps(results.json(), indent=4))
print(json.dumps(results.json(), indent=4))
📘 API Overview
🔍 search(keyword: str, max_result: int = 100, filters: Optional[SearchFilters] = None, debug: bool = False) -> SearchResults
Search for products on Tokopedia.
Parameters:
- keyword: Search keyword string (e.g., "logitech mouse").
- max_result: Expected number of results to return.
- filters: Optional SearchFilters instance to narrow search results.
- debug: Show debug messages if True.
Returns:
- A SearchResults instance (list-like object of ProductSearchResult), supporting .json() for easy export.
📦 get_product(product_id: Union[int, str], debug: bool = False) -> ProductData
Fetch detailed information for a given product.
Parameters:
- product_id: ID of a product returned from search().
- debug: Show debug messages if True.
Returns:
- A ProductData instance containing detailed information (price, variants, media, etc.).
- Supports .json() for serialization.
🗣️ get_reviews(product_id: Union[int, str], max_count: int = 20, debug: bool = False) -> List[ProductReview]
Scrape customer reviews for a given product.
Parameters:
- product_id: Product ID to fetch reviews for.
- max_count: Maximum number of reviews to fetch (default: 20).
- debug: Show debug messages if True.
Returns:
- A list of ProductReview objects, each of which has a .json() method.
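Because get_reviews() returns a plain list rather than a container with its own .json(), exporting the reviews means calling .json() on each item. The sketch below shows one way to do that. `reviews_to_json` is a hypothetical helper, and `FakeReview` with its `message` and `rating` fields is an invented stand-in for ProductReview so the example runs without network access; it also assumes each .json() call returns a JSON-serializable dict.

```python
import json
from typing import Any, Iterable, List


def reviews_to_json(reviews: Iterable[Any]) -> str:
    """Serialize objects that expose .json() into one JSON array string."""
    payload: List[Any] = [review.json() for review in reviews]
    # ensure_ascii=False keeps non-ASCII review text readable in the output
    return json.dumps(payload, indent=4, ensure_ascii=False)


class FakeReview:
    """Hypothetical stand-in for ProductReview; real field names may differ."""

    def __init__(self, message: str, rating: int) -> None:
        self.message = message
        self.rating = rating

    def json(self) -> dict:
        return {"message": self.message, "rating": self.rating}


text = reviews_to_json([FakeReview("Bagus sekali", 5), FakeReview("Oke", 4)])
print(text)
```

With the real library, the list returned by get_reviews() would take the place of the `FakeReview` list.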
🔗 combine_data(search_results, products=None, reviews=None) -> SearchResults
Attach product detail and/or reviews to the search results.
Parameters:
- search_results: The SearchResults from search().
- products: List of ProductData from get_product() (optional).
- reviews: List of ProductReview from get_reviews() (optional).
Returns:
- A new SearchResults object with .product_detail and .product_reviews fields filled in (if data was provided).
SearchFilters – Optional Search Filters
Use SearchFilters to refine your search results. All fields are optional. Pass it into the search() function via the filters argument.
Example:
from tokopaedi import SearchFilters, search

filters = SearchFilters(
    pmin=100000,
    pmax=1000000,
    condition=1,  # 1 = New
    is_discount=True,
    bebas_ongkir_extra=True,
    rt=4.5,  # Minimum rating 4.5
    latest_product=30  # Products listed in the last 30 days
)

results = search("logitech mouse", filters=filters)
Available Fields:
| Field | Type | Description | Accepted Values |
|---|---|---|---|
| pmin | int | Minimum price (in IDR) | e.g., 100000 |
| pmax | int | Maximum price (in IDR) | e.g., 1000000 |
| condition | int | Product condition | 1 = New, 2 = Used |
| shop_tier | int | Type of shop | 2 = Mall, 3 = Power Shop |
| rt | float | Minimum rating | e.g., 4.5 |
| latest_product | int | Product recency filter (days) | 7, 30, 90 |
| bebas_ongkir_extra | bool | Filter for extra free shipping | True / False |
| is_discount | bool | Only show discounted products | True / False |
| is_fulfillment | bool | Only Fulfilled by Tokopedia | True / False |
| is_plus | bool | Only Tokopedia PLUS sellers | True / False |
| cod | bool | Cash on delivery available | True / False |
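Several fields accept only a fixed set of values, so it can help to check them before building a SearchFilters object. The sketch below is a hypothetical pre-check helper, not part of the library; whether SearchFilters validates its own inputs is not stated here. The accepted values are copied from the table, and the pmin/pmax ordering check is an added assumption about what a sensible price range looks like.

```python
from typing import Any, Dict, List, Set

# Accepted values copied from the table; fields not listed here are not checked
ACCEPTED_VALUES: Dict[str, Set[int]] = {
    "condition": {1, 2},            # 1 = New, 2 = Used
    "shop_tier": {2, 3},            # 2 = Mall, 3 = Power Shop
    "latest_product": {7, 30, 90},  # recency window in days
}


def check_filter_values(**fields: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    for name, value in fields.items():
        allowed = ACCEPTED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    # Assumption: a price range is only meaningful when pmin <= pmax
    pmin, pmax = fields.get("pmin"), fields.get("pmax")
    if pmin is not None and pmax is not None and pmin > pmax:
        problems.append(f"pmin ({pmin}) is greater than pmax ({pmax})")
    return problems


print(check_filter_values(condition=1, latest_product=30, pmin=100000, pmax=1000000))
print(check_filter_values(condition=3, pmin=500000, pmax=100000))
```

If the returned list is empty, the same keyword arguments can then be passed on to SearchFilters.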
Example: Enrich with product details and reviews, then convert to a pandas DataFrame in a Jupyter Notebook
from tokopaedi import search, SearchFilters, get_product, get_reviews, combine_data
import json
import pandas as pd
from pandas import json_normalize

filters = SearchFilters(
    bebas_ongkir_extra=True,
    pmax=100000,
    rt=4.5
)

# Fetch search results, applying the filters defined above
results = search("logitech g304", max_result=10, filters=filters, debug=False)

# Enrich each result with product details and reviews
for result in results:
    combine_data(
        result,
        get_product(product_id=result.product_id, debug=False),
        get_reviews(product_id=result.product_id, max_result=1, debug=False)
    )

# Convert to DataFrame and preview important fields
df = json_normalize(results.json())
print(df[[
    "product_id",
    "category",
    "real_price",
    "original_price",
    "product_detail.product_name",
    "rating",
    "shop.name"
]].head())
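Dotted column names such as product_detail.product_name and shop.name appear because json_normalize flattens nested dictionaries and joins the keys with "." by default. The pure-Python sketch below only illustrates that naming rule; it is not pandas' implementation, `flatten` is a hypothetical helper, and the sample record uses made-up values.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining nested keys with sep."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that {"shop": {"name": ...}} becomes "shop.name"
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept as a single cell value
            flat[name] = value
    return flat


# Made-up record shaped like one enriched search result
sample = {
    "product_id": 1,
    "shop": {"name": "Toko A"},
    "product_detail": {"product_name": "Mouse"},
}

print(flatten(sample))
```

This is why nested fields must be selected with their full dotted path when indexing the DataFrame.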
📄 License
This project is licensed under the MIT License.
You are free to use, modify, and distribute this project with attribution. See the LICENSE file for more details.