A crawler for product information of sellers on Ruten.

Project description

Ruten Seller Product Parser

This repository provides a ProductCrawler class that crawls Ruten web pages and returns the product information in JSON format.

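from ruten_crawler import ProductCrawler

# Create a crawler for one seller, identified by the seller's Ruten ID.
product_crawler = ProductCrawler(seller_id="hambergurs")
# Run the crawl and collect the product information.
results = product_crawler.get_crawl_result()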
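
The crawl result can be kept for later use by writing it to a file. The following is a minimal sketch using only the standard library. It assumes the result is either a JSON string or a JSON-serializable Python object; the exact return type is not documented here. The helper name save_results and the sample record are hypothetical and not part of ruten_crawler.

```python
import json
from pathlib import Path


def save_results(results, path):
    """Write crawl results to `path` as UTF-8 encoded JSON text.

    Hypothetical helper: accepts either an already-encoded JSON string
    or any JSON-serializable object (list, dict, ...).
    """
    if isinstance(results, str):
        text = results  # assume it is already JSON text
    else:
        # ensure_ascii=False keeps Chinese product titles readable.
        text = json.dumps(results, ensure_ascii=False, indent=2)
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target


# Placeholder standing in for product_crawler.get_crawl_result();
# the real structure of the results may differ.
sample_results = [{"title": "範例商品", "images": [], "shipping": {}}]
```

Calling save_results(results, "products.json") after the crawl would then store the output next to the script.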


To install this version from PyPI, type:

pip install ruten-crawler

To get the latest version from this repo (the project is in the alpha stage, so updates may be frequent), type:

pip install git+git://


class ProductCrawler handles the whole web crawling logic. It takes the optional arguments sleep_time and sleep_at_each_iteration.
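
The meaning of these two arguments is not spelled out here. One plausible reading is that the crawler pauses for sleep_time seconds after every fixed number of requests to avoid overloading the site. The sketch below illustrates that idea only; crawl_with_pauses and its parameters are hypothetical and are not taken from the ruten_crawler source.

```python
import time


def crawl_with_pauses(items, handler, sleep_time=1.0, sleep_every=10,
                      sleep=time.sleep):
    """Call handler(item) for each item, pausing every `sleep_every` items.

    Hypothetical illustration of request throttling. `sleep` is
    injectable so the pause can be replaced in tests.
    """
    items = list(items)
    outputs = []
    for index, item in enumerate(items, start=1):
        outputs.append(handler(item))
        # Pause after every `sleep_every`-th item, but not after the last one.
        if sleep_every and index % sleep_every == 0 and index < len(items):
            sleep(sleep_time)
    return outputs
```

Passing a larger sleep_time or a smaller sleep_every makes the loop gentler on the server at the cost of a longer total run time.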

class ProductPageParser handles the product page information extraction. Currently the parser only extracts the shipping information, the image URLs, and the title of the product. Logic for extracting more fields can be added here.
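
To give a feel for this kind of extraction, here is a minimal sketch that pulls the page title and the image URLs out of an HTML string with the standard library's html.parser. It is not the ProductPageParser implementation: the class and function names are hypothetical, and shipping information is left out because its markup on Ruten pages is not described here.

```python
from html.parser import HTMLParser


class TitleAndImageParser(HTMLParser):
    """Collect the <title> text and every <img src=...> value."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.image_urls = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip images without a src attribute
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_product_page(html):
    """Return a dict with the page title and the list of image URLs."""
    parser = TitleAndImageParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "images": parser.image_urls}
```

A new field would be added by tracking one more tag or attribute in handle_starttag and returning it from parse_product_page.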

class ProductListParser handles the parsing of the product list pages. Its main function is to extract the list of product URLs on each page; these URLs are then passed to ProductPageParser to parse the product information.
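
The URL collection step can be sketched as follows with the standard library. This is an illustration under assumptions, not the parser shipped in this package: the names LinkCollector and extract_product_urls are hypothetical, and the "/item/show" marker used to recognise product links is a placeholder that would need to match the real link format.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_product_urls(html, base_url, marker="/item/show"):
    """Return absolute, de-duplicated links that contain `marker`.

    `marker` is an assumed pattern for product links, not a confirmed one.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    urls = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if marker in absolute and absolute not in urls:
            urls.append(absolute)
    return urls
```

Each URL returned this way could then be fetched and handed to a page parser, one product at a time.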


  • Add more robust exception handling in ProductCrawler, since the crawl runs in multiple threads.
  • Add more product info extraction features in ProductCrawler, e.g. price, remaining time, description, etc.
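
As one possible direction for the first item, the sketch below shows a common pattern for keeping a multi-threaded crawl alive when individual pages fail: each task's exception is caught when its result is collected, so one bad page does not abort the rest. It uses concurrent.futures from the standard library; fetch_all and the fetch callable are hypothetical names, and this is not how ProductCrawler is currently written.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) in worker threads and separate successes from failures.

    Returns (results, errors): two dicts keyed by URL. `fetch` is any
    callable supplied by the caller (hypothetical here).
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                # result() re-raises any exception raised inside the worker.
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

The errors dict can be logged or retried afterwards, which keeps the failure handling in one place instead of inside every worker.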

Download files

Files for ruten-crawler, version 0.0.6
Filename                              Size    File type  Python version
ruten_crawler-0.0.6-py3-none-any.whl  8.0 kB  Wheel      py3
ruten_crawler-0.0.6.tar.gz            3.7 kB  Source     None