Python3 library to ease Aliexpress crawling

Crawliexpress

Description

Fetches various resources from Aliexpress: category pages, text search results, products, and feedbacks.

It uses neither the official API nor a headless browser; it parses the page source instead.

As a result, it is very vulnerable to DOM changes.

Usage

Install

pip install crawliexpress

Item

from crawliexpress import Client

client = Client("https://www.aliexpress.com")
client.get_item("4000505787173")

Feedbacks

from crawliexpress import Client

from time import sleep

client = Client("https://www.aliexpress.com")
item = client.get_item("20000001708485")

page = 1
pages = list()
while True:
    feedback_page = client.get_feedbacks(
        item.product_id,
        item.owner_member_id,
        item.company_id,
        with_picture=True,
        page=page,
    )
    print(feedback_page.page)
    # keep each page so the feedbacks can be processed after the crawl
    pages.append(feedback_page)
    if feedback_page.has_next_page() is False:
        break
    page += 1
    # pause between requests
    sleep(1)
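
The Feedbacks, Category and Search examples all repeat the same loop: fetch a page, check has_next_page(), wait, fetch the next one. A minimal sketch of factoring that loop into a generator follows. The name iter_pages and its delay and max_pages parameters are hypothetical helpers, not part of crawliexpress; the only thing assumed about the library is that the returned page objects expose has_next_page().

```python
from time import sleep
from typing import Callable, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[int], object],
    delay: float = 1.0,
    max_pages: Optional[int] = None,
) -> Iterator[object]:
    """Yield fetch_page(1), fetch_page(2), ... until there is no next page.

    fetch_page is any callable that takes a page number and returns an
    object with a has_next_page() method.
    """
    page = 1
    while True:
        current = fetch_page(page)
        yield current
        if not current.has_next_page():
            break
        # optional safety cap so a crawl cannot run forever
        if max_pages is not None and page >= max_pages:
            break
        page += 1
        sleep(delay)  # pause between requests
```

With a real client this could be driven as iter_pages(lambda p: client.get_feedbacks(item.product_id, item.owner_member_id, item.company_id, page=p)), iterating over the result with a for loop.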

Category

from crawliexpress import Client

from time import sleep

client = Client(
    "https://www.aliexpress.com",
    # copy it from your browser cookies
    "xxxx",
)

page = 1
while True:
    search_page = client.get_category(205000314, "t-shirts", page=page)
    print(search_page.page)
    if search_page.has_next_page() is False:
        break
    page += 1
    sleep(1)
  • Cookies must be copied from your browser to avoid captchas and empty results. I usually log in, then use "copy as cURL" on a request my browser made to a category or text search page. Remove the Cookie: prefix and keep only the cookie values.
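
Removing the Cookie: prefix by hand is easy to get wrong, so here is a small sketch that does it. clean_cookie_header is a hypothetical helper, not part of crawliexpress, and it assumes the copied text is a single header such as Cookie: a=1; b=2, possibly still wrapped in the quotes that "copy as cURL" adds.

```python
def clean_cookie_header(raw: str) -> str:
    """Return only the cookie values from a copied 'Cookie: ...' header."""
    value = raw.strip()
    # "copy as cURL" usually wraps the header in quotes: -H 'Cookie: a=1; b=2'
    value = value.strip("'\"")
    prefix = "cookie:"
    # header names are case-insensitive, so compare in lowercase
    if value.lower().startswith(prefix):
        value = value[len(prefix):]
    return value.strip()
```

The cleaned string can then be passed as the second argument of Client, in place of "xxxx".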

Search

from crawliexpress import Client

from time import sleep

client = Client(
    "https://www.aliexpress.com",
    # copy it from your browser cookies
    "xxxx",
)

page = 1
while True:
    search_page = client.get_search("akame ga kill", page=page)
    print(search_page.page)
    if search_page.has_next_page() is False:
        break
    page += 1
    sleep(1)
  • As with Category, cookies must be copied from your browser to avoid captchas and empty results. Remove the Cookie: prefix and keep only the cookie values.

API

class crawliexpress.Category(client, category_id, category_name, sort_by='default')

A category

class crawliexpress.Client(base_url, cookies=None)

Exposes methods to fetch various resources.

  • Parameters

    • base_url – base URL of the site; changing it may select another locale (unverified)

    • cookies – must be copied from your browser to avoid captchas and empty results. I usually log in, then use "copy as cURL" on a request my browser made to a category or text search page. Remove the Cookie: prefix and keep only the cookie values.

get_category(category_id, category_name, page=1, sort_by='default')

Fetches a category page

  • Parameters

    • category_id – id of the category, 205000314 in the usage example

    • category_name – name of the category, "t-shirts" in the usage example

    • page – page number

  • Returns

    a search page

  • Return type

    crawliexpress.SearchPage

  • Raises

    • CrawliexpressException – if there was an error fetching the data

    • CrawliexpressCaptchaException – if a captcha is returned; use valid cookies to avoid this
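
Since fetches can raise, a crawl may want to retry transient errors. The sketch below shows one way to do that with exponential backoff. fetch_with_retry and all of its parameters are hypothetical, not part of crawliexpress; the exception types to retry are passed in by the caller. A captcha is unlikely to go away on retry, so CrawliexpressCaptchaException is better handled by refreshing the cookies. Whether it subclasses CrawliexpressException is not stated here, so check before retrying on the base class.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # wait base_delay, then 2x, 4x, ... before the next try
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A call could look like fetch_with_retry(lambda: client.get_category(205000314, "t-shirts"), (CrawliexpressException,)).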

get_feedbacks(product_id, owner_member_id, company_id=None, v=2, member_type='seller', page=1, with_picture=False)

Fetches a product feedback page

  • Parameters

    • product_id – id of the product; for example, the item id of https://www.aliexpress.com/item/20000001708485.html is 20000001708485

    • owner_member_id – member id of the product owner, as stored in crawliexpress.Item.owner_member_id

    • page – page number

    • with_picture – limit to feedbacks with a picture

  • Returns

    a feedback page

  • Return type

    crawliexpress.FeedbackPage

  • Raises

    CrawliexpressException – if there was an error fetching the data

get_item(item_id)

Fetches product information from its id

  • Parameters

    item_id – id of the product to fetch; for example, the item id of https://www.aliexpress.com/item/20000001708485.html is 20000001708485

  • Returns

    a product

  • Return type

    crawliexpress.Item

  • Raises

    CrawliexpressException – if there was an error fetching the data
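
The item id is the numeric part of the product URL, so it can be pulled out with a regular expression. This is a minimal sketch: extract_item_id is a hypothetical helper, not part of crawliexpress, and it assumes URLs of the /item/<digits>.html form shown above.

```python
import re
from typing import Optional

# matches the numeric id in .../item/20000001708485.html
_ITEM_URL = re.compile(r"/item/(\d+)\.html")


def extract_item_id(url: str) -> Optional[str]:
    """Return the item id found in a product URL, or None if there is none."""
    match = _ITEM_URL.search(url)
    return match.group(1) if match else None
```

The returned string can be passed directly to get_item.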

get_search(text, page=1, sort_by='default')

Fetches a search page

  • Parameters

    • text – text search

    • page – page number

    • sort_by – sort order (default: best match, total_tranpro_desc: number of orders)

  • Returns

    a search page

  • Return type

    crawliexpress.SearchPage

  • Raises

    • CrawliexpressException – if there was an error fetching the data

    • CrawliexpressCaptchaException – if a captcha is returned; use valid cookies to avoid this

exception crawliexpress.CrawliexpressCaptchaException()

exception crawliexpress.CrawliexpressException()

class crawliexpress.Feedback()

A user feedback

comment = None

Review text

country = None

Country code

datetime = None

Raw datetime from DOM

images = None

List of image links

profile = None

Profile link

rating = None

Rating out of 100

user = None

User name
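
As an illustration of working with these attributes, the sketch below averages ratings per country. average_rating_by_country is a hypothetical helper, not part of crawliexpress. It assumes rating is a number or a numeric string on the 0 to 100 scale described above, and it skips feedbacks where rating or country is still None.

```python
from collections import defaultdict
from typing import Dict, Iterable


def average_rating_by_country(feedbacks: Iterable[object]) -> Dict[str, float]:
    """Average the rating attribute of feedback objects, grouped by country."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum, count]
    for feedback in feedbacks:
        if feedback.rating is None or feedback.country is None:
            continue  # attribute was not filled in by the parser
        bucket = totals[feedback.country]
        bucket[0] += float(feedback.rating)
        bucket[1] += 1
    return {country: total / count for country, (total, count) in totals.items()}
```

The input can be the feedbacks list of one FeedbackPage, or the concatenation of several pages.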

class crawliexpress.FeedbackPage()

A feedback page

feedbacks = None

List of crawliexpress.Feedback objects

has_next_page()

Returns True if there is a next page; useful for crawling

  • Return type

    bool

known_pages = None

Sibling pages

page = None

Page number

class crawliexpress.Search(client, text, sort_by='default')

A search

  • Parameters

    • text – text search

    • sort_by – sort order (default: best match, total_tranpro_desc: number of orders)

class crawliexpress.SearchPage()

A search page

has_next_page()

Returns True if there is a next page; useful for crawling

  • Return type

    bool

items = None

List of products, raw from JS parsing

page = None

Page number

result_count = None

Number of results for the whole search

size_per_page = None

Number of results per page
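
result_count and size_per_page together give an estimate of how many pages a search spans, which can help when sizing a crawl. A minimal sketch follows. total_pages is a hypothetical helper, not part of crawliexpress, and it assumes both values are integers. The site may serve fewer pages than this estimate, so has_next_page() should still decide when to stop.

```python
def total_pages(result_count: int, size_per_page: int) -> int:
    """Estimate the number of pages needed to list result_count results."""
    if size_per_page <= 0:
        raise ValueError("size_per_page must be positive")
    # integer ceiling division: a partly filled last page still counts
    return (result_count + size_per_page - 1) // size_per_page
```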

Download files

Source Distribution

crawliexpress-0.1.7.tar.gz (9.0 kB)

Built Distribution

crawliexpress-0.1.7-py3-none-any.whl (10.4 kB, Python 3)