
Python3 library to ease Aliexpress crawling


Crawliexpress

Description

Fetches various resources from Aliexpress, such as categories, text searches, products, and feedbacks.

It uses neither the official API nor a headless browser: it parses the page source directly.

As a result, it is very vulnerable to DOM changes.

Usage

Install

pip install crawliexpress

Item

from crawliexpress import Client

client = Client("https://www.aliexpress.com")
item = client.get_item("4000505787173")
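The item id is the numeric part of the product URL (e.g. https://www.aliexpress.com/item/4000505787173.html has id 4000505787173). A small stdlib-only helper can extract it; `item_id_from_url` is a hypothetical convenience, not part of the library:

```python
import re

def item_id_from_url(url):
    """Extract the numeric item id from an Aliexpress product URL."""
    match = re.search(r"/item/(\d+)\.html", url)
    if match is None:
        raise ValueError(f"no item id found in {url!r}")
    return match.group(1)
```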

Feedbacks

from crawliexpress import Client

from pprint import pprint
from time import sleep

client = Client("https://www.aliexpress.com")
item = client.get_item("20000001708485")

page = 1
pages = list()
while True:
    feedback_page = client.get_feedbacks(
        item.product_id,
        item.owner_member_id,
        item.company_id,
        with_picture=True,
        page=page,
    )
    print(feedback_page.page)
    pages.append(feedback_page)
    if not feedback_page.has_next_page():
        break
    page += 1
    sleep(1)
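The pagination pattern above can be factored into a small generator that works with any page object exposing has_next_page() (a FeedbackPage or a SearchPage). A minimal sketch; `iter_pages` is a hypothetical helper, not part of the library:

```python
from time import sleep

def iter_pages(fetch_page, start=1, delay=1.0):
    """Yield successive pages until has_next_page() returns False.

    fetch_page is any callable taking a page number and returning an
    object with a has_next_page() method; delay throttles requests.
    """
    page = start
    while True:
        result = fetch_page(page)
        yield result
        if not result.has_next_page():
            break
        page += 1
        sleep(delay)
```

With a client in scope, the feedbacks loop becomes e.g. `for fp in iter_pages(lambda p: client.get_feedbacks(item.product_id, item.owner_member_id, item.company_id, page=p)): ...`.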

Category

from crawliexpress import Client

from time import sleep

client = Client(
    "https://www.aliexpress.com",
    # copy it from your browser cookies
    "xxxx",
)

page = 1
while True:
    search_page = client.get_category(205000314, "t-shirts", page=page)
    print(search_page.page)
    if not search_page.has_next_page():
        break
    page += 1
    sleep(1)
  • Cookies must be taken from your browser, to avoid captchas and empty results. I usually log in, then copy as cURL a request made by my browser on a category or a text search. Make sure to remove the Cookie: prefix and keep only the cookie values.

Search

from crawliexpress import Client

from time import sleep

client = Client(
    "https://www.aliexpress.com",
    # copy it from your browser cookies
    "xxxx",
)

page = 1
while True:
    search_page = client.get_search("akame ga kill", page=page)
    print(search_page.page)
    if not search_page.has_next_page():
        break
    page += 1
    sleep(1)
  • Cookies must be taken from your browser, to avoid captchas and empty results. I usually log in, then copy as cURL a request made by my browser on a category or a text search. Make sure to remove the Cookie: prefix and keep only the cookie values.

API

class crawliexpress.Category(client, category_id, category_name, sort_by='default')

A category

class crawliexpress.Client(base_url, cookies=None)

Exposes methods to fetch various resources.

  • Parameters

    • base_url – base URL of the site; changing it may switch locale (untested)

    • cookies – must be taken from your browser, to avoid captchas and empty results. I usually log in, then copy as cURL a request made by my browser on a category or a text search. Make sure to remove the Cookie: prefix and keep only the cookie values.

get_category(category_id, category_name, page=1, sort_by='default')

Fetches a category page

  • Parameters

  • Returns

    a search page

  • Return type

    Crawliexpress.SearchPage

  • Raises

    • CrawliexpressException – if there was an error fetching the data

    • CrawliexpressCaptchaException – if there is a captcha, make sure to use valid cookies to avoid this
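A crawl loop can catch the captcha exception and back off before retrying. A minimal sketch, with stand-in exception classes mirroring the library's (in real code, import them from crawliexpress); `fetch_with_backoff` is a hypothetical helper:

```python
from time import sleep

# Stand-ins mirroring crawliexpress' exceptions, so the sketch is self-contained.
class CrawliexpressException(Exception):
    pass

class CrawliexpressCaptchaException(CrawliexpressException):
    pass

def fetch_with_backoff(fetch, retries=3, delay=5):
    """Call fetch(); on a captcha, wait and retry. Other errors propagate."""
    for attempt in range(retries):
        try:
            return fetch()
        except CrawliexpressCaptchaException:
            if attempt == retries - 1:
                raise
            sleep(delay)
```

Note that repeated captchas usually mean the cookies are stale; backing off alone will not fix that.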

get_feedbacks(product_id, owner_member_id, company_id=None, v=2, member_type='seller', page=1, with_picture=False)

Fetches a product feedback page

  • Parameters

    • product_id – id of the product, item id of https://www.aliexpress.com/item/20000001708485.html is 20000001708485

    • owner_member_id – member id of the product owner, as stored in Crawliexpress.Item.owner_member_id

    • page – page number

    • with_picture – limit to feedbacks with a picture

  • Returns

    a feedback page

  • Return type

    Crawliexpress.FeedbackPage

  • Raises

    CrawliexpressException – if there was an error fetching the data

get_item(item_id)

Fetches product information from an item id

  • Parameters

    item_id – id of the product to fetch, item id of https://www.aliexpress.com/item/20000001708485.html is 20000001708485

  • Returns

    a product

  • Return type

    Crawliexpress.Item

  • Raises

    CrawliexpressException – if there was an error fetching the data

get_search(text, page=1, sort_by='default')

Fetches a search page

  • Parameters

    • text – text search

    • page – page number

    • sort_by – sort order: 'default' (best match) or 'total_tranpro_desc' (number of orders)

  • Returns

    a search page

  • Return type

    Crawliexpress.SearchPage

  • Raises

    • CrawliexpressException – if there was an error fetching the data

    • CrawliexpressCaptchaException – if there is a captcha, make sure to use valid cookies to avoid this

exception crawliexpress.CrawliexpressCaptchaException()

exception crawliexpress.CrawliexpressException()

class crawliexpress.Feedback()

A user feedback

comment = None

Review

country = None

Country code

datetime = None

Raw datetime from DOM

images = None

List of image links

profile = None

Profile link

rating = None

Rating out of 100

user = None

Name

class crawliexpress.FeedbackPage()

A feedback page

feedbacks = None

List of Crawliexpress.Feedback objects

has_next_page()

Returns true if there is a following page, useful for crawling

  • Return type

    bool

known_pages = None

Sibling pages

page = None

Page number
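Collected pages can be flattened into plain dicts for export (CSV, JSON, a dataframe). A minimal sketch over the Feedback attributes documented above; `feedback_rows` is a hypothetical helper, not part of the library:

```python
def feedback_rows(pages):
    """Flatten Feedback objects from FeedbackPage-like pages into dicts."""
    rows = []
    for page in pages:
        for fb in page.feedbacks:
            rows.append({
                "user": fb.user,
                "country": fb.country,
                "rating": fb.rating,       # out of 100
                "comment": fb.comment,
                "images": fb.images or [],
            })
    return rows
```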

class crawliexpress.Search(client, text, sort_by='default')

A search

  • Parameters

    • text – text search

    • sort_by – sort order: 'default' (best match) or 'total_tranpro_desc' (number of orders)

class crawliexpress.SearchPage()

A search page

has_next_page()

Returns true if there is a following page, useful for crawling

  • Return type

    bool

items = None

List of products, raw from JS parsing

page = None

Page number

result_count = None

Number of results for the whole search

size_per_page = None

Number of results per page
