Python3 library to ease Aliexpress crawling

Project description

Crawliexpress

Crawliexpress
- Description
- Usage
- API

Description

Crawliexpress fetches various resources from Aliexpress: category pages, text search results, products, and feedbacks.

It uses neither the official API nor a headless browser; it parses the page source instead.

As a result, it is very vulnerable to DOM changes.

Usage

Install

pip install crawliexpress

Item

from crawliexpress import Client

client = Client("https://www.aliexpress.com")
client.get_item("4000505787173")

Feedbacks

from crawliexpress import Client

from time import sleep

client = Client("https://www.aliexpress.com")
item = client.get_item("20000001708485")

page = 1
pages = []
while True:
    # fetch one page of feedbacks that include a picture
    feedback_page = client.get_feedbacks(
        item.product_id,
        item.owner_member_id,
        item.company_id,
        with_picture=True,
        page=page,
    )
    print(feedback_page.page)
    pages.append(feedback_page)
    if not feedback_page.has_next_page():
        break
    page += 1
    # pause between requests
    sleep(1)

Search

from crawliexpress import Client

from time import sleep

client = Client(
    "https://www.aliexpress.com",
    # copy it from your browser cookies
    "xxxx",
)

page = 1
while True:
    search_page = client.get_search("akame ga kill", page=page)
    print(search_page.page)
    if search_page.has_next_page() is False:
        break
    page += 1
    sleep(1)

Cookies must be copied from your browser to avoid captchas and empty results. I usually log in, then copy as cURL a request made by my browser on a category or text search page. Make sure to remove the Cookie: prefix so that only the cookie values remain.
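Removing the prefix by hand is easy to get wrong, so here is a minimal sketch of doing it in code. clean_cookie_header is a hypothetical helper, not part of crawliexpress, and it assumes the copied text is either a full "Cookie: ..." header line or already just the cookie values.

```python
def clean_cookie_header(raw: str) -> str:
    """Return only the cookie values from a copied "Cookie: ..." header line.

    Hypothetical helper, not part of crawliexpress.
    """
    value = raw.strip()
    prefix = "cookie:"
    # Header names are case-insensitive, so compare in lower case.
    if value.lower().startswith(prefix):
        value = value[len(prefix):].strip()
    return value


print(clean_cookie_header("Cookie: foo=1; bar=2"))  # foo=1; bar=2
```

The returned string can then be passed as the second argument of Client, as in the Search example.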

API

class crawliexpress.Category(client, category_id, category_name, sort_by='default')

A category

Parameters
- category_id – id of the category; for https://www.aliexpress.com/category/205000221/t-shirts.html it is 205000221
- category_name – name of the category; for https://www.aliexpress.com/category/205000221/t-shirts.html it is t-shirts
- sort_by – sort order: default (best match) or total_tranpro_desc (number of orders)
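Both category_id and category_name can be read off a category URL. A minimal sketch, assuming category URLs always follow the /category/<id>/<name>.html shape of the example URL; parse_category_url is a hypothetical helper, not part of crawliexpress.

```python
import re
from typing import Tuple

# Assumed URL shape, taken from the example URL in the parameter list:
# https://www.aliexpress.com/category/<id>/<name>.html
_CATEGORY_URL = re.compile(r"/category/(\d+)/([^/?#]+)\.html")


def parse_category_url(url: str) -> Tuple[str, str]:
    """Return (category_id, category_name) from a category URL.

    Hypothetical helper, not part of crawliexpress.
    """
    match = _CATEGORY_URL.search(url)
    if match is None:
        raise ValueError(f"not a category URL: {url!r}")
    return match.group(1), match.group(2)


print(parse_category_url("https://www.aliexpress.com/category/205000221/t-shirts.html"))
# ('205000221', 't-shirts')
```

The id is returned as a string because the usage examples pass ids as strings.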

class crawliexpress.Client(base_url, cookies=None)

Exposes methods to fetch various resources.

Parameters
- base_url – base URL of the site to crawl; changing it should allow changing the locale (not verified)
- cookies – must be copied from your browser to avoid captchas and empty results. I usually log in, then copy as cURL a request made by my browser on a category or text search page. Make sure to remove the Cookie: prefix so that only the cookie values remain.

get_category(category_id, category_name, page=1, sort_by='default')

Fetches a category page

Parameters
- category_id – id of the category; for https://www.aliexpress.com/category/205000221/t-shirts.html it is 205000221
- category_name – name of the category; for https://www.aliexpress.com/category/205000221/t-shirts.html it is t-shirts
- page – page number
- sort_by – sort order: default (best match) or total_tranpro_desc (number of orders)

Returns

a search page

Return type

crawliexpress.SearchPage

Raises
- CrawliexpressException – if there was an error fetching the data
- CrawliexpressCaptchaException – if a captcha is served; use valid cookies to avoid this

get_feedbacks(product_id, owner_member_id, company_id=None, v=2, member_type='seller', page=1, with_picture=False)

Fetches a product feedback page

Parameters
- product_id – id of the product; for https://www.aliexpress.com/item/20000001708485.html it is 20000001708485
- owner_member_id – member id of the product owner, as stored in crawliexpress.Item.owner_member_id
- page – page number
- with_picture – only return feedbacks that have a picture

Returns

a feedback page

Return type

crawliexpress.FeedbackPage

Raises
- CrawliexpressException – if there was an error fetching the data

get_item(item_id)

Fetches a product's information from its id

Parameters
- item_id – id of the product to fetch; for https://www.aliexpress.com/item/20000001708485.html it is 20000001708485

Returns

a product

Return type

crawliexpress.Item

Raises
- CrawliexpressException – if there was an error fetching the data

get_search(text, page=1, sort_by='default')

Fetches a search page

Parameters
- text – text to search for
- page – page number
- sort_by – sort order: default (best match) or total_tranpro_desc (number of orders)

Returns

a search page

Return type

crawliexpress.SearchPage

Raises
- CrawliexpressException – if there was an error fetching the data
- CrawliexpressCaptchaException – if a captcha is served; use valid cookies to avoid this

exception crawliexpress.CrawliexpressCaptchaException()

exception crawliexpress.CrawliexpressException()
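Since the Client methods raise these exceptions on fetch errors, a crawl loop usually needs some retry logic. The following is a minimal sketch, not the library's own behavior: fetch_with_retry is a hypothetical helper that takes the exception classes as a parameter, because this page does not state whether CrawliexpressCaptchaException is a subclass of CrawliexpressException.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> T:
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper, not part of crawliexpress. Only exceptions listed
    in retry_on are retried; the last one is re-raised when attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise
            # Wait a little longer after each failure.
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")
```

With the library installed, a call could look like fetch_with_retry(lambda: client.get_item("4000505787173"), (CrawliexpressException,)). A captcha is unlikely to disappear on retry, so it is probably better to handle CrawliexpressCaptchaException separately by refreshing the cookies.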

class crawliexpress.Feedback()

A user feedback

comment = None

Review

country = None

Country code

datetime = None

Raw datetime from DOM

images = None

List of image links

profile = None

Profile link

rating = None

Rating out of 100

user = None

Name
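Because rating is out of 100, it often has to be converted before display. A minimal sketch, assuming that 100 corresponds to five stars (this mapping is an assumption, not something this page states); rating_to_stars and average_stars are hypothetical helpers, and SimpleNamespace objects stand in for Feedback instances.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def rating_to_stars(rating: Optional[float]) -> Optional[float]:
    """Convert a rating out of 100 to a five-star scale (assumes 100 = 5 stars)."""
    if rating is None:
        return None
    return float(rating) / 20


def average_stars(feedbacks: Iterable) -> Optional[float]:
    """Average the star rating of objects with a .rating attribute, skipping None."""
    stars = [rating_to_stars(f.rating) for f in feedbacks if f.rating is not None]
    if not stars:
        return None
    return sum(stars) / len(stars)


# Stand-ins for Feedback objects; only the rating attribute is used.
fake_feedbacks = [
    SimpleNamespace(rating=100),
    SimpleNamespace(rating=60),
    SimpleNamespace(rating=None),
]
print(rating_to_stars(80))  # 4.0
print(average_stars(fake_feedbacks))  # 4.0
```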

class crawliexpress.FeedbackPage()

A feedback page

feedbacks = None

List of crawliexpress.Feedback objects

has_next_page()

Returns True if there is a next page; useful for crawling

Return type

bool

known_pages = None

Sibling pages

page = None

Page number
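The has_next_page() method makes it possible to wrap the crawl loop from the usage examples in a generator. A minimal sketch: iter_pages is a hypothetical helper, not part of crawliexpress, and it only assumes that the fetched objects expose has_next_page().

```python
import time
from typing import Callable, Iterator, TypeVar

P = TypeVar("P")


def iter_pages(
    fetch_page: Callable[[int], P],
    delay: float = 1.0,
    start: int = 1,
) -> Iterator[P]:
    """Yield pages from fetch_page(page_number) until has_next_page() is false.

    Hypothetical helper, not part of crawliexpress. fetch_page is any callable
    that returns an object with a has_next_page() method.
    """
    page = start
    while True:
        current = fetch_page(page)
        yield current
        if not current.has_next_page():
            break
        page += 1
        # Pause between requests, as the usage examples do.
        time.sleep(delay)
```

For feedbacks, this could be called as iter_pages(lambda p: client.get_feedbacks(item.product_id, item.owner_member_id, item.company_id, page=p)), where client and item come from the Feedbacks usage example.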

class crawliexpress.Search(client, text, sort_by='default')

A search

Parameters
- text – text to search for
- sort_by – sort order: default (best match) or total_tranpro_desc (number of orders)

class crawliexpress.SearchPage()

A search page

has_next_page()

Returns True if there is a next page; useful for crawling

Return type

bool

items = None

List of products, raw from JS parsing

page = None

Page number

result_count = None

Number of results for the whole search

size_per_page = None

Number of results per page
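Together, result_count and size_per_page give an estimate of how many pages a search spans. A minimal sketch: total_pages is a hypothetical helper, not part of crawliexpress, and it assumes both attributes hold integers or numeric strings when they are set.

```python
import math
from typing import Optional


def total_pages(result_count: Optional[int], size_per_page: Optional[int]) -> Optional[int]:
    """Estimate how many pages a search spans.

    Hypothetical helper. Returns None when either value is missing or the
    page size is not positive.
    """
    if result_count is None or size_per_page is None:
        return None
    count, size = int(result_count), int(size_per_page)
    if size <= 0:
        return None
    # Round up: a partially filled last page still counts as a page.
    return math.ceil(count / size)


print(total_pages(125, 60))  # 3
```

This is only an estimate; the site may serve fewer pages than the result count suggests, so has_next_page() remains the signal to stop crawling.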

Release history

- 0.1.7 – Oct 9, 2020 (this version)
- 0.1.6 – Oct 8, 2020
- 0.1.5 – Oct 8, 2020
- 0.1.4 – Oct 6, 2020
- 0.1.3 – Oct 6, 2020
- 0.1.2 – Oct 6, 2020
- 0.1.1 – Oct 2, 2020
- 0.1.0 – Sep 30, 2020

Download files

Source Distribution

crawliexpress-0.1.7.tar.gz (9.0 kB), uploaded Oct 9, 2020

Built Distribution

crawliexpress-0.1.7-py3-none-any.whl (10.4 kB), uploaded Oct 9, 2020, Python 3

Hashes for crawliexpress-0.1.7.tar.gz

Algorithm	Hash digest
SHA256	`380b0a45716a34aa87afc33b2c92597285b5f8520268a7179164e30c5dceb9f9`
MD5	`a8629de4968359a9e964b0f536c21a51`
BLAKE2b-256	`4088a29336361568e093ff94a60ea336771c3b6d4c35eb758f12af43adb31678`

Hashes for crawliexpress-0.1.7-py3-none-any.whl

Algorithm	Hash digest
SHA256	`7146a70f29a81541a4e51a44d8e29d16a7515e726410669f82681d9c57ef37e7`
MD5	`41df33f9b19a08d198a3cf6b45c06c7f`
BLAKE2b-256	`a1306b0f7d5135ade0f358dea941e16633317c560a730e6e4cb68305be7074c8`
