
A simple package for scraping product information on the Nike website (nike.com/id)

Project description

Nike Scraper By Kejora

A simple Python package for scraping detailed product information from the Nike Indonesia site (nike.com/id).

Table of Contents

  • Installation
  • Usage
  • Contributing
  • License

Installation

Make sure you have created a virtual environment before installing the package.

To use this package, you need to install it in your Python virtual environment. You can do this using pip:

pip install -U nikescraperbykejora

You also need to install the following dependencies:

  • pandas>=2.1.1
  • httpx>=0.25.0
  • playwright>=1.38.0
  • selectolax>=0.3.16
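
To install all of them in one step, a command along these lines should work (the version pins are taken from the list above):

pip install "pandas>=2.1.1" "httpx>=0.25.0" "playwright>=1.38.0" "selectolax>=0.3.16"

Because the scraper drives a browser through Playwright, you may also need to download its browser binaries once with playwright install. This step is not listed above, so treat it as a precaution.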

Usage

Directory Structure

Create the following directory structure for your project before using the package:

example_scraper/
├── result
├── main.py
├── .gitignore
└── README.md

Scraping a Single Product

You can scrape data for a single Nike product using the provided main.py script.

For example, to scrape the details of the Nike Air Force 1 '07, use the following URL:

https://www.nike.com/id/t/air-force-1-07-shoe-NMmm1B/DD8959-100

The code for scraping in your main.py file looks like this:

import asyncio
import os
from nikescraperbykejora.scrapingresult import ProductScraperHandler


async def main():
    # User input for scraping one product
    target_url_one = "https://www.nike.com/id/t/air-force-1-07-shoe-NMmm1B/DD8959-100"  # Replace with the URL you want to scrape; keep the quotation marks (" ")
    txt_file_name = "Nike Air Force 1 '07.txt"  # Replace with the name of the product you want to scrape; keep the .txt extension

    # Setting result directory
    project_directory = os.path.dirname(os.path.abspath(__file__))
    os.chdir(project_directory)
    result_directory = os.path.join(project_directory, "result")

    if not os.path.exists(result_directory):
        os.makedirs(result_directory)

    result_file_path = os.path.join(result_directory, txt_file_name)

    await ProductScraperHandler.one_product(target_url_one, result_file_path)

if __name__ == "__main__":
    asyncio.run(main())
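
With main.py in place, run it from the example_scraper directory:

python main.py

If the scrape succeeds, the product details should be written to result/Nike Air Force 1 '07.txt; the exact contents of that file depend on the package.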

Scraping Multiple Products

You can scrape data for all Nike products in a product category using the provided main.py script.

For example, on the nike.com/id navbar choose Men > Football > Shop by price Rp1.500.001 - Rp2.999.999. The site then displays the following category page:

https://www.nike.com/id/w/mens-1500001-2999999-football-shoes-1gdj0z2952fznik1zy7ok
  • Multi product category name: Men's Rp1.500.001 - Rp2.999.999 Football Shoes
  • Product count: 14 (the count may be different when you open this URL)

The code for scraping in your main.py file looks like this:

import asyncio
import os
from nikescraperbykejora.scrapingresult import ProductScraperHandler


async def main():
    # User input for scraping multiple products
    target_url_multi = "https://www.nike.com/id/w/mens-1500001-2999999-football-shoes-1gdj0z2952fznik1zy7ok"  # Replace with the category URL you want to scrape; keep the quotation marks (" ")
    csv_file_name = "Men's Rp1.500.001 - Rp2.999.999 Football Shoes.csv"  # Replace with the name of the product category; keep the .csv extension
    product_count = 14  # Replace with the product count displayed on the page
    timeout_seconds = 10  # Increase this value if scraping fails due to a timeout

    # Setting result directory
    project_directory = os.path.dirname(os.path.abspath(__file__))
    os.chdir(project_directory)
    result_directory = os.path.join(project_directory, "result")

    if not os.path.exists(result_directory):
        os.makedirs(result_directory)

    result_file_path = os.path.join(result_directory, csv_file_name)

    await ProductScraperHandler.multi_product(target_url_multi, result_file_path, product_count, timeout_seconds)

if __name__ == "__main__":
    asyncio.run(main())
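
Once the CSV exists, you can inspect it with pandas, which is already a dependency. The snippet below is only a quick preview of whatever columns the package writes; the file name matches the csv_file_name used above:

import pandas as pd

# Load the scraping result written by main.py and preview the first rows
df = pd.read_csv("result/Men's Rp1.500.001 - Rp2.999.999 Football Shoes.csv")
print(df.head())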

WARNING

Make sure to keep the double quotation marks (" ") around the URL, use a .txt extension for the single-product file name, and a .csv extension for the multi-product file name.

Scraping Result

The scraping result is saved in the result directory, which is created automatically if it does not already exist.

Contributing

If you'd like to contribute to this project, please follow these steps:

  • Fork the repository on GitHub.
  • Create a new branch with a descriptive name.
  • Make your changes and commit them.
  • Push your changes to your fork.
  • Submit a pull request to the original repository.

License

This project is licensed under the GNU General Public License (GPL) - see the LICENSE file for details.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nikescraperbykejora-0.4.tar.gz (21.7 kB)


Built Distribution

nikescraperbykejora-0.4-py3-none-any.whl (21.2 kB)


File details

Details for the file nikescraperbykejora-0.4.tar.gz.

File metadata

  • Download URL: nikescraperbykejora-0.4.tar.gz
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.1

File hashes

Hashes for nikescraperbykejora-0.4.tar.gz

  • SHA256: 44334988ea2d20d75cae262b072bad546a740648de0cb4ad81e62c750dc8870a
  • MD5: bf558460e274455f02463e60ab29533b
  • BLAKE2b-256: 1f1336cf198ea54911433a8518106c0e6abf2a689848c5e2aa060213ee6a3db6

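If you want to verify a downloaded archive against the digests above, a minimal check with Python's standard hashlib module looks like this (it assumes the file is in the current directory):

import hashlib

# SHA256 digest listed above for nikescraperbykejora-0.4.tar.gz
expected = "44334988ea2d20d75cae262b072bad546a740648de0cb4ad81e62c750dc8870a"

with open("nikescraperbykejora-0.4.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "Hash mismatch")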

File details

Details for the file nikescraperbykejora-0.4-py3-none-any.whl.


File hashes

Hashes for nikescraperbykejora-0.4-py3-none-any.whl

  • SHA256: 997274c71c74def4d4434c035141ebc1a62d96a8b602a792bb894c801ba03d6a
  • MD5: 418eb568311be8ce700f6c11a5ce92c6
  • BLAKE2b-256: 5f3d2adddfca415d619fe00f57550c5a4576dea4fa41bf970475c9a9d6204c4d

