Nike Scraper By Kejora
A simple Python package for scraping detailed product information from the Nike Indonesia site (nike.com/id).
Installation
Make sure you have created and activated a Python virtual environment before installing the package.
Install the package into that environment with pip:
pip install -U nikescraperbykejora
You also need to install the following dependencies:
- pandas>=2.1.1
- httpx>=0.25.0
- playwright>=1.38.0
- selectolax>=0.3.16
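To verify the installation, you can try importing the handler that the examples below use. This is only a minimal sanity check, not part of the package's documented workflow; note that Playwright typically also needs its browser binaries installed once (for example with the playwright install command) before scraping will work.

# check_install.py - minimal check that the package and its dependencies import cleanly (assumes pip install succeeded)
from nikescraperbykejora.scrapingresult import ProductScraperHandler

print(ProductScraperHandler)  # should print the handler class without raising ImportError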
Usage
Directory Structure
Create a directory structure for your project as shown below:
example_scraper/
├── result
├── main.py
├── .gitignore
└── README.md
Scraping a Single Product
You can scrape data for a single Nike product from your main.py script.
For example, to scrape the details of the Nike Air Force 1 '07 at this URL:
https://www.nike.com/id/t/air-force-1-07-shoe-NMmm1B/DD8959-100
put the following code in your main.py file:
import asyncio
import os

from nikescraperbykejora.scrapingresult import ProductScraperHandler


async def main():
    # User input for scraping one product
    target_url_one = "https://www.nike.com/id/t/air-force-1-07-shoe-NMmm1B/DD8959-100"  # Change to the URL you want to scrape; DON'T skip the quotation marks (" ")
    txt_file_name = "Nike Air Force 1 '07.txt"  # Change to the name of the product you want to scrape; DON'T skip .txt

    # Set up the result directory
    project_directory = os.path.dirname(os.path.abspath(__file__))
    os.chdir(project_directory)
    result_directory = os.path.join(project_directory, "result")
    if not os.path.exists(result_directory):
        os.makedirs(result_directory)
    result_file_path = os.path.join(result_directory, txt_file_name)

    await ProductScraperHandler.one_product(target_url_one, result_file_path)


if __name__ == "__main__":
    asyncio.run(main())
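If you want to scrape a handful of specific products in one run, the same one_product call can be reused in a loop. The sketch below is only an illustration based on the example above; the dictionary of file names and URLs is something you fill in yourself, and the commented-out entry is a placeholder.

import asyncio
import os

from nikescraperbykejora.scrapingresult import ProductScraperHandler


async def main():
    # Map each output .txt file name to the product URL to scrape (example values; replace with your own)
    products = {
        "Nike Air Force 1 '07.txt": "https://www.nike.com/id/t/air-force-1-07-shoe-NMmm1B/DD8959-100",
        # "Another Product.txt": "https://www.nike.com/id/t/...",  # add more entries here
    }

    project_directory = os.path.dirname(os.path.abspath(__file__))
    result_directory = os.path.join(project_directory, "result")
    os.makedirs(result_directory, exist_ok=True)

    # Scrape each product in turn, writing one .txt file per product
    for txt_file_name, url in products.items():
        result_file_path = os.path.join(result_directory, txt_file_name)
        await ProductScraperHandler.one_product(url, result_file_path)


if __name__ == "__main__":
    asyncio.run(main())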
Scraping Multiple Products
You can scrape data for all products in a Nike product category from your main.py script. See the example below:
On the nike.com/id navbar, choose Men > Football > Shop by Price Rp1.500.001 - Rp2.999.999. The site then displays the following category page:
https://www.nike.com/id/w/mens-1500001-2999999-football-shoes-1gdj0z2952fznik1zy7ok
- Multi-product category name: Men's Rp1.500.001 - Rp2.999.999 Football Shoes
- Product count: 14 (the count may be different when you try this URL)
The code for scraping in your main.py file is as follows:
import asyncio
import os

from nikescraperbykejora.scrapingresult import ProductScraperHandler


async def main():
    # User input for scraping multiple products
    target_url_multi = "https://www.nike.com/id/w/mens-1500001-2999999-football-shoes-1gdj0z2952fznik1zy7ok"  # Change to the URL you want to scrape
    csv_file_name = "Men's Rp1.500.001 - Rp2.999.999 Football Shoes.csv"  # Change to the name of the product category you want to scrape; DON'T skip .csv
    product_count = 14  # Change to the product count displayed on the page
    timeout_seconds = 10  # Increase this integer value if scraping fails due to a timeout

    # Set up the result directory
    project_directory = os.path.dirname(os.path.abspath(__file__))
    os.chdir(project_directory)
    result_directory = os.path.join(project_directory, "result")
    if not os.path.exists(result_directory):
        os.makedirs(result_directory)
    result_file_path = os.path.join(result_directory, csv_file_name)

    await ProductScraperHandler.multi_product(target_url_multi, result_file_path, product_count, timeout_seconds)


if __name__ == "__main__":
    asyncio.run(main())
WARNING
Make sure to keep the double quotation marks (" ") around the URL, the .txt extension for the single-product file name, and the .csv extension for the multi-product file name.
Scraping Result
The scraping result is saved in the result directory, which the script creates automatically if it does not already exist.
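Since pandas is already a dependency, one quick way to inspect a multi-product result is to load the CSV back into a DataFrame. This is only a convenience sketch; the column names depend on what the scraper writes, and the file path matches the multi-product example above.

import pandas as pd

# Load the CSV written by the multi-product example above
df = pd.read_csv("result/Men's Rp1.500.001 - Rp2.999.999 Football Shoes.csv")

print(df.shape)   # (rows, columns) -- the row count should roughly match product_count
print(df.head())  # preview the first few scraped products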
Contributing
If you'd like to contribute to this project, please follow these steps:
- Fork the repository on GitHub.
- Create a new branch with a descriptive name.
- Make your changes and commit them.
- Push your changes to your fork.
- Submit a pull request to the original repository.
License
This project is licensed under the GNU General Public License (GPL); see the LICENSE file for details.