Skip to main content

A Easy Scrapper for scrapping essential data from web (Like Price of car, bike, mobiles, and flipkart, and PC Building components)

Project description

ExtractKit

what is this??

This is a Easy Scrapper for scrapping essential data from web

What can it Scrape?

How to use this??

So simple!! Just go and type

pip install extractkit

Bam!! You just installed a great scraping tool

How to use in python??

import ExtractKit as ekit #or whatever

Detailed documentation

Scrap Car Data

This scrapes car specification from Carwale Website

To Scrap car Specification,

from ExtractKit import carExtractor as car

data = car.extractCar('hyundai',"creta")
print(data)

extractCar(brand, model),

Parameters, brand - string data type, enter the brand name of the car you want to search model - string data type, enter the model name of the car you want to search

Result Data Looks like this:

    {
    'product': 'Hyundai Creta', 
    'price': '₹ 10.00 Lakh'
    }

Scrap Bike Data

This scrapes bike specification from Bikewale Website

To Scrap Bike Specification,

from ExtractKit import bikeExtractor as bike

data = bike.extractBike('yamaha',"R15")
print(data)

extractBike(brand, model)

Parameters, brand - string data type, enter the brand name of the bike you want to search model - string data type, enter the model name of the bike you want to search

Result Data Looks like this:

    {
    "product": "Yamaha YZF R15 V3", 
    "price": "1,55,334", 
    "Engine Capacity": "155 cc", 
    "Mileage": "43 kmpl", 
    "Transmission": "6 Speed Manual ", 
    "Kerb Weight": "142 kg", 
    "Fuel Tank Capacity": "11 litres", 
    "Seat Height": "815 mm"
    }

Scrap Mobile Specifications

This scraps data from 91 Mobiles Website

from ExtractKit import gadgetsExtractor as gadgets
x = gadgets.extractMobile('samsung','galaxy a52')
print(x)

In extractMobile(brand, model),

  • Brand and model must be passed as string

Result data looks like this:

{
    "name": "Samsung Galaxy A52",
    "price": " 23,499",
    "variants": [
        "128GB Storage, 8GB RAM\n"
    ],
    "specs": [
        [
            "RAM","6GB"
        ],
        [   
            "Processor","QualcommSnapdragon720G"
        ],
        [   
            "Rear Camera","64MP+12MP+5MP+5MP"
        ],
        [
            "Front Camera","32MP"
        ],
        [
            "Battery","4500mAh"
        ],
        [...],
        .
        .
        .
        [...]
    ]
}

Scrap Flipkart

This Scraps data from flipkart.com

To Scrap Flipkart products,

from ExtractKit import flipkartExtractor as flipkart
x = flipkart.extractFlipkart('XBOX Series S')
print(x)

In extractFlipkart(product),

  • It must be a valid product.
  • Product must be in string format.

Result data looks like this:

{
    "product": "MICROSOFT Xbox Series S 512 GB\u00a0\u00a0(White)",
    "price": "\u20b934,990",
    "url": "https://www.flipkart.com/microsoft-xbox-series-s-512-gb/p/itm13c51f9047da8?pid=GMCFVPFCNHHZQBNA&lid=LSTGMCFVPFCNHHZQBNAH2FMXL&marketplace=FLIPKART&q=XBOX+Series+S&store=4rr&srno=s_1_1&otracker=search&fm=organic&iid=a98ea799-22c1-440d-b211-9835eb1790eb.GMCFVPFCNHHZQBNA.SEARCH&ppt=None&ppn=None&ssid=l55b2xbdfk0000001625828211339&qH=fd75188725591a70"
}

You can also use extractFlipkartUrl(url) to scrap data from the given flipkart link

from ExtractKit import flipkartExtractor as flipkart
x = flipkart.extractFlipkartUrl('https://www.flipkart.com/cyberpunk-2077-limited-edition/p/itm6e7fc6515a3b3?pid=GAMG2Z8UGZHTYHAF&lid=LSTGAMG2Z8UGZHTYHAFHTIMQD&marketplace=FLIPKART&q=cyberpunk+2077&store=4rr%2Ffa6%2F27p%2Fmtr&spotlightTagId=TrendingId_4rr%2Ffa6%2F27p%2Fmtr&srno=s_1_1&otracker=AS_QueryStore_OrganicAutoSuggest_2_9_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_2_9_na_na_na&fm=SEARCH&iid=0fa0a337-bb8d-4126-9fe6-9421bcbdb3ee.GAMG2Z8UGZHTYHAF.SEARCH&ppt=sp&ppn=sp&ssid=rg6sh1i72o0000001625828417574&qH=65fc51bd3f81e7c2')
print(x)

The result for this function is:

{
    "product": "Cyberpunk 2077 (Limited Edition) (for PS4)",
    "price": "₹2,899"
}

Scrap PC Components

This Scraps data from PC Studio

To Scrap PC Components,

from ExtractKit import pcBuildExtractor as pcBuild
x = pcBuild.extractProcessors()
print(x)

Result data looks like this:

{
    "0": {
        "title": "AMD RYZEN 3 3200G OPEN BOX OEM PROCESSOR WITH RX VEGA 8 GRAPHICS (UP TO 4.0GHZ 6 MB CACHE) ",
        "price": "\u20b911,300.00",
        "details": "https://www.pcstudio.in/product/amd-ryzen-3-3200g-open-box-oem-processor-with-rx-vega-8-graphics-up-to-4-0ghz-6-mb-cache/",
        "img": "https://www.pcstudio.in/wp-content/uploads/2021/07/AMD-RYZEN-3-3200G-OPEN-BOX-OEM-PROCESSOR-WITH-RX-VEGA-8-GRAPHICS-1.jpg"
    },
    "1":{...},
    "2":{...},
}

other functions available are:

print(pcBuild.extractProcessors())    #Extract Processors Data
print(extractMotherboards())          #Extract Motherboard Data 
print(extractRams())                  #Extract RAM Data
print(extractStorage1())              #Extract Storages Data
print(extractStorage2())              #Extract Storages Data too
print(extractCabinets())              #Extract Cabinet Data
print(extractCoolers())               #Extract Coolers Data
print(extractGPUs())                  #Extract GPU Data
print(extractPowerSupplys())          #Extract PowerSupplyUnit Data
print(extractDisplays())              #Extract Display Data

Scrap News Data

This scraps live and archive news data from One India News

To Scrap news,

from ExtractKit import newsExtractor as news

data = news.extractNewsArchives('english',2021,7,1)
print(data)

Parameters, language - string data type, enter the language in which you want to search Avaliable Languages are

  • tamil
  • telugu
  • malayalam
  • bengali
  • gujarati
  • hindi
  • kannada

year - integer data type, enter the year in which you want to search month - integer data type, enter the month in which you want to search date - integer data type, enter the date in which you want to search

Result Data Looks like this:

    {
    "language": "english",
    "1": {
        "category": "bengaluru",
        "news": "No interviews to fill up vacancies to post of assistant professors, principles in Karnataka\u2019s higher education"
    },
    "2":{...},
    "3":{...},
    ....

Credits

Karthik Raja K

Vignesh R    


Source Code Available in Github

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

extractkit-1.0.1.tar.gz (7.8 kB view hashes)

Uploaded Source

Built Distribution

extractkit-1.0.1-py3-none-any.whl (10.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page