A package to scrape the web dynamically

Project description

Dynamic Scrapper (STILL IN DEVELOPMENT)

The Goal Of This Package:

The goal of this package is to be able to scrape any given site and gather any data, regardless of the type/style of said site.

This project was a huge part of understanding web scraping in general.

How to use:

The package is equipped with a Click CLI that prompts for the input needed to extract the data.
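
For reference, here is a minimal sketch of how a prompt-driven Click CLI collects this kind of input; the option names and prompt texts are illustrative assumptions, not the package's actual interface:

```python
import click

@click.command()
@click.option("--url", prompt="Link of the site you want scraped")
@click.option("--links-class", prompt="Class name of the links (empty for a single page)", default="")
def cli(url, links_class):
    # Hypothetical entry point: Click prompts for any option not passed on the command line.
    click.echo(f"Scraping {url} (links class: {links_class or 'single page'})")

if __name__ == "__main__":
    cli()
```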

Steps:

  1. Give the link of the site you want scraped. (It can be either a page that contains a bunch of links to click and extract data from inside every link, or one single page that you want data extracted from.)
  2. If the link you gave is of a page containing a bunch of links to get data from, write the class name of the element containing the "a" tag that holds these links (in most cases these links all share the same class name). [LEAVE EMPTY IF IT'S A SINGLE PAGE EXTRACTION]
  3. Choose the type of pagination used on said page to be able to extract data from all pages:
  • number: for sites with numbered pagination
  • see_more: for sites with a "see more" button that expands the rest of the information
  • none: in case of a single page or an infinite-scroll page
  4. Write the class name of the pagination element used, according to:
  • number: write the class name of the "next" or ">" button that takes you to the next page
  • see_more: write the class name of the see_more button
  • none: leave empty
  5. Choose the items to scrape. This step will look like the sketch below:
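
A rough sketch of what the prompt sequence could look like (the exact prompt wording and example values are assumptions, not the package's literal output):

```
Link of the site: https://example.com/articles
Class name of the links: article-card
Pagination type (number/see_more/none): number
Class name of the pagination button: next-page
Item 1 selector type (tag-name/class-name/id): class-name
Item 1 selector value: article-title
```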

## Tips for step 5:

  • Use tag-name when there is only one tag of that kind on the page (h1, ...).
  • class-name is the most commonly used, but make sure to pick a unique name so you don't get other items by mistake.
  • Be careful when using id, as the id of the same element can change from page to page.
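
To make those three selector types concrete, here is a small sketch of how they typically map to lookups in a scraping library; BeautifulSoup is used here as a stand-in and is an assumption, not necessarily what the package uses internally:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Only heading on the page</h1>
  <div class="price">19.99</div>
  <span id="sku-42">SKU</span>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# tag-name: safe when the page has exactly one tag of that kind
title = soup.find("h1").get_text()

# class-name: the most common choice; pick a class unique to the target item
price = soup.find(class_="price").get_text()

# id: beware, the same element's id may differ on other pages of the site
sku = soup.find(id="sku-42").get_text()

print(title, price, sku)
```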
