
A Python package that simplifies web scraping, built on regex and curl

Project description


A simple and lightweight library for scraping the web


Built on curl and regex in Python, SpiderNet offers functionality similar to the BeautifulSoup-and-requests stack. For the package to work, you need curl installed on your system.
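To illustrate the regex-based approach (a minimal sketch of the general technique, not SpiderNet's actual implementation), here is how a tag's contents can be pulled out of markup with Python's re module:

```python
import re

html = '<div class="story"><a href="/ch-1">Chapter 1</a></div>'

# A naive pattern for anchor tags: fine for simple, regular pages,
# though regexes cannot parse arbitrary HTML in general.
anchors = re.findall(r'<a\b[^>]*>(.*?)</a>', html)
print(anchors)  # ['Chapter 1']
```

The non-greedy `(.*?)` stops at the first closing tag, which is what makes this work on a flat list of anchors.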

Install the latest version from PyPI or the releases page:

pip install SpiderNet

The main class is GenSpider.

from SpiderNet import GenSpider
web = GenSpider("<website>")

The methods are

    • website_text: returns the markup text of the website.
    • find_all_html_tags: finds all HTML tags whose name is passed as the parameter. If the tags are nested, pass the looped element through the 'text' keyword argument to restrict the search to that fragment.
    • extract_text_from_html: extracts the text from a looped instance of a tag.
    • find_all_tags_by_classname: finds all HTML tags with the given name and class, both passed as parameters. The 'text' keyword works the same way as in find_all_html_tags.
    • get_href_from_a_tags: returns a list of all href attributes of anchor tags. The optional 'text' parameter targets a particular piece of markup; by default the href attributes are extracted from the entire page.
    • get_src_from_img_tags: returns a list of all src attributes of img tags. The optional 'text' parameter targets a particular piece of markup; by default the src attributes are extracted from the entire page.

Example code extracting comic book chapters and their respective href attributes from readallcomics, using the new DataTypes

from SpiderNet import HashMap, ForEach, GenSpider, Str

string = Str("https://readallcomics.com/category/chakra-the-invincible/")
web = GenSpider(string)

# Every <ul> with the class "list-story" holds one chapter list
chapter_lists = web.find_all_tags_by_classname('ul', 'list-story')

arr = HashMap()
for d in chapter_lists:
    # Restrict both searches to the current <ul> via the 'text' keyword
    anchors = web.find_all_html_tags('a', text=d)
    links = web.get_href_from_a_tags(text=d)
    for anchor, link in zip(anchors, links):
        arr.add(web.extract_text_from_html(anchor), link)

ForEach(arr).unit()

The output of the code will be as follows

Chakra The Invincible 010 (2016) => https://readallcomics.com/chakra-the-invincible-010-2016/
Chakra The Invincible 009 (2016) => https://readallcomics.com/chakra-the-invincible-009-2016/
Chakra The Invincible 008 (2016) => https://readallcomics.com/chakra-the-invincible-008-2016/
Chakra The Invincible 007 (2016) => https://readallcomics.com/chakra-the-invincible-007-2016/
Chakra The Invincible 006 (2015) => https://readallcomics.com/chakra-the-invincible-006-2015/
Chakra The Invincible 005 (2015) => https://readallcomics.com/chakra-the-invincible-005-2015/
Chakra The Invincible 004 (2015) => https://readallcomics.com/chakra-the-invincible-004-2015/
Chakra The Invincible 003 (2015) => https://readallcomics.com/chakra-the-invincible-003-2015/
Chakra The Invincible 002 (2015) => https://readallcomics.com/chakra-the-invincible-002-2015/
Chakra The Invincible 001 (2015) => https://readallcomics.com/chakra-the-invincible-001-2015/
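The HashMap built above is a title-to-link mapping, and ForEach(...).unit() prints it pairwise. Assuming the scraping loop has already produced those pairs, the same output format reduces to a plain Python dict (a stdlib-only sketch, not SpiderNet's types):

```python
# Two of the (title, href) pairs the scraping loop produces
chapters = {
    "Chakra The Invincible 010 (2016)":
        "https://readallcomics.com/chakra-the-invincible-010-2016/",
    "Chakra The Invincible 009 (2016)":
        "https://readallcomics.com/chakra-the-invincible-009-2016/",
}

# Reproduce the "key => value" lines that ForEach(arr).unit() prints
for title, link in chapters.items():
    print(f"{title} => {link}")
```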

For more examples, look at:

Project details

This version: 1.3

Source distribution: spidernet-1.3.tar.gz (5.1 kB)

Built distribution: SpiderNet-1.3-py3-none-any.whl (5.8 kB)
