Extract information from a website

Project description

putali

Under construction! Give it a try!

Developed by Ujjawal Shah (c) 2022

Examples of how to use the package

Count the total number of unique URLs

import putali

urls = putali.Getallurls('https://www.bok.com.np/')
total_unique_url = len(urls.uniqueurls())
print(f'total unique urls: {total_unique_url}')
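
The page does not show how putali collects links internally. As a rough illustration only, the sketch below shows one way unique links could be gathered from a single page's HTML using just the standard library. `LinkCollector` and `unique_urls` are hypothetical names, not part of putali's API, and the HTML snippet and example.com URLs are made-up stand-ins for a real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def unique_urls(html, base_url):
    """Hypothetical helper: return the set of absolute, fragment-free URLs in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    urls = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL, then drop '#section' fragments
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        urls.add(absolute)
    return urls


page = """
<a href="/about">About</a>
<a href="/about#team">Team</a>
<a href="https://example.com/contact">Contact</a>
<a href="report.pdf">Report</a>
"""
urls = unique_urls(page, "https://example.com/")
print(f"total unique urls: {len(urls)}")
```

Storing the links in a set makes deduplication automatic, and stripping fragments means `/about` and `/about#team` count as one URL. A crawler covering a whole site would repeat this step for each newly discovered page on the same domain.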

Get all URLs from which information can be extracted. Only webpages, not images or other binary files, are useful for extracting information, i.e. URLs that do not end in the '.pdf', '.jpeg', '.jpg', '.zip' or '.png' extensions.

import putali

urls = putali.Getallurls('https://www.bok.com.np/')

# usefulurls() filters out URLs ending in '.pdf', '.jpeg', '.jpg', '.zip' or '.png'
useful_urls = urls.usefulurls()
print(f'useful urls: {useful_urls}')
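
How `usefulurls()` matches extensions is not documented here. The following is a minimal sketch under that caveat: it assumes a case-insensitive check on the URL path, so query strings do not hide the extension. `useful_urls` and `SKIPPED_EXTENSIONS` are hypothetical names, not putali's code.

```python
from urllib.parse import urlparse

# Extensions named in the putali description; URLs ending in these are skipped
SKIPPED_EXTENSIONS = (".pdf", ".jpeg", ".jpg", ".zip", ".png")


def useful_urls(urls):
    """Hypothetical helper: keep URLs whose path does not end in a skipped extension."""
    useful = []
    for url in urls:
        # Check only the path, so '?v=2' in 'logo.png?v=2' does not hide '.png'
        path = urlparse(url).path.lower()
        if not path.endswith(SKIPPED_EXTENSIONS):
            useful.append(url)
    return useful


candidates = [
    "https://example.com/about",
    "https://example.com/report.PDF",
    "https://example.com/logo.png?v=2",
    "https://example.com/news/",
]
print(f"useful urls: {useful_urls(candidates)}")
```

`str.endswith` accepts a tuple, so one call checks every extension in the list.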

Print all the email addresses found on the website

import putali

urls = putali.Getallurls('https://www.bok.com.np/')
email_address = urls.emails()
print(f'emails: {email_address}')
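
The description does not say how `emails()` recognises addresses. A common approach is a regular expression over the page text, sketched below as an assumption rather than putali's actual method. The pattern is deliberately simple and is not a full RFC 5322 validator. `find_emails` and `EMAIL_PATTERN` are hypothetical names, and the sample page uses made-up addresses.

```python
import re

# A deliberately simple pattern: it matches common addresses such as
# name@example.com but does not cover every address RFC 5322 allows.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Hypothetical helper: unique addresses in `text`, in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order
    return list(dict.fromkeys(EMAIL_PATTERN.findall(text)))


page = 'Contact <a href="mailto:info@example.com">info@example.com</a> or support@example.org.'
print(f"emails: {find_emails(page)}")
```

Because the address appears both in the `mailto:` link and in the link text, deduplication keeps the result to one entry per address.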

Download files

Source Distribution

putali-0.0.1.tar.gz (3.1 kB)

Built Distribution

putali-0.0.1-py3-none-any.whl (3.3 kB, Python 3)
