A Python library for scraping the Google search engine.

googlesearch

googlesearch is a Python library that makes it easy to search Google. It uses requests and BeautifulSoup4 to scrape Google.

Installation

To install, run the following command:

python3 -m pip install googlesearch-python

Usage

To get results for a search term, use the search function in googlesearch. For example, to search Google for "Google", run the following program:

from googlesearch import search
search("Google")
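Calling search on its own prints nothing; the return value has to be iterated, as the proxy example later in this README does with a for loop. A minimal sketch of collecting the first few results into a list follows. The names first_links and fake_search are hypothetical and introduced only for this illustration; fake_search is an offline stand-in for search so the snippet runs without network access.

```python
from itertools import islice
from typing import Callable, Iterable, List


def first_links(search_fn: Callable[..., Iterable[str]], term: str, limit: int = 5) -> List[str]:
    """Return at most `limit` results produced by `search_fn(term)`."""
    return list(islice(search_fn(term), limit))


def fake_search(term, **kwargs):
    # Hypothetical stand-in for googlesearch.search; yields made-up URLs.
    for n in range(1, 4):
        yield f"https://example.com/{term}/{n}"


print(first_links(fake_search, "Google", limit=2))
```

With the library installed, passing search from googlesearch as search_fn should work the same way, assuming it takes the term as its first positional argument as in the examples here.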

Additional options

googlesearch supports a few additional options. By default, googlesearch returns 10 results, but this can be changed. For example, to get 100 results from Google, run the following program:

from googlesearch import search
search("Google", num_results=100)

If you want only unique links in your search results, use the unique option, as in the following program:

from googlesearch import search
search("Google", num_results=100, unique=True)
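To make "unique links" concrete, here is a small sketch of order-preserving de-duplication. This is not the library's implementation, only an illustration of the effect; unique_in_order is a hypothetical helper and the URLs are made up.

```python
from typing import Iterable, List


def unique_in_order(links: Iterable[str]) -> List[str]:
    """Drop repeated links while keeping the first occurrence of each."""
    # dict preserves insertion order, so the first copy of each link wins.
    return list(dict.fromkeys(links))


links = ["https://a.example", "https://b.example", "https://a.example"]
print(unique_in_order(links))
```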

In addition, you can change the language Google searches in. For example, to get results in French, run the following program:

from googlesearch import search
search("Google", lang="fr")

You can also specify the region (as a country code) for your search results. For example, to get results specifically from the US, run the following program:

from googlesearch import search
search("Google", region="us")

If you want to turn off the safe search function (this function is on by default), you can do this:

from googlesearch import search
search("Google", safe=None)

To extract more information, such as the description or the result URL, use an advanced search:

from googlesearch import search
search("Google", advanced=True)
# Returns a list of SearchResult
# Properties:
# - title
# - url
# - description
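As a rough sketch of working with those three properties, the snippet below turns result objects into plain dictionaries, for example before writing them to JSON or CSV. FakeResult and to_rows are hypothetical names used only here; FakeResult stands in for SearchResult so the code runs offline, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class FakeResult:
    # Hypothetical stand-in exposing the three properties listed above.
    title: str
    url: str
    description: str


def to_rows(results: Iterable) -> List[Dict[str, str]]:
    """Convert objects with title/url/description attributes to dicts."""
    return [
        {"title": r.title, "url": r.url, "description": r.description}
        for r in results
    ]


sample = [FakeResult("Example", "https://example.com/", "An example page.")]
rows = to_rows(sample)
print(rows[0]["url"])
```

With the library installed, the iterable returned by search("Google", advanced=True) could be passed to to_rows in place of sample.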

If requesting more than 100 results, googlesearch will send multiple requests to go through the pages. To increase the time between these requests, use sleep_interval:

from googlesearch import search
search("Google", sleep_interval=5, num_results=200)

If you are requesting more than 10 results but want to manage the batching yourself, use start_num to specify the number of the first result you want to get:

from googlesearch import search
search("Google", sleep_interval=5, num_results=200, start_num=10)
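If you manage the batching yourself, each request needs a start offset and a result count. The sketch below does only that arithmetic. batch_windows is a hypothetical helper, not part of googlesearch, and it assumes the start offset is zero-based, which this README does not state.

```python
from typing import Iterator, Tuple


def batch_windows(total: int, batch_size: int, first: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that together cover `total` results."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    fetched = 0
    while fetched < total:
        # The last batch may be smaller than batch_size.
        count = min(batch_size, total - fetched)
        yield first + fetched, count
        fetched += count


windows = list(batch_windows(total=25, batch_size=10))
print(windows)  # [(0, 10), (10, 10), (20, 5)]
```

Each (start, count) pair could then be passed to search as the start offset and num_results for one batch.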

If you are using an HTTP rotating proxy that requires you to install its CA certificate, you can pass ssl_verify=False to search() to skip SSL verification:

from googlesearch import search

proxy = 'http://API:@proxy.host.com:8080/'

results = search("proxy test", num_results=100, lang="en", proxy=proxy, ssl_verify=False)
for result in results:
    print(result)

The asyncio implementation (asearch) disables the ssl_verify option, which httpx does not seem to accept. A simple example:

import asyncio
from googlesearch import asearch

async def main():
    proxy = 'http://API:@proxy.host.com:8080'
    results = asearch("hello world", advanced=True, proxy=proxy)
    async for result in results:
        print(result)

asyncio.run(main())
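If you would rather have a list than print each result, an async iterator can be drained with async for. The sketch below shows one way to do that with a cap on the number of items. fake_asearch and gather_results are hypothetical names; fake_asearch is an offline stand-in for asearch that yields made-up URLs.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_asearch(term: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for asearch so the sketch needs no network.
    for n in range(1, 4):
        await asyncio.sleep(0)
        yield f"https://example.com/{term}/{n}"


async def gather_results(results: AsyncIterator[str], limit: int) -> List[str]:
    """Collect at most `limit` items from an async iterator."""
    collected: List[str] = []
    if limit <= 0:
        return collected
    async for item in results:
        collected.append(item)
        if len(collected) >= limit:
            break
    return collected


links = asyncio.run(gather_results(fake_asearch("hello"), limit=2))
print(links)
```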
