A web scraping toolkit that combines Selenium and BeautifulSoup

Project description

Quick Scraping

A Python library for web scraping and browser automation that combines Selenium and BeautifulSoup.

Features

  • Selenium module: Full web browser automation
  • BeautifulSoup module: Advanced HTML parsing and extraction
  • Integration: Functions for using both libraries together
  • Utility tools: Text, CSV, and JSON processing, and more
  • Logging: A complete logging system for all operations

Installation

pip install quick-Scraping

To install with optional dependencies (the quotes keep shells such as zsh from expanding the square brackets):

# For development
pip install "quick-Scraping[dev]"

# For documentation
pip install "quick-Scraping[docs]"
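
To confirm that the installation succeeded, the installed version can be queried from the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the only assumption about the package is the distribution name `quick-Scraping` taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "quick-Scraping" is the distribution name used in the pip command above.
found = installed_version("quick-Scraping")
print(found if found else "quick-Scraping is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to install the package.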

Basic Usage

Selenium example

from quick_Scraping.selenium_functions import SeleniumHelper

# Initialize the Selenium helper
with SeleniumHelper(browser_type="chrome", headless=True) as selenium:
    # Navigate to a URL
    selenium.navigate.to("https://www.exemplo.com")

    # Wait for an element, then click it
    selenium.element.wait_for_element("id", "meu-botao", timeout=10)
    selenium.interact.click("id", "meu-botao")

    # Get the page HTML
    html = selenium.driver.page_source
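
The `with` statement in the example implies that `SeleniumHelper` follows the context manager protocol, presumably so that the browser is closed when the block exits. The sketch below illustrates that pattern only. `FakeDriver` and `HelperSketch` are hypothetical stand-ins so the code runs without a browser; they are not the library's classes, and the library's actual cleanup behavior is an assumption.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser driver; no real browser is started."""

    def __init__(self) -> None:
        self.closed = False
        self.page_source = "<html><body><button id='meu-botao'>OK</button></body></html>"

    def quit(self) -> None:
        self.closed = True


class HelperSketch:
    """Illustrative helper that owns a driver and releases it on exit."""

    def __init__(self, driver_factory=FakeDriver) -> None:
        self._driver_factory = driver_factory
        self.driver = None

    def __enter__(self) -> "HelperSketch":
        self.driver = self._driver_factory()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Always release the driver, even when the body raised.
        if self.driver is not None:
            self.driver.quit()
        return False  # do not swallow exceptions from the body


# Normal path: the driver is closed after the block.
with HelperSketch() as helper:
    html = helper.driver.page_source
    driver = helper.driver

# Failure path: the driver is still closed, and the error propagates.
try:
    with HelperSketch() as failing:
        failed_driver = failing.driver
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass
```

Returning `False` from `__exit__` is a deliberate choice in this sketch: scraping errors stay visible to the caller while the browser process is still cleaned up.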

BeautifulSoup example

from quick_Scraping.beautifulsoup_functions import HTMLParser, DataExtractor

# Initialize the HTML parser and the data extractor
parser = HTMLParser()
extractor = DataExtractor()

# Load HTML from a file or URL
soup = parser.load_from_url("https://www.exemplo.com")

# Extract structured data
links = extractor.extract_links(soup)
table_headers, table_data = extractor.extract_table(soup, table_selector="table.dados")

# Extract the full article
article = extractor.extract_article_content(soup)
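
To make the shape of the extracted data concrete, here is a standard-library sketch that collects links and splits a table into a header row and data rows, mirroring the `(table_headers, table_data)` pair in the example. It uses Python's `html.parser` rather than BeautifulSoup, ignores CSS selectors such as `table.dados`, and is not the library's implementation. `LinkAndTableCollector` and `SAMPLE_HTML` are hypothetical names.

```python
from html.parser import HTMLParser as StdlibHTMLParser


class LinkAndTableCollector(StdlibHTMLParser):
    """Collects href values and the cell text of every table row."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


SAMPLE_HTML = """
<table class="dados">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td><a href="/p/1">Pen</a></td><td>2.50</td></tr>
  <tr><td><a href="/p/2">Notebook</a></td><td>12.00</td></tr>
</table>
"""

collector = LinkAndTableCollector()
collector.feed(SAMPLE_HTML)

links = collector.links
# Treat the first row as the header, the rest as data.
table_headers, table_data = collector.rows[0], collector.rows[1:]
```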

Integration example

from quick_Scraping.common import ScrapingHelper

# Initialize the integrated helper
scraper = ScrapingHelper()

# Configure the data extraction
extraction_config = {
    "title": {"type": "text", "selector": "h1.title"},
    "products": {"type": "table", "selector": "table.products"},
    "links": {"type": "links", "selector": "a.product-link"}
}

# Navigate and extract the data
results = scraper.extract_data_with_selenium(
    url="https://www.exemplo.com/produtos",
    extraction_config=extraction_config,
    wait_time=3
)

# Save the results
scraper.save_results(results, "produtos.json", format="json")
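
Because the extraction is driven by a plain dictionary, it can help to check the configuration before launching a browser. The sketch below validates the `type` and `selector` keys and writes results as JSON using only the standard library. `validate_config`, `save_json`, and `KNOWN_TYPES` are hypothetical names; the set of types is limited to the three that appear in the example, and the library may accept others.

```python
import json
import os
import tempfile

# Types that appear in the example configuration above; the full set
# accepted by the library is an unknown here.
KNOWN_TYPES = {"text", "table", "links"}


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    for field, rule in config.items():
        if not isinstance(rule, dict):
            problems.append(f"{field}: rule must be a dict")
            continue
        if rule.get("type") not in KNOWN_TYPES:
            problems.append(f"{field}: unknown type {rule.get('type')!r}")
        if not rule.get("selector"):
            problems.append(f"{field}: missing selector")
    return problems


def save_json(results: dict, path: str) -> None:
    """Write results as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, ensure_ascii=False, indent=2)


good = {
    "title": {"type": "text", "selector": "h1.title"},
    "products": {"type": "table", "selector": "table.products"},
    "links": {"type": "links", "selector": "a.product-link"},
}
bad = {
    "title": {"type": "image", "selector": "img"},
    "links": {"type": "links"},
}

good_problems = validate_config(good)
bad_problems = validate_config(bad)

# Round-trip a small result set through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "produtos.json")
    save_json({"title": "Produtos", "links": ["/p/1"]}, out_path)
    with open(out_path, encoding="utf-8") as fh:
        round_trip = json.load(fh)
```

Collecting every problem in a list, rather than raising on the first one, lets a caller report all configuration mistakes in a single pass.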

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

quick_scraping-1.0.1-py3-none-any.whl (51.1 kB)

Uploaded Python 3

File details

Details for the file quick_scraping-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: quick_scraping-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 51.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for quick_scraping-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       4dbd7443c8e1e08797b49c2269f89f587f7b04a0eb7ab6b5e8c6bbbe75e1c94a
MD5          56e1c2dd63f0f1c5c76bf8ecabe70a82
BLAKE2b-256  ad94d6f816ba8811b73e3f36883081e93f6d4152241642f2adddb27e311da283