
Web table scraping module for JornadaRPA


JornadaRPA.WebScrap

JornadaRPA.WebScrap is a Python module designed to simplify scraping tabular data from web pages, using BotCity Web Automation and Pandas.


🚀 Features

  • Extracts tabular data from web pages.
  • Supports automation with the BotCity framework.
  • Returns the data as a Pandas DataFrame.

🛠️ Prerequisites

Make sure the following packages are installed:

  • botcity-framework-web
  • pandas

To install them:

pip install botcity-framework-web pandas
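
To confirm that both prerequisites are present before running a bot, a small check like the one below may help. It is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and is not part of JornadaRPA.WebScrap.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip in the install command.
REQUIRED = ("botcity-framework-web", "pandas")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as missing can be installed with the `pip install` command above.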

## 📦 How to use

1. Start the BotCity WebBot

from botcity.web import WebBot

# Initialize the bot
bot = WebBot()
bot.start_browser()
bot.navigate_to("https://sua-pagina-web.com")
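
The snippet above leaves the browser open once the script finishes or fails. One way to guarantee cleanup is to wrap the session in a context manager, as in the sketch below. The helper `browser_session` is hypothetical and not part of BotCity or JornadaRPA.WebScrap; it only assumes the bot object exposes `start_browser()`, `navigate_to()` and `stop_browser()`, as `WebBot` does.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(bot, url):
    """Start the browser, open `url`, and always stop the browser afterwards."""
    bot.start_browser()
    try:
        bot.navigate_to(url)
        yield bot
    finally:
        # Runs even if navigation or scraping raises, so no browser is left open.
        bot.stop_browser()
```

With this in place, the scraping step could run inside `with browser_session(WebBot(), "https://sua-pagina-web.com") as bot:` and the browser would be closed on exit.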

2. Use the WebScrap module

from jornadaRPA.webScrap import Webscrap

# Configure the scraper
scraper = Webscrap()

# Extract the table data
data = scraper.webscrap(
    inBot=bot,
    inLines=10,               # Maximum number of rows to extract
    inNext="//button[@id='next']",  # XPath of the "Next" button
    inXPATH="//table[@id='data']"  # XPath of the table
)

# Display the data
print(data)
print(data)
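
The parameters `inLines` and `inNext` suggest that the scraper reads rows page by page and clicks "Next" until the row limit is reached. The module's internals are not shown here, so the following is only a sketch of how such a loop could work, not the actual implementation. `collect_rows`, `read_page` and `go_next` are hypothetical names, and in-memory lists stand in for the web pages.

```python
def collect_rows(read_page, go_next, max_rows):
    """Gather up to `max_rows` rows, moving to the next page when one runs out.

    read_page() -> list of rows on the current page
    go_next()   -> True if it moved to another page, False if there is none
    """
    rows = []
    while len(rows) < max_rows:
        page = read_page()
        # Take only as many rows as are still needed to reach the limit.
        rows.extend(page[: max_rows - len(rows)])
        if len(rows) >= max_rows or not go_next():
            break
    return rows


# Stand-in for a paginated table: three "pages" of rows.
pages = [[["a", 1], ["b", 2]], [["c", 3], ["d", 4]], [["e", 5]]]
state = {"i": 0}


def read_page():
    return pages[state["i"]]


def go_next():
    if state["i"] + 1 < len(pages):
        state["i"] += 1
        return True
    return False


print(collect_rows(read_page, go_next, max_rows=3))
# → [['a', 1], ['b', 2], ['c', 3]]
```

In a real run, `read_page` would correspond to reading the table located by `inXPATH`, and `go_next` to clicking the element located by `inNext`.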


## 🛡️ License
This project is licensed under the MIT License. You may use, modify, and distribute this code freely, provided you retain the credits.


## 📫 Contact
If you have questions, suggestions, or problems, get in touch:

Email: alexdiogo@desafiosrpa.com.br
