
Web table scraping module for JornadaRPA


# JornadaRPA.WebScrap

JornadaRPA.WebScrap is a Python module designed to simplify scraping data from tables on web pages. It is built on BotCity Web Automation and Pandas.


## 🚀 Features

  • Extracts tabular data from web pages.
  • Supports automation with the BotCity framework.
  • Returns the data as a Pandas DataFrame.
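To illustrate the core idea behind these features (reading the rows of an HTML table into a tabular structure), here is a minimal sketch that needs no browser. It is not the module's implementation. It assumes well-formed XHTML without a namespace declaration, because `xml.etree.ElementTree` is an XML parser. It uses a relative path (`.//table[@id='data']`) because ElementTree rejects the absolute `//` form. The helper `table_to_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET


def table_to_rows(markup: str, path: str = ".//table[@id='data']") -> list:
    """Return the cell texts, row by row, of the first table matching path.

    Hypothetical helper: assumes well-formed XHTML with no xmlns declaration.
    """
    root = ET.fromstring(markup.strip())
    table = root.find(path)  # ElementTree supports [@attr='value'] predicates
    if table is None:
        raise ValueError(f"no element matches {path!r}")
    rows = []
    # iter() walks the subtree in document order, so rows inside <thead>/<tbody> are included too
    for tr in table.iter("tr"):
        cells = ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("th", "td")]
        rows.append(cells)
    return rows


html = """
<html><body>
  <table id="data">
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Ana</td><td>31</td></tr>
    <tr><td>Bruno</td><td>27</td></tr>
  </table>
</body></html>
"""

header, *body = table_to_rows(html)
print(header)  # ['Name', 'Age']
print(body)    # [['Ana', '31'], ['Bruno', '27']]
```

If pandas is installed, `pandas.DataFrame(body, columns=header)` turns these rows into the kind of DataFrame the module returns.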

## 🛠️ Prerequisites

Make sure the following packages are installed:

  • botcity-framework-web
  • pandas

To install them:

```bash
pip install botcity-framework-web pandas
```
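Before running the bot, it can help to confirm that the prerequisites import correctly. The sketch below is a small, hypothetical helper built on the standard library's `importlib.util.find_spec`. It checks `botcity.web` (the package imported in the usage example below) and `pandas`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent of a dotted name; this fires when "botcity" itself is missing
        return False


def missing_packages(names):
    """Return the subset of names that cannot be imported (hypothetical helper)."""
    return [name for name in names if not is_importable(name)]


missing = missing_packages(["botcity.web", "pandas"])
if missing:
    print("Missing:", ", ".join(missing), "- run: pip install botcity-framework-web pandas")
else:
    print("All prerequisites found.")
```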

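## 📦 Usage

1. Start the BotCity WebBot

```python
from botcity.web import WebBot

# Initialize the bot
bot = WebBot()
# Depending on your setup, you may also need to set bot.driver_path to your WebDriver executable
bot.start_browser()
bot.navigate_to("https://sua-pagina-web.com")
```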

2. Use the WebScrap module

```python
from jornadaRPA.webScrap import Webscrap

# Set up the scraper
scraper = Webscrap()

# Extract the table data
data = scraper.webscrap(
    inBot=bot,                        # WebBot instance started in step 1
    inLines=10,                       # Maximum number of rows to extract
    inNext="//button[@id='next']",    # XPath of the "Next" button
    inXPATH="//table[@id='data']"     # XPath of the table
)

# Display the data (a Pandas DataFrame)
print(data)

# Close the browser when finished
bot.stop_browser()
```
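The README does not say exactly how `inLines` and `inNext` interact. One plausible reading, stated here as an assumption and not as the module's documented behavior, works like this: read the rows of the current page, click "Next" only while fewer than `inLines` rows have been collected, and stop when the limit is reached or the pages run out. The sketch below models that loop with plain Python data in place of a browser. The names `scrape_paginated` and `fake_pages` are hypothetical.

```python
def scrape_paginated(pages, max_rows):
    """Collect up to max_rows rows from an iterable of pages (each a list of rows).

    Hypothetical model: advancing the iterator stands in for clicking "Next".
    """
    collected = []
    for rows in pages:  # each iteration stands in for reading the table on one page
        collected.extend(rows[: max_rows - len(collected)])
        if len(collected) >= max_rows:
            break  # stop before requesting another page
    return collected


visited = []  # records which pages were actually loaded


def fake_pages():
    data = [
        [["Ana", "31"], ["Bruno", "27"], ["Carla", "45"]],
        [["Diego", "22"], ["Eva", "38"], ["Fabio", "50"]],
        [["Gil", "29"]],
    ]
    for number, rows in enumerate(data, start=1):
        visited.append(number)
        yield rows


result = scrape_paginated(fake_pages(), 4)
print(result)   # [['Ana', '31'], ['Bruno', '27'], ['Carla', '45'], ['Diego', '22']]
print(visited)  # [1, 2]
```

Because the pages are consumed lazily, the loop stops before requesting a page it does not need. In a real browser, that saves a click and a page load.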


## 🛡️ License
This project is licensed under the MIT License. You may use, modify, and distribute this code freely, provided the original copyright and license notice (the credits) is retained.


## 📫 Contact
If you have questions or suggestions, or run into problems, get in touch:

Email: alexdiogo@desafiosrpa.com.br

---



Download files

Download the file for your platform.

Source Distribution

jornadarpa_webscrap-0.1.1.tar.gz (4.1 kB)

Built Distribution


JornadaRPA.WebScrap-0.1.1-py3-none-any.whl (4.7 kB, Python 3)

File details

Details for the file jornadarpa_webscrap-0.1.1.tar.gz.

File metadata

  • Download URL: jornadarpa_webscrap-0.1.1.tar.gz
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for jornadarpa_webscrap-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d3ee6a2d7a84d3005adaadba9f351248298e119f5e03c10521fdc535c78313d |
| MD5 | cd9be0af7aa5fd9808c08fb2e5517452 |
| BLAKE2b-256 | 3efbe8920ae33052a555f9eab1dc6ac34822aeb2e3cd028808c2afcc54926c5a |


File details

Details for the file JornadaRPA.WebScrap-0.1.1-py3-none-any.whl.


File hashes

Hashes for JornadaRPA.WebScrap-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 572953ebd619cbe7c40684d47b09e22aaa16a99f2dfa3218e202c48f8b4b1405 |
| MD5 | 3c0fd751dabcefa900a48310ee9b7c79 |
| BLAKE2b-256 | f26802cf88b05c935215e3e3e6829f3c4673307b274f507b2eac7b5c5dedc910 |

