
Web table scraping module for JornadaRPA


# JornadaRPA.WebScrap

JornadaRPA.WebScrap is a Python module that simplifies scraping data from tables on web pages, using BotCity Web Automation and Pandas.


## 🚀 Features

  • Extracts tabular data from web pages.
  • Supports automation with the BotCity framework.
  • Returns the data as a Pandas DataFrame.
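JornadaRPA.WebScrap does not document how it reads a table internally, so the snippet below is only a minimal sketch, using the standard-library `html.parser`, of what extracting tabular data involves: collecting each `<tr>` row as a list of cell texts. `TableParser` and `parse_table` are hypothetical names, not part of the module's API.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collects <th>/<td> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows; each row is a list of cell strings
        self._row = None  # row currently being built, or None outside a <tr>
        self._cell = None # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell (ignores whitespace between tags).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows as lists of strings (the first row is usually the header)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.rows
```

Turning such rows into a Pandas DataFrame could then be as simple as `pd.DataFrame(rows[1:], columns=rows[0])`, assuming the first row holds the headers.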

## 🛠️ Prerequisites

Make sure the following packages are installed:

  • botcity-framework-web
  • pandas

To install them:

```bash
pip install botcity-framework-web pandas
```
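To confirm both prerequisites are present before running a bot, a small standard-library check can query the installed distributions by the same names used in the `pip` command. This is an illustrative helper (`check_requirements` is a hypothetical name), not something the module provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip command above.
REQUIRED = ("botcity-framework-web", "pandas")


def check_requirements(names=REQUIRED):
    """Hypothetical helper: map each distribution to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_requirements().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```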

## 📦 Usage

1. Start the BotCity WebBot

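```python
from botcity.web import WebBot

# Initialize the bot
bot = WebBot()
# Depending on your environment, you may first need to point the bot at a
# WebDriver executable, e.g. bot.driver_path = "/path/to/chromedriver"
bot.start_browser()
bot.navigate_to("https://sua-pagina-web.com")
```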
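The step 1 snippet leaves the browser open if anything later fails. One way to guarantee cleanup, sketched here under the assumption that you drive the browser through WebBot's `start_browser()`, `navigate_to()` and `stop_browser()` methods, is to wrap the session in a context manager. `browser_session` is a hypothetical helper, not part of BotCity or JornadaRPA.WebScrap.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(bot, url):
    """Hypothetical helper: open the browser on `url` and always close it afterwards.

    `bot` is assumed to be a botcity.web.WebBot, or any object exposing the
    same start_browser / navigate_to / stop_browser methods.
    """
    bot.start_browser()
    try:
        bot.navigate_to(url)
        yield bot
    finally:
        # Runs even if navigation or scraping raises, so no browser is left open.
        bot.stop_browser()
```

For example, `with browser_session(WebBot(), "https://sua-pagina-web.com") as bot:` followed by the `scraper.webscrap(...)` call from step 2, indented inside the `with` block.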

2. Use the WebScrap module

```python
from jornadaRPA.webScrap import Webscrap

# Create the scraper
scraper = Webscrap()

# Extract data from the table (uses the `bot` started in step 1)
data = scraper.webscrap(
    inBot=bot,
    inLines=10,                       # Maximum number of rows to extract
    inNext="//button[@id='next']",    # XPath of the "Next" (pagination) button
    inXPATH="//table[@id='data']"     # XPath of the table
)

# Display the data (a Pandas DataFrame)
print(data)

# Close the browser when you are done
bot.stop_browser()
```
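The comments in the example suggest that `inLines` caps the number of rows and `inNext` points at the button that moves to the next page of the table. The module's actual logic is not documented here; the sketch below only shows the general pagination pattern those parameters imply, with the page-reading and page-advancing steps passed in as hypothetical callbacks so the loop can run without a browser.

```python
def collect_rows(read_page, go_next, max_rows):
    """Hypothetical sketch: gather up to `max_rows` rows across paginated table pages.

    read_page() -> list of rows on the current page (e.g. read from the table XPath).
    go_next()   -> True if it moved to another page, False if there is no next
                   page (e.g. the "Next" button is missing or disabled).
    """
    rows = []
    while len(rows) < max_rows:
        page = read_page()
        if not page:
            break  # empty page: nothing more to read
        rows.extend(page)
        # Stop once the limit is reached or there is no further page.
        if len(rows) >= max_rows or not go_next():
            break
    return rows[:max_rows]  # trim rows read beyond the limit on the last page
```

With BotCity, `go_next` might locate the element at the `inNext` XPath with `bot.find_element(xpath, By.XPATH)` (where `By` comes from `selenium.webdriver.common.by`) and click it, returning `False` when no element is found.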


## 🛡️ License
This project is licensed under the MIT License. You may use, modify, and distribute this code freely, provided you keep the attribution.


## 📫 Contact
If you have questions or suggestions, or run into problems, get in touch:

Email: alexdiogo@desafiosrpa.com.br

---



Download files


Source Distribution

JornadaRPA.WebScrap-0.1.0.tar.gz (3.8 kB)

Built Distribution


JornadaRPA.WebScrap-0.1.0-py3-none-any.whl (4.5 kB, Python 3)

File details

Details for the file JornadaRPA.WebScrap-0.1.0.tar.gz.

File metadata

  • Download URL: JornadaRPA.WebScrap-0.1.0.tar.gz
  • Upload date:
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for JornadaRPA.WebScrap-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 326a36418ae60006fef879b10dd24c58b62187f6b793a394b563b281a7622b08 |
| MD5 | e8241ce62d4279b011127870c3e608c7 |
| BLAKE2b-256 | 7cd490584bb6b1a2493c820ff3c22a121bb7da4e33e1f5189d79c8e36a61130d |


File details

Details for the file JornadaRPA.WebScrap-0.1.0-py3-none-any.whl.


File hashes

Hashes for JornadaRPA.WebScrap-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f85af6ff49624080380f41186d2b5b96aa85b4bf27d652a00e9d10e18a70ed1d |
| MD5 | 5349c800c9b4027db6ea3ec2488999d7 |
| BLAKE2b-256 | db21ab57e5a8ef504fbe8295c1baa7c175a564aace85448ee6b24e81b0c0faef |
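To check a downloaded file against the SHA256 digests listed above, Python's `hashlib` is enough. The helpers below (`sha256_of`, `verify`, and the `EXPECTED` mapping) are illustrative names, not part of any published tool; the digests are copied from the hash tables on this page.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 digests, copied from the tables above.
EXPECTED = {
    "JornadaRPA.WebScrap-0.1.0.tar.gz":
        "326a36418ae60006fef879b10dd24c58b62187f6b793a394b563b281a7622b08",
    "JornadaRPA.WebScrap-0.1.0-py3-none-any.whl":
        "f85af6ff49624080380f41186d2b5b96aa85b4bf27d652a00e9d10e18a70ed1d",
}
```

For example, `verify("JornadaRPA.WebScrap-0.1.0.tar.gz", EXPECTED["JornadaRPA.WebScrap-0.1.0.tar.gz"])` returns `True` only if the local file is byte-for-byte the published one. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:<digest>` entries in a requirements file).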
