scrapy-html

🌐 A simple scraper that returns the full HTML of a URL using BeautifulSoup

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

🚀 README.md

# 🌐 Scrapy-HTML

🔍 **Scrapy-HTML** is a small Python package that fetches the full HTML content of a web page given its URL. It uses the **BeautifulSoup4** and **Requests** libraries to download the page and return its HTML in a structured, readable form.

---

## ✨ **Key Features**

- 🌎 Scrapes any web page reachable through a valid URL.
- ⚡ Returns formatted, readable HTML using `BeautifulSoup.prettify()`.
- 🔒 Error handling for invalid URLs and network problems.
- 💡 Lightweight and easy to use, with minimal dependencies.
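
As a rough illustration of how these features fit together, here is a minimal sketch of a fetch-and-prettify helper. It is not the package's actual source: the name `fetch_pretty_html`, the `timeout` and `http_get` parameters, and the choice of exceptions are all assumptions made for this example. It relies only on `requests.get`, `Response.raise_for_status()`, and `BeautifulSoup(...).prettify()`.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def fetch_pretty_html(url, timeout=10.0, http_get=requests.get):
    """Hypothetical helper: download a page and return its HTML pretty-printed."""
    # Reject anything that is not an absolute http(s) URL before going online.
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")

    try:
        response = http_get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        # Assumption: network and HTTP failures are reported as RuntimeError.
        raise RuntimeError(f"Could not fetch {url!r}: {exc}") from exc

    # Parse with the built-in parser and re-serialise with indentation.
    return BeautifulSoup(response.text, "html.parser").prettify()


if __name__ == "__main__":
    try:
        print(fetch_pretty_html("https://www.example.com"))
    except (ValueError, RuntimeError) as err:
        print(f"Scrape failed: {err}")
```

Passing `http_get` explicitly makes the helper easy to exercise without network access; by default it uses `requests.get`.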

---

## ⚡ **Installation**

To install the package directly from **PyPI**, run:

```bash
pip install scrapy_html
```

## 💻 **How to Use**

🔥 Basic usage example:

```python
from scrapy_html.scraper import get_html_content

# 🌐 URL of the page to scrape
url = "https://www.example.com"

# 🔄 Fetch the page's HTML content
html = get_html_content(url)

# 📝 Print the formatted HTML
print(html)
```

🔍 Expected output (abridged and illustrative; exact whitespace depends on `BeautifulSoup.prettify()`):

```html
<html>
  <head>
    <title>Example Domain</title>
  </head>
  <body>
    <div>
      <h1>Example Domain</h1>
      <p>This domain is for use in illustrative examples in documents.</p>
    </div>
  </body>
</html>
```

## 🛠 **Requirements**

- Python >= 3.6
- beautifulsoup4
- requests

The dependencies are installed automatically by `pip install scrapy-html`.

## 🧪 **Tests**

This project includes basic tests written with pytest. To run them locally:

```bash
pip install pytest
pytest tests/
```
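
The contents of `tests/test_scraper.py` are not shown here. One way to keep such tests offline is to replace `requests.get` with a mock, as in the sketch below. `get_html_content_stub` is a hypothetical stand-in written for this example; whether the real `get_html_content` calls `requests.get` in this way is an assumption.

```python
from unittest.mock import Mock, patch

import requests
from bs4 import BeautifulSoup


def get_html_content_stub(url):
    """Stand-in for scrapy_html.scraper.get_html_content (assumed behaviour)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser").prettify()


def test_returns_prettified_html():
    # Fake response object exposing only what the stub touches.
    fake = Mock()
    fake.text = "<html><body><h1>Hello</h1></body></html>"
    fake.raise_for_status.return_value = None

    # While patched, requests.get returns the fake instead of going online.
    with patch("requests.get", return_value=fake) as mocked_get:
        html = get_html_content_stub("https://www.example.com")

    mocked_get.assert_called_once()
    assert "<h1>" in html
    assert "Hello" in html
```

pytest collects any function whose name starts with `test_`, so saving this in a file under `tests/` is enough for `pytest tests/` to pick it up.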

## 🎨 **Planned Features**

- 🌐 Support for alternative parsers (lxml, html5lib).
- 🔄 Asynchronous scraping for better performance.
- ⚡ Downloading static assets (images, CSS, JS).
- 🎛 Additional parameters for partial scraping.
- 🧪 More advanced automated tests using requests-mock.
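
None of these features exist yet. To give a rough idea of what asynchronous scraping could look like, the sketch below runs a blocking fetch function for several URLs concurrently using `asyncio.to_thread` (Python 3.9+, which is newer than the package's stated minimum). The name `fetch_all` and its `fetch` parameter are hypothetical, and this is only one possible approach, not the author's planned design.

```python
import asyncio
from typing import Callable, Dict, Iterable


async def fetch_all(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Run a blocking `fetch(url)` for every URL concurrently in worker threads."""
    url_list = list(urls)
    # gather() returns results in the same order as the awaitables it was given.
    pages = await asyncio.gather(*(asyncio.to_thread(fetch, u) for u in url_list))
    return dict(zip(url_list, pages))


if __name__ == "__main__":
    # Placeholder fetcher so the sketch runs offline; in practice one could
    # pass scrapy_html.scraper.get_html_content instead.
    def fake_fetch(url: str) -> str:
        return f"<html><!-- {url} --></html>"

    result = asyncio.run(fetch_all(["https://a.example", "https://b.example"], fake_fetch))
    print(sorted(result))
```

Because the work happens in threads, this helps mainly when the time is spent waiting on the network rather than parsing.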

## 🏗 **Project Structure**

```
scrapy_html/
│
├── scrapy_html/             # 📦 Main code
│   ├── __init__.py
│   └── scraper.py           # ⚡ Main scraper function
│
├── tests/                   # 🧪 Automated tests
│   └── test_scraper.py
│
├── setup.py                 # ⚙️ PyPI configuration
├── pyproject.toml           # 📦 Modern build configuration
├── README.md                # 📚 Project documentation
├── LICENSE                  # 📜 MIT License
└── MANIFEST.in              # 📋 Inclusion of extra files
```

## 🔧 **Contributing**

Contributions are welcome! 🚀
To contribute, follow these steps:

1. Fork this repository.
2. Create a new branch:

   ```bash
   git checkout -b my-new-feature
   ```

3. Make your changes and commit them:

   ```bash
   git commit -m "✨ Add an awesome new feature"
   ```

4. Push to the branch:

   ```bash
   git push origin my-new-feature
   ```

5. Open a Pull Request. 💡

## 📝 **License**

Distributed under the MIT License. See the LICENSE file for more information.

## 👨‍💻 **Author**

Developed by Roberto Lima 🚀✨

## 💬 **Contact**

- 📧 Email: robertolima.izphera@gmail.com
- 💼 LinkedIn: Roberto Lima

## ⭐ **Like the project?**

Leave a ⭐ on the repository and share it with the community! 🚀✨


---

## 🌟 **What does this README offer?**
- 🎯 **A clear description** of the project and its purpose.
- 🛠 **Detailed installation instructions** and **practical usage**.
- 🧪 **A testing guide** to check that the code works.
- 🏗 **The project structure** to make navigation easier.
- 🔄 **A contributing section** for anyone who wants to help with development.
- 📝 **License and author information** for transparency.

Release history

- 1.4.1: May 29, 2025
- 1.4.0: May 29, 2025
- 1.3.0: May 6, 2025
- 1.2.0: Apr 29, 2025
- 1.1.5: Apr 16, 2025
- 1.1.4: Feb 25, 2025
- 0.1.4: Feb 24, 2025
- 0.1.3: Feb 24, 2025 (this version)
- 0.1.2: Feb 24, 2025
- 0.1.1: Feb 24, 2025
- 0.1.0: Feb 24, 2025

Download files

Source Distribution

scrapy_html-0.1.3.tar.gz (4.3 kB)

Uploaded Feb 24, 2025 Source

Built Distribution

scrapy_html-0.1.3-py3-none-any.whl (4.3 kB)

Uploaded Feb 24, 2025 Python 3

File details

Details for the file scrapy_html-0.1.3.tar.gz.

File metadata

Download URL: scrapy_html-0.1.3.tar.gz
Upload date: Feb 24, 2025
Size: 4.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for scrapy_html-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`5bc165bfb260700183809e31422590d92f5e955318432af674f43d04a9d1d65c`
MD5	`f3bb3f160d519057146dd720ead32cd8`
BLAKE2b-256	`a8448342d903e5c402540a6e2da8f13f1e3cffc48c079d6297f11238d2a9ec22`
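
To check a downloaded file against the SHA256 digest listed above, the file can be hashed locally with Python's `hashlib`. This is a minimal sketch; the helper name `sha256_matches` and the assumption that the archive sits in the current directory are for illustration only.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file at `path` has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())


if __name__ == "__main__":
    # Digest copied from the table above; the local path is an assumption.
    expected = "5bc165bfb260700183809e31422590d92f5e955318432af674f43d04a9d1d65c"
    archive = Path("scrapy_html-0.1.3.tar.gz")
    if archive.exists():
        print("digest OK" if sha256_matches(archive, expected) else "digest MISMATCH")
    else:
        print(f"{archive} not found in the current directory")
```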


File details

Details for the file scrapy_html-0.1.3-py3-none-any.whl.

File metadata

Download URL: scrapy_html-0.1.3-py3-none-any.whl
Upload date: Feb 24, 2025
Size: 4.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for scrapy_html-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`976b75b1fd65685bfec21588c4a960654867bd3efc5bf298f2f66353ff44de6d`
MD5	`927c6c76250d913fdfea95c4a155be21`
BLAKE2b-256	`2fdce9fbb34e1dfcf1dcd83f4f8bac87f4c99da89f6b10ce9736d8b25ac38945`
