A Python-based web scraping tool for collecting financial data from multiple sources
Project description
Financial Scraper
A Python-based web scraping tool for collecting and analyzing financial data from multiple sources. This project helps you gather information about stocks from various financial websites.
Features
- Scrapes financial data from multiple sources:
- FundsExplorer
- StatusInvest
- Investidor10
- TradingView
- DadosDeMercado
- Collects information about:
- Stocks
- REITs (Brazilian FIIs) dividends
- Stock details from TradingView (name, sector, logo)
- Complete list of stocks from B3 (Brazilian stock exchange)
- Automatically saves data in organized CSV format
- Modular architecture for easy extension
Disclaimer: This tool relies on web scraping techniques to collect data from financial websites. If any of the algorithms stop working, it may be due to changes in the structure or content of the websites being scraped. Web scraping is inherently fragile and dependent on website stability. Regular maintenance may be required to adapt to website changes.
Prerequisites
- Python 3.10 or higher
- Poetry (for dependency management)
Installation
- Clone the repository:
git clone https://github.com/johnazedo/financial-scraper.git
cd financial-scraper
- Install dependencies using Poetry:
poetry install
Usage
Collect Stock Data
Get stocks financial data from status invest site or fundamentus site
poetry run example_status_invest
poetry run example_fundamentus
poetry run example_investor_ten
poetry run example_trading_view
poetry run example_market_data
Python API
You can also use Financial Scraper as a Python library in your own code:
Using the Status Invest Provider
from financial_scraper import StatusInvestProvider
import os
# Set the download path
download_path = os.path.dirname(os.path.abspath(__file__))
# Initialize the provider
provider = StatusInvestProvider(
download_path=download_path,
)
# Fetch all stocks
provider.run()
# Fetch stocks from a specific sector
provider.run(sector=StatusInvestProvider.Sector.FINANCIAL_AND_OTHERS)
Using the Fundamentus Provider
from financial_scraper import FundamentusProvider
import os
# Set the download path
download_path = os.path.dirname(os.path.abspath(__file__))
# Initialize the provider
provider = FundamentusProvider(
download_path=download_path,
)
# Fetch and save data
provider.run()
Using the InvestorTen Provider
from financial_scraper import InvestorTenProvider
import os
# Set the download path
download_path = os.path.dirname(os.path.abspath(__file__))
# Initialize the provider
provider = InvestorTenProvider(
download_path=download_path,
)
# Fetch REIT dividend data for a specific year
provider.run(year="2023")
# You can also specify a custom filename for the output
provider = InvestorTenProvider(
download_path=download_path,
filename="fiis-dividends-2023.csv"
)
provider.run(year="2023")
Using the TradingView Provider
from financial_scraper import TradingViewProvider
import os
# Set the download path
download_path = os.path.dirname(os.path.abspath(__file__))
# Initialize the provider
provider = TradingViewProvider(
download_path=download_path,
)
# Fetch stock data for specific tickers
provider.run(stocks=["PETR4", "VALE3", "ITUB4", "BBDC4"])
# You can also specify a custom filename for the output
provider = TradingViewProvider(
download_path=download_path,
filename="brazilian_stocks_info.csv"
)
provider.run(stocks=["PETR4", "VALE3", "ITUB4", "BBDC4"])
Using the MarketData Provider
from financial_scraper import MarketDataService
import os
# Set the download path
download_path = os.path.dirname(os.path.abspath(__file__))
# Initialize the provider
provider = MarketDataService(
download_path=download_path,
)
# Download the complete list of B3 stocks
provider.run()
# You can also specify a custom filename for the output
provider = MarketDataService(
download_path=download_path,
filename="b3_stocks_list.csv",
show_browser=True # Set to True to see the browser during execution
)
provider.run()
Project Structure
├── LICENSE
├── poetry.lock
├── pyproject.toml
├── README.md
├── CONTRIBUTING.md
├── mkdocs.yml
├── docs/ # Documentation files
│ ├── index.md # Main documentation page
│ ├── examples.md # Usage examples
│ ├── getting-started/# Installation and basic usage
│ └── modules/ # Module-specific documentation
├── examples/ # Example usage scripts
│ └── usage.py # Example implementation
├── financial_scraper/ # Core package
│ ├── __init__.py # Package exports
│ ├── config/ # Configuration utilities
│ │ ├── __init__.py
│ │ ├── selenium.py # Selenium configuration
│ │ └── utils.py # Utility functions and logging
│ └── providers/ # Data providers
│ ├── __init__.py
│ ├── fundamentus.py # Fundamentus scraper
│ ├── investor_ten.py # Investidor10 scraper
│ ├── market_data.py # DadosDeMercado scraper
│ ├── status_invest.py # StatusInvest scraper
│ └── trading_view.py # TradingView scraper
Dependencies
- beautifulsoup4 - Web scraping and parsing
- requests - HTTP requests
- selenium - Web browser automation
- pandas - Data manipulation and analysis
Author
- João Pedro Limão (jplimao077@gmail.com)
License
This project is licensed under the terms of the LICENSE file included in the repository.
Documentation
The documentation for this project, including code comments and provider-specific guides, was enhanced using AI assistance. The AI helped to create comprehensive docstrings, usage examples, and module explanations to make the codebase more accessible to contributors and users.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file financial_scraper-1.1.0.tar.gz.
File metadata
- Download URL: financial_scraper-1.1.0.tar.gz
- Upload date:
- Size: 23.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.14.0-1012-azure
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f1d5cf339ae4e97220e892597d2637700d49d57cdbb1fab5ab52d10df7518ac8
|
|
| MD5 |
9101a67e30b6cd8995d9745476126853
|
|
| BLAKE2b-256 |
1c1bf3632d9585bf56d953bcbafc19ad497c0561c88e713d09a32f456c0e26f2
|
File details
Details for the file financial_scraper-1.1.0-py3-none-any.whl.
File metadata
- Download URL: financial_scraper-1.1.0-py3-none-any.whl
- Upload date:
- Size: 31.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.14.0-1012-azure
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c3902b10db3a31b2e89e49cf2f6c664fda7f30e5f227e0cf8cc95c1b6e91852
|
|
| MD5 |
58afb3eeba8230e7911ef057b72b8122
|
|
| BLAKE2b-256 |
2ecb17219ff9cc437917d3bb2917ac97ca99f928b0abc896a65855a491109a05
|