Web scraping API for Finnish websites
finscraper
The library provides an easy-to-use API for fetching data from various Finnish websites:
Website | URL | Type | Spider API class |
---|---|---|---|
IltaSanomat | https://www.is.fi | News article | ISArticle |
Iltalehti | https://www.il.fi | News article | ILArticle |
YLE | https://www.yle.fi/uutiset | News article | YLEArticle |
Vauva | https://www.vauva.fi | Discussion thread | VauvaPage |
Oikotie | https://asunnot.oikotie.fi/myytavat-asunnot | Apartment ad | OikotieApartment |
Documentation is available at https://finscraper.readthedocs.io. A simple online demo is also available.
Installation
pip install finscraper
Quickstart
Fetch 10 news articles as a pandas DataFrame from IltaSanomat:
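from finscraper.spiders import ISArticle

# Scrape 10 articles; scrape() returns the spider itself
spider = ISArticle().scrape(10)
# Get the scraped articles as a pandas DataFrame
articles = spider.get()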
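The same pattern should carry over to the other spiders in the table. The following is a minimal sketch, not part of the library: `SPIDER_BY_HOST`, `spider_name_for` and `fetch_items` are hypothetical helper names, and it assumes that every spider class listed in the table can be imported from `finscraper.spiders` and supports the same `scrape(n)` and `get()` calls as `ISArticle`.

```python
import importlib
from urllib.parse import urlparse

# Host -> spider class name, copied from the table of supported websites.
SPIDER_BY_HOST = {
    "www.is.fi": "ISArticle",
    "www.il.fi": "ILArticle",
    "www.yle.fi": "YLEArticle",
    "www.vauva.fi": "VauvaPage",
    "asunnot.oikotie.fi": "OikotieApartment",
}


def spider_name_for(url):
    """Return the spider class name listed for the URL's host."""
    host = urlparse(url).netloc.lower()
    try:
        return SPIDER_BY_HOST[host]
    except KeyError:
        raise ValueError(f"No spider listed for host {host!r}") from None


def fetch_items(url, n=10):
    """Scrape n items with the spider that matches the URL's host.

    The URL is only used to pick the spider; it is not passed to it.
    """
    # Imported lazily so the lookup above works without finscraper installed.
    # Assumption: all listed spider classes live in finscraper.spiders.
    spiders = importlib.import_module("finscraper.spiders")
    spider_cls = getattr(spiders, spider_name_for(url))
    # Same call chain as the quickstart: scrape(n), then get().
    return spider_cls().scrape(n).get()
```

For example, `fetch_items("https://www.yle.fi/uutiset", 5)` would select `YLEArticle` and, under the assumptions above, return five scraped items. Keeping the lookup separate from the scraping call makes it possible to check which spider would be used without touching the network.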
Contributing
When websites change, spiders tend to break. I can't promise to keep this repository up to date by myself, so pull requests are more than welcome!
Jesse Myrberg (jesse.myrberg@gmail.com)