obscraper: scrape posts from the overcomingbias blog
obscraper lets you scrape blog posts and associated metadata from the overcomingbias blog.
It’s easy to get a single post:
>>> import obscraper
>>> intro_url = 'https://www.overcomingbias.com/2006/11/introduction.html'
>>> post = obscraper.get_post_by_url(intro_url)
>>> post.title
'How To Join'
>>> post.plaintext
'How can we better believe what is true? ...'
>>> post.internal_links
{'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1, 'http://www.overcomingbias.com/2006/12/contributors_be.html': 1}
>>> post.comments
20
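As the example shows, a post's internal_links maps the URL of another overcomingbias post to the number of times it is linked. The sketch below assumes you have already collected those dicts into a mapping keyed by each post's own URL; the links_by_post input and the count_inbound_links helper are hypothetical and not part of obscraper. It totals how often each post is linked to, using collections.Counter.

```python
from collections import Counter
from typing import Mapping


def count_inbound_links(links_by_post: Mapping[str, Mapping[str, int]]) -> Counter:
    """Total the inbound links each post receives from the given posts.

    Hypothetical helper: `links_by_post` maps a post URL to that post's
    internal_links dict ({linked URL: number of links}).
    """
    inbound: Counter = Counter()
    for internal_links in links_by_post.values():
        inbound.update(internal_links)  # Counter.update adds counts together
    return inbound


# Hypothetical input built from the internal_links shown for the intro post.
links_by_post = {
    'https://www.overcomingbias.com/2006/11/introduction.html': {
        'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1,
        'http://www.overcomingbias.com/2006/12/contributors_be.html': 1,
    },
}
print(count_inbound_links(links_by_post).most_common(2))
```

If you later join these counts back to post URLs, note that the example mixes http:// link targets with an https:// post URL, so some URL normalisation may be needed.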
Or a full list of post URLs and edit dates:
>>> import obscraper
>>> edit_dates = obscraper.get_edit_dates()
...
>>> len(edit_dates)
4352
>>> {url: str(edit_dates[url]) for url in list(edit_dates)[:5]}
{'2022/01/much-talk-is-sales-patter': '2022-01-14 20:46:35+00:00', '2022/01/old-man-rant': '2022-01-13 15:21:33+00:00', '2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022': '2022-01-12 19:15:10+00:00', '2022/01/to-innovate-unify-or-fragment': '2022-01-11 01:03:44+00:00', '2022/01/on-what-is-advice-useful': '2022-01-10 18:46:26+00:00'}
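The edit dates print with a +00:00 suffix, which suggests timezone-aware UTC datetime values; that type is an assumption, not something stated here. Under that assumption, a sketch of selecting the posts edited since a given moment, newest first. The edited_since helper is hypothetical, and the sample data reproduces the five entries shown in the example.

```python
from datetime import datetime, timezone
from typing import Dict, List


def edited_since(edit_dates: Dict[str, datetime], since: datetime) -> List[str]:
    """Return URLs last edited at or after `since`, newest first.

    Hypothetical helper. Assumes the values are timezone-aware datetimes;
    `since` must also be aware, since comparing aware and naive datetimes
    raises TypeError.
    """
    recent = [url for url, edited in edit_dates.items() if edited >= since]
    return sorted(recent, key=lambda url: edit_dates[url], reverse=True)


# Sample data: the five entries from the example, as aware UTC datetimes.
sample_edit_dates = {
    '2022/01/much-talk-is-sales-patter': datetime(2022, 1, 14, 20, 46, 35, tzinfo=timezone.utc),
    '2022/01/old-man-rant': datetime(2022, 1, 13, 15, 21, 33, tzinfo=timezone.utc),
    '2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022': datetime(2022, 1, 12, 19, 15, 10, tzinfo=timezone.utc),
    '2022/01/to-innovate-unify-or-fragment': datetime(2022, 1, 11, 1, 3, 44, tzinfo=timezone.utc),
    '2022/01/on-what-is-advice-useful': datetime(2022, 1, 10, 18, 46, 26, tzinfo=timezone.utc),
}
print(edited_since(sample_edit_dates, datetime(2022, 1, 12, tzinfo=timezone.utc)))
```

With the cutoff at midnight UTC on 12 January 2022, this selects the three posts edited on the 12th, 13th and 14th.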
Features
Get posts by their URLs or edit dates, or get all posts hosted on the overcomingbias site
Provides detailed post metadata including post URLs, titles, authors, tags, publish dates, and last edit dates
Provides a summary of post content, including the full post text as HTML or plaintext and a list of hyperlinks to other overcomingbias posts
Asynchronous execution and caching for fast downloads
Use from Python via import obscraper, or through the simple command-line interface
Comprehensively tested
Supports Python 3.8+
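To illustrate the general idea behind combining asynchronous execution with caching, here is a generic sketch. It is not obscraper's implementation: the CachedFetcher class is hypothetical, and its _download method only simulates a network request. The design choice worth noting is that it caches the asyncio Task rather than the finished result, so concurrent requests for the same URL share a single download.

```python
import asyncio
from typing import Dict, List


class CachedFetcher:
    """Hypothetical fetcher: concurrent downloads with an in-memory cache.

    Not obscraper's code. `_download` only simulates a network request.
    """

    def __init__(self) -> None:
        # Maps URL -> Task, so concurrent callers share a single download.
        self._tasks: Dict[str, "asyncio.Task[str]"] = {}
        self.downloads = 0  # count of simulated (uncached) downloads

    async def _download(self, url: str) -> str:
        await asyncio.sleep(0)  # stand-in for real network I/O
        self.downloads += 1
        return f"<html>{url}</html>"

    async def fetch(self, url: str) -> str:
        task = self._tasks.get(url)
        if task is None:
            # Check-and-insert runs with no await in between, so two
            # coroutines cannot both start a download for the same URL.
            task = asyncio.create_task(self._download(url))
            self._tasks[url] = task
        return await task

    async def fetch_all(self, urls: List[str]) -> List[str]:
        # Run all fetches concurrently; gather keeps results in input order.
        return await asyncio.gather(*(self.fetch(u) for u in urls))


fetcher = CachedFetcher()
pages = asyncio.run(fetcher.fetch_all(["post-a", "post-b", "post-a"]))
print(pages, fetcher.downloads)
```

The duplicate "post-a" request reuses the cached task, so only two simulated downloads take place.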
Documentation
Read the full documentation, including the Installation and Getting Started Guide and the Public API Reference.
Bugs/Requests
Please use the GitHub issue tracker to submit bugs or request features.
Changelog
See the Changelog for a list of fixes and enhancements in each version.
License
Copyright (c) 2022 Christopher McDonald
Distributed under the terms of the MIT license.
All overcomingbias posts are copyright the original authors.
Download files
Source Distribution: obscraper-0.8.3.tar.gz
Built Distribution: obscraper-0.8.3-py3-none-any.whl
File details
Details for the file obscraper-0.8.3.tar.gz
File metadata
- Download URL: obscraper-0.8.3.tar.gz
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 98c54e35430805443d660138b3847c45f982ffd3db7c6ea5cb5f9d42c1ee6c75
MD5 | 40d4c11aff9e7822a4b11f85a4a6b880
BLAKE2b-256 | b8162fab493d46755b30c4972fb43cf93e55d469ad2e947ca1230e6cc581ab10
File details
Details for the file obscraper-0.8.3-py3-none-any.whl
File metadata
- Download URL: obscraper-0.8.3-py3-none-any.whl
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 35f717e5a0764176f1e537161fcfbc61123f51be6fa9533cc65e688f3fe2884b
MD5 | 08f20f2b590fe632e8145463318109b8
BLAKE2b-256 | 8b34606c14e523299c8495de1718fcdf055f7dcc3608c31e0dde9a75397932e1
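The SHA256 digests listed for the two 0.8.3 files can be used to check a manual download. A minimal sketch using only the standard library hashlib module; the helper names sha256_of_file and verify_download are hypothetical, and the expected digests are copied from the File hashes tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests from the "File hashes" tables for release 0.8.3.
EXPECTED_SHA256 = {
    "obscraper-0.8.3.tar.gz": "98c54e35430805443d660138b3847c45f982ffd3db7c6ea5cb5f9d42c1ee6c75",
    "obscraper-0.8.3-py3-none-any.whl": "35f717e5a0764176f1e537161fcfbc61123f51be6fa9533cc65e688f3fe2884b",
}


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of_file(path) == expected
```

For example, after downloading the source distribution into the current directory, verify_download("obscraper-0.8.3.tar.gz") should return True. When installing with pip, the same digest can instead be pinned in a requirements file with pip's --hash option.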