API for http://abclinuxu.cz.
Introduction
This module provides a basic API for crawling the http://abclinuxu.cz website.
Installation
The module is hosted on PyPI and can be installed using pip:
pip install abclinuxuapi
Documentation
Full module documentation is hosted at ReadTheDocs: http://abclinuxuapi.readthedocs.org
Disclaimer
The API was made by me (Bystroushaak) and it is not officially related to the http://abclinuxu.cz project.
Examples
Iterate over all published blogs:
>>> import abclinuxuapi
>>> for blog in abclinuxuapi.iter_blogposts():
...     print blog.title
...
Czech blacklist 1.0.21 iOS aplikace, filemanager, prehravani multimedii... ENCFS - lze doporucit? mozna uskali? Vývoj v C# + Oracle ODP.NET + EntityFramework Skončila svoboda? Abclinuxu - vyjádření k útokům Eliptické křivky - vztah Weierstrass, Montgomery, Edwards kopirovanie raspbianu na microsd kartu Půjdem dolem, půjdem horem? Podotčeno… Abclinuxu presmerovano... Dead man Valentýn 2018 (genderově korektní mikrozápisek) Textilosaurus - co je nového? Kvíz: Znáte český kraj? Název filmu Trilium Notes jako platforma pro mini-aplikace Marketingový "průzkum" pro zjištění obětí na další útok Vítězný únor 2018 Reverse engineering komunikace Xorg a nvidia driveru Vtipná konstrukce v shellu Anketa: Kdy budou další presidentské volby v ČR? Debian 9 a data corruption s detektivní zápletkou Proč je tolik povyku s meltdownem mezi normálními usery Tabletové skúsenosti pre ľahší život. ...
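Because the list of blogs is long, it can be handy to look at only the first few items. The following is a minimal sketch, assuming that iter_blogposts() returns a lazy iterator (the name suggests it, but this is not confirmed here). The fake_iter_blogposts() generator and the take() helper are hypothetical and exist only so the example runs without network access:

```python
import itertools


def fake_iter_blogposts():
    """Hypothetical stand-in for abclinuxuapi.iter_blogposts().

    Yields an endless lazy stream of placeholder titles.
    """
    n = 0
    while True:
        n += 1
        yield "Blogpost #{}".format(n)


def take(iterable, count):
    """Return the first `count` items of an iterable as a list."""
    return list(itertools.islice(iterable, count))


# Only five items are ever pulled from the generator.
first_five = take(fake_iter_blogposts(), 5)
print(first_five)
```

With the real module, the equivalent call would presumably be take(abclinuxuapi.iter_blogposts(), 5), which stops crawling once enough blogs have been collected.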
Get structured information for a specific blog:
>>> blog = abclinuxuapi.Blogpost("https://www.abclinuxu.cz/blog/bystroushaak/2017/9/autorske-okenko-neal-asher", lazy=False)
>>> blog.created_ts
1506733800.0
>>> blog.last_modified_ts
1508752260.0
>>> blog.tags
['knihy', 'ProtectedByTagManager', 'recenze', 'sci-fi']
>>> blog.has_tux
False
>>> blog.rating
Rating(100%@5)
>>> blog.readed
1470
>>> blog.comments_n
73
>>> blog.comments[65]
Comment(username=andrea, id=18)
>>> blog.comments[65].registered
False
>>> blog.comments[65].timestamp
1506861120.0
>>> print blog.comments[65].text
supr blogísky, ráda je čtu.
<p class="separator"></p>
myslím že jsem tu od Tebe viděla souhrn knih, které jsi přečetl. měl bys třeba top50 sci-fi, které bych si určitě měla přečíst? nebo alespoň top 10, první trojka?
>>> blog.comments[65].responses
[Comment(username=bystroushaak, id=19)]
>>> print blog.text
<h2>Autorské okénko: Neal Asher</h2>
<p>Dvacátého září jsem dočetl všechno...
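The session above shows that timestamps are plain UNIX floats and that each comment carries a list of responses. A minimal sketch of how these could be turned into a readable thread follows. It assumes that responses contains comment objects with the same attributes, and it interprets timestamps as UTC, which may not match how the library produces them. FakeComment, format_ts() and iter_thread() are hypothetical helpers, and the timestamp of the reply is made up for illustration:

```python
import time
from collections import namedtuple

# Hypothetical stand-in for the library's Comment objects, so the sketch
# runs offline. Only attributes shown in the session above are used.
FakeComment = namedtuple("FakeComment", "username id timestamp responses")


def format_ts(ts):
    """Convert a UNIX timestamp (seconds) to a UTC date string."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))


def iter_thread(comment, depth=0):
    """Yield (depth, comment) for a comment and all its nested responses."""
    yield depth, comment
    for response in comment.responses:
        for item in iter_thread(response, depth + 1):
            yield item


# The reply timestamp below is invented; the root values come from the session.
reply = FakeComment("bystroushaak", 19, 1506862000.0, [])
root = FakeComment("andrea", 18, 1506861120.0, [reply])

thread_lines = [
    "{}{} (#{}) at {}".format("  " * depth, c.username, c.id, format_ts(c.timestamp))
    for depth, c in iter_thread(root)
]
print("\n".join(thread_lines))
```

With real data, iter_thread(blog.comments[65]) would be the analogous call. Since blog.comments appears to be a flat list that already contains the replies, walking from every item in it would visit some comments more than once.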
Changelog
0.4.16
abclinuxu_uploader.py: Detect images bigger than 1 MB. Added the --url parameter to handle these.
concept.py: Detect upload of images bigger than 1 MB and raise ValueError in such cases.
0.4.15
Improved error detection when the title is too long.
0.4.14
Fixed bug in parsing of number of comments from blog description.
0.4.13
Fixed parsing of http://www.abclinuxu.cz/blog/luv/2016/4/mockgeofix-mock-geolokace-kompatibilni-s-android-emulatorem where there are no comments.
0.4.12
Added the abclinuxuapi.number_of_blog_pages() function to find out how many blogs there are.
0.4.11
Added banlist for comment parsing on certain blogs (see HTML source on http://abclinuxu.cz/blog/Strider_BSD_koutek/2006/8/objevil-jsem-ameriku for details).
0.4.0 - 0.4.10
Added badges to README.
Blogpost.comments is now an empty list by default instead of None.
Fixed bugs in uploader.
Parsing of the tags updated.
Added support for Blog.uid.
Fixed bugs in tests (new year parsing).
Added the option to bypass lazy tag parsing.
Fixed bug in date parsing function.
Added support for parsing of more obscure date formats used by articles on abclinuxu.
Fixed another bug in date parsing function.
Added verify=False, because the SSL library pisses me off.
Added another special case of parsing the date.
Fixed another problem with date formats.
Fixed a problem with parsing comments on http://abclinuxu.cz/blog/msk/2016/8/hlada-sa-linux-embedded-vyvojar, where there are no links to comments.
Fixed comment parsing in case of http://abclinuxu.cz/blog/leos/2007/2/prepis-diskusniho-fora-hw-sekce#31
0.3.0 - 0.3.11
Added parsing of comments under blogposts.
Fixed bugs.
Fixed bugs in user.py.
Added iter_blogposts(), first_blog_page() functions for browsing the bloglist.
Implemented Blogpost.get_image_urls().
Added date_izolator(). Fixed bugs in comments parsing with relative dates.
Fixed bug in parsing of Blogpost’s content.
Added blog iterator to the User object.
Fixed #4 - bug in username parsing.
Fixed parsing of censored comments.
Added Comment.censored.
Comment.registered_user renamed to Comment.registered.
Fixed bug which skipped censored comments.
Fixed problems with old blogs (different HTML).
Implemented #6: .__repr__() for all important classes.
Fixed #7 - blogs with opening HTML comments in perex.
Fixed bug in Blogpost._parse_content_tag().
Another attempt to solve shit in old blogs. There are missing tags, crossed tags and a lot of other shitfucks.
Fixed bug caused by http://abclinuxu.cz/blog/Mostly_IMDB/2008/6/radeon-hd-4850-a-tak-vubec#17
Added a lot of documentation, fixed docstrings and so on.
User.has_blog() changed to bool property User.has_blog.
Concept class refactored.
Added new parameter data for shared.download().
User.ts_to_concept_date moved to shared.ts_to_concept_date().
0.2.0
Added a lot of features.
Fixed broken setup.py.
0.1.0
Created.
It can now be used to read data from abclinuxu, but it is incomplete and still needs a lot of work.