PyBookmark
A bookmark.html parser, merger, and viewer written in pure Python
- parse bookmark.html files from browsers, including their HTML structure
- merge the parsed bookmark.html files
- export the parsed and merged bookmarks as a JSON archive
- view, edit, and add to the bookmarks stored in the JSON archive through a GUI
Package Justification
PyBookmark exists to solve a problem you may not have. Read the following to understand the trade-offs.
Why
You should use PyBookmark if:
- you have many different bookmark html files saved over time
- you wish to merge your bookmark history from multiple computers or files into one view
- you wish to separate the bookmark manager from the browser
- you wish to reduce the potential for fingerprint tracking (which bookmarks exist, unique icon file checksums or URLs)
- you wish to reduce clutter in bookmarks (icons)
- you are tired of Firefox (or another browser) changing which fields can be viewed or edited
  - example: description, keywords, and tags are intermittently viewable
- you are tired of Firefox (or another browser) breaking or changing how bookmark editing works
  - example: recently Firefox made it so edits in the bookmark organizer did not save
- you want a more powerful bookmark search method
- you like control
Why Not
You should not use PyBookmark if:
- you are happy with native browser bookmark management
- you have very few bookmarks or all your bookmarks are in one file already
- you need or want in-application multi-device synchronization or cloud backup support
- you primarily browse the internet using a smartphone or proprietary platform apps (Facebook/Reddit)
- you do not use bookmarks (why did you read this far?)
- you have no interest in understanding code or data structures
  - eventually a browser change will break the file format you try to import, and you will have to figure out why
Implementation Details
Assumptions
- Bookmark data is stored in HTML format. Extending the merge to JSON and other backup formats is possible, but that has not been the focus.
- Bookmark data has an additional folder structure that
  - is important
  - indicates relationships between bookmarks
  - these assumptions are why the Beautiful Soup parsing is relatively complex: it extracts the URLs together with their related content
- Colons are useful separators of descriptive location in bookmark labels (not the URL)
- Duplicate bookmarks are bad but merging should be controlled
- You intend to migrate to a separate bookmark manager
- You will always be on a platform that can read the output JSON structure
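
The folder-structure assumption above can be illustrated with a small sketch. It is not the package's parser: it swaps Beautiful Soup for the standard library's `html.parser` so that it runs without dependencies, and the class name `FolderAwareParser`, the sample markup, and the `/` separator for folder paths are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class FolderAwareParser(HTMLParser):
    """Collect url -> label, age, and folder path from a Netscape-style bookmark file."""

    def __init__(self):
        super().__init__()
        self.folders = []     # stack of currently open folders (None marks the root <DL>)
        self.pending = None   # folder name read from an <H3>, waiting for its <DL>
        self.capture = None   # "h3" or "a" while text is being collected
        self.text = []
        self.link = {}        # attributes of the <A> currently being read
        self.bookmarks = {}

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.capture, self.text = "h3", []
        elif tag == "a":
            self.capture, self.text = "a", []
            self.link = dict(attrs)
        elif tag == "dl":
            # Each <DL> opens a nesting level named by the <H3> just before it.
            self.folders.append(self.pending)
            self.pending = None

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self.capture == "h3":
            self.pending = "".join(self.text).strip()
            self.capture = None
        elif tag == "a" and self.capture == "a":
            url = self.link.get("href")
            if url:
                self.bookmarks[url] = {
                    "label": "".join(self.text).strip(),
                    "age": self.link.get("add_date"),
                    "location": "/".join(f for f in self.folders if f),
                }
            self.capture = None
        elif tag == "dl" and self.folders:
            self.folders.pop()


# Hypothetical sample input in the layout browsers typically export.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><H3>Python</H3>
    <DL><p>
        <DT><A HREF="https://docs.python.org/3/" ADD_DATE="1600000000">Docs</A>
        <DT><H3>Parsing</H3>
        <DL><p>
            <DT><A HREF="https://example.com/soup" ADD_DATE="1600000100">Soup notes</A>
        </DL><p>
    </DL><p>
    <DT><A HREF="https://example.com/" ADD_DATE="1600000200">Top level</A>
</DL><p>
"""

parser = FolderAwareParser()
parser.feed(SAMPLE)
parser.close()

for url, info in parser.bookmarks.items():
    print(url, info)
```

Keeping a stack of open folders is what lets each URL carry its full location, which is the relationship information the assumptions describe.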
Run Options (How to Use)
- parse single file
  - library: pybookmark.bookmarks_parse.py
- merge files
  - scripts: scripts.bookmarks_merge.py
    - parses single or multiple bookmark.html files using the pybookmark.bookmarks_parse.py library
    - merges bookmarks across HTML files
    - reduces duplication of information based on user-defined mappings
    - you only need to do this once if you start using the viewer as your bookmark manager
- viewer:
  - allows viewing, editing, adding, and removing entries in the JSON bookmark collection
  - library: pybookmark.pybookmarkjsonviewer.py
    - can be called from the command line
    - $ python pybookmarkjsonviewer.py -f /path_to_json_file/sample.json
  - script: scripts.PyBookmark_viewer.py
    - runs against a predefined YAML configuration in the same path
  - uses Tk to provide the GUI
  - note: running from a desktop launcher on Linux may require a separate shell script with interactive mode enabled (see reference)
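
As a rough illustration of the merge step described above, the sketch below combines url-keyed dictionaries from several files and uses a user-defined mapping to collapse equivalent folders. The function `merge_bookmarks`, the `folder_map` argument, and the dictionary layout are hypothetical and are not taken from `scripts.bookmarks_merge.py`.

```python
def merge_bookmarks(collections, folder_map=None):
    """Merge url-keyed bookmark dicts, combining duplicates instead of overwriting them."""
    folder_map = folder_map or {}
    merged = {}
    for source_file, bookmarks in collections.items():
        for url, entry in bookmarks.items():
            # Apply the user-defined mapping so equivalent folders collapse into one.
            location = folder_map.get(entry["location"], entry["location"])
            slot = merged.setdefault(
                url, {"labels": [], "locations": [], "files": []}
            )
            for key, value in (
                ("labels", entry["label"]),
                ("locations", location),
                ("files", source_file),
            ):
                # Keep each distinct value once so repeated imports add no noise.
                if value not in slot[key]:
                    slot[key].append(value)
    return merged


# Hypothetical parsed input from two bookmark.html files.
laptop = {"https://example.com/": {"label": "Example", "location": "Misc"}}
desktop = {
    "https://example.com/": {"label": "Example site", "location": "Unsorted"},
    "https://www.python.org/": {"label": "Python", "location": "Dev"},
}

merged = merge_bookmarks(
    {"laptop.html": laptop, "desktop.html": desktop},
    folder_map={"Unsorted": "Misc"},
)
print(merged["https://example.com/"])
```

In this sketch a duplicate URL keeps every distinct label and source file, while the mapping decides which folder names count as the same place. That is one way to keep merging "controlled" rather than silently dropping data.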
File Layout
- Data contains
  - reference YAML configurations
  - example input bookmark.html files
  - example output JSON files
- pybookmark
  - where the library code is; see run options above for types
  - where the icon file is
- scripts
  - where command line tools live
  - see run options above for more details
Data Structures
The core data structure is AddrStruct.
addrStruct: dictionary keyed by URL, where each value is a list of lists
key = URL address
[0] = label
[1] = age
[2] = tags
[3] = location
[4] = description
[5] = file location
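
A small example may make the layout concrete. It assumes that each of the six positions holds its own list (so that merged files can contribute several values) and that the age is a Unix timestamp. Both are assumptions, as are the sample values and the index constant names.

```python
import json

# Index constants mirroring the field order listed above (names are illustrative).
LABEL, AGE, TAGS, LOCATION, DESCRIPTION, FILE_LOCATION = range(6)

addr_struct = {
    "https://www.python.org/": [
        ["Python: home"],          # [0] label
        [1600000000],              # [1] age (assumed to be a Unix timestamp)
        ["python", "language"],    # [2] tags
        ["Dev/Languages"],         # [3] location
        ["Official Python site"],  # [4] description
        ["bookmarks_2020.html"],   # [5] file location
    ],
}

entry = addr_struct["https://www.python.org/"]
print(entry[LABEL], entry[TAGS])

# Strings, integers, lists, and dicts map directly onto JSON,
# so the structure survives a save/load cycle unchanged.
round_tripped = json.loads(json.dumps(addr_struct))
print(round_tripped == addr_struct)
```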
With Version 1.1.0 the AddrStruct has been mapped to classes:
- bookmarkAttr
  - defines the basic bookmark attribute data object
  - fundamentally a list of lists
  - note: the age uses the new class AgeAsInt
- bookmarks
  - the collection of bookmarks is fundamentally a dictionary
  - key = URL and value = bookmarkAttr object
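
To show how such a class mapping could look, here is a minimal sketch. The classes below are hypothetical stand-ins with deliberately different names (`AgeSketch`, `BookmarkAttrSketch`, `BookmarksSketch`); they are not the package's `AgeAsInt`, `bookmarkAttr`, or `bookmarks` classes, whose actual attributes and methods may differ.

```python
from dataclasses import dataclass, field
from typing import List


class AgeSketch(int):
    """Stand-in for an integer age type; accepts ints or numeric strings."""

    def __new__(cls, value=0):
        return super().__new__(cls, int(value))


@dataclass
class BookmarkAttrSketch:
    """Stand-in for the per-bookmark attribute object."""

    label: List[str] = field(default_factory=list)
    age: AgeSketch = AgeSketch(0)
    tags: List[str] = field(default_factory=list)
    location: List[str] = field(default_factory=list)
    description: List[str] = field(default_factory=list)
    file_location: List[str] = field(default_factory=list)

    def as_lists(self):
        """Return the attributes in the positional list-of-lists order."""
        return [self.label, [int(self.age)], self.tags,
                self.location, self.description, self.file_location]


class BookmarksSketch(dict):
    """Stand-in for the collection: a dictionary of url -> attribute object."""

    def add(self, url, attr):
        if url not in self:
            self[url] = attr
            return
        # For an existing URL, only add tags that are not already present.
        for tag in attr.tags:
            if tag not in self[url].tags:
                self[url].tags.append(tag)


books = BookmarksSketch()
url = "https://www.python.org/"
books.add(url, BookmarkAttrSketch(label=["Python"], age=AgeSketch("1600000000"), tags=["python"]))
books.add(url, BookmarkAttrSketch(tags=["python", "docs"]))
print(books[url].as_lists())
```

Subclassing `dict` keeps the collection usable anywhere a plain dictionary is expected, while the attribute object gives the positional fields readable names.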
Requirements Overview
Requires Python 3.7 or higher and Beautiful Soup 4.
Version History
Version | Description |
---|---|
1.0.0 | first release |
1.1.0 | refactored to use classes |
1.1.1 | fixed PyPI file due to a bug |
1.1.1.1 | fixed list display bugs in viewer |