TagMark is a tag-based bookmark solution for intensive GitHub users. This tool, tagmark (Python), is the data-processing part of the TagMark solution.

tagmark (Python)

1. Introduction, User Guide and the Demo Page

TagMark is a tag-based bookmark solution I created for:

  • Those who have a multitude of bookmarks and want to efficiently organize, easily retrieve, and share them with others.
  • Individuals who frequently work with GitHub, have starred numerous repositories, yet struggle with how to efficiently retrieve and effectively utilize this vast amount of information.

Watch the video TagMark - Introduction and User Guide for details.

Here is the demo page of TagMark, which collects all my bookmarks:

https://pwnfan.github.io/my-tagmarks / https://tagmark.pwn.fan

Features of the page:

  • Substantial tag-based bookmarks
    • 2700+ tagged bookmarks (1800+ curated GitHub repos), mainly focused on cybersecurity and related development
    • 1000+ tags with detailed tag definitions
  • Full-featured tags
    • tag definitions (show / hide a definition by left-clicking a tag)
    • tag overview with counts
    • color differences depending on counts
  • Simple but powerful header filter for each column
    • thick client: static, pure frontend and JS-based, so it responds quickly
    • simple and useful filter grammar
    • quickly insert a tag name into a filter with a right click
    • press CTRL/CMD and left-click in any filter input to open the multilingual documentation (English / Japanese / Chinese)
  • Support for filtering based on URL GET parameters
    • static, pure frontend and JS-based
    • easy to share
  • Column-related features
    • detailed GitHub repository information
    • suppressible columns
  • Template Tag Doc

2. Why TagMark?

The introduction video summarizes why I made TagMark. For the detailed reasons, you can read my blog post (TL;DR 😅): TagMark: Maybe a Better Browser Bookmark Solution

3. TagMark Related Projects

  • tagmark-py (this repo)
    • exporting tagged bookmark data from third-party services, e.g. diigo
    • converting other bookmark formats into the TagMark format, i.e. tagmarks.jsonl
    • checking that every tag has been defined, i.e. checking tag consistency between tagmarks.jsonl and tags.json
    • getting tag definitions automatically with ChatGPT, i.e. setting the values of the key definition in tags.json
    • making a document from a template containing tag-related syntax, i.e. making tag-doc.md
  • tagmark-ui
    • a web page showing tagmarks.jsonl, tags.json and related docs
  • my-tagmarks

4. TagMark Architecture, Workflow and Customizing Guide

If you want to customize your own my-tagmarks, here is an overview of the TagMark architecture and workflow you need to get familiar with:

     │                                                                         https://pwnfan.github.io/my-tagmarks
   #0│   ╔═════════════╗          ╔═══════════════╗       ╔═════════════════╗  i.e. https://tagmark.pwn.fan
   start ║  [original  ║          ║[exported data]║       ║  {tagmark-py}   ║                     ▲
     └──>║  bookmark   ║          ║               ║       ║   (this repo)   ║            #9 deploy│Github Pages
         ║    data]    ║          ║ ░diigo░tool░░ ║       ║                 ║                     │
         ║             ║    ┌─────║>exported░data ║       ║   ████████████  ║  ┌──────────────────┼────────┐
         ║ ░pwnfan's░░ ║    │     ║ ░░░(.html)░░░ ║ ┌──#2b.2──█subcommand█  ║  │         ╔════════════════╗│
    ┌────║─░untagged░░ ║    │     ║               ║ │     ║   ███export███  ║  │         ║ {my-tagmarks}  ║│  #6.1
    #1   ║ ░bookmarks░ ║    │     ║ ░░diigo░API░░ ║─┼──┐  ║   ████████████<─║──┼─┐       ║   ┌────────┐  ┌║┼──manually─
 manually║             ║    │     ║ ░dumped░data░<║─┘ #3.1║any              ║  │ │       ║   │tag-doc.│  │║│  make
 set│tags╚═════════════╝    │     ║ ░░(.jsonl)░░░ ║   format  ████████████<─║──┘ │ ┌─#6.2║───│template│<─┘║│
 and│add                  #2a.2   ╚═══════════════╝    └──║──>█subcommand█  ║    │ │     ║   └────────┘   ║│ #7.1
   into                     │                             ║   ██convert███──║─┐  │ │     ║  ┌──────────┐  ║│update
    │ ╔══════════════════╗  │    #3.2 add Github┌repo─────║───████████████  ║ │  │ │   ┌─║─>│tag-doc.md│  ║│Github
    │ ║   {third-party   ║  │    info and covert│into     ║                 #7.2 │ │ #6.3║  └──────────┘  ║│ repo
    │ ║bookmark & tagging║  │                   │         ║   ████████████  ║ │  │ │   │ ║   ┌────────┐   ║│ info
    │ ║     service}     ║  │    ╔═════════════╗│ ┌──#4.1─║──>█subcommand█  ║ └──┼─┼───┼─║──>│tagmarks├───║everyday
    │ ║                  ║  │    ║  [TagMark   ║│ │       ║   ██checktag██  ║    │ │ ┌─┼─║──>│ .json  │   ║
    │ ║    ███diigo███   ║  │    ║    data]    ║│ │  ┌────║───████████████  ║    │ │ │ │ ║   └────────┘   ║
    └─║───>██browser██   ║  │    ║             ║│ │ #4.2 add                ║    │ │ │ │ ║  ┌─────────┐   ║
      ║    █extension█   ║  │  ┌─║──░░░░░░░░░<─║┘ │ missing   ████████████  ║    │ │┌┼─┼─║─>│tags.json│   ║
      ║         │        ║  │ #3.3  ░TagMark░  ║  │ tags  ║   █subcommand█  ║    │ │││ │ ║  └─────────┘   ║
      ║         │        ║  │  │ ║  bookmarks  ║  │  │    ║   █autotagdef█  ║    │ │││ │ ║  ┌──────────┐  ║
      ║         ▼        ║  │  │ ║  (tagmarks  ║──┘  │ ┌#5.1─>████████████──║──┐ │ │││ │ ║  │tagmark-ui│  ║
      ║    ███diigo███   ║  │  │ ║  ░.jsonl)░  ║     │ │  ║                 ║  │ │ │││ │ ║  └────▲─────┘  ║
      ║    ██website██   ║  │  │ ║  ░░░░░░░░░  ║     │ │  ║   ████████████  ║  │ │ │││ │ ║       │        ║
      ║         │        ║  │  │ ║             ║─────┼─┼#6.1─>█subcommand█<─║──┼─┼─┘││ │ ╚══════#8════════╝
      ║         │  #2a.1 ║  │  │ ║ ░░░░░░░░░░░ ║     │ │  ║   █maketagdoc█──║──┼─┼──┼┼─┘         │
      ║         │ manually  │  │ ║ ░░TagMark░░<║─────┘ │  ║   ████████████  ║  │ │  ││  ╔═══════════════════╗
      ║ ┌───────┴──run─on║  │┌─┼─║>░tags░info░─║───────┘  ╚═════════════════╝  │ │  ││  ║   {tagmark-ui}    ║
      ║ │        diggo page ││ │ ║ (tags.json)<║─#5.2─define─tags─with─ChatGPT─┘ │  ││  ║ ┌──────────┐      ║
      ║ │              │ ║  ││ │ ║ ░░░░░░░░░░░─║──────────#5.3───────────────────┼──┘│  ║ │filter doc├─┐    ║
      ║ ▼              ▼ ║  ││ │ ║      │      ║                                 │   │  ║ │(EN/CN/JP)│ ├──┐ ║
      ║ █diigo█ █diigo██ ║  ││ │ ╚═════════════╝                                 │   │  ║ └─┬────────┘ │  │ ║
      ║ web█API █export█─║──┘│ │        │                                        │   │  ║   └──┬───────┘  │ ║
      ║ ███████ ██tool██ ║   └─┼─────#4.3 manually set the values of keys        │   │  ║      └──────────┘ ║
      ║   │              ║     │     `abbr/alias/full_name/gpt_prompt_context    │   │  ║  ┌─────────────┐  ║
      ╚══════════════════╝     │     /prefer_format` for new added tags          │   │  ║  │Web Page Code│  ║
          │                    │                                                 │   │  ║  └─────────────┘  ║
          │                    └─────────────────────────────────────────────────┼───┘  ╚═══════════════════╝
          └────────────────────────────#2b.1─respond─to──────────────────────────┘


Steps Flow:
              (option a)     ┌─>#3.1────>#3.2────>#3.3  ┌──>#5.1────>#5.2────>#5.3──┐  ┌─────────────┐
          ┌─>#2a.1──>#2a.2───┤                     │    │                       │   └─>│ #7.1───>7.2 │
      #1──┤                  │    ┌────────────────┘    │     ┌─────────────────┘      │             │
          └─>#2b.1──>#2b.2───┘    ▼                     │     ▼                     ┌─>│   #8   #9   │
              (option b)        #4.1────>#4.2────>#4.3──┘   #6.1────>#6.2────>#6.3──┘  └─────────────┘
                                               (suggested)  (------optional-------)

Steps note and customizing suggestions:

  • Steps requiring manual work
    • #1:
    • #2a.x
      • using the alternative #2b is suggested
      • #2a.1 has not worked well recently, possibly due to problems on the Diigo Tools / Export service side, which prompted me to build the alternative #2b
      • note that #2b uses a web API of diigo and acts like a crawler to retrieve your own bookmarks; it is a trade-off, so it is best not to run it frequently, and I have added some sleep time between successive requests
      • Diigo has its own official API for retrieving bookmarks, but it is a premium (paid) feature; it may be a better option to become a premium user and add the related retrieving feature (plugin) to the tagmark-py export subcommand
    • #4.3
      • optional but suggested if you want reading-friendly tag names and exact tag definitions shown in the web page (i.e. tagmark-ui)
      • similar to #1, the first time involves the full workload, which may take a considerable amount of time, but subsequent runs only involve incremental work and are much easier
    • #6.x
      • optional, if you don't need a TagMark tag doc, you can skip these steps
      • may take a considerable amount of time if you have many bookmarks and tags and want to categorize them well into different topics, but fortunately this is just a one-off task
  • #7, #8, and #9 form a unit in which the prerequisite dependencies are Steps #1 through #6. However, Steps #7, #8, and #9 are independent of each other and have no interdependencies
  • Some steps are done automatically by GitHub Actions, most of which are located in the repo my-tagmarks
    • to ensure these actions function correctly, you may need to set the repo vars and secrets they use
    • the repo vars and secrets that need to be set are
      • ${{ secrets.GH_PAT_TAGMARK }}
        • a personal access token (PAT) with the Contents (read and write access to code) permission on the repo my-tagmarks
        • you need to set it in both tagmark-ui and my-tagmarks if you need the UI code synchronizing feature
      • ${{ vars.TAGMARK_DATA_EXPIRED_HOURS }}
        • it determines when the GitHub repo info of a bookmark expires; see the tagmark-py subcommand convert for details
        • the value I've set is 23
        • it only needs to be set in the repo my-tagmarks

5. tagmark-py User Guide

5.1. Installation

  1. install Python >=3.11 and a virtual environment (virtualenv / pyenv / conda)
  2. install tagmark-py
    pip install tagmark
    
  3. check that tagmark runs and prints its command-line options:
    (tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli 
    Usage: tagmark_cli [OPTIONS] COMMAND [ARGS]...
    
    Options:
      -h, --help  Show this message and exit.
    
    Commands:
      export      export tagged bookmarked data from third party services...
      convert     convert other bookmark formats into TagMark format...
      checktag    check tag consistency in tagmark data file (json-lines) and...
      autotagdef  get tag definition automatically with ChatGPT
      maketagdoc  make document from a template containing tag related syntaxes
    

5.2. Usage

5.2.1. subcommand: export

(tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli export -h
Usage: tagmark_cli export [OPTIONS]

  export tagged bookmarked data from third party services into jsonlines file

Options:
  -f, --format [diigo_web]        third party service  [default: diigo_web]
  -m, --max-sleep-seconds-between-requests FLOAT
                                  if multiple requests are needed to retrieve
                                  the data, in order to prevent excessive load
                                  on the target server, a random time sleep is
                                  necessary, this option set the maximum sleep
                                  seconds  [default: 3]
  -o, --output-file-path FILE     output file path  [default:
                                  diigo_web_exported.jsonl]
  -h, --help                      Show this message and exit.
  • export retrieves bookmark data (with tags) from third-party bookmark manager services that support tags
  • even though -f diigo_web is the only supported third-party service for now, the export subcommand is designed to support different services
  • -f diigo_web requires the diigo web cookie and reads its value from the key DIIGO_COOKIE stored in the .env file or in environment variables, so you need to set it before running export. .env file example:
    DIIGO_COOKIE="{YOUR DIIGO WEB COOKIE HERE}"
    
  • setting -m, --max-sleep-seconds-between-requests to more than 3 is recommended, though retrieving all the data may then take longer
  • note that export differs from the other subcommands: if you run tagmark_cli export without any arguments, it does not print the help message but runs directly with the default argument values
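
The cookie lookup and the random sleep described above can be sketched as follows. This is only an illustration, not tagmark's actual code: the helper names load_diigo_cookie and sleep_between_requests are hypothetical, and the sketch reads the process environment only (loading a .env file is left out). Only the DIIGO_COOKIE key and the "random sleep up to a maximum" behaviour come from the text above.

```python
import os
import random
import time


def load_diigo_cookie(environ=os.environ):
    """Return the Diigo web cookie from the environment (hypothetical helper)."""
    cookie = environ.get("DIIGO_COOKIE", "")
    if not cookie:
        raise RuntimeError("DIIGO_COOKIE is not set; add it to .env or the environment")
    return cookie


def sleep_between_requests(max_sleep_seconds=3.0, sleep=time.sleep):
    """Sleep for a random time in [0, max_sleep_seconds] and return the delay.

    The sleep function is injectable so the behaviour can be checked
    without actually waiting.
    """
    delay = random.uniform(0.0, max_sleep_seconds)
    sleep(delay)
    return delay
```

A larger max_sleep_seconds spreads the requests out more, which is gentler on the target server at the cost of a longer total export time.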

5.2.2. subcommand: convert

(tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli convert 
Usage: tagmark_cli convert [OPTIONS]

  convert other bookmark formats into TagMark format (json-lines)

Options:
  -i, --input-file-path FILE      input file path
  -f, --format [diigo_web_exported|diigo_exported_chrome_format|tagmark_jsonlines]
                                  format of the input file  [default:
                                  diigo_web_exported]
  -o, --output-file-path FILE     output tagmark jsonlines data file path
                                  [default: tagmarks.jsonl]
  -k, --keep_empty_keys BOOLEAN   whether keep keys with empty values
                                  [default: False]
  -c, --condition-json-path FILE  json file containing the condition for
                                  fitlering TagmarkItem  [default:
                                  /workspaces/tagmark-
                                  py/tagmark/condition_example.json]
  -b, --is-ban-condition BOOLEAN  If set to True, a TagmarkItem hits the
                                  `condition` will be banned, or it will be
                                  remained  [default: True]
  -t, --github_token TEXT         the GITHUB_TOKEN to access Github API,
                                  default will read from the .env file of the
                                  root dir of this project
  -u, --update-github-info-after-hours FLOAT
                                  update github info only when user specified
                                  number of hours has passed since the last
                                  update  [default: 23]
  -h, --help                      Show this message and exit.
  • convert converts other bookmark formats (i.e. -f diigo_web_exported | diigo_exported_chrome_format) into a TagMark data file (json-lines) and adds GitHub info for GitHub repo bookmarks. Command example:
    tagmark_cli convert -i tagmarks_all.jsonl -f diigo_web_exported -c data/my-condition.json
    
  • convert can also be used to only update the GitHub repo info (stars, last commit time, etc.) of an already converted TagMark data file (json-lines) (i.e. -f tagmark_jsonlines). Command example:
    tagmark_cli convert -i data/tagmark_ui_data.jsonl -c data/my-condition.json -f tagmark_jsonlines
    
  • before running convert, you may need to set up a GitHub PAT; if you don't have any GitHub repo bookmarks, this step can be skipped
    • create a GitHub personal access token (PAT)
    • tagmark requires a PAT to access the GitHub API and get the repo info (stars, forks, etc.) when a bookmark URL is a GitHub repo URL. The default PAT settings are recommended, since they grant no privileges for any action on any of your repos or settings.
    • convert reads the GitHub PAT from the key GITHUB_TOKEN stored in the .env file or in environment variables. .env file example:
      GITHUB_TOKEN=github_pat_XXX
      
    • you can also pass your GitHub PAT to convert with the -t parameter, which is not recommended because it may remain in your bash history
  • if you have a bad network connection to the GitHub API server (e.g. when accessing from China), you may get a lot of connection errors when convert tries to get GitHub repo info; in this case you may need to run convert -f tagmark_jsonlines again and again until all the missing GitHub repo info has been filled in
  • please refer to section Core Options Explanation and Design Details for details of the options.
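
The idea behind -u, --update-github-info-after-hours can be sketched as a simple staleness check. This is a minimal sketch under assumptions, not tagmark's implementation: the function name github_info_is_stale and the notion of a stored "last update" epoch timestamp are hypothetical; only the "update only after N hours have passed" rule and the default of 23 come from the help text above.

```python
import time


def github_info_is_stale(last_update_epoch, after_hours=23.0, now=None):
    """Return True if the GitHub info should be fetched again.

    last_update_epoch: seconds since the epoch of the last update
        (hypothetical field), or None if the info was never fetched.
    after_hours: minimum age in hours before a refresh is allowed.
    now: current time in epoch seconds; defaults to time.time().
    """
    if last_update_epoch is None:
        return True
    current = time.time() if now is None else now
    return current - float(last_update_epoch) >= after_hours * 3600
```

With such a check, re-running convert after connection errors would only retry the bookmarks whose info is missing or older than the threshold.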

5.2.3. subcommand: checktag

(tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli checktag
Usage: tagmark_cli checktag [OPTIONS]

  check tag consistency in tagmark data file (json-lines) and tags info file
  (json)

Options:
  -d, --tagmark-jsonlines-data-path FILE
                                  the tagmark jsonlines data file path, which
                                  may be the output file generated by the `-o`
                                  parameter of the `convert` subcommand
  -t, --tags-json-path FILE       tags.json file path
  -c, --condition-json-path FILE  json file containing the condition for
                                  filtering TagmarkItem, here only the value
                                  of `tags` field in the file will be used,
                                  and this condition must be a ban condition
                                  [default: /workspaces/tagmark-
                                  py/tagmark/condition_example.json]
  -a, --add-new-tags BOOLEAN      if set to `True`, a new tags.json file will
                                  be generated, which includes old tags in
                                  tag.json file, and new tags in the tagmark
                                  data file(specified by -t).  [default: True]
  -h, --help                      Show this message and exit.
  • checktag verifies that every tag in the output file generated by the -o parameter of the convert subcommand has related tag info in the tags info json file. This ensures the web UI tagmark-ui functions correctly. Command example:
    tagmark_cli checktag -d data/tagmarks.jsonl -t data/tags.json -c data/my-condition.json
    
  • if you run checktag with -a true, tags that appear in the tagmark json-lines data but not in the tags info json file are added and written to a new tags info json file; before moving on to the next step, you may need to manually check the newly added tags in that file
    • you can find them by searching "definition": null
    • in most cases you need to manually set the values of the keys abbr/alias/full_name/gpt_prompt_context for the new tags, which is step #4.3 in the TagMark workflow diagram
    • this step is strongly suggested if you want reading-friendly tag names and exact tag definitions shown in the web page (i.e. tagmark-ui)
  • if you run checktag for the first time, i.e. you don't have a tags info json file (tags.json) yet, you need to create an empty one by running:
    echo "{}" > tags.json
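
The consistency check described above can be sketched roughly as follows. This is an illustration under assumptions, not tagmark's code: the function names are hypothetical, the ban condition (-c) is ignored, and the shape of a newly added tag entry (all keys null, prefer_format set to "{tag}") is assumed from the tag info examples shown in the autotagdef section.

```python
import json

# Assumed shape of a freshly added tag entry (see the autotagdef examples).
EMPTY_TAG_INFO = {
    "abbr": None,
    "alias": None,
    "definition": None,
    "full_name": None,
    "gpt_prompt_context": None,
    "prefer_format": "{tag}",
}


def find_missing_tags(jsonl_lines, tags_info):
    """Return the sorted tags used in the json-lines data but absent from tags_info."""
    used = set()
    for line in jsonl_lines:
        line = line.strip()
        if line:  # skip blank lines
            used.update(json.loads(line).get("tags", []))
    return sorted(used - set(tags_info))


def add_missing_tags(tags_info, missing):
    """Return a new tags info dict with an empty entry for every missing tag."""
    merged = dict(tags_info)
    for tag in missing:
        merged[tag] = dict(EMPTY_TAG_INFO)
    return merged
```

The entries added this way are the ones you would then find by searching for "definition": null and edit by hand.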
    

5.2.4. subcommand: autotagdef

(tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli autotagdef
Usage: tagmark_cli autotagdef [OPTIONS]

  get tag definition automatically with ChatGPT

Options:
  -d, --tags-info-json-path FILE  tags.json (tags information) file path
  -c, --gpt-config-file-path FILE
                                  the config file for invoking ChatGPT API, we
                                  sugguest setting `access_token` in the
                                  config file, see https://github.com/acheong0
                                  8/ChatGPT#--optional-configuration for
                                  details.
  -i, --gpt-conversation-id TEXT  the id of conversation in which to (continue
                                  to) interact with ChatGPT, if set to `None`
                                  a new conversation will be created. See http
                                  s://github.com/acheong08/ChatGPT/wiki/V1#ask
                                  for details.
  -t, --gpt-timeout INTEGER       the timeout that GPT answers one question
                                  (get one tag definition)  [default: 60]
  -l, --little-info-tag-is-ok BOOLEAN
                                  [default: False]
  -h, --help                      Show this message and exit.
  • autotagdef gets tag definitions automatically from ChatGPT. Command example:
    tagmark_cli autotagdef -d data/tags.json -c gpt_config.json -l true
    
  • how does it work? For example, take a tag info entry edited by the user (step #4.3 in the TagMark workflow diagram):
    "bom": {
        "abbr": "BOM",
        "alias": null,
        "definition": null,
        "full_name": "Bill of Materials",
        "gpt_prompt_context": "computer science and cybersecurity",
        "prefer_format": "{abbr} ({full_name})"
    }
    
    autotagdef will ask ChatGPT with prompt "in {gpt_prompt_context}, what is {prefer_format}?", i.e. "in computer science and cybersecurity, what is BOM (Bill of Materials)?", and set the value of key "definition" according to the response from ChatGPT
  • the -l, --little-info-tag-is-ok option applies to tags like:
    "checklist": {
        "abbr": null,
        "alias": null,
        "definition": null,
        "full_name": null,
        "gpt_prompt_context": null,
        "prefer_format": "{tag}"
    }
    
    if -l true is set, the question "what is checklist?" will be sent to ChatGPT; if -l false is set (the default), a NoEnoughTagInfoForGptPromptException error will be raised. This option ensures that the user didn't miss editing any tag info
  • just like the convert subcommand, if you have a bad network connection (e.g. when accessing from China) to the OpenAI API server (or maybe an API proxy server for the revChatGPT lib), you may get a lot of connection errors when autotagdef tries to ask ChatGPT; in this case you may need to run autotagdef again and again until all the tags have got a definition from ChatGPT
  • autotagdef uses the Python lib revChatGPT to communicate with ChatGPT
    • unfortunately, revChatGPT was archived on 2023.08.10, and I am not sure how long it will remain functional
    • I may need to find an alternative to revChatGPT; if you get related errors when running autotagdef, please tell me in the issues of this repo
  • -c, --gpt-config-file-path is required for revChatGPT, and a config file content looks like:
    {
    "access_token": "{YOUR_ACCESS_TOKEN}"
    }
    
    and access_token can be obtained by visiting https://chat.openai.com/api/auth/session
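
The prompt construction described above can be sketched like this. It is a sketch under assumptions, not the real autotagdef code: build_prompt is a hypothetical name, ValueError stands in for NoEnoughTagInfoForGptPromptException, and the rule "a tag has little info when abbr, alias, full_name and gpt_prompt_context are all null" is my reading of the checklist example rather than a confirmed condition.

```python
INFO_KEYS = ("abbr", "alias", "full_name", "gpt_prompt_context")


def build_prompt(tag, info, little_info_tag_is_ok=False):
    """Build the question sent to ChatGPT for one tag (illustrative only)."""
    # Assumption: "little info" means none of the descriptive keys is set.
    has_little_info = all(info.get(key) is None for key in INFO_KEYS)
    if has_little_info and not little_info_tag_is_ok:
        # Stand-in for NoEnoughTagInfoForGptPromptException.
        raise ValueError(f"not enough tag info to build a prompt for {tag!r}")

    # Fill the placeholders of prefer_format, e.g. "{abbr} ({full_name})".
    name = info["prefer_format"].format(
        tag=tag,
        abbr=info.get("abbr"),
        alias=info.get("alias"),
        full_name=info.get("full_name"),
    )
    context = info.get("gpt_prompt_context")
    if context:
        return f"in {context}, what is {name}?"
    return f"what is {name}?"
```

For the "bom" entry above this yields the prompt quoted in the text, and for the "checklist" entry it yields "what is checklist?" only when little_info_tag_is_ok is true.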

5.2.5. subcommand: maketagdoc

(tagmark-py3.11) vscode ➜ /workspaces/tagmark-py (dev) $ tagmark_cli maketagdoc
Usage: tagmark_cli maketagdoc [OPTIONS]

  make document from a template containing tag related syntaxes

Options:
  -d, --tagmark-jsonlines-data-path FILE
                                  the tagmark jsonlines data file path, which
                                  may be the output file generated by the `-o`
                                  parameter of the `convert` subcommand
  -t, --tags-json-path FILE       tags.json file path
  -s, --config-path FILE          (formatter) configuration file path
                                  [default: /workspaces/tagmark-
                                  py/tagmark/tools/maketagdoc.toml.default]
  -u, --url-base TEXT             url base for generating formatted links
                                  [default: ./]
  -c, --condition-json-path FILE  json file containing the condition for
                                  filtering TagmarkItem, here only the value
                                  of `tags` field in the file will be used,
                                  and this condition must be a ban condition
                                  [default: /workspaces/tagmark-
                                  py/tagmark/condition_example.json]
  -b, --is-ban-condition BOOLEAN  If set to True, a TagmarkItem hits the
                                  `condition` will be banned, or it will be
                                  remained  [default: True]
  -m, --template-path FILE        template file path
  -o, --output-file-path FILE     the output file (formatted according to the
                                  template file) path  [default:
                                  /workspaces/tagmark-py/formatted_tag_doc.md]
  -h, --help                      Show this message and exit
  • maketagdoc makes a document from a template containing tag-related syntax; a config file is also required, but it has a default value. Command example:
    tagmark_cli maketagdoc -d data/tagmarks.jsonl -u https://pwnfan.github.io/my-tagmarks/ -t data/tags.json -m data/maketagdoc/tag-doc.template -o data/maketagdoc/tag-doc.md
    
  • though I use maketagdoc to make a markdown tag doc in my-tagmarks, maketagdoc can actually generate a doc in any format
  • if you just want to make your own markdown doc tag-doc.md, refer to these template files to see how to apply the TagMark tag doc template syntax:

5.2.6. Condition File Details

you may have noticed that some subcommands have these parameters:

  • -c, --condition-json-path FILE
  • -b, --is-ban-condition BOOLEAN

the condition file was left out of the architecture and workflow diagram to keep the diagram from becoming too complicated to understand.

Here we will talk about the workflow containing condition file:

                                            ┌──────────────┐
                                            │      -b      │
                                            │  This is a   │
                                            │ban-condition?│
                    ┌────────────┐          └───────┬──────┘
                    │TagmarkItem │                  │
                    │┌───────────┴┐                 ▼
┌───────────┐       └┤TagmarkItem │        ┌────────────────┐       ┌──────────┐
│  TagMark  │        │┌───────────┴┐──────>│Filter Condition│──────>│Subcommand│
│ Data File │───────>└┤TagmarkItem │       └────────────────┘       │Processing│
└───────────┘         │ ┌──────────┴─┐              ▲               └──────────┘
                      └─┤TagmarkItem │              │
                        │            │     ┌────────┴───────┐
                        └────────────┘     │       -c       │
                                           │ Condition File │
                                           └────────────────┘

-c specifies a json file containing the condition for filtering TagmarkItem; the default condition file is tagmark/condition_example.json, with the content:

{
    "tags": ["Diigo"],
    "valid": true
}

What is a TagmarkItem? Take a look at the format of the -o output file: it is a json-lines file, with one JSON object per line. Each line is the JSON dump of a TagmarkItem object and looks like:

{
    "url": "https://github.com/jonschlinkert/remarkable",
    "id": 2,
    "valid": true,
    "title": "jonschlinkert/remarkable: Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdan",
    "tags": ["dev", "frontend", "javascript", "markdown"],
    "is_github_url": true,
    "github_repo_info": {
        "url": "https://github.com/jonschlinkert/remarkable",
        "owner": "jonschlinkert",
        "name": "remarkable",
        "description": "Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdance for HTML-to-markdown conversion. Use https://github.com/jonschlinkert/markdown-toc to generate a table of contents.",
        "time_created": "2014-09-01T17:57:42Z",
        "time_last_commit": "2023-03-30T05:55:40Z",
        "count_star": 5514,
        "count_fork": 396,
        "count_watcher": 5514,
        "topics": [
            "commonmark",
            "compile",
            "docusaurus",
            "gfm",
            "javascript",
            "jonschlinkert",
            "markdown",
            "markdown-it",
            "markdown-parser",
            "md",
            "node",
            "nodejs",
            "parse",
            "parser",
            "syntax-highlighting"
        ]
    },
    "time_added": "1682907038"
}
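
Because each line of the -o output is an independent JSON object, the file can be read back with the Python standard library alone. The following is a minimal sketch, not tagmark's own loader: the function name parse_json_lines is hypothetical, and the sample data is an abbreviated stand-in for a real output file.

```python
import io
import json


def parse_json_lines(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


# Stand-in for an open output file (abbreviated, partly made-up records).
# With a real file, pass the opened file object instead.
sample = io.StringIO(
    '{"url": "https://github.com/jonschlinkert/remarkable", "id": 2, "tags": ["dev", "markdown"]}\n'
    '{"url": "https://example.com", "id": 3, "tags": []}\n'
)

items = list(parse_json_lines(sample))
for item in items:
    print(item["id"], item["url"], item["tags"])
```

Parsing line by line keeps memory use low for large bookmark collections, since no more than one record has to be decoded at a time.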

You can treat this JSON structure as the data structure of a TagmarkItem. Together, the -c condition file and the -b option define a filter that tells tagmark whether or not to write a TagmarkItem to the -o output file.

For example, if you do not want any lines with the tag javascript or css in the output file, pass your condition file with -c my_condition.json, with the content below:

{
    "tags": ["javascript", "css"]
}

You also need to specify the -b True option (the default), which means that if a TagmarkItem meets the condition, it is banned and is not exported to the output file.

Conversely, if you only want lines with the tag javascript or css in the output file, specify the -b False option, which means that if a TagmarkItem meets the condition, it is picked (not banned) and written to the output file.
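
The relationship between the condition and -b can be sketched in a few lines of Python. This is an illustration of the behavior described above, not tagmark's actual code: select_items and has_js_or_css are hypothetical names, and the three items are made-up data.

```python
def select_items(items, matches, ban=True):
    """Return the items that would be written to the output.

    matches: predicate returning True when an item meets the condition.
    ban=True  -> items that meet the condition are dropped.
    ban=False -> only items that meet the condition are kept.
    """
    return [item for item in items if matches(item) != ban]


def has_js_or_css(item):
    # Stand-in for the condition {"tags": ["javascript", "css"]}.
    return any(tag in item["tags"] for tag in ("javascript", "css"))


# Made-up items, reduced to the fields needed here.
items = [
    {"id": 1, "tags": ["dev", "javascript"]},
    {"id": 2, "tags": ["python"]},
    {"id": 3, "tags": ["css"]},
]

banned_mode = [i["id"] for i in select_items(items, has_js_or_css, ban=True)]
picked_mode = [i["id"] for i in select_items(items, has_js_or_css, ban=False)]
print(banned_mode)  # [2]
print(picked_mode)  # [1, 3]
```

The same condition file therefore serves as either a blocklist or an allowlist, depending only on the value given to -b.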

Note that not all keys in TagmarkItem are supported in condition files. The table below gives the details:

| key | value type | supported in condition file | condition example | meaning |
|---|---|---|---|---|
| url | string | yes | "url": ["github", "stackoverflow"] | url contains "github" or "stackoverflow" |
| id | int | no | - | - |
| valid | boolean | yes | "valid": true | the url is valid (the validity check has not been implemented yet) |
| title | string | yes | (similar to url) | (similar to url) |
| tags | array | yes | "tags": ["python", "javascript"] | tags contains "python" or "javascript" |
| is_github_url | boolean | yes | (similar to valid) | (similar to valid) |
| github_repo_info | nested object | no | - | - |
| time_added | string | no | - | - |

All values in the condition file are case-sensitive.
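
As a rough illustration of the table above, the per-key checks might look like the following. This is a sketch under assumptions, not tagmark's implementation: key_matches and the three key sets are hypothetical names. How several keys in one condition file are combined is not stated here, so the sketch deliberately checks one key at a time.

```python
STRING_KEYS = {"url", "title"}          # match if any listed substring is contained
BOOL_KEYS = {"valid", "is_github_url"}  # match on equality
LIST_KEYS = {"tags"}                    # match if any listed tag is present


def key_matches(item, key, expected):
    """Check a single condition key against a single item (case-sensitive)."""
    if key in STRING_KEYS:
        return any(part in item[key] for part in expected)
    if key in LIST_KEYS:
        return any(tag in item[key] for tag in expected)
    if key in BOOL_KEYS:
        return item[key] == expected
    # id, github_repo_info and time_added are not supported in condition files.
    raise ValueError(f"key not supported in condition files: {key}")


# Abbreviated version of the example item shown earlier.
item = {
    "url": "https://github.com/jonschlinkert/remarkable",
    "valid": True,
    "tags": ["dev", "frontend", "javascript", "markdown"],
    "is_github_url": True,
}

print(key_matches(item, "url", ["github", "stackoverflow"]))  # True
print(key_matches(item, "tags", ["JavaScript"]))              # False: case-sensitive
print(key_matches(item, "valid", True))                       # True
```

Because matching is case-sensitive, a condition such as "tags": ["JavaScript"] does not match an item tagged javascript.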

5.3. Changelog

See docs/CHANGELOG.md

5.4. Contributing and Development Guide

You are welcome to join the development of tagmark. Please see docs/CONTRIBUTING.md

5.5. TODO

  • lib.data: skip dumping some TagmarkItem objects according to user input
  • Tagmark.get_github_repo_infos add condition filter
  • add a message showing progress in the convert command, because it may be slow when there are plenty of github repo urls
  • lib.data: add github repo license info into TagmarkItem
  • validate url availability and set TagmarkItem.valid according to the result
    • github repo url
    • not github repo url
  • automatically find a forked repo of invalid github repo, replace the old repo url with forked repo url, and add comment to explain why
  • update github info only when user specified number of hours has passed since the last update.
  • add subcommand cheatsheet maketagdoc to make a cheat sheet from a pre-defined template file
  • add test case for cli.py
  • make customized tag supporting bookmark collector for TagMark

5.6. Credits

6. Similar Tools / Projects

  • tag supporting bookmark manager (tagmark-py & tagmark-ui alternative)
    • Diigo: Better reading and research with annotation, highlighter, sticky notes, archiving, bookmarking & more.
  • cybersecurity tool collection with tags (my-tagmarks alternative)
    • offsec.tools: A vast collection of security tools for bug bounty, pentest and red teaming
    • WebHackersWeapons: Web Hacker's Weapons / A collection of cool tools used by Web hackers.
