
export/interface with firefox history/site metadata

Project description

ffexport

The primary function here is to export/interact with my firefox history. The functionality for Chrome is vestigial; I've left it there in case someone wants to mess with it. I recommend you take a look at promnesia if you want immediate support for that.

See here for how firefox stores its history.

Install

pip install ffexport

Usage

Usage: ffexport [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  inspect  Extracts history/site metadata from one sqlite database.
  merge    Extracts history/site metadata from multiple sqlite databases.
  save     Backs up the current firefox sqlite history file.

save

Usage: ffexport save [OPTIONS]

  Backs up the current firefox sqlite history file.

Options:
  --browser [firefox|chrome]  Provide either 'firefox' or 'chrome' [defaults
                              to firefox]
  --profile TEXT              Use to pick the correct profile to back up. If
                              unspecified, will assume a single profile
  --to PATH                   Directory to store backup to  [required]

Since firefox (and browsers in general) seem to remove old history at seemingly random intervals, I'd recommend running the following periodically:

$ ffexport save --to ~/data/firefox/dbs
[I 200828 15:30:58 save_hist:67] backing up to /home/sean/data/firefox/dbs/places-20200828223058.sqlite
[I 200828 15:30:58 save_hist:71] done!

That atomically copies the firefox sqlite database which contains your history --to some backup directory.
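If you'd rather drive this from python (say, as part of a larger backup script you already run on a schedule), a minimal sketch might look like the following; the backup directory is just an example, and this assumes the ffexport command is on your PATH:

import subprocess
from pathlib import Path

# example backup directory for the timestamped places-*.sqlite files
backup_dir = Path("~/data/firefox/dbs").expanduser()
backup_dir.mkdir(parents=True, exist_ok=True)

# equivalent to running 'ffexport save --to ~/data/firefox/dbs' in a shell
subprocess.run(["ffexport", "save", "--to", str(backup_dir)], check=True)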

inspect

Usage: ffexport inspect SQLITE_DB

  Extracts history/site metadata from one sqlite database. Provide a firefox
  history sqlite database as the first argument. Drops you into a REPL to
  access the data.

As an example:

$ ffexport inspect ~/data/firefox/dbs/places-20200828231237.sqlite
[I 200828 17:08:23 parse_db:73] Parsing visits from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
[I 200828 17:08:23 parse_db:92] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
Demo: Your most common sites....
[('github.com', 13778),
 ('www.youtube.com', 8114),
 ('duckduckgo.com', 8054),
 ('www.google.com', 6542),
 ('discord.com', 6141),
 ('stackoverflow.com', 2528),
 ('gitlab.com', 1608),
 ('trakt.tv', 1362),
 ('letterboxd.com', 1053),
 ('www.reddit.com', 708)]

Use mvis or msite to access raw visits/site data, vis for the merged data

In [1]: ....

That drops you into a REPL with access to the history from that database (vis and mvis/msite).
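For instance, the "most common sites" demo above amounts to something like this, assuming vis is an iterable of Visit objects with a url attribute (vis is provided by the REPL, so this is only meant to be typed there):

from collections import Counter
from urllib.parse import urlparse

# tally visits per domain, like the demo output above
Counter(urlparse(v.url).netloc for v in vis).most_common(10)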

merge

Similar to inspect, but accepts multiple database backups, merging the Visits together and dropping you into a REPL.

Usage: ffexport merge [OPTIONS] SQLITE_DB...

  Extracts history/site metadata from multiple sqlite databases.

  Provide multiple sqlite databases as positional arguments, e.g.: ffexport
  merge ~/data/firefox/dbs/*.sqlite

  Provides a similar interface to inspect; drops you into a REPL to access
  the data.

Options:
  --include-live              In addition to any provided databases, copy
                              current (firefox) history to /tmp and merge it
                              as well

(also accepts the --browser and --profile arguments like save)

Example:

$ ffexport merge --include-live ~/data/firefox/dbs/*.sqlite
[I 200828 18:53:54 save_hist:67] backing up to /tmp/tmp8tvyotv9/places-20200829015354.sqlite
[I 200828 18:53:54 save_hist:71] done!
[D 200828 18:53:54 merge_db:52] merging information from 3 databases...
[I 200828 18:53:54 parse_db:71] Parsing visits from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[I 200828 18:53:55 parse_db:90] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[I 200828 18:53:56 parse_db:71] Parsing visits from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
[I 200828 18:53:56 parse_db:90] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
[I 200828 18:53:57 parse_db:71] Parsing visits from /tmp/tmp8tvyotv9/places-20200829015354.sqlite...
[I 200828 18:53:58 parse_db:90] Parsing sitedata from /tmp/tmp8tvyotv9/places-20200829015354.sqlite...
[D 200828 18:53:59 merge_db:64] Summary: removed 183,973 duplicates...
[D 200828 18:53:59 merge_db:65] Summary: returning 92,066 visit entries...
Python 3.8.5 (default, Jul 27 2020, 08:42:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.17.0 -- An enhanced Interactive Python. Type '?' for help.

Use merged_vis to access merged data from all databases
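As with inspect, merged_vis is just Visit objects, so quick summaries are one-liners; for example, to see how your visits are spread across years (again, merged_vis only exists inside this REPL):

from collections import Counter

# count visits per calendar year using the timezone-aware visit_date field
Counter(v.visit_date.year for v in merged_vis).most_common()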

Library Usage

You can also import this from python elsewhere and provide the database files yourself.

>>> import ffexport, glob
>>> visits = list(ffexport.read_and_merge(*glob.glob('data/firefox/dbs/*.sqlite')))  # note the splat, read_and_merge accepts variadic arguments
[D 200828 18:51:00 merge_db:52] merging information from 2 databases...
[I 200828 18:51:00 parse_db:71] Parsing visits from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[I 200828 18:51:01 parse_db:90] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828223058.sqlite...
[I 200828 18:51:01 parse_db:71] Parsing visits from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
[I 200828 18:51:02 parse_db:90] Parsing sitedata from /home/sean/data/firefox/dbs/places-20200828231237.sqlite...
[D 200828 18:51:03 merge_db:64] Summary: removed 91,976 duplicates...
[D 200828 18:51:03 merge_db:65] Summary: returning 91,997 visit entries...
>>> visits[10000]
Visit(url='https://github.com/python-mario/mario', visit_date=datetime.datetime(2020, 6, 24, 2, 23, 32, 482000, tzinfo=<UTC>), visit_type=1, title='python-mario/mario: Powerful Python pipelines for your shell', description='Powerful Python pipelines for your shell . Contribute to python-mario/mario development by creating an account on GitHub.', preview_image='https://repository-images.githubusercontent.com/185277224/2ce27080-b915-11e9-8abc-088ab263dbd9')
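Since read_and_merge just yields Visit objects, any further filtering is plain python; for example, continuing from the snippet above (the date cutoff here is arbitrary):

>>> from datetime import datetime, timezone
>>> # visit_date is timezone-aware, so compare against an aware datetime
>>> recent = [v for v in visits if v.visit_date >= datetime(2020, 8, 1, tzinfo=timezone.utc)]
>>> sorted(recent, key=lambda v: v.visit_date, reverse=True)[0].title  # title of the most recent visit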

For an example, see my HPI integration.

Notes

See here for what the visit_type enum means.

I considered using cachew, but because of the volume of the data it ends up being slower than reading directly from the sqlite database exports. Both the visits and sitedata functions are cachew-compliant though; you'd just have to wrap them yourself.
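If you do want to wrap it yourself, a rough sketch might look like the following; this assumes cachew's decorator API and that Visit is importable from the top-level package, so double-check against cachew's docs and ffexport's exports before relying on it:

from typing import Iterator, Sequence

from cachew import cachew  # assumes cachew is installed separately

import ffexport

# the cache path is an arbitrary example; cachew persists the yielded
# Visit objects and replays them on subsequent calls with the same arguments
@cachew(cache_path='/tmp/ffexport_cachew')
def cached_visits(dbs: Sequence[str]) -> Iterator[ffexport.Visit]:
    yield from ffexport.read_and_merge(*dbs)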


save_hist.py and the initial structure are modified from karlicoss/promnesia.

Download files

Download the file for your platform.

Source Distribution

ffexport-0.1.0.tar.gz (12.3 kB)

File details

Details for the file ffexport-0.1.0.tar.gz.

File metadata

  • Download URL: ffexport-0.1.0.tar.gz
  • Upload date:
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for ffexport-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8e938aee940566459d337c90702119dabc77de01b857be06dbead9260609e873
MD5 b633de67e825a293e73937d434eeae9c
BLAKE2b-256 09a5708d79392b54153dd5f01a7e514210f0fb56d32697c0b047a5e465cf44da

