
export/interface with firefox history/site metadata


ffexport

Supports Python 3.6, 3.7, 3.8 and 3.9. PRs welcome.

This backs up firefox history and parses the resulting history (sqlite) files.

My main use for this is exporting and interacting with my own firefox history. The Chrome functionality is vestigial; I've left it in case someone wants to mess with it. If you want immediate Chrome support, I recommend taking a look at promnesia.

See here for how firefox stores its history.
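As a rough illustration of that layout: each visit is a row in the moz_historyvisits table, and its place_id points at a row in moz_places, which holds the URL and title. visit_date is stored as microseconds since the Unix epoch. The sketch below queries only that small subset of columns (the real schema has many more). read_raw_visits is a hypothetical helper, not part of ffexport. It is safest to run it against a backup rather than the live file the browser has open.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Join every visit to the page (place) it refers to; only a few columns are selected.
QUERY = """
SELECT p.url, p.title, v.visit_date, v.visit_type
FROM moz_historyvisits AS v
JOIN moz_places AS p ON p.id = v.place_id
ORDER BY v.visit_date
"""


def read_raw_visits(db_path: str) -> List[Tuple[str, Optional[str], datetime, int]]:
    """Hypothetical helper: read (url, title, visit_date, visit_type) rows."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(QUERY).fetchall()
    finally:
        conn.close()
    result = []
    for url, title, visit_date, visit_type in rows:
        # visit_date is microseconds since the epoch; timedelta keeps it exact
        dt = EPOCH + timedelta(microseconds=visit_date)
        result.append((url, title, dt, visit_type))
    return result
```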

Install

pip3 install ffexport

Requires python3.6+

Usage

Usage: ffexport [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  inspect  Extracts history/site metadata from one sqlite database.
  merge    Extracts history/site metadata from multiple sqlite databases.
  save     Backs up the current firefox sqlite history file.

The inspect and merge commands also accept a --json flag, which dumps the result to STDOUT as JSON. Dates are serialized to epoch time.
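Here is a minimal sketch of what "serialized to epoch time" means. It assumes whole seconds; ffexport's exact precision and key names may differ. dumps_with_epoch and epoch_to_datetime are hypothetical names used only for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any


def _to_epoch(obj: Any) -> Any:
    # json.dumps calls this hook only for objects it cannot serialize itself
    if isinstance(obj, datetime):
        # whole seconds since 1970-01-01 UTC; naive datetimes are treated as local time
        return int(obj.timestamp())
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_with_epoch(data: Any) -> str:
    return json.dumps(data, default=_to_epoch)


def epoch_to_datetime(seconds: int) -> datetime:
    # the reverse direction, e.g. when reading a --json dump back into Python
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Truncating to whole seconds drops the sub-second part of each visit. Keep the float from timestamp() instead if that precision matters to you.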

Logs are hidden by default. To show the debug logs, set export FFEXPORT_LOGS=10 (the value is a Python logging level; 10 is DEBUG).
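The standard Python levels are DEBUG=10, INFO=20, WARNING=30, ERROR=40 and CRITICAL=50. The following is a sketch of how an environment variable like this can be mapped onto a logger, assuming the variable holds one of those integers. It is not ffexport's actual code, and the logger name is hypothetical.

```python
import logging
import os
from typing import Mapping


def log_level_from_env(env: Mapping[str, str]) -> int:
    raw = env.get("FFEXPORT_LOGS", "").strip()
    if not raw.isdigit():
        # unset or not a number: use a level above CRITICAL so nothing is shown
        return logging.CRITICAL + 1
    return int(raw)


# hypothetical logger name, for illustration only
logger = logging.getLogger("ffexport_sketch")
logger.addHandler(logging.StreamHandler())
logger.setLevel(log_level_from_env(os.environ))
logger.debug("emitted when FFEXPORT_LOGS=10")
```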

save

Usage: ffexport save [OPTIONS]

  Backs up the current firefox sqlite history file.

Options:
  --browser [firefox|chrome]  Provide either 'firefox' or 'chrome' [defaults
                              to firefox]
  --profile TEXT              Use to pick the correct profile to back up. If
                              unspecified, will assume a single profile
  --to PATH                   Directory to store backup to  [required]
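How --profile picks a profile isn't specified here. The sketch below shows one plausible approach, which is an assumption: match a substring of the profile directory name, and fall back to the only profile if there is exactly one. On Linux, profiles live under ~/.mozilla/firefox (as in the log output below). find_places_db is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Optional


def find_places_db(profiles_dir: Path, profile: Optional[str] = None) -> Path:
    # each Firefox profile directory holds its own places.sqlite
    candidates: List[Path] = sorted(profiles_dir.glob("*/places.sqlite"))
    if profile is not None:
        # assumption: a profile is selected by a substring of its directory name
        candidates = [p for p in candidates if profile in p.parent.name]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise FileNotFoundError(f"no places.sqlite found under {profiles_dir}")
    names = ", ".join(p.parent.name for p in candidates)
    raise RuntimeError(f"multiple profiles found ({names}); pass a profile to pick one")
```

For example, find_places_db(Path.home() / ".mozilla" / "firefox", "dev-edition") would select a dev-edition-default profile.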

Since firefox (and browsers in general) seem to delete old history at random, I'd recommend running the following periodically:

$ ffexport save --to ~/data/firefox
[D 200828 15:30:58 save_hist:67] backing up /home/sean/.mozilla/firefox/jfkdfwx.dev-edition-default/places.sqlite to /home/sean/data/firefox/places-20200828223058.sqlite
[D 200828 15:30:58 save_hist:71] done!

That atomically copies the firefox sqlite database containing your history to the directory passed with --to.
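Python's sqlite3 backup API (Connection.backup, Python 3.7+) is one way to get a consistent copy of a SQLite file. This is a sketch, not necessarily how ffexport does it. The filename format copies the one in the log above, and the timestamp there appears to be UTC (15:30:58 in the log versus 223058 in the name). backup_places is a hypothetical name. If the browser holds an exclusive lock on the file, the read may fail with "database is locked", so closing the browser first is the safe option.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_places(src: Path, to_dir: Path) -> Path:
    to_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    dest = to_dir / f"places-{stamp}.sqlite"
    # open the source read-only so the browser's database is never modified
    src_conn = sqlite3.connect(src.resolve().as_uri() + "?mode=ro", uri=True)
    dest_conn = sqlite3.connect(str(dest))
    try:
        # with the default pages=-1, the whole database is copied in a single step
        src_conn.backup(dest_conn)
    finally:
        dest_conn.close()
        src_conn.close()
    return dest
```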

inspect

Usage: ffexport inspect [OPTIONS] SQLITE_DB

  Extracts history/site metadata from one sqlite database.

  Provide a firefox history sqlite database as the first argument. Drops
  you into a REPL to access the data.

Options:
  --json  Print result to STDOUT as JSON

As an example:

ffexport inspect ~/data/firefox/places-20200828223058.sqlite
[D 210225 04:38:27 parse_db:77] Reading individual visits from /home/sean/data/firefox/places-20200828223058.sqlite...
[D 210225 04:38:28 parse_db:96] Reading sitedata from /home/sean/data/firefox/places-20200828223058.sqlite...
Demo: Your most common sites....
[('github.com', 13775),
 ('www.youtube.com', 8108),
 ('duckduckgo.com', 8054),
 ('www.google.com', 6542),
 ('discord.com', 6134),
 ('sean.fish', 3264),
 ('stackoverflow.com', 2528),
 ('gitlab.com', 1608)]

Use mvis or msite to access raw visits/site data, vis for the merged data

In [1]: ....

That drops you into a REPL with access to the history from that database (vis, mvis and msite).
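A "most common sites" summary like the demo above can be computed from visit URLs with collections.Counter. This sketch is not necessarily how ffexport builds its demo. It relies only on each visit having a url attribute (as in the Visit shown under Library Usage). most_common_sites is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def most_common_sites(urls: Iterable[str], n: int = 8) -> List[Tuple[str, int]]:
    # count visits per network location (host), e.g. "github.com"
    counts = Counter(urlsplit(url).netloc for url in urls)
    # URLs like about:blank have no host; drop that bucket
    counts.pop("", None)
    return counts.most_common(n)
```

In the REPL, this might be called as most_common_sites(v.url for v in vis).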

merge

Similar to inspect, but accepts multiple database backups, removes any duplicates, and drops you into a REPL.

Usage: ffexport merge [OPTIONS] SQLITE_DB...

  Extracts history/site metadata from multiple sqlite databases.

  Provide multiple sqlite databases as positional arguments, e.g.: ffexport
  merge ~/data/firefox/dbs/*.sqlite

  Provides a similar interface to inspect; drops you into a REPL to access
  the data.

Options:
  --include-live              In addition to any provided databases, copy
                              current (firefox) history to /tmp and merge it
                              as well
  --json                      Print result to STDOUT as JSON

(merge also accepts the --browser and --profile flags, like the save command. Provide those if you have multiple profiles and are using the --include-live flag.)

Example:

ffexport merge --include-live ~/data/firefox/*.sqlite
[D 210225 04:39:43 save_hist:71] backing up /home/sean/.mozilla/firefox/lsinsptf.dev-edition-default/places.sqlite to /tmp/tmpv2ct3t0g/places-20210225123943.sqlite
[D 210225 04:39:43 save_hist:75] done!
[D 210225 04:39:43 merge_db:48] merging information from 10 databases...
[D 210225 04:39:43 parse_db:142] Reading visits from /home/sean/data/firefox/places-20200828223058.sqlite...
[D 210225 04:39:44 parse_db:142] Reading visits from /home/sean/data/firefox/places-20201010031025.sqlite...
[D 210225 04:39:45 parse_db:142] Reading visits from /home/sean/data/firefox/places-20201031031121.sqlite...
[D 210225 04:39:46 parse_db:142] Reading visits from /home/sean/data/firefox/places-20201121031143.sqlite...
[D 210225 04:39:47 parse_db:142] Reading visits from /home/sean/data/firefox/places-20201212031157.sqlite...
[D 210225 04:39:49 parse_db:142] Reading visits from /home/sean/data/firefox/places-20201226031225.sqlite...
[D 210225 04:39:50 parse_db:142] Reading visits from /home/sean/data/firefox/places-20210109031324.sqlite...
[D 210225 04:39:52 parse_db:142] Reading visits from /home/sean/data/firefox/places-20210208052811.sqlite...
[D 210225 04:39:53 parse_db:142] Reading visits from /home/sean/data/firefox/places-20210222065542.sqlite...
[D 210225 04:39:55 parse_db:142] Reading visits from /tmp/tmpv2ct3t0g/places-20210225123943.sqlite...
[D 210225 04:39:57 merge_db:60] Summary: removed 1,597,629 duplicates...
[D 210225 04:39:57 merge_db:61] Summary: returning 235,681 visit entries...

Use merged_vis to access merged data from all databases
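The "removed ... duplicates" step is needed because successive backups overlap heavily. Below is a sketch of that kind of merge. It assumes a visit is identified by its (url, visit_date) pair, which may not be the exact key ffexport uses. The Visit class here is a hypothetical two-field stand-in, not ffexport's own.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Visit(NamedTuple):  # hypothetical stand-in with only two fields
    url: str
    visit_date: datetime


def merge_visits(*sources: Iterable[Visit]) -> Iterator[Visit]:
    seen: Set[Tuple[str, datetime]] = set()
    for source in sources:
        for visit in source:
            key = (visit.url, visit.visit_date)
            if key in seen:
                continue  # already yielded from an earlier backup
            seen.add(key)
            yield visit
```

Keeping a set of seen keys makes this a single pass over all databases. The trade-off is holding one key per unique visit in memory.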

To dump all that info to json:

$ ffexport merge --include-live --json ~/data/firefox/*.sqlite > ./history.json
$ du -h history.json
67M     history.json

Library Usage

You can also import ffexport and pass it files from Python elsewhere. The two main functions are ffexport.read_visits (which reads/parses one database) and ffexport.read_and_merge (which parses multiple).

>>> import ffexport, glob
>>> visits = list(ffexport.read_and_merge(*glob.glob('data/firefox/*.sqlite')))  # note the splat, read_and_merge accepts variadic arguments
>>> visits[10000]
Visit(
  url="https://github.com/python-mario/mario",
  visit_date=datetime.datetime(2020, 6, 24, 2, 23, 32, 482000, tzinfo=<UTC>),
  visit_type=1,
  title="python-mario/mario: Powerful Python pipelines for your shell",
  description="Powerful Python pipelines for your shell . Contribute to python-mario/mario development by creating an account on GitHub.",
  preview_image="https://repository-images.githubusercontent.com/185277224/2ce27080-b915-11e9-8abc-088ab263dbd9",
)
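As an example of working with the returned objects, here is a sketch that counts visits per calendar day. It only assumes each object has a timezone-aware visit_date attribute, like the Visit above. daily_counts is a hypothetical helper, not part of ffexport.

```python
from collections import Counter
from datetime import date
from typing import Any, Iterable, List, Tuple


def daily_counts(visits: Iterable[Any]) -> List[Tuple[date, int]]:
    # visit_date is UTC in the example above, so these are UTC days;
    # use v.visit_date.astimezone().date() to bucket by local day instead
    counts = Counter(v.visit_date.date() for v in visits)
    return sorted(counts.items())
```

For example: daily_counts(ffexport.read_and_merge(*glob.glob('data/firefox/*.sqlite'))).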

For another example, see my HPI integration.

Notes

See here for what the visit_type enum means.
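For quick reference, here is a sketch of that enum as a Python IntEnum. The values follow Firefox's nsINavHistoryService transition constants as I understand them, so double-check them against the linked reference. VisitType and describe are hypothetical names.

```python
from enum import IntEnum


class VisitType(IntEnum):
    # Firefox history transition types (TRANSITION_* constants)
    LINK = 1
    TYPED = 2
    BOOKMARK = 3
    EMBED = 4
    REDIRECT_PERMANENT = 5
    REDIRECT_TEMPORARY = 6
    DOWNLOAD = 7
    FRAMED_LINK = 8
    RELOAD = 9


def describe(visit_type: int) -> str:
    try:
        return VisitType(visit_type).name
    except ValueError:
        return f"UNKNOWN({visit_type})"
```

With this mapping, the visit_type=1 in the example above would be a LINK visit.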

I considered using cachew, but because of the volume of the data it ends up being slower than reading directly from the sqlite database exports. Both the visits and sitedata functions are cachew-compliant, though; you'd just have to wrap them yourself. See here for more info.


save_hist.py and the initial project structure are modified from karlicoss/promnesia.


Testing

git clone https://github.com/seanbreckenridge/ffexport
cd ./ffexport
pip install '.[testing]'
mypy ./ffexport
pytest



Download files


Source Distribution

ffexport-0.1.10.tar.gz (19.1 kB)

Uploaded Source

Built Distribution

ffexport-0.1.10-py3-none-any.whl (14.5 kB)

Uploaded Python 3

File details

Details for the file ffexport-0.1.10.tar.gz.

File metadata

  • Download URL: ffexport-0.1.10.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.3

File hashes

Hashes for ffexport-0.1.10.tar.gz
Algorithm Hash digest
SHA256 a6b461387f57282877a0260b83b1793e21b6e3b50df4e82750ad63a28c0394d2
MD5 a6710fb7723210206c358cd3287023a4
BLAKE2b-256 a1d9ba31e29995552a6c57b726458946af9e6be94aa0c606d38d4d3133355116


File details

Details for the file ffexport-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: ffexport-0.1.10-py3-none-any.whl
  • Upload date:
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.3

File hashes

Hashes for ffexport-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 485f43094d33929f92c22ba025d64f796d05ee58468f5f04266f194b3d7e0f2a
MD5 129a5222cc6bc1685c6335ade4ee7140
BLAKE2b-256 c302457d8b2b07490f8b8092b37912edca36d5e2715204a97906e6816d958fe2

