A CLI for downloading posts in bulk from a specified Bluesky account

mass-downloader-for-bluesky

mass-downloader-for-bluesky (mdfb) is a Python CLI application that can download large numbers of posts from Bluesky for any given account.

Installation

You need Python installed to use this CLI.

Install via pip:

pip install mdfb

Manual

Make sure Poetry is installed.

Then clone the project, open a Poetry shell, and install the dependencies:

git clone git@github.com:IbrahimHajiAbdi/mass-downloader-for-bluesky.git
cd mass-downloader-for-bluesky
poetry shell
poetry install

Usage

mdfb uses the public API offered by Bluesky to retrieve posts liked, reposted, or posted by the desired account.
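To give a feel for what a public API request looks like, here is a minimal sketch that builds the URL for one page of an account's feed. It assumes the unauthenticated host `public.api.bsky.app` and the `app.bsky.feed.getAuthorFeed` endpoint; mdfb may well use different endpoints internally, and the helper name `author_feed_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: Bluesky's public, unauthenticated API host.
# mdfb itself may call other hosts or endpoints.
PUBLIC_API = "https://public.api.bsky.app/xrpc"


def author_feed_url(actor: str, limit: int = 10, cursor: Optional[str] = None) -> str:
    """Build the request URL for one page of an account's feed.

    `actor` can be a handle (bsky.app) or a DID (did:plc:...).
    `cursor` is the pagination token returned by the previous page, if any.
    """
    params = {"actor": actor, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{PUBLIC_API}/app.bsky.feed.getAuthorFeed?{urlencode(params)}"


print(author_feed_url("bsky.app", limit=10))
```

Fetching that URL returns JSON; when the response carries a cursor, passing it back requests the next page.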

mdfb downloads each post's information together with any accompanying media: a video or one or more images. If a post has no media, only its information is downloaded. The information is saved as a JSON file containing data such as the post text, creation time, and author details. Posts are currently retrieved from newest to oldest.

If you installed mdfb manually, you need to be inside a Poetry shell to use it.

Examples

Some example commands would be:

mdfb download --handle bsky.app -l 10 --like --threads 3 --format "{RKEY}_{HANDLE}" ./media/
mdfb download -d did:plc:z72i7hdynmk6r22z27h6tvur --archive --like --threads 3 --format "{DID}_{HANDLE}" ./media/
mdfb download --handle bsky.app --update --like --threads 3 --format "{RKEY}_{HANDLE}" ./media/

Naming Convention

By default, mdfb's naming convention is: "{rkey}_{handle}_{text}". For a post with multiple images, the naming is: "{rkey}_{handle}_{text}_{i}", where "i" is the position of the image in the post, from 1 to 4. Filenames are limited to 256 bytes; longer names are truncated to that size.

However, you can specify the file names yourself by passing a valid format string to the --format flag, e.g. "{RKEY}_{DID}". Any text may appear in the format string between the keywords. Keywords are case-sensitive.

For --format, the valid keywords are:

  • RKEY
  • DID
  • HANDLE
  • TEXT
  • DISPLAY_NAME
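As an illustration of how such a format string could be expanded and then cut to the 256-byte limit, here is a rough sketch. It is not mdfb's actual implementation: the function name `build_filename`, the error handling, and the sample values are all hypothetical.

```python
import re

# The keywords listed above; matching is case-sensitive.
KEYWORDS = ("RKEY", "DID", "HANDLE", "TEXT", "DISPLAY_NAME")
MAX_BYTES = 256  # filename limit stated in this README


def build_filename(fmt: str, values: dict) -> str:
    """Expand {KEYWORD} placeholders, then truncate to MAX_BYTES of UTF-8."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in KEYWORDS:
            # Hypothetical behaviour: reject unknown or lower-case keywords.
            raise ValueError(f"unknown keyword: {key}")
        return values[key]

    name = re.sub(r"\{(\w+)\}", substitute, fmt)
    # Truncate on bytes, not characters; drop any multi-byte character
    # that was cut in half at the boundary.
    return name.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")


# Sample values are made up for illustration.
print(build_filename("{RKEY}_{HANDLE}", {"RKEY": "3kabc", "HANDLE": "bsky.app"}))
```

Truncating on encoded bytes rather than characters matters for non-ASCII post text, where one character can occupy several bytes.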

Download Amount

The limit applies separately to each post type you request. For example:

mdfb download --handle bsky.app -l 100 --like --repost --post ./media/

This would download 100 likes, 100 reposts, and 100 posts, for a total of 300 posts.

You can also archive whole accounts. For example:

mdfb download --handle bsky.app --archive --like --repost --threads 3 --format "{DID}_{HANDLE}" ./media/

This would download all likes and reposts.

Database

When downloading posts, mdfb stores some post identifiers in a database. This lets you later download only the posts from an account that you have not downloaded yet.

There are some constraints. Deleting a downloaded file is not reflected in the database, so the --update flag will not redownload it. Furthermore, post identifiers are only committed to the database once all posts have been downloaded, so if mdfb crashes partway through, none of the posts downloaded in that run will be recorded in the database.
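The commit-at-the-end behaviour can be pictured with a small sketch. It assumes an SQLite database accessed through Python's `sqlite3` module and a made-up `downloaded` table; the README does not say which database engine or schema mdfb really uses.

```python
import sqlite3

# Assumption: an SQLite store with a hypothetical table of post identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloaded (did TEXT, rkey TEXT, PRIMARY KEY (did, rkey))")
conn.commit()


def record_batch(conn, rows, fail_after=None):
    """Insert every identifier, committing only once at the very end."""
    try:
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated crash mid-download")
            conn.execute("INSERT INTO downloaded VALUES (?, ?)", row)
        conn.commit()  # reached only if the whole batch succeeded
    except RuntimeError:
        conn.rollback()  # nothing from this run is kept


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM downloaded").fetchone()[0]


rows = [("did:plc:example", "a"), ("did:plc:example", "b")]
record_batch(conn, rows, fail_after=1)
print(count(conn))  # the interrupted run left no rows behind
record_batch(conn, rows)
print(count(conn))  # the complete run recorded both identifiers
```

In practice this means that after an interrupted run the files may already be on disk while the database still treats those posts as not downloaded.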

The database is stored in: (Linux) ~/.local/share/mdfb/, (Windows) C:\Users\$USER\AppData\Local\mdfb and (macOS) /Users/$USER/Library/Application Support/mdfb.
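If you want to locate that directory from a script, the following sketch reproduces the three locations listed above using only the standard library. The helper `data_dir` is hypothetical and ignores any environment-variable overrides; how mdfb itself resolves the path is not stated here.

```python
import sys
from pathlib import Path
from typing import Optional, Union


def data_dir(platform: str = sys.platform,
             home: Optional[Union[str, Path]] = None) -> Path:
    """Return the per-platform mdfb data directory described in this README."""
    base = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        return base / "AppData" / "Local" / "mdfb"
    if platform == "darwin":
        return base / "Library" / "Application Support" / "mdfb"
    # Everything else is treated like Linux.
    return base / ".local" / "share" / "mdfb"


print(data_dir())
```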

Example

mdfb db --delete_user bsky.app

Note

The maximum number of threads is currently 3; this can be changed in the mdfb/utils/constants.py file. That file also holds other constants you can change, such as the delay between requests and the number of retries before a post is marked as a failure and skipped.

Subcommands and arguments

  • download
    • --handle
      • The handle of the target account.
    • --did, -d
      • The DID of the target account.
    • --limit, -l
      • The number of posts to download.
    • --archive
      • Downloads all posts of the selected post types.
    • --update, -u
      • Downloads all of the latest posts that haven't been downloaded.
    • directory
      • Positional argument, where all the downloaded files are to be located. Required.
    • --threads
      • The number of threads used to download posts in parallel. The maximum is 3.
    • --format
      • Format string that files will use for their names. The keywords are case-sensitive and must be upper case.
    • --like
      • Retrieve liked posts.
    • --repost
      • Retrieve reposts.
    • --post
      • Retrieve posts.
  • db
    • --delete_user
      • Deletes all posts associated with the given user from the database. Takes the handle of the user.

Note

At least one of the flags --like, --repost, or --post is required (when using download).

The options in each of the groups (--did, -d and --handle) and (--archive, --limit, -l and --update, -u) are mutually exclusive, and exactly one option from each group is required (when using download).
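These rules map naturally onto `argparse` mutually exclusive groups. The sketch below is only an illustration of the constraints, not mdfb's actual parser; it omits --threads and --format for brevity, and the helper names `build_parser` and `parse` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mdfb")
    sub = parser.add_subparsers(dest="command", required=True)
    dl = sub.add_parser("download")

    # Exactly one way of identifying the account.
    account = dl.add_mutually_exclusive_group(required=True)
    account.add_argument("--handle")
    account.add_argument("--did", "-d")

    # Exactly one way of choosing how much to download.
    amount = dl.add_mutually_exclusive_group(required=True)
    amount.add_argument("--limit", "-l", type=int)
    amount.add_argument("--archive", action="store_true")
    amount.add_argument("--update", "-u", action="store_true")

    for flag in ("--like", "--repost", "--post"):
        dl.add_argument(flag, action="store_true")
    dl.add_argument("directory")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse has no "at least one of" group, so check it by hand.
    if not (args.like or args.repost or args.post):
        parser.error("at least one of --like, --repost, --post is required")
    return args


args = parse(["download", "--handle", "bsky.app", "-l", "10", "--like", "./media/"])
print(args.handle, args.limit, args.like)
```

Passing both --handle and --did, or omitting all of --like, --repost, and --post, makes this sketch exit with a usage error.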
