Convert the display names of hashtags in a Mastodon instance to PascalCase

Project description

Mastodon Tag Case Corrector

A Python script for Mastodon instance admins to convert the display names of hashtags to PascalCase, when possible.

There are two possible strategies:

  • Find tagged posts to determine the most used casing, and/or
  • Split words with wordninja.
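
The first strategy can be pictured as a vote over the casings that actually appear in posts. The following is only a minimal sketch of that idea, not the tool's actual implementation; the helper name `most_used_casing` and the sample posts are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


# Hypothetical helper: not part of mastodon-tag-case-corrector.
def most_used_casing(tag: str, posts: List[str]) -> Optional[str]:
    """Return the most frequent casing of `tag` seen in `posts`, or None."""
    # Match "#tag" case-insensitively, but not when it is a prefix of a longer tag.
    pattern = re.compile(rf"#({re.escape(tag)})(?!\w)", re.IGNORECASE)
    counts = Counter(m.group(1) for post in posts for m in pattern.finditer(post))
    if not counts:
        return None  # the tag was never used, so there is nothing to vote on
    return counts.most_common(1)[0][0]


posts = [
    "New release! #OpenSource",
    "loving #opensource today",
    "#OpenSource is great",
]
print(most_used_casing("opensource", posts))  # OpenSource
```

When no tagged post is found, a vote like this has no answer, which is where a word-splitting fallback such as wordninja becomes useful.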

Warning

This tool DOES NOT replace due diligence!!! There is NO GUARANTEE that this tool produces correct splittings of hashtags!!! ALWAYS review the log!!!

(However, the tool will not revisit any tags that have been changed either by the tool or manually by the admins.)

Usage

This tool is developed with Python 3.13. Support for earlier versions is provided on a best-effort basis.

pip install mastodon-tag-case-corrector

# for all options
mastodon-tag-case-corrector -h

# sample usage
mastodon-tag-case-corrector -i mstdn.example -a MY_SECRET_TOKEN
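
If the tool is driven from another Python script, the same invocation might be assembled as below. This is a sketch under assumptions: `build_command` and `run` are hypothetical helpers, the instance and token are the placeholders from the sample above, and only the `-i`, `-a`, and `-d` options documented in this README are used.

```python
import shutil
import subprocess
from typing import List


# Hypothetical helper: assembles the CLI call from options listed in this README.
def build_command(instance: str, token: str, dry_run: bool = True) -> List[str]:
    cmd = ["mastodon-tag-case-corrector", "-i", instance, "-a", token]
    if dry_run:
        cmd.append("-d")  # --dry-run: do not actually edit tags on the instance
    return cmd


# Hypothetical helper: runs the command only if the tool is installed.
def run(cmd: List[str]) -> subprocess.CompletedProcess:
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    return subprocess.run(cmd, check=True)


cmd = build_command("mstdn.example", "MY_SECRET_TOKEN")
print(cmd)
# Call run(cmd) once the placeholders are replaced with real values.
```

Defaulting to a dry run keeps a first pass from changing anything, so the log can be reviewed before tags are edited.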

Configuration

Configuration can be done through command-line options or by setting environment variables.

| Variable name | CLI option | Required | Explanation |
| --- | --- | --- | --- |
| MASTODON_INSTANCE | -i, --instance [INSTANCE] | Yes | The domain that the Mastodon API is on, e.g. example.com. |
| MASTODON_API_KEY | -a, --auth [AUTH] | Yes | Mastodon API access token with at least admin:read and admin:write permissions. read:statuses is also needed if tagged post analysis is used. To get one, create a new application in User Settings => Development, then navigate to the application detail page and copy "your access token." |
| MASTODON_CHECK_TAG_LANGUAGE | -l, --languages [LANGUAGES ...] | No | A comma-separated list of language codes. If supplied, before feeding a hashtag into wordninja, the script checks which language has the most tagged posts in the last 30 days for that hashtag. If that language is one of the supplied languages, the hashtag is processed; otherwise it is ignored. This does not affect tagged post checking, and has no effect if WORDNINJA_DISABLE is enabled. Setting this to en is recommended for optimal results, as wordninja's default corpus is intended to cover only English words. Note, however, that the /api/v1/dimensions endpoint used to determine a hashtag's language tends to be quite slow on instances, so omitting this option could shorten execution time at the possible expense of accuracy. |
| MASTODON_DO_ALL_TAGS | -t, --all-tags | No | If set to 1, the script processes all tags (from /api/v1/admin/tags), not just trending tags (from /api/v1/admin/trends/tags). Not recommended, because it causes performance issues on the instance side and because most hashtags lack the post statistics that language detection and post analysis need (even if MASTODON_TAGS_OFFSET is used in combination). |
| MASTODON_TAGS_OFFSET | -o, --offset [OFFSET] | No | Offset to pass to the first tag-listing request. 0 by default. |
| MASTODON_TAGS_DAYS | --language-days [LANGUAGE_DAYS] | No | How many days of posts to consider for each hashtag to determine its language. 7 by default. |
| MASTODON_CAP_FIRST_LETTER_WHEN_POSSIBLE | -c, --cap | No | For hashtags that consist of only one word, capitalize the first letter if it is most often capitalized in practice (i.e. enforce strict PascalCase even for one word). |
| MASTODON_DISABLE_POST_ANALYSIS | --no-analysis | No | Disable tagged post analysis. |
| MASTODON_DRY_RUN | -d, --dry-run | No | Do not actually edit tags in Mastodon. Recommended for development and testing. |
| WORDNINJA_DISABLE | --no-wordninja | No | Disable wordninja detection. |
| WORDNINJA_DICTIONARY | --dictionary [DICTIONARY] | No | Relative path to a gzipped text file containing a list of words (must be in lower case), in descending order of importance. If not provided, wordninja's default corpus is used (note that this does not include fediverse jargon). |

Paths are relative to the working folder. For boolean arguments, the equivalent environment variable should be set to 1 for true.
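
As an illustration of the variables above, the sketch below writes a small custom dictionary and prepares an environment for a dry run. It rests on assumptions that should be checked against the tool: the file name `fedi_words.txt.gz` and the word list are hypothetical, one word per line is assumed as the file layout, and the final (commented-out) call assumes the tool runs without CLI options when the variables are set.

```python
import gzip
import os
from pathlib import Path

# Hypothetical word list, most important first; all lower case as required.
words = ["mastodon", "fediverse", "instance", "toot", "boost"]

# Hypothetical file name, relative to the current working directory.
dictionary_path = Path("fedi_words.txt.gz")
with gzip.open(dictionary_path, "wt", encoding="utf-8") as fh:
    # Assumption: one word per line.
    fh.write("\n".join(word.lower() for word in words) + "\n")

# Copy the current environment and add the tool's variables.
# Boolean variables are set to "1" for true.
env = dict(os.environ)
env.update({
    "MASTODON_INSTANCE": "mstdn.example",   # placeholder domain
    "MASTODON_API_KEY": "MY_SECRET_TOKEN",  # placeholder token
    "MASTODON_DRY_RUN": "1",
    "WORDNINJA_DICTIONARY": str(dictionary_path),
})

# With real values in place, the tool could then be started with, for example:
# subprocess.run(["mastodon-tag-case-corrector"], env=env, check=True)
```

Keeping MASTODON_DRY_RUN set while trying out a new dictionary makes it possible to review the proposed splits in the log before any tag is edited.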

License

Copyright 2024-2025 Austin Huang im@austinhuang.me (https://austinhuang.me). Apache License 2.0.

Download files

Source Distribution

mastodon_tag_case_corrector-1.0.0.tar.gz (11.9 kB)

Uploaded Source

Built Distribution

mastodon_tag_case_corrector-1.0.0-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file mastodon_tag_case_corrector-1.0.0.tar.gz.

File hashes

Hashes for mastodon_tag_case_corrector-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6a00756afde3f59fd66e737cf7e0a65dcd032477f5ead44af51dc7ba68566e48
MD5 7c45ed54dac6a34b809fb0934fbe72ec
BLAKE2b-256 4a1b5b6a2fe734b52e02b295944df7dd823e0a4989df0f5f58d1dcf08ae408e1

File details

Details for the file mastodon_tag_case_corrector-1.0.0-py3-none-any.whl.

File hashes

Hashes for mastodon_tag_case_corrector-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5c7a915f3577bd733e3405d8817c6d95ad87d9c4513a0ac61a451bc92ac81fc0
MD5 4ade8b18ceba4b8a420d89ec70bb1d5a
BLAKE2b-256 33449abf3613a46b2f7f8ce590ec918ce79f3f6a09acf0d5adac5bbdbfd2ee1e
