
Build fcitx5/RIME dictionaries from MediaWiki sites


[!NOTE] For the pre-built dictionary for Moegirlpedia (zh.moegirl.org.cn), see the wiki.

[!WARNING] mw2fcitx 0.20.0 contains breaking changes, mainly related to Traditional/Simplified Chinese conversion. See BREAKING_CHANGES.md for more information.


mw2fcitx

Build fcitx5/RIME dictionaries from MediaWiki sites.


pip install mw2fcitx
# or, to install for the current user only
pip install mw2fcitx --user
# or, to run it without installing (requires pipx)
pipx run mw2fcitx
# or, if you need OpenCC for text conversion
# (quoted so that shells such as zsh do not expand the brackets)
pip install "mw2fcitx[opencc]"

CLI Usage

mw2fcitx -c config_script.py
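
For scripted builds, the same command can be launched from Python. The sketch below only assembles the argument list from the two options this README mentions (`-c`, and `-n` for a non-default variable name); `build_command` and `run_build` are hypothetical helpers, not part of mw2fcitx.

```python
import subprocess
from typing import List, Optional


def build_command(config_path: str, export_name: Optional[str] = None) -> List[str]:
    """Assemble the mw2fcitx command line (hypothetical helper)."""
    cmd = ["mw2fcitx", "-c", config_path]
    if export_name is not None:
        # -n selects a configuration variable other than the default "exports".
        cmd += ["-n", export_name]
    return cmd


def run_build(config_path: str, export_name: Optional[str] = None) -> None:
    """Run mw2fcitx; requires the package to be installed and on PATH."""
    subprocess.run(build_command(config_path, export_name), check=True)
```

Calling `run_build("config_script.py")` is then equivalent to the command above, and `check=True` raises an exception if the build exits with a non-zero status.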

Configuration Script Format

from mw2fcitx.tweaks.moegirl import tweaks
# By default, the configuration is read from a variable
#     named "exports".
# You can change the name with `-n any_name` in the CLI.

exports = {
    # Source configurations.
    "source": {
        # MediaWiki api.php URL, used when fetching titles online.
        "api_path": "https://zh.moegirl.org.cn/api.php",
        # Title file path, used when reading titles from a local file. (optional)
        # Can be a path or a list of paths.
        "file_path": ["titles.txt"],
        "kwargs": {
            # Title number limit for fetching. (optional)
            "title_limit": 120,
            # Title number limit for fetching via API. (optional)
            # Overrides title_limit.
            "api_title_limit": 120,
            # Title number limit for each fetch via file. (optional)
            # Overrides title_limit.
            "file_title_limit": 60,
            # File in which the partial session is saved if an exception occurs. (optional)
            "partial": "partial.json",
            # Title list export path. (optional)
            "output": "titles.txt",
            # Delay between MediaWiki API requests in seconds. (optional)
            "request_delay": 2,
            # Deprecated. Please use `source.kwargs.api_params.aplimit` instead. (optional)
            "aplimit": "max",
            # Override ALL parameters while calling MediaWiki API.
            "api_params": {
                # Results per API request; same as `aplimit` in MediaWiki docs. (optional)
                "aplimit": "max"
            },
            # User-Agent used while requesting the API. (optional)
            "user_agent": "MW2Fcitx/development"
        }
    },
    # Tweak configuration, as a list.
    # Every tweak function accepts a list of titles and returns
    #     a list of titles.
    "tweaks":
        tweaks,
    # Converter configurations.
    "converter": {
        # pypinyin is a built-in converter.
        # For custom converter functions, just give the function itself.
        "use": "pypinyin",
        "kwargs": {
            # Replace "m" with "mu" and "n" with "en". Default: False.
            # See https://github.com/outloudvi/mw2fcitx/issues/29 for details.
            "disable_instinct_pinyin": False,
            # Pinyin results to replace. (optional)
            # Format: { "汉字": "pin'yin" }
            # The result is passed to `pypinyin` as a phrase, so words containing this phrase are also affected.
            "fixfile": "fixfile.json",
            # Characters to omit during pinyin conversion. (optional)
            # These characters will be automatically removed while trying to convert to pinyin.
            # As a result, words containing these characters will not be skipped in the dictionary.
            "characters_to_omit": ["·"],
        }
    },
    # Generator configurations.
    "generator": [{
        # rime is a built-in generator.
        # For custom generator functions, just give the function itself.
        "use": "rime",
        "kwargs": {
            # Destination dictionary filename. (optional)
            "output": "moegirl.dict.yml"
        }
    }, {
        # pinyin is a built-in generator.
        # This generator depends on `libime`.
        "use": "pinyin",
        "kwargs": {
            # Destination dictionary filename. (mandatory)
            "output": "moegirl.dict"
        }
    }]
}
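
The comments above describe each tweak as a function that accepts a list of titles and returns a list of titles. A minimal sketch of custom tweaks under that contract follows; the function names and the filtering rules are illustrative assumptions, not tweaks shipped with mw2fcitx.

```python
import re
from typing import List

# Matches a trailing qualifier in ASCII or full-width parentheses.
_QUALIFIER = re.compile(r"\s*[\((][^\(\)()]*[\))]$")


def drop_short_titles(titles: List[str]) -> List[str]:
    """Drop single-character titles (illustrative rule)."""
    return [t for t in titles if len(t) >= 2]


def strip_qualifier(titles: List[str]) -> List[str]:
    """Strip a trailing parenthesised qualifier: "Foo (game)" -> "Foo"."""
    return [_QUALIFIER.sub("", t) for t in titles]


def deduplicate(titles: List[str]) -> List[str]:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(titles))


# Hypothetical list that could be used as the "tweaks" value.
my_tweaks = [drop_short_titles, strip_qualifier, deduplicate]
```

Each function is independent of the others, so the list can be reordered or combined with the imported `tweaks` from `mw2fcitx.tweaks.moegirl` by plain list concatenation.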

A sample config file is here: sample_config.py
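
The `fixfile` option in the configuration above expects a JSON object that maps a phrase to its pinyin, with syllables separated by apostrophes. A small sketch that writes such a file with the standard library follows; the two entries are illustrative examples only, and `write_fixfile` is a hypothetical helper.

```python
import json
from typing import Dict

# Illustrative overrides only; replace with the readings your wiki needs.
fixes: Dict[str, str] = {
    "银行": "yin'hang",
    "重庆": "chong'qing",
}


def write_fixfile(path: str, entries: Dict[str, str]) -> None:
    """Write phrase-to-pinyin overrides as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese keys human-readable.
        json.dump(entries, f, ensure_ascii=False, indent=2)
```

Calling `write_fixfile("fixfile.json", fixes)` produces a file that can be referenced from `converter.kwargs.fixfile`.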

Advanced mode

Because mw2fcitx can append and override MediaWiki API parameters, it can collect other kinds of lists besides allpages. Note that if list, action or format is overridden in api_params, mw2fcitx will not automatically append any default parameter (except format) when sending MediaWiki API requests, so you must supply every required parameter yourself. The configuration used in the project's tests may serve as a reference.
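
As an illustration of such an override, the sketch below requests the members of a category instead of allpages. `action`, `list`, `cmtitle` and `cmlimit` are standard MediaWiki API parameters; the category name is a placeholder, and how mw2fcitx handles result continuation for lists other than allpages is not stated in this README, so treat this as a starting point rather than a verified configuration.

```python
# Hypothetical "source" section for collecting category members.
category_source = {
    "api_path": "https://zh.moegirl.org.cn/api.php",
    "kwargs": {
        "title_limit": 120,
        "request_delay": 2,
        "api_params": {
            # "list" is overridden, so no defaults are appended:
            # every required parameter is spelled out here.
            "action": "query",
            "list": "categorymembers",
            "cmtitle": "Category:Example",  # placeholder category name
            "cmlimit": "max",
        },
    },
}

# The other sections (tweaks, converter, generator) are assumed to stay
# as in the main configuration example and are omitted here.
```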

Breaking changes across versions

Read BREAKING_CHANGES.md for details.

License

MIT License
