
Build fcitx5/RIME dictionaries from MediaWiki sites


[!NOTE] If you need the pre-built dictionary for Moegirlpedia (zh.moegirl.org.cn), see the wiki.

[!WARNING] mw2fcitx 0.20.0 contains breaking changes, mainly related to Traditional/Simplified Chinese conversion. See BREAKING_CHANGES.md for more information.


mw2fcitx

Build fcitx5/RIME dictionaries from MediaWiki sites.


pip install mw2fcitx
# or, to install for the current user only
pip install mw2fcitx --user
# or, to run it without installing (requires pipx)
pipx run mw2fcitx
# or, if you need OpenCC for text conversion
# (quoted so that shells such as zsh do not expand the brackets)
pip install "mw2fcitx[opencc]"

CLI Usage

mw2fcitx -c config_script.py

Configuration Script Format

from mw2fcitx.tweaks.moegirl import tweaks
# By default we assume the configuration is located at a variable
#     called "exports".
# You can change this with `-n any_name` in the CLI.

exports = {
    # Source configurations.
    "source": {
        # URL of the MediaWiki api.php endpoint, if titles are fetched online.
        "api_path": "https://zh.moegirl.org.cn/api.php",
        # Path to a title file, if titles are read from a local file. (optional)
        # Can be a single path or a list of paths.
        "file_path": ["titles.txt"],
        "kwargs": {
            # Title number limit for fetching. (optional)
            "title_limit": 120,
            # Title number limit for fetching via API. (optional)
            # Overrides title_limit.
            "api_title_limit": 120,
            # Title number limit for each fetch via file. (optional)
            # Overrides title_limit.
            "file_title_limit": 60,
            # Path of the partial session file written on exception. (optional)
            "partial": "partial.json",
            # Title list export path. (optional)
            "output": "titles.txt",
            # Delay between MediaWiki API requests in seconds. (optional)
            "request_delay": 2,
            # Deprecated. Please use `source.kwargs.api_params.aplimit` instead. (optional)
            "aplimit": "max",
            # Override ALL parameters while calling MediaWiki API.
            "api_params": {
                # Results per API request; same as `aplimit` in MediaWiki docs. (optional)
                "aplimit": "max"
            },
            # User-Agent used while requesting the API. (optional)
            "user_agent": "MW2Fcitx/development"
        }
    },
    # Tweak configurations, as a list.
    # Every tweak function accepts a list of titles and returns
    #     a list of titles.
    "tweaks":
        tweaks,
    # Converter configurations.
    "converter": {
        # pypinyin is a built-in converter.
        # For custom converter functions, just give the function itself.
        "use": "pypinyin",
        "kwargs": {
            # Replace "m" with "mu" and "n" with "en". Default: False.
            # See more in https://github.com/outloudvi/mw2fcitx/issues/29 .
            "disable_instinct_pinyin": False,
            # Pinyin results to replace. (optional)
            # Format: { "汉字": "pin'yin" }
            # The result will be sent into `pypinyin` as a phrase, so words containing this phrase are also affected.
            "fixfile": "fixfile.json",
            # Characters to omit during pinyin conversion. (optional)
            # These characters will be automatically removed while trying to convert to pinyin.
            # As a result, words containing these characters will not be skipped in the dictionary.
            "characters_to_omit": ["·"],
        }
    },
    # Generator configurations.
    "generator": [{
        # rime is a built-in generator.
        # For custom generator functions, just give the function itself.
        "use": "rime",
        "kwargs": {
            # Destination dictionary filename. (optional)
            "output": "moegirl.dict.yml"
        }
    }, {
        # pinyin is a built-in generator.
        # This generator depends on `libime`.
        "use": "pinyin",
        "kwargs": {
            # Destination dictionary filename. (mandatory)
            "output": "moegirl.dict"
        }
    }]
}
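
The `fixfile` option above points to a JSON file in the `{ "汉字": "pin'yin" }` format. As a minimal sketch, such a file could be written with the standard library as shown below. The helper name `write_fixfile` and the two sample entries are hypothetical and only illustrate the format; they are not part of mw2fcitx.

```python
import json


def write_fixfile(path, fixes):
    """Write phrase-to-pinyin overrides as UTF-8 JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese keys readable in the file.
        json.dump(fixes, f, ensure_ascii=False, indent=2)


# Hypothetical sample entries: phrase -> apostrophe-separated pinyin.
SAMPLE_FIXES = {
    "银行": "yin'hang",
    "重庆": "chong'qing",
}

if __name__ == "__main__":
    write_fixfile("fixfile.json", SAMPLE_FIXES)
```

Because each entry is handed to `pypinyin` as a phrase, longer words that contain one of these phrases are affected as well, so short keys should be chosen with care.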

A sample config file is here: sample_config.py
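
Since every tweak is a function that takes a list of titles and returns a list of titles, custom tweaks can be written as plain Python functions. The sketch below is illustrative only: the function names are hypothetical, and `apply_tweaks` is a stand-in that assumes tweaks run in list order, which may differ from what mw2fcitx does internally.

```python
import re
from typing import Callable, List

Tweak = Callable[[List[str]], List[str]]

# Matches a trailing parenthetical such as " (VOCALOID)" (half- or full-width).
_SUFFIX = re.compile(r"\s*[((][^()()]*[))]$")


def drop_namespaced_titles(titles: List[str]) -> List[str]:
    """Hypothetical tweak: drop titles with a colon, e.g. "Template:Foo"."""
    return [title for title in titles if ":" not in title]


def strip_parenthetical_suffix(titles: List[str]) -> List[str]:
    """Hypothetical tweak: turn "Miku (VOCALOID)" into "Miku"."""
    return [_SUFFIX.sub("", title) for title in titles]


custom_tweaks: List[Tweak] = [drop_namespaced_titles, strip_parenthetical_suffix]


def apply_tweaks(titles: List[str], tweaks: List[Tweak]) -> List[str]:
    """Stand-in for the pipeline: run each tweak in order (an assumption)."""
    for tweak in tweaks:
        titles = tweak(titles)
    return titles
```

Under these assumptions, the list would be used by setting `"tweaks": custom_tweaks` in `exports`.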

Advanced mode

Because mw2fcitx can append and override MediaWiki API parameters, it can also collect lists other than allpages. Note that if list, action, or format is overridden in api_params, mw2fcitx will not automatically append any default parameters (except for format) when sending MediaWiki API requests, so you must supply every parameter the request needs yourself. A configuration in the tests may serve as a reference.
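
As a rough sketch of such an override, the configuration below asks for `list=categorymembers` instead of allpages. `action`, `list`, `cmtitle`, and `cmlimit` are standard MediaWiki API parameters, but the category name, the output filename, and the empty tweak list are hypothetical. The `preview_url` helper only illustrates what a request might look like (assuming `format=json` is appended); it is not how mw2fcitx builds its requests.

```python
from urllib.parse import urlencode

exports = {
    "source": {
        "api_path": "https://zh.moegirl.org.cn/api.php",
        "kwargs": {
            "title_limit": 120,
            "request_delay": 2,
            # Overriding `list`/`action` disables the default parameters,
            # so everything the request needs is spelled out here.
            "api_params": {
                "action": "query",
                "list": "categorymembers",
                "cmtitle": "Category:Example",  # hypothetical category
                "cmlimit": "max",
            },
        },
    },
    "tweaks": [],  # assumption: no tweaks are needed for this sketch
    "converter": {"use": "pypinyin", "kwargs": {}},
    "generator": [{"use": "rime", "kwargs": {"output": "category.dict.yml"}}],
}


def preview_url(config):
    """Illustrative only: show one possible request URL for the config."""
    source = config["source"]
    params = dict(source["kwargs"]["api_params"], format="json")
    return source["api_path"] + "?" + urlencode(params)


if __name__ == "__main__":
    print(preview_url(exports))
```

Opening the previewed URL in a browser is a quick way to check that the parameters return the expected list before running a full build.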

Breaking changes across versions

Read BREAKING_CHANGES.md for details.

License

MIT License
