Get Pinyin Initials of Chinese characters

To get a reasonably large collection of the traditional and simplified Chinese characters in use today, we can use zhon:

pip install zhon

To get the pinyin of a Chinese character, we use python-pinyin (published on PyPI as pypinyin):

pip install pypinyin

Here is a script that maps every Chinese character with a known reading to its pinyin initial, taken here as the first letter of the toneless pinyin:

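from zhon import cedict
from pypinyin import pinyin, Style
import yaml


def main():
    # cedict.all is a string of all the characters zhon takes from CC-CEDICT
    all_chars = set(cedict.all)

    ch_initials = {}
    for c in all_chars:
        # Style.NORMAL drops the tone marks; errors='ignore' skips
        # characters that have no pinyin
        ch_pinyin = pinyin(c, style=Style.NORMAL, errors='ignore')
        # if no pinyin for this char exists
        if not ch_pinyin or not ch_pinyin[0][0]:
            continue

        # first letter of the first reading
        py_init = ch_pinyin[0][0][0]
        ch_initials[c] = py_init

    fname = 'zh_char_initial.yaml'
    # write UTF-8 and keep the characters readable instead of \u escapes
    with open(fname, 'w', encoding='utf-8') as f:
        yaml.dump(ch_initials, f, allow_unicode=True)


if __name__ == "__main__":
    main()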
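
Note that the script above keeps only the first Latin letter of each reading, so a syllable such as zhong is stored as z rather than as the phonetic initial zh. If the phonetic initial is what you want, a minimal sketch along these lines could be applied to the toneless pinyin strings. The names INITIALS, first_letter and phonetic_initial are hypothetical helpers made up for this example and are not part of pypinyin, and counting y and w as initials is an assumption. pypinyin also has Style.FIRST_LETTER and Style.INITIALS, which may already do what you need.

```python
# Hypothetical helpers for illustration; not part of pypinyin or zhon.
# Two-letter initials come first so that "zh" wins over "z".
# Treating "y" and "w" as initials is an assumption of this sketch.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
    "y", "w",
)


def first_letter(syllable):
    """Return the first Latin letter of a toneless pinyin syllable."""
    return syllable[:1]


def phonetic_initial(syllable):
    """Return the initial consonant of a syllable, or '' if it has none."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial
    return ""


if __name__ == "__main__":
    for s in ("zhong", "shu", "ma", "an"):
        print(s, first_letter(s), repr(phonetic_initial(s)))
```

The first-letter form is usually enough for things like alphabetical indexes, while the phonetic initial matters when zh, ch and sh have to be told apart from z, c and s.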
