Get Pinyin Initials of Chinese characters

To get a reasonably large collection of the traditional and simplified Chinese characters in use today, we can use zhon:

pip install zhon

To get the pinyin of a Chinese character, we use python-pinyin (published on PyPI as pypinyin):

pip install pypinyin

Here is a script that gets the first pinyin letter (without tones) of every character that has a pinyin reading, and saves the mapping to a YAML file:

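from zhon import cedict
from pypinyin import pinyin, Style
import yaml


def main():
    # cedict.all is a string of all characters found in CC-CEDICT
    all_chars = set(cedict.all)

    ch_initials = {}
    for c in all_chars:
        # Style.NORMAL gives pinyin without tone marks, e.g. [['zhong']]
        ch_pinyin = pinyin(c, style=Style.NORMAL, errors='ignore')
        # if no pinyin for this char exists
        if not ch_pinyin:
            continue

        # first letter of the reading, e.g. 'z' for 'zhong'
        py_init = ch_pinyin[0][0][0]
        ch_initials[c] = py_init

    fname = 'zh_char_initial.yaml'
    # write UTF-8 and keep the characters readable instead of \u escapes
    with open(fname, 'w', encoding='utf-8') as f:
        yaml.dump(ch_initials, f, allow_unicode=True)


if __name__ == "__main__":
    main()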
