mkdocs-material/docs/blog/posts/chinese-search-support.md
2024-04-25 10:51:05 +07:00


date: 2022-05-05
authors:
  - squidfunk
title: Chinese search support
description: Insiders adds Chinese language support for the built-in search plugin, a feature that has been requested many times
categories:
  - Search
links:
  - blog/posts/search-better-faster-smaller.md
  - plugins/search.md#segmentation
  - insiders/index.md#how-to-become-a-sponsor

Chinese search support 中文搜索​支持

Insiders adds experimental Chinese language support for the built-in search plugin, a feature that has long been requested given the large number of Chinese users.

After the United States and Germany, China is the third-largest country of origin of Material for MkDocs users. For a long time, the built-in search plugin couldn't properly segment Chinese text, mainly because of missing support in lunr-languages, which is used for search tokenization and stemming. The latest Insiders release adds the long-awaited Chinese language support that many users have requested.

Material for MkDocs終於支持中文文本正確分割並且容易找到。 (Material for MkDocs finally supports Chinese text, which is now segmented correctly and easy to find.)

This article explains how to set up Chinese language support for the built-in search plugin in a few minutes.

Configuration

Chinese language support for Material for MkDocs is provided by jieba, an excellent Chinese text segmentation library. If jieba is installed, the built-in search plugin automatically detects Chinese characters and runs them through the segmenter. You can install jieba with:

pip install jieba
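
To get a feel for what segmentation means here, the following Python sketch inserts zero-width spaces (U+200B) between the words a segmenter returns. It is only an illustration and not the plugin's actual code: the `segment` helper, the `CJK_RUN` pattern (which covers only the basic CJK Unified Ideographs block) and the sample sentence are assumptions made for this example. The only jieba call used is `jieba.cut`, which yields the words of a string.

```python
import re
from typing import Callable, Iterable

ZERO_WIDTH_SPACE = "\u200b"

# Assumption for this sketch: only the basic CJK Unified Ideographs block
# is treated as Chinese text. The real plugin may detect more ranges.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def segment(text: str, cut: Callable[[str], Iterable[str]]) -> str:
    """Return `text` with zero-width spaces between the words of each CJK run.

    `cut` is any function that splits a string into words, e.g. `jieba.cut`.
    """

    def join_words(match: "re.Match[str]") -> str:
        # Drop empty strings so we never emit two separators in a row
        words = [word for word in cut(match.group(0)) if word]
        return ZERO_WIDTH_SPACE.join(words)

    # Non-Chinese parts of the text are left untouched
    return CJK_RUN.sub(join_words, text)


if __name__ == "__main__":
    try:
        import jieba
    except ImportError:
        print("jieba is not installed, run `pip install jieba` first")
    else:
        marked = segment("Material for MkDocs 支持中文搜索", jieba.cut)
        # Splitting on the invisible separator reveals the word boundaries
        print(marked.split(ZERO_WIDTH_SPACE))
```

Because the separator is invisible, removing it again gives back the original string, which is why segmented text looks unchanged when rendered.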

The next step is only required if you specified the separator configuration in mkdocs.yml. Text is segmented with zero-width space characters (U+200B), which are invisible, so it renders exactly the same in the search modal. Adjust mkdocs.yml so that the separator includes the \u200b character:

plugins:
  - search:
      separator: '[\s\u200b\-]'

That's all that is necessary.
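
To see why a custom separator needs the \u200b character, here is a small Python approximation. The search itself runs in the browser, so this is only a sketch: the `tokenize` helper and the sample text are made up for illustration, and the separator without \u200b is a hypothetical custom value, not necessarily the plugin's default. The key point is that `\s` does not match a zero-width space, so it has to be listed explicitly.

```python
import re


def tokenize(text: str, separator: str) -> list:
    """Split `text` on the separator pattern and drop empty tokens."""
    return [token for token in re.split(separator, text) if token]


# Sample text in which the two Chinese words are already separated
# by a zero-width space, as the segmenter would leave them
text = "built-in 搜索\u200b支持"

# Hypothetical custom separator that lacks the zero-width space
without_zwsp = tokenize(text, r"[\s\-]")

# Separator from the configuration in this post
with_zwsp = tokenize(text, r"[\s\u200b\-]")

print(len(without_zwsp), len(with_zwsp))
```

With the first separator, the two Chinese words stay glued together as a single token. With the second, they become separate tokens, so a search for either word can match.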

Usage

If you followed the configuration steps above, Chinese words are now tokenized using jieba. Try searching for 支持 to see how it integrates with the built-in search plugin.


Note that this is an experimental feature, and I, @squidfunk, am not proficient in Chinese (yet?). If you find a bug or think something can be improved, please open an issue.