date: 2022-05-05
title: Chinese search support
description: Insiders adds Chinese language support for the built-in search plugin – a feature that has been requested many times
Chinese search support – 中文搜索支持
Insiders adds experimental Chinese language support for the built-in search plugin – a feature that has been requested for a long time given the large number of Chinese users.
After the United States and Germany, China is the third-largest country of origin of Material for MkDocs users. For a long time, the built-in search plugin couldn't segment Chinese text properly, mainly because lunr-languages, which is used for search tokenization and stemming, lacked support for it. The latest Insiders release adds the long-awaited Chinese language support for the built-in search plugin.
Material for MkDocs終於支持中文了!文本被正確分割並且更容易找到。 (Material for MkDocs finally supports Chinese! Text is segmented correctly and is easier to find.)
This article explains how to set up Chinese language support for the built-in search plugin in a few minutes.
Configuration
Chinese language support for Material for MkDocs is provided by jieba, an excellent Chinese text segmentation library. If jieba is installed, the built-in search plugin automatically detects Chinese characters and runs them through the segmenter. You can install jieba with:
pip install jieba
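To give a rough idea of what "detects Chinese characters and runs them through the segmenter" could look like, here is a minimal sketch. It is not the plugin's actual implementation: the names `HAN_RUN`, `load_segmenter` and `segment` are hypothetical, the character range only covers the basic CJK Unified Ideographs block, and the per-character fallback exists only so the sketch runs without jieba installed. The only jieba call used is `jieba.lcut`, which returns the segments as a list.

```python
import re

# Runs of CJK Unified Ideographs (basic block only). This range is an
# assumption made for the sketch, not necessarily what the plugin checks.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def load_segmenter():
    """Return jieba's list-based segmenter, or a per-character fallback."""
    try:
        import jieba  # optional dependency: pip install jieba
    except ImportError:
        return list  # fallback: split a run into single characters
    return jieba.lcut


def segment(text, cut):
    """Split text into pieces, passing only Chinese runs through `cut`."""
    pieces = []
    position = 0
    for match in HAN_RUN.finditer(text):
        # Keep any non-Chinese text before this run untouched
        if match.start() > position:
            pieces.append(text[position:match.start()])
        pieces.extend(cut(match.group()))
        position = match.end()
    # Keep any trailing non-Chinese text
    if position < len(text):
        pieces.append(text[position:])
    return pieces


if __name__ == "__main__":
    print(segment("Material for MkDocs 支持中文", load_segmenter()))
```

With jieba installed, the Chinese run is split into words. Without it, the fallback splits it into single characters, which is roughly the kind of poor segmentation the feature is meant to avoid.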
The next step is only required if you specified the separator configuration in mkdocs.yml. Text is segmented with zero-width whitespace characters, so it renders exactly the same in the search modal. Adjust mkdocs.yml so that the separator includes the \u200b character:
plugins:
  - search:
      separator: '[\s\u200b\-]'
That's all that is necessary.
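To illustrate why the separator needs the `\u200b` character, the following sketch joins some segments with zero-width spaces and splits them again with the same character class as in the configuration above. It is only an illustration in Python: the actual tokenization happens in the browser, the helper names `join_segments` and `tokenize` are hypothetical, and the segment list is written by hand rather than produced by jieba.

```python
import re

ZERO_WIDTH_SPACE = "\u200b"

# Same character class as the separator value in mkdocs.yml
SEPARATOR = re.compile(r"[\s\u200b\-]")


def join_segments(segments):
    """Join segments with zero-width spaces, invisible when rendered."""
    return ZERO_WIDTH_SPACE.join(segments)


def tokenize(text):
    """Split text on the separator and drop empty tokens."""
    return [token for token in SEPARATOR.split(text) if token]


# Hand-written segments, standing in for segmenter output
segmented = join_segments(["支持", "中文", "搜索"])
print(tokenize(segmented + " full-text search"))
```

A separator without `\u200b`, such as `[\s\-]`, would leave the three Chinese words as one long token, since the zero-width space does not count as whitespace for `\s` in Python.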
Usage
If you followed the instructions in the configuration guide, Chinese words will now be tokenized using jieba. Try searching for 支持 ("support") to see how it integrates with the built-in search plugin.
Note that this is an experimental feature, and I, @squidfunk, am not proficient in Chinese (yet?). If you find a bug or think something can be improved, please open an issue.