---
date: 2022-05-05
authors: [squidfunk]
title: Chinese search support
description: >
  Insiders adds Chinese language support for the built-in search plugin – a
  feature that has been requested many times
categories:
  - Search
links:
  - blog/posts/search-better-faster-smaller.md
  - plugins/search.md#segmentation
  - insiders/index.md#how-to-become-a-sponsor
---

# Chinese search support – 中文搜索支持

__Insiders adds experimental Chinese language support for the [built-in search
plugin] – a feature that has been requested for a long time given the large
number of Chinese users.__

After the United States and Germany, the third-largest country of origin of
Material for MkDocs users is China. For a long time, the [built-in search plugin]
didn't allow for proper segmentation of Chinese characters, mainly due to
missing support in [lunr-languages], which is used for search tokenization and
stemming. The latest Insiders release adds long-awaited Chinese language support
for the built-in search plugin.

<!-- more -->

_Material for MkDocs finally supports Chinese! Text is correctly segmented and
easier to find._
{ style="display: inline" }

_This article explains how to set up Chinese language support for the built-in
search plugin in a few minutes._
{ style="display: inline" }

[built-in search plugin]: ../../plugins/search.md
[lunr-languages]: https://github.com/MihaiValentin/lunr-languages

## Configuration

Chinese language support for Material for MkDocs is provided by [jieba], an
excellent Chinese text segmentation library. If [jieba] is installed, the
built-in search plugin automatically detects Chinese characters and runs them
through the segmenter. You can install [jieba] with:

```
pip install jieba
```

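The detection and segmentation step described above can be pictured with the following minimal Python sketch. It is not the plugin's actual implementation: it assumes that Chinese characters can be recognized by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and `segment_chinese` and its `cut` parameter are hypothetical names. Any function that returns an iterable of words can be passed as `cut`, for example `jieba.cut`; the demo uses a hard-coded stand-in so the sketch runs without jieba installed.

```python
import re
from typing import Callable, Iterable

# Zero-width space (U+200B): invisible when rendered
ZWSP = "\u200b"

# Assumption: Chinese characters are approximated by the CJK Unified
# Ideographs block; the built-in search plugin may detect them differently.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def segment_chinese(text: str, cut: Callable[[str], Iterable[str]]) -> str:
    """Hypothetical helper: split every run of Chinese characters into words
    using `cut` (e.g. jieba.cut) and join those words with zero-width spaces.
    All other text is left untouched."""
    return CHINESE_RUN.sub(lambda match: ZWSP.join(cut(match.group(0))), text)


if __name__ == "__main__":
    # Stand-in segmenter for illustration; jieba's real output may differ
    def fake_cut(run: str) -> Iterable[str]:
        return {"支持中文": ["支持", "中文"]}.get(run, [run])

    result = segment_chinese("MkDocs 支持中文", fake_cut)
    print(result.split(ZWSP))  # ['MkDocs 支持', '中文']
```

Because the joiner is invisible, `result.replace("\u200b", "")` gives back the original string, which is why segmented text looks unchanged in the search modal.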
The next step is only required if you specified the [`separator`][separator]
configuration in `mkdocs.yml`. Segmented words are separated by
[zero-width whitespace] characters, so the text renders exactly as before in
the search modal. Adjust `mkdocs.yml` so that the [`separator`][separator]
includes the `\u200b` character, which lets the search index split text at the
segment boundaries:

``` yaml
plugins:
  - search:
      separator: '[\s\u200b\-]'
```
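
To see why the separator must include `\u200b`, the following Python sketch splits already segmented text with the regular expression from the example above, once with and once without `\u200b`. Python's `re` module only stands in for the regular expression engine the search index actually uses, `tokenize` is a hypothetical helper, and the segmented input is assumed rather than produced by jieba.

```python
import re
from typing import List

# The separator from the mkdocs.yml example: whitespace, zero-width space
# (U+200B) or hyphen
SEPARATOR = re.compile(r"[\s\u200b\-]")
# The same separator without \u200b, for comparison
SEPARATOR_WITHOUT_ZWSP = re.compile(r"[\s\-]")


def tokenize(text: str, separator: "re.Pattern[str]") -> List[str]:
    """Hypothetical helper: split text on the separator, drop empty tokens."""
    return [token for token in separator.split(text) if token]


if __name__ == "__main__":
    # Assumed segmenter output for 中文搜索支持: words joined by U+200B
    segmented = "\u200b".join(["中文", "搜索", "支持"])

    # Each word becomes its own token
    print(tokenize(segmented, SEPARATOR))  # ['中文', '搜索', '支持']

    # Without \u200b, all words stay glued together in a single token
    print(len(tokenize(segmented, SEPARATOR_WITHOUT_ZWSP)))  # 1
```

In Python's `re` module, `\s` does not match U+200B, which is why the zero-width space has to be listed explicitly in the character class.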

That's all that is necessary.

## Usage

If you followed the configuration steps above, Chinese text will now be
segmented into words using [jieba]. Try searching for
[:octicons-search-24: 支持][q=支持] to see how it integrates with the
built-in search plugin.

---

Note that this is an experimental feature, and I, @squidfunk, am not
proficient in Chinese (yet?). If you find a bug or think something can be
improved, please [open an issue].

[jieba]: https://pypi.org/project/jieba/
[zero-width whitespace]: https://en.wikipedia.org/wiki/Zero-width_space
[separator]: ../../setup/setting-up-site-search.md#separator
[q=支持]: ?q=支持
[open an issue]: https://github.com/squidfunk/mkdocs-material/issues/new/choose