---
date: 2022-05-05
authors: [squidfunk]
title: Chinese search support
description: >
  Insiders adds Chinese language support for the built-in search plugin, a
  feature that has been requested many times
categories:
  - Search
links:
  - blog/posts/search-better-faster-smaller.md
  - plugins/search.md#segmentation
  - insiders/index.md#how-to-become-a-sponsor
---

# Chinese search support 中文搜索​支持

__Insiders adds experimental Chinese language support for the [built-in search
plugin], a feature that has been requested for a long time given the large
number of Chinese users.__

After the United States and Germany, the third-largest country of origin of
Material for MkDocs users is China. For a long time, the [built-in search plugin]
didn't allow for proper segmentation of Chinese characters, mainly due to
missing support in [lunr-languages], which is used for search tokenization and
stemming. The latest Insiders release adds long-awaited Chinese language support
for the built-in search plugin, something that has been requested by many users.

<!-- more -->

_Material for MkDocs終於支持中文文本正確分割並且容易找到。_
{ style="display: inline" }

_This article explains how to set up Chinese language support for the built-in
search plugin in a few minutes._
{ style="display: inline" }

  [built-in search plugin]: ../../plugins/search.md
  [lunr-languages]: https://github.com/MihaiValentin/lunr-languages

## Configuration

Chinese language support for Material for MkDocs is provided by [jieba], an
excellent Chinese text segmentation library. If [jieba] is installed, the
built-in search plugin automatically detects Chinese characters and runs them
through the segmenter. You can install [jieba] with:

```
pip install jieba
```
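
The detection and segmentation step described above can be sketched roughly as
follows. This is a minimal illustration, not the plugin's actual
implementation: the function name `segment_chinese`, the `cut` parameter, and
the use of the basic CJK Unified Ideographs range to detect Chinese characters
are assumptions made for this example. Only `jieba.cut` is part of [jieba]
itself. Words are joined with the zero-width space character `\u200b`, which is
invisible when rendered.

``` python
import re

# Assumption: the basic CJK Unified Ideographs block stands in for
# "Chinese characters"; the real plugin may detect them differently.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")
ZWSP = "\u200b"  # zero-width space


def segment_chinese(text, cut=None):
    """Insert zero-width spaces between the words of each Chinese run."""
    if cut is None:
        # Imported lazily so that a custom segmenter can be passed instead.
        import jieba
        cut = jieba.cut

    def mark(match):
        # Segment one run of Chinese characters and delimit every word.
        words = ZWSP.join(cut(match.group(0)))
        return f"{ZWSP}{words}{ZWSP}"

    # Non-Chinese text is left untouched; leading and trailing markers
    # carry no information and are stripped.
    return HAN_RUN.sub(mark, text).strip(ZWSP)
```

With [jieba] installed, `segment_chinese("中文搜索支持")` returns text that looks
the same on screen but carries `\u200b` at the word boundaries chosen by the
segmenter.
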

The next step is only required if you specified the [`separator`][separator]
configuration in `mkdocs.yml`. Text is segmented by inserting
[zero-width whitespace] characters between words, so it renders exactly the
same in the search modal. Adjust `mkdocs.yml` so that the
[`separator`][separator] includes the `\u200b` character:

``` yaml
plugins:
  - search:
      separator: '[\s\u200b\-]'
```

That's all that is necessary.
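
To see why the `separator` has to include `\u200b`, the following sketch splits
already segmented text with and without it. It uses Python's `re` module purely
for illustration, and the `tokenize` helper, the sample text, and the pattern
without `\u200b` are assumptions made for this example rather than the plugin's
own code.

``` python
import re

# Pattern from the configuration above, and a variant without \u200b
# (the variant is an assumption used only for comparison).
WITH_ZWSP = re.compile(r"[\s\u200b\-]")
WITHOUT_ZWSP = re.compile(r"[\s\-]")

# Hypothetical sample: Chinese words already delimited by zero-width spaces.
text = "中文\u200b搜索\u200b支持 built-in"


def tokenize(text, separator):
    """Split text on the separator and drop empty tokens."""
    return [token for token in separator.split(text) if token]


# Without \u200b the Chinese run stays a single token, because \s does not
# match the zero-width space.
print(tokenize(text, WITHOUT_ZWSP))

# With \u200b every segmented word becomes its own token:
# ['中文', '搜索', '支持', 'built', 'in']
print(tokenize(text, WITH_ZWSP))
```
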
## Usage

If you followed the configuration steps above, Chinese words will now be
tokenized using [jieba]. Try searching for
[:octicons-search-24: 支持][q=支持] to see how it integrates with the
built-in search plugin.

---

Note that this is an experimental feature, and I, @squidfunk, am not
proficient in Chinese (yet?). If you find a bug or think something can be
improved, please [open an issue].

[jieba]: https://pypi.org/project/jieba/
[zero-width whitespace]: https://en.wikipedia.org/wiki/Zero-width_space
[separator]: ../../plugins/search.md#config.separator
[q=支持]: ?q=支持
[open an issue]: https://github.com/squidfunk/mkdocs-material/issues/new/choose