Documentation

squidfunk 2022-06-05 18:16:51 +02:00
parent 187711fa29
commit 10859be356
7 changed files with 102 additions and 45 deletions

View File

@@ -1,3 +1,7 @@
+mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
+  * Added support for custom jieba dictionaries (Chinese search)
 mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)
   * Added support for cookie consent reject button

View File

@@ -197,8 +197,8 @@ the following steps are taken:
     remain. Linking is necessary, as search results are grouped by page.
 2. __Tokenization__: The `title` and `text` values of each section are split
-   into tokens by using the [separator] as configured in `mkdocs.yml`.
-   Tokenization itself is carried out by
+   into tokens by using the [`separator`][separator] as configured in
+   `mkdocs.yml`. Tokenization itself is carried out by
    [lunr's default tokenizer][default tokenizer], which doesn't allow for
    lookahead or separators spanning multiple characters.
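For readers unfamiliar with the setting referenced above: the `separator` is configured on the built-in search plugin in `mkdocs.yml`. A minimal sketch, reusing the `[\s\-]+` default mentioned further down in this article (the exact value in any given project may differ):

``` yaml
# Minimal sketch of the separator setting in mkdocs.yml (illustrative value)
plugins:
  - search:
      separator: '[\s\-]+'  # split tokens on whitespace and hyphens
```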
@@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored] to
 account for some shortcomings of [lunr], but in general, this is how data gets
 into and out of the index.
-[separator]: ../../setup/setting-up-site-search.md#separator
+[separator]: ../../setup/setting-up-site-search.md#search-separator
 [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
 [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
 ### Tokenizer lookahead
 The [default tokenizer] of [lunr] uses a regular expression to split a given
-string by matching each character against the [separator] as defined in
-`mkdocs.yml`. This doesn't allow for more complex separators based on
-lookahead or multiple characters.
+string by matching each character against the [`separator`][separator] as
+defined in `mkdocs.yml`. This doesn't allow for more complex separators based
+on lookahead or multiple characters.
 Fortunately, __our new search implementation provides an advanced tokenizer__
 that doesn't have these shortcomings and supports more complex regular
@@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
 explain the remainder of the regular expression.[^4]
 [^4]:
-    As a fun fact: the [separator default value] of the search plugin being
-    `[\s\-]+` always has been kind of irritating, as it suggests that multiple
-    characters can be considered being a separator. However, the `+` is
-    completely irrelevant, as regular expression groups involving multiple
-    characters were never supported by
-    [lunr's default tokenizer][default tokenizer].
+    As a fun fact: the [`separator`][separator] [default value] of the search
+    plugin being `[\s\-]+` always has been kind of irritating, as it suggests
+    that multiple characters can be considered being a separator. However, the
+    `+` is completely irrelevant, as regular expression groups involving
+    multiple characters were never supported by
+    [lunr's default tokenizer][default tokenizer].
-[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
+[default value]: https://www.mkdocs.org/user-guide/configuration/#separator
 #### Case changes

View File

@@ -32,7 +32,7 @@ number of Chinese users.__
 ---
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
 missing support in [lunr-languages] which is used for search tokenization and
 stemming. The latest Insiders release adds long-awaited Chinese language support
@@ -58,10 +58,11 @@ through the segmenter. You can install [jieba] with:
 pip install jieba
 ```
-The next step is only required if you specified the [separator] configuration
-in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
-it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
-the [separator] includes the `\u200b` character:
+The next step is only required if you specified the [`separator`][separator]
+configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
+characters, so it renders exactly the same in the search modal. Adjust
+`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
+character:
 ``` yaml
 plugins:
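The hunk stops at the opening of that example. A minimal sketch of such an adjustment, assuming the default `[\s\-]+` pattern is simply extended with the zero-width whitespace character (the pattern actually recommended in the article may differ):

``` yaml
# Sketch only: separator extended with zero-width whitespace (\u200b)
plugins:
  - search:
      separator: '[\s\u200b\-]+'
```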

View File

@@ -33,11 +33,12 @@ number of Chinese users.__
 ---
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
-for the built-in search plugin, something that has been requested by many users.
+missing support in [`lunr-languages`][lunr-languages] which is used for search
+tokenization and stemming. The latest Insiders release adds long-awaited Chinese
+language support for the built-in search plugin, something that has been
+requested by many users.
 [:octicons-arrow-right-24: Continue reading][Chinese search support 中文搜索​支持]

View File

@@ -6,6 +6,10 @@ template: overrides/main.html
 ## Material for MkDocs Insiders
+### 4.17.2 <small>_ June 5, 2022</small> { id="4.17.2" }
+- Added support for custom jieba dictionaries (Chinese search)
 ### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }
 - Added support for cookie consent reject button

View File

@@ -104,15 +104,15 @@ The following properties are available:
 :   [:octicons-tag-24: insiders-4.17.1][Insiders] · :octicons-milestone-24:
     Default: `[accept, manage]` This property defines which buttons are shown
-    and in which order, e.g. to allow the user to manage settings and accept
-    the cookie:
+    and in which order, e.g. to allow the user to accept cookies and manage
+    settings:
     ``` yaml
     extra:
       consent:
         actions:
-          - manage
           - accept
+          - manage
     ```
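The changelog entries at the top of this commit also mention a newly added reject button; assuming its action identifier is `reject` (an assumption, not something this diff confirms), a configuration listing all three buttons might look like:

``` yaml
# Sketch: all three consent buttons; the `reject` identifier is assumed
extra:
  consent:
    actions:
      - accept
      - reject
      - manage
```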
 The cookie consent form includes three types of buttons:

View File

@@ -92,12 +92,6 @@ The following configuration options are supported:
     part of this list by automatically falling back to the stemmer yielding the
     best result.
-!!! tip "Chinese search support 中文搜索​支持"
-    Material for MkDocs recently added __experimental language support for
-    Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
-    to learn how to set up search for Chinese in a matter of minutes.
 `separator`{ #search-separator }
 :   :octicons-milestone-24: Default: _automatically set_ The separator for
@@ -112,10 +106,9 @@ The following configuration options are supported:
     ```
 1. Tokenization itself is carried out by [lunr's default tokenizer], which
-   doesn't allow for lookahead or separators spanning multiple characters.
-   For more finegrained control over the tokenization process, see the
-   section on [tokenizer lookahead].
+   doesn't allow for lookahead or multi-character separators. For more
+   finegrained control over the tokenization process, see the section on
+   [tokenizer lookahead].
 <div class="mdx-deprecated" markdown>
@@ -142,14 +135,9 @@ The following configuration options are supported:
 </div>
-The other configuration options of this plugin are not officially supported
-by Material for MkDocs, which is why they may yield unexpected results. Use
-them at your own risk.
 [search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
 [lunr]: https://lunrjs.com
 [lunr-languages]: https://github.com/MihaiValentin/lunr-languages
-[chinese search]: ../blog/2022/chinese-search-support.md
 [lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [site language]: changing-the-language.md#site-language
 [tokenizer lookahead]: #tokenizer-lookahead
@@ -157,13 +145,72 @@ them at your own risk.
 [prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
 [50% smaller]: ../blog/2021/search-better-faster-smaller.md#benchmarks
+#### Chinese language support
+[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
+[:octicons-tag-24: insiders-4.14.0][Insiders] ·
+:octicons-beaker-24: Experimental
+[Insiders] adds search support for the Chinese language (see our [blog article]
+[chinese search] from May 2022) by integrating with the text segmentation
+library [jieba], which can be installed with `pip`.
+``` sh
+pip install jieba
+```
+If [jieba] is installed, the [built-in search plugin] automatically detects
+Chinese characters and runs them through the segmenter. The following
+configuration options are available:
+`jieba_dict`{ #jieba-dict }
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ This option allows for specifying a [custom dictionary]
+    to be used by [jieba] for segmenting text, replacing the default dictionary:
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict: dict.txt # (1)!
+    ```
+    1. The following alternative dictionaries are provided by [jieba]:
+        - [dict.txt.small]: a dictionary file with a smaller memory footprint
+        - [dict.txt.big]: a dictionary file with better support for traditional Chinese segmentation
+`jieba_dict_user`{ #jieba-dict-user }
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ This option allows for specifying an additional
+    [user dictionary] to be used by [jieba] for segmenting text, augmenting the
+    default dictionary:
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict_user: user_dict.txt
+    ```
+    User dictionaries can be used for tuning the segmenter to preserve
+    technical terms.
+[chinese search]: ../blog/2022/chinese-search-support.md
+[jieba]: https://pypi.org/project/jieba/
+[built-in search plugin]: #built-in-search-plugin
+[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
+[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
+[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
+[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8
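Taken together, the two options added above can be combined in a single configuration; a sketch, in which `dict.txt.big` and `user_dict.txt` are placeholder file names assumed to sit next to `mkdocs.yml`:

``` yaml
# Sketch combining both new jieba options; file names are placeholders
plugins:
  - search:
      jieba_dict: dict.txt.big        # replaces the default dictionary
      jieba_dict_user: user_dict.txt  # augments it with project-specific terms
```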
 ### Rich search previews
 [:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
-Insiders ships rich search previews as part of the [new search plugin], which
+[Insiders] ships rich search previews as part of the [new search plugin], which
 will render code blocks directly in the search result, and highlight all
 occurrences inside those blocks:
@@ -186,7 +233,7 @@ occurrences inside those blocks:
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
-Insiders allows for more complex configurations of the [`separator`][separator]
+[Insiders] allows for more complex configurations of the [`separator`][separator]
 setting as part of the [new search plugin], yielding more influence on the way
 documents are tokenized:
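The example that this sentence introduces lies outside the hunk. As an illustration only, a separator making use of a lookahead group to additionally split tokens on case changes might look like this (the pattern is illustrative, not the plugin's documented default):

``` yaml
# Illustrative only: separator with a lookahead group for case changes
plugins:
  - search:
      separator: '[\s\-]+|(?=[A-Z][a-z])'
```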