Documentation

squidfunk 2022-06-05 18:16:51 +02:00
parent 187711fa29
commit 10859be356
7 changed files with 102 additions and 45 deletions

@ -1,3 +1,7 @@
mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
* Added support for custom jieba dictionaries (Chinese search)
mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)
* Added support for cookie consent reject button

@ -197,8 +197,8 @@ the following steps are taken:
remain. Linking is necessary, as search results are grouped by page.
2. __Tokenization__: The `title` and `text` values of each section are split
into tokens by using the [separator] as configured in `mkdocs.yml`.
Tokenization itself is carried out by
into tokens by using the [`separator`][separator] as configured in
`mkdocs.yml`. Tokenization itself is carried out by
[lunr's default tokenizer][default tokenizer], which doesn't allow for
lookahead or separators spanning multiple characters.
@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored] to
account for some shortcomings of [lunr], but in general, this is how data gets
into and out of the index.
[separator]: ../../setup/setting-up-site-search.md#separator
[separator]: ../../setup/setting-up-site-search.md#search-separator
[default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
[rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
### Tokenizer lookahead
The [default tokenizer] of [lunr] uses a regular expression to split a given
string by matching each character against the [separator] as defined in
`mkdocs.yml`. This doesn't allow for more complex separators based on
lookahead or multiple characters.
string by matching each character against the [`separator`][separator] as
defined in `mkdocs.yml`. This doesn't allow for more complex separators based
on lookahead or multiple characters.
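
The per-character matching described above can be approximated in Python. This is only a sketch and not lunr's actual code: `tokenize_per_character` is a hypothetical helper, and Python's `re` module stands in for JavaScript regular expressions. It illustrates why a quantifier like `+` has no effect and why a separator spanning several characters never matches.

```python
import re


def tokenize_per_character(text: str, separator: str) -> list[str]:
    """Split text by testing each single character against the separator.

    Hypothetical helper: it mimics a tokenizer that only ever sees one
    character at a time, so the pattern cannot match longer sequences.
    """
    pattern = re.compile(separator)
    tokens: list[str] = []
    current: list[str] = []
    for char in text:
        if pattern.search(char):
            # The character is a separator: close the current token, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current))
    return tokens


# With and without the quantifier, the resulting tokens are identical.
with_plus = tokenize_per_character("foo - bar-baz", r"[\s\-]+")
without_plus = tokenize_per_character("foo - bar-baz", r"[\s\-]")

# A two-character separator can never match a single character.
unsplit = tokenize_per_character("a::b", r"::")
```

Because every test sees exactly one character, any pattern that needs two or more characters to match leaves the string unsplit.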
Fortunately, __our new search implementation provides an advanced tokenizer__
that doesn't have these shortcomings and supports more complex regular
@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
explain the remainder of the regular expression.[^4]
[^4]:
As a fun fact: the [separator default value] of the search plugin being
`[\s\-]+` always has been kind of irritating, as it suggests that multiple
characters can be considered being a separator. However, the `+` is
completely irrelevant, as regular expression groups involving multiple
characters were never supported by
As a fun fact: the [`separator`][separator] [default value] of the search
plugin, `[\s\-]+`, has always been somewhat misleading, as it suggests that
a run of multiple characters can be treated as a single separator. However,
the `+` is completely irrelevant, as regular expressions matching multiple
characters were never supported by
[lunr's default tokenizer][default tokenizer].
[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
[default value]: https://www.mkdocs.org/user-guide/configuration/#separator
#### Case changes

@ -32,10 +32,10 @@ number of Chinese users.__
---
After the United States and Germany, the third-largest country of origin of
Material for MkDocs users is China. For a long time, the built-in search plugin
Material for MkDocs users is China. For a long time, the [built-in search plugin]
didn't allow for proper segmentation of Chinese characters, mainly due to
missing support in [lunr-languages] which is used for search tokenization and
stemming. The latest Insiders release adds long-awaited Chinese language support
missing support in [lunr-languages] which is used for search tokenization and
stemming. The latest Insiders release adds long-awaited Chinese language support
for the built-in search plugin, something that has been requested by many users.
_Material for MkDocs終於支持中文文本正確分割並且容易找到。_
@ -50,18 +50,19 @@ search plugin in a few minutes._
## Configuration
Chinese language support for Material for MkDocs is provided by [jieba], an
excellent Chinese text segmentation library. If [jieba] is installed, the
built-in search plugin automatically detects Chinese characters and runs them
excellent Chinese text segmentation library. If [jieba] is installed, the
built-in search plugin automatically detects Chinese characters and runs them
through the segmenter. You can install [jieba] with:
```
pip install jieba
```
The next step is only required if you specified the [separator] configuration
in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
the [separator] includes the `\u200b` character:
The next step is only required if you specified the [`separator`][separator]
configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
characters, so it renders exactly the same in the search modal. Adjust
`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
character:
``` yaml
plugins:

@ -33,11 +33,12 @@ number of Chinese users.__
---
After the United States and Germany, the third-largest country of origin of
Material for MkDocs users is China. For a long time, the built-in search plugin
Material for MkDocs users is China. For a long time, the [built-in search plugin]
didn't allow for proper segmentation of Chinese characters, mainly due to
missing support in [lunr-languages] which is used for search tokenization and
stemming. The latest Insiders release adds long-awaited Chinese language support
for the built-in search plugin, something that has been requested by many users.
missing support in [`lunr-languages`][lunr-languages] which is used for search
tokenization and stemming. The latest Insiders release adds long-awaited Chinese
language support for the built-in search plugin, something that has been
requested by many users.
[:octicons-arrow-right-24: Continue reading][Chinese search support 中文搜索​支持]

@ -6,6 +6,10 @@ template: overrides/main.html
## Material for MkDocs Insiders
### 4.17.2 <small>_ June 5, 2022</small> { id="4.17.2" }
- Added support for custom jieba dictionaries (Chinese search)
### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }
- Added support for cookie consent reject button

@ -104,15 +104,15 @@ The following properties are available:
: [:octicons-tag-24: insiders-4.17.1][Insiders] · :octicons-milestone-24:
Default: `[accept, manage]` This property defines which buttons are shown
and in which order, e.g. to allow the user to manage settings and accept
the cookie:
and in which order, e.g. to allow the user to accept cookies and manage
settings:
``` yaml
extra:
  consent:
    actions:
      - manage
      - accept
      - manage
```
The cookie consent form includes three types of buttons:

@ -92,12 +92,6 @@ The following configuration options are supported:
part of this list by automatically falling back to the stemmer yielding the
best result.
!!! tip "Chinese search support 中文搜索​支持"
Material for MkDocs recently added __experimental language support for
Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
to learn how to set up search for Chinese in a matter of minutes.
`separator`{ #search-separator }
: :octicons-milestone-24: Default: _automatically set_ The separator for
@ -112,10 +106,9 @@ The following configuration options are supported:
```
1. Tokenization itself is carried out by [lunr's default tokenizer], which
doesn't allow for lookahead or separators spanning multiple characters.
For more fine-grained control over the tokenization process, see the
section on [tokenizer lookahead].
doesn't allow for lookahead or multi-character separators. For more
fine-grained control over the tokenization process, see the section on
[tokenizer lookahead].
<div class="mdx-deprecated" markdown>
@ -142,14 +135,9 @@ The following configuration options are supported:
</div>
The other configuration options of this plugin are not officially supported
by Material for MkDocs, which is why they may yield unexpected results. Use
them at your own risk.
[search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
[lunr]: https://lunrjs.com
[lunr-languages]: https://github.com/MihaiValentin/lunr-languages
[chinese search]: ../blog/2022/chinese-search-support.md
[lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[site language]: changing-the-language.md#site-language
[tokenizer lookahead]: #tokenizer-lookahead
@ -157,13 +145,72 @@ them at your own risk.
[prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
[50% smaller]: ../blog/2021/search-better-faster-smaller.md#benchmarks
#### Chinese language support
[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-4.14.0][Insiders] ·
:octicons-beaker-24: Experimental
[Insiders] adds search support for the Chinese language (see our [blog article]
[chinese search] from May 2022) by integrating with the text segmentation
library [jieba], which can be installed with `pip`.
``` sh
pip install jieba
```
If [jieba] is installed, the [built-in search plugin] automatically detects
Chinese characters and runs them through the segmenter. The following
configuration options are available:
`jieba_dict`{ #jieba-dict }
: [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
Default: _none_ This option allows for specifying a [custom dictionary]
to be used by [jieba] for segmenting text, replacing the default dictionary:
``` yaml
plugins:
  - search:
      jieba_dict: dict.txt # (1)!
```
1. The following alternative dictionaries are provided by [jieba]:
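- [dict.txt.small] 占用内存较小的词典文件 (a dictionary file with a smaller memory footprint)
- [dict.txt.big] 支持繁体分词更好的词典文件 (a dictionary file with better support for segmenting Traditional Chinese)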
`jieba_dict_user`{ #jieba-dict-user }
: [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
Default: _none_ This option allows for specifying an additional
[user dictionary] to be used by [jieba] for segmenting text, augmenting the
default dictionary:
``` yaml
plugins:
  - search:
      jieba_dict_user: user_dict.txt
```
User dictionaries can be used for tuning the segmenter to preserve
technical terms.
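
As a rough illustration, a user dictionary can be generated from a list of technical terms. The sketch below assumes the plain-text format described in the jieba README: one entry per line, consisting of the word, optionally followed by a frequency and a part-of-speech tag, separated by spaces. The helper name `format_user_dict` and the sample terms are hypothetical and not part of Material for MkDocs or jieba.

```python
from typing import Iterable, Optional, Tuple

# (word, frequency or None, part-of-speech tag or None)
Entry = Tuple[str, Optional[int], Optional[str]]


def format_user_dict(entries: Iterable[Entry]) -> str:
    """Render entries as user dictionary lines (assumed format, see above)."""
    lines = []
    for word, freq, tag in entries:
        parts = [word]
        if freq is not None:
            parts.append(str(freq))
        if tag is not None:
            parts.append(tag)
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


# Hypothetical technical terms that should be kept as single tokens.
terms = [
    ("搜索插件", None, None),
    ("分词器", 10, "n"),
]
content = format_user_dict(terms)
# To use the result, save `content` as UTF-8 text, e.g. to user_dict.txt,
# and point `jieba_dict_user` at that file.
```

Keeping the terms in a list like this makes it easy to regenerate the dictionary whenever the project's vocabulary changes.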
[chinese search]: ../blog/2022/chinese-search-support.md
[jieba]: https://pypi.org/project/jieba/
[built-in search plugin]: #built-in-search-plugin
[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8
### Rich search previews
[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-3.0.0][Insiders] ·
:octicons-beaker-24: Experimental
Insiders ships rich search previews as part of the [new search plugin], which
[Insiders] ships rich search previews as part of the [new search plugin], which
will render code blocks directly in the search result, and highlight all
occurrences inside those blocks:
@ -186,7 +233,7 @@ occurrences inside those blocks:
[:octicons-tag-24: insiders-3.0.0][Insiders] ·
:octicons-beaker-24: Experimental
Insiders allows for more complex configurations of the [`separator`][separator]
[Insiders] allows for more complex configurations of the [`separator`][separator]
setting as part of the [new search plugin], yielding more influence on the way
documents are tokenized: