mirror of
https://github.com/squidfunk/mkdocs-material.git
synced 2024-06-14 11:52:32 +03:00
Documentation
parent
187711fa29
commit
10859be356
@@ -1,3 +1,7 @@
+mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
+
+* Added support for custom jieba dictionaries (Chinese search)
+
 mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)

 * Added support for cookie consent reject button
@@ -197,8 +197,8 @@ the following steps are taken:
 remain. Linking is necessary, as search results are grouped by page.

 2. __Tokenization__: The `title` and `text` values of each section are split
-into tokens by using the [separator] as configured in `mkdocs.yml`.
-Tokenization itself is carried out by
+into tokens by using the [`separator`][separator] as configured in
+`mkdocs.yml`. Tokenization itself is carried out by
 [lunr's default tokenizer][default tokenizer], which doesn't allow for
 lookahead or separators spanning multiple characters.

@@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored] to
 account for some shortcomings of [lunr], but in general, this is how data gets
 into and out of the index.

-[separator]: ../../setup/setting-up-site-search.md#separator
+[separator]: ../../setup/setting-up-site-search.md#search-separator
 [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
 [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
 ### Tokenizer lookahead

 The [default tokenizer] of [lunr] uses a regular expression to split a given
-string by matching each character against the [separator] as defined in
-`mkdocs.yml`. This doesn't allow for more complex separators based on
-lookahead or multiple characters.
+string by matching each character against the [`separator`][separator] as
+defined in `mkdocs.yml`. This doesn't allow for more complex separators based
+on lookahead or multiple characters.

 Fortunately, __our new search implementation provides an advanced tokenizer__
 that doesn't have these shortcomings and supports more complex regular
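As a rough illustration of the limitation described in the "Tokenizer lookahead" hunk above, the following Python sketch contrasts per-character separator matching with a split over the whole string. It is a simplified model, not lunr's actual code: `tokenize_per_char` and `tokenize_regex` are hypothetical helpers written for this example, and `CASE_CHANGE` is an assumed example of a lookahead-based separator.

```python
import re


def tokenize_per_char(text, separator):
    """Split by testing one character at a time against the separator.

    Simplified model of per-character matching; not lunr's actual code.
    """
    pattern = re.compile(separator)
    tokens, start = [], 0
    for end in range(len(text) + 1):
        # The end of the string closes the last token, like a separator would.
        if end == len(text) or pattern.search(text[end]):
            if end > start:
                tokens.append(text[start:end])
            start = end + 1
    return tokens


def tokenize_regex(text, separator):
    """Split the whole string at once, so lookahead and multi-character
    separators can match. Zero-width splits require Python 3.7 or later."""
    return [token for token in re.split(separator, text) if token]


# Single-character separators behave the same in both models.
print(tokenize_per_char("foo-bar baz", r"[\s\-]+"))  # ['foo', 'bar', 'baz']
print(tokenize_regex("foo-bar baz", r"[\s\-]+"))     # ['foo', 'bar', 'baz']

# A two-character separator never matches a single character.
print(tokenize_per_char("a::b", r"::"))              # ['a::b']
print(tokenize_regex("a::b", r"::"))                 # ['a', 'b']

# Lookahead needs to see the characters that follow the split point.
CASE_CHANGE = r"(?!\b)(?=[A-Z][a-z])"  # assumed example pattern
print(tokenize_per_char("PascalCase", CASE_CHANGE))  # ['PascalCase']
print(tokenize_regex("PascalCase", CASE_CHANGE))     # ['Pascal', 'Case']
```

The first pair of calls also shows why a trailing `+` in the separator makes no difference under per-character matching: every character is tested in isolation, so a run of separators is consumed one character at a time either way.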
@@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
 explain the remainder of the regular expression.[^4]

 [^4]:
-As a fun fact: the [separator default value] of the search plugin being
-`[\s\-]+` always has been kind of irritating, as it suggests that multiple
-characters can be considered being a separator. However, the `+` is
-completely irrelevant, as regular expression groups involving multiple
-characters were never supported by
+As a fun fact: the [`separator`][separator] [default value] of the search
+plugin being `[\s\-]+` always has been kind of irritating, as it suggests
+that multiple characters can be considered being a separator. However, the
+`+` is completely irrelevant, as regular expression groups involving
+multiple characters were never supported by
 [lunr's default tokenizer][default tokenizer].

-[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
+[default value]: https://www.mkdocs.org/user-guide/configuration/#separator

 #### Case changes

@@ -32,7 +32,7 @@ number of Chinese users.__
 ---

 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
 missing support in [lunr-languages] which is used for search tokenization and
 stemming. The latest Insiders release adds long-awaited Chinese language support
@@ -58,10 +58,11 @@ through the segmenter. You can install [jieba] with:
 pip install jieba
 ```

-The next step is only required if you specified the [separator] configuration
-in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
-it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
-the [separator] includes the `\u200b` character:
+The next step is only required if you specified the [`separator`][separator]
+configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
+characters, so it renders exactly the same in the search modal. Adjust
+`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
+character:

 ``` yaml
 plugins:
@@ -33,11 +33,12 @@ number of Chinese users.__
 ---

 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
-for the built-in search plugin, something that has been requested by many users.
+missing support in [`lunr-languages`][lunr-languages] which is used for search
+tokenization and stemming. The latest Insiders release adds long-awaited Chinese
+language support for the built-in search plugin, something that has been
+requested by many users.

 [:octicons-arrow-right-24: Continue reading][Chinese search support – 中文搜索支持]

@@ -6,6 +6,10 @@ template: overrides/main.html

 ## Material for MkDocs Insiders

+### 4.17.2 <small>_ June 5, 2022</small> { id="4.17.2" }
+
+- Added support for custom jieba dictionaries (Chinese search)
+
 ### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }

 - Added support for cookie consent reject button
@@ -104,15 +104,15 @@ The following properties are available:

 : [:octicons-tag-24: insiders-4.17.1][Insiders] · :octicons-milestone-24:
 Default: `[accept, manage]` – This property defines which buttons are shown
-and in which order, e.g. to allow the user to manage settings and accept
-the cookie:
+and in which order, e.g. to allow the user to accept cookies and manage
+settings:

 ``` yaml
 extra:
   consent:
     actions:
-      - manage
       - accept
+      - manage
 ```

 The cookie consent form includes three types of buttons:
@@ -92,12 +92,6 @@ The following configuration options are supported:
 part of this list by automatically falling back to the stemmer yielding the
 best result.

-!!! tip "Chinese search support – 中文搜索支持"
-
-Material for MkDocs recently added __experimental language support for
-Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
-to learn how to set up search for Chinese in a matter of minutes.
-
 `separator`{ #search-separator }

 : :octicons-milestone-24: Default: _automatically set_ – The separator for
@@ -112,10 +106,9 @@ The following configuration options are supported:
 ```

 1. Tokenization itself is carried out by [lunr's default tokenizer], which
-doesn't allow for lookahead or separators spanning multiple characters.
-
-For more finegrained control over the tokenization process, see the
-section on [tokenizer lookahead].
+doesn't allow for lookahead or multi-character separators. For more
+finegrained control over the tokenization process, see the section on
+[tokenizer lookahead].

 <div class="mdx-deprecated" markdown>

@@ -142,14 +135,9 @@ The following configuration options are supported:

 </div>

-The other configuration options of this plugin are not officially supported
-by Material for MkDocs, which is why they may yield unexpected results. Use
-them at your own risk.
-
 [search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
 [lunr]: https://lunrjs.com
 [lunr-languages]: https://github.com/MihaiValentin/lunr-languages
-[chinese search]: ../blog/2022/chinese-search-support.md
 [lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [site language]: changing-the-language.md#site-language
 [tokenizer lookahead]: #tokenizer-lookahead
@@ -157,13 +145,72 @@ them at your own risk.
 [prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
 [50% smaller]: ../blog/2021/search-better-faster-smaller.md#benchmarks

+#### Chinese language support
+
+[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
+[:octicons-tag-24: insiders-4.14.0][Insiders] ·
+:octicons-beaker-24: Experimental
+
+[Insiders] adds search support for the Chinese language (see our [blog article]
+[chinese search] from May 2022) by integrating with the text segmentation
+library [jieba], which can be installed with `pip`.
+
+``` sh
+pip install jieba
+```
+
+If [jieba] is installed, the [built-in search plugin] automatically detects
+Chinese characters and runs them through the segmenter. The following
+configuration options are available:
+
+`jieba_dict`{ #jieba-dict }
+
+: [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+Default: _none_ – This option allows for specifying a [custom dictionary]
+to be used by [jieba] for segmenting text, replacing the default dictionary:
+
+``` yaml
+plugins:
+  - search:
+      jieba_dict: dict.txt # (1)!
+```
+
+1. The following alternative dictionaries are provided by [jieba]:
+
+- [dict.txt.small] – 占用内存较小的词典文件
+- [dict.txt.big] – 支持繁体分词更好的词典文件
+
+`jieba_dict_user`{ #jieba-dict-user }
+
+: [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+Default: _none_ – This option allows for specifying an additional
+[user dictionary] to be used by [jieba] for segmenting text, augmenting the
+default dictionary:
+
+``` yaml
+plugins:
+  - search:
+      jieba_dict_user: user_dict.txt
+```
+
+User dictionaries can be used for tuning the segmenter to preserve
+technical terms.
+
+[chinese search]: ../blog/2022/chinese-search-support.md
+[jieba]: https://pypi.org/project/jieba/
+[built-in search plugin]: #built-in-search-plugin
+[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
+[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
+[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
+[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8
+
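The file referenced by `jieba_dict_user` above is a plain UTF-8 text file. As a hedged sketch, assuming the format described in the jieba README (one entry per line: the word, then an optional frequency, then an optional part-of-speech tag, separated by spaces), such a file could be generated as follows. The helpers `format_entry` and `write_user_dict` and the sample terms are hypothetical; they are not part of jieba or Material for MkDocs.

```python
import tempfile
from pathlib import Path


def format_entry(word, freq=None, tag=None):
    """Format one user dictionary line: word [frequency] [tag]."""
    if not word or any(ch.isspace() for ch in word):
        # Fields are space-separated, so a word must not contain whitespace.
        raise ValueError(f"invalid dictionary word: {word!r}")
    parts = [word]
    if freq is not None:
        parts.append(str(freq))
    if tag is not None:
        parts.append(tag)
    return " ".join(parts)


def write_user_dict(path, entries):
    """Write (word, freq, tag) tuples to a UTF-8 file, one entry per line."""
    lines = [format_entry(*entry) for entry in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical technical terms that should stay in one piece when segmented.
TERMS = [
    ("云计算", 5, None),     # word with a frequency
    ("搜索插件", 10, "n"),   # word with a frequency and a tag
    ("中文搜索", None, None),  # word only
]

if __name__ == "__main__":
    # Written to a temporary directory here; in a real project the file
    # would live wherever `jieba_dict_user` in `mkdocs.yml` points.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "user_dict.txt"
        write_user_dict(target, TERMS)
        print(target.read_text(encoding="utf-8"))
```

Keeping the formatting in a separate function makes it easy to validate entries before they reach the segmenter, since a stray space inside a word would silently shift the remaining fields.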
 ### Rich search previews

 [:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental

-Insiders ships rich search previews as part of the [new search plugin], which
+[Insiders] ships rich search previews as part of the [new search plugin], which
 will render code blocks directly in the search result, and highlight all
 occurrences inside those blocks:

@@ -186,7 +233,7 @@ occurrences inside those blocks:
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental

-Insiders allows for more complex configurations of the [`separator`][separator]
+[Insiders] allows for more complex configurations of the [`separator`][separator]
 setting as part of the [new search plugin], yielding more influence on the way
 documents are tokenized:
