Documentation

commit 10859be356 (parent 187711fa29)
@@ -1,3 +1,7 @@
+mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
+
+  * Added support for custom jieba dictionaries (Chinese search)
+
 mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)
 
   * Added support for cookie consent reject button
@@ -197,8 +197,8 @@ the following steps are taken:
    remain. Linking is necessary, as search results are grouped by page.
 
 2. __Tokenization__: The `title` and `text` values of each section are split
-   into tokens by using the [separator] as configured in `mkdocs.yml`.
-   Tokenization itself is carried out by
+   into tokens by using the [`separator`][separator] as configured in
+   `mkdocs.yml`. Tokenization itself is carried out by
    [lunr's default tokenizer][default tokenizer], which doesn't allow for
    lookahead or separators spanning multiple characters.
 
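To make the tokenization step concrete, here is a minimal Python sketch of how a lunr-style tokenizer behaves: each character is tested against the separator on its own, which is why lookahead and multi-character separators are out of reach. The `SEPARATOR` value and the section dict are illustrative assumptions, not the plugin's actual code.

``` python
import re

# Illustrative stand-in for the `separator` configured in mkdocs.yml.
SEPARATOR = r"[\s\-]"

def tokenize(value: str) -> list[str]:
    """Split a string the way lunr's default tokenizer does: one
    character at a time against the separator, with no lookahead."""
    tokens: list[str] = []
    current: list[str] = []
    for char in value:
        if re.fullmatch(SEPARATOR, char):
            if current:
                tokens.append("".join(current).lower())
                current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current).lower())
    return tokens

# A section as described above, with `title` and `text` values.
section = {"title": "Search: better, faster, smaller", "text": "..."}
print(tokenize(section["title"]))
# -> ['search:', 'better,', 'faster,', 'smaller']
```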
@@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored] to
 account for some shortcomings of [lunr], but in general, this is how data gets
 into and out of the index.
 
-[separator]: ../../setup/setting-up-site-search.md#separator
+[separator]: ../../setup/setting-up-site-search.md#search-separator
 [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
 [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
 ### Tokenizer lookahead
 
 The [default tokenizer] of [lunr] uses a regular expression to split a given
-string by matching each character against the [separator] as defined in
-`mkdocs.yml`. This doesn't allow for more complex separators based on
-lookahead or multiple characters.
+string by matching each character against the [`separator`][separator] as
+defined in `mkdocs.yml`. This doesn't allow for more complex separators based
+on lookahead or multiple characters.
 
 Fortunately, __our new search implementation provides an advanced tokenizer__
 that doesn't have these shortcomings and supports more complex regular
@@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
 explain the remainder of the regular expression.[^4]
 
 [^4]:
-    As a fun fact: the [separator default value] of the search plugin being
-    `[\s\-]+` always has been kind of irritating, as it suggests that multiple
-    characters can be considered being a separator. However, the `+` is
-    completely irrelevant, as regular expression groups involving multiple
-    characters were never supported by
+    As a fun fact: the [`separator`][separator] [default value] of the search
+    plugin being `[\s\-]+` always has been kind of irritating, as it suggests
+    that multiple characters can be considered being a separator. However, the
+    `+` is completely irrelevant, as regular expression groups involving
+    multiple characters were never supported by
     [lunr's default tokenizer][default tokenizer].
 
-[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
+[default value]: https://www.mkdocs.org/user-guide/configuration/#separator
 
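The footnote's point about the irrelevant `+` can be demonstrated directly: in the sketch below, a per-character split in the style of lunr's tokenizer yields identical tokens for `[\s\-]` and `[\s\-]+`, while a true regex split supports lookahead. The `lunr_like_split` helper is a hypothetical illustration, not lunr's source.

``` python
import re

def lunr_like_split(value: str, separator: str) -> list[str]:
    """Illustrative per-character split: quantifiers such as `+` can
    never take effect, since they only ever see one character."""
    tokens, current = [], []
    for char in value:
        if re.fullmatch(separator, char):
            if current:
                tokens.append("".join(current))
            current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current))
    return tokens

text = "foo--bar baz"
assert lunr_like_split(text, r"[\s\-]") == lunr_like_split(text, r"[\s\-]+")

# A regex-based split, in contrast, supports lookahead -- e.g. splitting
# before case changes, which per-character matching cannot express:
print(re.split(r"(?=[A-Z][a-z])", "PascalCase"))  # -> ['', 'Pascal', 'Case']
```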
 #### Case changes
 
@@ -32,10 +32,10 @@ number of Chinese users.__
 ---
 
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
 missing support in [lunr-languages] which is used for search tokenization and
 stemming. The latest Insiders release adds long-awaited Chinese language support
 for the built-in search plugin, something that has been requested by many users.
 
 _Material for MkDocs終於支持中文了!文本被正確分割並且更容易找到。_
@@ -50,18 +50,19 @@ search plugin in a few minutes._
 ## Configuration
 
 Chinese language support for Material for MkDocs is provided by [jieba], an
 excellent Chinese text segmentation library. If [jieba] is installed, the
 built-in search plugin automatically detects Chinese characters and runs them
 through the segmenter. You can install [jieba] with:
 
 ```
 pip install jieba
 ```
 
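To see roughly what happens under the hood, here is a hypothetical Python sketch of the approach described above: [jieba] segments the text, and the segments are re-joined with zero-width whitespace so the rendered text looks unchanged while the index gains token boundaries. It mirrors the article's description, not the plugin's actual code.

``` python
import jieba  # pip install jieba

text = "文本被正確分割並且更容易找到"

# Segment the text; jieba returns an iterator of word segments.
segments = list(jieba.cut(text))
print("/".join(segments))  # e.g. 文本/被/正確/分割/並且/更/容易/找到

# Re-join with zero-width whitespace (U+200B): invisible when rendered,
# but a split point that a `separator` including \u200b can pick up.
print("\u200b".join(segments))
```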
-The next step is only required if you specified the [separator] configuration
-in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
-it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
-the [separator] includes the `\u200b` character:
+The next step is only required if you specified the [`separator`][separator]
+configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
+characters, so it renders exactly the same in the search modal. Adjust
+`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
+character:
 
 ``` yaml
 plugins:
@@ -33,11 +33,12 @@ number of Chinese users.__
 ---
 
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
-for the built-in search plugin, something that has been requested by many users.
+missing support in [`lunr-languages`][lunr-languages] which is used for search
+tokenization and stemming. The latest Insiders release adds long-awaited Chinese
+language support for the built-in search plugin, something that has been
+requested by many users.
 
 [:octicons-arrow-right-24: Continue reading][Chinese search support – 中文搜索支持]
 
|
@ -6,6 +6,10 @@ template: overrides/main.html
|
|||||||
|
|
||||||
## Material for MkDocs Insiders
|
## Material for MkDocs Insiders
|
||||||
|
|
||||||
|
### 4.17.2 <small>_ June 5, 2022</small> { id="4.17.2" }
|
||||||
|
|
||||||
|
- Added support for custom jieba dictionaries (Chinese search)
|
||||||
|
|
||||||
### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }
|
### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }
|
||||||
|
|
||||||
- Added support for cookie consent reject button
|
- Added support for cookie consent reject button
|
||||||
|
@@ -104,15 +104,15 @@ The following properties are available:
 
 :   [:octicons-tag-24: insiders-4.17.1][Insiders] · :octicons-milestone-24:
     Default: `[accept, manage]` – This property defines which buttons are shown
-    and in which order, e.g. to allow the user to manage settings and accept
-    the cookie:
+    and in which order, e.g. to allow the user to accept cookies and manage
+    settings:
 
     ``` yaml
     extra:
       consent:
         actions:
-          - manage
           - accept
+          - manage
     ```
 
 The cookie consent form includes three types of buttons:
@@ -92,12 +92,6 @@ The following configuration options are supported:
     part of this list by automatically falling back to the stemmer yielding the
     best result.
 
-!!! tip "Chinese search support – 中文搜索支持"
-
-    Material for MkDocs recently added __experimental language support for
-    Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
-    to learn how to set up search for Chinese in a matter of minutes.
-
 `separator`{ #search-separator }
 
 :   :octicons-milestone-24: Default: _automatically set_ – The separator for
@@ -112,10 +106,9 @@ The following configuration options are supported:
     ```
 
     1. Tokenization itself is carried out by [lunr's default tokenizer], which
-       doesn't allow for lookahead or separators spanning multiple characters.
-       For more finegrained control over the tokenization process, see the
-       section on [tokenizer lookahead].
+       doesn't allow for lookahead or multi-character separators. For more
+       fine-grained control over the tokenization process, see the section on
+       [tokenizer lookahead].
 
 <div class="mdx-deprecated" markdown>
 
@@ -142,14 +135,9 @@ The following configuration options are supported:
 
 </div>
 
-The other configuration options of this plugin are not officially supported
-by Material for MkDocs, which is why they may yield unexpected results. Use
-them at your own risk.
-
 [search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
 [lunr]: https://lunrjs.com
 [lunr-languages]: https://github.com/MihaiValentin/lunr-languages
-[chinese search]: ../blog/2022/chinese-search-support.md
 [lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [site language]: changing-the-language.md#site-language
 [tokenizer lookahead]: #tokenizer-lookahead
@@ -157,13 +145,72 @@ them at your own risk.
 [prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
 [50% smaller]: ../blog/2021/search-better-faster-smaller.md#benchmarks
 
+#### Chinese language support
+
+[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
+[:octicons-tag-24: insiders-4.14.0][Insiders] ·
+:octicons-beaker-24: Experimental
+
+[Insiders] adds search support for the Chinese language (see our [blog article]
+[chinese search] from May 2022) by integrating with the text segmentation
+library [jieba], which can be installed with `pip`.
+
+``` sh
+pip install jieba
+```
+
+If [jieba] is installed, the [built-in search plugin] automatically detects
+Chinese characters and runs them through the segmenter. The following
+configuration options are available:
+
+`jieba_dict`{ #jieba-dict }
+
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ – This option allows for specifying a [custom dictionary]
+    to be used by [jieba] for segmenting text, replacing the default dictionary:
+
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict: dict.txt # (1)!
+    ```
+
+    1. The following alternative dictionaries are provided by [jieba]:
+
+        - [dict.txt.small] – a dictionary file with a smaller memory footprint
+        - [dict.txt.big] – a dictionary file with better support for segmenting
+          traditional Chinese
+
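For a sense of what replacing the dictionary involves, the short sketch below uses `jieba.set_dictionary`, jieba's documented hook for swapping in another dictionary file; it assumes `dict.txt.small` has already been downloaded next to the script.

``` python
import jieba

# Assumption: dict.txt.small was downloaded from jieba's extra_dict
# folder and placed next to this script.
jieba.set_dictionary("dict.txt.small")

# The replacement dictionary is loaded lazily, on first segmentation.
print("/".join(jieba.cut("文本被正確分割並且更容易找到")))
```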
+`jieba_dict_user`{ #jieba-dict-user }
+
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ – This option allows for specifying an additional
+    [user dictionary] to be used by [jieba] for segmenting text, augmenting the
+    default dictionary:
+
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict_user: user_dict.txt
+    ```
+
+    User dictionaries can be used for tuning the segmenter to preserve
+    technical terms.
+
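The effect of a user dictionary is easiest to see side by side. In this sketch, an entry is written in jieba's documented user-dictionary format (one entry per line: word, optional frequency, optional part-of-speech tag) and loaded with `jieba.load_userdict`; the term and frequency are made up for illustration.

``` python
import jieba

text = "Material for MkDocs 支持中文搜索"
print("/".join(jieba.cut(text)))  # e.g. .../支持/中文/搜索

# Hypothetical user dictionary entry: word, optional frequency,
# optional part-of-speech tag -- one entry per line.
with open("user_dict.txt", "w", encoding="utf-8") as f:
    f.write("中文搜索 10\n")

jieba.load_userdict("user_dict.txt")
print("/".join(jieba.cut(text)))  # e.g. .../支持/中文搜索
```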
+[chinese search]: ../blog/2022/chinese-search-support.md
+[jieba]: https://pypi.org/project/jieba/
+[built-in search plugin]: #built-in-search-plugin
+[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
+[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
+[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
+[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8
+
 ### Rich search previews
 
 [:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
 
-Insiders ships rich search previews as part of the [new search plugin], which
+[Insiders] ships rich search previews as part of the [new search plugin], which
 will render code blocks directly in the search result, and highlight all
 occurrences inside those blocks:
 
@@ -186,7 +233,7 @@ occurrences inside those blocks:
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
 
-Insiders allows for more complex configurations of the [`separator`][separator]
+[Insiders] allows for more complex configurations of the [`separator`][separator]
 setting as part of the [new search plugin], yielding more influence on the way
 documents are tokenized:
 
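The example configuration itself lies beyond the end of this hunk; to illustrate the kind of separator such lookahead support unlocks, the sketch below applies a hypothetical lookahead-capable pattern with a plain regex split. The pattern is for demonstration and is not the plugin's documented value.

``` python
import re

# Hypothetical advanced separator: whitespace or hyphens, plus a
# zero-width split before a case change (upper followed by lower).
SEPARATOR = r"[\s\-]+|(?=[A-Z][a-z])"

tokens = [t for t in re.split(SEPARATOR, "mkdocs-material PascalCase") if t]
print(tokens)  # -> ['mkdocs', 'material', 'Pascal', 'Case']
```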