diff --git a/CHANGELOG b/CHANGELOG
index 589278135..d17a2042d 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,7 @@
+mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
+
+  * Added support for custom jieba dictionaries (Chinese search)
+
 mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)
 
   * Added support for cookie consent reject button
diff --git a/docs/blog/2021/search-better-faster-smaller.md b/docs/blog/2021/search-better-faster-smaller.md
index b5e4b430d..be2315e21 100644
--- a/docs/blog/2021/search-better-faster-smaller.md
+++ b/docs/blog/2021/search-better-faster-smaller.md
@@ -197,8 +197,8 @@ the following steps are taken:
    remain. Linking is necessary, as search results are grouped by page.
 
 2. __Tokenization__: The `title` and `text` values of each section are split
-   into tokens by using the [separator] as configured in `mkdocs.yml`.
-   Tokenization itself is carried out by
+   into tokens by using the [`separator`][separator] as configured in
+   `mkdocs.yml`. Tokenization itself is carried out by
    [lunr's default tokenizer][default tokenizer], which doesn't allow for
    lookahead or separators spanning multiple characters.
@@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored]
 to account for some shortcomings of [lunr], but in general, this is how data
 gets into and out of the index.
 
-  [separator]: ../../setup/setting-up-site-search.md#separator
+  [separator]: ../../setup/setting-up-site-search.md#search-separator
   [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
   [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
   [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
 
 ### Tokenizer lookahead
 
 The [default tokenizer] of [lunr] uses a regular expression to split a given
-string by matching each character against the [separator] as defined in
-`mkdocs.yml`. This doesn't allow for more complex separators based on
-lookahead or multiple characters.
+string by matching each character against the [`separator`][separator] as
+defined in `mkdocs.yml`. This doesn't allow for more complex separators based
+on lookahead or multiple characters.
 
 Fortunately, __our new search implementation provides an advanced tokenizer__
 that doesn't have these shortcomings and supports more complex regular
@@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
 explain the remainder of the regular expression.[^4]
 
 [^4]:
-    As a fun fact: the [separator default value] of the search plugin being
-    `[\s\-]+` always has been kind of irritating, as it suggests that multiple
-    characters can be considered being a separator. However, the `+` is
-    completely irrelevant, as regular expression groups involving multiple
-    characters were never supported by
+    As a fun fact: the [`separator`][separator] [default value] of the search
+    plugin being `[\s\-]+` always has been kind of irritating, as it suggests
+    that multiple characters can be considered being a separator. However, the
+    `+` is completely irrelevant, as regular expression groups involving
+    multiple characters were never supported by
     [lunr's default tokenizer][default tokenizer].
 
-  [separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
+  [default value]: https://www.mkdocs.org/user-guide/configuration/#separator
 
 #### Case changes
diff --git a/docs/blog/2022/chinese-search-support.md b/docs/blog/2022/chinese-search-support.md
index 657d98abb..243d09209 100644
--- a/docs/blog/2022/chinese-search-support.md
+++ b/docs/blog/2022/chinese-search-support.md
@@ -32,10 +32,10 @@ number of Chinese users.__
 
 ---
 
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
+missing support in [lunr-languages] which is used for search tokenization and
+stemming. The latest Insiders release adds long-awaited Chinese language support
 for the built-in search plugin, something that has been requested by many users.
 
 _Material for MkDocs終於支持中文了!文本被正確分割並且更容易找到。_
@@ -50,18 +50,19 @@ search plugin in a few minutes._
 
 ## Configuration
 
 Chinese language support for Material for MkDocs is provided by [jieba], an
-excellent Chinese text segmentation library. If [jieba] is installed, the
-built-in search plugin automatically detects Chinese characters and runs them
+excellent Chinese text segmentation library. If [jieba] is installed, the
+built-in search plugin automatically detects Chinese characters and runs them
 through the segmenter. You can install [jieba] with:
 
 ```
 pip install jieba
 ```
 
-The next step is only required if you specified the [separator] configuration
-in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
-it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
-the [separator] includes the `\u200b` character:
+The next step is only required if you specified the [`separator`][separator]
+configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
+characters, so it renders exactly the same in the search modal. Adjust
+`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
+character:
 
 ``` yaml
 plugins:
diff --git a/docs/blog/index.md b/docs/blog/index.md
index af2d39a5c..7330c027f 100644
--- a/docs/blog/index.md
+++ b/docs/blog/index.md
@@ -33,11 +33,12 @@ number of Chinese users.__
 
 ---
 
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
-for the built-in search plugin, something that has been requested by many users.
+missing support in [`lunr-languages`][lunr-languages] which is used for search
+tokenization and stemming. The latest Insiders release adds long-awaited Chinese
+language support for the built-in search plugin, something that has been
+requested by many users.
 
 [:octicons-arrow-right-24: Continue reading][Chinese search support – 中文搜索支持]
diff --git a/docs/insiders/changelog.md b/docs/insiders/changelog.md
index 0e514d50a..a03132747 100644
--- a/docs/insiders/changelog.md
+++ b/docs/insiders/changelog.md
@@ -6,6 +6,10 @@ template: overrides/main.html
 
 ## Material for MkDocs Insiders
 
+### 4.17.2 _ June 5, 2022 { id="4.17.2" }
+
+- Added support for custom jieba dictionaries (Chinese search)
+
 ### 4.17.1 _ June 5, 2022 { id="4.17.1" }
 
 - Added support for cookie consent reject button
diff --git a/docs/setup/ensuring-data-privacy.md b/docs/setup/ensuring-data-privacy.md
index 049ac3e83..a044b337c 100644
--- a/docs/setup/ensuring-data-privacy.md
+++ b/docs/setup/ensuring-data-privacy.md
@@ -104,15 +104,15 @@ The following properties are available:
 
 : [:octicons-tag-24: insiders-4.17.1][Insiders] ·
   :octicons-milestone-24: Default: `[accept, manage]` – This property defines
   which buttons are shown
-  and in which order, e.g. to allow the user to manage settings and accept
-  the cookie:
+  and in which order, e.g. to allow the user to accept cookies and manage
+  settings:
 
   ``` yaml
   extra:
     consent:
       actions:
-        - manage
         - accept
+        - manage
   ```
 
 The cookie consent form includes three types of buttons:
diff --git a/docs/setup/setting-up-site-search.md b/docs/setup/setting-up-site-search.md
index aeadf8479..28a6b2ab1 100644
--- a/docs/setup/setting-up-site-search.md
+++ b/docs/setup/setting-up-site-search.md
@@ -92,12 +92,6 @@ The following configuration options are supported:
 
   part of this list by automatically falling back to the stemmer yielding the
   best result.
 
-  !!! tip "Chinese search support – 中文搜索支持"
-
-      Material for MkDocs recently added __experimental language support for
-      Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
-      to learn how to set up search for Chinese in a matter of minutes.
-
 `separator`{ #search-separator }
 
 : :octicons-milestone-24: Default: _automatically set_ – The separator for
@@ -112,10 +106,9 @@ The following configuration options are supported:
     ```
 
     1. Tokenization itself is carried out by [lunr's default tokenizer], which
-       doesn't allow for lookahead or separators spanning multiple characters.
-
-       For more finegrained control over the tokenization process, see the
-       section on [tokenizer lookahead].
+       doesn't allow for lookahead or multi-character separators. For more
+       finegrained control over the tokenization process, see the section on
+       [tokenizer lookahead].
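As a note on the zero-width-space approach described in the `chinese-search-support` changes above, the following sketch illustrates why segmented text still splits into individual terms. The separator pattern `[\s\u200b\-]+` is a hypothetical illustration here — the actual value belongs in the `separator` setting of `mkdocs.yml`:

```python
import re

# Hypothetical separator pattern including the zero-width space (U+200B),
# mirroring the kind of `separator` value the diff above asks users to set.
separator = re.compile(r"[\s\u200b\-]+")

# Text pre-segmented with invisible U+200B characters between terms;
# it renders identically to the unsegmented string in the search modal.
segmented = "中文\u200b搜索\u200b支持"

# Splitting on the separator recovers the individual search terms.
tokens = [token for token in separator.split(segmented) if token]
print(tokens)  # ['中文', '搜索', '支持']
```

Stripping the `\u200b` characters restores the original string, which is why the segmented text looks exactly the same to the reader while remaining tokenizable.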