Documentation

squidfunk 2022-06-05 18:16:51 +02:00
parent 187711fa29
commit 10859be356
7 changed files with 102 additions and 45 deletions

View File

@@ -1,3 +1,7 @@
+mkdocs-material-8.3.2+insiders-4.17.2 (2022-06-05)
+  * Added support for custom jieba dictionaries (Chinese search)
 mkdocs-material-8.3.2+insiders-4.17.1 (2022-06-05)
   * Added support for cookie consent reject button

View File

@@ -197,8 +197,8 @@ the following steps are taken:
     remain. Linking is necessary, as search results are grouped by page.
 2. __Tokenization__: The `title` and `text` values of each section are split
-   into tokens by using the [separator] as configured in `mkdocs.yml`.
-   Tokenization itself is carried out by
+   into tokens by using the [`separator`][separator] as configured in
+   `mkdocs.yml`. Tokenization itself is carried out by
    [lunr's default tokenizer][default tokenizer], which doesn't allow for
    lookahead or separators spanning multiple characters.
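For readers unfamiliar with the setting referenced above: the `separator` is configured on the built-in search plugin in `mkdocs.yml`. A minimal sketch, reusing the `[\s\-]+` default mentioned further down in this article (the exact value in any given project may differ):

``` yaml
# Minimal sketch of the separator setting in mkdocs.yml (illustrative value)
plugins:
  - search:
      separator: '[\s\-]+'  # split tokens on whitespace and hyphens
```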
@@ -216,7 +216,7 @@ more magic involved, e.g., search results are [post-processed] and [rescored] to
 account for some shortcomings of [lunr], but in general, this is how data gets
 into and out of the index.
-[separator]: ../../setup/setting-up-site-search.md#separator
+[separator]: ../../setup/setting-up-site-search.md#search-separator
 [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
 [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
@@ -421,9 +421,9 @@ On to the next step in the process: __tokenization__.
 ### Tokenizer lookahead
 The [default tokenizer] of [lunr] uses a regular expression to split a given
-string by matching each character against the [separator] as defined in
-`mkdocs.yml`. This doesn't allow for more complex separators based on
-lookahead or multiple characters.
+string by matching each character against the [`separator`][separator] as
+defined in `mkdocs.yml`. This doesn't allow for more complex separators based
+on lookahead or multiple characters.
 Fortunately, __our new search implementation provides an advanced tokenizer__
 that doesn't have these shortcomings and supports more complex regular
@@ -439,14 +439,14 @@ characters at which the string should be split, the following three sections
 explain the remainder of the regular expression.[^4]
 [^4]:
-    As a fun fact: the [separator default value] of the search plugin being
-    `[\s\-]+` always has been kind of irritating, as it suggests that multiple
-    characters can be considered being a separator. However, the `+` is
-    completely irrelevant, as regular expression groups involving multiple
-    characters were never supported by
-    [lunr's default tokenizer][default tokenizer].
+    As a fun fact: the [`separator`][separator] [default value] of the search
+    plugin being `[\s\-]+` always has been kind of irritating, as it suggests
+    that multiple characters can be considered being a separator. However, the
+    `+` is completely irrelevant, as regular expression groups involving
+    multiple characters were never supported by
+    [lunr's default tokenizer][default tokenizer].
-[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
+[default value]: https://www.mkdocs.org/user-guide/configuration/#separator
 #### Case changes

View File

@@ -32,7 +32,7 @@ number of Chinese users.__
 ---
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
 missing support in [lunr-languages] which is used for search tokenization and
 stemming. The latest Insiders release adds long-awaited Chinese language support
@@ -58,10 +58,11 @@ through the segmenter. You can install [jieba] with:
 pip install jieba
 ```
-The next step is only required if you specified the [separator] configuration
-in `mkdocs.yml`. Text is segmented with [zero-width whitespace] characters, so
-it renders exactly the same in the search modal. Adjust `mkdocs.yml` so that
-the [separator] includes the `\u200b` character:
+The next step is only required if you specified the [`separator`][separator]
+configuration in `mkdocs.yml`. Text is segmented with [zero-width whitespace]
+characters, so it renders exactly the same in the search modal. Adjust
+`mkdocs.yml` so that the [`separator`][separator] includes the `\u200b`
+character:
 ``` yaml
 plugins:
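The hunk stops at the opening of that example. A minimal sketch of such an adjustment, assuming the default `[\s\-]+` pattern is simply extended with the zero-width whitespace character (the pattern actually recommended in the article may differ):

``` yaml
# Sketch only: separator extended with zero-width whitespace (\u200b)
plugins:
  - search:
      separator: '[\s\u200b\-]+'
```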

View File

@@ -33,11 +33,12 @@ number of Chinese users.__
 ---
 After the United States and Germany, the third-largest country of origin of
-Material for MkDocs users is China. For a long time, the built-in search plugin
+Material for MkDocs users is China. For a long time, the [built-in search plugin]
 didn't allow for proper segmentation of Chinese characters, mainly due to
-missing support in [lunr-languages] which is used for search tokenization and
-stemming. The latest Insiders release adds long-awaited Chinese language support
-for the built-in search plugin, something that has been requested by many users.
+missing support in [`lunr-languages`][lunr-languages] which is used for search
+tokenization and stemming. The latest Insiders release adds long-awaited Chinese
+language support for the built-in search plugin, something that has been
+requested by many users.
 [:octicons-arrow-right-24: Continue reading][Chinese search support 中文搜索​支持]

View File

@@ -6,6 +6,10 @@ template: overrides/main.html
 ## Material for MkDocs Insiders
+### 4.17.2 <small>_ June 5, 2022</small> { id="4.17.2" }
+- Added support for custom jieba dictionaries (Chinese search)
 ### 4.17.1 <small>_ June 5, 2022</small> { id="4.17.1" }
 - Added support for cookie consent reject button

View File

@@ -104,15 +104,15 @@ The following properties are available:
 :   [:octicons-tag-24: insiders-4.17.1][Insiders] · :octicons-milestone-24:
     Default: `[accept, manage]` This property defines which buttons are shown
-    and in which order, e.g. to allow the user to manage settings and accept
-    the cookie:
+    and in which order, e.g. to allow the user to accept cookies and manage
+    settings:
     ``` yaml
     extra:
       consent:
         actions:
-          - manage
           - accept
+          - manage
     ```
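The changelog entries at the top of this commit also mention a newly added reject button; assuming its action identifier is `reject` (an assumption, not something this diff confirms), a configuration listing all three buttons might look like:

``` yaml
# Sketch: all three consent buttons; the `reject` identifier is assumed
extra:
  consent:
    actions:
      - accept
      - reject
      - manage
```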
 The cookie consent form includes three types of buttons:

View File

@@ -92,12 +92,6 @@ The following configuration options are supported:
     part of this list by automatically falling back to the stemmer yielding the
     best result.
-!!! tip "Chinese search support 中文搜索​支持"
-    Material for MkDocs recently added __experimental language support for
-    Chinese__ as part of [Insiders]. [Read the blog article][chinese search]
-    to learn how to set up search for Chinese in a matter of minutes.
 `separator`{ #search-separator }
 :   :octicons-milestone-24: Default: _automatically set_ The separator for
@@ -112,10 +106,9 @@ The following configuration options are supported:
     ```
 1. Tokenization itself is carried out by [lunr's default tokenizer], which
-   doesn't allow for lookahead or separators spanning multiple characters.
-   For more finegrained control over the tokenization process, see the
-   section on [tokenizer lookahead].
+   doesn't allow for lookahead or multi-character separators. For more
+   finegrained control over the tokenization process, see the section on
+   [tokenizer lookahead].
 <div class="mdx-deprecated" markdown>
@@ -142,14 +135,9 @@ The following configuration options are supported:
 </div>
-The other configuration options of this plugin are not officially supported
-by Material for MkDocs, which is why they may yield unexpected results. Use
-them at your own risk.
 [search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
 [lunr]: https://lunrjs.com
 [lunr-languages]: https://github.com/MihaiValentin/lunr-languages
-[chinese search]: ../blog/2022/chinese-search-support.md
 [lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
 [site language]: changing-the-language.md#site-language
 [tokenizer lookahead]: #tokenizer-lookahead
@@ -157,13 +145,72 @@ them at your own risk.
 [prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
 [50% smaller]: ../blog/2021/search-better-faster-smaller.md#benchmarks
+#### Chinese language support
+[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
+[:octicons-tag-24: insiders-4.14.0][Insiders] ·
+:octicons-beaker-24: Experimental
+[Insiders] adds search support for the Chinese language (see our [blog article]
+[chinese search] from May 2022) by integrating with the text segmentation
+library [jieba], which can be installed with `pip`.
+``` sh
+pip install jieba
+```
+If [jieba] is installed, the [built-in search plugin] automatically detects
+Chinese characters and runs them through the segmenter. The following
+configuration options are available:
+`jieba_dict`{ #jieba-dict }
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ This option allows for specifying a [custom dictionary]
+    to be used by [jieba] for segmenting text, replacing the default dictionary:
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict: dict.txt # (1)!
+    ```
+    1. The following alternative dictionaries are provided by [jieba]:
+        - [dict.txt.small]: a dictionary file with a smaller memory footprint
+        - [dict.txt.big]: a dictionary file with better support for traditional Chinese segmentation
+`jieba_dict_user`{ #jieba-dict-user }
+:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
+    Default: _none_ This option allows for specifying an additional
+    [user dictionary] to be used by [jieba] for segmenting text, augmenting the
+    default dictionary:
+    ``` yaml
+    plugins:
+      - search:
+          jieba_dict_user: user_dict.txt
+    ```
+    User dictionaries can be used for tuning the segmenter to preserve
+    technical terms.
+[chinese search]: ../blog/2022/chinese-search-support.md
+[jieba]: https://pypi.org/project/jieba/
+[built-in search plugin]: #built-in-search-plugin
+[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
+[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
+[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
+[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8
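Taken together, the two options added above can be combined in a single configuration; a sketch, in which `dict.txt.big` and `user_dict.txt` are placeholder file names assumed to sit next to `mkdocs.yml`:

``` yaml
# Sketch combining both new jieba options; file names are placeholders
plugins:
  - search:
      jieba_dict: dict.txt.big        # replaces the default dictionary
      jieba_dict_user: user_dict.txt  # augments it with project-specific terms
```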
 ### Rich search previews
 [:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
-Insiders ships rich search previews as part of the [new search plugin], which
+[Insiders] ships rich search previews as part of the [new search plugin], which
 will render code blocks directly in the search result, and highlight all
 occurrences inside those blocks:
@@ -186,7 +233,7 @@ occurrences inside those blocks:
 [:octicons-tag-24: insiders-3.0.0][Insiders] ·
 :octicons-beaker-24: Experimental
-Insiders allows for more complex configurations of the [`separator`][separator]
+[Insiders] allows for more complex configurations of the [`separator`][separator]
 setting as part of the [new search plugin], yielding more influence on the way
 documents are tokenized:
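The example that this sentence introduces lies outside the hunk. As an illustration only, a separator making use of a lookahead group to additionally split tokens on case changes might look like this (the pattern is illustrative, not the plugin's documented default):

``` yaml
# Illustrative only: separator with a lookahead group for case changes
plugins:
  - search:
      separator: '[\s\-]+|(?=[A-Z][a-z])'
```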