Updated blog articles to use named links

squidfunk 2021-10-11 17:16:48 +02:00
parent be9d7b5b86
commit d13ed6c3f0
3 changed files with 223 additions and 192 deletions


@@ -3,7 +3,6 @@ template: overrides/main.html
description: > description: >
Three new simple ways to exclude dedicated parts of a document from the search Three new simple ways to exclude dedicated parts of a document from the search
index, allowing for more fine-grained control index, allowing for more fine-grained control
disqus: mkdocs-material
search: search:
exclude: true exclude: true
--- ---
@@ -15,7 +14,7 @@ dedicated parts of a document from the search index, allowing for more
fine-grained control.__ fine-grained control.__
<aside class="mdx-author" markdown> <aside class="mdx-author" markdown>
![@squidfunk][1] ![@squidfunk][@squidfunk avatar]
<span>__Martin Donath__ · @squidfunk</span> <span>__Martin Donath__ · @squidfunk</span>
<span> <span>
@@ -25,13 +24,13 @@ fine-grained control.__
</span> </span>
</aside> </aside>
[1]: https://avatars.githubusercontent.com/u/932156 [@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156
--- ---
Two weeks ago, Material for MkDocs Insiders shipped a [brand new search Two weeks ago, Material for MkDocs Insiders shipped a [brand new search
plugin][2], yielding [massive improvements in usability][3], but also in [speed plugin], yielding [massive improvements in usability], but also in [speed
and size][4] of the search index. Interestingly, as discussed in the previous and size] of the search index. Interestingly, as discussed in the previous
blog article, we only scratched the surface of what's now possible. This blog article, we only scratched the surface of what's now possible. This
release brings some useful features that enhance the writing experience, release brings some useful features that enhance the writing experience,
allowing for more fine-grained control of what pages, sections and blocks of a allowing for more fine-grained control of what pages, sections and blocks of a
@@ -39,18 +38,18 @@ Markdown file should be indexed by the built-in search functionality.
_The following section discusses existing solutions for excluding pages and _The following section discusses existing solutions for excluding pages and
sections from the search index. If you immediately want to learn what's new, sections from the search index. If you immediately want to learn what's new,
skip to the [section just after that][5]._ skip to the [section just after that][what's new]._
[2]: search-better-faster-smaller.md [brand new search plugin]: search-better-faster-smaller.md
[3]: search-better-faster-smaller.md#whats-new [massive improvements in usability]: search-better-faster-smaller.md#whats-new
[4]: search-better-faster-smaller.md#benchmarks [speed and size]: search-better-faster-smaller.md#benchmarks
[5]: #whats-new [what's new]: #whats-new
## Prior art ## Prior art
MkDocs has a rich and thriving ecosystem of [plugins][6], and it comes as no MkDocs has a rich and thriving ecosystem of [plugins], and it comes as no
surprise that there's already a fantastic plugin by @chrieke to exclude specific surprise that there's already a fantastic plugin by @chrieke to exclude specific
sections of a Markdown file the [mkdocs-exclude-search][7] plugin. It can be sections of a Markdown file the [mkdocs-exclude-search] plugin. It can be
installed with: installed with:
``` ```
@@ -78,10 +77,10 @@ adds support for advanced filtering techniques like infix- and suffix-filtering
using wildcards. While this is a very powerful idea, it comes with some using wildcards. While this is a very powerful idea, it comes with some
downsides: downsides:
1. __Exclusion patterns and content are not co-located__: exclusion patterns 1. __Exclusion patterns and content are not co-located__: exclusion patterns
need to be defined in `mkdocs.yml`, and not as part of the respective need to be defined in `mkdocs.yml`, and not as part of the respective
document or section to be excluded. This might result in stale exclusion document or section to be excluded. This might result in stale exclusion
patterns, leading to unintended behavior: patterns, leading to unintended behavior:
- When a headline is changed, its slug (permalink) also changes, which might - When a headline is changed, its slug (permalink) also changes, which might
suddenly match (or unmatch) a pattern, e.g., when an author fixes a typo suddenly match (or unmatch) a pattern, e.g., when an author fixes a typo
@@ -97,23 +96,23 @@ downsides:
pages and sections have been excluded from the search index, but MkDocs will pages and sections have been excluded from the search index, but MkDocs will
now flood the terminal with debug output from its core and other plugins. now flood the terminal with debug output from its core and other plugins.
2. __Exclusion control might be too coarse__: The [mkdocs-exclude-search][7] 2. __Exclusion control might be too coarse__: The [mkdocs-exclude-search]
plugin only allows for the exclusion of pages and sections. It's not possible plugin only allows for the exclusion of pages and sections. It's not
to exclude parts of a section, e.g., content that is irrelevant to search but possible to exclude parts of a section, e.g., content that is irrelevant
must be included as part of the documentation. to search but must be included as part of the documentation.
[6]: https://github.com/mkdocs/mkdocs/wiki/MkDocs-Plugins [plugins]: https://github.com/mkdocs/mkdocs/wiki/MkDocs-Plugins
[7]: https://github.com/chrieke/mkdocs-exclude-search [mkdocs-exclude-search]: https://github.com/chrieke/mkdocs-exclude-search
## What's new? ## What's new?
The latest Insiders release brings fine-grained control for [__excluding pages, The latest Insiders release brings fine-grained control for [__excluding pages,
sections, and blocks__][8] from the search index, implemented through front sections, and blocks__][search exclusion] from the search index, implemented
matter, as well as the [Attribute List][9] extension. Note that it doesn't through front matter, as well as the [Attribute Lists]. Note that it doesn't
replace the [mkdocs-exclude-search][7] plugin but _complements_ it. replace the [mkdocs-exclude-search] plugin but __complements__ it.
[8]: ../../setup/setting-up-site-search.md#search-exclusion [search exclusion]: ../../setup/setting-up-site-search.md#search-exclusion
[9]: https://python-markdown.github.io/extensions/attr_list/ [Attribute Lists]: ../../setup/extensions/python-markdown.md#attribute-lists
### Excluding pages ### Excluding pages
@@ -134,12 +133,12 @@ search:
### Excluding sections ### Excluding sections
If a section should be excluded, the author can use the [Attribute List][9] If a section should be excluded, the author can use the [Attribute Lists]
extension to add a __pragma__ called `{ data-search-exclude }` at the end of a extension to add a __pragma__ called `{ data-search-exclude }` at the end of a
heading. The pragma is not included in the final HTML, as search pragmas are heading. The pragma is not included in the final HTML, as search pragmas are
filtered by the search plugin before the page is rendered: filtered by the search plugin before the page is rendered:
=== "`docs/page.md`" === ":octicons-file-code-16: docs/page.md"
``` markdown ``` markdown
# Document title # Document title
@@ -153,7 +152,7 @@ filtered by the search plugin before the page is rendered:
The content of this section is excluded The content of this section is excluded
``` ```
=== "`search_index.json`" === ":octicons-codescan-16: search_index.json"
``` json ``` json
{ {
@@ -176,10 +175,10 @@ filtered by the search plugin before the page is rendered:
### Excluding blocks ### Excluding blocks
If even more fine-grained control is desired, the __pragma__ can be added to If even more fine-grained control is desired, the __pragma__ can be added to
any [block-level element][10] or [inline-level element][11] that is officially any [block-level element] or [inline-level element] that is officially
supported by the [Attribute List][9] extension: supported by the [Attribute Lists] extension:
=== "`docs/page.md`" === ":octicons-file-code-16: docs/page.md"
``` markdown ``` markdown
# Document title # Document title
@@ -190,7 +189,7 @@ supported by the [Attribute List][9] extension:
{ data-search-exclude } { data-search-exclude }
``` ```
=== "`search_index.json`" === ":octicons-codescan-16: search_index.json"
``` json ``` json
{ {
@@ -205,12 +204,12 @@ supported by the [Attribute List][9] extension:
} }
``` ```
[10]: https://python-markdown.github.io/extensions/attr_list/#block-level [block-level element]: https://python-markdown.github.io/extensions/attr_list/#block-level
[11]: https://python-markdown.github.io/extensions/attr_list/#inline-level [inline-level element]: https://python-markdown.github.io/extensions/attr_list/#inline
## Conclusion ## Conclusion
The latest release brings three simple ways to control more precisely what goes The latest release brings three simple ways to control more precisely what goes
into the search index and what doesn't. It complements the already very powerful into the search index and what doesn't. It complements the already very powerful
[mkdocs-exclude-search][7] plugin, allowing for new methods of shaping the [mkdocs-exclude-search] plugin, allowing for new methods of shaping the
structure, size and content of the search index. structure, size and content of the search index.


@@ -3,7 +3,6 @@ template: overrides/main.html
description: > description: >
How we rebuilt client-side search, delivering a better user experience while How we rebuilt client-side search, delivering a better user experience while
making it faster and smaller at the same time making it faster and smaller at the same time
disqus: mkdocs-material
search: search:
exclude: true exclude: true
--- ---
@@ -15,7 +14,7 @@ delivering a significantly better user experience while making it faster and
smaller at the same time.__ smaller at the same time.__
<aside class="mdx-author" markdown> <aside class="mdx-author" markdown>
![@squidfunk][1] ![@squidfunk][@squidfunk avatar]
<span>__Martin Donath__ · @squidfunk</span> <span>__Martin Donath__ · @squidfunk</span>
<span> <span>
@@ -25,12 +24,12 @@ smaller at the same time.__
</span> </span>
</aside> </aside>
[1]: https://avatars.githubusercontent.com/u/932156 [@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156
--- ---
The [search][2] of Material for MkDocs is by far one of its best and most-loved The [search] of Material for MkDocs is by far one of its best and most-loved
assets: [multilingual][3], [offline-capable][4], and most importantly: _all assets: [multilingual], [offline-capable], and most importantly: _all
client-side_. It provides a solution to empower the users of your documentation client-side_. It provides a solution to empower the users of your documentation
to find what they're searching for instantly without the headache of managing to find what they're searching for instantly without the headache of managing
additional servers. However, even though several iterations have been made, additional servers. However, even though several iterations have been made,
@@ -41,19 +40,19 @@ version, and what's about to come.
_The next section discusses the architecture and issues of the current search _The next section discusses the architecture and issues of the current search
implementation. If you immediately want to learn what's new, skip to the implementation. If you immediately want to learn what's new, skip to the
[section just after that][5]._ [section just after that][what's new]._
[2]: ../../setup/setting-up-site-search.md [search]: ../../setup/setting-up-site-search.md
[3]: ../../setup/setting-up-site-search.md#lang [multilingual]: ../../setup/setting-up-site-search.md#lang
[4]: ../../setup/setting-up-site-search.md#offline-search [offline-capable]: ../../setup/setting-up-site-search.md#offline-search
[5]: #whats-new [what's new]: #whats-new
## Architecture ## Architecture
Material for MkDocs uses [lunr][6] together with [lunr-languages][7] to Material for MkDocs uses [lunr] together with [lunr-languages] to implement
implement its client-side search capabilities. When a documentation page is its client-side search capabilities. When a documentation page is loaded and
loaded and JavaScript is available, the search index as generated by the JavaScript is available, the search index as generated by the
[built-in search plugin][8] during the build process is requested from the [built-in search plugin] during the build process is requested from the
server: server:
``` ts ``` ts
@@ -64,9 +63,9 @@ const index$ = document.forms.namedItem("search")
: NEVER : NEVER
``` ```
[6]: https://lunrjs.com [lunr]: https://lunrjs.com
[7]: https://github.com/MihaiValentin/lunr-languages [lunr-languages]: https://github.com/MihaiValentin/lunr-languages
[8]: ../../setup/setting-up-site-search.md#built-in-search [built-in search plugin]: ../../setup/setting-up-site-search.md#built-in-search
### Search index ### Search index
@@ -76,7 +75,7 @@ the original Markdown file:
??? example "Expand to inspect example" ??? example "Expand to inspect example"
=== "`docs/page.md`" === ":octicons-file-code-16: docs/page.md"
```` markdown ```` markdown
# Example # Example
@@ -106,7 +105,7 @@ the original Markdown file:
* Profit! * Profit!
```` ````
=== "`search_index.json`" === ":octicons-codescan-16: search_index.json"
``` json ``` json
{ {
@@ -146,15 +145,15 @@ the original Markdown file:
If we inspect the search index, we immediately see several problems: If we inspect the search index, we immediately see several problems:
1. __All content is included twice__: the search index contains one entry 1. __All content is included twice__: the search index contains one entry
with the entire contents of the page, and one entry for each section of with the entire contents of the page, and one entry for each section of
the page, i.e., each block preceded by a headline or subheadline. This the page, i.e., each block preceded by a headline or subheadline. This
significantly contributes to the size of the search index. significantly contributes to the size of the search index.
2. __All structure is lost__: when the search index is built, all structural 2. __All structure is lost__: when the search index is built, all structural
information like HTML tags and attributes are stripped from the content. information like HTML tags and attributes are stripped from the content.
While this approach works well for paragraphs and inline formatting, it While this approach works well for paragraphs and inline formatting, it
might be problematic for lists and code blocks. An excerpt: might be problematic for lists and code blocks. An excerpt:
``` ```
… links , or even code : if (isAwesome) { … } Lists Sometimes you want … … links , or even code : if (isAwesome) { … } Lists Sometimes you want …
@@ -172,51 +171,52 @@ If we inspect the search index, we immediately see several problems:
It's not difficult to see that it can be quite challenging to implement a good It's not difficult to see that it can be quite challenging to implement a good
search experience for theme authors, which is why Material for MkDocs (up to search experience for theme authors, which is why Material for MkDocs (up to
now) did some [monkey patching][9] to be able to render slightly more now) did some [monkey patching] to be able to render slightly more
meaningful search previews. meaningful search previews.
[monkey patching]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71
### Search worker ### Search worker
The actual search functionality is implemented as part of a web worker[^1], The actual search functionality is implemented as part of a web worker[^1],
which creates and manages the [lunr][6] search index. When search is which creates and manages the [lunr] search index. When search is initialized,
initialized, the following steps are taken: the following steps are taken:
[^1]: [^1]:
Prior to [version 5.0][10], search was carried out in the main thread which Prior to :octicons-tag-24: 5.0.0, search was carried out in the main thread
locked up the browser, rendering it unusable. This problem was first which locked up the browser, rendering it unusable. This problem was first
reported in #904 and, after some back and forth, fixed and released in reported in #904 and, after some back and forth, fixed and released in
version 5.0. :octicons-tag-24: 5.0.0.
1. __Linking sections with pages__: The search index is parsed, and each section 1. __Linking sections with pages__: The search index is parsed, and each
is linked to its parent page. The parent page itself is _not indexed_, as it section is linked to its parent page. The parent page itself is _not
would lead to duplicate results, so only the sections remain. Linking is indexed_, as it would lead to duplicate results, so only the sections
necessary, as search results are grouped by page. remain. Linking is necessary, as search results are grouped by page.
2. __Tokenization__: The `title` and `text` values of each section are split 2. __Tokenization__: The `title` and `text` values of each section are split
into tokens by using the [separator][11] as configured in `mkdocs.yml`. into tokens by using the [separator] as configured in `mkdocs.yml`.
Tokenization itself is carried out by [lunr's default tokenizer][12], which Tokenization itself is carried out by
doesn't allow for lookahead or separators spanning multiple characters. [lunr's default tokenizer][default tokenizer], which doesn't allow for
lookahead or separators spanning multiple characters.
> Why is this important and a big deal? We will see later how much more we > Why is this important and a big deal? We will see later how much more we
> can achieve with a tokenizer that is capable of separating strings with > can achieve with a tokenizer that is capable of separating strings with
> lookahead. > lookahead.
3. __Indexing__: As a final step, each section is indexed. When querying the 1. __Indexing__: As a final step, each section is indexed. When querying the
index, if a search query includes one of the tokens as returned by step 2., index, if a search query includes one of the tokens as returned by step 2.,
the section is considered to be part of the search result and passed to the the section is considered to be part of the search result and passed to the
main thread. main thread.
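The linking step can be sketched in a few lines of TypeScript. This is a simplified illustration, not the worker's actual code: it assumes entries follow the `location`/`title`/`text` shape of the `docs` array in MkDocs' `search_index.json`, and that a section's `location` is its page's `location` plus a `#` fragment. The names `SearchDocument`, `LinkedSection` and `linkSections` are hypothetical.

```ts
// Hypothetical shape of one entry in the "docs" array of search_index.json
interface SearchDocument {
  location: string
  title: string
  text: string
}

// A section together with a reference to the page it belongs to
interface LinkedSection extends SearchDocument {
  parent?: SearchDocument
}

// Link each section to its parent page and drop the page entries themselves,
// so the full page text is not indexed a second time
function linkSections(docs: SearchDocument[]): LinkedSection[] {
  // Page entries have no "#" fragment in their location
  const pages = new Map<string, SearchDocument>()
  for (const doc of docs)
    if (!doc.location.includes("#"))
      pages.set(doc.location, doc)

  // Section entries are assumed to look like "<page location>#<anchor>"
  const sections: LinkedSection[] = []
  for (const doc of docs) {
    const hash = doc.location.indexOf("#")
    if (hash === -1)
      continue
    const parent = pages.get(doc.location.slice(0, hash))
    sections.push({ ...doc, parent })
  }
  return sections
}
```

With sections linked this way, grouping results by page only requires following each section's `parent` reference.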
Now, that's basically how the search worker operates. Sure, there's a little Now, that's basically how the search worker operates. Sure, there's a little
more magic involved, e.g., search results are [post-processed][13] and more magic involved, e.g., search results are [post-processed] and [rescored] to
[rescored][14] to account for some shortcomings of [lunr][6], but in general, account for some shortcomings of [lunr], but in general, this is how data gets
this is how data gets into and out of the index. into and out of the index.
[9]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71 [separator]: ../../setup/setting-up-site-search.md#separator
[10]: https://squidfunk.github.io/mkdocs-material/upgrading/#upgrading-from-4x-to-5x [default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[11]: ../../setup/setting-up-site-search.md#separator [post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
[12]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456 [rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
[13]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
[14]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
### Search previews ### Search previews
@@ -228,11 +228,11 @@ experience.
This is where the current search preview generation falls short, as some of the This is where the current search preview generation falls short, as some of the
search previews appear not to include any occurrence of any of the search search previews appear not to include any occurrence of any of the search
terms. This was due to the fact that search previews were [truncated after a terms. This was due to the fact that search previews were [truncated after a
maximum of 320 characters][15], as can be seen here: maximum of 320 characters][truncated], as can be seen here:
<figure markdown> <figure markdown>
![Search previews][16] ![search preview]
<figcaption markdown> <figcaption markdown>
@@ -261,28 +261,28 @@ carefully considered:
split strings into tokens. split strings into tokens.
[^2]: [^2]:
At the time of writing, [Just the Docs][17] and [Docusaurus][18] use this At the time of writing, [Just the Docs] and [Docusaurus] use this method
method for generating search previews. Note that the latter also integrates for generating search previews. Note that the latter also integrates with
with Algolia, which is a fully managed server-based solution. Algolia, which is a fully managed server-based solution.
[^3]: [^3]:
China and Japan are both within the top 5 countries of origin of users of China and Japan are both within the top 5 countries of origin of users of
Material for MkDocs. Material for MkDocs.
[15]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/templates/search/index.tsx#L90 [truncated]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/templates/search/index.tsx#L90
[16]: search-better-faster-smaller/search-preview.png [search preview]: search-better-faster-smaller/search-preview.png
[17]: https://pmarsceill.github.io/just-the-docs/ [Just the Docs]: https://pmarsceill.github.io/just-the-docs/
[18]: https://github.com/lelouch77/docusaurus-lunr-search [Docusaurus]: https://github.com/lelouch77/docusaurus-lunr-search
2. __Context-awareness__: Although whitespace doesn't work for all languages, 1. __Context-awareness__: Although whitespace doesn't work for all languages,
one could argue that it could be a good enough solution. Unfortunately, this one could argue that it could be a good enough solution. Unfortunately, this
is not necessarily true for code blocks, as the removal of whitespace might is not necessarily true for code blocks, as the removal of whitespace might
change meaning in some languages. change meaning in some languages.
3. __Structure__: Preserving structural information is not a must, but 3. __Structure__: Preserving structural information is not a must, but
apparently beneficial to build more meaningful search previews which allow apparently beneficial to build more meaningful search previews which allow
for a quick evaluation of relevance. If a word occurrence is part of a code for a quick evaluation of relevance. If a word occurrence is part of a code
block, it should be rendered as a code block. block, it should be rendered as a code block.
## What's new? ## What's new?
@@ -291,41 +291,34 @@ into the internals of our new search implementation to see which of the
problems it already solves, a quick overview of what features and improvements problems it already solves, a quick overview of what features and improvements
it brings: it brings:
- __Better__: support for [rich search previews][19], preserving the structural - __Better__: support for [rich search previews], preserving the structural
information of code blocks, inline code, and lists, so they are rendered information of code blocks, inline code, and lists, so they are rendered
as-is, as well as [lookahead tokenization][20], as-is, as well as [lookahead tokenization], [more accurate highlighting], and
[more accurate highlighting][21], and improved stability of typeahead. Also, improved stability of typeahead. Also, a [slightly better UX].
a [slightly better UX][22].
- __Faster__ and __smaller__: significant decrease in search index size of up - __Faster__ and __smaller__: significant decrease in search index size of up
to 48% due to improved extraction and construction techniques, resulting in a to 48% due to improved extraction and construction techniques, resulting in a
search experience that is up to 95% faster, which is particularly helpful for search experience that is up to 95% faster, which is particularly helpful for
large documentation projects. large documentation projects.
_Note that our new search implementation is currently 'Insiders only', which [rich search previews]: #rich-search-previews
means that it is reserved for sponsors because it's those sponsors that make [lookahead tokenization]: #tokenizer-lookahead
features like this possible._ [more accurate highlighting]: #accurate-highlighting
[slightly better UX]: #user-interface
[:octicons-heart-fill-24:{ .mdx-heart } &nbsp; I want to become a sponsor](../../insiders/index.md){ .md-button .md-button--primary }
[19]: #rich-search-previews
[20]: #tokenizer-lookahead
[21]: #accurate-highlighting
[22]: #user-interface
### Rich search previews ### Rich search previews
As we rebuilt the search plugin from scratch, we reworked the construction of As we rebuilt the search plugin from scratch, we reworked the construction of
the search index to preserve the structural information of code blocks, inline the search index to preserve the structural information of code blocks, inline
code, as well as unordered and ordered lists. Using the example from the code, as well as unordered and ordered lists. Using the example from the
[search index][23] section, here's how it looks: [search index] section, here's how it looks:
=== "Now" === "Now"
![Search preview now][24] ![search preview now]
=== "Before" === "Before"
![Search preview before][25] ![search preview before]
Now, __code blocks are first-class citizens of search previews__, and even Now, __code blocks are first-class citizens of search previews__, and even
inline code formatting is preserved. Let's take a look at the new structure of inline code formatting is preserved. Let's take a look at the new structure of
@@ -390,15 +383,15 @@ the search index to understand why:
If we inspect the search index again, we can see how the situation improved: If we inspect the search index again, we can see how the situation improved:
1. __Content is included only once__: the search index does not include the 1. __Content is included only once__: the search index does not include the
content of the page twice, as only the sections of a page are part of the content of the page twice, as only the sections of a page are part of the
search index. This leads to a significant reduction in size, fewer bytes to search index. This leads to a significant reduction in size, fewer bytes to
transfer, and a smaller search index. transfer, and a smaller search index.
2. __Some structure is preserved__: each section of the search index includes a 2. __Some structure is preserved__: each section of the search index includes
small subset of HTML to provide the necessary structure to allow for more a small subset of HTML to provide the necessary structure to allow for more
sophisticated search previews. Revisiting our example from before, let's sophisticated search previews. Revisiting our example from before, let's
look at an excerpt: look at an excerpt:
=== "Now" === "Now"
@@ -418,15 +411,15 @@ If we inspect the search index again, we can see how the situation improved:
On to the next step in the process: __tokenization__. On to the next step in the process: __tokenization__.
[23]: #search-index [search index]: #search-index
[24]: search-better-faster-smaller/search-preview-now.png [search preview now]: search-better-faster-smaller/search-preview-now.png
[25]: search-better-faster-smaller/search-preview-before.png [search preview before]: search-better-faster-smaller/search-preview-before.png
### Tokenizer lookahead ### Tokenizer lookahead
The [default tokenizer][12] of [lunr][6] uses a regular expression to split a The [default tokenizer] of [lunr] uses a regular expression to split a given
given string by matching each character against the [separator][11] as defined string by matching each character against the [separator] as defined in
in `mkdocs.yml`. This doesn't allow for more complex separators based on `mkdocs.yml`. This doesn't allow for more complex separators based on
lookahead or multiple characters. lookahead or multiple characters.
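To make the limitation concrete, here is a minimal sketch contrasting the two approaches. `tokenizePerChar` imitates the idea behind lunr's default tokenizer, testing each character against the separator on its own, while `tokenizeRegex` hands the whole string to `String.prototype.split`. Both function names are hypothetical, and neither is lunr's actual code.

```ts
// Test each character against the separator in isolation, in the spirit
// of lunr's default tokenizer. Expects a RegExp without the "g" flag.
function tokenizePerChar(str: string, separator: RegExp): string[] {
  const tokens: string[] = []
  let start = 0
  for (let end = 0; end <= str.length; end++) {
    // Split at every separator character and at the end of the string
    if (end === str.length || separator.test(str.charAt(end))) {
      if (end > start)
        tokens.push(str.slice(start, end))
      start = end + 1
    }
  }
  return tokens
}

// Match the separator against the whole string, so it may span several
// characters or use lookahead; empty tokens are discarded
function tokenizeRegex(str: string, separator: RegExp): string[] {
  return str.split(separator).filter(token => token.length > 0)
}

const caseChange = /(?=[A-Z][a-z])/
console.log(tokenizePerChar("searchHighlight", caseChange)) // ["searchHighlight"]
console.log(tokenizeRegex("searchHighlight", caseChange))   // ["search", "Highlight"]
```

A single character can never satisfy `(?=[A-Z][a-z])`, so the per-character variant cannot split on case changes; for the same reason, a `+` quantifier in its separator has no effect.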
Fortunately, __our new search implementation provides an advanced tokenizer__ Fortunately, __our new search implementation provides an advanced tokenizer__
@@ -443,13 +436,14 @@ characters at which the string should be split, the following three sections
explain the remainder of the regular expression.[^4] explain the remainder of the regular expression.[^4]
[^4]: [^4]:
As a fun fact: the [separator default value][26] of the search plugin being As a fun fact: the [separator default value] of the search plugin being
`[\s\-]+` always has been kind of irritating, as it suggests that multiple `[\s\-]+` always has been kind of irritating, as it suggests that multiple
characters can be considered being a separator. However, the `+` is characters can be considered being a separator. However, the `+` is
completely irrelevant, as regular expression groups involving multiple completely irrelevant, as regular expression groups involving multiple
characters were never supported by [lunr's default tokenizer][12]. characters were never supported by
[lunr's default tokenizer][default tokenizer].
[26]: https://www.mkdocs.org/user-guide/configuration/#separator [separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator
#### Case changes #### Case changes
@@ -470,18 +464,18 @@ character followed by a lowercase character), and has the following behavior:
- `camelCase` :octicons-arrow-right-24: `camel`, `Case` - `camelCase` :octicons-arrow-right-24: `camel`, `Case`
- `UPPERCASE` :octicons-arrow-right-24: `UPPERCASE` - `UPPERCASE` :octicons-arrow-right-24: `UPPERCASE`
Searching for [:octicons-search-24: searchHighlight][27] now brings up the Searching for [:octicons-search-24: searchHighlight][q=searchHighlight]
section discussing the `search.highlight` feature flag, which also demonstrates now brings up the section discussing the `search.highlight` feature flag, which
that this even works for search queries now![^5] also demonstrates that this now even works properly for search queries.[^5]
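The exact case-change expression is not visible in this hunk, so the following is only an illustration. Assuming it is a zero-width lookahead such as `(?=[A-Z][a-z])`, combined here with a `[\s\-]+` alternative chosen for the example, a whole-string split reproduces the behavior listed above:

```ts
// Whitespace and hyphens, plus an assumed zero-width case-change lookahead
const separator = /[\s\-]+|(?=[A-Z][a-z])/

// Split the whole string and drop empty tokens
function tokenize(str: string): string[] {
  return str.split(separator).filter(token => token.length > 0)
}

console.log(tokenize("camelCase"))   // ["camel", "Case"]
console.log(tokenize("UPPERCASE"))   // ["UPPERCASE"]
console.log(tokenize("HTMLElement")) // ["HTML", "Element"]
```

Because the lookahead consumes no characters, nothing is lost from either token, and a run of capitals like `HTML` is only split right before a capital that starts a lowercase run.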
[^5]: [^5]:
Previously, the search query was not correctly tokenized due to the way Previously, the search query was not correctly tokenized due to the way
[lunr][6] treats wildcards, as it disables the pipeline for search terms [lunr] treats wildcards, as it disables the pipeline for search terms that
that contain wildcards. In order to provide a good typeahead experience, contain wildcards. In order to provide a good typeahead experience,
Material for MkDocs adds wildcards to the end of each search term not Material for MkDocs adds wildcards to the end of each search term not
explicitly preceded with `+` or `-`, effectively disabling tokenization. explicitly preceded with `+` or `-`, effectively disabling tokenization.
[27]: ?q=searchHighlight [q=searchHighlight]: ?q=searchHighlight
#### Version numbers #### Version numbers
@@ -496,10 +490,10 @@ undiscoverable. Thus, the following expression:
This regular expression matches a `.` only if not immediately followed by a This regular expression matches a `.` only if not immediately followed by a
digit `\d`, which leaves version numbers discoverable. Searching for digit `\d`, which leaves version numbers discoverable. Searching for
[:octicons-search-24: 7.2.6][28] brings up the [7.2.6][29] release notes. [:octicons-search-24: 7.2.6][q=7.2.6] brings up the [7.2.6] release notes.
[28]: ?q=7.2.6 [q=7.2.6]: ?q=7.2.6
[29]: ../../changelog/index.md#726-_-september-1-2021 [7.2.6]: ../../changelog/index.md#726-_-september-1-2021
#### HTML/XML tags #### HTML/XML tags
@ -512,9 +506,9 @@ following expression to the separator allows for just that:
&[lg]t; &[lg]t;
``` ```
Searching for [:octicons-search-24: custom search worker script][30] brings up Searching for [:octicons-search-24: custom search worker script][q=script]
the section on [custom search][31] and matches the `script` tag among the other brings up the section on [custom search] and matches the `script` tag among the
search terms discovered. other search terms discovered.
--- ---
@ -522,19 +516,19 @@ _We've only just begun to scratch the surface of the new possibilities
tokenizer lookahead brings. If you found other useful expressions, you're tokenizer lookahead brings. If you found other useful expressions, you're
invited to share them in the comment section._ invited to share them in the comment section._
[30]: ?q=custom+search+worker+script [q=script]: ?q=custom+search+worker+script
[31]: ../../setup/setting-up-site-search.md#custom-search [custom search]: ../../setup/setting-up-site-search.md#custom-search
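You can try the three expressions in isolation. The following TypeScript sketch combines them into one separator. It assumes a whitespace and punctuation class, which is an illustrative choice, so check it against your own `separator` setting. It also assumes a case-change lookahead `(?!\b)(?=[A-Z][a-z])`, the `\.(?!\d)` expression, and `&[lg]t;`. The sketch relies on the fact that `String.prototype.split` also splits at zero-width lookahead matches.

```typescript
// Assumed separator: whitespace/punctuation, case changes, dots not followed
// by a digit, and HTML-escaped angle brackets. Adjust to match your setup.
const separator = /[\s\-,:!=\[\]()"\/]+|(?!\b)(?=[A-Z][a-z])|\.(?!\d)|&[lg]t;/

// Split text into tokens, dropping the empty strings that `split` produces
// when a separator sits at the very start or end of the input.
function tokenize(text: string): string[] {
  return text.split(separator).filter(token => token.length > 0)
}

console.log(tokenize("camelCase"))             // [ 'camel', 'Case' ]
console.log(tokenize("UPPERCASE"))             // [ 'UPPERCASE' ]
console.log(tokenize("Release 7.2.6 is out.")) // [ 'Release', '7.2.6', 'is', 'out' ]
console.log(tokenize("&lt;script&gt;"))        // [ 'script' ]
```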
### Accurate highlighting ### Accurate highlighting
Highlighting is the last step in the process of search and involves the Highlighting is the last step in the process of search and involves the
highlighting of all search term occurrences in a given search result. For a highlighting of all search term occurrences in a given search result. For a
long time, highlighting was implemented through dynamically generated long time, highlighting was implemented through dynamically generated
[regular expressions][32].[^6] [regular expressions].[^6]
This approach has some problems with non-whitespace languages like Japanese or This approach has some problems with non-whitespace languages like Japanese or
Chinese[^3] since it only works if the highlighted term is at a word boundary. Chinese[^3] since it only works if the highlighted term is at a word boundary.
However, Asian languages are tokenized using a [dedicated segmenter][33], which However, Asian languages are tokenized using a [dedicated segmenter], which
cannot be modeled with regular expressions. cannot be modeled with regular expressions.
[^6]: [^6]:
@ -544,32 +538,33 @@ cannot be modeled with regular expressions.
regular expression `(^|<separator>)(search|highlight)`, which only matches regular expression `(^|<separator>)(search|highlight)`, which only matches
at word boundaries. at word boundaries.
Now, as a direct result of the [new tokenization approach][34], __our new Now, as a direct result of the [new tokenization approach], __our new search
search implementation uses token positions for highlighting__, making it implementation uses token positions for highlighting__, making it exactly as
exactly as powerful as tokenization: powerful as tokenization:
1. __Word boundaries__: as the new highlighter uses token positions, word 1. __Word boundaries__: as the new highlighter uses token positions, word
boundaries are equal to token boundaries. This means that more complex cases boundaries are equal to token boundaries. This means that more complex cases
of tokenization (e.g., [case changes][35], [version numbers][36], [HTML/XML of tokenization (e.g., [case changes], [version numbers], [HTML/XML tags])
tags][37]) are now all highlighted accurately. are now all highlighted accurately.
2. __Context-awareness__: as the new search index preserves some of the 2. __Context-awareness__: as the new search index preserves some of the
structural information of the original document, the content of a section is structural information of the original document, the content of a section
now divided into separate content blocks: paragraphs, code blocks, and is now divided into separate content blocks: paragraphs, code blocks, and
lists. lists.
Now, only the content blocks that actually contain occurrences of one of Now, only the content blocks that actually contain occurrences of one of
the search terms are considered for inclusion into the search preview. If a the search terms are considered for inclusion into the search preview. If a
term only occurs in a code block, it's the code block that gets rendered; term only occurs in a code block, it's the code block that gets rendered;
see, for example, the results of [:octicons-search-24: twitter][38]. see, for example, the results of
[:octicons-search-24: twitter][q=twitter].
[32]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-L91 [regular expressions]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-L91
[33]: http://chasen.org/~taku/software/TinySegmenter/ [dedicated segmenter]: http://chasen.org/~taku/software/TinySegmenter/
[34]: #tokenizer-lookahead [new tokenization approach]: #tokenizer-lookahead
[35]: #case-changes [case changes]: #case-changes
[36]: #version-numbers [version numbers]: #version-numbers
[37]: #htmlxml-tags [HTML/XML tags]: #htmlxml-tags
[38]: ?q=twitter [q=twitter]: ?q=twitter
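A small TypeScript sketch shows the difference between word boundaries and token boundaries. It assumes an illustrative separator made of whitespace and punctuation, a case-change lookahead, `\.(?!\d)`, and `&[lg]t;`. It records the `[start, end)` range of every token with `String.prototype.matchAll`, then wraps each token that starts with a search term in `<mark>`. `tokenRanges` and `highlight` are hypothetical names. Material's actual highlighter operates on its search index and differs in detail.

```typescript
// [start, end) character range of a single token within a text.
type TokenRange = [number, number]

// Assumed separator (illustrative): whitespace/punctuation, case changes,
// dots not followed by a digit, and HTML-escaped angle brackets.
const separator = /[\s\-,:!=\[\]()"\/]+|(?!\b)(?=[A-Z][a-z])|\.(?!\d)|&[lg]t;/g

// Compute the range of every token, i.e. the text between separator matches.
function tokenRanges(text: string): TokenRange[] {
  const ranges: TokenRange[] = []
  let start = 0
  for (const match of text.matchAll(separator)) {
    const index = match.index ?? 0
    if (index > start) ranges.push([start, index])
    start = index + match[0].length
  }
  if (start < text.length) ranges.push([start, text.length])
  return ranges
}

// Wrap every token that starts with one of the search terms in <mark> tags,
// so highlighting follows token boundaries instead of word boundaries.
function highlight(text: string, terms: string[]): string {
  const needles = terms.map(term => term.toLowerCase())
  let result = ""
  let cursor = 0
  for (const [start, end] of tokenRanges(text)) {
    const token = text.slice(start, end).toLowerCase()
    if (!needles.some(needle => token.startsWith(needle))) continue
    result += `${text.slice(cursor, start)}<mark>${text.slice(start, end)}</mark>`
    cursor = end
  }
  return result + text.slice(cursor)
}

console.log(highlight("Enable searchHighlight here", ["highlight"]))
// Enable search<mark>Highlight</mark> here
```

For comparison, a word-boundary expression like `/(^|\s)(highlight)/i` finds nothing in the same string, because `Highlight` doesn't start after whitespace.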
### Benchmarks ### Benchmarks
@ -601,7 +596,7 @@ reach:
Smallest value of ten distinct runs. Smallest value of ten distinct runs.
[^8]: [^8]:
We agnostically use [KJV Markdown][39] as a tool for testing to learn how We agnostically use [KJV Markdown] as a tool for testing to learn how
Material for MkDocs behaves on large corpora, as it's a very large set of Material for MkDocs behaves on large corpora, as it's a very large set of
Markdown files with over 800k words. Markdown files with over 800k words.
@ -611,12 +606,12 @@ new search is up to 95% faster__. This is a significant improvement,
particularly relevant for large documentation projects. particularly relevant for large documentation projects.
While 1,3s still may sound like a long time, using the new client-side search While 1,3s still may sound like a long time, using the new client-side search
together with [instant loading][40] only creates the search index on the initial together with [instant loading] only creates the search index on the initial
page load. When navigating, the search index is preserved across pages, so the page load. When navigating, the search index is preserved across pages, so the
cost only has to be paid once. cost only has to be paid once.
[39]: https://github.com/arleym/kjv-markdown [KJV Markdown]: https://github.com/arleym/kjv-markdown
[40]: ../../setup/setting-up-navigation.md#instant-loading [instant loading]: ../../setup/setting-up-navigation.md#instant-loading
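Paying the indexing cost only once comes down to caching the index for as long as the page lives, and instant loading extends that lifetime across navigations. Here's a minimal TypeScript sketch of the idea. `buildIndex` is a hypothetical stand-in for the expensive indexing step: it builds a toy inverted index, not Material's actual search index. The cache itself is a plain module-level promise.

```typescript
// Toy inverted index: lowercase word -> ids of the documents containing it.
type SearchIndex = Map<string, string[]>

let builds = 0
let cached: Promise<SearchIndex> | undefined

// Hypothetical stand-in for the expensive step that runs on the initial page load.
async function buildIndex(docs: Record<string, string>): Promise<SearchIndex> {
  builds++
  const index: SearchIndex = new Map()
  for (const [id, text] of Object.entries(docs)) {
    for (const word of new Set(text.toLowerCase().split(/\W+/).filter(Boolean))) {
      index.set(word, [...(index.get(word) ?? []), id])
    }
  }
  return index
}

// Assuming instant loading keeps this module alive across navigations, the
// cached promise is reused and the index is only built on the first call.
function getIndex(docs: Record<string, string>): Promise<SearchIndex> {
  if (!cached) cached = buildIndex(docs)
  return cached
}

const docs = {
  "index.md": "Search: better, faster, smaller",
  "blog.md": "Excluding content from search",
}
const first = getIndex(docs)
const second = getIndex(docs)
console.log(first === second, builds) // true 1
```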
### User interface ### User interface
@ -643,8 +638,12 @@ better. Next up:
If you've made it this far, thank you for your time and interest in Material If you've made it this far, thank you for your time and interest in Material
for MkDocs! This is the first blog article that I decided to write after a for MkDocs! This is the first blog article that I decided to write after a
short [Twitter survey][41] prompted me to. You're invited to [leave a comment][42] short [Twitter survey] prompted me to. ~~You're invited to leave a comment
to share your experiences with the new search implementation. to share your experiences with the new search implementation.~~[^9]
[41]: https://twitter.com/squidfunk/status/1434477478823743488 [^9]:
[42]: #__comments We've disabled comments due to Disqus' ads being so incredibly horrible
and invasive. If you know a better alternative, please ping me at
martin.donath@squidfunk.com.
[Twitter survey]: https://twitter.com/squidfunk/status/1434477478823743488


@ -4,14 +4,35 @@ search:
exclude: true exclude: true
--- ---
<style>
.md-sidebar--secondary:not([hidden]) {
visibility: hidden;
}
</style>
# Blog # Blog
<h2>Excluding content from search</h2> ## [Excluding content from search]
__The latest Insiders release brings three new simple ways to exclude dedicated __The latest Insiders release brings three new simple ways to exclude dedicated
parts of a document from the search index, allowing for more fine-grained parts of a document from the search index, allowing for more fine-grained
control.__ control.__
<aside class="mdx-author" markdown>
![@squidfunk][@squidfunk avatar]
<span>__Martin Donath__ · @squidfunk</span>
<span>
:octicons-calendar-24: September 26, 2021 ·
:octicons-clock-24: 5 min read ·
[:octicons-tag-24: 7.3.0+insiders-3.1.1](../../insiders/changelog.md#311-_-september-26-2021)
</span>
</aside>
[@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156
---
Two weeks ago, Material for MkDocs Insiders shipped a brand new search plugin, Two weeks ago, Material for MkDocs Insiders shipped a brand new search plugin,
yielding massive improvements in usability, but also in speed and size of the yielding massive improvements in usability, but also in speed and size of the
search index. Interestingly, as discussed in the previous blog article, we only search index. Interestingly, as discussed in the previous blog article, we only
@ -20,17 +41,29 @@ features that enhance the writing experience, allowing for more fine-grained
control of what pages, sections and blocks of a Markdown file should be indexed control of what pages, sections and blocks of a Markdown file should be indexed
by the built-in search functionality. by the built-in search functionality.
[Continue reading :octicons-arrow-right-24:][2]{ .md-button } [:octicons-arrow-right-24: Continue reading][Excluding content from search]
[2]: 2021/excluding-content-from-search.md [Excluding content from search]: 2021/excluding-content-from-search.md
## [Search: better, faster, smaller]
<h2>Search: better, faster, smaller</h2>
__This is the story of how we managed to completely rebuild client-side search, __This is the story of how we managed to completely rebuild client-side search,
delivering a significantly better user experience while making it faster and delivering a significantly better user experience while making it faster and
smaller at the same time.__ smaller at the same time.__
<aside class="mdx-author" markdown>
![@squidfunk][@squidfunk avatar]
<span>__Martin Donath__ · @squidfunk</span>
<span>
:octicons-calendar-24: September 13, 2021 ·
:octicons-clock-24: 15 min read ·
[:octicons-tag-24: 7.2.6+insiders-3.0.0](../../insiders/changelog.md#300-_-september-13-2021)
</span>
</aside>
---
The search of Material for MkDocs is by far one of its best and most-loved The search of Material for MkDocs is by far one of its best and most-loved
assets: multilingual, offline-capable, and most importantly: _all client-side_. assets: multilingual, offline-capable, and most importantly: _all client-side_.
It provides a solution to empower the users of your documentation to find what It provides a solution to empower the users of your documentation to find what
@ -41,6 +74,6 @@ integration from the ground up. This article shines some light on the internals
of the new search, why it's much more powerful than the previous version, and of the new search, why it's much more powerful than the previous version, and
what's about to come. what's about to come.
[Continue reading :octicons-arrow-right-24:][1]{ .md-button } [:octicons-arrow-right-24: Continue reading][Search: better, faster, smaller]
[1]: 2021/search-better-faster-smaller.md [Search: better, faster, smaller]: 2021/search-better-faster-smaller.md