Mirror of https://github.com/squidfunk/mkdocs-material.git (synced 2024-06-14 11:52:32 +03:00)
Updated blog articles to use named links
Commit d13ed6c3f0 (parent be9d7b5b86)
@@ -3,7 +3,6 @@ template: overrides/main.html
description: >
Three new simple ways to exclude dedicated parts of a document from the search
index, allowing for more fine-grained control
disqus: mkdocs-material
search:
exclude: true
---
@@ -15,7 +14,7 @@ dedicated parts of a document from the search index, allowing for more
fine-grained control.__

<aside class="mdx-author" markdown>
![@squidfunk][1]
![@squidfunk][@squidfunk avatar]

<span>__Martin Donath__ · @squidfunk</span>
<span>
@@ -25,13 +24,13 @@ fine-grained control.__
</span>
</aside>

[1]: https://avatars.githubusercontent.com/u/932156
[@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156

---

Two weeks ago, Material for MkDocs Insiders shipped a [brand new search
plugin][2], yielding [massive improvements in usability][3], but also in [speed
and size][4] of the search index. Interestingly, as discussed in the previous
plugin], yielding [massive improvements in usability], but also in [speed
and size] of the search index. Interestingly, as discussed in the previous
blog article, we only scratched the surface of what's now possible. This
release brings some useful features that enhance the writing experience,
allowing for more fine-grained control of what pages, sections and blocks of a
@@ -39,18 +38,18 @@ Markdown file should be indexed by the built-in search functionality.

_The following section discusses existing solutions for excluding pages and
sections from the search index. If you immediately want to learn what's new,
skip to the [section just after that][5]._
skip to the [section just after that][what's new]._

[2]: search-better-faster-smaller.md
[3]: search-better-faster-smaller.md#whats-new
[4]: search-better-faster-smaller.md#benchmarks
[5]: #whats-new
[brand new search plugin]: search-better-faster-smaller.md
[massive improvements in usability]: search-better-faster-smaller.md#whats-new
[speed and size]: search-better-faster-smaller.md#benchmarks
[what's new]: #whats-new

## Prior art

MkDocs has a rich and thriving ecosystem of [plugins][6], and it comes as no
MkDocs has a rich and thriving ecosystem of [plugins], and it comes as no
surprise that there's already a fantastic plugin by @chrieke to exclude specific
sections of a Markdown file – the [mkdocs-exclude-search][7] plugin. It can be
sections of a Markdown file – the [mkdocs-exclude-search] plugin. It can be
installed with:

```
@@ -78,10 +77,10 @@ adds support for advanced filtering techniques like infix- and suffix-filtering
using wildcards. While this is a very powerful idea, it comes with some
downsides:

1. __Exclusion patterns and content are not co-located__: exclusion patterns
need to be defined in `mkdocs.yml`, and not as part of the respective
document or section to be excluded. This might result in stale exclusion
patterns, leading to unintended behavior:
1. __Exclusion patterns and content are not co-located__: exclusion patterns
need to be defined in `mkdocs.yml`, and not as part of the respective
document or section to be excluded. This might result in stale exclusion
patterns, leading to unintended behavior:

- When a headline is changed, its slug (permalink) also changes, which might
suddenly match (or unmatch) a pattern, e.g., when an author fixes a typo
@@ -97,23 +96,23 @@ downsides:
pages and sections have been excluded from the search index, but MkDocs will
now flood the terminal with debug output from its core and other plugins.

2. __Exclusion control might be too coarse__: The [mkdocs-exclude-search][7]
plugin only allows for the exclusion of pages and sections. It's not possible
to exclude parts of a section, e.g., content that is irrelevant to search but
must be included as part of the documentation.
2. __Exclusion control might be too coarse__: The [mkdocs-exclude-search]
plugin only allows for the exclusion of pages and sections. It's not
possible to exclude parts of a section, e.g., content that is irrelevant
to search but must be included as part of the documentation.

[6]: https://github.com/mkdocs/mkdocs/wiki/MkDocs-Plugins
[7]: https://github.com/chrieke/mkdocs-exclude-search
[plugins]: https://github.com/mkdocs/mkdocs/wiki/MkDocs-Plugins
[mkdocs-exclude-search]: https://github.com/chrieke/mkdocs-exclude-search

## What's new?

The latest Insiders release brings fine-grained control for [__excluding pages,
sections, and blocks__][8] from the search index, implemented through front
matter, as well as the [Attribute List][9] extension. Note that it doesn't
replace the [mkdocs-exclude-search][7] plugin but _complements_ it.
sections, and blocks__][search exclusion] from the search index, implemented
through front matter, as well as the [Attribute Lists]. Note that it doesn't
replace the [mkdocs-exclude-search] plugin but __complements__ it.

[8]: ../../setup/setting-up-site-search.md#search-exclusion
[9]: https://python-markdown.github.io/extensions/attr_list/
[search exclusion]: ../../setup/setting-up-site-search.md#search-exclusion
[Attribute Lists]: ../../setup/extensions/python-markdown.md#attribute-lists

### Excluding pages

@@ -134,12 +133,12 @@ search:

### Excluding sections

If a section should be excluded, the author can use the [Attribute List][9]
If a section should be excluded, the author can use the [Attribute Lists]
extension to add a __pragma__ called `{ data-search-exclude }` at the end of a
heading. The pragma is not included in the final HTML, as search pragmas are
filtered by the search plugin before the page is rendered:

=== "`docs/page.md`"
=== ":octicons-file-code-16: docs/page.md"

``` markdown
# Document title
@@ -153,7 +152,7 @@ filtered by the search plugin before the page is rendered:
The content of this section is excluded
```

=== "`search_index.json`"
=== ":octicons-codescan-16: search_index.json"

``` json
{
@@ -176,10 +175,10 @@ filtered by the search plugin before the page is rendered:
### Excluding blocks

If even more fine-grained control is desired, the __pragma__ can be added to
any [block-level element][10] or [inline-level element][11] that is officially
supported by the [Attribute List][9] extension:
any [block-level element] or [inline-level element] that is officially
supported by the [Attribute Lists] extension:

=== "`docs/page.md`"
=== ":octicons-file-code-16: docs/page.md"

``` markdown
# Document title
@@ -190,7 +189,7 @@ supported by the [Attribute List][9] extension:
{ data-search-exclude }
```

=== "`search_index.json`"
=== ":octicons-codescan-16: search_index.json"

``` json
{
@@ -205,12 +204,12 @@ supported by the [Attribute List][9] extension:
}
```

[10]: https://python-markdown.github.io/extensions/attr_list/#block-level
[11]: https://python-markdown.github.io/extensions/attr_list/#inline-level
[block-level element]: https://python-markdown.github.io/extensions/attr_list/#block-level
[inline-level element]: https://python-markdown.github.io/extensions/attr_list/#inline

## Conclusion

The latest release brings three simple ways to control more precisely what goes
into the search index and what doesn't. It complements the already very powerful
[mkdocs-exclude-search][7] plugin, allowing for new methods of shaping the
[mkdocs-exclude-search] plugin, allowing for new methods of shaping the
structure, size and content of the search index.
@@ -3,7 +3,6 @@ template: overrides/main.html
description: >
How we rebuilt client-side search, delivering a better user experience while
making it faster and smaller at the same time
disqus: mkdocs-material
search:
exclude: true
---
@@ -15,7 +14,7 @@ delivering a significantly better user experience while making it faster and
smaller at the same time.__

<aside class="mdx-author" markdown>
![@squidfunk][1]
![@squidfunk][@squidfunk avatar]

<span>__Martin Donath__ · @squidfunk</span>
<span>
@@ -25,12 +24,12 @@ smaller at the same time.__
</span>
</aside>

[1]: https://avatars.githubusercontent.com/u/932156
[@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156

---

The [search][2] of Material for MkDocs is by far one of its best and most-loved
assets: [multilingual][3], [offline-capable][4], and most importantly: _all
The [search] of Material for MkDocs is by far one of its best and most-loved
assets: [multilingual], [offline-capable], and most importantly: _all
client-side_. It provides a solution to empower the users of your documentation
to find what they're searching for instantly without the headache of managing
additional servers. However, even though several iterations have been made,
@@ -41,19 +40,19 @@ version, and what's about to come.

_The next section discusses the architecture and issues of the current search
implementation. If you immediately want to learn what's new, skip to the
[section just after that][5]._
[section just after that][what's new]._

[2]: ../../setup/setting-up-site-search.md
[3]: ../../setup/setting-up-site-search.md#lang
[4]: ../../setup/setting-up-site-search.md#offline-search
[5]: #whats-new
[search]: ../../setup/setting-up-site-search.md
[multilingual]: ../../setup/setting-up-site-search.md#lang
[offline-capable]: ../../setup/setting-up-site-search.md#offline-search
[what's new]: #whats-new

## Architecture

Material for MkDocs uses [lunr][6] together with [lunr-languages][7] to
implement its client-side search capabilities. When a documentation page is
loaded and JavaScript is available, the search index as generated by the
[built-in search plugin][8] during the build process is requested from the
Material for MkDocs uses [lunr] together with [lunr-languages] to implement
its client-side search capabilities. When a documentation page is loaded and
JavaScript is available, the search index as generated by the
[built-in search plugin] during the build process is requested from the
server:

``` ts
@@ -64,9 +63,9 @@ const index$ = document.forms.namedItem("search")
: NEVER
```

[6]: https://lunrjs.com
[7]: https://github.com/MihaiValentin/lunr-languages
[8]: ../../setup/setting-up-site-search.md#built-in-search
[lunr]: https://lunrjs.com
[lunr-languages]: https://github.com/MihaiValentin/lunr-languages
[built-in search plugin]: ../../setup/setting-up-site-search.md#built-in-search

### Search index

@@ -76,7 +75,7 @@ the original Markdown file:

??? example "Expand to inspect example"

=== "`docs/page.md`"
=== ":octicons-file-code-16: docs/page.md"

```` markdown
# Example
@@ -106,7 +105,7 @@ the original Markdown file:
* Profit!
````

=== "`search_index.json`"
=== ":octicons-codescan-16: search_index.json"

``` json
{
@@ -146,15 +145,15 @@ the original Markdown file:

If we inspect the search index, we immediately see several problems:

1. __All content is included twice__: the search index contains one entry
with the entire contents of the page, and one entry for each section of
the page, i.e., each block preceded by a headline or subheadline. This
significantly contributes to the size of the search index.
1. __All content is included twice__: the search index contains one entry
with the entire contents of the page, and one entry for each section of
the page, i.e., each block preceded by a headline or subheadline. This
significantly contributes to the size of the search index.

2. __All structure is lost__: when the search index is built, all structural
information like HTML tags and attributes are stripped from the content.
While this approach works well for paragraphs and inline formatting, it
might be problematic for lists and code blocks. An excerpt:
2. __All structure is lost__: when the search index is built, all structural
information like HTML tags and attributes are stripped from the content.
While this approach works well for paragraphs and inline formatting, it
might be problematic for lists and code blocks. An excerpt:

```
… links , or even code : if (isAwesome) { … } Lists Sometimes you want …
@@ -172,51 +171,52 @@ If we inspect the search index, we immediately see several problems:

It's not difficult to see that it can be quite challenging to implement a good
search experience for theme authors, which is why Material for MkDocs (up to
now) did some [monkey patching][9] to be able to render slightly more
now) did some [monkey patching] to be able to render slightly more
meaningful search previews.

[monkey patching]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71

### Search worker

The actual search functionality is implemented as part of a web worker[^1],
which creates and manages the [lunr][6] search index. When search is
initialized, the following steps are taken:
which creates and manages the [lunr] search index. When search is initialized,
the following steps are taken:

[^1]:
Prior to [version 5.0][10], search was carried out in the main thread which
locked up the browser, rendering it unusable. This problem was first
Prior to :octicons-tag-24: 5.0.0, search was carried out in the main thread
which locked up the browser, rendering it unusable. This problem was first
reported in #904 and, after some back and forth, fixed and released in
version 5.0.
:octicons-tag-24: 5.0.0.

1. __Linking sections with pages__: The search index is parsed, and each section
is linked to its parent page. The parent page itself is _not indexed_, as it
would lead to duplicate results, so only the sections remain. Linking is
necessary, as search results are grouped by page.
1. __Linking sections with pages__: The search index is parsed, and each
section is linked to its parent page. The parent page itself is _not
indexed_, as it would lead to duplicate results, so only the sections
remain. Linking is necessary, as search results are grouped by page.

2. __Tokenization__: The `title` and `text` values of each section are split
into tokens by using the [separator][11] as configured in `mkdocs.yml`.
Tokenization itself is carried out by [lunr's default tokenizer][12], which
doesn't allow for lookahead or separators spanning multiple characters.
2. __Tokenization__: The `title` and `text` values of each section are split
into tokens by using the [separator] as configured in `mkdocs.yml`.
Tokenization itself is carried out by
[lunr's default tokenizer][default tokenizer], which doesn't allow for
lookahead or separators spanning multiple characters.

> Why is this important and a big deal? We will see later how much more we
> can achieve with a tokenizer that is capable of separating strings with
> lookahead.

3. __Indexing__: As a final step, each section is indexed. When querying the
index, if a search query includes one of the tokens as returned by step 2.,
the section is considered to be part of the search result and passed to the
main thread.
1. __Indexing__: As a final step, each section is indexed. When querying the
index, if a search query includes one of the tokens as returned by step 2.,
the section is considered to be part of the search result and passed to the
main thread.
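
The three steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the actual worker code: the `IndexEntry` shape, all function names, and the plain inverted index are assumptions made for the example, and the tokenizer is simplified to a `split` on a single-character separator class rather than lunr's character-by-character scan.

```ts
// Hypothetical entry shape; the real search index carries more fields.
interface IndexEntry {
  location: string // "setup/" for a page, "setup/#install" for a section
  title: string
  text: string
}

interface LinkedSection extends IndexEntry {
  parent: IndexEntry
}

// Step 1: link each section to its parent page; pages themselves are dropped.
function linkSections(entries: IndexEntry[]): LinkedSection[] {
  const pages: Record<string, IndexEntry> = Object.create(null)
  for (const entry of entries)
    if (entry.location.indexOf("#") === -1)
      pages[entry.location] = entry

  const sections: LinkedSection[] = []
  for (const entry of entries) {
    const hash = entry.location.indexOf("#")
    if (hash === -1) continue
    const parent = pages[entry.location.slice(0, hash)]
    if (parent) sections.push({ ...entry, parent })
  }
  return sections
}

// Step 2: split a value into lowercase tokens at every separator character.
function tokenizeText(value: string, separator: RegExp): string[] {
  return value.toLowerCase().split(separator).filter(token => token.length > 0)
}

// Step 3: map each token to the locations of the sections containing it.
function buildIndex(
  sections: LinkedSection[], separator: RegExp
): Record<string, string[]> {
  const index: Record<string, string[]> = Object.create(null)
  for (const section of sections)
    for (const token of tokenizeText(`${section.title} ${section.text}`, separator)) {
      const locations = index[token] || (index[token] = [])
      if (locations.indexOf(section.location) === -1)
        locations.push(section.location)
    }
  return index
}

// Querying: a section matches if it contains at least one query token.
function searchIndex(
  index: Record<string, string[]>, query: string, separator: RegExp
): string[] {
  const hits: string[] = []
  for (const token of tokenizeText(query, separator))
    for (const location of index[token] || [])
      if (hits.indexOf(location) === -1) hits.push(location)
  return hits
}

const demoEntries: IndexEntry[] = [
  { location: "setup/", title: "Setup", text: "Intro Install" },
  { location: "setup/#install", title: "Install", text: "Run pip-install now" }
]
const demoIndex = buildIndex(linkSections(demoEntries), /[\s\-]/)
console.log(searchIndex(demoIndex, "install", /[\s\-]/)) // [ 'setup/#install' ]
```

Because the page entry is only used for linking and never indexed, a query for `setup` returns nothing here, which mirrors the "only the sections remain" behavior described in step 1.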

Now, that's basically how the search worker operates. Sure, there's a little
more magic involved, e.g., search results are [post-processed][13] and
[rescored][14] to account for some shortcomings of [lunr][6], but in general,
this is how data gets into and out of the index.
more magic involved, e.g., search results are [post-processed] and [rescored] to
account for some shortcomings of [lunr], but in general, this is how data gets
into and out of the index.

[9]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71
[10]: https://squidfunk.github.io/mkdocs-material/upgrading/#upgrading-from-4x-to-5x
[11]: ../../setup/setting-up-site-search.md#separator
[12]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[13]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
[14]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275
[separator]: ../../setup/setting-up-site-search.md#separator
[default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[post-processed]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L249-L272
[rescored]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/_/index.ts#L274-L275

### Search previews

@@ -228,11 +228,11 @@ experience.
This is where the current search preview generation falls short, as some of the
search previews appear not to include any occurrence of any of the search
terms. This was due to the fact that search previews were [truncated after a
maximum of 320 characters][15], as can be seen here:
maximum of 320 characters][truncated], as can be seen here:

<figure markdown>

![Search previews][16]
![search preview]

<figcaption markdown>

@@ -261,28 +261,28 @@ carefully considered:
split strings into tokens.

[^2]:
At the time of writing, [Just the Docs][17] and [Docusaurus][18] use this
method for generating search previews. Note that the latter also integrates
with Algolia, which is a fully managed server-based solution.
At the time of writing, [Just the Docs] and [Docusaurus] use this method
for generating search previews. Note that the latter also integrates with
Algolia, which is a fully managed server-based solution.

[^3]:
China and Japan are both within the top 5 countries of origin of users of
Material for MkDocs.

[15]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/templates/search/index.tsx#L90
[16]: search-better-faster-smaller/search-preview.png
[17]: https://pmarsceill.github.io/just-the-docs/
[18]: https://github.com/lelouch77/docusaurus-lunr-search
[truncated]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/templates/search/index.tsx#L90
[search preview]: search-better-faster-smaller/search-preview.png
[Just the Docs]: https://pmarsceill.github.io/just-the-docs/
[Docusaurus]: https://github.com/lelouch77/docusaurus-lunr-search

2. __Context-awareness__: Although whitespace doesn't work for all languages,
one could argue that it could be a good enough solution. Unfortunately, this
is not necessarily true for code blocks, as the removal of whitespace might
change meaning in some languages.
1. __Context-awareness__: Although whitespace doesn't work for all languages,
one could argue that it could be a good enough solution. Unfortunately, this
is not necessarily true for code blocks, as the removal of whitespace might
change meaning in some languages.

3. __Structure__: Preserving structural information is not a must, but
apparently beneficial to build more meaningful search previews which allow
for a quick evaluation of relevance. If a word occurrence is part of a code
block, it should be rendered as a code block.
3. __Structure__: Preserving structural information is not a must, but
apparently beneficial to build more meaningful search previews which allow
for a quick evaluation of relevance. If a word occurrence is part of a code
block, it should be rendered as a code block.

## What's new?

@@ -291,41 +291,34 @@ into the internals of our new search implementation to see which of the
problems it already solves, a quick overview of what features and improvements
it brings:

- __Better__: support for [rich search previews][19], preserving the structural
- __Better__: support for [rich search previews], preserving the structural
information of code blocks, inline code, and lists, so they are rendered
as-is, as well as [lookahead tokenization][20],
[more accurate highlighting][21], and improved stability of typeahead. Also,
a [slightly better UX][22].
as-is, as well as [lookahead tokenization], [more accurate highlighting], and
improved stability of typeahead. Also, a [slightly better UX].
- __Faster__ and __smaller__: significant decrease in search index size of up
to 48% due to improved extraction and construction techniques, resulting in a
search experience that is up to 95% faster, which is particularly helpful for
large documentation projects.

_Note that our new search implementation is currently 'Insiders only', which
means that it is reserved for sponsors because it's those sponsors that make
features like this possible._

[:octicons-heart-fill-24:{ .mdx-heart } I want to become a sponsor](../../insiders/index.md){ .md-button .md-button--primary }

[19]: #rich-search-previews
[20]: #tokenizer-lookahead
[21]: #accurate-highlighting
[22]: #user-interface
[rich search previews]: #rich-search-previews
[lookahead tokenization]: #tokenizer-lookahead
[more accurate highlighting]: #accurate-highlighting
[slightly better UX]: #user-interface

### Rich search previews

As we rebuilt the search plugin from scratch, we reworked the construction of
the search index to preserve the structural information of code blocks, inline
code, as well as unordered and ordered lists. Using the example from the
[search index][23] section, here's how it looks:
[search index] section, here's how it looks:

=== "Now"

![Search preview now][24]
![search preview now]

=== "Before"

![Search preview before][25]
![search preview before]

Now, __code blocks are first-class citizens of search previews__, and even
inline code formatting is preserved. Let's take a look at the new structure of
@@ -390,15 +383,15 @@ the search index to understand why:

If we inspect the search index again, we can see how the situation improved:

1. __Content is included only once__: the search index does not include the
content of the page twice, as only the sections of a page are part of the
search index. This leads to a significant reduction in size, fewer bytes to
transfer, and a smaller search index.
1. __Content is included only once__: the search index does not include the
content of the page twice, as only the sections of a page are part of the
search index. This leads to a significant reduction in size, fewer bytes to
transfer, and a smaller search index.

2. __Some structure is preserved__: each section of the search index includes a
small subset of HTML to provide the necessary structure to allow for more
sophisticated search previews. Revisiting our example from before, let's
look at an excerpt:
2. __Some structure is preserved__: each section of the search index includes
a small subset of HTML to provide the necessary structure to allow for more
sophisticated search previews. Revisiting our example from before, let's
look at an excerpt:

=== "Now"

@@ -418,15 +411,15 @@ If we inspect the search index again, we can see how the situation improved:

On to the next step in the process: __tokenization__.

[23]: #search-index
[24]: search-better-faster-smaller/search-preview-now.png
[25]: search-better-faster-smaller/search-preview-before.png
[search index]: #search-index
[search preview now]: search-better-faster-smaller/search-preview-now.png
[search preview before]: search-better-faster-smaller/search-preview-before.png

### Tokenizer lookahead

The [default tokenizer][12] of [lunr][6] uses a regular expression to split a
given string by matching each character against the [separator][11] as defined
in `mkdocs.yml`. This doesn't allow for more complex separators based on
The [default tokenizer] of [lunr] uses a regular expression to split a given
string by matching each character against the [separator] as defined in
`mkdocs.yml`. This doesn't allow for more complex separators based on
lookahead or multiple characters.

Fortunately, __our new search implementation provides an advanced tokenizer__
@@ -443,13 +436,14 @@ characters at which the string should be split, the following three sections
explain the remainder of the regular expression.[^4]

[^4]:
As a fun fact: the [separator default value][26] of the search plugin being
As a fun fact: the [separator default value] of the search plugin being
`[\s\-]+` always has been kind of irritating, as it suggests that multiple
characters can be considered being a separator. However, the `+` is
completely irrelevant, as regular expression groups involving multiple
characters were never supported by [lunr's default tokenizer][12].
characters were never supported by
[lunr's default tokenizer][default tokenizer].

[26]: https://www.mkdocs.org/user-guide/configuration/#separator
[separator default value]: https://www.mkdocs.org/user-guide/configuration/#separator

#### Case changes

@@ -470,18 +464,18 @@ character followed by a lowercase character), and has the following behavior:
- `camelCase` :octicons-arrow-right-24: `camel`, `Case`
- `UPPERCASE` :octicons-arrow-right-24: `UPPERCASE`
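
The behavior listed above can be reproduced with a zero-width lookahead. The expression itself is not part of this hunk, so the pattern below is an assumption: it splits before an uppercase character that is followed by a lowercase one, but never at the start of a word. The function name is hypothetical.

```ts
// Assumed expression: a zero-width match in front of an uppercase character
// followed by a lowercase character, excluding positions at a word boundary.
const caseChange = /(?!\b)(?=[A-Z][a-z])/

// Split a string at case changes without consuming any characters.
function splitOnCaseChange(value: string): string[] {
  return value.split(caseChange)
}

console.log(splitOnCaseChange("camelCase")) // [ 'camel', 'Case' ]
console.log(splitOnCaseChange("UPPERCASE")) // [ 'UPPERCASE' ]
```

Since the match is zero-width, no characters are lost at the split point, which is something a separator that consumes characters cannot express.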

Searching for [:octicons-search-24: searchHighlight][27] now brings up the
section discussing the `search.highlight` feature flag, which also demonstrates
that this even works for search queries now![^5]
Searching for [:octicons-search-24: searchHighlight][q=searchHighlight]
now brings up the section discussing the `search.highlight` feature flag, which
also demonstrates that this now even works properly for search queries.[^5]

[^5]:
Previously, the search query was not correctly tokenized due to the way
[lunr][6] treats wildcards, as it disables the pipeline for search terms
that contain wildcards. In order to provide a good typeahead experience,
[lunr] treats wildcards, as it disables the pipeline for search terms that
contain wildcards. In order to provide a good typeahead experience,
Material for MkDocs adds wildcards to the end of each search term not
explicitly preceded with `+` or `-`, effectively disabling tokenization.

[27]: ?q=searchHighlight
[q=searchHighlight]: ?q=searchHighlight

#### Version numbers

@@ -496,10 +490,10 @@ undiscoverable. Thus, the following expression:

This regular expression matches a `.` only if not immediately followed by a
digit `\d`, which leaves version numbers discoverable. Searching for
[:octicons-search-24: 7.2.6][28] brings up the [7.2.6][29] release notes.
[:octicons-search-24: 7.2.6][q=7.2.6] brings up the [7.2.6] release notes.

[28]: ?q=7.2.6
[29]: ../../changelog/index.md#726-_-september-1-2021
[q=7.2.6]: ?q=7.2.6
[7.2.6]: ../../changelog/index.md#726-_-september-1-2021
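
As a small sketch of the rule described above, the following combines a whitespace and hyphen separator with a negative lookahead on `.`. The combined pattern and the function name are assumptions for illustration; only the "dot not followed by a digit" part comes from the text.

```ts
// Assumed combined separator: whitespace and hyphens, plus a "." that is
// not immediately followed by a digit.
const versionSeparator = /[\s\-]+|\.(?!\d)/

// Split on the separator and drop the empty strings left between
// adjacent separators (e.g. ". " at the end of a sentence).
function tokenizeKeepingVersions(value: string): string[] {
  return value.split(versionSeparator).filter(token => token.length > 0)
}

console.log(tokenizeKeepingVersions("Fixed in 7.2.6. See changelog.md"))
// [ 'Fixed', 'in', '7.2.6', 'See', 'changelog', 'md' ]
```

The version number survives as a single token, while the sentence-ending dot and the dot in `changelog.md` still act as separators.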

#### HTML/XML tags

@@ -512,9 +506,9 @@ following expression to the separator allows for just that:
&[lg]t;
```

Searching for [:octicons-search-24: custom search worker script][30] brings up
the section on [custom search][31] and matches the `script` tag among the other
search terms discovered.
Searching for [:octicons-search-24: custom search worker script][q=script]
brings up the section on [custom search] and matches the `script` tag among the
other search terms discovered.

---

@@ -522,19 +516,19 @@ _We've only just begun to scratch the surface of the new possibilities
tokenizer lookahead brings. If you found other useful expressions, you're
invited to share them in the comment section._

[30]: ?q=custom+search+worker+script
[31]: ../../setup/setting-up-site-search.md#custom-search
[q=script]: ?q=custom+search+worker+script
[custom search]: ../../setup/setting-up-site-search.md#custom-search

### Accurate highlighting

Highlighting is the last step in the process of search and involves the
highlighting of all search term occurrences in a given search result. For a
long time, highlighting was implemented through dynamically generated
[regular expressions][32].[^6]
[regular expressions].[^6]

This approach has some problems with non-whitespace languages like Japanese or
Chinese[^3] since it only works if the highlighted term is at a word boundary.
However, Asian languages are tokenized using a [dedicated segmenter][33], which
However, Asian languages are tokenized using a [dedicated segmenter], which
cannot be modeled with regular expressions.

[^6]:
@ -544,32 +538,33 @@ cannot be modeled with regular expressions.
|
||||
regular expression `(^|<separator>)(search|highlight)`, which only matches
|
||||
at word boundaries.
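
To make the limitation concrete, here is a minimal sketch of a dynamically generated highlighting expression of the shape `(^|<separator>)(search|highlight)` quoted in the footnote. The helper names (`escapeRegExp`, `buildHighlightExpression`, `highlight`) and the whitespace separator are assumptions for illustration; the linked highlighter source is the authoritative implementation.

```typescript
// Escape characters that carry special meaning in a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build `(^|<separator>)(term1|term2|...)`.
function buildHighlightExpression(terms: string[], separator: string): RegExp {
  const alternatives = terms.map(escapeRegExp).join("|");
  return new RegExp(`(^|${separator})(${alternatives})`, "gi");
}

// Wrap each match in <mark>, keeping the leading separator outside the mark.
function highlight(text: string, expression: RegExp): string {
  return text.replace(expression, "$1<mark>$2</mark>");
}

const expression = buildHighlightExpression(["search", "highlight"], "\\s+");

// Terms at the start of the text or after whitespace are marked.
console.log(highlight("search highlighting", expression));

// A term inside a longer word has no separator in front of it, so it is
// skipped. The same applies to text written without whitespace.
console.log(highlight("research", expression));
```

Because the term must follow the start of the text or a separator, occurrences inside unsegmented text are never marked, which is the problem the paragraph above describes for Japanese and Chinese.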

Now, as a direct result of the [new tokenization approach][34], __our new
search implementation uses token positions for highlighting__, making it
exactly as powerful as tokenization:
Now, as a direct result of the [new tokenization approach], __our new search
implementation uses token positions for highlighting__, making it exactly as
powerful as tokenization:

1. __Word boundaries__: as the new highlighter uses token positions, word
boundaries are equal to token boundaries. This means that more complex cases
of tokenization (e.g., [case changes][35], [version numbers][36], [HTML/XML
tags][37]), are now all highlighted accurately.
1. __Word boundaries__: as the new highlighter uses token positions, word
boundaries are equal to token boundaries. This means that more complex cases
of tokenization (e.g., [case changes], [version numbers], [HTML/XML tags]),
are now all highlighted accurately.

2. __Context-awareness__: as the new search index preserves some of the
structural information of the original document, the content of a section is
now divided into separate content blocks – paragraphs, code blocks, and
lists.
2. __Context-awareness__: as the new search index preserves some of the
structural information of the original document, the content of a section
is now divided into separate content blocks – paragraphs, code blocks, and
lists.

Now, only the content blocks that actually contain occurrences of one of
the search terms are considered for inclusion into the search preview. If a
term only occurs in a code block, it's the code block that gets rendered,
see, for example, the results of [:octicons-search-24: twitter][38].
see, for example, the results of
[:octicons-search-24: twitter][q=twitter].
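
The position-based highlighting described in the list above can be sketched roughly as follows. The `TokenPosition` shape and the `highlightByPositions` function are hypothetical and only illustrate the idea; the article does not show the actual index format or highlighter used by the search plugin.

```typescript
// Hypothetical shape: where a matched token starts and how long it is,
// measured in UTF-16 code units as used by JavaScript string indices.
interface TokenPosition {
  start: number;
  length: number;
}

// Wrap each token range in <mark>, copying the text in between unchanged.
// Assumes the positions do not overlap.
function highlightByPositions(text: string, positions: TokenPosition[]): string {
  const sorted = [...positions].sort((a, b) => a.start - b.start);
  let result = "";
  let cursor = 0;
  for (const { start, length } of sorted) {
    result += text.slice(cursor, start);
    result += `<mark>${text.slice(start, start + length)}</mark>`;
    cursor = start + length;
  }
  return result + text.slice(cursor);
}

// No separator is needed: the token boundaries come from the tokenizer,
// so a term in the middle of unsegmented text can still be marked.
console.log(highlightByPositions("全文検索は速い", [{ start: 2, length: 2 }]));
```

Since the highlighter only consumes positions, any token the tokenizer or segmenter can produce can also be highlighted, which is the sense in which highlighting becomes "exactly as powerful as tokenization".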

[32]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-L91
[33]: http://chasen.org/~taku/software/TinySegmenter/
[34]: #tokenizer-lookahead
[35]: #case-changes
[36]: #version-numbers
[37]: #htmlxml-tags
[38]: ?q=twitter
[regular expressions]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-L91
[dedicated segmenter]: http://chasen.org/~taku/software/TinySegmenter/
[new tokenization approach]: #tokenizer-lookahead
[case changes]: #case-changes
[version numbers]: #version-numbers
[HTML/XML tags]: #htmlxml-tags
[q=twitter]: ?q=twitter

### Benchmarks

@ -601,7 +596,7 @@ reach:
Smallest value of ten distinct runs.

[^8]:
We agnostically use [KJV Markdown][39] as a tool for testing to learn how
We agnostically use [KJV Markdown] as a tool for testing to learn how
Material for MkDocs behaves on large corpora, as it's a very large set of
Markdown files with over 800k words.

@ -611,12 +606,12 @@ new search is up to 95% faster__. This is a significant improvement,
particularly relevant for large documentation projects.

While 1,3s still may sound like a long time, using the new client-side search
together with [instant loading][40] only creates the search index on the initial
together with [instant loading] only creates the search index on the initial
page load. When navigating, the search index is preserved across pages, so the
cost does only have to be paid once.

[39]: https://github.com/arleym/kjv-markdown
[40]: ../../setup/setting-up-navigation.md#instant-loading
[KJV Markdown]: https://github.com/arleym/kjv-markdown
[instant loading]: ../../setup/setting-up-navigation.md#instant-loading

### User interface

@ -643,8 +638,12 @@ better. Next up:

If you've made it this far, thank you for your time and interest in Material
for MkDocs! This is the first blog article that I decided to write after a
short [Twitter survey][41] made me to. You're invited to [leave a comment][42]
to share your experiences with the new search implementation.
short [Twitter survey] made me to. ~~You're invited to leave a comment
to share your experiences with the new search implementation.~~[^9]

[41]: https://twitter.com/squidfunk/status/1434477478823743488
[42]: #__comments
[^9]:
We've disabled comments due to Disqus' ads being so incredibly horrible
and invasive. If you know a better alternative, please ping me at
martin.donath@squidfunk.com.

[Twitter survey]: https://twitter.com/squidfunk/status/1434477478823743488
@ -4,14 +4,35 @@ search:
exclude: true
---

<style>
.md-sidebar--secondary:not([hidden]) {
visibility: hidden;
}
</style>

# Blog

<h2>Excluding content from search</h2>
## [Excluding content from search]

__The latest Insiders release brings three new simple ways to exclude dedicated
parts of a document from the search index, allowing for more fine-grained
control.__

<aside class="mdx-author" markdown>
![@squidfunk][@squidfunk avatar]

<span>__Martin Donath__ · @squidfunk</span>
<span>
:octicons-calendar-24: September 26, 2021 ·
:octicons-clock-24: 5 min read ·
[:octicons-tag-24: 7.3.0+insiders-3.1.1](../../insiders/changelog.md#311-_-september-26-2021)
</span>
</aside>

[@squidfunk avatar]: https://avatars.githubusercontent.com/u/932156

---

Two weeks ago, Material for MkDocs Insiders shipped a brand new search plugin,
yielding massive improvements in usability, but also in speed and size of the
search index. Interestingly, as discussed in the previous blog article, we only
@ -20,17 +41,29 @@ features that enhance the writing experience, allowing for more fine-grained
control of what pages, sections and blocks of a Markdown file should be indexed
by the built-in search functionality.

[Continue reading :octicons-arrow-right-24:][2]{ .md-button }
[:octicons-arrow-right-24: Continue reading][Excluding content from search]

[2]: 2021/excluding-content-from-search.md
[Excluding content from search]: 2021/excluding-content-from-search.md


<h2>Search: better, faster, smaller</h2>
## [Search: better, faster, smaller]

__This is the story of how we managed to completely rebuild client-side search,
delivering a significantly better user experience while making it faster and
smaller at the same time.__

<aside class="mdx-author" markdown>
![@squidfunk][@squidfunk avatar]

<span>__Martin Donath__ · @squidfunk</span>
<span>
:octicons-calendar-24: September 13, 2021 ·
:octicons-clock-24: 15 min read ·
[:octicons-tag-24: 7.2.6+insiders-3.0.0](../../insiders/changelog.md#300-_-september-13-2021)
</span>
</aside>

---

The search of Material for MkDocs is by far one of its best and most-loved
assets: multilingual, offline-capable, and most importantly: _all client-side_.
It provides a solution to empower the users of your documentation to find what
@ -41,6 +74,6 @@ integration from the ground up. This article shines some light on the internals
of the new search, why it's much more powerful than the previous version, and
what's about to come.

[Continue reading :octicons-arrow-right-24:][1]{ .md-button }
[:octicons-arrow-right-24: Continue reading][Search: better, faster, smaller]

[1]: 2021/search-better-faster-smaller.md
[Search: better, faster, smaller]: 2021/search-better-faster-smaller.md