mirror of
https://github.com/squidfunk/mkdocs-material.git
synced 2024-06-14 11:52:32 +03:00
Updated blog article
@@ -1,7 +1,7 @@
|
|||||||
---
|
---
|
||||||
template: overrides/main.html
|
template: overrides/main.html
|
||||||
description: >
|
description: >
|
||||||
How we rebuilt client-side search, delivering a better user experience, while
|
How we rebuilt client-side search, delivering a better user experience while
|
||||||
making it faster and smaller at the same time
|
making it faster and smaller at the same time
|
||||||
disqus: mkdocs-material
|
disqus: mkdocs-material
|
||||||
search:
|
search:
|
||||||
@@ -11,7 +11,7 @@ search:
|
|||||||
# Search: better, faster, smaller
|
# Search: better, faster, smaller
|
||||||
|
|
||||||
__This is the story of how we managed to completely rebuild client-side search,
|
__This is the story of how we managed to completely rebuild client-side search,
|
||||||
delivering a significantly better user experience, while making it faster and
|
delivering a significantly better user experience while making it faster and
|
||||||
smaller at the same time.__
|
smaller at the same time.__
|
||||||
|
|
||||||
<aside class="mdx-author" markdown="1">
|
<aside class="mdx-author" markdown="1">
|
||||||
@@ -29,14 +29,14 @@ smaller at the same time.__
|
|||||||
---
|
---
|
||||||
|
|
||||||
The [search][2] of Material for MkDocs is by far one of its best and most-loved
|
The [search][2] of Material for MkDocs is by far one of its best and most-loved
|
||||||
assets: [multilingual][3], [offline-capable][4] and most importantly: _all
|
assets: [multilingual][3], [offline-capable][4], and most importantly: _all
|
||||||
client-side_. It provides a solution to empower the users of your documentation
|
client-side_. It provides a solution to empower the users of your documentation
|
||||||
to find what they're searching for instantly without the headache of managing
|
to find what they're searching for instantly without the headache of managing
|
||||||
additional servers. However, even though several iterations have been made,
|
additional servers. However, even though several iterations have been made,
|
||||||
there's still some room for improvement, which is why we rebuilt the search
|
there's still some room for improvement, which is why we rebuilt the search
|
||||||
plugin and integration from the ground up. This article shines some light on the
|
plugin and integration from the ground up. This article shines some light on the
|
||||||
internals of the new search, why it's much more powerful than the previous
|
internals of the new search, why it's much more powerful than the previous
|
||||||
version and what's about to come.
|
version, and what's about to come.
|
||||||
|
|
||||||
_The next section discusses the architecture and issues of the current search
|
_The next section discusses the architecture and issues of the current search
|
||||||
implementation. If you immediately want to learn what's new, skip to the
|
implementation. If you immediately want to learn what's new, skip to the
|
||||||
@@ -70,7 +70,7 @@ const index$ = document.forms.namedItem("search")
|
|||||||
### Search index
|
### Search index
|
||||||
|
|
||||||
The search index includes a stripped-down version of all pages. Let's take a
|
The search index includes a stripped-down version of all pages. Let's take a
|
||||||
look at an example, to understand precisely what the search index contains from
|
look at an example to understand precisely what the search index contains from
|
||||||
the original Markdown file:
|
the original Markdown file:
|
||||||
|
|
||||||
??? example "Expand to inspect example"
|
??? example "Expand to inspect example"
|
||||||
@@ -145,9 +145,9 @@ the original Markdown file:
|
|||||||
|
|
||||||
If we inspect the search index, we immediately see several problems:
|
If we inspect the search index, we immediately see several problems:
|
||||||
|
|
||||||
1. __All content is included twice__: the search index includes one entry
|
1. __All content is included twice__: the search index contains one entry
|
||||||
with the entire contents of the page, and one entry for each section of
|
with the entire contents of the page, and one entry for each section of
|
||||||
the page, i.e. each block preceded by a headline or subheadline. This
|
the page, i.e., each block preceded by a headline or subheadline. This
|
||||||
significantly contributes to the size of the search index.
|
significantly contributes to the size of the search index.
|
||||||
|
|
||||||
2. __All structure is lost__: when the search index is built, all structural
|
2. __All structure is lost__: when the search index is built, all structural
|
||||||
@@ -164,14 +164,14 @@ If we inspect the search index, we immediately see several problems:
|
|||||||
Furthermore, it's not clear that `Lists` is a headline as it's merged
|
Furthermore, it's not clear that `Lists` is a headline as it's merged
|
||||||
with the code block before and the paragraph after it.
|
with the code block before and the paragraph after it.
|
||||||
|
|
||||||
- __Punctuation__: inline elements like links, that are immediately followed
|
- __Punctuation__: inline elements like links that are immediately followed
|
||||||
by punctuation are separated by whitespace (see `,` and `:` in the
|
by punctuation are separated by whitespace (see `,` and `:` in the
|
||||||
excerpt). This is because all extracted text is joined with a whitespace
|
excerpt). This is because all extracted text is joined with a whitespace
|
||||||
character during the construction of the search index.
|
character during the construction of the search index.
|
||||||
|
|
||||||
It's not difficult to see that it can be quite challenging to implement a good
|
It's not difficult to see that it can be quite challenging to implement a good
|
||||||
search experience for theme authors, which is why Material for MkDocs (up to
|
search experience for theme authors, which is why Material for MkDocs (up to
|
||||||
now) did some [monkey patching][9] to be able to render slighltly more
|
now) did some [monkey patching][9] to be able to render slightly more
|
||||||
meaningful search previews.
|
meaningful search previews.
|
||||||
|
|
||||||
### Search worker
|
### Search worker
|
||||||
@@ -186,7 +186,7 @@ initialized, the following steps are taken:
|
|||||||
reported in #904 and, after some back and forth, fixed and released in
|
reported in #904 and, after some back and forth, fixed and released in
|
||||||
version 5.0.
|
version 5.0.
|
||||||
|
|
||||||
1. __Linking sections with pages__: The search index is parsed and each section
|
1. __Linking sections with pages__: The search index is parsed, and each section
|
||||||
is linked to its parent page. The parent page itself is _not indexed_, as it
|
is linked to its parent page. The parent page itself is _not indexed_, as it
|
||||||
would lead to duplicate results, so only the sections remain. Linking is
|
would lead to duplicate results, so only the sections remain. Linking is
|
||||||
necessary, as search results are grouped by page.
|
necessary, as search results are grouped by page.
|
||||||
@@ -206,8 +206,8 @@ initialized, the following steps are taken:
|
|||||||
main thread.
|
main thread.
|
||||||
|
|
||||||
Now, that's basically how the search worker operates. Sure, there's a little
|
Now, that's basically how the search worker operates. Sure, there's a little
|
||||||
more magic involved, e.g. search results are [post-processed][13] and
|
more magic involved, e.g., search results are [post-processed][13] and
|
||||||
[rescored][14] to account for some shortcomings of [lunr][6], but in general
|
[rescored][14] to account for some shortcomings of [lunr][6], but in general,
|
||||||
this is how data gets into and out of the index.
|
this is how data gets into and out of the index.
|
||||||
|
|
||||||
[9]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71
|
[9]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/document/index.ts#L68-L71
|
||||||
@@ -225,7 +225,7 @@ occurrences of the search terms found is an essential part of a great search
|
|||||||
experience.
|
experience.
|
||||||
|
|
||||||
This is where the current search preview generation falls short, as some of the
|
This is where the current search preview generation falls short, as some of the
|
||||||
search previews appear to not include any occurrence of any of the search
|
search previews appear not to include any occurrence of any of the search
|
||||||
terms. This was due to the fact that search previews were [truncated after a
|
terms. This was due to the fact that search previews were [truncated after a
|
||||||
maximum of 320 characters][15], as can be seen here:
|
maximum of 320 characters][15], as can be seen here:
|
||||||
|
|
||||||
@@ -251,7 +251,7 @@ carefully considered:
|
|||||||
preview might look like this:
|
preview might look like this:
|
||||||
|
|
||||||
```
|
```
|
||||||
… channels, e.g. or which can be configured via mkdocs.yml …
|
… channels, e.g., or which can be configured via mkdocs.yml …
|
||||||
```
|
```
|
||||||
|
|
||||||
While this may work for languages that use whitespace as a separator
|
While this may work for languages that use whitespace as a separator
|
||||||
@@ -273,8 +273,8 @@ carefully considered:
|
|||||||
[17]: https://pmarsceill.github.io/just-the-docs/
|
[17]: https://pmarsceill.github.io/just-the-docs/
|
||||||
[18]: https://github.com/lelouch77/docusaurus-lunr-search
|
[18]: https://github.com/lelouch77/docusaurus-lunr-search
|
||||||
|
|
||||||
2. __Context awareness__: Although whitespace doesn't work for all languages,
|
2. __Context-awareness__: Although whitespace doesn't work for all languages,
|
||||||
one could argue that it could be a good-enough solution. Unfortunately, this
|
one could argue that it could be a good enough solution. Unfortunately, this
|
||||||
is not necessarily true for code blocks, as the removal of whitespace might
|
is not necessarily true for code blocks, as the removal of whitespace might
|
||||||
change meaning in some languages.
|
change meaning in some languages.
|
||||||
|
|
||||||
@@ -291,7 +291,7 @@ problems it already solves, a quick overview of what features and improvements
|
|||||||
it brings:
|
it brings:
|
||||||
|
|
||||||
- __Better__: support for [rich search previews][19], preserving the structural
|
- __Better__: support for [rich search previews][19], preserving the structural
|
||||||
information of code blocks, inline code and lists, so they are rendered
|
information of code blocks, inline code, and lists, so they are rendered
|
||||||
as-is, as well as [lookahead tokenization][20],
|
as-is, as well as [lookahead tokenization][20],
|
||||||
[more accurate highlighting][21], and improved stability of typeahead. Also,
|
[more accurate highlighting][21], and improved stability of typeahead. Also,
|
||||||
a [slightly better UX][22].
|
a [slightly better UX][22].
|
||||||
@@ -301,7 +301,7 @@ it brings:
|
|||||||
large documentation projects.
|
large documentation projects.
|
||||||
|
|
||||||
_Note that our new search implementation is currently 'Insiders only', which
|
_Note that our new search implementation is currently 'Insiders only', which
|
||||||
means that it is reserved for sponsors, because it's those sponsors that make
|
means that it is reserved for sponsors because it's those sponsors that make
|
||||||
features like this possible._
|
features like this possible._
|
||||||
|
|
||||||
[:octicons-heart-fill-24:{ .mdx-heart } I want to become a sponsor](../../insiders/index.md){ .md-button .md-button--primary }
|
[:octicons-heart-fill-24:{ .mdx-heart } I want to become a sponsor](../../insiders/index.md){ .md-button .md-button--primary }
|
||||||
@@ -392,7 +392,7 @@ If we inspect the search index again, we can see how the situation improved:
|
|||||||
1. __Content is included only once__: the search index does not include the
|
1. __Content is included only once__: the search index does not include the
|
||||||
content of the page twice, as only the sections of a page are part of the
|
content of the page twice, as only the sections of a page are part of the
|
||||||
search index. This leads to a significant reduction in size, fewer bytes to
|
search index. This leads to a significant reduction in size, fewer bytes to
|
||||||
transfer and a smaller search index.
|
transfer, and a smaller search index.
|
||||||
|
|
||||||
2. __Some structure is preserved__: each section of the search index includes a
|
2. __Some structure is preserved__: each section of the search index includes a
|
||||||
small subset of HTML to provide the necessary structure to allow for more
|
small subset of HTML to provide the necessary structure to allow for more
|
||||||
@@ -424,13 +424,13 @@ On to the next step in the process: __tokenization__.
|
|||||||
### Tokenizer lookahead
|
### Tokenizer lookahead
|
||||||
|
|
||||||
The [default tokenizer][12] of [lunr][6] uses a regular expression to split a
|
The [default tokenizer][12] of [lunr][6] uses a regular expression to split a
|
||||||
given string, by matching each character against the [separator][11] as defined
|
given string by matching each character against the [separator][11] as defined
|
||||||
in `mkdocs.yml`. This doesn't allow for more complex separators based on
|
in `mkdocs.yml`. This doesn't allow for more complex separators based on
|
||||||
lookahead or multiple characters.
|
lookahead or multiple characters.
|
||||||
|
|
||||||
Fortunately, __our new search implementation provides an advanced tokenizer__
|
Fortunately, __our new search implementation provides an advanced tokenizer__
|
||||||
that doesn't have these shortcomings and supports more complex regular
|
that doesn't have these shortcomings and supports more complex regular
|
||||||
expressions. As a result, Material for MkDocs just changed it's own separator
|
expressions. As a result, Material for MkDocs just changed its own separator
|
||||||
configuration to the following value:
|
configuration to the following value:
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -461,8 +461,8 @@ the separator, this can now be achieved with ease:
|
|||||||
(?!\b)(?=[A-Z][a-z])
|
(?!\b)(?=[A-Z][a-z])
|
||||||
```
|
```
|
||||||
|
|
||||||
This regular expression is a combination of a negative lookahead (`\b`, i.e.
|
This regular expression is a combination of a negative lookahead (`\b`, i.e.,
|
||||||
not a word boundary) and a positive lookahead (`[A-Z][a-z]`, i.e. an uppercase
|
not a word boundary) and a positive lookahead (`[A-Z][a-z]`, i.e., an uppercase
|
||||||
character followed by a lowercase character), and has the following behavior:
|
character followed by a lowercase character), and has the following behavior:
|
||||||
|
|
||||||
- `PascalCase` :octicons-arrow-right-24: `Pascal`, `Case`
|
- `PascalCase` :octicons-arrow-right-24: `Pascal`, `Case`
|
||||||
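The case-change separator shown in this hunk can be tried out in isolation. The following is a minimal sketch, assuming the expression is passed directly to `String.prototype.split`; the helper name `splitOnCaseChange` is hypothetical and is not part of Material for MkDocs or lunr, whose tokenizer does more than a plain split.

```typescript
// Separator from the article: a zero-width match at a position that is not
// a word boundary and is followed by an uppercase and a lowercase character.
const caseChange = /(?!\b)(?=[A-Z][a-z])/

// Hypothetical helper for illustration only (not the actual tokenizer).
function splitOnCaseChange(input: string): string[] {
  // Drop empty strings defensively, although zero-width matches at the
  // start of the input are already excluded by the (?!\b) part.
  return input.split(caseChange).filter(token => token.length > 0)
}

console.log(splitOnCaseChange("PascalCase")) // [ 'Pascal', 'Case' ]
console.log(splitOnCaseChange("camelCase"))  // [ 'camel', 'Case' ]
console.log(splitOnCaseChange("UPPERCASE"))  // [ 'UPPERCASE' ]
```

Because both parts of the expression are lookaheads, no characters are consumed by the split, so the tokens keep every letter of the original word.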
@@ -493,8 +493,8 @@ undiscoverable. Thus, the following expression:
|
|||||||
\.(?!\d)
|
\.(?!\d)
|
||||||
```
|
```
|
||||||
|
|
||||||
This regular expression matches a `.`, but not immediately followed by a digit
|
This regular expression matches a `.` only if not immediately followed by a
|
||||||
`\d`, which leaves version numbers discoverable. Searching for
|
digit `\d`, which leaves version numbers discoverable. Searching for
|
||||||
[:octicons-search-24: 7.2.6][28] brings up the [7.2.6][29] release notes.
|
[:octicons-search-24: 7.2.6][28] brings up the [7.2.6][29] release notes.
|
||||||
|
|
||||||
[28]: ?q=7.2.6
|
[28]: ?q=7.2.6
|
||||||
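The effect of the `\.(?!\d)` expression on version numbers can be checked with a short sketch. It assumes the expression is used on its own as the only separator, which is a simplification of the full separator configuration; the helper name `splitOnDots` is hypothetical.

```typescript
// Separator from the article: a literal dot that is not immediately
// followed by a digit.
const dotNotBeforeDigit = /\.(?!\d)/

// Hypothetical helper for illustration only (not the actual tokenizer).
function splitOnDots(input: string): string[] {
  // A trailing dot produces an empty last element, which is dropped here.
  return input.split(dotNotBeforeDigit).filter(token => token.length > 0)
}

console.log(splitOnDots("7.2.6"))      // [ '7.2.6' ]
console.log(splitOnDots("mkdocs.yml")) // [ 'mkdocs', 'yml' ]
```

The version number survives as a single token because each of its dots is followed by a digit, while the dot in a file name still acts as a separator.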
@@ -532,13 +532,13 @@ long time, highlighting was implemented through dynamically generated
|
|||||||
[regular expressions][32].[^6]
|
[regular expressions][32].[^6]
|
||||||
|
|
||||||
This approach has some problems with non-whitespace languages like Japanese or
|
This approach has some problems with non-whitespace languages like Japanese or
|
||||||
Chinese[^3], since it only works if the highlighted term is at a word boundary.
|
Chinese[^3] since it only works if the highlighted term is at a word boundary.
|
||||||
However, Asian languages are tokenized using a [dedicated segmenter][33], which
|
However, Asian languages are tokenized using a [dedicated segmenter][33], which
|
||||||
cannot be modelled with regular expressions.
|
cannot be modeled with regular expressions.
|
||||||
|
|
||||||
[^6]:
|
[^6]:
|
||||||
Using the separator as defined in `mkdocs.yml`, a regular expression was
|
Using the separator as defined in `mkdocs.yml`, a regular expression was
|
||||||
constructed that was trying to mimick the tokenizer. As an example, the
|
constructed that was trying to mimic the tokenizer. As an example, the
|
||||||
search query `search highlight` was transformed into the rather cumbersome
|
search query `search highlight` was transformed into the rather cumbersome
|
||||||
regular expression `(^|<separator>)(search|highlight)`, which only matches
|
regular expression `(^|<separator>)(search|highlight)`, which only matches
|
||||||
at word boundaries.
|
at word boundaries.
|
||||||
@@ -549,19 +549,20 @@ exactly as powerful as tokenization:
|
|||||||
|
|
||||||
1. __Word boundaries__: as the new highlighter uses token positions, word
|
1. __Word boundaries__: as the new highlighter uses token positions, word
|
||||||
boundaries are equal to token boundaries. This means that more complex cases
|
boundaries are equal to token boundaries. This means that more complex cases
|
||||||
of tokenization (e.g. [case changes][35], [version numbers][36], [HTML/XML
|
of tokenization (e.g., [case changes][35], [version numbers][36], [HTML/XML
|
||||||
tags][37]), are now all highlighted accurately.
|
tags][37]), are now all highlighted accurately.
|
||||||
|
|
||||||
1. __Context awareness__: as the new search index preserves some of the
|
2. __Context-awareness__: as the new search index preserves some of the
|
||||||
structural information of the original document, the content of a section is
|
structural information of the original document, the content of a section is
|
||||||
now divided into separate content blocks – paragraphs, code blocks and lists.
|
now divided into separate content blocks – paragraphs, code blocks, and
|
||||||
|
lists.
|
||||||
|
|
||||||
Now, only the content blocks that actually contain occurrences of one of
|
Now, only the content blocks that actually contain occurrences of one of
|
||||||
the search terms are considered for inclusion into the search preview. If a
|
the search terms are considered for inclusion into the search preview. If a
|
||||||
term only occurs in a code block, it's the code block that gets rendered,
|
term only occurs in a code block, it's the code block that gets rendered,
|
||||||
see for example the results of [:octicons-search-24: twitter][38].
|
see, for example, the results of [:octicons-search-24: twitter][38].
|
||||||
|
|
||||||
[32]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-91
|
[32]: https://github.com/squidfunk/mkdocs-material/blob/ec7ccd2b2d15dd033740f388912f7be7738feec2/src/assets/javascripts/integrations/search/highlighter/index.ts#L61-L91
|
||||||
[33]: http://chasen.org/~taku/software/TinySegmenter/
|
[33]: http://chasen.org/~taku/software/TinySegmenter/
|
||||||
[34]: #tokenizer-lookahead
|
[34]: #tokenizer-lookahead
|
||||||
[35]: #case-changes
|
[35]: #case-changes
|
||||||
@@ -607,7 +608,7 @@ new search is up to 95% faster__. This is a significant improvement,
|
|||||||
particularly relevant for large documentation projects.
|
particularly relevant for large documentation projects.
|
||||||
|
|
||||||
While 1,3s still may sound like a long time, using the new client-side search
|
While 1,3s still may sound like a long time, using the new client-side search
|
||||||
together with [instant loading][40] only initializes the search on the first
|
together with [instant loading][40] only initializes the search on the initial
|
||||||
page load. When navigating, the search index is preserved across pages, so the
|
page load. When navigating, the search index is preserved across pages, so the
|
||||||
cost does only have to be paid once.
|
cost does only have to be paid once.
|
||||||
|
|
||||||
@@ -628,7 +629,7 @@ solves some long-standing issues which needed to be tackled for years. Yet,
|
|||||||
it's only the start of a search experience that is going to get better and
|
it's only the start of a search experience that is going to get better and
|
||||||
better. Next up:
|
better. Next up:
|
||||||
|
|
||||||
- __Context aware search summarization__: currently, the first two matching
|
- __Context-aware search summarization__: currently, the first two matching
|
||||||
content blocks are rendered as a search preview. With the new tokenization
|
content blocks are rendered as a search preview. With the new tokenization
|
||||||
technique, we laid the groundwork for more sophisticated shortening and
|
technique, we laid the groundwork for more sophisticated shortening and
|
||||||
summarization methods, which we're tackling next.
|
summarization methods, which we're tackling next.
|
||||||
@@ -638,7 +639,7 @@ better. Next up:
|
|||||||
a better experience. We'll explore some of those paths in the future.
|
a better experience. We'll explore some of those paths in the future.
|
||||||
|
|
||||||
If you've made it this far, thank you for your time and interest in Material
|
If you've made it this far, thank you for your time and interest in Material
|
||||||
for MkDocs! This is the first blog article which I decided to write after a
|
for MkDocs! This is the first blog article that I decided to write after a
|
||||||
short [Twitter survey][41] made me to. Feel free to leave a comment when you
|
short [Twitter survey][41] made me to. Feel free to leave a comment when you
|
||||||
have something to say.
|
have something to say.
|
||||||
|
|
||||||
|
|||||||
@@ -9,17 +9,17 @@ search:
|
|||||||
<h2>Search: better, faster, smaller</h2>
|
<h2>Search: better, faster, smaller</h2>
|
||||||
|
|
||||||
__This is the story of how we managed to completely rebuild client-side search,
|
__This is the story of how we managed to completely rebuild client-side search,
|
||||||
delivering a significantly better user experience, while making it faster and
|
delivering a significantly better user experience while making it faster and
|
||||||
smaller at the same time.__
|
smaller at the same time.__
|
||||||
|
|
||||||
The search of Material for MkDocs is by far one of its best and most-loved
|
The search of Material for MkDocs is by far one of its best and most-loved
|
||||||
assets: multilingual, offline-capable and most importantly: _all client-side_.
|
assets: multilingual, offline-capable, and most importantly: _all client-side_.
|
||||||
It provides a solution to empower the users of your documentation to find what
|
It provides a solution to empower the users of your documentation to find what
|
||||||
they're searching for instantly without the headache of managing additional
|
they're searching for instantly without the headache of managing additional
|
||||||
servers. However, even though several iterations have been made, there's still
|
servers. However, even though several iterations have been made, there's still
|
||||||
some room for improvement, which is why we rebuilt the search plugin and
|
some room for improvement, which is why we rebuilt the search plugin and
|
||||||
integration from the ground up. This article shines some light on the internals
|
integration from the ground up. This article shines some light on the internals
|
||||||
of the new search, why it's much more powerful than the previous version and
|
of the new search, why it's much more powerful than the previous version, and
|
||||||
what's about to come.
|
what's about to come.
|
||||||
|
|
||||||
[Continue reading :octicons-arrow-right-24:][1]{ .md-button }
|
[Continue reading :octicons-arrow-right-24:][1]{ .md-button }
|
||||||
|
|||||||