---
template: overrides/main.html
search:
  boost: 1.05
---

# Setting up site search

Material for MkDocs provides an excellent client-side search implementation,
removing the need to integrate third-party services, which might not be
compliant with privacy regulations. Moreover, search even works [offline],
allowing users to download your documentation.

[offline]: building-for-offline-usage.md

## Configuration

### Built-in search plugin

[:octicons-tag-24: 0.1.0][Search support] ·
:octicons-cpu-24: Plugin

The built-in search plugin integrates seamlessly with Material for MkDocs,
adding multilingual client-side search with [lunr] and [lunr-languages]. It's
enabled by default, but must be re-added to `mkdocs.yml` when other plugins
are used:

``` yaml
plugins:
  - search
```

The following configuration options are supported:

[`lang`](#+search.lang){ #+search.lang }

:   :octicons-milestone-24: Default: _automatically set_. This option allows
    you to include the language-specific stemmers provided by [lunr-languages].
    Note that Material for MkDocs will set this automatically based on the
    [site language], but it may be overridden, e.g. to support multiple
    languages:

    === "A single language"

        ``` yaml
        plugins:
          - search:
              lang: ru
        ```

    === "Multiple languages"

        ``` yaml
        plugins:
          - search:
              lang: # (1)!
                - en
                - ru
        ```

        1.  Be aware that including support for other languages increases the
            general JavaScript payload by around 20kb (before `gzip`) and by
            another 15-30kb per language.

    The following languages are supported:

    <div class="mdx-columns" markdown>

    - `ar` Arabic
    - `da` Danish
    - `de` German
    - `du` Dutch
    - `en` English
    - `es` Spanish
    - `fi` Finnish
    - `fr` French
    - `hu` Hungarian
    - `it` Italian
    - `ja` Japanese
    - `no` Norwegian
    - `pt` Portuguese
    - `ro` Romanian
    - `ru` Russian
    - `sv` Swedish
    - `th` Thai
    - `tr` Turkish
    - `vi` Vietnamese

    </div>

    Material for MkDocs goes to great lengths to support languages that are not
    part of this list by automatically falling back to the stemmer yielding the
    best result.

[`separator`](#+search.separator){ #+search.separator }

:   :octicons-milestone-24: Default: _automatically set_. The separator for
    indexing and query tokenization can be customized, making it possible to
    index parts of words separated by characters other than whitespace and `-`,
    e.g. by including `.`:

    ``` yaml
    plugins:
      - search:
          separator: '[\s\-\.]' # (1)!
    ```

    1.  Tokenization itself is carried out by [lunr's default tokenizer], which
        doesn't allow for lookahead or multi-character separators. For more
        fine-grained control over the tokenization process, see the section on
        [tokenizer lookahead].

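To get a feel for what adding `.` to the separator changes, the following TypeScript sketch approximates separator-based tokenization with `String.prototype.split`. It is an illustration only: the `tokenize` helper and the sample text are made up for this example, and lunr's actual tokenizer works differently internally (it tests the separator against one character at a time).

``` ts
// Hypothetical helper: approximates separator-based tokenization.
// Not lunr's implementation, just a way to compare two separators.
function tokenize(text: string, separator: RegExp): string[] {
  return text
    .split(separator)
    .filter(token => token.length > 0) // drop empty tokens between separators
}

const text = "client-side search in mkdocs.yml"

// Whitespace and `-` only: `mkdocs.yml` stays a single token
const withoutDot = tokenize(text, /[\s\-]/)

// Separator from the configuration above: `.` also splits tokens
const withDot = tokenize(text, /[\s\-\.]/)

console.log(withoutDot) // ["client", "side", "search", "in", "mkdocs.yml"]
console.log(withDot)    // ["client", "side", "search", "in", "mkdocs", "yml"]
```

With the second separator, a search for `yml` can match the file name, at the cost of also splitting other dotted terms such as version numbers.
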
<div class="mdx-deprecated" markdown>

[`prebuild_index`](#+search.prebuild_index){ #+search.prebuild_index }

:   [:octicons-tag-24: 5.0.0][prebuilt index support] · :octicons-archive-24:
    Deprecated · :octicons-trash-24: 8.0.0 · :octicons-milestone-24: Default:
    `false`. MkDocs can generate a [prebuilt index] of all pages during
    build time, which provides performance improvements at the cost of more
    bandwidth, as it reduces the time needed to build the search index in the
    browser:

    ``` yaml
    plugins:
      - search:
          prebuild_index: true
    ```

    Note that this configuration option was removed, as the [new search
    plugin] generates up to [50% smaller] search indexes, doubling search
    performance.

    [:octicons-arrow-right-24: Read more on the new search plugin]
    [new search plugin]

</div>

[Search support]: https://github.com/squidfunk/mkdocs-material/releases/tag/0.1.0
[lunr]: https://lunrjs.com
[lunr-languages]: https://github.com/MihaiValentin/lunr-languages
[lunr's default tokenizer]: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L413-L456
[site language]: changing-the-language.md#site-language
[tokenizer lookahead]: #tokenizer-lookahead
[prebuilt index support]: https://github.com/squidfunk/mkdocs-material/releases/tag/5.0.0
[prebuilt index]: https://www.mkdocs.org/user-guide/configuration/#prebuild_index
[50% smaller]: ../blog/posts/search-better-faster-smaller.md#benchmarks

#### Chinese language support

[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-4.14.0][Insiders] ·
:octicons-beaker-24: Experimental

[Insiders] adds search support for the Chinese language (see our [blog article]
[chinese search] from May 2022) by integrating with the text segmentation
library [jieba], which can be installed with `pip`.

``` sh
pip install jieba
```

If [jieba] is installed, the [built-in search plugin] automatically detects
Chinese characters and runs them through the segmenter. The following
configuration options are available:

[`jieba_dict`](#+search.jieba_dict){ #+search.jieba_dict }

:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
    Default: _none_. This option allows for specifying a [custom dictionary]
    to be used by [jieba] for segmenting text, replacing the default dictionary:

    ``` yaml
    plugins:
      - search:
          jieba_dict: dict.txt # (1)!
    ```

    1.  The following alternative dictionaries are provided by [jieba]:

        - [dict.txt.small], a dictionary file with a smaller memory footprint
        - [dict.txt.big], a dictionary file with better support for
          segmenting traditional Chinese

[`jieba_dict_user`](#+search.jieba_dict_user){ #+search.jieba_dict_user }

:   [:octicons-tag-24: insiders-4.17.2][Insiders] · :octicons-milestone-24:
    Default: _none_. This option allows for specifying an additional
    [user dictionary] to be used by [jieba] for segmenting text, augmenting the
    default dictionary:

    ``` yaml
    plugins:
      - search:
          jieba_dict_user: user_dict.txt
    ```

    User dictionaries can be used for tuning the segmenter to preserve
    technical terms.

[chinese search]: ../blog/posts/chinese-search-support.md
[jieba]: https://pypi.org/project/jieba/
[built-in search plugin]: #built-in-search-plugin
[custom dictionary]: https://github.com/fxsjy/jieba#%E5%85%B6%E4%BB%96%E8%AF%8D%E5%85%B8
[dict.txt.small]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
[dict.txt.big]: https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
[user dictionary]: https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8

### Rich search previews

[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-3.0.0][Insiders] ·
:octicons-beaker-24: Experimental

[Insiders] ships rich search previews as part of the [new search plugin], which
will render code blocks directly in the search result, and highlight all
occurrences inside those blocks:

=== "Insiders"

    ![search preview now]

=== "Material for MkDocs"

    ![search preview before]

[Insiders]: ../insiders/index.md
[new search plugin]: ../blog/posts/search-better-faster-smaller.md
[search preview now]: ../blog/posts/search-better-faster-smaller/search-preview-now.png
[search preview before]: ../blog/posts/search-better-faster-smaller/search-preview-before.png

### Tokenizer lookahead

[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-3.0.0][Insiders] ·
:octicons-beaker-24: Experimental

[Insiders] allows for more complex configurations of the [`separator`][separator]
setting as part of the [new search plugin], giving more control over the way
documents are tokenized:

``` yaml
plugins:
  - search:
      separator: '[\s\-,:!=\[\]()"/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])'
```

The following section explains what can be achieved with tokenizer lookahead:

=== "Case changes"

    ```
    (?!\b)(?=[A-Z][a-z])
    ```

    `PascalCase` and `camelCase` are used as naming conventions in many
    programming languages. By adding this match group to the [`separator`]
    [separator], [words are split at case changes], tokenizing the word
    `PascalCase` into `Pascal` and `Case`, so both terms can be searched
    individually.

    [:octicons-arrow-right-24: Read more on tokenizing case changes]
    [tokenize case changes]

=== "Version numbers"

    ```
    \.(?!\d)
    ```

    When `.` is added to the [`separator`][separator], version numbers are
    split into parts, rendering them undiscoverable via search. By adding
    this match group, a small lookahead is introduced, so version numbers
    remain as they are and can be found through search.

    [:octicons-arrow-right-24: Read more on tokenizing version numbers]
    [tokenize version numbers]

=== "HTML/XML tags"

    ```
    &[lg]t;
    ```

    If your documentation includes HTML/XML code examples, you may want to allow
    users to find specific tag names. Unfortunately, the `<` and `>` control
    characters are encoded in code blocks as `&lt;` and `&gt;`. Adding this
    expression to the separator allows for just that.

    [:octicons-arrow-right-24: Read more on tokenizing HTML/XML tags]
    [tokenize html-xml tags]

[separator]: #search-separator
[words are split at case changes]: ?q=searchHighlight
[tokenize case changes]: ../blog/posts/search-better-faster-smaller.md#case-changes
[tokenize version numbers]: ../blog/posts/search-better-faster-smaller.md#version-numbers
[tokenize html-xml tags]: ../blog/posts/search-better-faster-smaller.md#htmlxml-tags
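The three match groups described above can be tried out in isolation. The following TypeScript sketch applies the separator from the configuration example with `String.prototype.split`; this is an approximation for experimenting with the expression, not the search plugin's actual tokenizer, and the `tokenize` helper and sample inputs are made up for this example.

``` ts
// Separator from the configuration example. The `/` inside the character
// class is escaped only because a regular expression literal is used here.
const separator = /[\s\-,:!=\[\]()"\/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])/

// Hypothetical helper: splits text at the separator and drops empty tokens
function tokenize(text: string): string[] {
  return text.split(separator).filter(token => token.length > 0)
}

// Case changes: split before an uppercase letter followed by a lowercase one
console.log(tokenize("PascalCase"))              // ["Pascal", "Case"]

// Version numbers: `.` only separates when it is not followed by a digit
console.log(tokenize("Version 1.2.3 released.")) // ["Version", "1.2.3", "released"]

// HTML/XML tags: the encoded angle brackets act as separators
console.log(tokenize("&lt;div&gt;"))             // ["div"]
```
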

### Search suggestions

[:octicons-tag-24: 7.2.0][Search suggestions support] ·
:octicons-unlock-24: Feature flag ·
:octicons-beaker-24: Experimental

When search suggestions are enabled, the search will display the likeliest
completion for the last word, which can be accepted with the ++arrow-right++
key. Add the following lines to `mkdocs.yml`:

``` yaml
theme:
  features:
    - search.suggest
```

Searching for [:octicons-search-24: search su][Search suggestions example]
yields ^^search suggestions^^ as a suggestion.

[Search suggestions support]: https://github.com/squidfunk/mkdocs-material/releases/tag/7.2.0
[Search suggestions example]: ?q=search+su

### Search highlighting

[:octicons-tag-24: 7.2.0][Search highlighting support] ·
:octicons-unlock-24: Feature flag ·
:octicons-beaker-24: Experimental

When search highlighting is enabled and a user clicks on a search result,
Material for MkDocs will highlight all occurrences after following the link.
Add the following lines to `mkdocs.yml`:

``` yaml
theme:
  features:
    - search.highlight
```

Searching for [:octicons-search-24: code blocks][Search highlighting example]
highlights all occurrences of both terms.

[Search highlighting support]: https://github.com/squidfunk/mkdocs-material/releases/tag/7.2.0
[Search highlighting example]: ../reference/code-blocks.md?h=code+blocks

### Search sharing

[:octicons-tag-24: 7.2.0][Search sharing support] ·
:octicons-unlock-24: Feature flag ·
:octicons-beaker-24: Experimental

When search sharing is activated, a :material-share-variant: share button is
rendered next to the reset button, which allows deep linking to the current
search query and result. Add the following lines to `mkdocs.yml`:

``` yaml
theme:
  features:
    - search.share
```

When a user clicks the share button, the URL is automatically copied to the
clipboard.

[Search sharing support]: https://github.com/squidfunk/mkdocs-material/releases/tag/7.2.0
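
A deep link of this kind can also be constructed by hand. The sketch below assumes that the query is carried in the `q` URL parameter, as in the `?q=search+su` example link on this page; the `searchLink` helper and the `example.com` address are placeholders made up for this illustration.

``` ts
// Hypothetical helper: builds a deep link to a search query, assuming the
// query is passed in the `q` URL parameter
function searchLink(page: string, query: string): string {
  const url = new URL(page)
  url.searchParams.set("q", query) // spaces are serialized as `+`
  return url.toString()
}

// `https://example.com/docs/` is a placeholder site URL
console.log(searchLink("https://example.com/docs/", "search su"))
// https://example.com/docs/?q=search+su
```
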

## Usage

### Search boosting

[:octicons-tag-24: 8.3.0][boost support] ·
:octicons-beaker-24: Experimental

Pages can be boosted in search with the front matter `search.boost` property,
which will make them rank higher. Add the following lines at the top of a
Markdown file:

``` yaml
---
search:
  boost: 2 # (1)!
---

# Document title
...
```

1.  :woman_in_lotus_position: When boosting pages, be gentle and start with
    __low values__.

[boost support]: https://github.com/squidfunk/mkdocs-material/releases/tag/8.3.0

### Search exclusion

[:octicons-heart-fill-24:{ .mdx-heart } Sponsors only][Insiders]{ .mdx-insiders } ·
[:octicons-tag-24: insiders-3.1.0][Insiders] ·
:octicons-beaker-24: Experimental

Pages can be excluded from search with the front matter `search.exclude`
property, removing them from the index. Add the following lines at the top of a
Markdown file:

``` yaml
---
search:
  exclude: true
---

# Document title
...
```

#### Excluding sections

When [Attribute Lists] is enabled, specific sections of pages can be excluded
from search by adding the `{ data-search-exclude }` pragma after a Markdown
heading:

=== ":octicons-file-code-16: `docs/page.md`"

    ``` markdown
    # Document title

    ## Section 1

    The content of this section is included

    ## Section 2 { data-search-exclude }

    The content of this section is excluded
    ```

=== ":octicons-codescan-16: `search_index.json`"

    ``` json
    {
      ...
      "docs": [
        {
          "location":"page/",
          "text":"",
          "title":"Document title"
        },
        {
          "location":"page/#section-1",
          "text":"<p>The content of this section is included</p>",
          "title":"Section 1"
        }
      ]
    }
    ```

[Attribute Lists]: extensions/python-markdown.md#attribute-lists

#### Excluding blocks

When [Attribute Lists] is enabled, specific blocks of pages can be excluded
from search by adding the `{ data-search-exclude }` pragma after a Markdown
inline- or block-level element:

=== ":octicons-file-code-16: `docs/page.md`"

    ``` markdown
    # Document title

    The content of this block is included

    The content of this block is excluded
    { data-search-exclude }
    ```

=== ":octicons-codescan-16: `search_index.json`"

    ``` json
    {
      ...
      "docs": [
        {
          "location":"page/",
          "text":"<p>The content of this block is included</p>",
          "title":"Document title"
        }
      ]
    }
    ```

## Customization

The search implementation of Material for MkDocs is probably its most
sophisticated feature, as it tries to balance a great typeahead experience,
good performance, accessibility, and a result list that is easy to scan.
This is where Material for MkDocs deviates from other themes.

The following section explains how search can be customized to tailor it to
your needs.

### Query transformation

When a user enters a query into the search box, the query is pre-processed
before it is submitted to the search index. Material for MkDocs will apply the
following transformations, which can be customized by [extending the theme]:

``` ts
export function defaultTransform(query: string): string {
  return query
    .split(/"([^"]+)"/g)                            /* (1)! */
      .map((terms, index) => index & 1
        ? terms.replace(/^\b|^(?![^\x00-\x7F]|$)|\s+/g, " +")
        : terms
      )
      .join("")
    .replace(/"|(?:^|\s+)[*+\-:^~]+(?=\s+|$)/g, "") /* (2)! */
    .trim()                                         /* (3)! */
}
```

1.  Search for terms in quotation marks and prepend a `+` modifier to denote
    that the resulting document must contain all terms, converting the query
    to an `AND` query (as opposed to the default `OR` behavior). While users
    may expect terms enclosed in quotation marks to map to span queries, i.e.
    for which order is important, `lunr` doesn't support them, so the best
    we can do is to convert the terms to an `AND` query.

2.  Remove control characters that stand alone, i.e. that are located at the
    beginning of the query or preceded by whitespace, and that are followed by
    whitespace or located at the end of the query string. Furthermore, filter
    unmatched quotation marks.

3.  Trim excess whitespace from left and right.

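To make the effect of these transformations concrete, the following sketch reproduces the function shown above (without the annotation markers) and runs it on a few sample queries. The sample queries are our own and only serve as an illustration.

``` ts
// Same logic as the default transform shown above
function defaultTransform(query: string): string {
  return query
    .split(/"([^"]+)"/g)
      .map((terms, index) => index & 1
        ? terms.replace(/^\b|^(?![^\x00-\x7F]|$)|\s+/g, " +")
        : terms
      )
      .join("")
    .replace(/"|(?:^|\s+)[*+\-:^~]+(?=\s+|$)/g, "")
    .trim()
}

// Quoted terms become required terms, unquoted terms stay optional
console.log(defaultTransform('"search engine" fast')) // +search +engine fast

// A control character that stands alone is removed
console.log(defaultTransform("foo + bar"))            // foo bar

// An unmatched quotation mark is filtered
console.log(defaultTransform('"foo bar'))             // foo bar
```
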

If you want to switch to the default behavior of the `mkdocs` and `readthedocs`
themes, neither of which transforms the query prior to submission, or
customize the `transform` function, you can do this by [overriding the
`config` block][overriding blocks]:

``` html
{% extends "base.html" %}

{% block config %}
  {{ super() }}
  <script>
    var __search = {
      transform: function(query) {
        return query
      }
    }
  </script>
{% endblock %}
```

The `transform` function will receive the query string as entered by the user
and must return the processed query string to be submitted to the search index.

[extending the theme]: ../customization.md#extending-the-theme
[overriding blocks]: ../customization.md#overriding-blocks
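
As an illustration of a custom `transform` function, the following sketch turns every query into an `AND` query by prefixing each term with the `+` modifier, unless the term already carries a `+` or `-` modifier. The function name `requireAllTerms` is hypothetical and this is not part of Material for MkDocs. The sketch is written in TypeScript; with the type annotations removed, the same logic could be assigned to `__search.transform` in the override shown above.

``` ts
// Hypothetical custom transform: require every term to be present
function requireAllTerms(query: string): string {
  return query
    .trim()
    .split(/\s+/)                        // split the query at whitespace
    .filter(term => term.length > 0)     // an empty query yields no terms
    .map(term => /^[+-]/.test(term)      // keep existing +/- modifiers
      ? term
      : "+" + term
    )
    .join(" ")
}

console.log(requireAllTerms("search  index")) // +search +index
console.log(requireAllTerms("-draft api"))    // -draft +api
```
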

### Custom search

Material for MkDocs implements search as part of a [web worker]. If you
want to replace the web worker with your own implementation, e.g. to submit
search to an external service, you can add a custom JavaScript file to the
`docs` directory and [override the `config` block][overriding blocks]:

``` html
{% extends "base.html" %}

{% block config %}
  {{ super() }}
  <script>
    var __search = {
      worker: "<url>"
    }
  </script>
{% endblock %}
```

Communication with the search worker is implemented using a designated message
format using discriminated unions, i.e. through the `type` property of the
message. See the following interface definitions to learn about the message
formats:

- [:octicons-file-code-24: `SearchMessage`][SearchMessage]
- [:octicons-file-code-24: `SearchIndex` and `SearchResult`][SearchIndex]

The sequence and direction of messages is rather intuitive:

- :octicons-arrow-right-24: `SearchSetupMessage`
- :octicons-arrow-left-24: `SearchReadyMessage`
- :octicons-arrow-right-24: `SearchQueryMessage`
- :octicons-arrow-left-24: `SearchResultMessage`

[web worker]: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
[SearchMessage]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/integrations/search/worker/message/index.ts
[SearchIndex]: https://github.com/squidfunk/mkdocs-material/blob/master/src/assets/javascripts/integrations/search/_/index.ts
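
The sequence of messages can be sketched with a discriminated union that is narrowed through the `type` property. Note that the message shapes below are hypothetical and heavily simplified for illustration: the `SketchMessage` type, its string `type` values and the `handle` function are assumptions and do not match the real interface definitions linked above, which remain authoritative.

``` ts
// Hypothetical, simplified message shapes for illustration only
type SketchMessage =
  | { type: "setup"; data: { docs: string[] } } // page -> worker
  | { type: "ready" }                           // worker -> page
  | { type: "query"; data: string }             // page -> worker
  | { type: "result"; data: string[] }          // worker -> page

// Documents received with the setup message
let docs: string[] = []

// Handles an incoming message and returns the reply, narrowing on `type`
function handle(message: SketchMessage): SketchMessage {
  switch (message.type) {
    case "setup":
      docs = message.data.docs
      return { type: "ready" }
    case "query": {
      const query = message.data.toLowerCase()
      // Naive substring match standing in for a real search backend
      const hits = docs.filter(doc => doc.toLowerCase().indexOf(query) !== -1)
      return { type: "result", data: hits }
    }
    default:
      throw new Error("Unexpected message type: " + message.type)
  }
}

const ready = handle({ type: "setup", data: { docs: ["Site search", "Code blocks"] } })
const result = handle({ type: "query", data: "search" })

console.log(ready)  // { type: "ready" }
console.log(result) // { type: "result", data: ["Site search"] }
```

In an actual worker script, a function like `handle` would be called from the worker's `message` event handler, and its return value would be sent back with `postMessage`.
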