The unicode standard allows for certain (visually) identical characters to
be represented in different ways.
For example the character ä may be represented as a single combined
codepoint "Latin Small Letter A with Diaeresis" (U+00E4) or by the
combination of "Latin Small Letter A" (U+0061) followed by "Combining
Diaeresis" (U+0308).
When encoded with UTF-8, these are represented as respectively the two
bytes 0xC3 0xA4, and the three bytes 0x61 0xCC 0x88.
A user linking to notes with these characters in their titles would
expect these two variants to link to the same file, given they are
visually identical and have the exact same semantic meaning.
The unicode standard defines a method to deconstruct and normalize these
forms, so that a byte comparison on the normalized forms of these
variants ends up comparing the same thing. This is called Unicode
Normalization, defined in Unicode® Standard Annex #15
(http://www.unicode.org/reports/tr15/).
The W3C Working Group has written an excellent explanation of the
problems regarding string matching, and how unicode normalization helps
with this process: https://www.w3.org/TR/charmod-norm/#unicodeNormalization
With this change, obsidian-export will perform unicode normalization
(specifically the C (or NFC) normalization form) on all note titles
while looking up link references, ensuring visually identical links are
treated as being similar, even if they were encoded as different
variants.
A special thanks to Hans Raaf (@oderwat) for reporting and helping track
down this issue.
---
Closes#126
Instead of passing clones of context and the markdown tree to
postprocessors, pass them a mutable reference which may be modified
in-place.
This is a breaking change to the postprocessor implementation, changing
both the input arguments as well as the return value:
```diff
- dyn Fn(Context, MarkdownEvents) -> (Context, MarkdownEvents, PostprocessorResult) + Send + Sync;
+ dyn Fn(&mut Context, &mut MarkdownEvents) -> PostprocessorResult + Send + Sync;
```
With this change the postprocessor API becomes a little more ergonomic
to use however, especially making the intent around return statements more clear.
This change introduces a new `--hard-linebreaks` CLI argument. When
used, this converts soft line breaks to hard line breaks, mimicking
Obsidian's "Strict line breaks" setting.
Implementation detail: I considered naming this flag
`--strict-line-breaks` to be consistent with Obsidian itself, however I
feel the name is somewhat misleading and ill-chosen.
This introduces support for postprocessors that are run on the result of
a note that is being embedded into another note. This differs from the
existing postprocessors (which remain unchanged) that run once all
embeds have been processed and merged with the final note.
These "embed postprocessors" may be set through the new
`Exporter::add_embed_postprocessor` method.
This introduces a new `--start-at` CLI argument and corresponding
`start_at()` method on the Exporter type that allows exporting of only a
given subdirectory within a vault.
See the updated README file for more details on when and how this may be
used.
Add support for postprocessing of Markdown prior to writing converted
notes to disk.
Postprocessors may be used when making use of Obsidian export as a Rust
library to do the following:
1. Modify a note's `Context`, for example to change the destination
filename or update its Frontmatter.
2. Change a note's contents by altering `MarkdownEvents`.
3. Prevent later postprocessors from running or cause a note to be
skipped entirely.
Future releases of Obsidian export may come with built-in postprocessors
for users of the command-line tool to use, if general use-cases can be
identified.
For example, a future release might include functionality to make notes
more suitable for the Hugo static site generator. This functionality
would be implemented as a postprocessor that could be enabled through
command-line flags.
A recent Obsidian update expanded the list of allowed characters in
filenames, which now includes `?` as well. This needs to be
percent-encoded for proper links in static site generators like Hugo.
Notes with an underscore would fail to be recognized within Obsidian
`[[_WikiLinks]]` due to the assumption that the underlying Markdown
parser (pulldown_cmark) would emit the text between [[ and ]] as a
single event.
The note parser has now been rewritten to use a more reliable state
machine which correctly recognizes this corner-case (and likely some
others).
Previously, `filter_fn` on the `WalkOptions` struct looked like:
pub filter_fn: Option<Box<&'static FilterFn>>,
This boxing was unneccesary and has been changed to:
pub filter_fn: Option<&'static FilterFn>,
This will only affect people who use obsidian-export as a library in
other Rust programs, not users of the CLI.
For those library users, they no longer need to supply `FilterFn`
wrapped in a Box.
This commit fixes a bug where, if a note contained uppercase characters
(for example `Note.md`) but was referred to using lowercase
`(`[[note]]`), that note would not be found.
It's possible to end up with "recursive embeds" when two notes embed
each other. This happens for example when a `Note A.md` contains
`![[Note B]]` but `Note B.md` also contains `![[Note A]]`.
By default, this will trigger an error and display the chain of notes
which caused the recursion.
Using the new `--no-recursive-embeds`, if a note is encountered for a
second time while processing the original note, rather than embedding it
again a link to the note is inserted instead to break the cycle.
See also: https://github.com/zoni/obsidian-export/issues/1
By default hidden files, patterns listed in `.export-ignore` as well as
any files ignored by git are excluded from exports. This behavior has
been made configurable on the CLI using the new flags `--hidden`,
`--ignore-file` and `--no-git`.
Previously, links referencing a heading (`[[note#heading]]`) would just
link to the file name without including an anchor in the link target.
Now, such references will include an appropriate `#anchor` attribute.
Note that neither the original Markdown specification, nor the more
recent CommonMark standard, specify how anchors should be constructed
for a given heading.
There are also some differences between the various Markdown rendering
implementations.
Obsidian-export uses the [slug] crate to generate anchors which should
be compatible with most implementations, however your mileage may vary.
(For example, GitHub may leave a trailing `-` on anchors when headings
end with a smiley. The slug library, and thus obsidian-export, will
avoid such dangling dashes).
[slug]: https://crates.io/crates/slug
Previously, partial embeds (`![[note#heading]]`) would always include
the entire file into the source note. Now, such embeds will only include
the contents of the referenced heading (and any subheadings).
Links and embeds of [arbitrary blocks] remains unsupported at this time.
[arbitrary blocks]: https://publish.obsidian.md/help/How+to/Link+to+blocks
Links within an embedded note would point to other local resources
relative to the filesystem location of the note being embedded.
When a note inside a different directory would embed such a note, these
links would point to invalid locations.
Now these links are calculated relative to the top note, which ensures
these links will point to the right path.
This reduces the need to pass vault_contents around in various places
and restricts Context to dealing with the actual note which is being
processed, instead of also carrying program state information.
This will help with future feature development as note parsing functions
can now access Exporter directly.
This refactors the Context to maintain a list of all the files which
have been processed so far in a chain of embeds. This information is
then used to print a more helpful error message to users of the CLI when
RecursionLimitExceeded is returned.