closes ENG-627
We were using `cheerio` to parse+modify+serialize our rendered HTML to modify links for member attribution. Cheerio's serializer has a [long-standing issue](https://github.com/cheeriojs/cheerio/issues/720) (that we've [had to deal with before](https://github.com/TryGhost/SDK/issues/124)) where it replaces single-quote attributes with double-quote attributes. That was resulting in broken rendering when content used single-quotes such as in HTML cards that have JSON data inside a `data-` attribute or otherwise used single-quotes to avoid escaping double-quotes in an attribute value.
- swapped the implementation that uses `cheerio` for one that uses `html5parser` to tokenize the html string, from there we can loop over the tokens and replace the href attribute values in the original string without touching any other part of the content. Avoids a full parse+serialize process which is both more costly and can result unexpected content changes due to serializer opinions.
- fixes the quote change bug
- uses tokenization directly to avoid cost of building a full AST
- updated Content API Posts snapshot
- one of our fixtures has a missing closing tag which we're no longer "fixing" with a full parse+serialize step in the link replacer (keeps modified src closer to original and better matches behaviour elsewhere in the app / without member-attribution applied)
- the link replacer no longer converts `attr=""` to `attr` (these are equivalent in the HTML spec so no change in behaviour other than preserving the original source html)
- added a benchmark test file comparing the two implementations because the link replacer runs on render so it's used in a hot path
- new implementation has a 3x performance improvement
- the separate files with the old/new implementations have been cleaned up but I've left the benchmark test file in place for future reference
Benchmark results comparing implementations:
```
❯ node test/benchmark.js
LinkReplacer
├─ cheerio: 5.03K /s ±2.20%
├─ html5parser: 16.5K /s ±0.43%
Completed benchmark in 0.9976526670455933s
┌─────────────┬─────────┬────────────┬─────────┬───────┐
│ (index) │ percent │ iterations │ current │ max │
├─────────────┼─────────┼────────────┼─────────┼───────┤
│ cheerio │ '' │ '5.03K/s' │ 5037 │ 5037 │
│ html5parser │ '' │ '16.5K/s' │ 16534 │ 16534 │
└─────────────┴─────────┴────────────┴─────────┴───────┘
```
no issue
When a user fails to pay the first time, and returns to the site, the
next payment will be attributed to Stripe instead of the original
referrer. This change always ignores the Stripe checkout as referrer in
the URL history.
refs: https://github.com/TryGhost/Toolbox/issues/595
We're rolling out new rules around the node assert library, the first of which is enforcing the use of assert/strict. This means we don't need to use the strict version of methods, as the standard version will work that way by default.
This caught some gotchas in our existing usage of assert where the lack of strict mode had unexpected results:
- Url matching needs to be done on `url.href` see aa58b354a4
- Null and undefined are not the same thing, there were a few cases of this being confused
- Particularly questionable changes in [PostExporter tests](c1a468744b) tracked [here](https://github.com/TryGhost/Team/issues/3505).
- A typo see eaac9c293a
Moving forward, using assert strict should help us to catch unexpected behaviour, particularly around nulls and undefineds during implementation.
fixes https://github.com/TryGhost/Team/issues/3331
This adds attribution tracking to the signup form. It sends a newly
created url history when sending the signup API call, this url history
will get translated to a proper attribution and saved on the backend. We
send a history with only a single item that contains the referrer
source, medium and path of the Embed form.
This also makes some changes to the E2E tests so that the tests run
in an https environment instead of about:blank.
refs https://github.com/TryGhost/Team/issues/3172
The outbound link tagger was tagging non http urls (i.e `javascript:`,
`mailto:`) which would prevent these urls from working as expected. This
change only allows urls to be tagged if they use the `http(s)` protocol.
As discussed with the product team we want to enforce kebab-case file names for
all files, with the exception of files which export a single class, in which
case they should be PascalCase and reflect the class which they export.
This will help find classes faster, and should push better naming for them too.
Some files and packages have been excluded from this linting, specifically when
a library or framework depends on the naming of a file for the functionality
e.g. Ember, knex-migrator, adapter-manager
refs TryGhost/Team#2742
- The federal reserve's website returns a 404 for any URLs that include
query params, so our member attribution/outbound link tagging was
breaking any links to the federal reserve's website
- This adds the federal reserve's website to the list of blocked domains
so that we don't append ?ref= to any links to the federal reserve's
website
refs TryGhost/Team#2755
Previous change for outbound tagging had incorrectly changed the internal links ref for newsletter to include domain. This change fixes it back to use newsletter name for internal links.
fixes https://github.com/TryGhost/Team/issues/2433
- Moved all outbound link tagging code to separate OutboundLinkTagger
- Because a site can easily enable/disable this feature, we don't store
the ?refs in the HTML but add them on the fly for now in the Content
API.
fixes https://github.com/TryGhost/Team/issues/2432
Adds outbound_link_tagging setting (enabled by default and behind
feature flag). If the feature flag is enabled, and the setting is
disabled, we won't add ?ref to links in emails.
This includes new E2E tests for email click tracking, which were also
extended to check outbound link tagging (for both MEGA and the new email
stability flow).
Also fixes a test fixture for the comments_enabled setting.
fixes https://github.com/TryGhost/Team/issues/2025
fixes https://github.com/TryGhost/Team/issues/2023
The `ref` attribute has changed in email links:
- We now use the site name when linking to external sites
- We blacklist facebook.com because it doesn't support ref attributes
- '-newsletter' is not repeated anymore if the newsletter name already ends with 'newsletter'
- We always sluggify the ref
- We no longer overwrite existing ref, utm_source or source parameters
refs https://ghost.slack.com/archives/C02G9E68C/p1667834794676479
- When enabling tracking, it could be the case that the server is ignoring the attributions because of the cached setting value.
- When disabling tracking, the frontend should take care of not
collecting new tracking information to the server, but still the backend value should be used as a fail-safe.
closesTryGhost/Team#2007
- uses request context to add referrer source and medium for a new member
- uses integration name as referrer medium if exists
The slug from the fixtures is "default-newsletter" which doesn't correctly
reflect the name of most sites newsletters. Because we're using the URL
constructor it handles all of the URL encoding/decoding on both ends for us.
- renames `refSource`, `refMedium` and `refUrl` to `referrerSource`, `referrerMedium` and `referrerUrl` respectively for consistent naming across files and usages
refs https://github.com/TryGhost/Team/issues/1907
- assigns top-level domain as source for external referrers that don't have known mapping
- returns `Direct` as source for new members when no referrer information is found
refs TryGhost/Team#1923
- updates referrer translator logic to use grouping for known domains/urls to calculate source and medium
- returns first external ref url if nothing matches in history in first pass
refs TryGhost/Team#1907
- calculates final attribution source and medium using captured referrer information in history
- adds new referrer-translator that goes through available history and based to determine most valid referrer info
- includes referrer url, source and medium in the attribution data for storage
refs https://github.com/TryGhost/Team/issues/1899
- Added `addEmailAttributionToUrl` method to MemberAttributionService. This adds both the source attribution (`rel=newsletter`) and member attribution (`?attribution_id=123&attribution_type=post`) to a URL.
- The URLHistory can now contain a new sort of items: `{type: 'post', id: 'post-id', time: 123}`.
- Updated frontend script to read `?attribution_id=123&attribution_type=post` from the URL and add it to the URLHistory + clear it from the URL.
- Wired up some external dependencies to LinkReplacementService and added some dummy code.
- Increased test coverage of attribution service
- Moved all logic that removes the subdirectory from a URL to the UrlTranslator instead of the AttributionBuilder
- The UrlTranslator now parses a URLHistoryItem to an object that can be used to build an Attribution instance
- Excluded sites with different domain from member id and attribution tracking
fixes https://github.com/TryGhost/Team/issues/1821
This change moves all the event storage logic to one new place: the event storage class in the MembersEventsService, which is initialised in a new members events service wrapper.
Apart from this, this includes some improvements:
- Removed DomainEvents from the constructor arguments to the subscribe method (to make it more clear where to subscribe to and decrease dependencies)
- LastSeenAtUpdater doesn't subscribe in the constructor any longer (removes unclear side effect)
- Moved LastSeenAtUpdater initialisation to new members events service wrapper
- Added missing tests to LastSeenAtUpdater to assure that the MembersEventsService package has 100% coverage.
In case there is an issue with the filtering of items in our client
side attribution script, we also check for and remove out of date
items here. This ensures that we do not erroneously attribute signups
or conversions to webpages from more than 24h ago.
This keeps the constructor clean, relying on types for validation,
whilst preserving the validation when creating the instance. The
constructor is now private so that the factory which handles
validation is always used.
The tests have also been updated to test the public factory interface
rather than the internal validation methods. Validation has been
rolled into a single method and slightly improved in the way of
readability.
refs https://github.com/TryGhost/Team/issues/1833
refs https://github.com/TryGhost/Team/issues/1834
We've added the attribution property to subscription and signup events when the
flag is enabled. The attributions resource is fetched by creating multiple relations
on the model, rather than polymorphic as we ran into issues with that as they can't
be nullable/optional.
The parse-member-event structure has been updated to make it easier to work with,
specifically `getObject` is only used when the event is clickable, and there is now a
join property which makes it easier to join the action and the object.