fixes https://github.com/TryGhost/Team/issues/2366
refs https://ghost.slack.com/archives/C02G9E68C/p1670232405014209
Probem described in issue.
In the old MEGA flow:
- The `email_verification_required` check is now repeated inside the job
In the new email service flow:
- The `email_verification_required` is now checked (didn't happen
before)
- When generating the email batch recipients, we only include members
that were created before the email was created. That way it is
impossible to avoid limit checks by inserting new members between
creating an email and sending an email.
- We don't need to repeat the check inside the job because of the above
changes
Improved handling of large imports:
- When checking `email_verification_required`, we now also check if the
import threshold is reached (a new method is introduced in
vertificationTrigger specifically for this usage). If it is, we start
the verification progress. This is required for long running imports
that only check the verification threshold at the very end.
- This change increases the concurrency of fastq to 3 (refs
https://ghost.slack.com/archives/C02G9E68C/p1670232405014209). So when
running a long import, it is now possible to send emails without having
to wait for the import. Above change makes sure it is not possible to
get around the verification limits.
Refactoring:
- Removed the need to use `updateVerificationTrigger` by making
thresholds getters instead of fixed variables.
- Improved awaiting of members import job in regression test
refs https://github.com/TryGhost/Team/issues/2339
- Includes a new pattern in the job manager that allows us to properly
await jobs.
- Added new convenience mocking methods to stub settings
- Tests the main flows for bulk sending:
- Sending in multiple batches
- Sending to multiple segments
- Handling a failed batch and retrying that batch
- Fixes bug in batch generation (ordering not working)
In a different PR I'll add more detailed tests.
fixes https://github.com/TryGhost/Team/issues/2310
This moves the processing of the events from the event-processor to a
new email-event-processor in the email-service package.
- The `EmailEventProcessor` only translates events from
providerId/emailId to their known emailId, memberId and recipientId, and
dispatches the corresponding events.
- Since `EmailEventProcessor` runs in a separate worker thread, we can't
listen for the dispatched events on the main thread. To accomplish this
communication, the events dispatched from the `EmailEventProcessor`
class are 'posted' via the postMessage method and redispatched on the
main thread.
- A new `EmailEventStorage` class reacts to the email events and stores
it in the database. This code mostly corresponds to the (now deleted)
subclass of the old `EmailEventProcessor`
- Updating a members last_seen_at timestamp has moved to the
lastSeenAtUpdater.
- Email events no longer store `ObjectID` because these are not
encodable across threads via postMessage
- Includes new E2E tests that test the storage of all supported Mailgun
events. Note that in these tests we run the processing on the main
thread instead of on a separate thread (couldn't do this because
stubbing is not possible across threads)
There are some missing pieces that will get added in later PRs (this PR
focuses on porting the existing functionality):
- Handling temporary failures/bounces
- Capturing the error messages of bounce events
closes https://github.com/TryGhost/Toolbox/issues/402
- The SQL error was thrown whenever a job error was happening and was trying to persist an error. Persisting an error should only happen for "named" one-off jobs, instead of just one-off jobs.
refs https://github.com/TryGhost/Toolbox/issues/358
- When a one-off job fails it could be restarted during the next call, given it has been cleared from the job queue.
- This readding WILL NOT work for jobs that are restarted within same process (while being kept in the bree's queue). It's specifically targetting one-off jobs like migrations that **might** fail and are only added once per process lifetime.
refs https://github.com/TryGhost/Toolbox/issues/358
- Without going into the model layer (schema) for a job it's hard to figure out which job statuses are available. Using an object with hard typed properties makes the code less prone to typos.
refs https://github.com/TryGhost/Toolbox/issues/358
- The method is a bit of a dangerous to use in cases when the job takes a long time to execute.
- Returning a boolean value did not make sense and provided no helpful information. Having a job model (or not having one) gives the context in which the "completion" happened.
refs ttps://github.com/TryGhost/Toolbox/issues/358
- One off jobs need a way to check for prior execution and await for their completion (in cases when it is reasonably short).
- Added `hasExecuted` and `awaitCompletion` methods to the job manager allowing to monitor one off job state
refs https://github.com/TryGhost/Toolbox/issues/359
- Sending newsletters got broken because underlying "inline job" execution had a bug.
- The real problem was in the job manager trying to verify inline unnamed job status in the database without having a name.
refs https://github.com/TryGhost/Toolbox/issues/359
- Inline one off jobs are needed in situations when we want to run a certain operation only once in the lifecycle of the Ghost instance. These operations should not be extremely long to execute though (not suited for backups or import types of tasks)
refs https://github.com/TryGhost/Toolbox/issues/359
- It's up to a user to decide initializing the job manager without a "jobModel". In these cases the regular recurring job scheduling should work as it did before
refs https://github.com/TryGhost/Toolbox/issues/357
- Job persisted in the database need to track job's execution status such as completion, failure, execution start and end times. This changeset allows to hook into job/bree lifecycle to track job's progress.
- NOTE: only supports "offloaded" jobs at the moment. Support for "inline" jobs will be added once there's a clear usecase for it.
- The "started" status and "started_at" timestamp are assigned to a job at the moment when the worker thread is created inside of bree
- The "finished" status and "finished_at" timestamp are assigned to a job when a "done" event is passed from the job script (NOTE: using process.exit(0) will not trigger the "finished" state")
- The "failed" status is assigned when the job execution is interrupted with an error
refs https://github.com/TryGhost/Toolbox/issues/357
- This is a scaffolding for what will become a one off job scheduling mechanism. The aim is allowing to run jobs which can be only ever be run once in the lifetime of the instance - persisting through restarts.
https://github.com/TryGhost/Toolbox/issues/357
- This is a groundwork before adding one-off (solo) jobs with persistance to the job manager
- Making the addJob method async also makes the whole interface consistent - removeJob and shutdown are also async
refs https://github.com/TryGhost/Ghost/issues/12496
- `workerMessageHandler` option allows for custom worker message handling and allows to eliminate a need for loggers of any type inside of jobs.
- removing loggers from jobs solves file hanle leak which used to cause Ghost process to crash (see referenced issue)
refs #122
- In future changes there's a plan to add "inline" scheduled jobs, which would conflict current method naming.
- The amount of parameters in the methods was more than 3, so it made sense to transform them into an options object
- Scheduling still doesn't work for "inline" jobs. This should be solved as a part of upstream library (https://github.com/breejs/bree/issues/68)
closes https://github.com/TryGhost/Ghost-Utils/issues/118
- Custom error handling is needed to be able to override default bree
error handling logic.
- bree bump to 4.1.0 also fixed logging errors (object Object fix in
tests)
- The handler function receives two parameters. First contains an error
that has been thrown by the job. Second, job and worker metadata
closes#117
- Having immediately executable offloaded jobs is necessary to be able to run usecases like: send batched emails now, or any other job that does not need to be scheduled
- Changed "simple" job timeout to make tests run faster
closes#119
- A future use-case which this feature caters for is allowing to migrate "post scheduler" to use job manager instead of managing scheduling itself
- removeJob method will be needed to allow "rescheduling" of the post
no issue
- When providing a crontab schedule expression it should always contain 6 elements first one of them being a "seconds" schedule description . For example: '0/5 * * * * *' - meaning to run every 5 seconds
no issue
- When jobs are performing CPU intensive tasks they block main process'
event loop. They also can cause memory leaks or unexpected crashes
effectively crashing the parent proccess. To address these issues jobs need to be performed off of main
process. Worker Threads (https://nodejs.org/dist/latest-v12.x/docs/api/worker_threads.html)
are the best candidate for such work.
- These changes introduce an integration on top of bree
(https://github.com/breejs/bree/) which allows to run recurring
jobs in worker thereads. It falls back to child process execution for
Node v10 running without `--experimental-worker` flag.
- bree was chosen not only because it gives a polyfill for older Node
versions. It has support for some of the future use-cases Ghost is looking to
implement, like scheduled jobs.
- This changeset also includes a complete example of job running on an
interval with a possibility for graceful shutdown
no issue
- This is a quick implementation change to prevent from queue stalling and never becoming idle in case there's an error thrown from within the job function/module
- More robust error handling should be designed soon!
no issue
- Jobs should not always be functions, one of the standard practices is having a job defined in a module as a self contained executable function
- First parameter of `addJob` function now also handles path to modules which it imports and executes
no issue
- Accepting the schedule or a data when scheduled job should be run would follow a signature used in other established frameworks: (1) https://github.com/mperham/sidekiq/wiki/Ent-Periodic-Jobs#definition
- Another reason to put scheduling parameter first is this would allow leaving an optional "data" parameter as last
no issue
- CRON format is the most common one used for job scheduling and is well known to most developers
- This will become one of supported formats for job scheduling