Make validation async #13

julik · 2024-07-05T16:58:01Z

Most mistakes in webhook configuration relate to the secrets used for signatures. While it is sometimes good to reject incorrectly signed webhooks immediately, far more often it is a misconfigured signing secret that is at fault. A better approach is to allow the webhooks to be respooled for processing once the signature configuration has been rectified.

Use valid? in the background job instead of the controller. The most common configuration issue is an incorrectly specified signing secret or incorrectly implemented input validation. When either happens, it is better to allow the webhook to be reprocessed.
Use instance methods in handlers instead of class methods, as they are shorter to define. Assume a handler module supports .new; a module using singleton methods may return self from new.
In the config, allow handlers to be specified as strings. Module resolution in Rails happens after the config gets loaded, because the config may alter the Zeitwerk load paths. To allow the config to get loaded and to allow handlers to be autoloaded using Zeitwerk, the handler modules have to be resolved lazily. This also permits the handlers to be reloadable, like any module under Rails' autoloading control.
Simplify the Rails app used in tests to be small and keep it in a single file.
If a handler is willing to expose errors to the caller, let Rails rescue the error and display an error page or do whatever else is configured for Rails globally.
Store request headers with the received webhook to allow for async validation.
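
A minimal sketch of the flow the list above describes, in plain Ruby with no Rails dependency. `StoredWebhook`, `ExampleHandler`, `receive` and `process_later` are hypothetical names for illustration and are not Munster's API; the HMAC scheme and the `HTTP_X_SIGNATURE` header are likewise assumptions.

```ruby
require "openssl"

# Hypothetical stand-in for the persisted record: the raw body plus the
# request headers, which is what makes validation possible later.
StoredWebhook = Struct.new(:body, :request_headers, :status)

# Hypothetical handler using instance methods.
class ExampleHandler
  def initialize(secret)
    @secret = secret
  end

  # Recomputes the signature from the stored body and compares it
  # with the signature header that was stored alongside it.
  def valid?(webhook)
    expected = OpenSSL::HMAC.hexdigest("SHA256", @secret, webhook.body)
    given = webhook.request_headers.fetch("HTTP_X_SIGNATURE", "")
    OpenSSL.secure_compare(expected, given)
  end
end

# "Controller" step: persist unconditionally, without validating.
def receive(body, headers)
  StoredWebhook.new(body, headers, :received)
end

# "Background job" step: validate, then record the outcome.
def process_later(webhook, handler)
  webhook.status = handler.valid?(webhook) ? :processed : :skipped
  webhook
end

body = '{"event":"ping"}'
signature = OpenSSL::HMAC.hexdigest("SHA256", "right-secret", body)
webhook = receive(body, {"HTTP_X_SIGNATURE" => signature})

process_later(webhook, ExampleHandler.new("wrong-secret")).status # => :skipped
process_later(webhook, ExampleHandler.new("right-secret")).status # => :processed
```

Because the body and headers are stored before validation, the same record can be run again once the secret has been corrected, which is the respooling case the description argues for.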

Closes #12 #11

It should be possible to run tests from the Rakefile, and the default action for `rake` should be to run tests. Running tests (working code) is more important than formatting (standard) - so first test, then lint.

The handler is stateless, so it doesn't have to linger on in memory across calls. But it seems that the serialized ActionDispatch request loses its route params.

Rails loads the Engine module on its own, and the module loading is handled by Bundler/rubygems. This also removes a circular load warning which would happen when running tests.

To configure handlers during Rails init, you need to force-require them - which goes contrary to the Rails autoloading.
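
A small sketch of what lazy resolution of string-configured handlers could look like. `HandlerRegistry`, `ExampleHandler` and `SingletonStyleHandler` are hypothetical names, not Munster's classes. In a Rails app, `String#constantize` from Active Support would be the usual equivalent of `Object.const_get`.

```ruby
# Hypothetical registry: handler names are kept as strings and only turned
# into constants when a webhook actually needs them.
class HandlerRegistry
  def initialize(handlers)
    @handlers = handlers # e.g. {"example" => "ExampleHandler"}
  end

  # Resolves on every lookup, so a reloaded constant is picked up as well.
  def lookup(service_id)
    name = @handlers.fetch(service_id.to_s)
    handler = name.is_a?(String) ? Object.const_get(name) : name
    handler.new
  end
end

# Configured before either handler constant exists.
registry = HandlerRegistry.new({
  "example" => "ExampleHandler",
  "legacy" => "SingletonStyleHandler"
})

# Defined after configuration, mimicking a class that the autoloader
# makes available later.
class ExampleHandler
  def process(webhook)
    "processed #{webhook}"
  end
end

# A module with singleton methods can satisfy the same contract
# by returning itself from .new.
module SingletonStyleHandler
  def self.new
    self
  end

  def self.process(webhook)
    "module processed #{webhook}"
  end
end

registry.lookup("example").process("evt_1") # => "processed evt_1"
registry.lookup("legacy").process("evt_2")  # => "module processed evt_2"
```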

There is no need to have an entire "dummy" app structure. The same can be achieved with a single-file app (see https://greg.molnar.io/blog/a-single-file-rails-application/). With that, all the directories for model tests etc. can re-appear once we actually have tests to put inside of them. Same for fixtures: is it really necessary to have a fixture for a webhook payload which fits in a string, and to use the entire Rails fixture machinery? We are after finding out whether our webhook ends up in the DB correctly. Less is more.

skatkov · 2024-07-10T14:16:26Z

lib/munster/base_handler.rb

-    rescue ActiveRecord::RecordNotUnique # Deduplicated
-      nil
+    rescue ActiveRecord::RecordNotUnique # Webhook deduplicated
+      Rails.logger.info { "#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored." }


praise: I would suggest using Rails.error.report("#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored.", handled: true, severity: :info) here instead. We already rely on it, and it feels confusing to use .logger in one case and .error in another.

Suggested change

-      Rails.logger.info { "#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored." }
+      Rails.error.report("#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored.", handled: true, severity: :info)

skatkov · 2024-07-10T14:19:36Z

lib/munster/jobs/processing_job.rb

+        webhook.handler.process(webhook)
+        # TODO: remove process attribute
+      else
+        Rails.logger.info { "Webhook #{webhook.inspect} did not pass validation and was skipped" }


praise: I would prefer that you use the Rails.error interface here, since we already rely on it.

Any reason you decided not to use Rails.error.report("Webhook #{webhook.inspect} did not pass validation and was skipped", handled: true, severity: :info) here?

skatkov · 2024-07-10T14:29:30Z

lib/munster/base_handler.rb

+    # we default it to `false`. The default is going to be `true` in future versions of Munster.
+    #
+    # @return [Boolean]
+    def validate_async?


thought: I lean towards making all validation asynchronous, as async processing is a core principle of the Munster gem.

Regarding the problem you describe:

"This prevents malicious senders from spamming your DB and causing a denial-of-service on it. That's why this is made configurable."

I have two objections to this approach:

1. In my experience, I've never encountered this issue. It seems unlikely that flooding a webhook endpoint that performs only a single insert operation would cause a denial-of-service. Attackers would likely target other endpoints that perform more complex operations.

2. If such an attack were to occur, I would address it differently. Implementing rate limiting for the entire webhooks endpoint seems like a more effective solution.

Additionally, since this gem hasn't reached a stable release yet, we don't need to maintain backward compatibility.

julik · 2024-07-12T11:33:08Z

My idea was to ensure backwards compatibility for webhooks which have been persisted without headers; for those, background validation would not be possible. But given the gem is still officially in a "flux" state, I think we are at liberty to make breaking changes as we see fit. Let's make all validations async. Rate limits on the ingress without validation might be the wrong approach, but then again, this is prematurely tweaking for problems we ourselves haven't observed yet. So 🚀

skatkov · 2024-07-10T14:34:33Z

lib/munster/controllers/receive_webhooks_controller.rb

    end

    class HandlerInactive < StandardError
    end

-    def create
-      handler = lookup_handler(params[:service_id]).new
+    class UnknownHandler < StandardError


nitpick: Should we maybe move these error definitions elsewhere? Not sure why they are defined under the ReceiveWebhooksController class.

julik · 2024-07-12T11:34:10Z

They are convenient to place in the controller because they don't need a namespace to raise and to match, and this is the only spot where they get raised. Once we transition to a setup where these errors may get raised from the background job too, we'll move them into the gem namespace.

skatkov · 2024-07-10T14:35:31Z

lib/munster/models/received_webhook.rb

+      if self.class.column_names.include?("request_headers")
+        write_attribute("request_headers", headers)
+      else
+        Rails.logger.warn { "You need to run Munster migrations so that request headers can be persisted with the model. Async validation is not going to work without that column being set." }


nitpick: I suggest we use the Rails.error interface.

julik · 2024-07-12T11:36:57Z

I think I will remove this error path altogether - it is reasonable to expect people to migrate before running the app.

skatkov · 2024-07-10T14:38:13Z

lib/munster/models/received_webhook.rb


 module Munster
  class ReceivedWebhook < ActiveRecord::Base
+    MISSING_HEADERS_COLUMN_ERROR = <<~EOS


suggestion: Can we please move these errors into a separate file? Our errors are a bit all over the place now (some are in controllers, some are here in models)

Can we also subclass StandardError for these, please?

julik · 2024-07-12T11:35:35Z

That I am not going to do - co-locating these errors with the only spot they can get raised is an affordance to the reader, and it is deliberate. However, if we are not so fixed on maintaining backwards compatibility (and given your review I think we should not be so fixed on it, after all) these errors can just be deleted.

skatkov · 2024-07-10T15:01:10Z

lib/munster/models/received_webhook.rb

+      raise MISSING_HEADERS_COLUMN_ERROR unless self.class.column_names.include?("request_headers")
+      raise MISSING_HEADERS_ERROR if request_headers.blank?
+      headers = try(:request_headers) || {}
+      ActionDispatch::Request.new(headers.merge!("rack.input" => StringIO.new(body.to_s.b)))


thought: I'm kind of wondering:

How much space would all this data take up? In most cases, it will probably take up more space than the body itself. Do we need to purge records from the database in this case?

Can this create issues with future versions of Rails if ActionDispatch::Request changes? Why is saving just the headers not enough?

julik · 2024-07-12T11:42:21Z

Re. size - as big as is necessary. We do need to strip the raw post body though, which I forgot to do - essentially we store the request body twice.

I am using ActionDispatch::Request for a number of reasons.

It was already used in valid? in all handlers, and all our handlers which validate expect that API to be available on the passed argument

ActionDispatch::Request provides access to params...

...and more importantly - to the path params, which get set by the Rails router.

While we did implement the original Munster handlers with the expectation that they will be re-parsing the request body themselves, keeping the entire request available allows Rails to take over for parsing, and folks can use params like they are used to in Rails controllers, which is also a nice UX improvement IMO.

None of these affordances are possible to realise by other means. The API for ActionDispatch::Request.new is public API in Rails - if it changes, it would be just the same change as other changes in Rails internals, and our tests should catch that it changed.

skatkov · 2024-07-12

"We do need to strip the raw post body though, which I forgot to do - essentially we store the request body twice."

In this case, it seems reasonable to keep the body in ActionDispatch::Request and not store it in a separate column.
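
For illustration, the env-rebuilding step in the diff above can be sketched without Rails. `rebuild_rack_env` is a hypothetical helper, and the header names are made up; in the PR, the resulting hash is what gets passed to `ActionDispatch::Request.new`. This sketch uses `merge` rather than `merge!` so the stored headers are left untouched, which is a choice of the sketch and not necessarily of the PR.

```ruby
require "stringio"

# Hypothetical helper: rebuilds a Rack-style env from the stored headers
# and the stored body.
def rebuild_rack_env(stored_headers, body)
  # The body is exposed as a binary IO under "rack.input", where Rack
  # applications expect to find the request body.
  stored_headers.merge("rack.input" => StringIO.new(body.to_s.b))
end

headers = {
  "REQUEST_METHOD" => "POST",
  "CONTENT_TYPE" => "application/json",
  "HTTP_X_SIGNATURE" => "abc"
}

env = rebuild_rack_env(headers, '{"ok":true}')
raw = env["rack.input"].read

raw                        # => "{\"ok\":true}"
headers.key?("rack.input") # => false, the stored hash was not mutated
```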

julik

Thanks for your comments, quite a few good ideas there ❤️

skatkov

Nicely done! Great improvement.

I just left a couple of comments where you can clean things up slightly ;-)

skatkov · 2024-07-23T09:31:38Z

lib/munster/base_handler.rb

-    rescue ActiveRecord::RecordNotUnique # Deduplicated
-      nil
+    rescue ActiveRecord::RecordNotUnique # Webhook deduplicated
+      Rails.logger.info { "#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored." }


Suggested change

-      Rails.logger.info { "#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored." }
+      Rails.error.report("#{inspect} Webhook #{handler_event_id} is a duplicate delivery and will not be stored.", handled: true, severity: :info)

skatkov · 2024-07-23T09:40:37Z

lib/munster/jobs/processing_job.rb

+        webhook.handler.process(webhook)
+        # TODO: remove process attribute
+      else
+        Rails.logger.info { "Webhook #{webhook.inspect} did not pass validation and was skipped" }


Any reason you decided not to use Rails.error.report("Webhook #{webhook.inspect} did not pass validation and was skipped", handled: true, severity: :info) here?

skatkov · 2024-07-23T09:46:09Z

test/test_helper.rb

-# Configure Rails Environment
-ENV["RAILS_ENV"] = "test"
+# ENV["RAILS_ENV"] = "test"
+#


nitpick: it would be nice to clean up the comment here.

julik · 2024-07-23T12:02:29Z

@skatkov I've decided not to use the error reporter because it is for errors and also expects an error object of some kind. We are just writing a log message. I don't see the Rails error reporting as a vehicle for structured logging, and I don't know whether, for example, a given subscriber on the error reporter chain will or will not try to read out backtrace or other things normally associated with an Exception subclass.

skatkov · 2024-07-23T12:05:20Z

@julik Sounds reasonable. Thanks for the explanation.
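
As a side note on the logging calls discussed above: the diff uses the block form of the logger. The sketch below uses the standard library `Logger` rather than `Rails.logger`, and `lazy_logging_demo` is a hypothetical helper. It only illustrates that a block-form message is not built when the severity is below the logger's level.

```ruby
require "logger"
require "stringio"

# Hypothetical demo: returns how many message blocks ran and what was written.
def lazy_logging_demo
  io = StringIO.new
  logger = Logger.new(io, level: Logger::WARN)
  built = 0

  # Below the threshold: the block is not called, so the string is never built.
  logger.info do
    built += 1
    "Webhook evt_1 is a duplicate delivery and will not be stored."
  end

  # At the threshold: the block runs and the line is written.
  logger.warn do
    built += 1
    "Webhook evt_2 did not pass validation and was skipped"
  end

  [built, io.string]
end

built, output = lazy_logging_demo
built                    # => 1
output.include?("evt_2") # => true
output.include?("evt_1") # => false
```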

julik added 28 commits July 5, 2024 18:15

Fix grammar in migration

7426558

Add "validate_async?" on BaseHandler

020f5ca

Rework the Rakefile

4492fee

It should be possible to run tests from the Rakefile, and the default action for `rake` should be to run tests. Running tests (working code) is more important than formatting (standard) - so first test, then lint.

Add x86 gems

ad49f46

Run Rake in GH CI

746152e

Slepping

e8a95ab

Tackle the instance/module ambiguity

bb4df21

Improve BaseHandler comments a bit

686c225

And some more commentage

ae1bb05

Some more motions

5bac93e

Continue

58107da

It looks like class/module methods are better

0b5d285

The handler is stateless, so it doesn't have to linger on in memory across calls. But it seems that the serialized ActionDispatch request loses its route params.

Treat path params separately

c96b9f5

Remove empty Rails test dirs

8f238a1

More docs and things

4e6df37

Yeap

f7c5121

Patch up the README

c694b9e

No need to require Munster within itself

254c7ed

Rails loads the Engine module on its own, and the module loading is handled by Bundler/rubygems. This also removes a circular load warning which would happen when running tests.

Apply the migration

46ee3cf

Add a route to test path params

33aee2b

Simplify exception handling

4b6d04e

Allow handlers to be set via strings

9e10e7d

To configure handlers during Rails init, you need to force-require them - which goes contrary to the Rails autoloading.

Revert back to instance methods

9073803

Continue

0d3a8dc

Remove unused top-level method

8fff5ce

Continue evil experiments

d8fc4f9

skatkov reviewed Jul 10, 2024

julik commented Jul 12, 2024

julik added 5 commits July 12, 2024 15:59

This test does not do much

bab1e70

Make all validations async

5415f95

Use "json" for headers

abef494

"jsonb" is only available in Postgres

Continue

642167c

Continue

2f02a95

skatkov force-pushed the make-validation-async branch from 051329e to c3b3624 Compare July 22, 2024 10:36

load global_id/railitie

5e8a17f

skatkov force-pushed the make-validation-async branch from c3b3624 to 5e8a17f Compare July 22, 2024 10:37

julik added 5 commits July 22, 2024 18:10

Continue re-enabling tests

03bc2dd

And some more

aabb01a

Some changes to error handling

869f8e0

Split the test handlers for clarity

3bad1d3

Update changelog

07ac440

julik marked this pull request as ready for review July 22, 2024 18:33

julik requested a review from skatkov July 22, 2024 18:33

skatkov approved these changes Jul 23, 2024

julik added 2 commits July 23, 2024 13:03

Remove unused test helper code

4942fd0

Fixtures are no longer used

bcc1d4c

julik merged commit d98041f into main Jul 23, 2024
1 check passed

julik mentioned this pull request Jul 23, 2024

Make handlers lazy-loadable via Zeitwerk #11

Closed
