Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

n+1 for side loaded records #83

Open
adambedford opened this issue Sep 21, 2017 · 11 comments
Open

n+1 for side loaded records #83

adambedford opened this issue Sep 21, 2017 · 11 comments

Comments

@adambedford
Copy link

adambedford commented Sep 21, 2017

I'm using JSON API include to side load two associations (two levels deep: company,company.company-master) and I'm seeing an N+1 issue when plugging in Pundit + JSONAPI Athorization into my JSONAPI Resources Processor stack.

I've tried explicitly including those two associations in the policy scope with no success, although this shouldn't be required since JSONAPI Resources should be handling this. It does when JSONAPI Authorization isn't involved.

The logs produced:

Started GET "/api/v1/job-applications?include=company%2Ccompany.company-master" for ::1 at 2017-09-20 18:11:06 -0700
[NAME COLLISION] `type` is a reserved key in EventResource.
[NAME COLLISION] `type` is a reserved key in AccountResource.
Processing by Api::V1::JobApplicationsController#index as JSON
  Parameters: {"include"=>"company,company.company-master"}
  User Load (0.7ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2  [["id", 6], ["LIMIT", 1]]
  JobApplication Load (1.4ms)  SELECT  "job_applications".* FROM "job_applications" WHERE "job_applications"."user_id" = 6 ORDER BY "job_applications"."id" ASC LIMIT $1 OFFSET $2  [["LIMIT", 1000], ["OFFSET", 0]]
  Events::Applied Load (1.6ms)  SELECT "events".* FROM "events" WHERE "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 AND "events"."subject_id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  Source Load (0.7ms)  SELECT "sources".* FROM "sources" WHERE "sources"."id" = 21
   (1.8ms)  SELECT "job_applications"."id", "companies"."id" FROM "job_applications" INNER JOIN "companies" ON "companies"."id" = "job_applications"."company_id" LEFT OUTER JOIN "events" ON "events"."subject_id" = "job_applications"."id" AND "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 LEFT OUTER JOIN "sources" ON "sources"."id" = "job_applications"."source_id" WHERE "job_applications"."user_id" = 6 AND "job_applications"."id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  Company Load (1.0ms)  SELECT "companies".* FROM "companies" WHERE "companies"."user_id" = 6 AND "companies"."id" IN (260, 285, 286, 287, 288)
  CompanyMaster Load (1.3ms)  SELECT "company_masters".* FROM "company_masters" WHERE "company_masters"."id" IN (338, 312, 79, 301, 359)
   (1.8ms)  SELECT "job_applications"."id", "companies"."id", "company_masters"."id" FROM "job_applications" INNER JOIN "companies" ON "companies"."id" = "job_applications"."company_id" INNER JOIN "company_masters" ON "company_masters"."id" = "companies"."company_master_id" LEFT OUTER JOIN "events" ON "events"."subject_id" = "job_applications"."id" AND "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 LEFT OUTER JOIN "sources" ON "sources"."id" = "job_applications"."source_id" WHERE "job_applications"."user_id" = 6 AND "job_applications"."id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  CompanyMaster Load (1.7ms)  SELECT "company_masters".* FROM "company_masters" WHERE "company_masters"."id" IN (338, 312, 79, 301, 359)
   (0.6ms)  SELECT COUNT(*) FROM "job_applications" WHERE "job_applications"."user_id" = 6
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 287], ["LIMIT", 1]]
  CompanyMaster Load (0.5ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 301], ["LIMIT", 1]]
  Company Load (0.9ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 288], ["LIMIT", 1]]
  CompanyMaster Load (3.7ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 359], ["LIMIT", 1]]
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 286], ["LIMIT", 1]]
  CompanyMaster Load (2.6ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 79], ["LIMIT", 1]]
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 285], ["LIMIT", 1]]
  CompanyMaster Load (0.7ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 312], ["LIMIT", 1]]
  Company Load (0.5ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 260], ["LIMIT", 1]]
  CompanyMaster Load (0.6ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 338], ["LIMIT", 1]]
  Rendering text template
  Rendered text template (0.4ms)
Completed 200 OK in 576ms (Views: 1.7ms | ActiveRecord: 45.0ms)

As you can see, the side loaded records are fetched as normal. However, after that, individual queries are performed, which shouldn't be happening.

This is my policy:

class JobApplicationPolicy < ApplicationPolicy
  def index?
    current_user.has_role?(:jobseeker)
  end
  alias_method :new?, :index?
  alias_method :edit?, :index?
  alias_method :create?, :index?
  alias_method :update?, :index?
  alias_method :destroy?, :index?

  class Scope < Scope
    def resolve
      # Applied Event is included because model delegates `date_applied` to `applied_event`
      new_scope = scope.includes(:applied_event, :source)

      if current_user.has_role?(:admin) || current_user.has_role?(:advisor)
        new_scope.joins(user: [:account]).where(accounts: { id: current_user.account.id })
      else
        new_scope.where(user: current_user)
      end
    end
  end
end

The scope returned should already include the includes(company: [:company_master]), and explicitly adding it doesn't resolve the issue.

@valscion
Copy link
Member

I was worried we'd have something like this.

Most likely the culprit is in here, where we authorize that you have access to display every related record when using the ?include= to side-load resources.

def authorize_include_item(resource_klass, source_record, include_item)
case include_item
when Hash
# e.g. {articles: [:comments, :author]} when ?include=articles.comments,articles.author
include_item.each do |rel_name, deep|
authorize_include_item(resource_klass, source_record, rel_name)
relationship = resource_klass._relationship(rel_name)
next_resource_klass = relationship.resource_klass
Array.wrap(
source_record.public_send(
relationship.relation_name(context: context)
)
).each do |next_source_record|
deep.each do |next_include_item|
authorize_include_item(
next_resource_klass,
next_source_record,
next_include_item
)
end
end
end
when Symbol
relationship = resource_klass._relationship(include_item)
case relationship
when JSONAPI::Relationship::ToOne
related_record = source_record.public_send(
relationship.relation_name(context: context)
)
return if related_record.nil?
authorizer.include_has_one_resource(source_record, related_record)
when JSONAPI::Relationship::ToMany
authorizer.include_has_many_resource(
source_record,
relationship.resource_klass._model_class
)
else
raise "Unexpected relationship type: #{relationship.inspect}"
end
else
raise "Unknown include directive: #{include_item}"
end
end

Would you be willing to try fixing this somehow?

It might very well be that the current approach requires re-thinking — and that is totally fine as well.

For context, this was the original issue that prompted authorization of side-loaded resources: #7 — and this PR implemented a first approach to that problem: #10

Of particular note is this comment in #10 by @lime:

There is most definitely still work that could be done here, Currently authorize_include_item is such a beast that it's bound to come back and bite us. 😬 Still, it might be best to get this merged and keep coding on top of it, so as to lessen the risk of conflicts for other PRs.

...so most likely your issue just highlights that the method got back and bit us 😅

@adambedford
Copy link
Author

@valscion Thanks for your reply and thorough breakdown of the potential problem. I've taken a brief look at the source but honestly I'm not all that familiar with how the library is doing things, and much less familiar with JSONAPI Resources.

I'm not sure how much I can contribute to the solution, but I will certainly pitch in to test it extensively.

@valscion
Copy link
Member

valscion commented Sep 26, 2017

Would you happen to have time for me to walk you through the code, for example via a video chat? Would that help?

@adambedford
Copy link
Author

I do need to make some time to figure this out since it's blocking currently.

Do you have a sense of how complicated the issue is?

@valscion
Copy link
Member

I'm afraid it might be a bit complicated, and likely warrants rethinking include authorization.

Before writing any code, we should discuss design of potential solutions first, in order for us not to shoot ourselves in the foot. Try to bear with me as I write up some architectural decisions I've made in the past.


As the authorization layer sits on the JR processor layer, we unfortunately have way less information available on the resources and thus have to usually fetch records manually. This is likely the main reason why adding includes(company: [:company_master]) to your policy scope resolver did not help with the N+1 situation.

The current approach works by deducing associations from the ?include= directive, and then eventually calling SomePolicy#index? for has-many associations providing the ActiveRecord class to it, and SomePolicy#show? for has-one associations, providing the actual ActiveRecord model that might be included in the return payload included: [] JSON array.

The reason why we need to do this authorization in the first place, is that I have had a hunch and when looking at JR code, I seem to have found out that include directives haven't gone through the resource association hooks correctly and thus have bypassed our pundit policy scope hooks.

It would be amazing if we could:

  1. Verify this hypothesis is correct
  2. Have a test case that shows pundit policy scope resolver being bypassed

It would mean that without any hook to include directives, our applications could leak information.

On the other hand, if we can show that pundit policy scopes are being used on all levels of the include directive, the current solution might be a huge overkill and an incorrect solution in the first place.

It seems that in #7 (comment) I noticed that policy scopes are being bypassed for JSONAPI::Relationship::ToOne cases in PunditScopedResource#records_for method. If we have a good solution on how we could apply authorization for has-one resources, possibly by only figuring out what the code needs to look like here:

when JSONAPI::Relationship::ToOne
record_or_records

...we might be safe again.

What we'd need would be tests that verify this functionality. Those tests should happen on the request layer for every operation allowing resource sideloading, like I did in #10 with these request specs:

  • GET /resources
  • GET /resources/1
  • PATCH /resources/1
  • POST /resources
  • GET /resources/other-resources
  • GET /resources/another-resource

@adambedford
Copy link
Author

I think this is a bit beyond my scope of understanding for Pundit, JSONAPI Resources and this gem. I may not be able to take the lead on implementing a fix here.

@valscion
Copy link
Member

valscion commented Oct 17, 2017

Thanks for thinking about this and engaging in discussion @adambedford 💞 . I am grateful that you thought about this, and it's OK to admit that this might be too much right now ☺️

One way to get you unblocked at your project could be to subclass JSONAPI::Authorization::AuthorizingProcessor and create a jsonapi-resources Processor that skips authorize_include_item method completely to avoid the N+1 problem.

# Work around N+1 problem with `?include=company,company.company-master` queries
# by subclassing jsonapi-authorization processor to skip include authorization completely
#
# https://github.com/venuu/jsonapi-authorization/issues/83#issuecomment-337152268
class CustomAuthorizingProcessorSkippingInclude < JSONAPI::Authorization::AuthorizingProcessor
  def authorize_include_item
    # no-op
  end
end
# config/initializers/jsonapi-resources.rb
JSONAPI.configure do |config|
  config.default_processor_klass = CustomAuthorizingProcessorSkippingInclude
  # ...
end

I advise you to double-check that you are not leaking information from your database in your API responses.

@ryanto
Copy link

ryanto commented Mar 20, 2018

Hi @valscion

Thanks for the write up, it was helpful in understanding this issue!

Do you still think the best action is to write a test for the following?

Have a test case that shows pundit policy scope resolver being bypassed

@valscion
Copy link
Member

Thanks for the write up, it was helpful in understanding this issue!

Great! I'm glad it was helpful ☺️

Do you still think the best action is to write a test for the following?

It would indeed be very helpful to verify my hypothesis is correct and we need to add manual authorization for the sideloaded resources case. I'd love to be proven wrong here. I have a feeling that this issue might have been resolved with upcoming jsonapi-resources versions, and having the spec in here to verify it would be ace.

@Subtletree
Copy link

Could potentially be fixed when cerebris/jsonapi-resources#1164 lands

@valscion
Copy link
Member

Wow, thanks for linking to that PR @Subtletree! It looks really promising indeed ☺️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants