Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eager-load the user feed to avoid repetitive queries #17

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kgilpin
Copy link

@kgilpin kgilpin commented Jul 23, 2020

I would like to suggest a modification to the User.feed, to eager load the image_attachment.blob and user associations. To show the reason for this, I would like to direct you to a couple of visualizations of this particular code path. The visualizations are created using a tool called AppLand, which is a project that I’m working on. AppLand works by recording a code execution path (such as a test case), then using the recordings to build visualizations of code behavior and design.

In this case, I will show two AppMaps : one without eager loading, and one with it.

In both cases, the code execution path is the HTTP server request POST /microposts, handled in the context of the MicropostInterfaceTest called "micropost interface".

The first AppMap shows the current behavior of the sample app, without eager loading:

  1. Micropost interface test - lazy load feed attachments and users

Note that there are many SQL queries repeated to fetch the attachments and user association for each feed item.

A second AppMap shows the same request flow, but in this case all the feed data is eager-loaded:

  1. Micropost interface test - eager load the feed

You can see that the feed is now loaded in just three queries:

Thank you for your contributions to the Rails community, and I hope that you find this enhancement useful.

@mhartl
Copy link
Collaborator

mhartl commented Jul 23, 2020

Thanks for the details! I might be interested in adding an exercise on this to the end of the tutorial. Could you explain in a little more depth exactly what eager loading does in this context? I’m especially interested in knowing what

.includes(image_attachment: :blob)

does. I can’t recall seeing :blob used in this context, and it’s not covered in the Rails guide on eager loading.

Also, would this code be equivalent?

Micropost.where("user_id IN (#{following_ids})
                 OR user_id = :user_id", user_id: id)
         .includes(image_attachment: :blob, :user)

If you could help me craft an explanation suitable for the end of Chapter 14, I’d appreciate it. Also, an easy way to find the number of queries using the Rails console would be very helpful since that would allow tutorial readers to compare the two approaches directly.

@kgilpin
Copy link
Author

kgilpin commented Jul 24, 2020

This is the source I used for understanding how to eager load active storage:

https://blog.saeloun.com/2020/03/06/eagerload-active-storage-models.html

An equivalent one-liner is:

  .includes(:user, image_attachment: :blob)

I can think of two quick ways to inspect how many queries are being issued:

  • The shell-ish way (less good) - grep the queries out of the test.log and count them
  • The rails-ish way (probably better) - subscribe to sql.active_record and count the events:
      ActiveSupport::Notifications.subscribe 'sql.active_record', QueryCountHandler.new

QueryCountHandler would define:

      def call(name, started, finished, unique_id, payload)

at simply count the number of times it was called.

@mhartl
Copy link
Collaborator

mhartl commented Jul 24, 2020

Great, thanks. Do you have a suggestion for a full implementation of either counting method you mention? It would need to be simple enough to include in an exercise. (I can’t quite figure out exactly how to do it based on your description, and if I can’t, then tutorial readers certainly won’t be able to.)

@mhartl mhartl closed this Jul 24, 2020
@mhartl mhartl reopened this Jul 24, 2020
@kgilpin
Copy link
Author

kgilpin commented Jul 24, 2020

I've pushed an update which counts the number of queries in a block of the test. Here's what it looks like when I run it with and without the eager load. The number of SQL queries drops from 67 to 9.

Without eager loading

➜  sample_app_6th_ed git:(eager-load-feed-pr) ✗ bundle exec rails test test/integration/microposts_interface_test.rb -b
Running via Spring preloader in process 25969
Started with run options -b --seed 13165

Counting SQL queries for: Create a valid post--=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---] 0% Time: 00:00:00,  ETA: ??:??:??
Observed 14 SQL queries
Counting SQL queries for: Follow redirect after posting
Observed 67 SQL queries
  1/1: [=============================================================================================================] 100% Time: 00:00:00, Time: 00:00:00

Finished in 0.78743s
1 tests, 11 assertions, 0 failures, 0 errors, 0 skips

With eager loading

➜  sample_app_6th_ed git:(eager-load-feed-pr) ✗ bundle exec rails test test/integration/microposts_interface_test.rb -b
Running via Spring preloader in process 26064
Started with run options -b --seed 4081

Counting SQL queries for: Create a valid post--=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---=---] 0% Time: 00:00:00,  ETA: ??:??:??
Observed 14 SQL queries
Counting SQL queries for: Follow redirect after posting
Observed 9 SQL queries
  1/1: [=============================================================================================================] 100% Time: 00:00:00, Time: 00:00:00

Finished in 0.69565s
1 tests, 11 assertions, 0 failures, 0 errors, 0 skips

Summary

The relevant bits are here:

Counting SQL queries for: Follow redirect after posting
Observed 67 SQL queries
...
Counting SQL queries for: Follow redirect after posting
Observed 9 SQL queries

@kgilpin
Copy link
Author

kgilpin commented Aug 24, 2020

@mhartl Hi, what do you think about my suggestion above?

@mhartl
Copy link
Collaborator

mhartl commented Aug 24, 2020

Thanks for the reminder. I’ll get to this when I can.

@mhartl
Copy link
Collaborator

mhartl commented Sep 16, 2020

Looking over this again, I’m thinking it might make a good post for the Learn Enough blog. Would you be interested in collaborating on that? If so, I can get you started with a template. If not, I’ll plan on circling back to it myself when I get the chance.

@kgilpin
Copy link
Author

kgilpin commented Sep 16, 2020 via email

@mhartl
Copy link
Collaborator

mhartl commented Sep 17, 2020

Cool! If you’re willing to roll up your sleeves and learn a little LaTeX, there’s a repo I’ve just added you to with a sample article to get us started. There’s a README with initial instructions. Please let me know if you have any questions.

@kgilpin
Copy link
Author

kgilpin commented Sep 30, 2020

Hi, I just want to let you know I'm working on a draft. I can see now how I didn't include the code to hook the SQL in order to get the query counts. I will include that.

Can you let me know how the collaboration works in terms of linking back to https://app.land, cross-posting to the AppLand.com blog, etc? Naturally, I would like to raise awareness among developers of how AppLand can help them to:

a) learn and understand unfamiliar code
b) find and fix the kinds of problems that static analysis cannot find

@kgilpin
Copy link
Author

kgilpin commented Oct 1, 2020

I have a complete draft ready, please see https://blog-eager-loading.herokuapp.com/eager-loading

@kgilpin
Copy link
Author

kgilpin commented Oct 5, 2020

Nudge

@mhartl
Copy link
Collaborator

mhartl commented Oct 6, 2020 via email

@mhartl
Copy link
Collaborator

mhartl commented Oct 7, 2020

Excellent work! I’ve pushed up an edited version of the post on the edits branch. Please see the notes marked with three asterisks (***) and let me know when you’ve got a new version for me to look at.

@kgilpin
Copy link
Author

kgilpin commented Oct 9, 2020

Hi, I have merged the edits branch and made some changes. There is one specific TODO left for you, to markup the new histogram in the same way as you did the first (so you can match the style). Please take a look and let me know your thoughts.

@mhartl
Copy link
Collaborator

mhartl commented Oct 9, 2020

Looks good! It seems like there’s a small discrepancy in the new histogram, with a ~120-query data point that’s absent in the original. Any idea what’s going on there?
optimized_query_histogram

@kgilpin
Copy link
Author

kgilpin commented Oct 9, 2020 via email

@kgilpin
Copy link
Author

kgilpin commented Oct 12, 2020 via email

@kgilpin
Copy link
Author

kgilpin commented Oct 12, 2020

Here is the data for just the integration tests: https://docs.google.com/spreadsheets/d/1tmIOI3RUwKhYarDViU9F37eTG4MynhN8oQSoChyVopI/edit#gid=1407197114

This data set is free from the odd artifact in the unit test that you noticed, so maybe you will prefer to use these figures instead.

I did add memoization to current_user because otherwise current_user is triggering a ton of duplicate SELECT statements.

@mhartl
Copy link
Collaborator

mhartl commented Oct 15, 2020

Cool. I’ll plan to finish up the post when I get the chance.

@kgilpin
Copy link
Author

kgilpin commented Oct 15, 2020

Ok great. Do you want any help with that? I guess you need to move it over to the official site.

@mhartl
Copy link
Collaborator

mhartl commented Oct 15, 2020

I can take care of it. I’ll plan to let you know if anything else comes up. Thanks!

@kgilpin
Copy link
Author

kgilpin commented Nov 1, 2020

Hi, have you selected a publication date? I am looking forward to seeing this project completed.

@mhartl
Copy link
Collaborator

mhartl commented Nov 1, 2020

It’s high on my list but I’m currently overwhelmed with family obligations. It might be a few weeks before I can ship it. Thanks in advance for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants