Optimize grading/progress page to reduce database queries (cache max …

…scores). The progress page did a number of things that make performance terrible for courses with large numbers of problems, particularly if those problems are customresponse CapaModule problems that need to be executed via codejail. The grading code takes pains to not instantiate student state and execute the problem code. If a student has answered the question, the max score is stored in StudentModule. However, if the student hasn't attempted the question yet, we have to run the problem code just to call .max_score() on it. This is necessary in grade() if the student has answered other problems in the assignment (so we can know what to divide by). This is always necessary to know in progress_summary() because we list out every problem there. Code execution can be especially slow if the problems need to invoke codejail. To address this, we create a MaxScoresCache that will cache the max raw score possible for every problem. We select the cache keys so that it will automatically become invalidated when a new version of the course is published. The fundamental assumption here is that a problem cannot have two different max score values for two unscored students. A problem *can* score two students differently such that they have different max scores. So Carlos can have 2/3 on a problem, while Lyla gets 3/4. But if neither Carlos nor Lyla has ever interacted with the problem (i.e. they're just seeing it on their progress page), they must both see 0/4 -- it cannot be the case that Carlos sees 0/3 and Lyla sees 0/4. We used to load all student state into two separate FieldDataCache instances, after which we do a bunch of individual queries for scored items. Part of this split-up was done because of locking problems, but I think we might have gotten overzealous with our manual transaction hammer. In this commit, we consolidate all state access in grade() and progress() to use one shared FieldDataCache. We also use a filter so that we only pull back StudentModule state for things that might possibly affect the grade -- items that either have scores or have children. Because some older XModules do work in their __init__() methods (like Video), instantiating them takes time, particularly on large courses. This commit also changes the code that fetches the grading_context to filter out children that can't possibly affect the grade. Finally, we introduce a ScoresClient that also tries to fetch score information all at once, instead of in separate queries. Technically, we are fetching this information redundantly, but that's because the state and score interfaces are being teased apart as we move forward. Still, this only amounts to one extra SQL query, and has very little impact on performance overall. Much thanks to @adampalay -- his hackathon work in openedx#7168 formed the basis of this. https://openedx.atlassian.net/browse/CSM-17
mitocw · Mar 1, 2016 · 01a6b0d · 01a6b0d
1 parent 23d6e0b
commit 01a6b0d
Show file tree

Hide file tree

Showing 11 changed files with 465 additions and 109 deletions.
diff --git a/common/lib/xmodule/xmodule/course_module.py b/common/lib/xmodule/xmodule/course_module.py
@@ -20,6 +20,7 @@
 from xmodule.mixin import LicenseMixin
 import json
 
+from xblock.core import XBlock
 from xblock.fields import Scope, List, String, Dict, Boolean, Integer, Float
 from .fields import Date
 from django.utils.timezone import UTC
@@ -1315,11 +1316,15 @@ def grading_context(self):
         except UndefinedContext:
             module = self
 
+        def possibly_scored(usage_key):
+            """Can this XBlock type can have a score or children?"""
+            return usage_key.block_type in self.block_types_affecting_grading
+
         all_descriptors = []
         graded_sections = {}
 
         def yield_descriptor_descendents(module_descriptor):
-            for child in module_descriptor.get_children():
+            for child in module_descriptor.get_children(usage_key_filter=possibly_scored):
                 yield child
                 for module_descriptor in yield_descriptor_descendents(child):
                     yield module_descriptor
@@ -1345,6 +1350,15 @@ def yield_descriptor_descendents(module_descriptor):
         return {'graded_sections': graded_sections,
                 'all_descriptors': all_descriptors, }
 
+    @lazy
+    def block_types_affecting_grading(self):
+        """Return all block types that could impact grading (i.e. scored, or having children)."""
+        return frozenset(
+            cat for (cat, xblock_class) in XBlock.load_classes() if (
+                getattr(xblock_class, 'has_score', False) or getattr(xblock_class, 'has_children', False)
+            )
+        )
+
     @staticmethod
     def make_id(org, course, url_name):
         return '/'.join([org, course, url_name])

diff --git a/lms/djangoapps/ccx/tests/test_field_override_performance.py b/lms/djangoapps/ccx/tests/test_field_override_performance.py
@@ -29,7 +29,11 @@
 
 @attr('shard_1')
 @mock.patch.dict(
-    'django.conf.settings.FEATURES', {'ENABLE_XBLOCK_VIEW_ENDPOINT': True}
+    'django.conf.settings.FEATURES',
+    {
+        'ENABLE_XBLOCK_VIEW_ENDPOINT': True,
+        'ENABLE_MAX_SCORE_CACHE': False,
+    }
 )
 @ddt.ddt
 class FieldOverridePerformanceTestCase(ProceduralCourseTestMixin,
@@ -173,18 +177,18 @@ class TestFieldOverrideMongoPerformance(FieldOverridePerformanceTestCase):
 
     TEST_DATA = {
         # (providers, course_width, enable_ccx): # of sql queries, # of mongo queries, # of xblocks
-        ('no_overrides', 1, True): (27, 7, 14),
-        ('no_overrides', 2, True): (135, 7, 85),
-        ('no_overrides', 3, True): (595, 7, 336),
-        ('ccx', 1, True): (27, 7, 14),
-        ('ccx', 2, True): (135, 7, 85),
-        ('ccx', 3, True): (595, 7, 336),
-        ('no_overrides', 1, False): (27, 7, 14),
-        ('no_overrides', 2, False): (135, 7, 85),
-        ('no_overrides', 3, False): (595, 7, 336),
-        ('ccx', 1, False): (27, 7, 14),
-        ('ccx', 2, False): (135, 7, 85),
-        ('ccx', 3, False): (595, 7, 336),
+        ('no_overrides', 1, True): (23, 7, 14),
+        ('no_overrides', 2, True): (68, 7, 85),
+        ('no_overrides', 3, True): (263, 7, 336),
+        ('ccx', 1, True): (23, 7, 14),
+        ('ccx', 2, True): (68, 7, 85),
+        ('ccx', 3, True): (263, 7, 336),
+        ('no_overrides', 1, False): (23, 7, 14),
+        ('no_overrides', 2, False): (68, 7, 85),
+        ('no_overrides', 3, False): (263, 7, 336),
+        ('ccx', 1, False): (23, 7, 14),
+        ('ccx', 2, False): (68, 7, 85),
+        ('ccx', 3, False): (263, 7, 336),
     }
 
 
@@ -196,16 +200,16 @@ class TestFieldOverrideSplitPerformance(FieldOverridePerformanceTestCase):
     __test__ = True
 
     TEST_DATA = {
-        ('no_overrides', 1, True): (27, 4, 9),
-        ('no_overrides', 2, True): (135, 19, 54),
-        ('no_overrides', 3, True): (595, 84, 215),
-        ('ccx', 1, True): (27, 4, 9),
-        ('ccx', 2, True): (135, 19, 54),
-        ('ccx', 3, True): (595, 84, 215),
-        ('no_overrides', 1, False): (27, 4, 9),
-        ('no_overrides', 2, False): (135, 19, 54),
-        ('no_overrides', 3, False): (595, 84, 215),
-        ('ccx', 1, False): (27, 4, 9),
-        ('ccx', 2, False): (135, 19, 54),
-        ('ccx', 3, False): (595, 84, 215),
+        ('no_overrides', 1, True): (23, 4, 9),
+        ('no_overrides', 2, True): (68, 19, 54),
+        ('no_overrides', 3, True): (263, 84, 215),
+        ('ccx', 1, True): (23, 4, 9),
+        ('ccx', 2, True): (68, 19, 54),
+        ('ccx', 3, True): (263, 84, 215),
+        ('no_overrides', 1, False): (23, 4, 9),
+        ('no_overrides', 2, False): (68, 19, 54),
+        ('no_overrides', 3, False): (263, 84, 215),
+        ('ccx', 1, False): (23, 4, 9),
+        ('ccx', 2, False): (68, 19, 54),
+        ('ccx', 3, False): (263, 84, 215),
     }