-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Professional Education ETL Pipeline #1210
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works great! Back to you with a request for some variable name changes and a question
learning_resources/etl/loaders.py
Outdated
**{unique_field: unique_value}, | ||
platform=platform, | ||
resource_type=resource_type, | ||
published=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused by this logic. Why is the delete necessary? And does that mean that courses that are imported as unpublished, such as mitpe courses with no future runs in the logic below are deleted and recreated every run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code should be run only when the readable_id is not the unique_field
value, and in that case if there may be multiple unpublished resources with the same value for unique_field
that need to be removed to avoid database duplicate errors.
Good catch on the mitpe courses with no future runs though, those probably should just be skipped altogether instead of (re)created as unpublished. Not much point in importing unpublished resources.
learning_resources/etl/mitpe.py
Outdated
img_metadata = get_related_object(img_url) | ||
if img_metadata: | ||
field_image = img_metadata["relationships"]["field_media_image"] | ||
img_url_2 = field_image["links"]["related"]["href"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called image_data_url
or something like that?
learning_resources/etl/mitpe.py
Outdated
Returns: | ||
list of dict: list containing topic dicts with a name attribute | ||
""" | ||
topic_data = resource_data["relationships"]["field_course_topics"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be
topics_exist = resource_data["relationships"]["field_course_topics"]["data"]
instead of reusing the variable name topic_data
to mean different things
@abeglova Peter and I are going to have a meeting at some point with the Professional Education API developer to discuss possible enhancements like clearly delineating programs vs courses. So I'm going to label this as "Blocked" until after that meeting. |
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
d48bc1a
to
9a5299a
Compare
580fcc4
to
f3495f4
Compare
45537cf
to
ddcb055
Compare
27aec69
to
76542a7
Compare
023533a
to
41770bb
Compare
Blocked until the new API has production-ready data.
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/4731
Related
ol-infrastructure
PR: mitodl/ol-infrastructure#2536Description (What does it do?)
Screenshots (if appropriate):
How can this be tested?
Set the following values in your .env file:
On the main branch, run
./manage.py backpopulate_prolearn_data
When it is done, there should be roughly >130 learning resources with
etl_source=prolearn
andoffered_by=mitpe
. They will all be courses and not programs. http://api.open.odl.local:8063/admin/learning_resources/learningresource/?o=2&offered_by__code__exact=mitpeIf you don't already have a populated featured learning path for Professional Education, run
./manage.py populate_featured_lists
to create oneSwitch to this branch. Run
./manage.py backpopulate_mitpe_data
followed by./manage.py backpopulate_prolearn_data
) When it is done, most/all the same courses returned above should now haveetl_source=mitpe
and have newreadable_id
values. Some that were courses in prolearn should be unpublished, and created as new programs instead of courses (about 27 published programs). There should not be any published dupes from different sources (prolearn and mitpe). Some will be unpublished now due to changed criteria (past enrollment end date or past end date).If you go to
http://open.odl.local:8062/learningpaths
and open the Professional Education featured learning path, it will be empty. Run the following to reassign the old featured courses from prolearn to. their new equivalents