Public repository for the Science Education LDA Project

uio-ccse/ScienceEducationLDA

ScienceEducationLDA

Public repository for the Science Education LDA Project

Description

This is the public repository for the Science Education LDA research project, which is maintained by Tor Ole Odden and Alessandro Marin.

This project is based on the method published in Physical Review Physics Education Research [1]. See also the CCSE/PERC_TopicModel repository.

Jupyter Notebook

See the Science Education LDA Notebook, which contains an extract of the methods described in [1].

Installation

To run the main notebook, PERC_TopicModeling.ipynb, install the required packages:

pip install -r requirements.txt --user

The file scied_words_bigrams_V5.pkl, which contains the corpus obtained after processing the papers, must be downloaded separately. It is about 200 MB, and the download link will be posted soon.
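Once the corpus file is available, loading it might look like the sketch below. The internal layout of the pickle is not documented here, so the code assumes a list of tokenized documents (each a list of strings); the helper name `load_corpus` and the toy stand-in file are hypothetical and exist only to keep the example runnable without the real download.

```python
import os
import pickle
import tempfile


def load_corpus(path):
    """Load a pickled corpus and check it looks like a list of token lists.

    ASSUMPTION: the corpus is a list of documents, each a list of str tokens.
    The real file may be structured differently.
    """
    with open(path, "rb") as handle:
        corpus = pickle.load(handle)
    if not isinstance(corpus, list) or not all(isinstance(doc, list) for doc in corpus):
        raise TypeError("expected a list of token lists")
    return corpus


# Toy stand-in for the real corpus file, written to a temporary directory.
toy_corpus = [
    ["inquiry", "based", "learning"],
    ["conceptual_change", "physics"],
]
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_corpus.pkl")
    with open(toy_path, "wb") as handle:
        pickle.dump(toy_corpus, handle)
    loaded = load_corpus(toy_path)

print(len(loaded), "documents,", sum(len(doc) for doc in loaded), "tokens")
```

Unpickling can execute arbitrary code, so only load pickle files obtained from a source you trust.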

The required packages include Gensim (unsupervised semantic modelling of text), NLTK (Natural Language Toolkit), LDAVis (interactive topic model visualization), and scikit-learn, along with standard data analysis libraries such as pandas, numpy, and matplotlib.
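A quick way to confirm that the installation succeeded is to check that each package can be found by the interpreter. The sketch below uses only the standard library. The import names are assumptions: `sklearn` is the import name of scikit-learn, and `pyLDAvis` is assumed to be the module behind "LDAVis"; adjust the list to match requirements.txt.

```python
import importlib.util

# ASSUMPTION: import names corresponding to the packages listed above.
REQUIRED_MODULES = [
    "gensim",
    "nltk",
    "pyLDAvis",
    "sklearn",
    "pandas",
    "numpy",
    "matplotlib",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only checks that the modules can be located, not that their versions match those pinned in requirements.txt.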

Preliminary Results

Graph of average topic prevalence over time: AvgPrev.html

Graph of cumulative topic prevalence over time: CumuPrev.html
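To illustrate what such graphs could be built from, the sketch below aggregates per-document topic weights by publication year. The definitions are assumptions, not necessarily those used in this repository: average prevalence is taken as the mean topic weight over the documents of a year, and cumulative prevalence as the sum of those weights within the year. The function name `prevalence_by_year` and the toy data are hypothetical.

```python
from collections import defaultdict


def prevalence_by_year(years, doc_topics):
    """Aggregate document-topic weights per year.

    years:      one publication year per document
    doc_topics: one list of topic weights per document (same topic order)
    Returns (average, summed), both dicts mapping year -> list of weights.
    """
    if len(years) != len(doc_topics):
        raise ValueError("years and doc_topics must have the same length")

    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(years, doc_topics):
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        for topic, weight in enumerate(weights):
            sums[year][topic] += weight
        counts[year] += 1

    ordered_years = sorted(sums)
    summed = {year: sums[year] for year in ordered_years}
    average = {
        year: [total / counts[year] for total in sums[year]]
        for year in ordered_years
    }
    return average, summed


# Toy data: three documents, two topics.
years = [2001, 2001, 2002]
doc_topics = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
average, summed = prevalence_by_year(years, doc_topics)

for year in average:
    print(year, "average:", average[year], "summed:", summed[year])
```

In practice the topic weights would come from a fitted LDA model rather than being typed in by hand.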

Contact

Questions can be directed to Tor Ole Odden.

Literature

[1] Tor Ole B. Odden, Alessandro Marin, and Marcos D. Caballero. Thematic Analysis of 18 Years of Physics Education Research Conference Proceedings using Natural Language Processing. Physical Review Physics Education Research, 2020.
