I attended a two-day Educational Data Mining (EDM) workshop led by Dr. April Galyardt and hosted by the College of Education, University of Georgia, on June 9 and 10, 2014. I went in with several questions, and my goal in taking the workshop was to get clear answers to them.
We worked from her handout.
1. What is EDM?
- EDM is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.
- It is similar to Learning analytics and knowledge (LAK). LAK is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs.
- EDM vs. LAK
- LAK and EDM share the goals of improving education by improving assessment, how problems in education are understood, and how interventions are planned and selected. EDM is more focused on generalizability, while LAK researchers have placed greater focus on addressing the needs of multiple stakeholders with information drawn from data (see p. 3 of the handout for details).
2. When may EDM be useful for my research based on my program of inquiry?
- When I need detailed formative assessment.
- Compared to regression, I may use EDM when I need more interpretability from my data.
- It is useful for design-based research (DBR).
- For my research:
- When regression does not meet my needs, when it looks like something more complicated is going on among my variables, or when I want to interpret my data in more depth.
3. What types of data can I use for EDM design?
- It does not have to be big data; I can use any data I want to analyze.
4. What tools may I use for EDM?
- R or RapidMiner (R is more common for EDM)
<Day 1>
So far, EDM starts from the concept of regression, but the way of finding the best-fitting model is different. In ordinary regression, we usually keep only the statistically significant variables. In EDM, with the Lasso, the significant variables still matter most, but variables that are not significant can also be used as predictors, because the Lasso plot shows how the model changes as each variable is added. Even when a variable is not significant, we can still interpret how it affects the model from the graph.
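The way the Lasso lets variables enter the model one at a time can be sketched in a few lines. This is only a minimal illustration, not the workshop's code: it assumes the common objective (1/2n) * sum of squared errors + penalty * sum of absolute coefficients, solves it by coordinate descent in plain Python, and uses made-up data (`x_strong`, `x_weak`, `x_noise` are hypothetical predictors). Printing the coefficients at several penalty values gives the numbers that a Lasso path plot would draw.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything smaller than penalty becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def centre(values):
    """Subtract the mean so the model needs no intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso(columns, y, penalty, n_sweeps=100):
    """Coordinate descent for (1/2n) * sum((y - fit)^2) + penalty * sum(|coef|).

    columns: list of centred predictor columns; y: centred outcome.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # y minus the current fit (all coefficients start at 0)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            scale = sum(x * x for x in col) / n
            # Covariance of predictor j with the residual that ignores predictor j.
            rho = sum(x * r for x, r in zip(col, residual)) / n + scale * coefs[j]
            new_coef = soft_threshold(rho, penalty) / scale
            delta = new_coef - coefs[j]
            residual = [r - delta * x for r, x in zip(residual, col)]
            coefs[j] = new_coef
    return coefs


# Hypothetical toy data: one strong predictor, one weak predictor, one pure-noise column.
rng = random.Random(0)
n_students = 200
x_strong = centre([rng.gauss(0, 1) for _ in range(n_students)])
x_weak = centre([rng.gauss(0, 1) for _ in range(n_students)])
x_noise = centre([rng.gauss(0, 1) for _ in range(n_students)])
y = centre([3.0 * a + 0.3 * b + rng.gauss(0, 1) for a, b in zip(x_strong, x_weak)])
columns = [x_strong, x_weak, x_noise]

# Loosening the penalty step by step shows the order in which variables enter the model.
path = {}
for penalty in (10.0, 1.0, 0.1, 0.0):
    path[penalty] = lasso(columns, y, penalty)
    print(penalty, [round(c, 2) for c in path[penalty]])
```

With a very large penalty every coefficient is zero, and with a penalty of zero the result is the ordinary least-squares fit. The values in between are where the weaker variables can be seen entering the model.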
<Day 2>
- EDM can be used for both inference and prediction.
- Inference: explanatory. Testing a causal theory, similar to regression; finding causality. E.g., what predicts graduation?
- Prediction: predicting new or future observations; data mining. E.g., who will graduate?
- Before choosing an EDM method, I need to decide whether I mainly need inference or prediction.
- For example, if the question is "What are the most important variables for predicting Y?", then the Lasso is a great tool.
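The difference between the two goals can be shown on one small fitted model. The sketch below is an illustration under stated assumptions: the data (weekly study hours and a final score) are invented, and a simple least-squares line stands in for whatever model is actually used. A yes/no outcome such as graduation would call for a classification method like logistic regression instead.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: study hours per week and a final score for 100 students.
rng = random.Random(1)
hours = [rng.uniform(0, 10) for _ in range(100)]
scores = [50 + 4 * h + rng.gauss(0, 5) for h in hours]

# Fit on the first 70 students and hold the remaining 30 back.
train_x, test_x = hours[:70], hours[70:]
train_y, test_y = scores[:70], scores[70:]
intercept, slope = fit_line(train_x, train_y)

# Inference question: how is study time related to the score?
print("estimated slope:", round(slope, 2))

# Prediction question: how well does the model do on students it has not seen?
predictions = [intercept + slope * x for x in test_x]
test_mse = sum((p - y) ** 2 for p, y in zip(predictions, test_y)) / len(test_y)
print("held-out mean squared error:", round(test_mse, 1))
```

For inference the fitted coefficient itself is the result of interest. For prediction the coefficient matters less than the error on the held-out students.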
| By April Galyardt | Supervised | Unsupervised |
| --- | --- | --- |
| Continuous Y | Regression: linear regression, non-parametric regression, Lasso, ridge regression, regression trees | Latent variable models, dimension reduction: principal components, independent components, factor analysis, IRT |
| Categorical Y | Classification: logistic regression, linear discriminant analysis, k-nearest neighbors, decision trees, support vector machines | Clustering: k-means, mixture models (Gaussian model: most commonly used), hierarchical clustering, spectral clustering, diagnostic classification models |
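To illustrate the unsupervised, categorical corner of the table, here is a minimal k-means sketch in plain Python. It is an illustration only: the two groups of students and their two measurements are made up, and the farthest-point starting rule is a simple choice made here for reproducibility, not something taken from the handout. No outcome Y is given to the algorithm, which is what makes it unsupervised.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iterations=20):
    """Lloyd's algorithm with a farthest-point start (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next starting centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centre if a cluster ends up empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


# Hypothetical data: two measurements per student, drawn from two separated groups.
rng = random.Random(2)
group_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
group_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]

centers, labels = k_means(group_a + group_b, k=2)
print([tuple(round(v, 1) for v in c) for c in centers])
```

The recovered cluster labels play the role of a categorical Y that was never observed, which is why clustering sits in the "Categorical Y, Unsupervised" cell.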