Unsupervised Deep Autoencoders for Feature Extraction with Educational Data

This repository contains the code for the paper (see bosch-dlwed17-camera.pdf) presented at the Deep Learning with Educational Data workshop at the 2017 Educational Data Mining conference.

Citation

Bosch, N., & Paquette, L. (2017). Unsupervised deep autoencoders for feature extraction with educational data. In Deep Learning with Educational Data Workshop at the 10th International Conference on Educational Data Mining.

Requirements

The code was tested with the Keras 2.0.3 and TensorFlow 1.1.0 neural network libraries.

The data come from Betty's Brain. The code requires these data to run, and they are not publicly available; however, the code could be adapted to another dataset with relatively little effort.

Model-building steps

Model building generally consists of data preprocessing, autoencoder feature extraction, and supervised learning phases.

Data preprocessing

preprocess_bromp.py - takes raw BROMP files created by the HART application and combines them into an easy-to-use format
preprocess_timeseries.py - creates timeseries (evenly spaced in time) data from Betty's Brain interaction logs
preprocess_seq.py - creates sequences suitable for training RNN models from the timeseries data; sequences are saved to numpy binary files for faster loading later
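As an illustration of the timeseries and sequence steps above, the following is a minimal NumPy sketch, not the repository's implementation. It assumes each interaction-log event has a timestamp in seconds and a numeric feature vector, sums events into fixed-width time bins, and cuts the resulting timeseries into overlapping fixed-length windows. The function names `to_timeseries` and `to_sequences`, the 20-second default bin width, and the use of overlapping windows are hypothetical choices.

```python
import numpy as np


def to_timeseries(event_times, event_features, bin_seconds=20.0):
    """Sum per-event feature vectors into evenly spaced time bins.

    event_times: 1-D array of event timestamps in seconds (hypothetical input format).
    event_features: 2-D array with one row of numeric features per event.
    Returns a (num_bins, num_features) array; bins with no events stay all zeros.
    """
    event_times = np.asarray(event_times, dtype=float)
    event_features = np.asarray(event_features, dtype=float)
    start = event_times.min()
    # Index of the time bin each event falls into, counted from the first event.
    bin_index = ((event_times - start) // bin_seconds).astype(int)
    series = np.zeros((bin_index.max() + 1, event_features.shape[1]))
    # np.add.at accumulates correctly even when several events share a bin.
    np.add.at(series, bin_index, event_features)
    return series


def to_sequences(series, seq_len):
    """Cut a timeseries into overlapping windows of seq_len steps for an RNN.

    Returns an array of shape (num_windows, seq_len, num_features).
    """
    num_windows = len(series) - seq_len + 1
    if num_windows <= 0:
        # Series too short for even one window.
        return np.empty((0, seq_len, series.shape[1]))
    return np.stack([series[i:i + seq_len] for i in range(num_windows)])
```

The resulting array could then be written with `np.save("sequences.npy", seqs)` and reloaded with `np.load`, which is the kind of binary storage that makes later loading fast.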

Autoencoder feature extraction

ae_lstm.py - this and similar files (e.g., vae_lstm.py) train the autoencoders
extract_embeddings.py - takes a trained model, feeds in data sequences, and saves the embeddings generated by the model to be used as features for supervised models
align_embeddings+labels.py - matches up BROMP affect/behavior labels to the embeddings extracted from a model, saving only the rows with labels to create a file with features and labels which can be used for supervised learning
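The extraction and alignment steps can be sketched roughly as follows. This is a hypothetical simplification: `encode` stands for the encoder half of a trained autoencoder (any callable that returns one embedding vector per sequence), and each embedding row is assumed to be identified by a key such as `(student_id, window_index)` to which BROMP observations have already been mapped. How the repository actually matches observation times to sequences is not shown here.

```python
import numpy as np


def extract_embeddings(encode, sequences, batch_size=256):
    """Run sequences through an encoder in batches and stack the embeddings.

    encode: hypothetical callable mapping a (batch, steps, features) array to a
    (batch, embedding_dim) array, e.g. the encoder part of a trained autoencoder.
    """
    batches = [encode(sequences[i:i + batch_size])
               for i in range(0, len(sequences), batch_size)]
    return np.concatenate(batches, axis=0)


def align_with_labels(keys, embeddings, labels):
    """Keep only the embedding rows that have a label.

    keys: one hashable key per embedding row, e.g. (student_id, window_index)
    (hypothetical key format).
    labels: dict mapping key -> label (e.g. a BROMP affect code).
    Returns (features, label_list) restricted to the labeled rows.
    """
    rows = [i for i, key in enumerate(keys) if key in labels]
    return embeddings[rows], [labels[keys[i]] for i in rows]
```

Keeping only labeled rows mirrors the description above: unlabeled windows are still useful for training the autoencoder, but only labeled ones can feed the supervised models.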

Supervised learning

supervised/ae_feats_test.py - trains a decision tree (CART) model with the autoencoder features
supervised/expert_feats_extract.py - extracts a set of simple features using the traditional approach to feature extraction (features designed by hand by experts) for model building
supervised/expert_feats_test.py - builds a model using the expert features to serve as a baseline
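A minimal sketch of the supervised step, assuming scikit-learn is installed. `DecisionTreeClassifier` is scikit-learn's CART-style tree; the binary label (e.g., one affect state versus all others), the student-level `GroupKFold` cross-validation, and AUC as the metric are assumptions made for illustration rather than details taken from the scripts.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def evaluate_cart(features, labels, student_ids, n_folds=4, seed=0):
    """Cross-validate a CART decision tree and return the mean AUC.

    GroupKFold keeps all rows from a given student in the same fold, so the
    model is always evaluated on students it did not see during training
    (assumed evaluation scheme). Labels are assumed to be binary.
    """
    model = DecisionTreeClassifier(random_state=seed)
    scores = cross_val_score(model, features, labels, groups=student_ids,
                             cv=GroupKFold(n_splits=n_folds), scoring="roc_auc")
    return float(np.mean(scores))
```

Because the function takes any feature matrix, the same call could score the autoencoder features and the expert features, giving a like-for-like comparison against the baseline.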

Visualization

visualize_activations.py generates images of model activations by feeding a random subset of samples to a trained autoencoder and creating histograms of the activations of every layer in the network. For layers with many neurons (more than 15), a subset of neurons is sampled to keep the image tractable.
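The per-layer histogram step might look like the following NumPy sketch. It assumes the activations of one layer have already been recorded as a (samples, neurons) array; the sample size of 15 neurons, the bin count, and the function name `activation_histograms` are hypothetical. Rendering the histograms as an image (e.g., with matplotlib) is left out.

```python
import numpy as np

MAX_NEURONS = 15  # hypothetical: wider layers are subsampled down to this many neurons


def activation_histograms(activations, bins=20, max_neurons=MAX_NEURONS, seed=0):
    """Histogram each neuron's activations, sampling neurons in wide layers.

    activations: 2-D array (samples, neurons) recorded from one layer; for
    recurrent layers the time axis would need to be flattened first (assumption).
    Returns a dict mapping neuron index -> (counts, bin_edges).
    """
    activations = np.asarray(activations)
    n_neurons = activations.shape[1]
    if n_neurons > max_neurons:
        # Pick a random subset of neurons without replacement, in index order.
        rng = np.random.default_rng(seed)
        chosen = np.sort(rng.choice(n_neurons, size=max_neurons, replace=False))
    else:
        chosen = np.arange(n_neurons)
    return {int(j): np.histogram(activations[:, j], bins=bins) for j in chosen}
```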

The model structure is also visualized (requires the pydot package).
