
Session 1: Data and simple neural networks


Lesson 1 - Working with data in modern deep learning platforms

The first lesson explores data structures in TensorFlow (tf), focusing on how to represent multichannel microelectrode timeseries. We will load data into tf structures, manipulate them, and visualize them.
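
For example, a multichannel recording is naturally represented as a 2-D tensor of shape (n_channels, n_samples). The sketch below uses simulated values rather than one of the course datasets; the shapes and names are only illustrative.

```python
import numpy as np
import tensorflow as tf

# Simulated multichannel microelectrode recording:
# 32 channels, 10 seconds sampled at 1 kHz.
n_channels, fs, duration = 32, 1000, 10
data = np.random.randn(n_channels, fs * duration).astype(np.float32)

# Wrap the numpy array as a TensorFlow tensor.
x = tf.constant(data)
print(x.shape, x.dtype)

# Tensors support numpy-like indexing, e.g. the first second of channel 0.
first_second = x[0, :fs]
```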

1.1: Confirming your Jupyter Notebook environment

Before you proceed, ensure that you have a functional Jupyter notebook server running and that you can open notebooks in your web browser. The main repository README has instructions on Getting Started. Please also ensure that you have the example datasets available in your environment.

Run the notebook titled "Lesson1-TestEnvironment.ipynb".

TODO: Create that notebook, with some cells to test keras, tensorflow, GPU availability, pytorch, and fast.ai.
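
As a rough idea of what those test cells could contain (which imports are appropriate depends on how your environment was set up; drop any package you have not installed):

```python
# Quick environment check: confirm the core packages import and report versions.
import sys
print("Python:", sys.version)

import numpy as np
print("numpy:", np.__version__)

import tensorflow as tf
print("tensorflow:", tf.__version__)
print("TensorFlow sees a GPU:", tf.test.is_gpu_available())

import keras
print("keras:", keras.__version__)

import torch
print("pytorch:", torch.__version__)
print("PyTorch sees a GPU:", torch.cuda.is_available())

import fastai
print("fastai:", fastai.__version__)
```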

  • Other notebook tips
    • Tab completion for names and attributes
    • Shift+Tab inside a function call shows its signature and docstring
    • Appending ? to a name shows its docstring
    • Appending ?? shows its source code
    • Press "h" (in command mode) to list keyboard shortcuts

1.2: Loading data

For your reference, we have provided the scripts that we use to download datasets from the internet, import them, parse out the associated behaviour/labels, and save them to an intermediate data format. Even though importing your own data will likely be an entirely different process, we will walk through the data import process together to cover the general concepts.

Run the notebook named "Lesson1-DataImport.ipynb". TODO: Make this notebook: import raw data using python-neo, inspect the resulting data structures, and convert them to a common data format.
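
As a rough sketch of that workflow: python-neo organizes a recording into blocks, segments, and analog signals. The file name and IO class below are placeholders; the appropriate reader class depends on each dataset's acquisition system.

```python
import numpy as np
import neo

# The IO class must match the acquisition system; BlackrockIO is only an example.
reader = neo.io.BlackrockIO(filename='example_recording.ns2')
block = reader.read_block()

# neo organizes data as Block -> Segment(s) -> AnalogSignal(s) / Event(s).
segment = block.segments[0]
sig = segment.analogsignals[0]
print(sig.shape)          # (n_samples, n_channels) in neo's convention
print(sig.sampling_rate)  # a quantity with units, e.g. 1000.0 Hz

# Transpose to the (n_channels, n_samples) convention used in these lessons.
data = np.asarray(sig.magnitude).T
timestamps = np.asarray(sig.times.magnitude)
```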

The common format will include the following variables (sketched in code after this list):

  • data: a numpy array for the raw data with shape (n_channels, n_samples). David's data will also need another dimension for segments, or, if the segments are of unequal length, data will be a list of arrays.
  • timestamps: a numpy array of shape (n_samples,), or a list of arrays for multiple segments.
  • channels: a pandas dataframe of length n_channels. It must have columns for name, pos_x, pos_y, pos_z, and any other relevant information.
  • events: a pandas dataframe of length n_events. It must have columns for timestamp, type, value, and any other relevant information.
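
A minimal sketch of those variables, filled with simulated values (the dict wrapper below is only for illustration; the variable names and shapes follow the list above):

```python
import numpy as np
import pandas as pd

n_channels, n_samples = 32, 10000
fs = 1000.0  # sampling rate in Hz (placeholder)

dataset = {
    # Raw data: (n_channels, n_samples)
    'data': np.random.randn(n_channels, n_samples).astype(np.float32),
    # One timestamp per sample, in seconds.
    'timestamps': np.arange(n_samples) / fs,
    # One row per channel: name and 3-D position.
    'channels': pd.DataFrame({
        'name': ['ch{:02d}'.format(ix) for ix in range(n_channels)],
        'pos_x': np.zeros(n_channels),
        'pos_y': np.zeros(n_channels),
        'pos_z': np.zeros(n_channels),
    }),
    # One row per event: timestamp, type, and value.
    'events': pd.DataFrame({
        'timestamp': [1.0, 2.5],
        'type': ['cue', 'cue'],
        'value': ['left', 'right'],
    }),
}
```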

1.3: Describing our data

Run the notebook named "Lesson1-ExploreData.ipynb" TODO: Make this notebook.

* Arrays: matrices and tensors. For each of our example datasets:
    * Explain the experiment, if applicable. Describe the recording setup (electrodes, amplifiers, other measures).
    * Print data shape
    * Print some of the contents
        * Look at scale, precision. Data should be standardized.
        * FP16 vs FP32 on GPU.
    * Print additional structure (labels, kinematics, etc.)
    * Visualize individual trials, colour-coded by condition
    * Visualize condition averages (note how much information is lost)
    * Visualize covariance structure.
    * Tensor decomposition
* Domain expertise and feature engineering
    * Become experts in the neurophysiology of Parkinson's disease (PD) --> beta burst length and phase-amplitude coupling (PAC)
        * The basal ganglia-thalamocortical network has oscillatory activity --> time-frequency transform to a spectrogram
        * Beta band-pass filter --> Hilbert transform --> instantaneous amplitude/phase (see the sketch after this list)
    * Become experts in intracortical array neurophysiology --> "neural modes"
        * High-pass filter
        * Threshold crossing detection
        * Spike sorting
        * Demultiplexing?
        * Binned spike counts
        * Counts to rates
        * Dimensionality reduction (tensor decomposition; factor analysis)
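
As an illustration of the beta band-pass / Hilbert step in the PD pipeline above, here is a minimal scipy sketch on simulated data; the sampling rate, band edges, and filter order are placeholder choices, not recommendations for the course datasets.

```python
import numpy as np
from scipy import signal

fs = 1000.0  # sampling rate in Hz (placeholder)
t = np.arange(0, 10, 1 / fs)
# Simulated single-channel LFP: a 22 Hz beta oscillation buried in noise.
lfp = 0.5 * np.sin(2 * np.pi * 22 * t) + np.random.randn(t.size)

# Band-pass filter in the beta band (13-30 Hz), applied zero-phase with filtfilt.
nyq = fs / 2.0
b, a = signal.butter(4, [13 / nyq, 30 / nyq], btype='bandpass')
beta = signal.filtfilt(b, a, lfp)

# Hilbert transform -> analytic signal -> instantaneous amplitude and phase.
analytic = signal.hilbert(beta)
amplitude = np.abs(analytic)
phase = np.angle(analytic)
```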

1.4: Classifying data

We will now try to classify data using a simple 1-layer neural network with linear activations. Run "Lesson1-SimpleClassification.ipynb". TODO: Make this notebook.

* Features can then be fed into a 'simpler' ML algorithm
    * Describe Linear Discriminant Analysis (LDA) using the engineered features
        * Analytical solution
    * Show, e.g., LDA in neural network parlance (https://www.jstor.org/stable/2584434)
        * Loss function
        * Regularization
        * Loss gradient
            * Learning rate
        * Why log(p) instead of accuracy?
* In some cases, neural networks eliminate much of the need for feature engineering.
    * With enough data and enough parameters, a network can in principle learn the relevant features from the raw data (cf. universal approximation results), though in practice feature engineering still helps when data are limited.
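
A minimal Keras sketch of the single-layer network described at the top of this section, which is closely related to the "LDA in neural network parlance" idea above: a linear map whose scores are pushed through a softmax and trained with a -log(p) cross-entropy loss. The feature dimensionality, number of classes, optimizer settings, and the simulated x_train/y_train are all placeholders; in the notebook these would be the engineered features and condition labels.

```python
import numpy as np
from tensorflow import keras

n_features, n_classes = 16, 2  # placeholders

# One dense (affine/linear) layer; the softmax only converts its scores
# into class probabilities for the cross-entropy loss.
model = keras.Sequential([
    keras.layers.Dense(n_classes, activation='softmax',
                       input_shape=(n_features,)),
])

model.compile(optimizer=keras.optimizers.SGD(0.01),  # learning rate 0.01
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Placeholder data: (n_trials, n_features) features and integer class labels.
x_train = np.random.randn(128, n_features).astype(np.float32)
y_train = np.random.randint(0, n_classes, size=128)
model.fit(x_train, y_train, epochs=10, batch_size=16, verbose=0)
```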