Learning deep learning (project 4, language translation)

April 28th, 2017

In this project, I built a neural network for machine translation (English -> French). I trained a sequence-to-sequence model on a dataset of paired English and French sentences so that it can translate new sentences from English to French. The model was trained on my own laptop with an NVIDIA M1200 GPU. In the end, it reached ~95% accuracy. Here is an example translation:

Input

English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction

French Words: ['il', 'a', 'vu', 'un', 'vieux', 'camion', 'jaune', '.', '<EOS>']

As I do not know French, I checked Google Translate and it looks like the translation is pretty good.
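
For readers curious what such a model looks like, below is a minimal encoder-decoder (sequence-to-sequence) sketch in Keras. It uses toy random data and hypothetical vocabulary sizes; it only illustrates the architecture and is not the project's actual implementation.

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense

# Hypothetical sizes and toy random data standing in for tokenized sentence pairs.
src_vocab, tgt_vocab, units = 50, 60, 64
src = np.random.randint(1, src_vocab, size=(100, 7))   # "English" word ids
tgt = np.random.randint(1, tgt_vocab, size=(100, 9))   # "French" word ids
tgt_onehot = np.eye(tgt_vocab)[tgt]                    # one-hot targets

# Encoder: embed the source sentence and keep only the final LSTM state.
enc_in = Input(shape=(None,))
enc_emb = Embedding(src_vocab, units)(enc_in)
_, state_h, state_c = LSTM(units, return_state=True)(enc_emb)

# Decoder: generate the target sentence conditioned on the encoder state.
# (In a real model the decoder input is the target shifted by one token, i.e. teacher forcing.)
dec_in = Input(shape=(None,))
dec_emb = Embedding(tgt_vocab, units)(dec_in)
dec_seq = LSTM(units, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
out = Dense(tgt_vocab, activation='softmax')(dec_seq)

model = Model([enc_in, dec_in], out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit([src, tgt], tgt_onehot, epochs=1, verbose=0)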

The full project with code can be found here:
dlnd_language_translation.html

Author: Xu Cui Categories: deep learning, programming Tags:

A few recent NIH grants awarded related to NIRS

April 25th, 2017

The following email was sent to me by Stork, an easy-to-use app that alerts me to new scientific publications and NIH grants based on my own keywords. Below are a few recently awarded grants in the NIRS field.

Dear Xu,

Stork has brought you 15 new publications.

David Boas

Awarded Grants
Multifunctional, GBM-activatable nanocarriers for image-guided photochemotherapy by Huang-chiao Huang (2017) NIH Grants Awarded (Amount: $179,035) Duration: 2017-04-01 to 2018-03-31

fmri nirs

Awarded Grants
Quantifying the Fluctuations of Intrinsic Brain Activity in Healthy and Patient Populations by Manish Saggar (2017) NIH Grants Awarded (Amount: $249,000) Duration: 2017-03-20 to 2018-02-28

fmri resting state parent child

Awarded Grants
NEUROIMAGING IN EARLY ONSET DEPRESSION: LONGITUDINAL ASSESSMENT OF BRAIN CHANGES by Deanna M Barch (2017) NIH Grants Awarded (Amount: $768,901) Duration: 2017-04-01 to 2018-03-31

hyperscanning

Awarded Grants
Brain-to-brain dynamical Coupling: A New framework for the communication of social knowledge by Uri Hasson (2017) NIH Grants Awarded (Amount: $524,425) Duration: 2017-04-01 to 2018-03-31

nirs brain

Awarded Grants
The Neurodevelopmental MRI Database by John E Richards (2017) NIH Grants Awarded (Amount: $61,625) Duration: 2017-04-01 to 2018-03-31

nirs breast

Awarded Grants
Longitudinal Assessment of Tumor Hypoxia in vivo Using Near-Infrared Spectroscopy by Bing Yu (2017) NIH Grants Awarded (Amount: $399,062) Duration: 2017-01-01 to 2019-01-31

Russell Poldrack, stanford

Awarded Grants
Elucidate the Mechanisms Underlying Inhibition Induced Devaluation by Patrick Graham Bissett (2017) NIH Grants Awarded (Amount: $59,466) Duration: 2017-04-01 to 2018-03-31

Author: Xu Cui Categories: nirs Tags:

Deep learning speed test, my laptop vs AWS g2.2xlarge vs AWS g2.8xlarge vs AWS p2.xlarge vs Paperspace p5000

April 21st, 2017

Training a deep-learning model efficiently requires a lot of resources, especially GPU power and GPU memory. Here I test how long it takes to train a model on three computers/servers.

1. My own laptop.
CPU: Intel Core i7-7920HQ (Quad Core 3.10GHz, 4.10GHz Turbo, 8MB, 45W, w/Intel HD Graphics 630)
Memory: 64G
GPU: NVIDIA Quadro M1200 w/4GB GDDR5, 640 CUDA cores

2. AWS g2.2xlarge
CPU: 8 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 15G
GPU: 1 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

3. AWS g2.8xlarge
CPU: 32 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 60G
GPU: 4 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

The AMI: I used udacity-dl - ami-60f24d76 (the official AMI of Udacity's Deep Learning Foundations) from the community AMIs.

Test script: adapted from https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py, with the time spent tracked.

import time

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb

'''Trains a LSTM on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
Notes:
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
'''
start_time = time.time()

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
time_taken = time.time() - start_time
print(time_taken)

Result: the table below shows the number of seconds it took to run the above script with three sets of parameters (batch_size and LSTM size).

batch_size | LSTM size | Laptop | g2.2xlarge | g2.8xlarge
32         | 128       | 546    | 821        | 878
256        | 256       | 155    | 152        | 157
1024       | 256       | 125    | 107        | 110

The result is surprising and confusing to me. I was expecting the g2 servers to be much, much faster than my own laptop given the capacity of their GPUs. But the results show my laptop is actually faster at the smaller parameter values, and only slightly slower at the larger ones.

I do not know what is going on … Does anybody have a clue?
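
One sanity check worth running on each machine (a sketch I am suggesting here, not part of the benchmark above) is to confirm that TensorFlow, the Keras backend, actually sees the GPU and places the heavy ops on it:

# Sketch: verify that TensorFlow detects the GPU and places ops on it.
# Uses only standard TensorFlow 1.x APIs.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())  # should include a /gpu:0 entry

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.random_uniform((2000, 2000))
    b = tf.random_uniform((2000, 2000))
    print(sess.run(tf.reduce_sum(tf.matmul(a, b))))  # device placement is logged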

[update 2017-04-23]

I was thinking maybe the operating system or some configuration on AWS was not optimal. The AMI I used was udacity-dl - ami-60f24d76 (the official AMI of Udacity's Deep Learning Foundations) from the community AMIs, so I tried a different, commercial AMI from bitfusion: https://aws.amazon.com/marketplace/pp/B01EYKBEQ0. Maybe it would make a difference? I also tested a new instance type, p2.xlarge, which has 1 NVIDIA K80 GPU (24G GPU memory) and 60G memory.

batch_size | LSTM size | Laptop | g2.2xlarge | g2.8xlarge | p2.xlarge
1024       | 256       | 125    | 151        | 148        | 101

The result is still disappointing. The AWS g2 instances perform worse than my laptop, and the p2 instance is only about 20% faster.

(Of course, the GPU is still faster than the CPU. On my own laptop, using the GPU is ~10x faster than using the CPU to run the above code.)

[update 2017-04-2]

I checked out Paperspace's P5000 machine. It comes with a dedicated P5000 GPU with 2048 cores and 16G GPU memory. I tested it with the same code and found training to be much faster on the P5000 (ironically, the data-downloading part is slow). The training part is about 4x faster than on my laptop.

batch_size | LSTM size | Laptop | g2.2xlarge | g2.8xlarge | p2.xlarge | Paperspace P5000
1024       | 256       | 125    | 151        | 148        | 101       | 50

(Note: the above times include the data-downloading part, which takes about 25 seconds.)

The Paperspace P5000 wins so far!

Author: Xu Cui Categories: deep learning, programming Tags:

RA and Postdoc position at Stanford

April 19th, 2017

Brain Dynamics Lab (bdl.stanford.edu) is a computational neuropsychiatry lab dedicated to developing computational methods for a better understanding of individual differences in brain functioning in healthy and patient populations.

Current projects include – [1] Characterizing spatiotemporal dynamics in brain activity to develop person- and disorder-centric biomarkers; [2] Understanding the role of brain dynamics for optimized learning and performance in individual and team settings; and [3] Developing methods that use network science (or graph theory), connectomics, machine learning, and signal processing for better understanding of brain dynamics.

To apply for either position, please email your CV, the names of 3 references, and a cover letter to saggar@stanford.edu.

——RA position——
Applications are currently being invited for a Research Assistant position in the Brain Dynamics Lab @ Stanford, under the direction of Dr. Manish Saggar.

Responsibilities for this position include:
Developing neuroimaging experiments; collecting, processing, and analyzing neuroimaging data. Imaging modalities include functional and structural MRI, EEG, and fNIRS.

Job Qualifications:
[1] Bachelor's degree in Computational Neuroscience, Cognitive Science, Computer Science, or another related scientific field.
[2] Proficient in programming in Matlab, Python, and other related computing languages
[3] Experience with neuroimaging data collection (fMRI and/or fNIRS)
[4] Experience with one or more MRI/EEG/NIRS data analysis packages (e.g., AFNI, FSL, EEGLAB, HOMER etc.) is preferred, but not required.
[5] Ability to work effectively in a very collaborative and multidisciplinary environment.

—— Postdoc position ——
A full-time postdoctoral position is available in the Brain Dynamics Lab @ Stanford, under the direction of Dr. Manish Saggar.

The postdoctoral fellow will lead computational neuroimaging projects involving multimodal neuroimaging data (EEG+fMRI/fNIRS) to understand the role of fluctuations in intrinsic brain activity in healthy and patient populations. The fellow will participate in collecting and analyzing multimodal neuroimaging data, training and supervising students and research assistants, preparing manuscripts for publication, as well as assisting with grant applications. The position provides a unique training opportunity in computational modeling, neuroimaging, network science and machine learning.

Job Qualifications:
[1] PhD (or MD/PhD) or equivalent in computational neuroscience, computer science, psychology, statistics, bioengineering or a related field.
[2] Strong writing skills demonstrated by peer reviewed publications
[3] Proficient in programming in Matlab, Python, and other related computing languages
[4] Experience with one or more MRI/EEG/NIRS data analysis packages (e.g., AFNI, FSL, EEGLAB, HOMER etc.) is preferred, but not required.
[5] Familiarity with advanced data analysis methods, multivariate statistics, machine learning, data mining and visualization, and cloud computing is a plus.

— — — —

Author: Xu Cui Categories: brain, life Tags:

PubMed now has a Chinese version!

April 18th, 2017

PubMed is an indispensable search engine in biology and medicine. Every day, millions of doctors, professors, students, and other researchers use PubMed to search for the scientific articles, case reports, reviews, and latest advances they are interested in.

Unfortunately, PubMed is entirely in English!

To help doctors, researchers, and students in China find information on PubMed more quickly, we at Stork have developed this Chinese version of PubMed.

You can search with Chinese or English keywords. Chinese keywords (for example, "皮肤癌", skin cancer) are automatically translated into English. Search results are displayed in both Chinese and English, with journals highlighted according to their impact factor:

When you open an article, the Chinese version of PubMed also translates the abstract, so you can quickly grasp the content of the paper and save valuable time:

You may ask: is this translated by a machine or by humans? The answer is artificial intelligence powered by deep learning!

How do you access this site? The Chinese version of PubMed is at https://www.storkapp.me/pubmed/. This is a premium feature of Stork; you will need to register a Stork account and purchase the feature.

Author: Xu Cui Categories: deep learning, stork, writing Tags:

Learning deep learning (project 3, generate TV script)

April 4th, 2017

In this class project, I generated my own Simpsons TV scripts using an RNN trained on a dataset of Simpsons scripts from 27 seasons. The neural network generated a new TV script for a scene at Moe's Tavern.

This is the script generated by the network:

moe_szyslak: ya know, i think i'll volunteer, too.
barney_gumble: to homer! it's me! i'm the prime minister of ireland!
moe_szyslak: hey, homer, show ya, are you and, what's wrong which youse?
moe_szyslak: the point is, this drink is the ultimate?
man: yes, moe.
moe_szyslak: ah, that's okay. it's like my dad always said if you would never been so great.
homer_simpson: yeah, they're on top of the alcohol!
homer_simpson: wayne, maybe i can't.
moe_szyslak: ah, that's okay. it's like my dad always said that when i drink.
homer_simpson: you can't be right now what-- like, you should only drink to get back a favor.
homer_simpson: moe, why you bein' so generous and your name!(looks around) oh you, are you sure?
bart_simpson: square as" golden books," pop i had good writers. william faulkner could write an exhaust pipe gag that.
moe_szyslak:" sheriff andy" can't someone else do it

Does it make sense? :)
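
For readers curious about the mechanics, here is a minimal word-level text-generation sketch in Keras using a toy corpus; it illustrates the train-then-sample loop only and is not the project's TensorFlow code.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Toy corpus (hypothetical); the real project used the full Simpsons script dataset.
words = "moe_szyslak hey homer what can i get you".split()
vocab = sorted(set(words))
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = np.array([word_to_id[w] for w in words])

# Build (previous seq_len words) -> (next word) training pairs.
seq_len = 4
X = np.array([ids[i:i + seq_len] for i in range(len(ids) - seq_len)])
y = np.array([ids[i + seq_len] for i in range(len(ids) - seq_len)])

model = Sequential()
model.add(Embedding(len(vocab), 32, input_length=seq_len))
model.add(LSTM(64))
model.add(Dense(len(vocab), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=10, verbose=0)

# Generate new words by repeatedly feeding the model its own predictions.
seed = list(ids[:seq_len])
for _ in range(10):
    probs = model.predict(np.array([seed[-seq_len:]]))[0]
    seed.append(int(np.argmax(probs)))
print(' '.join(vocab[i] for i in seed))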

The full project with code can be found here:
dlnd_tv_script_generation_submit2.html

Author: Xu Cui Categories: deep learning Tags:

GPU is 40-80x faster than CPU in tensorflow for deep learning

April 4th, 2017

The speed difference between CPU and GPU can be significant in deep learning. But how much? Let's do a test.

The computer:

The computer I use is an Amazon AWS g2.2xlarge instance (https://aws.amazon.com/ec2/instance-types/). The cost is $0.65/hour, or $15.60/day, or $468/month. It has one GPU (a high-performance NVIDIA GPU with 1,536 CUDA cores and 4GB of video memory) and 8 vCPUs (High Frequency Intel Xeon E5-2670 (Sandy Bridge) processors). Memory is 15G.

The script:

I borrowed Erik Hallstrom’s code from https://medium.com/@erikhallstrm/hello-world-tensorflow-649b15aed18c

The code runs matrix multiplications and measures the time taken when using the CPU vs the GPU.

from __future__ import print_function
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
import time

def get_times(maximum_time):

    device_times = {
        "/gpu:0":[],
        "/cpu:0":[]
    }
    matrix_sizes = range(500,50000,50)

    for size in matrix_sizes:
        for device_name in device_times.keys():

            print("####### Calculating on the " + device_name + " #######")

            shape = (size,size)
            data_type = tf.float16
            with tf.device(device_name):
                r1 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
                r2 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
                dot_operation = tf.matmul(r2, r1)

            with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
                    start_time = time.time()
                    result = session.run(dot_operation)
                    time_taken = time.time() - start_time
                    print(result)
                    device_times[device_name].append(time_taken)

            print(device_times)

            if time_taken > maximum_time:
                return device_times, matrix_sizes

device_times, matrix_sizes = get_times(1.5)
gpu_times = device_times["/gpu:0"]
cpu_times = device_times["/cpu:0"]

plt.plot(matrix_sizes[:len(gpu_times)], gpu_times, 'o-')
plt.plot(matrix_sizes[:len(cpu_times)], cpu_times, 'o-')
plt.ylabel('Time')
plt.xlabel('Matrix size')
plt.show()
plt.plot(matrix_sizes[:len(cpu_times)], [a/b for a,b in zip(cpu_times,gpu_times)], 'o-')
plt.ylabel('CPU Time / GPU Time')
plt.xlabel('Matrix size')
plt.show()

Result:
Similar to Erik's original finding, we found a huge difference between the CPU and the GPU. In this test, the GPU is 40-80 times faster than the CPU.

[Figure: GPU vs CPU computation time as a function of matrix size]

[Figure: ratio of CPU time to GPU time as a function of matrix size]

Author: Xu Cui Categories: deep learning Tags:

Updated loadHitachiText.m

March 16th, 2017

Some labs have been using our script readHitachiData.m to load NIRS data from Hitachi ETG machines. We recently found that some exported MES data contain abnormal timestamps. For example, a timestamp should look like

16:49:25.406

But for some rows (rarely), the time looks like this (note the trailing character):

16:49:25.406E

This causes our script to choke. We just fixed the issue; you need to replace loadHitachiText.m with the new version, which can be found here.
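
The fix itself lives in the MATLAB script, but the idea is simply to strip any stray trailing characters from the timestamp before parsing it. A hypothetical Python sketch of that idea:

import re
from datetime import datetime

def clean_timestamp(ts):
    """Drop stray trailing characters (e.g. 'E') before parsing an HH:MM:SS.mmm timestamp."""
    ts = re.sub(r'[^0-9.:]+$', '', ts)
    return datetime.strptime(ts, '%H:%M:%S.%f')

print(clean_timestamp('16:49:25.406'))    # normal row
print(clean_timestamp('16:49:25.406E'))   # abnormal row with a trailing character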

Author: Xu Cui Categories: brain, nirs Tags:

Learning deep learning (project 2, image classification)

March 7th, 2017

In this class project, I built a network to classify images in the CIFAR-10 dataset. This dataset is freely available.

The dataset contains 60K color images (32×32 pixels) in 10 classes, with 6K images per class.

Here are the classes in the dataset, as well as 10 random images from each:

airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck

You can imagine it is not possible to write down all the rules for classifying them, so we have to write a program that can learn.

The neural network I created contains 2 hidden layers. The first is a convolutional layer with max pooling, followed by dropout of 70% of the connections. The second is a fully connected layer with 384 neurons.

def conv_net(x, keep_prob):
    """
    Create a convolutional neural network model
    : x: Placeholder tensor that holds image data.
    : keep_prob: Placeholder tensor that hold dropout keep probability.
    : return: Tensor that represents logits
    """
    # TODO: Apply 1, 2, or 3 Convolution and Max Pool layers
    #    Play around with different number of outputs, kernel size and stride
    # Function Definition from Above:
    #    conv2d_maxpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides)
    model = conv2d_maxpool(x, conv_num_outputs=18, conv_ksize=(4,4), conv_strides=(1,1), pool_ksize=(8,8), pool_strides=(1,1))
    model = tf.nn.dropout(model, keep_prob)

    # TODO: Apply a Flatten Layer
    # Function Definition from Above:
    #   flatten(x_tensor)
    model = flatten(model)

    # TODO: Apply 1, 2, or 3 Fully Connected Layers
    #    Play around with different number of outputs
    # Function Definition from Above:
    #   fully_conn(x_tensor, num_outputs)
    model = fully_conn(model,384)

    model = tf.nn.dropout(model, keep_prob)

    # TODO: Apply an Output Layer
    #    Set this to the number of classes
    # Function Definition from Above:
    #   output(x_tensor, num_outputs)
    model = output(model,10)

    # TODO: return output
    return model

Then I trained this network on an Amazon AWS g2.2xlarge instance. This instance has a GPU, which is much faster for deep learning than a CPU. I did a simple experiment and found the GPU to be at least 3 times faster than the CPU:

If all layers run on the GPU: 14 seconds for 4 epochs.
If the convolutional layer runs on the CPU and the others on the GPU: 36 seconds for 4 epochs.

This is obviously a very crude comparison, but the GPU is definitely much faster than the CPU (at least on an AWS g2.2xlarge instance, which costs $0.65/hour).
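
For reference, placing part of a graph on the CPU and the rest on the GPU is done with tf.device in TensorFlow 1.x. A minimal sketch (not the project code; the ops are stand-ins for the conv and fully connected layers):

# Sketch: pin one part of a TensorFlow 1.x graph to the CPU and the rest to the GPU.
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.random_uniform((1024, 1024))
    b = tf.random_uniform((1024, 1024))
    cpu_part = tf.matmul(a, b)          # this op runs on the CPU

with tf.device('/gpu:0'):
    gpu_part = tf.reduce_sum(cpu_part)  # this op runs on the GPU

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(gpu_part))           # placement is printed to the log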

Eventually I got ~70% accuracy on the test data, much better than a random guess (10%). Training the model took ~30 minutes.

You can find my entire code at:
http://www.alivelearn.net/deeplearning/dlnd_image_classification_submission2.html

Author: Xu Cui Categories: brain, deep learning Tags:

Learning deep learning on Udacity

February 9th, 2017

I am taking Udacity's deep learning class at https://www.udacity.com/course/deep-learning-nanodegree-foundation--nd101

I have finished the first project: creating a neural network with 1 hidden layer (so not deep enough :)) to predict bike demand for a bike rental company. The data are real-life data, so this project actually has real applications. In a nutshell, we can predict how many bikes will be rented on a given day based on factors such as the weather, whether the day is a holiday, etc.

The same model can also be used in other applications, such as predicting the number of customers of a clothing shop or visitors to a website.
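
As a rough illustration of what a one-hidden-layer network looks like, here is a small numpy sketch trained on made-up data (hypothetical features and targets, not the real bike-rental dataset or the project code):

import numpy as np

np.random.seed(0)
X = np.random.rand(200, 3)                    # hypothetical features (weather, holiday, ...)
y = X @ np.array([3.0, -2.0, 1.0]) + 0.5      # hypothetical "bike demand" target

n_hidden, lr = 8, 0.1
W1 = np.random.randn(3, n_hidden) * 0.1
W2 = np.random.randn(n_hidden, 1) * 0.1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    h = sigmoid(X @ W1)                       # hidden layer
    pred = (h @ W2).ravel()                   # linear output for regression
    err = pred - y
    # Backpropagation for a 0.5 * mean-squared-error loss
    grad_W2 = h.T @ err[:, None] / len(X)
    grad_h = err[:, None] @ W2.T * h * (1 - h)
    grad_W1 = X.T @ grad_h / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print('final training MSE:', np.mean(err ** 2))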

My homework for this project can be found here:
http://www.alivelearn.net/deeplearning/dlnd-your-first-neural-network.html

Author: Xu Cui Categories: deep learning Tags: