Jobs available @ UCSF

May 30th, 2017

Posted for Fumiko Hoeft, Director of BrainLENS at UCSF:

Join us at UCSF Hoeft Neuroscience Lab and Precision Learning Center, a multicampus science of learning initiative consisting of 6 Univ. CA schools (Berkeley, Davis, Irvine, LA, Merced, SF) and Stanford.

We are expanding and hiring!

(1) 2 RESEARCH SCIENTISTS or POSTDOCS. Experts in signal processing, neuroimaging and big data analytics
(2) 2-3 RESEARCH ASSISTANTS. Interested in neuropsychological assessment (English, Spanish, Cantonese)

UCSF is situated at the heart of San Francisco, CA, and is a premier biomedical research institution, ranked second in the world for Neuroscience and Behavior by US News.

Author: Xu Cui Categories: life Tags:

Learning deep learning (project 5, generate new celebrity faces)

May 30th, 2017

In this class project, I used a generative adversarial network (GAN) to generate new images of faces, similar to the celebrity faces in the database.

The model we use is a deep convolutional network, which has been used widely in image classification.
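To make the adversarial setup a bit more concrete, here is a minimal sketch of the two losses a GAN typically optimizes, written in plain Python instead of a deep learning framework. The function names (`sigmoid_cross_entropy`, `gan_losses`) and the logit values are assumptions for illustration only; the project's actual code is not reproduced in this post.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Numerically stable cross-entropy between sigmoid(logit) and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def gan_losses(d_logits_real, d_logits_fake):
    """Return (discriminator_loss, generator_loss) for one batch.

    d_logits_real: discriminator logits for real training images
    d_logits_fake: discriminator logits for images made by the generator
    """
    n_real, n_fake = len(d_logits_real), len(d_logits_fake)
    # Discriminator: real images should be labelled 1, generated images 0.
    d_loss = (sum(sigmoid_cross_entropy(x, 1.0) for x in d_logits_real) / n_real
              + sum(sigmoid_cross_entropy(x, 0.0) for x in d_logits_fake) / n_fake)
    # Generator: wants the discriminator to label its images 1.
    g_loss = sum(sigmoid_cross_entropy(x, 1.0) for x in d_logits_fake) / n_fake
    return d_loss, g_loss

# Hypothetical logits: an undecided discriminator outputs 0 for everything.
d_loss, g_loss = gan_losses([0.0, 0.0], [0.0, 0.0])
print(d_loss, g_loss)
```

During training the two losses are minimized in alternation, so the generator improves only as fast as the discriminator can tell real from generated images.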

First, we used the MNIST database (a collection of 60,000 handwritten digits). After training, the model can generate digits similar to those in the training set. We only trained it for two epochs. I believe we could generate more realistic images if we trained it longer.

Figure: newly generated handwritten digits

Then we used ~200,000 images of celebrity faces to train our model. Training takes much longer, but with my Nvidia 1080 Ti it’s fast. Early on, after learning from just 20,000 images, the model was already able to generate face-like patterns. After the complete 10-epoch training, it generated very clear faces.

Figure: newly generated faces

The project can be found at

Author: Xu Cui Categories: deep learning Tags:

Does Facebook’s “mind reading” project use NIRS?

May 12th, 2017

Facebook just announced that they are experimenting with mind-reading technology using optical neuro-imaging systems. This technology would allow people to type by thought at 100 words per minute. Check out the news here.

Wow! This is unbelievable! The “optical neuro-imaging” technology is probably NIRS (Near Infrared Spectroscopy). As a NIRS researcher myself, I have done some mind-reading experiments and found that the NIRS signal (blood flow) is too slow for rapid mind-reading. With machine learning techniques such as SVM, we can decode a signal ~2s after a behavioral event at the earliest (see our paper). This is still far from a real-life application.

But some researchers have suggested that there might be a subtle “fast signal” embedded in the NIRS signal. In 2004 (!), Morren et al. published a paper titled “Detection of fast neuronal signals in the motor cortex from functional near infrared spectroscopy measurements using independent component analysis“. In this paper, they claimed that a fast signal, in the range of milliseconds rather than seconds, can be detected.

Maybe this is what Facebook is using?

Author: Xu Cui Categories: brain, nirs Tags:

Learning deep learning (project 4, language translation)

April 28th, 2017

In this project, I built a neural network for machine translation (English -> French). I built and trained a sequence-to-sequence model on a dataset of English and French sentences; it can translate new sentences from English to French. The model was trained on my own laptop with an Nvidia M1200 GPU. In the end, we reached ~95% accuracy. Here is an example of the translation:


English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']


French Words: ['il', 'a', 'vu', 'un', 'vieux', 'camion', 'jaune', '.', '<EOS>']

As I do not know French, I checked Google Translate and it looks like the translation is pretty good.
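The `<EOS>` token at the end of the French output marks where the model decided the sentence stops. As a rough sketch of how sentences are usually turned into the integer ids that a sequence-to-sequence model consumes, here is a small plain-Python example. The helper names (`build_vocab`, `to_ids`) and the special tokens other than `<EOS>` are assumptions for illustration, not the project's actual code.

```python
def build_vocab(sentences, special_tokens=("<PAD>", "<EOS>", "<UNK>")):
    """Map each distinct word (plus the special tokens) to an integer id."""
    vocab = {token: i for i, token in enumerate(special_tokens)}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def to_ids(sentence, vocab, append_eos=False):
    """Convert a sentence to ids; unknown words map to <UNK>."""
    unk = vocab["<UNK>"]
    ids = [vocab.get(word, unk) for word in sentence.lower().split()]
    if append_eos:
        # Target sentences end with <EOS> so the decoder learns when to stop.
        ids.append(vocab["<EOS>"])
    return ids

french = ["il a vu un vieux camion jaune ."]
vocab = build_vocab(french)
ids = to_ids(french[0], vocab, append_eos=True)

# Decode the ids back into words to check the round trip.
id_to_word = {i: w for w, i in vocab.items()}
decoded = [id_to_word[i] for i in ids]
print(ids)
print(decoded)
```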

The full project with code can be found here:

Author: Xu Cui Categories: deep learning, programming Tags:

A few recent NIH grants awarded related to NIRS

April 25th, 2017

The following email was sent to me by Stork. Stork is an easy-to-use app that alerts me to new scientific publications and NIH grants based on my own keywords. Below are a few grants awarded in the NIRS field.

Dear Xu,

Stork has brought you 15 new publications.

David Boas

Awarded Grants
Multifunctional, GBM-activatable nanocarriers for image-guided photochemotherapy by Huang-chiao Huang (2017) NIH Grants Awarded (Amount: $179,035) Duration: 2017-04-01 to 2018-03-31

fmri nirs

Awarded Grants
Quantifying the Fluctuations of Intrinsic Brain Activity in Healthy and Patient Populations by Manish Saggar (2017) NIH Grants Awarded (Amount: $249,000) Duration: 2017-03-20 to 2018-02-28

fmri resting state parent child

Awarded Grants
NEUROIMAGING IN EARLY ONSET DEPRESSION: LONGITUDINAL ASSESSMENT OF BRAIN CHANGES by Deanna M Barch (2017) NIH Grants Awarded (Amount: $768,901) Duration: 2017-04-01 to 2018-03-31


Awarded Grants
Brain-to-brain dynamical Coupling: A New framework for the communication of social knowledge by Uri Hasson (2017) NIH Grants Awarded (Amount: $524,425) Duration: 2017-04-01 to 2018-03-31

nirs brain

Awarded Grants
The Neurodevelopmental MRI Database by John E Richards (2017) NIH Grants Awarded (Amount: $61,625) Duration: 2017-04-01 to 2018-03-31

nirs breast

Awarded Grants
Longitudinal Assessment of Tumor Hypoxia in vivo Using Near-Infrared Spectroscopy by Bing Yu (2017) NIH Grants Awarded (Amount: $399,062) Duration: 2017-01-01 to 2019-01-31

Russell Poldrack, stanford

Awarded Grants
Elucidate the Mechanisms Underlying Inhibition Induced Devaluation by Patrick Graham Bissett (2017) NIH Grants Awarded (Amount: $59,466) Duration: 2017-04-01 to 2018-03-31

Author: Xu Cui Categories: nirs Tags:

Deep learning speed test, my laptop vs AWS g2.2xlarge vs AWS g2.8xlarge vs AWS p2.xlarge vs Paperspace p5000

April 21st, 2017

Training a deep-learning model efficiently requires a lot of resources, especially GPU power and GPU memory. Here I compare the time it took to train a model on 3 computers/servers.

1. My own laptop.
CPU: Intel Core i7-7920HQ (Quad Core 3.10GHz, 4.10GHz Turbo, 8MB 45W, w/Intel HD Graphics 630)
Memory: 64G
GPU: NVIDIA Quadro M1200 w/4GB GDDR5, 640 CUDA cores

2. AWS g2.2xlarge
CPU: 8 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 15G
GPU: 1 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

3. AWS g2.8xlarge
CPU: 32 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 60G
GPU: 4 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

The AMI. I used udacity-dl - ami-60f24d76 (The official AMI of Udacity’s Deep Learning Foundations) from the community AMIs.

Test script. Adopted from
Time spent is tracked

import time

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb

'''Trains a LSTM on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
'''

start_time = time.time()

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

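# try using different optimizers and different optimizer configs
# NOTE: the compile call and part of the fit/evaluate calls were lost from this
# post. They are restored here following the stock Keras imdb_lstm example, so
# the loss and optimizer below are an assumption, not a record of what was run.
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          # the epochs argument was also lost; Keras uses its default if omitted
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
time_taken = time.time() - start_time
print('Time taken (s):', time_taken)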

Result: The table below shows the number of seconds it took to run the above script with 3 sets of parameters (batch_size and LSTM size).

batch_size   LSTM size   Laptop   g2.2xlarge   g2.8xlarge
32           128         546      821          878
256          256         155      152          157
1024         256         125      107          110

The result is surprising and confusing to me. I was expecting the g2 servers to be much faster than my own laptop, given the capacity of their GPUs. But the result shows my laptop is actually faster with the smallest parameter values, and only slightly slower with the larger ones.

I do not know what is going on … Does anybody have a clue?
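To put numbers on the comparison, the ratios can be worked out directly from the table above. This is just arithmetic on the reported timings; the dictionary layout and the helper name `speedup_vs_laptop` are my own for illustration.

```python
# Training times in seconds, copied from the table above.
# Keys are (batch_size, LSTM size).
timings = {
    (32, 128):   {"laptop": 546, "g2.2xlarge": 821, "g2.8xlarge": 878},
    (256, 256):  {"laptop": 155, "g2.2xlarge": 152, "g2.8xlarge": 157},
    (1024, 256): {"laptop": 125, "g2.2xlarge": 107, "g2.8xlarge": 110},
}

def speedup_vs_laptop(row):
    """Return laptop_time / machine_time for each non-laptop machine."""
    return {name: row["laptop"] / seconds
            for name, seconds in row.items() if name != "laptop"}

for params, row in timings.items():
    ratios = speedup_vs_laptop(row)
    print(params, {name: round(r, 2) for name, r in ratios.items()})
```

A ratio above 1 means the AWS instance was faster than the laptop, and a ratio below 1 means it was slower. Only the largest batch size gives the g2 instances a clear edge, and even then it is under 20%.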

[update 2017-04-23]

I was thinking maybe the operating system or some configuration was not optimal in AWS. The AMI I used was udacity-dl - ami-60f24d76 (The official AMI of Udacity’s Deep Learning Foundations) from the community AMIs. So I tried a different AMI, a commercial one from bitfusion, to see whether it would make a difference. I also tested a new instance type, p2.xlarge, which has 1 NVIDIA K80 GPU (24G GPU memory) and 60G memory.

batch_size   LSTM size   Laptop   g2.2xlarge   g2.8xlarge   p2.xlarge
1024         256         125      151          148          101

The result is still disappointing. The AWS g2 instances perform worse than my laptop, and the p2 instance is only about 20% better.

(Of course, GPU is still faster than CPU. On my own laptop, using GPU is ~10x faster than using CPU to run the above code)
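This post does not record how the CPU-only run was set up. One common way, assuming a TensorFlow backend, is to hide the GPUs from CUDA before the framework is imported. The helper name `hide_gpus` below is hypothetical.

```python
import os

def hide_gpus():
    """Hide all CUDA devices from this process so TensorFlow falls back to the CPU.

    This must run before tensorflow or keras is imported, because the
    variable is read when CUDA is initialized.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

hide_gpus()
# import keras  # import the framework only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Running the test script once with and once without this call gives a CPU versus GPU comparison on the same machine.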

[update 2017-04-02]

I checked out Paperspace’s p5000 computer. It comes with a dedicated p5000 GPU with 2048 cores and 16G GPU memory. I tested it with the same code and found that training is much faster on the p5000 (ironically, the data downloading part is slow). The training part is 4x faster than on my laptop.

batch_size   LSTM size   Laptop   g2.2xlarge   g2.8xlarge   p2.xlarge   paperspace p5000
1024         256         125      151          148          101         50

(Note, the above time includes the data downloading part, which is about 25 seconds).
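The 4x figure follows once the download time is taken out of both totals. Here is the arithmetic as a small sketch. It assumes the ~25 seconds of downloading applies equally to both machines, which is a simplification, and the function name `training_speedup` is made up for this example.

```python
def training_speedup(baseline_total, candidate_total, download_seconds):
    """Speedup of the training part alone, after removing the download time."""
    baseline_train = baseline_total - download_seconds
    candidate_train = candidate_total - download_seconds
    if candidate_train <= 0:
        raise ValueError("download time must be smaller than the total time")
    return baseline_train / candidate_train

# Totals from the table above: laptop 125 s, Paperspace p5000 50 s, ~25 s download.
speedup = training_speedup(125, 50, 25)
print(speedup)  # (125 - 25) / (50 - 25) = 4.0
```

Without the correction the ratio would be 125 / 50 = 2.5, so the download time hides a good part of the GPU's advantage.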

Paperspace p5000 wins so far!

[update 2017-05-03]

I purchased Nvidia’s 1080 Ti and installed it on my desktop. It has 3,584 cores and 11G GPU memory. It took 7s for this GPU to train 1 epoch of the above script, which is 3x faster than my laptop.

batch_size   LSTM size   Laptop (1 epoch)   1080 Ti (1 epoch)
1024         256         21                 7

Author: Xu Cui Categories: deep learning, programming Tags:

RA and Postdoc position at Stanford

April 19th, 2017

Brain Dynamics Lab is a computational neuropsychiatry lab dedicated to developing computational methods for a better understanding of individual differences in brain functioning in healthy and patient populations.

Current projects include – [1] Characterizing spatiotemporal dynamics in brain activity to develop person- and disorder-centric biomarkers; [2] Understanding the role of brain dynamics for optimized learning and performance in individual and team settings; and [3] Developing methods that use network science (or graph theory), connectomics, machine learning, and signal processing for better understanding of brain dynamics.

To apply for either position — please email your CV, names of 3 references and a cover letter to

——RA position——
Applications are currently being invited for a Research Assistant position in the Brain Dynamics Lab @ Stanford, under the direction of Dr. Manish Saggar.

Responsibilities for this position include:
Developing neuroimaging experiments, collecting neuroimaging data, processing and analysis. Imaging modalities to be handled include functional and structural MRI, EEG, and fNIRS.

Job Qualifications:
[1] Bachelors in Computational Neuroscience, Cognitive Science, Computer Science, or other related scientific fields.
[2] Proficient in programming in Matlab, Python, and other related computing languages
[3] Experience with neuroimaging data collection (fMRI and/or fNIRS)
[4] Experience with one or more MRI/EEG/NIRS data analysis packages (e.g., AFNI, FSL, EEGLAB, HOMER etc.) is preferred, but not required.
[5] Ability to work effectively in a very collaborative and multidisciplinary environment.

—— Postdoc position ——
A full-time postdoctoral position is available in the Brain Dynamics Lab @ Stanford, under the direction of Dr. Manish Saggar.

The postdoctoral fellow will lead computational neuroimaging projects involving multimodal neuroimaging data (EEG+fMRI/fNIRS) to understand the role of fluctuations in intrinsic brain activity in healthy and patient populations. The fellow will participate in collecting and analyzing multimodal neuroimaging data, training and supervising students and research assistants, preparing manuscripts for publication, as well as assisting with grant applications. The position provides a unique training opportunity in computational modeling, neuroimaging, network science and machine learning.

Job Qualifications:
[1] PhD (or MD/PhD) or equivalent in computational neuroscience, computer science, psychology, statistics, bioengineering or a related field.
[2] Strong writing skills demonstrated by peer reviewed publications
[3] Proficient in programming in Matlab, Python, and other related computing languages
[4] Experience with one or more MRI/EEG/NIRS data analysis packages (e.g., AFNI, FSL, EEGLAB, HOMER etc.) is preferred, but not required.
[5] Familiarity with advanced data analysis methods, multivariate statistics, machine learning, data mining and visualization, and cloud computing is a plus.

— — — —

Author: Xu Cui Categories: brain, life Tags:

PubMed now has a Chinese version!

April 18th, 2017



To help doctors, researchers, students and others in China find information on PubMed more quickly, we at Stork developed this Chinese version of PubMed.





Author: Xu Cui Categories: deep learning, stork, writing Tags:

Learning deep learning (project 3, generate TV script)

April 4th, 2017

In this class project, I generated my own Simpsons TV scripts using RNNs trained on the Simpsons dataset of scripts from 27 seasons. The neural network generated a new TV script for a scene at Moe’s Tavern.

This is the script generated by the network:

moe_szyslak: ya know, i think i'll volunteer, too.
barney_gumble: to homer! it's me! i'm the prime minister of ireland!
moe_szyslak: hey, homer, show ya, are you and, what's wrong which youse?
moe_szyslak: the point is, this drink is the ultimate?
man: yes, moe.
moe_szyslak: ah, that's okay. it's like my dad always said if you would never been so great.
homer_simpson: yeah, they're on top of the alcohol!
homer_simpson: wayne, maybe i can't.
moe_szyslak: ah, that's okay. it's like my dad always said that when i drink.
homer_simpson: you can't be right now what-- like, you should only drink to get back a favor.
homer_simpson: moe, why you bein' so generous and your name!(looks around) oh you, are you sure?
bart_simpson: square as" golden books," pop i had good writers. william faulkner could write an exhaust pipe gag that.
moe_szyslak:" sheriff andy" can't someone else do it

Does it make sense? :)

The full project with code can be found here:

Author: Xu Cui Categories: deep learning Tags:

GPU is 40-80x faster than CPU in tensorflow for deep learning

April 4th, 2017

The speed difference of CPU and GPU can be significant in deep learning. But how much? Let’s do a test.

The computer:

The computer I use is an Amazon AWS g2.2xlarge instance. The cost is $0.65/hour, or $15.6/day, or $468/mo. It has one GPU (High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory), and 8 vCPU (High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors). Memory is 15G.
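The daily and monthly figures are just the hourly rate scaled up, assuming the instance runs around the clock and a 30-day month (the 30-day month is my assumption; it is what reproduces the $468 figure):

```python
hourly_rate = 0.65  # USD per hour for g2.2xlarge, as quoted above

def cost(hours, rate=hourly_rate):
    """Cost in USD of running the instance for the given number of hours."""
    return round(hours * rate, 2)

daily = cost(24)         # 24 hours in a day
monthly = cost(24 * 30)  # assumes a 30-day month
print(daily, monthly)
```

So leaving the instance on when it is not training gets expensive quickly; stopping it between experiments avoids most of that cost.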

The script:

I borrowed Erik Hallstrom’s code from

The code runs matrix multiplications of increasing size and measures the time taken on the CPU vs. the GPU.

from __future__ import print_function
import matplotlib
import matplotlib.pyplot as plt
import tensorflow as tf
import time

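def get_times(maximum_time):

    # One list of timings (in seconds) per device; this uses the TensorFlow 1.x API
    device_times = {
        "/gpu:0": [],
        "/cpu:0": []
    }
    matrix_sizes = range(500,50000,50)

    for size in matrix_sizes:
        for device_name in device_times.keys():

            print("####### Calculating on the " + device_name + " #######")

            shape = (size,size)
            data_type = tf.float16
            with tf.device(device_name):
                r1 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
                r2 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
                dot_operation = tf.matmul(r2, r1)

            with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
                start_time = time.time()
                result = session.run(dot_operation)
                time_taken = time.time() - start_time
                device_times[device_name].append(time_taken)

            # Stop as soon as a single multiplication exceeds the time limit
            if time_taken > maximum_time:
                return device_times, matrix_sizes

    # Reached only if no multiplication ever exceeded the limit
    return device_times, matrix_sizes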

device_times, matrix_sizes = get_times(1.5)
gpu_times = device_times["/gpu:0"]
cpu_times = device_times["/cpu:0"]

plt.plot(matrix_sizes[:len(gpu_times)], gpu_times, 'o-')
plt.plot(matrix_sizes[:len(cpu_times)], cpu_times, 'o-')
plt.ylabel('Time (s)')
plt.xlabel('Matrix size')
plt.show()

# Ratio plot: zip stops at the shorter list, so size the x axis to match
ratios = [a/b for a,b in zip(cpu_times,gpu_times)]
plt.plot(matrix_sizes[:len(ratios)], ratios, 'o-')
plt.ylabel('CPU Time / GPU Time')
plt.xlabel('Matrix size')
plt.show()

Similar to Erik’s original finding, we found a huge difference between CPU and GPU. In this test, the GPU is 40 to 80 times faster than the CPU.

Figure: GPU vs. CPU time as a function of matrix size

Figure: CPU time / GPU time as a function of matrix size
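The structure of the benchmark (time one multiplication per size, stop as soon as one run exceeds a limit) does not depend on TensorFlow. Below is a rough stand-alone sketch of the same loop using a naive pure-Python matrix product, so it can be tried on any machine. The function names and the sizes are my own choices, and the absolute timings are not comparable to the TensorFlow numbers above.

```python
import random
import time

def matmul(a, b):
    """Naive pure-Python product of two matrices given as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def time_until_limit(sizes, maximum_time):
    """Time matmul for each size; stop after the first run over maximum_time."""
    timings = []
    for n in sizes:
        a = [[random.random() for _ in range(n)] for _ in range(n)]
        b = [[random.random() for _ in range(n)] for _ in range(n)]
        start = time.perf_counter()
        matmul(a, b)
        elapsed = time.perf_counter() - start
        timings.append((n, elapsed))
        if elapsed > maximum_time:
            break
    return timings

for n, seconds in time_until_limit(range(10, 200, 10), maximum_time=0.05):
    print(n, round(seconds, 4))
```

Running the same harness for two implementations and dividing the timings pairwise gives the kind of ratio curve shown in the second figure.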

Author: Xu Cui Categories: deep learning Tags: