Archive for the ‘programming’ Category

Hyperscanning experiment file (matlab)

June 8th, 2017

Below is the experiment script (in MatLab) for our hyperscanning project (”NIRS-based hyperscanning reveals increased interpersonal coherence in superior frontal cortex during cooperation.”). For detailed information please refer to

Psychtoolbox-3 is required.


Author: Xu Cui Categories: brain, matlab, nirs, programming, psychToolbox Tags:

Learning deep learning (project 4, language translation)

April 28th, 2017

In this project, I built a neural network for machine translation (English -> French).  I built and trained a sequence to sequence model on a dataset of English and French sentences that can translate new sentences from English to French. The model was trained on my own laptop with a Nvidia M1200 GPU. In the end, we reached ~95% accuracy. And here is an example of the translation:


English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']


French Words: ['il', 'a', 'vu', 'un', 'vieux', 'camion', 'jaune', '.', '<EOS>']

As I do not know French, I checked Google Translate and it looks like the translation is pretty good.

The full project with code can be found here:

Author: Xu Cui Categories: deep learning, programming Tags:

Deep learning speed test, my laptop vs AWS g2.2xlarge vs AWS g2.8xlarge vs AWS p2.xlarge vs Paperspace p5000

April 21st, 2017

It requires a lot of resources, especially GPU and GPU memory, to train a deep-learning model efficiently. Here I test the time it took to train a model in 3 computers/servers.

1. My own laptop.
CPU: Intel Core i7-7920HQ (Quad Core 3.10GHz, 4.10GHz Turbo, 8MB 45W, w/Intel HD Graphics 630
Memory: 64G
GPU: NVIDIA Quadro M1200 w/4GB GDDR5, 640 CUDA cores

2. AWS g2.2xlarge
CPU: 8 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 15G
GPU: 1 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

3. AWS g2.8xlarge
CPU: 32 vCPU, High Frequency Intel Xeon E5-2670 (Sandy Bridge) Processors
Memory: 60G
GPU: 4 GPU, High-performance NVIDIA GPUs, each with 1,536 CUDA cores and 4GB of video memory

The AMI. I used udacity-dl - ami-60f24d76 (The official AMI of Udacity’s Deep Learning Foundations) from the community AMIs.

Test script. Adopted from
Time spent is tracked

import time

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb

'''Trains a LSTM on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
start_time = time.time()

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs

print('Train...'), y_train,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
print('Test score:', score)
print('Test accuracy:', acc)
time_taken = time.time() - start_time

Result: The table below shows the number of seconds it took to run the above script at 3 sets of parameters (batch_size and LSTM size).

batch_size LSTM size Laptop g2.2xlarge g2.8xlarge
32 128 546 821 878
256 256 155 152 157
1024 256 125 107 110

The result is surprising and confusing to me. I was expecting g2 servers to be much much faster than my own laptop given the capacity of the GPU. But the result shows my laptop is actually faster in smaller parameter values, and only slightly worse in higher parameter values.

I do not know what is going on … Anybody has clue?

[update 2017-04-23]

I was thinking maybe the operating system or some configuration was optimal in AWS. The AMI I used was udacity-dl - ami-60f24d76 (The official AMI of Udacity’s Deep Learning Foundations) from the community AMIs. So I tried a different AMI, a commercial AMI from bitfusion: Maybe it will make a difference? I also tested a new instant type p2.xlarge which has 1 NVIDIA K80 GPU (24G GPU memory) and 60G memory.

batch_size LSTM size Laptop g2.2xlarge g2.8xlarge p2.xlarge
1024 256 125 151 148 101

The result is still disappointing. The AWS g2 instances perform worse than my laptop, and p2 instance only 20% better.

(Of course, GPU is still faster than CPU. On my own laptop, using GPU is ~10x faster than using CPU to run the above code)

[update 2017-04-02]

I checked out Paperspace’s p5000 computer. It comes with a dedicated p5000 GPU with 2048 cores and 16G GPU memory. I tested with the same code. I find the training is much faster on p5000 (ironically, the data downloading part is slow). The training part is 4x faster than my laptop.

batch_size LSTM size Laptop g2.2xlarge g2.8xlarge p2.xlarge paperspace p5000
1024 256 125 151 148 101 50

(Note, the above time includes the data downloading part, which is about 25 seconds).

Paperspace p5000 wins so far!

[update 2017-05-03]

I purchased Nvidia’s 1080 Ti and installed it on my desktop. It has 3,584 cores and 11G GPU memory. It look 7s for this GPU to train 1 epoch of the above script and it’s 3x times faster than my laptop.

batch_size LSTM size Laptop (1 epoch) 1080 Ti (1 epoch)
1024 256 21 7
Author: Xu Cui Categories: deep learning, programming Tags:

We contributed to MatLab (wavelet toolbox)

January 25th, 2017

We use MatLab a lot! It’s the major program for brain imaging data analysis in our lab. However, I never thought we could actually contribute to MatLab’s development.

In MatLab 2016, there is a toolbox called Wavelet Toolbox. If you read the document on wavelet coherence (link below), you will find that they used our NIRS data as an example:

Back in 2015/4/9, Wayne King from MathWorks contacted us, saying that they are developing the wavelet toolbox and asking if we can share some data as an example. We did. I’m glad that it’s part of the package now.

The following section are from the page above:

Find Coherent Oscillations in Brain Activity

In the previous examples, it was natural to view one time series as influencing the other. In these cases, examining the lead-lag relationship between the data is informative. In other cases, it is more natural to examine the coherence alone.

For an example, consider near-infrared spectroscopy (NIRS) data obtained in two human subjects. NIRS measures brain activity by exploiting the different absorption characteristics of oxygenated and deoxygenated hemoglobin. The data is taken from Cui, Bryant, & Reiss (2012) and was kindly provided by the authors for this example. The recording site was the superior frontal cortex for both subjects. The data is sampled at 10 Hz. In the experiment, the subjects alternatively cooperated and competed on a task. The period of the task was approximately 7.5 seconds.

load NIRSData;
hold on
legend('Subject 1','Subject 2','Location','NorthWest')
title('NIRS Data')
grid on;
hold off;

Obtain the wavelet coherence as a function of time and frequency. You can use wcoherence to output the wavelet coherence, cross-spectrum, scale-to-frequency, or scale-to-period conversions, as well as the cone of influence. In this example, the helper function helperPlotCoherence packages some useful commands for plotting the outputs of wcoherence.

[wcoh,~,F,coi] = wcoherence(NIRSData(:,1),NIRSData(:,2),10,'numscales',16);

In the plot, you see a region of strong coherence throughout the data collection period around 1 Hz. This results from the cardiac rhythms of the two subjects. Additionally, you see regions of strong coherence around 0.13 Hz. This represents coherent oscillations in the subjects’ brains induced by the task. If it is more natural to view the wavelet coherence in terms of periods rather than frequencies, you can use the ‘dt’ option and input the sampling interval. With the ‘dt’ option, wcoherence provides scale-to-period conversions.

[wcoh,~,P,coi] = wcoherence(NIRSData(:,1),NIRSData(:,2),seconds(0.1),...
    'Time (secs)','Periods (Seconds)');

Again, note the coherent oscillations corresponding to the subjects’ cardiac activity occurring throughout the recordings with a period of approximately one second. The task-related activity is also apparent with a period of approximately 8 seconds. Consult Cui, Bryant, & Reiss (2012) for a more detailed wavelet analysis of this data.


In this example you learned how to use wavelet coherence to look for time-localized coherent oscillatory behavior in two time series. For nonstationary signals, it is often more informative if you have a measure of coherence that provides simultaneous time and frequency (period) information. The relative phase information obtained from the wavelet cross-spectrum can be informative when one time series directly affects oscillations in the other.


Cui, X., Bryant, D.M., and Reiss. A.L. “NIRS-Based hyperscanning reveals increased interpersonal coherence in superior frontal cortex during cooperation”, Neuroimage, 59(3), pp. 2430-2437, 2012.

Grinsted, A., Moore, J.C., and Jevrejeva, S. “Application of the cross wavelet transform and wavelet coherence to geophysical time series”, Nonlin. Processes Geophys., 11, pp. 561-566, 2004.

Maraun, D., Kurths, J. and Holschneider, M. “Nonstationary Gaussian processes in wavelet domain: Synthesis, estimation and significance testing”, Phys. Rev. E 75, pp. 016707(1)-016707(14), 2007.

Torrence, C. and Webster, P. “Interdecadal changes in the ESNO-Monsoon System,” J.Clim., 12, pp. 2679-2690, 1999.

Author: Xu Cui Categories: brain, matlab, nirs, programming Tags:

Deep learning

January 20th, 2017

In the past months, I am shocked by the progress of artificial intelligence (mostly implemented by deep learning). In March 2016, AlphaGo won Lee Sedol (李世石) in Weiqi (go). I had mixed feelings, excited, sad, and some fear. Around new year of 2017, AlphaGo won 60 games in a row against numerous top professional Weiqi players in China, Korea and Japan, including #1 Ke Jie. There is no doubt AlphaGo is at least a level better than top human player. It’s interesting to see that the way how people call AlphaGo has changed from “dog” to “Teacher Ah”, reflecting the change of our attitude toward artificial intelligence.

Game is not the only area where AI shocked me. Below are some area AI / deep learning has done extremely well:

  1. convert text to handwriting: Try yourself at Maybe in the future you can use AI to write your greeting cards.
  2. Apply artistic style to drawings. Check out and
  3. Fluid simulation
  4. Generate a text description of an image
  5. Real time facial expression transfer
  6. Language translation
  7. Handwriting recognition (try it here: This is not new progress but still worth mentioning
  8. Medical diagnosis
  9. And many more. I will update this list constantly
In the field of biology and medicine, deep learning also progresses rapidly. Below is the number of publications using keyword “deep learning” in PubMed.
deep learning publications in PubMed
deep learning publications in PubMed
“Deep Learning” is also a keyword in my Stork. I got new papers almost every day.
Some resources to learn more about deep learning and keep updated:
  1. Track “Deep Learning” publications using Stork
  2. Subscribe youtube channel Two Minute Papers ( It contains many excellent short videos on the application of deep learning
  3. Play it here:
  4. A few examples here:
  5. I am going to take Udacity’s deep learning class at–nd101
Author: Xu Cui Categories: deep learning, programming Tags:


January 17th, 2017








Author: Xu Cui Categories: programming, stork, web Tags:

Communications between two MatLabs (2): over socket

October 17th, 2016

Aaron Piccirilli

Aaron Piccirilli

After the previous blog post Communications between two MatLabs (1) over file, Aaron Piccirilli in our lab suggested a more efficient way to communicate between two matlabs, i.e. over socket. Below is the source code provided by Aaron:

udpSocket = udp('', 'LocalPort', 2121, 'RemotePort', 2122);
udpCleaner = onCleanup(@() fclose(udpSocket));
for ii = 1:100
    fprintf(udpSocket, '%d%f', [ii ii/100]);
    disp(['Sending ' num2str(ii)]);
udpSocket = udp('', 'LocalPort', 2122, 'RemotePort', 2121);
udpCleaner = onCleanup(@() fclose(udpSocket));
    if udpSocket.BytesAvailable
        ii = fscanf(udpSocket, '%d%f');
        disp(['Received' num2str(ii(1)) '-' num2str(ii(2))])

More words from Aaron:

I also wrote a couple of quick tests to compare timing by having each method pass 10,000 random integers as quickly as they could. Using UDP is over four times faster on my work machine, and would be sufficient to keep up with sampling rates up to about 900 Hz, whereas the file-based transfer became too slow at about 200 Hz.

Obviously these rates and timings are going to be system and data-dependent, but the UDP implementation is about the same amount of code. It has some added benefits, too. First is what I mentioned before - that this allows you to communicate between different languages. Second, though, is what might be more important: buffer management. If your data source is sending data faster than you can process it, even for just a moment, the UDP method handles that gracefully with a buffer. To get the same functionality with the file method you have to write your own buffer maintenance - not too tricky, but adds another layer of complexity and probably won’t be as efficient.

I did a similar timing test passing 40 floats each time (say for 20 channels of NIRS data) instead of a single integer and the timing did not really change on my machine.

I also tested the above scripts, and they work beautifully! I definitely recommend this method over the ‘file’ method. One thing to note: when you Ctrl-C to quit the program, remember to close the socket (fclose(udpSocket)) AND clean the variables (udpSocket, udpCleaner); otherwise you will run into the “Unsuccessful open: Unrecognized Windows Sockets error: 0: Cannot bind” error.

Note from Aaron:

One note: the onCleanup function/object is designed as a callback of sorts: no matter how the function exits (normally, error, crash, Ctrl-C), when the onCleanup object is automatically then destroyed, its anonymous function should run. Thus, the UDP connection should be closed no matter how you exit the function. This won’t work for a script, though, or if you were just running the code on its own in a command window, as the onCleanup object wouldn’t be automatically destroyed. I would just exclude that line completely if you weren’t running it as a function.

Author: Xu Cui Categories: matlab, programming Tags:

Communications between two MatLabs (1) over file

October 3rd, 2016

Ref to: Communications between two MatLabs (2): over socket

It’s common that two MatLab programs needs to communicate. For instance, one program is collecting the brain imaging data but not display them, and the other program is to display the data. (Another case is at Sometimes it is not practical to merge the two program together (e.g. to keep the code clean). In this case we can run two MatLabs simultaneously. One keeps saving the data to a file, and the other keep reading the file.

Here I played with such a setup, and find they communicate well with small delay (small enough for hemodynamic responses). Check out the video below:


for ii=1:100
    disp(['write ' num2str(ii)])

last_ii = 0;
        load data
        if(ii ~= last_ii)
            disp(['get data. i=' num2str(ii)])
        last_ii = ii;

Caveat: writing/reading to/from disc is slow. So if your experiment requires real time communication without any delay (say <1ms), this method may not work. Also, the amount of data to write/read each time should be very small, and the frequency of write should be small too. The file needs to locate in your local hard drive instead of a network drive.

———- Comments ———–
Paul Mazaika from Stanford:
Cool piece of code! There may be a way to do this with one umbrella Matlab program that calls both components as subroutines. The potential advantage is that one program will keep memory in cache, not at disk, which can support rapidly updating information. For high speeds, it may be better to only occasionally update the graphical display, which otherwise may be a processing bottleneck.

Aaron Piccirilli from Stanford:
There is, sort’ve! I think Xu’s little nugget is probably best choice for many applications, but if speed is an especially big concern then there are a couple of options that I’ve come across that will maintain some sort of shared memory.

Perhaps the easiest is to use sockets to communicate data, via UDP or TCP/IP, just like you use over the internet, but locally. You write some data to a socket in one program, and read it from that same socket in another program. This keeps all of your data in memory as opposed to writing it to disk, but there is definitely some overhead for housekeeping and to move the data from one program’s memory into the operating system’s memory then back into the other program’s memory. An added bonus here: you can communicate between different languages. If you have a logging function written in Python and a visualization program in MATLAB, they can pretty easily communicate with each other via sockets.

MATLAB doesn’t have explicit parallel computing built-in like many other languages, sadly, but we all have access here to the Parallel Computing Toolbox, which is another option for some more heavy-duty parallel processing where you have a problem you can easily distribute to multiple workers.

Finally, true shared memory might be more trouble than it’s worth for most applications, as you then have to deal with potential race conditions of accessing the same resource at the same time.


More on this topic: Please continue to read Communications between two MatLabs (2): over socket

Author: Xu Cui Categories: brain, matlab, nirs, programming Tags:


August 11th, 2016



1. 我的导师是否有经费?
2. 我在寻找博后的职位;我未来的老板是否有足够的经费支持我?
3. 有多少经费拨给了我的研究领域(比如NIRS)?谁得到了这些经费?他们将用这些经费做什么?



Stork 就是这样一个工具。

我在Stork里输入如下关键字,“pearl chiu”(我之前同事的名字)以及“NIRS brain”(我的研究领域)。以下是Stork发给我的邮件:

Stork notifies me of awarded grants


有了Stork提供的信息,我了解了在我的研究领域,谁得到了经费以及他们打算用这笔经费做什么研究。实际上邮件里的第三个基金是拨给我的同事Manish, 用于他进行利用NIRS对静息状态下的脑回路的研究。我还看到Pearl得到了很大一笔经费,所以我给她发送了一封祝贺邮件。与Stork的另一个功能论文提醒比起来,经费提醒让我更早的对自己研究领域的趋势了如指掌。得到经费支持的研究,通常需要几年之后才有相关论文发表出来。


Author: Xu Cui Categories: brain, programming, stork, web, writing Tags:

A mistake in my False discovery rate (FDR) correction script

August 8th, 2016

I have posted an FDR script at I noticed that there is a small bug. In rare cases, this bug will cause the most significant voxel to be classified as ‘non-significant’ while other voxels are ’significant’.

Consider the following example:

p = [0.8147 0.9058 0.0030 0.9134 0.6324 0.0029 0.2785 0.5469 0.9575 0.9649 0.1576 0.9706 0.9572 0.4854 0.8003 0.1419 0.4218 0.9157];

The previous script will classify p(3) as significant but p(6) as non-significant.

Here is the updated version of the script:

function y = fdr0(p, q)
% y = fdr0(p, q)
% to calculate whether a pvalue survive FDR corrected q
% p: an array of p values. (e.g. p values for each channel)
% q: desired FDR threshold (typically 0.05)
% y: an array of the same size with p with only two possible values. 0
% means this position (channel) does not survive the threshold, 1 mean it
% survives
% Ref:
% Genovese et al. (2002). Thresholding statistical maps in functional
% neuroimaging using the false discovery rate. Neuroimage, 15:722-786.
% Example:
%   y = fdr0(rand(10,1),0.5);
% Xu Cui
% 2016/3/14

pvalue = p;
y = 0 * p;

[sortedpvalue, sortedposition] = sort(pvalue);
v = length(sortedposition);
for ii=1:v
    if q*ii/v >= sortedpvalue(ii)
        y(sortedposition(1:ii)) = 1;

Author: Xu Cui Categories: brain, matlab, nirs, programming Tags: