It would be nice if we could predict the future. For example, given the following time series, can we predict the next point?
Let’s use SVM regression, which is said to be powerful. We use the immediate past data point as the predictor and train the model on the first 70% of the data. Blue and black are actual data; red and pink are predicted data.
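To make the setup concrete, here is a minimal sketch of how the training data could be arranged: each predictor is the value at one time step, and the target is the value at the next. The sine-wave series and the helper names `make_lagged_pairs` and `chronological_split` are assumptions for illustration; they are not taken from the downloadable source, and the SVM fit itself is left out so the example runs with the standard library alone.

```python
import math

def make_lagged_pairs(series, n_lags=1):
    """Return (X, y): each row of X holds the n_lags values that precede y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling, so the test set is strictly later in time."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Hypothetical stand-in for the series in this post: a noiseless sine wave.
series = [math.sin(0.1 * t) for t in range(200)]

X, y = make_lagged_pairs(series, n_lags=1)
X_train, y_train, X_test, y_test = chronological_split(X, y)

print(len(X_train), len(X_test))
print(X[0], y[0])  # first predictor (current point) and its target (next point)
```

The resulting `X_train` and `y_train` are in the shape most regression libraries expect, so an SVM regressor could be fitted on them directly.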
The prediction generally matches the trend. But if you look closely, you will see that the predicted data always lags the actual data by one time step. See the zoomed-in view below.
Where does this lag come from?
Let’s plot the predictor and the predicted (i.e. the current data point vs the next data point):
It took me a few hours to figure this out, and the reason turns out to be simple: our SVM model is too simple, because it takes only the last data point as the predictor. If the data has an increasing trend, a model that considers only the immediate history will give a high predicted value when the current value is high and a low predicted value when the current value is low. As a consequence, the predicted value ends up close to the current value, and when plotted against the actual data that looks like a lag of one time step.
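The lag can also be measured rather than read off a plot. The sketch below shifts the prediction against the actual series and reports the shift with the smallest mean squared error. As an assumption, the one-point model is approximated here by a forecast that simply repeats the current value; that is the limiting case of the behaviour described above, not the SVM output itself. The function names `mse_at_shift` and `best_shift` are hypothetical.

```python
import math

def mse_at_shift(actual, predicted, shift):
    """Mean squared error between predicted[t] and actual[t - shift]."""
    pairs = [(predicted[t], actual[t - shift]) for t in range(shift, len(actual))]
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)

def best_shift(actual, predicted, max_shift=5):
    """Shift (in time steps) at which the prediction lines up best with the data."""
    return min(range(max_shift + 1),
               key=lambda s: mse_at_shift(actual, predicted, s))

# Hypothetical data: a sine wave, and a "model" that repeats the current
# value as its forecast for the next step.
actual = [math.sin(0.1 * t) for t in range(200)]
predicted = [actual[0]] + actual[:-1]  # predicted[t] = actual[t - 1]

lag = best_shift(actual, predicted)
print(lag)  # 1: the forecast matches the data from one step earlier
```

Running the same check on real model output would show whether its forecasts are closer to the previous value than to the value they are supposed to predict.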
To reduce the lag, you can build a more powerful SVM model, for example one that uses the past 2 data points as the predictor. It will make a more reliable prediction, provided the data is not random. See the comparison below: the lag is clearly much smaller.
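Why a second point helps can be seen without an SVM at all: two consecutive points carry the local slope, which one point cannot. The sketch below is a simplified stand-in, not the SVM comparison from the post. It compares a one-point forecast (repeat the current value) with a two-point forecast (extend the line through the last two values) on a hypothetical sine wave.

```python
import math

def mse(errors):
    """Mean of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical smooth series standing in for the data in this post.
series = [math.sin(0.1 * t) for t in range(200)]
steps = range(1, len(series) - 1)

# One-point stand-in: forecast x[t+1] as x[t].
one_point_errors = [series[t + 1] - series[t] for t in steps]

# Two-point stand-in: forecast x[t+1] as x[t] + (x[t] - x[t-1]),
# i.e. continue the slope between the last two points.
two_point_errors = [series[t + 1] - (2 * series[t] - series[t - 1]) for t in steps]

print(mse(one_point_errors))
print(mse(two_point_errors))  # much smaller on this smooth series
```

The improvement depends on the slope being informative. On a random walk, where successive steps are independent, the recent slope says nothing about the next step and the two-point forecast would not be expected to help.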
The source code can be downloaded here: test_svr. Part of it is adapted from http://stackoverflow.com/questions/18300270/lag-in-time-series-regression-using-libsvm