AdaBoost, Adaptive boosting


AdaBoost is an algorithm that linearly combines many classifiers to form a much better classifier. It has been widely used in computer vision (e.g. detecting faces in a picture or movie). Look at the following example:

How it works
First, you have a training dataset and a pool of classifiers. Each classifier on its own does a poor job of classifying the data (such classifiers are usually called weak classifiers). You go through the pool and find the one that does the best job, i.e. minimizes the weighted classification error. You then increase the weights of the samples that were wrongly classified, so the next classifier has to do better on those samples, and then you go through the pool again. Formally, the algorithm is:
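The procedure above can be sketched in a few lines of code. Below is a minimal, illustrative Python version (the post's own code is MATLAB; all function names here are made up), using single-threshold stumps as the pool of weak classifiers:

```python
import math

def stump_predict(x, dim, thr, s):
    """Dummy weak classifier: a single threshold on a single dimension.
    s (+1 or -1) flips which side is labeled positive."""
    return s if x[dim] > thr else -s

def adaboost_train(X, y, thresholds, T=10):
    """Pick up to T weak classifiers from the stump pool, reweighting samples."""
    n, d = len(X), len(X[0])
    D = [1.0 / n] * n                      # sample weights, sum to 1
    pool = [(dim, thr, s) for dim in range(d)
            for thr in thresholds for s in (1, -1)]
    chosen, alphas = [], []
    for _ in range(T):
        # find the pool member with the smallest weighted error
        best_err, best = float("inf"), None
        for params in pool:
            err = sum(w for w, x, yi in zip(D, X, y)
                      if stump_predict(x, *params) != yi)
            if err < best_err:
                best_err, best = err, params
        if best_err >= 0.5:                # no weak learner beats chance
            break
        eps = 1e-10                        # guard against log(0)
        alpha = 0.5 * math.log((1 - best_err + eps) / (best_err + eps))
        chosen.append(best)
        alphas.append(alpha)
        if best_err < eps:                 # already perfect; stop
            break
        # up-weight the samples this classifier got wrong, then normalize
        D = [w * math.exp(-alpha * yi * stump_predict(x, *best))
             for w, x, yi in zip(D, X, y)]
        Z = sum(D)                         # the normalization factor Z_t
        D = [w / Z for w in D]
    return chosen, alphas

def adaboost_predict(x, chosen, alphas):
    """Strong classifier: sign of the alpha-weighted vote."""
    score = sum(a * stump_predict(x, *p) for p, a in zip(chosen, alphas))
    return 1 if score > 0 else -1
```

Each round picks the stump with the smallest weighted error, computes its vote weight alpha = 1/2 * log((1-err)/err), and renormalizes the sample weights so the misclassified samples count more in the next round.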


Now let’s try it. In the following example, the data are not linearly separable. The weak classifiers in our pool are all dummies: each can only classify by applying a single threshold to a single data dimension. We repeat the procedure 10 times (i.e. pick 10 weak classifiers) and combine them into a strong classifier.
Original dataset:

The (strong) classifier gets better and better as more weak classifiers are incorporated. t is the number of weak classifiers included.

The error rate decreases as more and more weak classifiers are included:

Here is the source code:
example.m
weakLearner.m





67 Replies to “AdaBoost, Adaptive boosting”

  1. This is the best explanation of Adaboost I found on the web. Looking at this and after understanding adaboost, I implemented it in Python and used in my project. Thank you very much!

  2. Very nice and helpful tutorial. But it would be a lot more attractive and easier to understand if you explained each step of the algorithm, with the reasons for doing that step.

  3. Xu Cui :
    Zt is the normalization factor, making sure the sum of D is 1

    How can we get Zt? Which D do you refer to in “sum of D is 1”, Dt(i) or Dt+1(i)? And what happens if the sum of D is less or more than 1?
    Thank you.

  4. Thank you for your excellent adaboost tutorial.
    I would appreciate it if you could send me multiclass Adaboost Matlab code.
    Yours faithfully.

  5. Thanks for the post and the code, it’s helping me a lot! I’m having trouble with some bits, though. It seems that for one instance of t you are training 244 classifiers and pick the one with the lowest error. Each of these temporary classifiers has attributes threshold, dimension and direction. These directions, can those be interpreted as class labels? The threshold, is that a horizontal decision boundary?

    I’m also curious about the dimension variable. It looks like x has 2 features (so for example weight and height). When searching for a classifier, you only consider 1 dimension at a time? I haven’t encountered this in the pseudo-code in the scientific literature (which leaves out many details, so I’m not implying you are doing something wrong). I’m trying to work towards boosting with SVR as a base learner with continuous data, so training on only one dimension seems troublesome for my goal.

  6. Isn’t it true that Adaboost doesn’t actually create a “SINGLE STRONG CLASSIFIER” but uses a set of weak classifiers to achieve better performance in the testing phase after assigning weights in the training phase.

  7. I’ve read that Adaboost also picks the best features, apart from weighting weak classifiers. How does Adaboost pick the best features from the data?

  8. @Rahul

    The single strong classifier consists of several weak classifiers. It takes the predictions from the weak ones and combines them with a weighted vote (the sign of the alpha-weighted sum) to output one single prediction.

  9. @Rahul
    You are mixing two different things here. The weak classifiers are weighted based on the error (expressed as alpha, you can look it up in the code).
    The datapoints are weighted, as to put more emphasis on the datapoints that are difficult to classify. This is done by changing weight matrix D (also see code).

  10. Is AdaBoost used only for learning/training? Can AdaBoost be used for testing a new case (from the formula above, to find E we must sum the Dt(i) for which yi != hj(xi))? If AdaBoost is only used on the learning sample, how can we get the prediction for a new case or new data?

  11. Excellent tutorial about the adaboost classifier, thank you!
    But I have a question: can I use an SVM classifier instead of the weaklearner? If yes, could you please help me with this?
    Thank you

  12. As you know, the function svmtrain of libsvm (my weaklearner in this case) has this structure: [model] = svmtrain(label, data), and the model is a struct with 10 fields. So how am I supposed to deal with this structure to be able to run the example, knowing that the weaklearner above doesn’t have the same input and output as svmtrain? I hope I was clear.

  13. @rima
    What you could do is use svmtrain to get a model, then use that model and svmpredict in the weaklearner.

    Besides svm, what are the other weaker learners?

  14. Mr. Xu Cui, can we use AdaBoost to improve the performance of an adaptive filtering technique (LMS)?
    If yes, can you explain how to combine AdaBoost and LMS in code so that the robustness can be improved?

  15. @rima: svmpredict returns a set of predicted labels. Then you need to calculate error(t) as described in the algorithm and update the weights. Do this until the number of iterations is reached or et is more than 1/2.

  16. @Xu Cui: what’s the motivation for using thresholds from -3 to 3? In the code you mention that you are using 244 weak classifiers, but in the article you mention you are using T=10 weak classifiers?

  17. @Rahul
    That’s because the data range is from -3 to 3.

    The pool contains 244 weak classifiers. Then, one by one, we choose 10 of them to include in our strong classifier. The criterion for choosing a weak classifier is whether it is the best classifier (among the 244) at correctly classifying the weighted data.

  18. @Xu Cui,
    Thank you very much for the tutorial. I would like to ask for your advice on programming the adaboost algorithm for two classes. I used the ionosphere data set from the UCI repository, and I’m using subtractive clustering as my weak learner. After a number of iterations, the error from the weak classifier is greater than 0.5, no matter what value of the cluster radius I give. I thought that using clustering on a two-class data set would give an output better than random guessing, but now I’m feeling confused about it.
    Can you advise me in this matter please.
    Thank you in advance

  19. Hi
    I have the same problem as Rima had. I want to use SVM as my weak classifier and then adaboost. Can you please let me know how I can use your code for adaboost. I am confused. Thanks a lot

  20. I need to implement the RUSBoost algorithm fully in MATLAB and I am stuck on the Weak Learner part. I want to simply use SVM as my Weak Learner but am not sure about the parameters that should be passed to the svmtrain and svmclassify functions. To be more specific, two commands are used for training and predicting in the SVM,

    PredictedLabel=svmclassify(svmStruct, test_data);

    As per the algorithm, the undersampled data and the weights W (which are initialized to all be the same) should be used for training the weak learner. So my question is: how should I introduce the weights into my SVM?

  21. Dear Xu Cui,

    How can I use the AdaBoost algorithm in Java or C++ for object recognition?
    The object can be a symbol or anything….

    Thanks in Advance.

  22. Other AdaBoost write-ups talk about “iterations” (100 or so). How do those “iterations” relate to this explanation of AdaBoost?

  23. Dear Xu Cui,

    Can you please tell me, in your above algorithm, what (x1, y1) to (xm, ym) are? Also, what is the value of m and how is it calculated?

  24. I went through the whole algorithm and I understood it. But how is it used for face detection?
    A reply would be appreciated.

    Ashish Kumar

  25. hi
    thanks a lot for this project. I work on the adaboost algorithm and signals. Can you help me with this?


  26. Thank you Mr. Xu Cui for this great explanation of AdaBoost. I’ve read pretty much all of the articles about Adaboost and I still couldn’t understand one step of the algorithm. I have been working with classifiers like naive Bayes, nearest neighbor, and Bayesian linear and quadratic classifiers. As you may know, these classifiers work with training samples so they can calculate their decision and then predict the labels of the test samples. But I have noticed that with adaboost there is no need for training and testing samples (correct me if I’m wrong). Can I use one of these classifiers as my weak learner? And in this case, how can I train the weak classifiers using Dt in each iteration?

    Thanks for advance

  27. @Wissal
    In principle you can use part of your data as training; then you get a strong classifier. Use this strong classifier to test the remaining data.

    Yes, you can use one (or more) of your classifiers as a weak learner. You do not “train” your weak classifier; instead you pick the best weak classifier. For example, if you use a quadratic classifier, you probably have parameters you can change. Each parameter value corresponds to a weak classifier.
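The point in the reply above can be made concrete: enumerating a grid of parameter settings is exactly what produces the pool of weak classifiers. A hypothetical Python sketch (names made up) for the threshold stumps used in this post:

```python
def make_stump_pool(n_dims, thresholds):
    """Each (dimension, threshold, direction) parameter setting is one
    distinct weak classifier; the pool is simply the grid of settings."""
    return [(d, t, s) for d in range(n_dims)
            for t in thresholds for s in (1, -1)]

pool = make_stump_pool(n_dims=2, thresholds=[-1.0, 0.0, 1.0])
# 2 dimensions x 3 thresholds x 2 directions = 12 weak classifiers
```

For a parameterized base classifier (e.g. a quadratic classifier), the same idea applies: each point on the parameter grid is one member of the pool, and boosting picks the best member per round.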

  28. How can this code be used on a database of palm images?
    I need it for my project on palmprint recognition.

  29. Thanks very much for your explanation. It’s pretty good.
    But could it be used for face recognition?
    If yes, I am curious what accuracy it can reach.
    Could it beat PCA + LDA, or various kernel methods?

  30. @cheer
    cheer, face recognition is unfortunately an area I am not familiar with. But based on a search, adaboost has apparently been used in this area. If you know the answer, please let us know.

  31. Does it look like a good thing to try to boost several already “strong” classifiers of different types? For example, to combine SVM, kNN and some neural networks classifiers? I am going to try this anyway but maybe there is some information about such a thing? Thanks!

  32. I never tried but I do think it’s a great idea. Do you mind letting me know how much improvement you can get?

  33. I do not understand the meaning of threshold, dimension and direction, neither how their values are chosen.
    Any clue for a beginner?

  34. Hello Dirk-Jan Kroon,
    Can you please let me know how I can use a neural network as a weak classifier? Can you tell me how to connect the Adaboost algorithm with the neural network?

  35. Hi Mr Xu,
    Thank you very much for your very helpful code. Can you please explain one point to me? I’m working on adaboost with the aim of selecting the best features, so I use the tmph struct to obtain the dimensions of the best features (tmph.dim = kk). But I am confused when I find tmph.pos = 1 and sometimes tmph.pos = -1. My question is: by -1 do you mean that those features are the best features representing the first class, and that features having position 1 are the ones representing the second class?

    Or do you mean that all the selected features (pos = 1 and pos = -1) represent the first class (the discrimination of the two classes)?

    I would be very grateful if you responded to my question.

    Thank you in advance

  36. Hi Mr Xu,
    Thank you very much for your very helpful code. Is there any provision so that, instead of taking the training data as 1000 2D samples, a text, xls, or csv file of datasets can be attached to the code to get the outputs?
    I would be very grateful if you responded to my question

  37. Hi Xu, first I want to thank you for the code; it’s been a great help for me. My question is: after the algorithm is trained, at what point in the code can I enter the testing data? @Xu Cui

  38. @Ruben Dario
    You can put your data in the “data” variable below:

    finalLabel = 0;
    for t=1:length(alpha)
        finalLabel = finalLabel + alpha(t) * weakLearner(h{t}, data);
    end
    finalLabel = sign(finalLabel);

  39. Hi Xu,
    Thank you very much for your useful code.
    I would be grateful if you could put a code for multiclass adaboost.

  40. Hi Mr Xu Cui
    Thank you for your useful tutorial.
    It really helped me.
    I have a question here. I don’t get what “position” does in the code.
    I know it is used in the classification part by the weak learner, but I don’t know its use and why it’s fixed to -1 and 1.

    thank you, again

  41. label = double((data(:,1).^2 - data(:,2).^2)<1); % a nonlinear separation example
    %label = double((data(:,1).^2 + data(:,2).^2)<1); % a nonlinear separation example

    pos1 = find(label == 1);
    pos2 = find(label == 0);
    label(pos2) = -1;

    % plot training data
    figure('color','w');plot(data(pos1,1), data(pos1,2), 'b*')
    hold on;plot(data(pos2,1), data(pos2,2), 'r*')
    xlim([-3 3])
    ylim([-3 3])
    title('original data')
    axis square

    % adaBoost

    T = 10;
    D = ones(size(label))/length(label);
    h = {};
    alpha = [];

    for t=1:T
        err(t) = inf;
        [out tree acc er] = weaktree(data,label);
        tmpe = sum(D.*(weaktree(data,label) ~= label));
        if (tmpe < err(t))
            err(t) = tmpe;
            h{t} = tree;
        end
        if (err(t) >= 1/2)
            disp('stop b/c err>1/2')
            break;
        end

        alpha(t) = 1/2 * log((1-err(t))/err(t));

        D = D.* exp(-alpha(t).*label.* weaktree2(h{t}, data));
        D = D./sum(D);
    end
    % printed output: stop b/c err>1/2

  42. Xu Cui :
    @Ruben Dario
    You can put your data in the “data” variable below:
    finalLabel = 0;
    for t=1:length(alpha)
    finalLabel = finalLabel + alpha(t) * weakLearner(h{t}, data);
    end
    finalLabel = sign(finalLabel);

    I want to implement Adaboost.M1 but I don’t understand the argmax step and I cannot code it; please show an example of Adaboost.M1. Thank you

  43. hi, thanks for your tutorial.
    I have a problem using SVM as a weak learner.
    I have no idea how; I need guidance on how to use another learner.

    I’m going to use SVM as the weak learner, but I don’t know how.
    Please help me.
    I would appreciate it if you replied to me by email.
    This is my email address: [email protected]
    Please guide me.

    best regards
    m. norizadeh
