Since real-world data sets usually contain large numbers of instances, it is important to develop efficient and effective multiple instance learning (MIL) algorithms. As a learning paradigm, MIL differs from traditional supervised learning in that it handles the classification of bags comprising unlabeled instances. In this paper, a novel efficient method based on the extreme learning machine (ELM) is proposed to address the MIL problem. First, the most qualified instance is selected in each bag through a single hidden layer feedforward network (SLFN) whose input and output weights are both initialized randomly, and the selected instance is used to represent its bag. Second, the modified ELM model is trained on the selected instances to update the output weights. Experiments on several benchmark data sets and multiple instance regression data sets show that ELM-MIL achieves good performance; moreover, it runs several times, or even hundreds of times, faster than other similar MIL algorithms.
Multiple instance learning (MIL) was first developed to solve the problem of drug activity prediction [
Numerous learning methods for the MIL problem have been proposed in the past decade. As the first learning algorithm for MIL, the Axis-Parallel Rectangle (APR) method [
The extreme learning machine (ELM) provides a powerful tool for pattern learning and has several advantages, such as fast learning speed and high generalization performance [
The remainder of this paper is organized as follows. In Section
In this section, we first introduce ELM theory; then, a modified ELM is proposed to address the MIL problem, in which the most positive instance in each positive bag or the least negative instance in each negative bag is selected.
ELM is a single hidden layer feedforward neural network in which the hidden node parameters (e.g., the input weights and hidden node biases of additive nodes and Fourier series nodes, or the centers and impact factors of RBF nodes) are chosen randomly, and the output weights are determined analytically by the least square method. Because the input weights need not be updated, ELM can learn much faster than the back propagation (BP) algorithm [
Concretely, suppose that we are given a training set comprising
Sigmoid function: G(a, b, x) = 1 / (1 + exp(−(a · x + b))).
Gaussian function: G(a, b, x) = exp(−b ‖x − a‖²).
For notational simplicity, (
The smallest norm least square solution is analytically determined by using the Moore-Penrose generalized inverse: β̂ = H†T, where H† denotes the Moore-Penrose generalized inverse of the hidden layer output matrix H and T is the target vector.
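As a concrete illustration, the ELM training procedure described above (random hidden parameters, analytic output weights via the Moore-Penrose generalized inverse) can be sketched in a few lines of NumPy; the function and variable names below are our own, not from any specific library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Basic ELM: random hidden parameters, analytic output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)        # random hidden biases
    H = sigmoid(X @ W + b)                   # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T             # smallest-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy usage: fit XOR-style targets exactly (4 samples, 20 hidden nodes).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])
W, b, beta = elm_train(X, T, n_hidden=20)
pred = elm_predict(X, W, b, beta)
```

Because no iterative tuning of the input weights is involved, the entire training step is a single pseudoinverse computation, which is the source of ELM's speed advantage over BP.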
Assume that the training set contains
Based on the assumption that a bag is positive if at least one of its instances is positive, we can simply define
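Under this standard MIL assumption, the bag-level output can be taken as the maximum over its instance outputs; a minimal sketch with illustrative names (not the authors' notation):

```python
import numpy as np

def bag_score(instance_outputs):
    # A bag is positive iff at least one of its instances is positive,
    # so the bag-level score is the maximum instance output.
    return float(np.max(instance_outputs))

def bag_label(instance_outputs, threshold=0.5):
    # Threshold the bag score to obtain a hard bag label.
    return 1 if bag_score(instance_outputs) >= threshold else 0
```

A bag containing one strongly positive instance is therefore labeled positive regardless of its other instances.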
The remaining problem is how to find the most likely instance, that is, the instance with the maximum output. As mentioned above, ELM chooses the input weights randomly and determines the output weights of the SLFN analytically. At first, the output weights are not known; thus, the
Given a training set
Randomly assign the input weight
Calculate the output of the SLFNs
Select the win-instance
Now, we have
Calculate the hidden layer output matrix
Calculate the new output weights: β = H†T, where T collects the labels of the selected instances.
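The steps above can be put together as a single training pass. The sketch below uses our own illustrative names and a sigmoid hidden layer; it is an assumption-laden reading of the listed steps, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_mil_train(bags, labels, n_hidden, seed=0):
    """bags: list of (n_i, d) instance arrays; labels: bag labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    # Step 1: randomly assign input weights, biases, and initial output weights.
    W = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    beta = rng.standard_normal(n_hidden)
    # Steps 2-3: compute the SLFN output for every instance and select the
    # win-instance (maximum output) to represent each bag.
    selected = np.vstack([bag[np.argmax(sigmoid(bag @ W + b) @ beta)]
                          for bag in bags])
    # Steps 4-5: hidden layer output matrix of the selected instances, then
    # the new output weights via the Moore-Penrose generalized inverse.
    H = sigmoid(selected @ W + b)
    beta = np.linalg.pinv(H) @ np.asarray(labels, dtype=float)
    return W, b, beta

def elm_mil_predict(bag, W, b, beta, threshold=0.5):
    # A bag is labeled positive iff its maximal instance output is positive.
    return 1 if (sigmoid(bag @ W + b) @ beta).max() >= threshold else 0
```

In practice, the selection and retraining steps can be alternated for a few iterations until the selected instances stabilize; the single pass above mirrors the listed steps.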
The five most popular benchmark MIL data sets are used to demonstrate the performance of the proposed method: MUSK1, MUSK2, and the Fox, Tiger, and Elephant image data sets [
An ELM-MIL network with 166 input units, each corresponding to one dimension of the feature vectors, is trained with varying numbers of hidden units. It should be noted that outputs
ELM-MIL performance on benchmark data sets.

Algorithm  MUSK1  MUSK2  Elephant  Fox  Tiger
Iterated-discrim APR [  92.4  89.2  N/A  N/A  N/A
Citation-kNN  92.4  86.3  N/A  N/A  N/A
Diverse Density [  88  84  N/A  N/A  N/A
ELM-MIL (proposed)  86.5 (4.2)  85.8 (4.6)  76.7 (3.9)  59.5 (3.7)  74.6 (2.4)
EM-DD [  84.8  84.9  78.3  56.1  72.4
BP-MIP [  83.7  80.4  N/A  N/A  N/A
MI-SVM [  77.9  84.3  81.4  59.4  84
C4.5 [  68.5  58.5  N/A  N/A  N/A
The relation between the number of hidden layer nodes and the prediction accuracy with different regularization parameter
The predictive accuracy of ELM-MIL on MUSK1 as the number of hidden neurons increases.
The predictive accuracy of ELM-MIL on MUSK2 as the number of hidden neurons increases.
Owing to limited time, we have conducted timing experiments on several typical algorithms and recorded their computation time. The training of ELM-MIL, Citation
Accuracy and computation time on MUSK1.

Algorithm  Accuracy  Computation time (min)
ELM-MIL  86.5  …
Citation-kNN  …  1.1
BP-MIP  83.8  110
Diverse Density  88  350
Accuracy and computation time on MUSK2.

Algorithm  Accuracy  Computation time (min)
ELM-MIL  85.8  …
Citation-kNN  …  140
BP-MIP  84  1200
Diverse Density  84  3600
Table
We compare ELM-MIL, BP-MIP, Diverse Density, and MI-Kernel [
Squared loss and computation time (seconds) on regression data sets.

Algorithm  LJ-160.166.1  LJ-160.166.1-S  LJ-80.166.1  LJ-80.166.1-S
(each cell: squared loss / time)
MI-Kernel  … / 90  … / 8000  … / 120  … / 10100
ELM-MIL  0.0376 / …  0.0648 / …  0.0485 / …  0.0748 / …
BP-MIP  0.0398 / 4980  0.0731 / 13000  0.0487 / 5100  0.0752 / 12500
Diverse Density  0.0852 / 12000  0.0052 / 17000  N/A / N/A  0.1116 / 17600
In this paper, a novel multiple instance learning algorithm based on the extreme learning machine is proposed. By modifying the error function to suit the characteristics of multiple instance problems, the most representative instance is chosen in each bag, and the chosen instances are employed to train the extreme learning machine. We have tested ELM-MIL on benchmark data sets drawn from drug activity prediction, artificial data sets, and multiple instance regression. Compared with other methods, the ELM-MIL algorithm learns much faster, while its classification accuracy is only slightly worse than that of state-of-the-art multiple instance algorithms. The experimental results reported in this paper are rather preliminary. Future work may proceed in two directions. First, it may be possible to improve the performance of our method by exploiting feature selection techniques [
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (no. 20124101120001), Key Project for Science and Technology of the Education Department of Henan Province (no. 14A413009), and China Postdoctoral Science Foundation (nos. 2014T70685 and 2013M541992).