Gene selection for cancer classification with the help of bees

Background: The development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in bioinformatics, clinical genetics, and oncology. Only a small fraction of the genes explored show a significant correlation with a given phenotype. Gene selection enables researchers to obtain substantial insight into the genetic nature of a disease and the mechanisms responsible for it. Besides improving the performance of cancer classification, it can also cut down the time and cost of medical diagnosis.

Methods: This study presents a modified Artificial Bee Colony (ABC) algorithm that selects a minimal number of genes deemed significant for cancer while improving predictive accuracy. The search equation of ABC is believed to be good at exploration but poor at exploitation. To overcome this limitation, we modify the ABC algorithm by incorporating the concept of pheromones, one of the major components of the Ant Colony Optimization (ACO) algorithm, and a new operation in which successive bees communicate to share their findings.

Results: The proposed algorithm is evaluated on a suite of ten publicly available datasets, after the parameters are tuned systematically on one of the datasets. The obtained results are compared to other works that used the same datasets, and the performance of the proposed method proves superior.

Conclusion: The method presented in this paper provides gene subsets that lead to more accurate classification while keeping the number of selected genes small. Additionally, the proposed modified Artificial Bee Colony algorithm could conceivably be applied to problems in other areas as well.

Electronic supplementary material: The online version of this article (doi:10.1186/s12920-016-0204-7) contains supplementary material, which is available to authorized users.

Pheromone Table S4 presents the performance of the algorithm with and without pheromone. Accuracy remains almost unchanged when pheromone is used, but the experimental results show an enormous reduction in the number of selected genes. The reason behind this improvement is that the pheromone emphasizes informative genes found in previous iterations; many more iterations would be needed to reach the target fitness if we relied mostly on random exploration.
Probability of Local Search, probLS Values from 0 to 1, with step size 0.1, are tested to tune this parameter. The results are listed in Table S5. The relation between accuracy and probLS is presented in Fig. S1, and the relation between selected gene set size and probLS in Fig. S2. The experiments show that accuracy increases as the probability of employing local search grows, while the selected gene set size tends to decrease. Note, however, that applying the local search more often also increases the running time.

The results of tuning the parameter nd are presented in Table S6. Accuracy seems to improve with higher values of nd, because a large nd means a large number of genes is removed from the individual, which also increases the probability of removing more noisy genes. However, a high value of nd also results in a high number of selected genes: too small or too large a jump at a time fails to find a potential neighbor. So, if nd is too large, potential candidates after reducing the gene subset are found less frequently, and the algorithm fails to achieve a small gene subset.
Pheromone Persistence Factor, ρ The amount of pheromone retained from previous iterations is determined by ρ; accordingly, (1 − ρ) is known as the pheromone evaporation coefficient. In this experiment, values of ρ from 0 to 1, with step size 0.1, are tested. The performance for the different values of ρ is presented in Table S7. The experimental outcome shows that an increase in ρ results in an increase in both accuracy (up to 0.8) and selected gene set size. In each iteration, (1 − ρ) × 100% of the current pheromone evaporates, so a larger ρ means less evaporation, and pheromone encoding the history of previous iterations is retained longer. Good solution components therefore keep high pheromone values for more iterations, as previous knowledge is remembered for a longer period of time, and they are more likely to be selected during gene selection. Hence the bees' ability to use their experience while selecting genes also strengthens, which results in higher accuracy. On the other hand, redundant and noisy genes that get selected also contribute for a longer period of time, which results in a larger selected gene set. The value 0.8 is set as the tuned value for ρ, as it shows the highest accuracy. When no pheromone evaporates (ρ = 1.0), the algorithm selects the lowest number of genes, since information about all components is stored and thus the most informative genes are detected; but the obtained accuracy is lowest for this value because of stagnation.
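To make the evaporation mechanism concrete, the following minimal Python sketch shows how retaining a fraction ρ of each gene's pheromone while reinforcing the genes of good solutions gradually separates informative genes from noisy ones. The dictionary data structure and the unit deposit amount are our assumptions, not details from the paper.

```python
RHO = 0.8  # pheromone persistence factor; (1 - RHO) evaporates each iteration


def update_pheromone(pheromone, best_genes, deposit=1.0):
    """Evaporate (1 - RHO) of every gene's pheromone, then deposit extra
    pheromone on the genes present in this iteration's best solution."""
    for gene in pheromone:
        pheromone[gene] *= RHO  # retain RHO, evaporate the rest
    for gene in best_genes:
        pheromone[gene] = pheromone.get(gene, 0.0) + deposit
    return pheromone


# Toy run: gene 0 keeps appearing in good subsets, gene 1 never does.
ph = {0: 1.0, 1: 1.0}
for _ in range(5):
    ph = update_pheromone(ph, best_genes=[0])
# Gene 0 accumulates pheromone while gene 1 decays toward 0.8**5.
```

With ρ = 1.0 the first loop becomes a no-op, which is exactly the no-evaporation regime that the text associates with stagnation.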
Weight of Accuracy in Fitness, w1 The tradeoff between accuracy and the number of selected genes is controlled by the parameter w1. In the fitness equation, w1 is the weight of the accuracy, whereas (1 − w1) is the weight of the number of selected genes. To tune w1, values from {0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9} are considered. Results for the different values are given in Table S8. Fig. S3 shows the change in accuracy with respect to w1: accuracy trends upward, peaks at w1 = 0.85, and remains almost the same for larger values. This is expected, because a larger w1 means accuracy is weighted more heavily. Fig. S4 shows the behavior of the selected gene set size with respect to w1: since a larger w1 means the selected gene size weighs less in the fitness function, the number of selected genes is also high for higher values of w1. The graphs (Fig. S3 and S4) confirm that the algorithm performs best in the range 0.7 to 0.85 considering both criteria. The value 0.85 is selected as the optimized value, as it gives the highest accuracy.
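The weighted fitness just described can be sketched as follows. The paper's exact equation is not reproduced here, so the linear form and the normalization of the subset size by the total gene count N are our assumptions; they merely illustrate how w1 trades accuracy against subset size.

```python
W1 = 0.85  # tuned weight of accuracy (Table S8)


def fitness(accuracy, n_selected, n_total):
    """Weighted fitness: higher accuracy and smaller gene subsets both raise
    the score; W1 controls the tradeoff between the two."""
    return W1 * accuracy + (1 - W1) * (1.0 - n_selected / n_total)


# Two candidates with equal accuracy: the smaller subset scores higher.
small = fitness(0.90, n_selected=10, n_total=171)
large = fitness(0.90, n_selected=50, n_total=171)
```

At w1 = 0.5 the two terms weigh equally, while at w1 = 0.9 almost all pressure is on accuracy, which matches the growth of the selected gene set observed in Fig. S4.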
Population Size, PS Values in the range 10 to 50, with step size 5, are considered for tuning the parameter PS. Experimental results for the different values are given in Table S9. A notable increase in PS yields only a negligible increase in accuracy. A slight reduction of the selected gene set size is noticed up to a value of 30: for population size 30 the number of selected genes is noticeably small, but the value increases again for larger population sizes. The highest accuracy is achieved for population size 40.

Prefiltering Method
We have considered two prefiltering methods: one parametric (F-test) and one non-parametric (Kruskal-Wallis). The performance of these two methods in our approach is reported in Table S10. The Kruskal-Wallis method shows better accuracy. This is expected, because Kruskal-Wallis is a nonparametric method known to be suitable for gene selection [2,3]. On the other hand, the F-test seems to filter out redundant genes more effectively, resulting in a relatively smaller selected gene set (at least for the dataset we used for parameter tuning, i.e., 9_Tumors). The performance of the filtering methods depends on whether the data follow a normal distribution, so further study may consider choosing the prefiltering method based on a normality check of the dataset. We have used the Kruskal-Wallis method in our final experiments.
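As an illustration of the two prefiltering routes, the sketch below ranks genes by the p-value of either the Kruskal-Wallis test or the one-way ANOVA F-test across class labels, using SciPy. The helper name, data layout, and toy data are our assumptions.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal


def rank_genes(X, y, method="kruskal"):
    """Score each gene (column of X) across the classes in y and return gene
    indices ordered from most to least significant (smallest p-value first)."""
    classes = np.unique(y)
    test = kruskal if method == "kruskal" else f_oneway
    pvals = [test(*[X[y == c, j] for c in classes]).pvalue
             for j in range(X.shape[1])]
    return np.argsort(pvals)


# Toy data: gene 0 separates the two classes, gene 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 2))
X[y == 1, 0] += 5.0
ranking = rank_genes(X, y)  # gene 0 should rank first
```

Swapping `method` between the two tests is all that changes between the parametric and nonparametric routes, which is what makes the normality-based choice suggested above easy to automate.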
Selection Method at the Onlooker Bee Stage As the selection procedure at the onlooker bee stage, Tournament Selection and Fitness-Proportionate Selection have been reviewed. The experimental results are presented in Table S11. Fitness-proportionate selection yields a smaller selected gene set, while tournament selection provides slightly higher classification accuracy. We have used tournament selection in our final experiments.
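A minimal sketch of tournament selection follows; the tournament size k is an assumption of ours, as the paper does not state it here.

```python
import random


def tournament_select(population, fitnesses, k=2):
    """Draw k distinct individuals uniformly at random and return the fittest;
    larger k increases selection pressure toward high-fitness individuals."""
    contenders = random.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


pop = ["s1", "s2", "s3", "s4"]
fits = [0.1, 0.4, 0.7, 0.9]
# With k equal to the population size the tournament is deterministic.
best = tournament_select(pop, fits, k=len(pop))
```

Fitness-proportionate selection would instead draw each individual with probability proportional to its fitness share, a softer pressure that can explain the slightly different behavior reported in Table S11.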
Kernel Method for SVM As the kernel method for SVM, the use of both linear and RBF kernels has been examined. The experimental results are presented in Table S12. The outcome exhibits better performance for the linear kernel in terms of both accuracy and the number of selected genes. This is because the linear kernel is suitable for high-dimensional, small-sample data [4,5]. Also, the linear kernel has been reported to perform better than the RBF kernel for gene selection [6]. Notably, however, one study suggests that the choice of kernel does not affect performance in most cases [7].
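The kernel comparison can be reproduced in outline with scikit-learn. The synthetic dataset below merely stands in for microarray data; its sample and feature counts are illustrative assumptions chosen to mimic the high-dimensional, small-sample regime.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# High-dimensional, small-sample data, the regime typical of microarrays.
X, y = make_classification(n_samples=40, n_features=500, n_informative=10,
                           random_state=0)

acc_linear = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
acc_rbf = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```

When features vastly outnumber samples, the classes are often already linearly separable, so the extra flexibility of the RBF kernel mostly adds overfitting risk rather than accuracy.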
Inertia Weight (w) Update Approach For updating the inertia weight w, we have discussed two approaches (i.e., Eq. ?? and Eq. ??). The experimental outcomes for updating the inertia weight are reported in Table S13. The random update method (Eq. ??) gives better performance in terms of selected gene set size, while Eq. ?? gives better accuracy. We have decided to use Eq. ?? to update the inertia weight in our final experiments.
Local Search Local search procedures are used at both the employed bee and the onlooker bee stages. As local search procedures, Hill Climbing (HC), Simulated Annealing (SA), and Steepest Ascent Hill Climbing with Replacement (SAHCR) are considered for both stages. Results of employing the different local search methods at these stages are reported in Table S14. For all the experiments related to local search methods, we have kept the value of probLS at 1.0 unless stated otherwise.
Local Search at the Employed Bee Stage The performance of Hill Climbing as the local search procedure at the employed bee stage has been examined with HC, SAHCR, or SA as the local search method at the onlooker bee stage. Hill Climbing shows satisfactory performance in this stage when HC or SAHCR runs at the onlooker bee stage; the result is poor when SA runs there. Hill Climbing only exploits a potential solution, and its upward progress gets lost in the random walk of SA at the onlooker bee stage. Hill Climbing is also likely to get stuck in a local optimum, so when HC is used at the employed bee stage, the possibility of the employed bee getting stuck in a local optimum is higher. As a result, we need a combination of further exploitation and exploration by the onlooker bee when HC is applied at the employed bee stage.
Implementing SA at this stage performs well only with SAHCR at the onlooker bee stage. SA needs its parameters tuned to perform well, and the choice of SA parameters can have a significant impact on the method's effectiveness; here we used SA parameter values without tuning, so SA failed to exhibit good performance on its own. Despite that, SAHCR at the onlooker bee stage found good solutions starting from the SA outcome of the employed bees.
Use of SAHCR at the employed bee stage gives satisfactory performance irrespective of the local search method employed at the onlooker bee stage. SAHCR tweaks multiple times in each iteration, which allows it to explore enough to find a good solution; the price is an increased running time compared to SA and HC. Use of SA as the local search at the onlooker bee stage with SAHCR at the employed bee stage shows comparatively poor performance. This can be attributed to the fact that a potential solution found by SAHCR might get lost in the initial random walk of SA. Thus at this stage we need an algorithm that helps the employed bee land on a potentially good slope, so that when the onlooker bee exploits and explores further it can achieve a good solution.
Local Search at the Onlooker Bee Stage HC at the onlooker bee stage shows poor performance. HC provides a good solution at this stage only if it starts from a promising slope supplied by the employed bee stage; because of its lack of exploration capability, HC performs poorly here.
SA at the onlooker bee stage also performs poorly. This can be attributed to the fact that SA sometimes goes downhill: initially, SA performs mostly a random walk. At the onlooker bee stage we need to improve the solution already found by the employed bee, so onlooker bees are expected to exploit more than explore; use of SA here might instead degrade a potential solution. The individual arriving from the employed bee stage is expected to already be on the slope of a possibly good solution, but because of its initial random walk, SA at this stage makes it highly probable that the onlooker bee loses that progress. This mostly results in poor performance.
At this stage SAHCR performs really well because, besides exploitation, it also does enough exploration to find a good solution. In fact, the algorithm performs best when SAHCR is applied at both stages, although using SAHCR at both stages increases the running time. The performance of SAHCR at the onlooker bee stage with either HC or SA at the employed bee stage is also quite satisfactory. For the onlooker bee stage, both SAHCR and HC perform significantly better than SA with respect to both accuracy and selected gene set size.
Hill Climbing For Hill Climbing the only parameter tuned is the iteration count, with HC applied at the employed bee stage and SAHCR at the onlooker bee stage. Iteration counts from 10 to 20 with step size 2 have been considered. Results for the different iteration counts of HC at the employed bee stage are presented in Table S15. The experimental results show an increase of the selected gene set size with increasing iteration count, while accuracy remains largely stable, reaching its highest value at an iteration count of 14. For this experiment the probability of local search is set to 1.0.
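A generic Hill Climbing loop matching this description can be sketched as follows. The tweak operator and fitness function are problem-specific; the 1-D toy problem below is our stand-in for gene-subset fitness.

```python
import random


def hill_climb(solution, fitness_fn, tweak_fn, iterations=14):
    """Greedy local search: tweak the current solution and keep the tweak
    only when it does not worsen fitness."""
    best, best_fit = solution, fitness_fn(solution)
    for _ in range(iterations):
        candidate = tweak_fn(best)
        cand_fit = fitness_fn(candidate)
        if cand_fit >= best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


random.seed(1)
f = lambda x: -(x - 3.0) ** 2        # toy fitness, maximum at x = 3
tweak = lambda x: x + random.uniform(-1.0, 1.0)
best, best_fit = hill_climb(0.0, f, tweak, iterations=200)
```

The loop never accepts a worse candidate, which is exactly the pure-exploitation behavior blamed above for HC getting stuck in local optima.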
Simulated Annealing To tune the SA parameters, we applied SA at the employed bee stage and SAHCR at the onlooker bee stage. To assess the contribution of the iteration count in SA, values from 10 to 20 with step size 2 have been considered; the experiments are performed for both the default (0.4) and the highest (1.0) probability of local search. The results are given in Table S16. Accuracy increases with the iteration count and reaches its peak at the value 14 (20) for probability of local search 1.0 (0.4). When probLS is 1.0, further increases in the iteration count cause a slight decrease in accuracy. The number of selected genes also tends to decrease with higher iteration counts. Considering both accuracy and the number of selected genes, the value 14 shows good performance and is set as the optimized value for this parameter.
For Simulated Annealing, the temperature parameter t is tuned using values from {1, 3, 5, 7, 10, 12, 15}. Experiments are performed for both the default (0.4) and the highest (1.0) probability of local search. The performance of the different values of t for SA at the onlooker bee stage is given in Table S17. Increasing t causes a small degradation in accuracy, while the number of selected genes increases with t. Initially t is set to a high value, which makes the algorithm perform a random walk in the search space; a higher value of t therefore means we random-walk for a longer period of time, accepting every newly created solution regardless of its quality. As a result, the local search becomes poor at exploitation, yielding comparatively inferior solutions. In SA, t decreases slowly, eventually to 0, at which point the algorithm is doing plain Hill Climbing.
The rate at which we decrease t is called the algorithm's schedule. The schedule parameter of Simulated Annealing is tuned using values from 0.1 to 1 with step size 0.1. Experiments are performed for both the highest (1.0) and the default (0.4) probability of local search. The results for the different schedule values are presented in Table S18. Accuracy remains almost the same across the different values of the schedule.
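Pulling the tuned pieces together (iteration count, temperature t, and the subtractive schedule), a generic SA loop along the lines described might look like this. The Metropolis-style acceptance rule and the toy problem are standard choices of ours, not details taken from the paper.

```python
import math
import random


def simulated_annealing(solution, fitness_fn, tweak_fn,
                        t=5.0, schedule=0.1, iterations=14):
    """Accept worse candidates with probability exp((f' - f) / t); t is
    reduced by `schedule` each step, so the walk turns greedy over time and
    degenerates into Hill Climbing once t reaches 0."""
    current, cur_fit = solution, fitness_fn(solution)
    best, best_fit = current, cur_fit
    for _ in range(iterations):
        candidate = tweak_fn(current)
        cand_fit = fitness_fn(candidate)
        accept = cand_fit > cur_fit or (
            t > 0 and random.random() < math.exp((cand_fit - cur_fit) / t))
        if accept:
            current, cur_fit = candidate, cand_fit
        if cur_fit > best_fit:
            best, best_fit = current, cur_fit
        t = max(0.0, t - schedule)
    return best, best_fit


random.seed(2)
f = lambda x: -(x - 3.0) ** 2        # toy fitness, maximum at x = 3
tweak = lambda x: x + random.uniform(-1.0, 1.0)
best, best_fit = simulated_annealing(0.0, f, tweak, iterations=200)
```

A large initial t makes almost every candidate acceptable (a random walk), which is why the text reports degraded exploitation for high t; a large schedule cools quickly, collapsing SA toward plain HC almost immediately.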

Steepest Ascent Hill Climbing with Replacement
Values from 10 to 20 with step size 2 have been evaluated as the iteration count of SAHCR. For this experiment, SAHCR is set as the local search method at the onlooker bee stage and SA at the employed bee stage. The results of tuning the iteration count of SAHCR are presented in Table S19, and show an increase in accuracy with higher iteration counts.
To review the consequence of applying different numbers of tweaks, values from {5, 6, 7, 8, 9, 10, 12, 15, 20} are considered. The experimental outcomes are listed in Table S20. The results show an improvement of accuracy with more tweaking, and the selected gene set size also tends to decrease with a higher number of tweaks at both stages. For all the experiments related to the parameters of SAHCR, both the highest (1.0) and the default value of probLS are considered.
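A sketch of Steepest Ascent Hill Climbing with Replacement as described here, using 12 iterations and 5 tweaks as defaults in the spirit of the tuning above; the toy problem is illustrative only.

```python
import random


def sahcr(solution, fitness_fn, tweak_fn, iterations=12, n_tweaks=5):
    """Each iteration generates n_tweaks candidates from the current point,
    then *replaces* the current solution with the best candidate even when it
    is worse; the best solution ever seen is what gets returned."""
    current = solution
    best, best_fit = current, fitness_fn(current)
    for _ in range(iterations):
        candidates = [tweak_fn(current) for _ in range(n_tweaks)]
        current = max(candidates, key=fitness_fn)
        cur_fit = fitness_fn(current)
        if cur_fit > best_fit:
            best, best_fit = current, cur_fit
    return best, best_fit


random.seed(3)
f = lambda x: -(x - 3.0) ** 2        # toy fitness, maximum at x = 3
tweak = lambda x: x + random.uniform(-1.0, 1.0)
best, best_fit = sahcr(0.0, f, tweak)
```

The unconditional replacement is what gives SAHCR the extra exploration the text credits it with, and the n_tweaks fitness evaluations per iteration are the source of its higher running time compared to HC and SA.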
Percentage of Genes to be Selected from the Prefiltering Step, thn After the filtering technique is applied, the genes are ranked, and a percentage of the top-ranked genes is passed on to the next stage. The parameter thn determines this percentage. The values for tuning thn are taken from {0.004, 0.01, 0.02, 0.03, 0.04, 0.045, 0.05, 0.06, 0.065, 0.07, 0.075, 0.08, 0.085, 0.09, 0.1, 0.2}, with the Kruskal-Wallis test as the prefiltering technique. The results are presented in Table S21. Too small a value of thn causes the filtering step to discard most of the genes, including informative ones, which results in poor predictive accuracy. For this dataset, the value 0.03 (171 genes selected from the prefiltering stage) gives the highest accuracy. For higher values of thn, accuracy remains almost the same; in fact, it drops slightly. Since the task of the preprocessing step is to discard irrelevant genes, a higher value of thn gives lower accuracy because of the presence of noisy genes, while too low a value of thn means informative genes are discarded in the prefiltering step. The experimental results also support the finding [8,9,10,11,12,13] that using all the genes can hamper classifier performance. Table S21 also reports the number of genes selected from the preprocessing stage.

Threshold to Select Genes from the Prefiltering Step, thp In the filtering step, the p-value for each gene is computed and all genes are sorted by p-value. To select the top-ranked genes from Kruskal-Wallis (F-test), we need to fix a threshold and select all genes having a p-value less (greater) than the threshold. The values {0.0005, 0.001, 0.002, 0.003, 0.005, 0.007, 0.009, 0.01, 0.02, 0.025, 0.03, 0.04, 0.045} are considered to tune the parameter thp, with Kruskal-Wallis as the prefiltering method. The results of the experiments are reported in Table S22. An increase in thp causes a rapid increase in the selected gene set size. The highest accuracy is attained at the value 0.001 (150 genes selected from the prefiltering stage); for larger values of thp a small decline in accuracy is noticed. This happens because with increasing thp fewer genes are filtered out, so noisy genes are passed on to the next stage and the algorithm exhibits unsatisfactory performance. From both approaches it is apparent that for the 9_Tumors dataset the informative genes are present among the top 200 genes.

The results of tuning the parameter c0 are presented in Table S23. From the observed results we find that too low a value of c0 results in low accuracy and a small selected gene set. The gene subset remains small because the best solutions contribute most of the pheromone, so only a small number of components receive high pheromone values and get selected; but this limited exploration may cause the algorithm to get stuck at a local optimum. Conversely, too high a value of c0 masks the contribution of good solution components obtained from the personal and global best solutions, which also results in lower accuracy and a larger selected gene set.
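The p-value thresholding described above for thp can be sketched as a small helper; the function name and toy values are our assumptions.

```python
import numpy as np


def select_by_pvalue(pvals, th_p=0.001):
    """Return indices of genes whose p-value falls below th_p, ordered from
    most to least significant (smallest p-value first)."""
    pvals = np.asarray(pvals)
    kept = np.flatnonzero(pvals < th_p)
    return kept[np.argsort(pvals[kept])]


# Toy p-values for five genes; only genes 3 and 1 pass th_p = 0.001.
selected = select_by_pvalue([0.5, 0.0005, 0.01, 0.0001, 0.2])
```

Raising th_p admits genes with weaker class association, which is the mechanism behind the rapid growth of the selected gene set reported in Table S22.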
Maximum Number of Algorithm Iterations, MAX_ITER This parameter gives the maximum number of times the modified ABC is run to achieve a single solution. To tune it, values from 10 to 50 with step size 10 are considered. The experimental results are presented in Table S24. No significant correlation can be found: accuracy remains almost the same, and although the selected gene set size varies with MAX_ITER, no clear relation emerges. The experimental data also show that the average number of iterations the algorithm needs to reach its final result remains almost the same across values of MAX_ITER, so we can conclude that the algorithm converges quickly. The value 20 is chosen for MAX_ITER, as it shows the best performance; a higher value would only increase the running time without improving accuracy.
Number of Trials without Improvement, limit A food source that cannot be improved within a predetermined number (limit) of trials is abandoned by its employed bee, and the employed bee is converted into a scout. The variable trial_i keeps track of the number of times the fitness remains unchanged for the i-th bee S_i. To tune the parameter limit, the values {5, 7, 10, 12, 15, 20, 25, 30, 35, 40, 100, 200, 500, 800, ∞} are used, where ∞ means that no food source is ever abandoned; in other words, there are no scout bees when limit is set to ∞. The results are reported in Table S25. Increasing limit improves performance in terms of both accuracy (Fig. S5) and the number of selected genes (Fig. S6). Accuracy is highest when limit is 100; the increasing trend is visible in Fig. S5, but when limit is very high accuracy remains almost the same. For the number of selected genes, Fig. S6 shows that when limit is very low the selected gene set is very large; otherwise a decreasing trend is visible. A small value of limit means that individuals are randomly re-initialized more frequently, so a potential solution may be discarded before the individual gets the chance to exploit it fully [14]; a good solution might thus be lost midway, which is why a low limit reduces accuracy. Increasing limit also tends to increase the average number of iterations the algorithm needs to reach its final solution, probably because a larger limit means less random exploration, so the algorithm needs more time to converge. When limit is too small or too large, the results obtained by our algorithm are worse than those produced using moderate values of limit.
Therefore, the results show that a proper frequency of new-solution production has a beneficial effect on solution fitness, providing enough exploration to improve the search ability of the algorithm. If limit is either too small or too large, the balance between exploration and exploitation is disturbed, producing worse solutions and costing much more running time. The obtained accuracy is highest for a limit value of 100, although a high value of limit may result in less exploration.
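The abandonment rule can be expressed as a small helper; the names trial and limit follow the text, while the function itself is our sketch.

```python
LIMIT = 100  # tuned number of trials without improvement (Table S25)


def update_trial(trial, improved, limit=LIMIT):
    """Reset the stagnation counter when the food source improved; otherwise
    increment it. Returns (new_counter, abandon), where abandon=True means
    the employed bee becomes a scout and re-initializes its food source."""
    if improved:
        return 0, False
    trial += 1
    return trial, trial >= limit


# A source that just improved keeps its bee; one stuck for 100 trials is
# abandoned and its bee turns scout.
fresh = update_trial(42, improved=True)
stuck = update_trial(99, improved=False)
```

Setting limit to infinity makes the abandon flag unreachable, which is exactly the "no scout bees" configuration tested above.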

Parameter Tuning of Different Evolutionary Algorithms
Different evolutionary algorithms can be considered as the search method for gene selection. In this section we present the gene selection performance obtained with different evolutionary algorithms, including the genetic algorithm (GA), ACO, and ABC.
Genetic Algorithm The behavior of four parameters of the genetic algorithm is studied: population size, crossover ratio (r), mutation ratio (m), and MAX_ITER. The whole process is repeated at least 15 times with the genetic algorithm as the search method for each parameter combination. During these runs, all parameter values except the one under consideration are set to their defaults, which are given in Table S26. Values from {20, 30, 40, 50} are considered to study the parameter PS; Table S27 presents the outcome. The experiments clearly show that increasing the population improves performance in terms of both accuracy and selected gene set size up to population size 80, after which a slight degradation is visible, so the value 80 is selected as the optimized value of PS for the GA. Values from {100, 200, 300, 400} are considered as the maximum iteration count, with 100 as the default for MAX_ITER; Table S28 presents the outcome, and the value 200 is selected as the final value, as it exhibits good performance with respect to both criteria. Values from 0.6 to 1.0 with step size 0.1 are considered as the crossover ratio, with a default of 1.0; the results are listed in Table S29. Increasing the crossover ratio (r) improves performance in terms of both accuracy and selected gene set size, with the best result found for a crossover ratio of 0.9, which is taken as the final value. For the mutation ratio (m), the values 0.1 and 0.01 are considered; the results are presented in Table S30. Both accuracy and selected gene set size decrease for the higher value of m, and a mutation ratio of 0.01 is set as the optimized value.
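For reference, the two GA operators tuned above can be sketched on binary gene masks. One-point crossover and bit-flip mutation are standard choices of ours; the paper does not specify its operators in this section.

```python
import random


def crossover(p1, p2, r=0.9):
    """With probability r, recombine two binary gene masks at a random cut
    point; otherwise return unchanged copies of the parents."""
    if random.random() < r and len(p1) > 1:
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]


def mutate(mask, m=0.01):
    """Flip each bit (gene in or out of the subset) independently with
    probability m."""
    return [1 - bit if random.random() < m else bit for bit in mask]


random.seed(4)
# With r = 1.0 crossover always happens; the children exchange tails.
c1, c2 = crossover([1, 1, 1, 1], [0, 0, 0, 0], r=1.0)
```

A high mutation ratio injects noise into every mask, which is consistent with the degradation observed for m = 0.1 in Table S30.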
Artificial Bee Colony For ABC, the parameter values are set according to the tuning presented in the corresponding section for the proposed method. The optimized parameter settings for ABC are given in Table S31. The whole process is repeated at least 15 times with the Artificial Bee Colony algorithm as the search method. The results are presented in Table S32.

Ant Colony Optimization
The whole process is repeated at least 15 times with Ant Colony Optimization as the search method for each parameter combination. As the local search method, SAHCR with iteration count 12 and 5 tweaks is used, and the value of ρ is set to 0.8. Two ACO parameters are tuned: population size and MAX_ITER. The default value for both parameters is 20, and for both, values from {20, 30, 40, 50} are considered; while one parameter is tuned, the other is kept at its default. Tables S34 and S35 present the performance for different values of PS and MAX_ITER, respectively; for both parameters the best results are found for the value 40. The other parameter values are set as given in Table S33. Initially, the pheromone values of all components are set to 0, and while updating the pheromone only the current solution and the global best (gbest) are considered.