Vegetto

Vegetto is a DEAP-based evolutionary procedure designed to solve multi-objective optimization feature selection problems.

Specifically, it implements a wrapper in which NSGA-II is the search strategy and k-NN is the classification algorithm used to evaluate candidate solutions.
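To make the wrapper idea concrete, here is a minimal, dependency-free sketch of how one candidate solution (a binary feature mask) could be scored with k-NN. It is not Vegetto's implementation: the names knn_predict, cohen_kappa and evaluate are hypothetical, and the choice of objectives (validation Kappa coefficient and number of selected features) is an assumption, since this README mentions the Kappa coefficient but does not list the objectives explicitly.

```python
from collections import Counter


def knn_predict(train_x, train_y, sample, k, mask):
    """Majority vote among the k nearest training rows (hypothetical helper)."""
    selected = [i for i, bit in enumerate(mask) if bit]
    distances = []
    for row, label in zip(train_x, train_y):
        # Squared Euclidean distance restricted to the selected features.
        dist = sum((row[i] - sample[i]) ** 2 for i in selected)
        distances.append((dist, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # Kappa is undefined here; 0.0 is a choice of this sketch.
    return (observed - expected) / (1.0 - expected)


def evaluate(mask, train_x, train_y, val_x, val_y, k=1):
    """Return (Kappa, number of selected features) for one feature mask."""
    if not any(mask):
        return 0.0, 0  # Assumption: an empty subset gets the worst useful score.
    predictions = [knn_predict(train_x, train_y, s, k, mask) for s in val_x]
    return cohen_kappa(val_y, predictions), sum(mask)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
train_x = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -4.0]]
train_y = [0, 0, 1, 1]
val_x = [[0.05, -4.0], [0.95, 5.0]]
val_y = [0, 1]

print(evaluate([1, 0], train_x, train_y, val_x, val_y))
print(evaluate([0, 1], train_x, train_y, val_x, val_y))
```

In a multi-objective search, tuples like these would be compared by Pareto dominance rather than collapsed into a single score.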

This wrapper is designed with two objectives in mind: to reach solutions as close as possible to the global optimum and to run efficiently. For the latter, four efficient implementations of k-NN have been developed in C++, with data conversion between Python and C++ handled by the pybind11 library. If maximum efficiency is desired, choose the last version of k-NN, which is a mechanism that selects the most suitable version depending on the number of selected features.

Documentation

Vegetto is fully documented on its GitHub Pages site. You can also generate the docs from the source code: change directory to the doc subfolder and run make html. The documentation will be generated under build/html. You will need Sphinx to build it.

Usage

The bash script main_script.sh is in charge of executing the wrapper. In addition to installing the libraries listed in the requirements.txt file, you need to install the MongoDB library.

All hyperparameters are configured in the config.xml file. They are:

  • FolderDataset - Name of the folder where the training and test data are stored. This folder must be located inside db, and the dataset must be split into four files: training data, training labels, test data, and test labels. These files must be in the .npy format. The db folder contains some example datasets, which belong to the UCI repository.
  • Features - Number of features of the dataset used to execute the wrapper.
  • Executions - Number of executions.
  • Individual - Number of individuals in each subpopulation.
  • GenerationsConvergence - Interval of generations to be analyzed to check if convergence has been reached.
  • MaximumGenerations - Maximum number of generations for which the wrapper runs.
  • SubPopulations - Number of subpopulations.
  • Migrations - Number of migrations. If this hyperparameter is set to 0, the number of migrations will depend on the total number of generations required to reach convergence.
  • EvaluationVersion - k-NN version used for evaluation: (1) scikit-learn k-NN; (2) KNN_O1; (3) KNN_O2; (4) KNN_O3; (5) KNN_O4.
  • FitnessEvolution - Boolean hyperparameter that enables storing the fitness evolution across generations in the database.
  • PercentageFS - Percentage of selected features in the individuals of the initial population.
  • AccuracyConvergence - Convergence threshold on the difference between the mean validation Kappa coefficient of the individuals in the population in the first and in the last of the GenerationsConvergence generations analyzed. If this hyperparameter is set to -1, the number of generations carried out will be equal to GenerationsConvergence * Migrations + 1.
  • SDConvergence - Convergence threshold on the standard deviation of the Kappa coefficient of the individuals in the population across all of the GenerationsConvergence generations analyzed. If this hyperparameter is set to -1, the number of generations carried out will be equal to GenerationsConvergence * Migrations + 1.
  • K - Number of neighbors used in the k-NN classification. If this hyperparameter is set to -1, K will be equal to the square root of the number of training samples.
  • ExperimentName - Experiment name.
  • CrossoverProbability - Probability to apply crossover.
  • MutationProbability - Probability to apply mutation.
  • Grain - Probability that an individual belonging to the Pareto front is selected to be a migrant.
  • Period - Number of generations elapsed between consecutive migrations.
  • DecisionFeatures - Threshold on the number of selected features used to choose between KNN_O2 and KNN_O3.
  • ProjectPath - Path where the vegetto project is located.
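
Several of the hyperparameters above have special values that trigger derived settings. The following sketch spells out that arithmetic. The function names are hypothetical and the code is not taken from Vegetto; in particular, rounding the square root to the nearest integer and comparing the Kappa change as an absolute difference are assumptions.

```python
import math


def effective_k(k, n_training_samples):
    """Return K, falling back to sqrt(n_training_samples) when K is -1."""
    if k == -1:
        # Rounding to the nearest integer is an assumption of this sketch.
        return max(1, round(math.sqrt(n_training_samples)))
    return k


def fixed_generation_count(generations_convergence, migrations):
    """Generations carried out when a convergence threshold is set to -1."""
    return generations_convergence * migrations + 1


def mean_kappa_change(mean_kappa_history, generations_convergence):
    """Absolute change in mean Kappa between the first and last
    generation of the most recent GenerationsConvergence window."""
    window = mean_kappa_history[-generations_convergence:]
    return abs(window[-1] - window[0])


print(effective_k(-1, 100))           # K derived from 100 training samples
print(fixed_generation_count(10, 3))  # 10 * 3 + 1 generations
```

A value such as the one returned by mean_kappa_change would then be compared against AccuracyConvergence. How Vegetto combines that check with SDConvergence is not described here, so it is left out of the sketch.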

Finally, to use any of the C++ k-NN versions, it is necessary to compile the C++ code, which is located in the KNN_C folder. The command to be executed is:

g++ -O2 -Wall -funroll-loops -march=native -fopenmp -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) library.hpp library.cpp -o knn_library$(python3-config --extension-suffix)

This command creates a .so shared object, which must be placed in the src folder. Then uncomment the line #from knn_library import * in the ag.py file.
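
Before uncommenting the import, it can help to confirm that Python is able to find the compiled module. The check below is a small sketch that uses only the standard library. The helper name module_available is hypothetical, and the snippet assumes it is run from the src folder (or that src is on the import path).

```python
import importlib.util


def module_available(module_name):
    """Return True if Python can locate a top-level module by name."""
    return importlib.util.find_spec(module_name) is not None


if module_available("knn_library"):
    print("knn_library found: the C++ k-NN versions can be imported.")
else:
    print("knn_library not found: compile it and place the .so file in src.")
```

This only checks that the module can be located; it does not load the shared object, so a build made for a different Python version would still fail at import time.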

Acknowledgments

This work was supported by project New Computing Paradigms and Heterogeneous Parallel Architectures for High-Performance and Energy Efficiency of Classification and Optimization Tasks on Biomedical Engineering Applications (HPEE-COBE), with reference PGC2018-098813-B-C31, funded by the Spanish Ministerio de Ciencia, Innovación y Universidades, and by the European Regional Development Fund (ERDF).

License

GPLv3 © 2020-2021 EFFICOMP.
