Casting: A Bio-Inspired Method for Restructuring Machine Learning Ensembles (v1.0.0)
The wisdom of the crowd refers to the phenomenon in which a group of individuals, each making an independent decision, can collectively arrive at highly accurate solutions, often more accurate than any individual within the group. This principle relies heavily on independence: if individual opinions are unbiased and uncorrelated, their errors tend to cancel out when averaged, reducing the variance of the collective estimate. In real-world social networks, however, individuals are often influenced by their neighbors, introducing correlations between decisions. Such social influence can amplify shared biases, eroding the benefits of independent voting.

This trade-off between independence and interdependence has striking parallels in ensemble learning. Bagging (bootstrap aggregating) improves classification performance by combining independently trained weak learners, reducing variance. Boosting, by contrast, explicitly introduces sequential dependence among learners, where each learner focuses on correcting the errors of its predecessors; this reduces bias but can amplify noise present in the data and increase variance.

Here, we introduce a new meta-algorithm, casting, which captures this biological and computational trade-off. Casting forms partially connected groups ("castes") of weak learners that are internally linked through boosting, while the castes themselves remain independent and are aggregated using bagging. This creates a continuum between full independence (i.e., bagging) and full dependence (i.e., boosting), allowing model performance to be tested across values of the hyperparameter that controls connectedness. We specifically investigate classification tasks, but the method can be used for regression tasks as well. Ultimately, casting can provide insights into how real systems contend with classification problems.
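The caste structure described above can be sketched in a few lines of scikit-learn. This is a minimal, hypothetical illustration, not the repository's ant_ensemble_class.py: the names casting_predict, n_castes, and caste_size are stand-ins for however the connectivity hyperparameter P actually partitions the ensemble.

```python
# Hypothetical sketch of casting: boosting *within* each caste,
# bagging *across* castes (binary 0/1 labels assumed).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def casting_predict(X_train, y_train, X_test, n_castes=5, caste_size=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_castes):
        # Each caste sees an independent bootstrap sample (the bagging step).
        idx = rng.integers(0, n, n)
        # Within a caste, learners are sequentially dependent (the boosting step).
        caste = AdaBoostClassifier(n_estimators=caste_size, random_state=0)
        caste.fit(X_train[idx], y_train[idx])
        votes.append(caste.predict(X_test))
    # Aggregate the independent castes by majority vote.
    return (np.stack(votes).mean(axis=0) > 0.5).astype(int)
```

With n_castes = 1 this collapses to plain boosting; with caste_size = 1 it collapses to plain bagging of weak learners, which is the continuum the abstract describes.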
Release Notes
Ensemble Connectivity Analysis
Python and R scripts evaluate and visualize the effect of P (ensemble connectivity).
Files
Save these in the same folder:
- ant_ensemble_class.py    # Classifier (Python)
- testAllData.py           # Runs experiments, outputs CSVs
- modelPerformanceGraphs.R # Generates plots (R)
1. Run Python Code in Spyder (Anaconda)
Open Spyder via Anaconda Navigator → File → Open → testAllData.py.
Set the Working Directory in the toolbar to the script folder.
Install packages in the Anaconda Prompt or IPython Console:
pip install numpy pandas scikit-learn mlxtend seaborn matplotlib
Run the script (Run button). This calls ant_ensemble_class.py and creates:
- ValidationResults.csv
- VerificationResults.csv
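Before moving to R, it can help to sanity-check the generated CSVs. The column layout isn't documented here, so this small helper only reports shape and the first few rows; the function name summarize_results is ours, not part of the repository.

```python
import pandas as pd

def summarize_results(path):
    """Print a quick overview of one of the generated CSVs."""
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(df.head())
    return df
```

For example, run summarize_results("ValidationResults.csv") after step 1 finishes.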
2. Visualize in R (Anaconda)
Open R (RStudio via Anaconda Navigator, or R in the prompt).
Set Working Directory:
setwd("path/to/folder")
Install packages:
install.packages(c("tidyverse","colorspace","ggpubr","scales","ggridges"))
Run script:
source("modelPerformanceGraphs.R")
The script reads the CSVs and plots:
- Bias/variance vs P
- Composite metric vs P
3. Output
Plots match those in CastingModel.pdf.
Workflow:
1. Run testAllData.py → generates CSVs.
2. Run modelPerformanceGraphs.R → creates plots.
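The two-step workflow above can also be driven from a single Python script. This is a sketch under the assumption that Rscript is on the PATH of the active Anaconda environment; it is not part of the repository.

```python
# Hypothetical driver for the full workflow (assumes Rscript is on PATH).
import subprocess
import sys

def run_workflow(steps=((sys.executable, "testAllData.py"),
                        ("Rscript", "modelPerformanceGraphs.R"))):
    # Run each step in order, raising on the first failure so a broken
    # experiment run never silently produces stale plots.
    for cmd in steps:
        subprocess.run(list(cmd), check=True)
```

Using check=True means step 2 only runs if step 1 exits successfully, mirroring the manual workflow.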