When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition behaviors and neural response patterns in the primate brain. While recent machine learning advances suggest that scaling compute, model size, and dataset size improves task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate visual ventral stream by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT, and behavior. We find that while behavioral alignment continues to improve with larger models, neural alignment saturates. This pattern holds across model architectures and training datasets, even though models with stronger inductive biases and datasets with higher-quality images are more compute-efficient. Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit poor alignment. Our results suggest that while scaling current architectures and datasets might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream, highlighting the need for novel strategies in building brain models.
We find distinct scaling trends for behavioral and neural alignment across more than 600 models. While behavioral alignment continues to improve with increasing model size and training data, neural alignment saturates, indicating that beyond a certain point, larger models and more data no longer enhance correspondence with brain data.
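To make the notion of saturation concrete, here is a minimal sketch of one way such trends can be quantified: fit a saturating power law, score(C) = s∞ − b·C^(−c), to alignment scores across a compute grid and inspect the remaining headroom to the fitted asymptote. The functional form, the numbers, and all variable names below are illustrative assumptions, not the paper's released analysis code.

```python
# A minimal sketch (not the paper's analysis code): fit a saturating power law
# to alignment-versus-compute points and report the headroom to the asymptote.
# All scores below are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(x, s_inf, b, c):
    """Alignment approaches the asymptote s_inf as normalized compute x grows."""
    return s_inf - b * np.power(x, -c)

compute = np.logspace(15, 21, 7)   # training FLOPs (hypothetical grid)
x = compute / compute[0]           # normalize so the fit is well-scaled
neural = np.array([0.27, 0.36, 0.41, 0.43, 0.44, 0.445, 0.447])    # placeholder scores
behavioral = np.array([0.17, 0.26, 0.34, 0.41, 0.47, 0.52, 0.56])  # placeholder scores

for name, scores in [("neural", neural), ("behavioral", behavioral)]:
    (s_inf, b, c), _ = curve_fit(saturating_power_law, x, scores,
                                 p0=(scores[-1] + 0.1, 0.4, 0.1), maxfev=20000)
    headroom = s_inf - scores[-1]  # how much the fit says is left to gain
    print(f"{name}: asymptote ~{s_inf:.2f}, headroom at max compute ~{headroom:.3f}")
```

Under this reading, a benchmark "saturates" when the fitted headroom approaches zero within the tested compute range, while a still-scaling benchmark retains substantial headroom.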
Increasing parameter count improves both neural and behavioral alignment, but efficiency varies with architectural inductive bias: convolutional networks reach a given level of alignment with fewer parameters than transformers and other modern variants.
Generalist datasets such as ImageNet and EcoSet closely follow predicted alignment scaling laws. In contrast, specialized or limited datasets deviate from these trends and yield only marginal improvements in neural alignment. Strong inductive biases—such as convolution and recurrence—yield higher alignment in data-scarce regimes. However, with sufficient data, the performance gap across architectures diminishes.
By fitting two-dimensional scaling laws, we estimate that brain alignment is maximized when compute is allocated between model parameters and dataset size in a roughly 0.3:0.7 ratio. Larger models alone fail to improve alignment unless paired with proportionally scaled datasets.
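For intuition on where such a ratio can come from, below is a minimal sketch assuming a Chinchilla-style functional form score(N, D) = E − A·N^(−α) − B·D^(−β), the common compute approximation C ≈ 6·N·D, and synthetic data; reading the 0.3:0.7 ratio as the compute-scaling exponents N_opt ∝ C^0.3 and D_opt ∝ C^0.7 is likewise our assumption, not the paper's code.

```python
# A minimal sketch, not the paper's implementation: fit a Chinchilla-style
# two-dimensional scaling law and derive the compute-optimal N/D split.
import numpy as np
from scipy.optimize import curve_fit

def alignment_surface(ND, E, A, B, alpha, beta):
    """Score rises toward the ceiling E as parameters N and samples D grow."""
    N, D = ND
    return E - A * np.power(N, -alpha) - B * np.power(D, -beta)

# Synthetic grid of (parameter count N, training samples D), with scores
# generated from an assumed ground-truth law purely for demonstration.
Ns, Ds = np.meshgrid(np.logspace(6, 9, 4), np.logspace(5, 8, 4))
N, D = Ns.ravel(), Ds.ravel()
score = alignment_surface((N, D), 0.55, 3.0, 2.0, 0.35, 0.15)

(E, A, B, alpha, beta), _ = curve_fit(alignment_surface, (N, D), score,
                                      p0=(0.5, 1.0, 1.0, 0.3, 0.2), maxfev=50000)

# Under a budget C ≈ 6*N*D, minimizing the fitted deficit A*N^-alpha + B*D^-beta
# gives N_opt ∝ C^(beta/(alpha+beta)) and D_opt ∝ C^(alpha/(alpha+beta)).
a = beta / (alpha + beta)   # parameter exponent (≈0.3 here)
b = alpha / (alpha + beta)  # data exponent (≈0.7 here)
print(f"N_opt ∝ C^{a:.2f}, D_opt ∝ C^{b:.2f}")
```

The key step is that, with the deficit terms fitted, the optimal split follows analytically from the exponents α and β rather than from a separate search over budgets.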
Higher-level regions (IT cortex and behavior) gain substantially more from increased compute than early areas (V1, V2), highlighting a graded effect of scaling along the cortical visual hierarchy.
We observe a strong correlation between validation accuracy and behavioral alignment. Neural alignment, however, follows a nonlinear trend: its gains taper off as task performance increases.
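A simple way to see both patterns, sketched below on hypothetical placeholder data: the overall Pearson correlation is high for behavioral alignment, while for neural alignment the local slope at high accuracy falls well below the slope at low accuracy.

```python
# Hypothetical placeholder data, for illustration only: behavioral alignment
# tracks accuracy linearly, while neural alignment bends and flattens.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
acc = np.linspace(0.3, 0.8, 24)                                   # validation accuracy
behavioral = 0.85 * acc + rng.normal(0, 0.005, 24)                # linear relation
neural = 0.5 * (1 - np.exp(-3 * acc)) + rng.normal(0, 0.005, 24)  # tapering relation

half = len(acc) // 2
for name, y in [("behavioral", behavioral), ("neural", neural)]:
    r, _ = pearsonr(acc, y)
    slope_lo = np.polyfit(acc[:half], y[:half], 1)[0]  # local slope, low accuracy
    slope_hi = np.polyfit(acc[half:], y[half:], 1)[0]  # local slope, high accuracy
    print(f"{name}: r = {r:.3f}, slopes = {slope_lo:.2f} (low) / {slope_hi:.2f} (high)")
```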
@inproceedings{gokce2025scalingprimatevvs,
  title     = {Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream},
  author    = {Abdulkadir Gokce and Martin Schrimpf},
  booktitle = {Forty-second International Conference on Machine Learning},
  year      = {2025},
  url       = {https://openreview.net/forum?id=WxY61MmHYo}
}
This website is adapted from LLaVA-VL, Nerfies, and VL-RewardBench, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Usage and License Notices: Model checkpoints are intended and licensed for research use only.