Top-down vs. bottom-up approaches to computational modeling of vision

Tuesday, October 26, 12:00 – 14:00 ET


Computational models fulfill many roles, including theory specification, causal explanation, prediction, and visualization. Over the last decade, models have become increasingly elaborate and complex. While more complex models are impressive in their predictive power, they can be theoretically underwhelming and difficult to relate to underlying neural circuitry. This session will compare the goals, strengths, and weaknesses of a wide variety of state-of-the-art modeling approaches.

This special session is hosted by the Clinical Vision Sciences Technical Group and the Applications of Visual Science Technical Group, along with the Fall Vision Meeting Planning Committee.

Invited Speakers:

  • Mark Lescroart, University of Nevada, Reno

  • Tatyana Sharpee, Salk Institute for Biological Studies

  • Fred Rieke, University of Washington

Moderators:

  • Li Zhaoping, University of Tübingen

  • Ione Fine, University of Washington


Limits of prediction accuracy on randomly selected natural images for model evaluation

Mark Lescroart, Cognitive & Brain Sciences, University of Nevada, Reno

Prediction accuracy on held-out data has become a critical analysis for quantitative model evaluation and hypothesis testing in computational cognitive neuroscience. In this talk, I will discuss the limits of prediction accuracy as a standalone metric and highlight other considerations for model evaluation and interpretation.

First, comparing two models on prediction accuracy alone does not reveal the degree to which the models share underlying factors. I will advocate addressing this issue with variance partitioning, a form of commonality analysis, to reveal the shared and unique variance explained by different models. Concretely, I will show how variance partitioning reveals representation of body parts and object boundaries in responses to multiple datasets of movie stimuli.

Second, prediction accuracy is a metric for the variance explained by a given model, but for any experiment, the stimulus constrains the variance in the measured brain responses. Any given stimulus set runs the risk of excluding important sources of variation. A popular way to address this issue is to use photographs or movie clips as stimuli. Such naturalistic stimuli are typically sampled broadly from the world and thus have increased ecological validity, but random selection of natural stimuli often results in correlated features both within and between models. This often leads to ambiguous results, e.g., shared variance between models intended to capture different types of features.

Furthermore, I will show that the same models (again of body parts and object boundaries) can yield quantitatively, and in some cases qualitatively, different results when applied to different datasets. This raises a critical question: if results for the same model vary across stimulus sets, which result provides a more solid basis for future work? Just as two clocks showing different times need a reference clock to be set correctly, I will argue that we need broadly sampled sets of natural stimuli to serve as a baseline for what constitutes "natural" variation and covariation in various feature domains. I will describe our collaboration to create just such a dataset of human visual experience, in the form of hundreds of hours of first-person video.
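
To make the logic of variance partitioning concrete, the sketch below estimates held-out prediction accuracy for two encoding models fit separately and jointly, then splits the explained variance into shared and unique components. It is a minimal illustration in the spirit of the abstract, not code from the talk: the ridge penalty, the train/test split, and the feature matrices X_a and X_b with response vector y are all assumptions made for the example.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    def held_out_r2(X_tr, X_te, y_tr, y_te, alpha=1.0):
        """Fit a ridge encoding model and score R^2 on held-out data."""
        return r2_score(y_te, Ridge(alpha=alpha).fit(X_tr, y_tr).predict(X_te))

    def variance_partition(X_a, X_b, y, alpha=1.0, seed=0):
        """Split held-out explained variance into shared and unique parts
        (a two-model commonality analysis; all inputs are illustrative)."""
        Xa_tr, Xa_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
            X_a, X_b, y, test_size=0.2, random_state=seed)
        r2_a = held_out_r2(Xa_tr, Xa_te, y_tr, y_te, alpha)    # model A alone
        r2_b = held_out_r2(Xb_tr, Xb_te, y_tr, y_te, alpha)    # model B alone
        r2_ab = held_out_r2(np.hstack([Xa_tr, Xb_tr]),         # joint model
                            np.hstack([Xa_te, Xb_te]), y_tr, y_te, alpha)
        return {"unique_a": r2_ab - r2_b,                 # only A explains
                "unique_b": r2_ab - r2_a,                 # only B explains
                "shared":   r2_a + r2_b - r2_ab,          # either explains
                "total":    r2_ab}

    # Toy demonstration: two feature spaces that share one latent feature,
    # mimicking the correlated features that arise in natural stimulus sets.
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((500, 1))
    X_a = np.hstack([latent, rng.standard_normal((500, 4))])
    X_b = np.hstack([latent, rng.standard_normal((500, 4))])
    y = latent[:, 0] + 0.5 * rng.standard_normal(500)
    print(variance_partition(X_a, X_b, y))

In this toy case the shared latent feature drives the response, so most of the held-out variance lands in the shared component rather than in either model's unique component; this is precisely the kind of ambiguity the abstract attributes to randomly selected natural stimuli.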


Elements of cortical computation that enhance robustness of visual object recognition

Tatyana Sharpee, Computational Neurobiology Laboratory, Salk Institute for Biological Studies


Retinal encoding of natural images

Fred Rieke, University of Washington