Since the beginning of my PhD, I’ve been interested in a quantitative understanding of intelligence, both artificial and biological — and in the possible relationship between the two. I am mainly focused on neural network models, which are a canonical model for neural computation in the brain and are a central part of many modern artificial intelligence systems.
Such artificial neural networks with deep architectures have dramatically improved the state-of-the-art in computer vision, speech recognition, natural language processing, and many other domains.
Despite this impressive progress, artificial neural networks are still far behind the capabilities of biological neural networks in most areas: even the simplest fly is far more resourceful than our most advanced robots. This indicates we have much to improve!
At the same time, if we wish to understand biological neural networks we must first be able to understand learning in the simplest non-linear artificial neural network – which still remains a mystery.
Over the years, my research has aimed to uncover the fundamental mathematical principles governing both types of neural networks. Since I started my faculty position at the Technion, I have focused on artificial neural networks, in the context of machine learning.
My research so far has covered many aspects of neural networks and deep learning. Below is more information on a few open questions that have interested me throughout my academic life.
Modeling and analysis of neural networks
There are several open theoretical questions in deep learning. Answering these theoretical questions will provide design guidelines and help with some important practical issues (explained below). Two central questions are:
Generalization. Neural networks often have far more parameters than training examples, yet they overfit far less than classical models. Why is this happening? For example, as can be seen in the figure below from Wu, Zhu & E 2017, polynomial curves (right) tend to overfit much more than neural networks (left).
Optimization. Training a neural network means minimizing a highly non-convex loss, yet simple methods such as stochastic gradient descent routinely find good solutions. Why is this happening?
There are many practical bottlenecks in deep learning (the following figures are from Sun et al. 2017). Such bottlenecks occur since neural network models are large, and keep getting larger over the years:
How can we train and use neural networks more efficiently (i.e., with better speed, energy, and memory usage), without sacrificing accuracy? See my talk here (in Hebrew) for some of our results on this; a minimal quantization sketch also appears after this list.
How can we decrease the amount of labeled data required for training?
Can we find automatic and robust methods for selecting the “optimal” hyper-parameters?
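To make the efficiency question above concrete, here is a minimal sketch of uniform post-training quantization, one common way to shrink the memory and compute footprint of a trained network. This is not our actual method; the symmetric rounding scheme and bit-widths are illustrative assumptions only.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniform, symmetric, per-tensor quantization of a weight array (<= 8 bits)."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax            # map the largest weight to qmax
    w_int = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return w_int, scale                         # integer weights + scale to dequantize

# Toy usage: quantize random weights to 4 bits and measure the error introduced.
w = np.random.randn(256, 256).astype(np.float32)
w_int, scale = quantize_uniform(w, num_bits=4)
print("mean |error|:", np.mean(np.abs(w - w_int * scale)))
```

The trade-off to study is exactly the one raised above: how far can the bit-width be pushed down before accuracy starts to suffer.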
Neuroscience datasets are typically very challenging. They are usually very noisy, of limited duration, and affected by many unobserved latent variables. Analyzing and modeling these datasets has become increasingly challenging over the years, since the number of recorded neurons increases exponentially, similarly to “Moore’s law” (figure from Stevenson & Kording 2011):
In order to interpret neuroscience data, certain inference tasks are typically necessary:
Spike inference. Can we infer the “spiking” firing patterns of each neuron from the observed fluorescence movie? This includes automatic localization of each neuron, demixing of signals from nearby neurons, and denoising and deconvolution of the observed fluorescence to obtain the original “spikes” (a minimal deconvolution sketch appears after this list).
Connectivity estimation. Given the activity patterns of various neurons in the network, can we infer their synaptic and functional connectivity?
Efficient simulation. Accurately simulating large network models, or even highly detailed single-neuron models, can be very slow and inefficient. Can we improve the simulation methods?
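To illustrate the deconvolution step mentioned in the spike-inference item above, here is a minimal sketch assuming a simple AR(1) calcium model with a known decay factor and a hard threshold. Real pipelines use sparse non-negative deconvolution and also handle localization and demixing of neurons; the model and parameters here are illustrative assumptions.

```python
import numpy as np

def deconvolve_ar1(trace, gamma=0.95, threshold=0.3):
    """Naive spike inference from a single (denoised) fluorescence trace.

    Assumes calcium dynamics c[t] = gamma * c[t-1] + s[t]; inverts the
    recursion and thresholds the residual to obtain a binary spike train.
    """
    c = trace - trace.min()                      # crude baseline removal
    residual = c[1:] - gamma * c[:-1]            # estimate of s[t] for t >= 1
    spikes = np.concatenate([[0.0], np.maximum(residual, 0.0)])
    return spikes > threshold

# Toy usage: simulate a trace from the same model and try to recover the spikes.
rng = np.random.default_rng(0)
true_spikes = rng.random(500) < 0.02             # sparse random spike train
calcium = np.zeros(500)
for t in range(1, 500):
    calcium[t] = 0.95 * calcium[t - 1] + true_spikes[t]
trace = calcium + 0.05 * rng.standard_normal(500)
est = deconvolve_ar1(trace)
print("recovered", np.sum(est & true_spikes), "of", true_spikes.sum(), "true spikes")
```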
A central issue in neuroscience is to find the “appropriate” level of modeling: at every level and in every component of the nervous system we find complex biophysical machinery that affects the functional input-output relation. There are many possible levels of modeling:
What is the simplest neural network model that reproduces a given phenomenon and produces useful (disprovable) predictions? Can we infer from such a model the “purpose” of the underlying neural circuit?
Given such biophysical and functional complexity, how can we build and analyze useful neural models, with meaningful predictions?
How can memory of past events be retained in the brain despite these large changes? Can we find the simplest “effective” input-output relation that can be used to model single neurons?
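As one example from the simplest end of this modeling spectrum (a sketch only; the parameters below are arbitrary rather than fitted to data), a leaky integrate-and-fire neuron already gives a minimal “effective” input-output relation from input current to output spikes:

```python
import numpy as np

def lif_spikes(current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: dv/dt = (-v + I(t)) / tau.

    The membrane potential v is integrated with step dt; whenever it crosses
    v_thresh a spike is recorded and v is reset to v_reset.
    """
    v = 0.0
    spikes = np.zeros(len(current), dtype=bool)
    for t, i_t in enumerate(current):
        v += dt * (-v + i_t) / tau
        if v >= v_thresh:
            spikes[t] = True
            v = v_reset
    return spikes

# Toy usage: a step of input current produces a regular spike train.
current = np.concatenate([np.zeros(200), 1.5 * np.ones(800)])
print("spike count:", lif_spikes(current).sum())
```

Whether such a reduced description is “effective” enough is exactly the modeling question posed above: the real biophysics is vastly richer, and the right level of abstraction depends on the phenomenon one is trying to explain.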