Neuroconstructivism Volume Two: Perspectives and Prospects

Denis Mareschal, Sylvain Sirois, Gert Westermann, and Mark H. Johnson

Print publication date: 2007

Print ISBN-13: 9780198529934

Published to Oxford Scholarship Online: March 2012

DOI: 10.1093/acprof:oso/9780198529934.001.0001


What neuro-robotic models can teach us about neural and cognitive development

Chapter:
(p.179) Chapter 8 What neuro-robotic models can teach us about neural and cognitive development
Source:
Neuroconstructivism Volume Two
Author(s):

Olaf Sporns

Publisher:
Oxford University Press
DOI:10.1093/acprof:oso/9780198529934.003.0008

Abstract and Keywords

This chapter highlights the importance of biologically plausible mechanisms in developing robot models. It also argues for the importance of robotic models that are controlled by artificial neural networks. Such models provide a tangible way of investigating how the body, the environment and the neurocontrol system of the brain co-determine a child's unfolding abilities. The chapter illustrates this with a robotic model that learns to navigate and interact with objects in the world. Building such systems underscores the importance of attributing differential ‘value’ to different objects or actions on the world for obtaining a realistic developmental profile. In terms of neural information processing, ‘value’ can be linked to differing levels of dopamine in the brain associated with particular events and tasks. Dopamine is a neuromodulator that affects the level of plasticity in a neural network.

Keywords:   neuro-robotic models, neural development, cognitive development, dopamine, synthetic modelling, neuromodulation, reward conditioning

Introduction

The brain is a complex network continually engaged in neural dynamics and in interactions with the environment (Sporns et al., 2004). Brain activity is associated with a distinct repertoire of cognitive states, many of which are accessible to consciousness. What renders a complex network of a few hundred billion nerve cells capable of generating cognitive capacities? How does cognition grow as the network builds its connectivity and begins to interact with the surrounding world? Formulating these questions highlights two major theoretical aspects of cognitive development that are fundamental to the work discussed in this chapter: (a) cognition is a ‘network phenomenon’, a manifestation of neural activity and connectivity; (b) cognition is embodied, a consequence of the dynamic coupling between brain, body and world.

Brain networks undergo extensive growth, refinement and remodelling in the course of embryonic and postnatal development. They remain malleable and plastic for virtually the entire life of the organism. Perceptual, cognitive and behavioural capabilities at any point in an organism's lifetime are tied to the structure of its brain and body. Importantly, the architecture of the brain, as well as the morphology of the body and the statistics of the environment, is not fixed. Rather, brain connectivity is subject to a broad spectrum of input-, experience- and activity-dependent processes which shape and structure its patterning and strengths (Johnson, 2001). These changes, in turn, result in altered interactions with the environment, exerting causal influences on what is experienced and sensed in the future. Thus, whatever effects cognitive states may have are intricately linked to network structure, input statistics and embodiment.

(p.180) Recent years have seen an explosion of progress in experimental and computational neuroscience, resulting in a huge increase in the amount of knowledge about functional processes in the developing and adult brain. Some of this knowledge is beginning to be integrated with developmental psychology and cognitive science, leading towards the formulation of models and theories of autonomous mental development (Weng et al., 2001), computational developmental psychology (Shultz, 2003) and a resurgence of cognitive and neural constructivism (Quartz, 1993; Mareschal and Shultz, 1996; Karmiloff-Smith, 1998; Johnson and Mareschal, 2001). Many of the contributions in this volume highlight the various research approaches currently being undertaken in these new areas. What many of these approaches have in common is the central role given to development in the emergence of cognition, and the rejection of nativist accounts of the human mind. This theoretical stance has important consequences for strategies in the design of artificial embodied systems capable of intelligent and autonomous behaviour.

This chapter is about some of the past, present and future contributions of neuro-robotic models to our understanding of neural and cognitive development. Most of the chapter is devoted to a discussion of two emerging principles that may be useful components of theories of embodied development. After introducing some of the design features of neuro-robotic models, we first discuss the principle of value, which provides a theoretical and mechanistic framework for learning and development in embodied nervous systems. Value refers to the outcomes or consequences of behavioural interactions between the organism or robot and its environment, which may lead to the encounter of rewarding or noxious stimuli. The neural basis for value signals in the brain is provided by neuromodulation, the activation of the brain's diffuse ascending systems (such as the dopamine system related to the processing of reward) and their numerous synaptic, neuronal and behavioural effects. After discussing the principle of value, we briefly touch upon the issue of information and its crucial role in shaping neural connectivity and development. Throughout the chapter, we will focus on robot studies that help to clarify the notion that embodiment is a crucial ingredient in assessing the value of neural and behavioural patterns as well as in structuring, sampling and processing information.

Design of neuro-robotic models

Coupling brain, body and world

The design of neuro-robotic systems offers a new research methodology to study the relationship between neuronal processes and behaviour (Beer et al., (p.181) 1998; Sporns, 2002). Such models are ideally suited to capture key aspects of embodiment because their implementation quite naturally involves continuous coupling of neural states and behaviour in real time. Many neuro-robotic systems are physically instantiated, but the development of powerful software packages has allowed the implementation of some sophisticated embodied models that exist only in simulation. There is some disagreement among researchers about whether embodiment requires the physical instantiation of the model in a robot (for a discussion of this issue see Ziemke, 2004). While the issue is controversial, it seems that physical instantiation is not an absolute requirement, as long as simulated models incorporate some aspects of body morphology, movement and external environment. In a developmental context, simulated embodied systems may in some cases be easier to study as larger regions of parameter space can be explored efficiently. Also, some paradigms for self-organization of connection weights such as evolutionary algorithms can more easily be instantiated in a simulated environment. However, physically instantiated embodied models may also have distinct advantages. It is quite difficult (if not impossible) to capture all aspects of a physical environment, or of a robot body, in a simulation. Thus, even sophisticated simulations of embodied agents run the risk of missing unexpected or unanticipated interactions between brain, body and world.

What many (if not most) neuro-robotic systems have in common is that they consist of three integrated components (Figure 8.1; after Beer, 2000): (a) a simulated neuronal model incorporating anatomical and physiological properties of nervous systems, (b) an autonomous robot with a defined body


Fig. 8.1 The embodied approach. Neural model, robot and environment are dynamically coupled and continually interact. After a similar diagram by Beer (2000).

(p.182) morphology and movement repertoire, and (c) the environment containing objects and events. These three components are dynamically and reciprocally coupled. Obviously, neural signals can cause movements of the body and thus action in the environment. It is perhaps less obvious, but equally important to note that the effects of neural states on the environment can have an impact on the statistics and nature of sensory inputs reaching the nervous system. In other words, an embodied system determines what its future inputs will be and thus imposes structure on its own sensory input space (Nolfi and Parisi, 1993; Pfeifer and Scheier, 1999; Schlesinger, 2002). To give a concrete example, consider a robot which can push (physically displace) objects in a visual scene. The action of the robot causes the sudden appearance of spatially and temporally correlated movement in the visual array, which can be used to segment the object from a background (e.g. Fitzpatrick et al., 2003; Fitzpatrick and Metta, 2003). Another sensorimotor function with very dramatic effects on the statistics of visual inputs is attention-driven saccades (camera/eye movements) which determine the direction of gaze (e.g. Breazeal et al., 2001). Depending on the state of the attentional system different kinds of objects are preferentially selected and placed in the central region of the visual array where they may be subjected to closer visual analysis. Action and perception of neuro-robotic systems form closely coupled dynamical loops and are often inseparably linked. We will return to this crucial aspect of embodied systems later in this chapter.
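
To make the coupling of Figure 8.1 concrete, the following toy sketch (in Python; all names, parameters and the one-dimensional world are hypothetical illustrations, not drawn from any of the models cited above) implements the three components as a single update loop: the environment provides sensory input, a stand-in ‘neural controller’ selects an action, and the action displaces objects, thereby changing the statistics of the inputs the controller will receive on later time steps.

```python
import random

class Environment:
    """Toy one-dimensional world containing objects that the agent can push."""
    def __init__(self, n_objects=5, size=20, seed=1):
        rng = random.Random(seed)
        self.size = size
        self.positions = rng.sample(range(size), n_objects)

    def sense(self, agent_pos):
        # What the agent senses depends on where its body currently is.
        return [1.0 if p == agent_pos else 0.0 for p in self.positions]

    def apply(self, agent_pos):
        # Acting on the world displaces objects, altering future input statistics.
        self.positions = [min(p + 1, self.size - 1) if p == agent_pos else p
                          for p in self.positions]

def neural_controller(sensory_input, weights):
    # Stand-in for the neural model: a single thresholded unit deciding what to do.
    drive = sum(w * s for w, s in zip(weights, sensory_input))
    return "push" if drive > 0.5 else "move"

env = Environment()
pos, weights = 10, [1.0] * 5
for t in range(100):                        # continuous brain-body-world loop
    s = env.sense(pos)                      # world -> sensors
    action = neural_controller(s, weights)  # sensors -> neural model -> motor command
    if action == "push":
        env.apply(pos)                      # motor -> world: an object is displaced
    else:
        pos = max(0, min(env.size - 1, pos + random.choice([-1, 1])))
```

Even in this minimal form, the loop illustrates the point made above: the agent's own motor output determines which sensory patterns it will encounter next.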

Synthetic modelling

The design and construction of an artificial embodied system often provides unique insights into the relationship between brain and behaviour (Reeke and Sporns, 1993; Pfeifer and Scheier, 1999). The experimenter can manipulate and record a broad range of neural and behavioural variables, including anatomical pathways, biophysical characteristics of synapses, neuronal tuning parameters, physical characteristics of the robot body, the behavioural repertoire and stimuli and events in the external world. Such models can provide insights into normal brain and behavioural processes and can also be used to illuminate brain disease and dysfunction across multiple levels of organization.

Synthetic models quite naturally embrace the idea that cognition emerges from interactions between a (neural) architecture and an environment. More complex neural properties and functions (including ‘representations’) appear as a result of experience and behaviour and of encountering particular kinds of sensory stimuli and motor challenges. Synthetic models never begin structureless, as a tabula rasa. In fact, even at the initial stage of their construction they contain abundant structure, both physical and neural. The morphology of the body, the capabilities of sensors and effectors, the kinematics and dynamics of (p.183) the motor repertoire all impose severe constraints on the kinds of interaction between the system and the environment that are possible. The structure of the nervous system, the topology of its connection pattern, the biophysical properties of synapses and cells, the capacity of connections to change or remain constant all define a specific set of structural parameters. The structure of any embodied system strongly deviates from randomness, and thus, in a statistical sense, incorporates a huge amount of information right from the start.

While synthetic models contain lots of structure, this does not amount to an endorsement of the nativist paradigm. Rather, it is absolutely essential that their structure be modifiable and expandable to deal with aspects and challenges of the environment that cannot be ‘wired in’ from the beginning. In this chapter, we will focus entirely on modifiable neural structures, and leave aside the interesting domain of modifiable body structure (Hara and Pfeifer, 2003). Neural structures may be modified through activity- or experience-dependent processes, which affect the strengths of existing connections or control their growth and structural elaboration. This modification of neural structures, viewed at a large scale and over longer developmental periods, is not a random process, but serves the overall goal of increased performance of the organism or robot. In order to ultimately generate learning and increased adaptation, there needs to be ‘matching’ between the (physical and neural) structure of the synthetic model and the structure of the environment. This matching requires that internal and external structure be compared and continuously updated. Viewed from the perspective of matching, the dialogue between the embodied system and the environment becomes the motor of development and defines its global trajectory.

Learning and development

Can we come up with a quantitative, mechanistic framework for what learning and development accomplish in terms of neural information? Neural structures can be conceptualized as networks that generate and integrate information (Tononi et al., 1998; Tononi and Sporns, 2003). Neural activations express and capture statistical features of their inputs (arriving from the environment via sensory channels, or from other neurons via connection pathways), thus generating information. This information must not remain isolated but needs to be integrated to allow the emergence of coherent cognitive states and behaviour utilizing multiple distributed sources of information.

Learning and development operate on the two main dimensions of neural function, neuronal activity and connectivity, to alter the informational capacity of the brain in accordance with the particular statistics of the environment (p.184) that this brain is exposed to. There is a plethora of candidate neural mechanisms, ranging from changes in the numbers of cells and synapses, to alterations in cellular neuronal morphology and synaptic plasticity (Singer, 1995; Katz and Shatz, 1996; Sur and Leamey, 2001). Cortical growth and maturation, including myelination of long-range connections (e.g. Paus et al., 1999), may also contribute to the capacity of neural tissue to acquire new response properties and integrative capacities. A variety of biological mechanisms of learning and development have been analysed in computational models, ranging from traditional connectionist models, where learning is often implemented as a process of convergence to a specific end state, to computational neuroscience models that are validated by direct comparison with results from experimental neurobiology.

Structural and activity-dependent changes in the nervous system do not occur indiscriminately but rather follow very specific spatial and temporal patterns. The timing of stimuli and their statistics are important ingredients in ‘directing’ learning-related changes such that improved performance in a specific task becomes integrated within the overall behavioural repertoire of the organism. Value and information are important candidate principles for learning and development which we will now explore.

Value: neuromodulation and learning

Value

Why is value such a central principle in development? Reeke et al. (1990) conceptualized value as imposing biases ‘on the outcome of previous interactions with the environment’. In order for a system to be adaptive, value signals are needed that ‘reflect the global evaluation of recent behaviour’. Friston et al. (1994) stated that ‘the value of a global pattern of neuronal responses to a particular environmental situation (stimulus) is reflected in the capacity of that response pattern to increase the likelihood that it will recur in the same context’. This definition of value highlighted its conceptual similarity to the concept of fitness in evolutionary adaptation. If we consider that neuronal response patterns can be mapped onto behaviours, value then defines the shape of an adaptive landscape (Sporns and Edelman, 1993) with peaks of high value (associated with ‘valuable’ neuronal response patterns and their concomitant behaviours) and valleys of low value (associated with neuronal patterns that do not generate value). The shape of this value landscape is determined by external factors related to the environment within which behaviour occurs and to the body structure of the behaving system, as well as by internal needs or ‘biases’.

(p.185) Value has a number of properties that set it apart from other learning mechanisms such as Hebbian learning, or back-propagation. It is essential that value acts as (a) a global signal which is (b) internally derived by the behaving system, (c) after actual behaviour has occurred. Thus, value is essentially embodied, tied to the physical structure of the robot or organism and to its actions as they unfold in time. Unlike the ‘error signals’ central to some learning algorithms such as back-propagation, value is determined in an unsupervised (or self-supervised) manner. The value signal reflects global performance of the organism, not the differential contributions of cells or synapses relative to a ‘desired output’. Value is most closely related to reinforcement learning, with a few notable differences. Unlike in most formulations of reinforcement learning, value is not derived or picked up from the environment, but is determined within the behaving system itself as a function of current and previous sensory inputs, as part of the neural and bodily context. As discussed in Friston et al. (1994), value-dependent learning is formally related to a variant of reinforcement learning, called temporal difference learning.

Many studies suggest that some environmental stimuli and events, such as rewards, noxious or painful stimuli, or stimuli that are novel or violate expectations have a greater impact on learning and development. Such highly salient inputs may include touching an object, high-contrast visual features such as blobs or edges, or moving and colourful stimuli (reviewed in Thelen and Smith, 1994). In human infants, sensitivity to highly salient sensory inputs emerges very early in postnatal development and seems capable of driving at least some forms of learning and development. The incidence of behaviours that lead to the occurrence of such salient stimuli tends to grow (e.g. Angulo-Kinzler, 2001), as these behaviours are highly ‘valuable’. Value can thus act to change the behavioural repertoire of an individual organism in a manner that channels behaviour towards regions of the adaptive landscape that produce more salient consequences.

Value systems

Value systems serve as internal correlates and mediators of value and saliency and consist of well-defined neural substrates that are found in virtually all vertebrate species. The functional roles of value systems have been studied in a number of computer simulations and robot models. Value systems were integral components of the Darwin series of developing autonomous robots (Reeke et al., 1990; Edelman et al., 1992; Almassy et al., 1998; Sporns et al., 2000). The Darwin robots learned to perform simple sensorimotor tasks (foveation and reaching) as well as visual and visuotactile categorization. Value systems have also been employed in the SAIL robot (Huang and Weng, (p.186) 2002) modelling habituation, reinforcement and novelty. Several robot designs incorporate motivational systems whose functions are related to (although generally more complex than) those of value systems. The control architecture of the humanoid robot Cog contained a motivational system regulating its behaviour, particularly its social interactions with humans (Adams et al., 2000). Similarly, drives and emotions were related to the needs and states of arousal of Kismet (Breazeal, 1998, 2002), a robot capable of social interactions with humans. The motivational system was intricately coupled to the robot's attention module and to processes involved in behavioural selection and control, especially those related to human–robot interactions.

Value systems generate signals that are used to adjust the probabilities of behaviours by modulating synaptic changes within neuronal networks through value-dependent learning. As mentioned above, a key difference with reinforcement learning is the explicit connection of value systems with specific neural structures and processes. Value systems have several characteristic anatomical and physiological properties: (a) they act by modulating neural activity or synaptic plasticity; (b) they exert their effects by delivering a diffuse, global signal over extended brain regions. In typical implementations, value systems are incorporated in the neural network architecture of the agent (robot or organism) and become active after the occurrence of specific sensory stimuli, which are often encountered as a consequence of behavioural actions. The activation of these value systems is short-lasting (phasic) and constitutes a timing (gating) signal for synaptic modification.
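
As a rough illustration of how such a gating signal might enter a learning rule, the sketch below (illustrative Python; the array sizes, learning rate and reward probability are arbitrary assumptions, not parameters of any model discussed in this chapter) implements a three-factor update in which synaptic change requires coincident pre- and postsynaptic activity and a phasic, globally broadcast value signal; when the value signal is silent, no learning takes place.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 8, 4
W = rng.normal(0.0, 0.1, size=(n_post, n_pre))   # sensorimotor connections

def value_gated_update(W, pre, post, value, lr=0.05):
    # Three-factor ('ternary') rule: delta_W = lr * value * post * pre.
    # With value == 0 (no salient outcome), the weights do not change.
    return W + lr * value * np.outer(post, pre)

for trial in range(50):
    pre = rng.random(n_pre)                      # sensory activity on this trial
    post = W @ pre                               # motor-side activity it evokes
    reward_encountered = rng.random() < 0.3      # outcome of the behaviour
    value = 1.0 if reward_encountered else 0.0   # phasic, globally broadcast burst
    W = value_gated_update(W, pre, post, value)
```

The value factor is the same scalar for every synapse, which captures the spatially diffuse but temporally precise character of the neuromodulatory signal described above.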

What constitutes a valuable or salient stimulus is itself subject to change over the course of learning and development. Value systems cannot rely on prewired inputs from sensory regions to generate their signals, but must incorporate activity- and experience-dependent processes. Friston et al. (1994) introduced the distinction between innate and acquired value (see the schematic in Figure 8.2). Innate value was viewed as evolutionarily determined, similar to an innate bias. Behaviours that satisfy homeostatic or appetitive needs, consummatory activities, or avoidance of noxious stimuli predominantly reflect prior evolutionary selection and are therefore in most cases independent of learning or experience. Such innate value, however, cannot reflect the specific configuration of the environment and cannot include stimuli that are themselves initially neutral but, in a specific environmental context, become predictive of future valuable events. Acquired value is activity-dependent and allows the value system to become sensitive to stimuli that are not by themselves able to trigger a value-related response.

Modulatory neurotransmitter systems, including dopamine, serotonin and acetylcholine, possess many of the structural and functional properties of value (p.187)


Fig. 8.2 Schematic diagram illustrating the action of value systems. Neural signals in ‘vision’ trigger ‘motor’ activity and behaviour. Behaviour in turn results in sensory inputs. Some sensory inputs are relayed through sensory afferents to the value system where they elicit a neuromodulatory (value) response. The value signal modulates visuomotor connections, as well as the sensory afferents from ‘vision’ to ‘value’. Future occurrences of the visual patterns preceding the encounter of the innately valuable sensory inputs become capable of triggering a value response (acquired value). Grey boxes = neural networks for vision, motor and value; thin lines = excitatory connection pathways; shaded arrow = neuromodulatory (value) connections; shaded ellipses = projection targets of the value signal (local neuromodulator concentration) where value-dependent learning takes place. Note the distinction between sensory inputs to the value system that mediate innate and acquired value.

systems, including their capacity to acquire new response characteristics in the course of experience. In biological nervous systems, neuromodulators have an extremely broad range of functions including the regulation of neuronal excitability, effects on the expression of cellular proteins, structural modifications in neurons and neural circuits, as well as modulation of synaptic plasticity (Hasselmo et al., 2002). Several of these functions have been implemented in neurocomputational models, primarily as effects on neuronal response functions, learning rates, or other model parameters (Hasselmo, 1995; Fellous and Linster, 1998). From the perspective of neural and cognitive development, the potential functional roles of diffusely projecting neuromodulatory systems in influencing the magnitude and direction of synaptic plasticity are of particular importance. Synaptic plasticity is a main mechanism for generating lasting behavioural and representational change. A developing robot, built on (p.188) biological principles, should emulate the pivotal role of neuromodulation in real brains. As a first step in this direction we focus on the actions of one neuromodulator (dopamine) in the context of one type of learning (reward conditioning).

Dopamine and reward conditioning

Recent studies of the function of the mammalian midbrain dopamine system implicate one of its major components, the ventral tegmental area (VTA), in reward conditioning (Schultz et al., 1997; Schultz, 1998). Dopaminergic neurons within the VTA project to parts of the striatum, the amygdala and to widespread cortical areas, including frontal and prefrontal cortex. Dopamine modulates long-term potentiation, a major candidate mechanism for synaptic plasticity in the central nervous system. Long-term potentiation (LTP) and long-term depression (LTD) have been shown to occur in some of the afferent and efferent connections of the VTA, including plasticity of excitatory inputs to both dopaminergic (LTD) and GABAergic (LTP) cells of the ventral tegmental area (Bonci and Malenka, 1999). Also, dopamine modulates synaptic plasticity of excitatory synapses in both the VTA and the nucleus accumbens (Thomas et al., 2000; Overton et al., 1999). The functional significance of this type of plasticity in the context of reward-related behaviours is not yet fully understood. In addition to their potential roles in reward conditioning, dopamine projections to cortical areas have a significant impact on the functional topography and representations of cortical maps (Bao et al., 2001).

VTA firing patterns undergo highly characteristic changes in the course of reward conditioning. Prior to conditioning, VTA dopamine neurons show a phasic response to food rewards. Immediately after the food reward is received, dopamine neurons discharge a brief burst of spikes lasting no more than a few hundred milliseconds. In the course of learning, this response pattern changes in a characteristic manner. When a reward is preceded by stimuli which reliably predict the occurrence of the reward, dopamine neurons no longer respond to the occurrence of the (already predicted) reward. Rather, their phasic activation is ‘transferred’ to the onset of the reward-predicting stimulus, while activation at the occurrence of the primary (now completely predicted) reward becomes attenuated or disappears entirely. Moreover, if a reward is fully predicted but does not actually occur, there is a transient depression in the baseline firing rate of dopamine neurons at the time of the expected occurrence of the reward. Several models of the midbrain dopamine system have been proposed (Montague et al., 1996; Schultz et al., 1997; Suri and Schultz, 2001; Suri, 2002), emphasizing the relationship between dopaminergic responses and temporal difference learning.
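
The temporal difference account referred to here can be captured in a few lines. The sketch below (a generic TD(0) model in Python with a simple cue-onset stimulus representation; the parameter values are illustrative and are not taken from the models of Montague et al., 1996 or Suri and Schultz, 2001) produces a prediction-error signal with the qualitative properties just described: a burst at the time of an unpredicted reward, a transfer of that burst to cue onset after repeated pairings, and a dip below baseline when a fully predicted reward is omitted.

```python
import numpy as np

T = 20                     # time steps per trial
cue_t, reward_t = 5, 15    # cue onset and reward delivery times
gamma, alpha = 1.0, 0.2    # discount factor and learning rate
w = np.zeros(T)            # one weight per post-cue time step (serial-compound stimulus)

def run_trial(w, deliver_reward=True):
    x = np.zeros(T)
    x[cue_t:] = 1.0                      # stimulus trace switches on at cue onset
    deltas = np.zeros(T)
    for t in range(T - 1):
        r = 1.0 if (deliver_reward and t == reward_t) else 0.0
        delta = r + gamma * w[t + 1] * x[t + 1] - w[t] * x[t]   # prediction error
        w[t] += alpha * delta * x[t]     # w is a NumPy array, so it is updated in place
        deltas[t] = delta
    return deltas

first = run_trial(w)        # before learning: large positive error at reward_t
for _ in range(200):
    run_trial(w)            # repeated pairings of cue and reward
trained = run_trial(w)      # error now peaks at the transition into the cue (index cue_t - 1)
omitted = run_trial(w, deliver_reward=False)  # omission: negative error at reward_t
```

In this scheme the prediction error plays the role of the phasic dopamine signal, and the shift of the error from reward delivery to cue onset mirrors the transfer of the dopamine response described above.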

(p.189) While neuromodulatory systems related to reward, such as dopamine, normally mediate the effects of naturally occurring rewarding stimuli, their function is altered in a number of brain diseases, including drug addiction (Wise, 1996; Berke and Hyman, 2000). Addiction is a chronic disorder, characterized by a persistent state of compulsive drug use, intense drug craving and cue-initiated relapse, often accompanied by the experience of serious negative consequences. Which specific brain processes cause the transition from normal reward behaviour to full-blown drug addiction is still largely unknown. A neurobiological hypothesis states that drugs may cause reorganization of neural circuitry, either through effects on synaptic strength, neuronal excitability or neuronal morphology (Hyman and Malenka, 2001). Supportive evidence suggests that cocaine exposure can cause rapid and marked increases in synaptic strengths in midbrain neurons (Ungless et al., 2001), and that drug-related plasticity can contribute to relapse in drug-seeking behaviour (Vorel et al., 2001). One of our goals in exploring reward processing in neuro-robotic models is to develop new hypotheses regarding the origin of addiction.

Structure of a neuro-robotic model of neuromodulation in learning

Because of the central role played by dopamine in a variety of cognitive processes, numerous computational models of dopamine function have been proposed. A comprehensive, or even cursory, review of these models is beyond the scope of this chapter. Here we present a brief summary of some of the results we obtained by implementing a dopamine neuromodulatory system in a neuro-robotic model. In discussing our results we will focus on the issue of embodiment and the role it plays in the development of the model. In this chapter, we only provide a brief description of the design of the robot, its environment and the neural model (see Box 8.1), noting that they are described in detail in several previous publications (Sporns and Alexander, 2002; Alexander and Sporns, 2003). The interested reader should consult these original publications for additional detail.

Neuromodulation in an autonomous robot

In testing our model we used computer simulation in addition to implementing the model in an autonomous robot. First, we presented stimuli in relatively fixed temporal patterns, reproducing conditions in animal experiments in which stimuli and rewards are separated by a constant time interval. In robot experiments, constant timing can be achieved by manually presenting objects, or by conducting experiments in environments with low object density. When using such discrete learning trials, both computer simulation and experiments conducted with the robot yielded results that were consistent with the observed physiological characteristics of the mammalian dopamine system (Figure 8.4). The model was able to reproduce a broad spectrum of phenomena, including: (a) dopamine responses to unpredicted rewards; (b) transfer of the dopamine response to reward-predicting sensory stimuli (as a result of synaptic plasticity in afferents to the dopamine system itself); (c) inhibition (attenuation) of the dopamine response at the time of omission of a previously predicted rewarding stimulus; (d) dopamine responses to reward-predicting and predicted stimuli that occur either early or late; and (e) extinction. Phenomena (a), (b), (c) and (e) are illustrated in Figure 8.4 (for more data see, e.g. Alexander and Sporns, 2003).

However, little experimental data is available on the performance and temporal characteristics of the dopamine system in the context of unconstrained and autonomous behaviour. In robot experiments with a high density of objects present in the environment, we noticed that the displacement of objects that resulted from the robot's behavioural activity quickly produced non-uniform and highly clustered spatial distributions of reward (Figure 8.5). This spatial distribution altered the statistics of stimulus encounters (Figure 8.6). At the beginning, objects were still well separated and relatively constant temporal relationships between visual inputs and subsequent rewards allowed the emergence of synaptic patterns that were consistent between timed trials (Figure 8.6a, right) and autonomous behaviour (Figure 8.6b, right). As the experiment progressed, however, objects were pushed together and the timing of successive object encounters was more irregular. Specifically, rewarding objects were often encountered in rapid succession, as the robot was ‘attracted’ to object clusters. Synaptic patterns reflected this altered timing of reward and developed marked differences to patterns obtained with timed trials.
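
The effect of reward clustering on encounter timing can be illustrated with a very small simulation (Python; the grid size, object counts and random-walk policy are arbitrary assumptions and are not meant to reproduce the actual robot experiment). A random walker visiting the same number of objects encounters them in irregular bursts when the objects are packed into a few clumps, whereas a scattered layout yields more evenly spaced encounters:

```python
import random
import statistics

def encounter_intervals(objects, steps=20000, size=30, seed=0):
    """Random walk on a toroidal grid; return intervals between object encounters."""
    rng = random.Random(seed)
    obj = set(objects)
    x = y = size // 2
    times = []
    for t in range(steps):
        x = (x + rng.choice((-1, 0, 1))) % size
        y = (y + rng.choice((-1, 0, 1))) % size
        if (x, y) in obj:
            times.append(t)
    return [b - a for a, b in zip(times, times[1:])]

size, n, rng = 30, 30, random.Random(42)
scattered = [(rng.randrange(size), rng.randrange(size)) for _ in range(n)]
centres = [(5, 5), (15, 22), (25, 10)]          # same object count, three tight clumps
clustered = [((cx + rng.randrange(4)) % size, (cy + rng.randrange(4)) % size)
             for cx, cy in centres for _ in range(n // 3)]

for name, layout in [("scattered", scattered), ("clustered", clustered)]:
    gaps = encounter_intervals(layout)
    print(name, "mean interval:", round(statistics.mean(gaps), 1),
          "sd:", round(statistics.stdev(gaps), 1))
```

The point of the toy is only that the spatial layout of rewards, which the robot itself reshapes, directly changes the timing statistics that drive value-dependent plasticity.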

This simple set of experiments provided an indication of the potentially close and reciprocal interactions between neural plasticity and behavioural activity. (p.194)


Fig. 8.4 Plasticity of dopamine responses in the simulation (a) and in the neuro-robotic model (b). Activity traces of appetitive taste (Tap), colour-selective visual units (Cred), and the value signal VR are shown. Markers ‘R’ and ‘S’ signal the time of onset of the primary reward (appetitive taste) and the visual appearance of the red object, respectively. Before learning, S is neutral and R is unpredicted. R triggers a phasic response of VR. After S and R have been coupled for several trials, S now predicts the occurrence of R. S elicits a VR response, R does not. If, after learning, the predicted reward is unexpectedly omitted, S still elicits a VR response, but at the time of the expected reward, VR is decreased below zero (baseline). If the reward is omitted for some time, the reward prediction and the VR response are extinguished. The plots in panel (b) show stacks of traces of VR (dopamine or value signal) that are temporally aligned to the onset of taste (R) for sets of 15–20 individual object encounters (trials). These response traces were obtained in robot experiments conducted with discrete learning trials, in which a single object was presented manually, allowing consistent timing between visual appearance of the object and reward delivery. Note that, under these conditions, the pattern of dopamine response (corresponding to the value signal VR) is virtually identical in simulations (a) and in the neuro-robotic model (b).

The differences we observed between timed trials and autonomous behaviour in environments with high reward density were unexpected and might not have been obtained if we had implemented the model only as a simulation with fixed stimulus sequences and timings. There are two main reasons for the (p.195)

Fig. 8.5 Spatial distribution of objects at the beginning (a) and at the end (b) of a typical robot experiment. Motor actions of the robot resulted in a clustered distribution of reward (objects).

unexpected nature of these findings. First, the robot is navigating from object to object, choosing its own path of approach, which introduces a temporal ‘jitter’ into the relative timing between the onset of the reward-predicting stimulus and the reward itself. Second, the actions of the robot alter the spatial distribution of the rewarding objects by increasing their local density, which in turn further degrades the precision of the temporal prediction. As we will now argue, this type of interaction between behaviour and plasticity forms an important rationale for conceptualizing and studying development and learning as an embodied process.

Summary

Our robot experiments demonstrate that neuromodulatory systems are essential in linking behaviour and neuroplasticity. In the model, a ‘value signal’ was used to modulate the strengths of synaptic connections, thus implementing the observed functional effects of neuromodulators in neural plasticity. Importantly, value acts as a global signal affecting widespread projection areas through a burst of phasic and short-lasting activation. Thus, the signal combined spatial uniformity and temporal specificity. These properties of value systems correspond closely with the anatomical and physiological characteristics of mammalian neuromodulatory systems, including the dopamine system. This correspondence provides a promising rationale for the incorporation of such systems in computational models implemented in autonomous robots. The action of the value system in the present model, through changes in sensorimotor connections and inputs to the neuromodulatory system itself, allows an autonomous agent to learn and adaptively change its behaviour without an external ‘teacher’.

Interesting differences emerged in our model when comparing neural development during timed trials (in which behaviour is fairly stereotypic) and (p.196)


Fig. 8.6 Changes in the statistics of object encounters and synaptic patterns during autonomous behaviour. Panels in (a) show data from a robot experiment with timed learning trials (manual feeding of objects). Panels in (b) show data from a robot experiment allowing autonomous behaviour and involving the spatial redistribution of objects (see Figure 8.5). Plots show time series of object encounters (left) and the distribution of time intervals between successive encounters (middle), as well as the temporal development of synaptic strengths for the 12 connections linking the temporal stimulus representation to the VTA (right). In the plots of synaptic strengths, black lines indicate the connection profiles at 100, 700 and 2500 (3500) time steps.

(p.197) during autonomous behaviour. In experiments involving autonomous behaviour, we observed a time-dependent alteration of an important environmental variable, the spatial distribution of reward throughout the environment, due to the behavioural activity of the robot. This alteration, in turn, had consequences on synaptic patterns encoding predictions about the occurrence of future rewards. The differences in spatial reward distribution between early and late phases in experiments with high object densities were not the result of purposeful rearrangements of the environment by either robot or experimenter. Instead they were a ‘by-product’ of sensorimotor activity displayed by the robot, its tendency to push objects around while interacting with them until they form clusters—see also the didabots of Maris and te Boekhorst (1996) and Scheier et al. (1998) for similar effects in very different robot experiments. This spatially non-uniform distribution of reward was the outcome of the coupling between brain, body and environment. As this example illustrates, this coupling was strongly reciprocal. Behaviour affected the statistics of reward timings which drove synaptic plasticity through activation of the neuromodulatory value system. In turn, synaptic changes altered the coupling between visual and motor units which affected subsequent behaviour.

Information: shaping input statistics

The discussion and the examples presented above of the action of value and neuromodulation in learning revealed a surprising degree of coupling between neural and behavioural variables. A robot's actions altered the spatial distribution of rewarding objects, which in turn impacted on the timing of rewarding stimulus encounters. This simple example highlights a fundamental developmental issue, to which we now turn briefly. We suggest that robots and organisms do not passively absorb information from their surrounding environment, but rather their actions on the environment select and shape the information.

The statistics of sensory inputs are of great importance in learning and development. Early sensory areas of the brain, including visual, auditory and somatosensory cortex, show a high degree of experience-dependent plasticity. Cortical representations reflect the history of sensory stimulation (e.g. Kilgard et al., 2001) as well as motor actions and behaviours (e.g. Nudo et al., 1996). In severe cases of sensory deprivation or over-stimulation the development of normal sensory capacities may be disrupted. The sensitivity of brain tissue to the distribution and temporal sequence of inputs gives special significance to any causal influences of embodiment on input statistics.

(p.198) Neuro-robotic models have provided several examples of how the development of specific neural and cognitive functions may depend on the actions of a robot. Scheier and Lambrinos (1996) and Pfeifer and Scheier (1999) have developed several models of perceptual categorization that utilized robot behaviour to generate inputs allowing the discrimination of objects belonging to different categories. Almassy et al. (1998) modelled the development of complex receptive field properties in visual cortex in a behaving robot. They found that self-generated movements of the robot resulted in smooth lateral displacements of objects within the robot's visual field, thus generating temporal correlations over multiple image frames that could be exploited by neurons in inferior temporal cortex. In addition, the map of modelled neurons in inferior temporal cortex showed experience-dependent fluctuations which reflected the history and frequency of stimulus encounters. Krichmar et al. (2000) extended this aspect of the model and recorded systematic changes in the receptive field properties of object-selective visual neurons with the object composition of the environment, demonstrating the experience-dependence of perceptual categorization.

Can we measure the extent to which sensory information is structured and generated? Extensive analyses of static visual data have shown that natural images have highly characteristic statistical structure (Simoncelli and Olshausen, 2001). Some of the methods used to characterize natural images can be applied to input streams obtained from robots or living creatures. For example, appropriate information-theoretical measures could be applied to streams of sensory data gathered using different sensorimotor strategies. Initial analyses have shown that simple sensorimotor functions like gaze direction and foveation can generate high mutual information and complexity in visual inputs (Lungarella and Pfeifer, 2001; Sporns and Pegors, 2003; Lungarella et al., 2005). Such informational patterns can be exploited by neural circuits and promote the stabilization of matching neural connections that incorporate recurrent statistical features. Thus, the structuring of sensory inputs through embodied action may emerge as a key principle for learning and development in robots and organisms (Sporns and Pegors, 2004).
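
One simple example of such a measure is sketched below in Python (the ‘foveating’ and ‘random gaze’ signals are synthetic stand-ins, not data from any of the studies cited): the mutual information between two discretized sensor channels, estimated from their joint histogram. Inputs generated by a coordinated sensorimotor strategy, in which both channels track the same object, carry far more shared structure than inputs sampled at random.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate (in bits) of the mutual information between two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
t = np.arange(5000)
scene = np.sin(0.01 * t) + 0.1 * rng.standard_normal(t.size)   # a slowly moving object

# 'Foveating' strategy: both sensor channels keep tracking the same object.
left_fov = scene + 0.1 * rng.standard_normal(t.size)
right_fov = scene + 0.1 * rng.standard_normal(t.size)
# 'Random gaze': the two channels sample unrelated parts of the visual scene.
left_rand = rng.standard_normal(t.size)
right_rand = rng.standard_normal(t.size)

print("MI, foveating  :", round(mutual_information(left_fov, right_fov), 2), "bits")
print("MI, random gaze:", round(mutual_information(left_rand, right_rand), 2), "bits")
```

The same histogram-based estimate can be applied to recorded sensor streams from a robot, which is the spirit of the analyses by Lungarella and Pfeifer (2001) and Lungarella et al. (2005) cited above.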

Lessons learned from studying neuro-robotic models

What have we learned from our studies of neuro-robotic models about the specific constraints imposed by brain architecture and function, and by embodiment on neural and cognitive development? Our studies have pointed to the central role of value systems in signalling the consequences of behaviour and influencing the development of synaptic patterns in widespread regions of the (p.199) brain. Value systems are part of the brain itself. They provide a means by which a specialized part of the brain can influence other brain regions to guide learning and development. It was essential in our model that value systems could modify their own inputs and thus show flexible and dynamic response properties. Value systems are also essential components of embodied systems, because they allow the system to autonomously evaluate its own (embodied) actions. The system's actions may also modify its value system, through changes in connections that link sensory representations to the value system itself.

Our robot experiments lead to several experimental predictions regarding the activity and functional role of neuromodulatory systems (in particular, dopamine) in the course of ‘natural’, self-guided behaviour. We found that clusters of rewarding objects exerted an ‘attractive force’ on the robot, which resulted in restricted trajectories of robot movement and navigation as well as repeated ‘rapid-fire’ sequences of reward encounters. Disruptions of the neurobiological bases of reward processing are thought to form a major cause for lasting behavioural changes and, eventually, chronic disease (addiction) in humans. Our results suggest the hypothesis that a pattern of persistent reward-seeking behaviour may in part be generated as a result of a progressive reshaping of the environment coupled with long-lasting synaptic changes in specific neural structures. Future robot experiments will investigate this hypothesis in detail.

The effects of behaviour and motor activity on shaping the statistics of sensory inputs provide a fundamental theoretical argument for the importance of embodied cognition. The observations reported in this chapter depended critically on behaviour carried out in a real environment. They could not have been predicted from the consideration of timed trials alone or from computer simulations that do not explicitly incorporate the coupling between brain and behaviour. Rather, the fact that the neuro-robotic system, through its embodied actions, was altering the stimulus distribution in the environment had a significant impact on the development of synaptic patterns and neural representations.

Conclusion

Can artificial cognition be created without development? Numerous research groups around the world are engaged in the design of robots or sophisticated software agents with the explicit goal of creating systems that are capable of some form of cognition, perhaps approaching or replicating that of humans. This research effort is in the middle of a very significant paradigm shift. Classical AI, for the most part, attempted to create intelligence and higher (p.200) cognition directly, by crystallizing knowledge and formalizing cognitive principles mostly obtained from the study of adult human beings. Essentially, this approach assumed that it is possible to side-step development entirely, relegating development to the role of a necessary but non-essential transient. An increasing number of researchers in AI, robotics and autonomous systems are diverging from this non-developmental approach. Instead, development is becoming the central research issue in the ongoing endeavour of creating machine intelligence. Weng et al. (2001) envision that the creation of robots with human capabilities will depend on developmental processes and ‘programs’ and that sophisticated cognitive functions will emerge from prolonged interactions between robots and their environments, including human teachers.

Addressing the challenges of development will require new constructive, integrative and synthetic computational models. Connectionism, which strongly emphasizes the role of learning in the creation of internal representations, has provided mechanistic models and explanations for numerous developmental phenomena. Traditional formulations of connectionism need to be made more compatible with the design of embodied systems and the central role of growth, exploration and task-independence in learning and development. We need, in addition, new connectionist modelling approaches that can deal with complex non-stationary environments and incorporate the fundamental tendency of embodied systems to act on and change their environment through sensory-motor interactions. Neuro-robotic models that blend connectionist techniques with neurobiologically grounded structures and learning rules may serve as new research tools for developmental psychologists and neuroscientists interested in the behaviour of organisms from an integrated systems perspective.

Acknowledgements

Part of the work described in this chapter was supported by NIH/NIDA grant R21DA15647.

Glossary

  • Neuro-robotic model

    A model consisting of a neural simulation that is embedded in or connected to a robot. Such a model may be physically instantiated, or it may be implemented as a simulation.

  • Neuromodulation

    The functional effects of a set of neurotransmitters (called neuromodulators) on neural circuits and systems. These effects generally (p.201) lead to changes in the information-processing characteristics of these circuits, for example by altering the responsiveness of their neuronal components or the efficacy of neural connections.

  • Phasic signal

    A signal that is emitted in the form of a sharply rising and falling ‘spike’ or ‘burst’. A phasic dopamine signal consists of a temporary elevation of the discharge of dopamine neurons lasting a few hundred milliseconds.

  • Reinforcement learning

    The learning of a task, a representation or an algorithm based on reinforcement. Reinforcement refers to response-coupled events that result in the increased occurrence of the response in the future. For example, the tendency to produce an action that is reliably followed by a reward tends to be strengthened.

  • Temporal difference learning

    The learning of a task using a method for estimating the future expected reinforcement or reward. Errors in the prediction of future reward drive the learning process.

  • Ternary learning rule

    A synaptic learning rule that includes three factors: the presynaptic activity, the postsynaptic activity, and a value (or reinforcement) signal. A typical Hebbian learning rule only has two factors, the pre- and postsynaptic activities.

References

Bibliography references:

Adams, B, Breazeal, C, Brooks, R and Scassellati, B (2000). Humanoid robots: A new kind of tool. IEEE Intelligent Systems, 15, 25–31.

Alexander, WH and Sporns, O (2003). An embodied model of learning, plasticity and reward. Adaptive Behavior, 10, 141–159.

Almassy, N Edelman, GM and Sporns, O (1998). Behavioral constraints in the development of neuronal properties: A cortical model embedded in a real world device. Cerebral Cortex, 8, 346–361.

Angulo-Kinzler, RM (2001). Exploration and selection of intralimb coordination patterns in 3-month-old infants. J Motor Behavior, 33, 363–376.

Bao, S, Chan, VT and Merzenich, MM (2001). Cortical remodeling induced by activity of ventral tegmental dopamine neurons. Nature, 412, 79–83.

Beer, RD (2000). Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4, 91–99.

Beer, RD, Chiel, HJ, Quinn, RD and Ritzmann, RE (1998). Biorobotic approaches to the study of motor systems. Current Opinion in Neurobiology, 8, 777–782.

Berke, JD and Hyman, SE (2000). Addiction, dopamine, and the molecular mechanisms of memory. Neuron, 25, 515–532.

Bonci, A and Malenka, RC (1999). Properties and plasticity of excitatory synapses on dopaminergic and GABAergic cells in the ventral tegmental area. J Neurosci, 19, 3723–3730.

(p.202) Breazeal, C (1998). A motivation system for regulating human-robot interaction. In Proceedings of the fifteenth national conference on artificial intelligence (AAAI 98), pp. 54–61. Madison, WI.

Breazeal, C, Edsinger, A, Fitzpatrick, P and Scassellati, B (2001). Active vision for sociable robots. IEEE Transactions Man Cybernetics Systems, 31, 443–453.

Breazeal, CL (2002). Designing sociable robots. Cambridge, MA: MIT Press.

Edelman, GM, Reeke, GN Jr, Gall, WE, Tononi, G, Williams, D and Sporns, O (1992). Synthetic neural modeling applied to a real-world artifact. Proceedings of the National Academy of Sciences, USA, 89, 7267–7271.

Fellous, J-M and Linster, C (1998). Computational models of neuromodulation. Neural Computation, 10, 771–805.

Fitzpatrick, P and Metta, G (2003). Grounding vision through experimental manipulation. Phil Trans R Soc Lond A, 361, 2165–2185.

Fitzpatrick, P, Metta, G, Natale, L, Rao, S and Sandini, G (2003). Learning about objects through action—initial steps towards artificial cognition. Proceedings of the 2003 IEEE International Conference on Robotics and Automation (ICRA), pp. 3140–3145.

Friston, KJ, Tononi, G, Reeke, GN, Jr, Sporns, O and Edelman, GM (1994). Value-dependent selection in the brain: Simulation in a synthetic neural model. Neuroscience, 59, 229–243.

Hara, F and Pfeifer, R (2003). Morpho-functional machines: The new species—designing embodied intelligence. Tokyo: Springer.

Hasselmo, ME (1995). Neuromodulation and cortical function: Modeling the physiological basis of behavior. Behavioral Brain Research, 67, 1–27.

Hasselmo, ME, Wyble, BP and Fransen, E (2002). Neuromodulation in mammalian nervous systems. In M Arbib (ed.) Handbook of brain theory and neural networks, 2nd edn, pp. 761–765. Cambridge, MA: MIT Press.

Huang, X and Weng, J (2002). Novelty and reinforcement learning in the value system of developmental robots. In CG Prince, Y Demiris, Y Marom, H Kozima and C Balkenius (eds) Proceedings of the second international workshop on epigenetic robotics: Modeling cognitive development in robotic systems 94, pp. 47–55. Edinburgh, Scotland.

Hyman, SE and Malenka, RC (2001). Addiction and the brain: The neurobiology of compulsion and its persistence. Nature Reviews Neuroscience, 2, 695–703.

Johnson, MH (2001). Functional brain development in humans. Nature Reviews Neuroscience, 2, 475–483.

Johnson, MH and Mareschal, D (2001). Cognitive and perceptual development during infancy. Current Opinion in Neurobiology, 11, 213–218.

Karmiloff-Smith, A (1998). Development itself is the key to understanding developmental disorders. Trends Cog. Sci, 2, 389–398.

Katz, LC and Shatz, CJ (1996). Synaptic activity and the construction of cortical circuits. Science, 274, 1133–1138.

Kilgard, MP, Pandya, PK, Vazquez, J, Gehi, A, Schreiner, CE and Merzenich, MM (2001). Sensory input directs spatial and temporal plasticity in primary auditory cortex. J Neurophysiol, 86, 326–338.

Krichmar, JL, Snook, JA, Edelman, GM and Sporns, O (2000). Experience-dependent perceptual categorization in a behaving real-world device. In JA Meyer, A Berthoz, D Floreano, H Roitblat, SW Wilson (eds) Animals to animats 6: proceedings of the sixth (p.203) international conference on the simulation of adaptive behavior, pp. 41–50. Cambridge, MA: MIT Press.

Lungarella, M and Pfeifer, R (2001). Robots as cognitive tools: Information-theoretic analysis of sensory-motor data. In Proceedings of the 2001 IEEE-RAS international conference on humanoid robots, pp. 245–252.

Lungarella, M, Pegors, T, Bulwinkle, D, and Sporns, O (2005). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, 3, 243–262.

Mareschal, D and Shultz, TR (1996). Generative connectionist networks and constructivist cognitive development. Cognitive Development, 11, 571–603.

Maris, M, and te Boekhorst, R (1996). Exploiting physical constraints: Heap formation through behavioral error in a group of robots. In M Asada (ed.) Proceedings of IROS 96, IEEE/RSJ international conference on intelligent robots and systems, pp. 1655–1660.

Montague, PR, Dayan, P and Sejnowski, TJ (1996). A framework for mesencephalic dopamine systems based on predictive hebbian learning. Journal of Neuroscience, 16, 1936–1947.

Nolfi, S and Parisi, D (1993). Self-selection of input stimuli for improving performance. In GA Bekey and KY Goldberg (eds) Neural networks in robotics, pp. 403–418. Boston: Kluwer.

Nudo, R, Milliken, G, Jenkins, W and Merzenich, M (1996). Use-dependent alterations of movement representations in primary motor cortex of adult squirrel monkeys. J Neurosci, 16, 785–807.

Overton, PG, Richards, CD, Berry, MS and Clark, D (1999). Long-term potentiation at excitatory amino acid synapses on midbrain dopamine neurons. Neuroreport, 10, 221–226.

Paus, T, Zijdenbos, A, Worsley, K, Collins, DL, Blumenthal, J, Giedd, JN, Rapoport, JL and Evans, AC (1999). Structural maturation of neural pathways in children and adolescents: In vivo study. Science, 283, 1908–1911.

Pfeifer, R and Scheier, C (1999). Understanding intelligence. Cambridge, MA: MIT Press.

Quartz, SR (1993). Neural networks, nativism, and the plausibility of constructivism. Cognition, 48, 223–242.

Reeke, GN Jr and Sporns, O (1993). Behaviorally based modeling and computational approaches to neuroscience. Ann Rev Neuroscience, 16, 597–623.

Reeke, GN Jr, Sporns, O and Edelman, GM (1990). Synthetic neural modeling: The ‘Darwin’ series of recognition automata. Proc. IEEE 78, 1498–1530.

Scheier, C and Lambrinos, D (1996). Categorization in a real-world agent using haptic exploration and active perception. Proc SAB96, 65–74.

Scheier, C, Pfeifer, R and Kuniyoshi, Y (1998). Embedded neural networks: exploiting constraints. Neural Networks, 11, 1551–1569.

Schlesinger, M (2002). A lesson from robotics: Modeling infants as autonomous agents. In CG Prince, Y Demiris, Y Marom, H Kozima and C Balkenius (eds) Proceedings of the second international workshop on epigenetic robotics: Modeling cognitive development in robotic systems, pp. 133–140. Sweden: Lund University Cognitive Studies.

Schultz, W (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.

Schultz, W, Dayan, P and Montague, PR (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.

Shultz, TR (2003). Computational developmental psychology. Cambridge, MA: MIT Press.

(p.204) Simoncelli, E and Olshausen, B (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1215.

Singer, W (1995). Development and plasticity of cortical processing architectures. Science, 270, 758–764.

Sporns, O, Chialvo, D, Kaiser, M, and Hilgetag, CC (2004). Organization, development and function of complex brain networks. Trends in Cognitive Sciences, 8, 418–425.

Sporns, O (2002). Embodied cognition. In M. Arbib (ed.) MIT handbook of brain theory and neural networks, pp. 395–398. Cambridge, MA: MIT Press.

Sporns, O and Alexander, WH (2002). Neuromodulation and plasticity in an autonomous robot. Neural Networks, 15, 761–774.

Sporns, O and Edelman, GM (1993). Solving Bernstein's problem: A proposal for the development of coordinated movement by selection. Child Development, 64, 960–981.

Sporns, O and Pegors, T (2003). Generating structure in sensory data through coordinated motor activity. Proceedings IJCNN 2003, 2796.

Sporns, O and Pegors, T (2004). Information-theoretical aspects of embodied artificial intelligence. In F Iida, R Pfeifer, L Steels and Y Kuniyoshi (eds) Embodied artificial intelligence, pp. 74–85. Berlin: Springer-Verlag.

Sporns, O, Almassy, N and Edelman, GM (2000). Plasticity in value systems and its role in adaptive behavior. Adaptive Behavior, 8, 129–148.

Sur, M and Leamey, CA (2001). Development and plasticity of cortical areas and networks. Nature Reviews Neuroscience, 2, 251–262.

Suri, RE (2002). TD models of reward predictive responses in dopamine neurons. Neural Networks, 15, 523–533.

Suri, RE and Schultz, W (2001). Temporal difference model reproduces anticipatory neural activity. Neural Computation, 13, 841–862.

Thelen, E and Smith, LB (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.

Thomas, MJ, Malenka, RC and Bonci, A (2000). Modulation of long-term depression by dopamine in the mesolimbic system. J Neurosci, 20, 5581–5586.

Tononi, G and Sporns, O (2003). Measuring information integration. BMC Neuroscience, 4, 31.

Tononi, G, Edelman, GM and Sporns, O (1998). Complexity and coherency: Integrating information in the brain. Trends in Cognitive Sciences, 2, 474–484.

Ungless, MA, Whistler, JL, Malenka, RC and Bonci, A (2001). Single cocaine exposure in vivo induces long-term potentiation in dopamine neurons. Nature, 411, 583–587.

Vorel, SR, Liu, X, Hayes, RJ, Spector, JA and Gardner, EL (2001). Relapse to cocaine-seeking after hippocampal theta burst stimulation. Science, 292, 1175–1178.

Weng, J, McClelland, J, Pentland, A, Sporns, O, Stockman, I, Sur, M, Thelen, E (2001). Autonomous mental development by robots and animals. Science, 291, 599–600.

Wise, RA (1996). Addictive drugs and brain stimulation reward. Annual Reviews of Neuroscience, 19, 319–340.

Ziemke, T (2004). Embodied AI as science: Models of embodied cognition, embodied models of cognition, or both? In F Iida et al. (eds) Embodied artificial intelligence, Lecture Notes in Artificial Intelligence 3139, pp. 27–36. Berlin: Springer.