Sparse Distributed Memory as a tool for Conscious Software Agents

 

Ashraf Anwar and Stan Franklin

Institute for Intelligent Systems

The University of Memphis

 

Abstract

 

SDM (Sparse Distributed Memory) is a content-addressable, associative memory technique which relies on close memory items tending to be clustered together, with some abstraction and blurring of details. We use the auto-associative version of SDM as an associative memory in the conscious software agent CMattie. SDM retrieves the behaviors and emotions associated with an incoming percept. This association relies on similar percepts having occurred in the past and having been associated with some behaviors and emotions. Observing some percept later on should then bring to attention the behaviors taken and the emotions aroused when similar percepts were observed in the past. Each perception register contains some seminar key value such as seminar organizer, speaker, date, location, etc. The results obtained so far are good and promising. Another possible use for Sparse Distributed Memory in CMattie is the disambiguation of each perception register by removal of inherent noise or misspellings.

 

 

Autonomous, Cognitive, and Conscious Agents

 

We find the definition of an autonomous agent in Franklin and Graesser (1997) one of the most accurate definitions reflecting the broad sense of the word. According to it, an autonomous agent is an agent that is situated in an environment, senses it, and acts upon it over time, such that its actions may affect what the agent senses next, and its actions are in pursuit of its own agenda. See Franklin (1995, 1997), Franklin and Graesser (1997), and Russell and Norvig (1995) for more detailed discussion and examples.

 

A cognitive agent, on the other hand, is an autonomous agent which (or who) has some of the cognitive capabilities: problem solving, planning, learning, perception, emotions, etc. (Franklin and Graesser, 1997).

 

A conscious agent is a cognitive one with the extra functionality of consciousness built in. We adopt the definition of consciousness from Baars (1995).

 

According to the above definitions, we have a broad range of autonomous agents, ranging from the simplest ones, e.g., a thermostat, to the most sophisticated ones, humans. Our agent, CMattie, is a conscious agent, which implies that it is cognitive and autonomous as well.

 

 

A Tour with SDM

 

SDM is the work of Kanerva; see Kanerva (1990) for more details and Anwar (1997, chapter 2) for a brief overview of its workings. The auto-associative version of SDM is a truly associative memory technique in which the contents and the addresses come from the same space and are used alternately; see Kanerva (1990). The inner workings of SDM rely on large binary spaces. The dimension of the space determines how rich each word is. Another important factor is how many actual memory locations there are in the space, the number of hard locations. Features are represented as one or more bits. Groups of features are concatenated to form a word, which becomes a candidate for writing into SDM. When writing, a copy of this binary string is placed in all sufficiently close hard locations. When reading, a sufficiently close cue reaches all sufficiently close hard locations and gets some sort of aggregate, or average, out of them. Reading is not always successful. Depending on the cue and the previously written information, among other factors, convergence or divergence may occur during a reading operation. If convergence occurs, the pooled word will be the closest match (with abstraction) to the input reading cue. On the other hand, when divergence occurs, there is, in general, no relation between the input cue and what is retrieved from memory.

 

 

SDM from the Inside Out

 

The addresses of the locations N′ of a sparse memory are a uniform random sample of the address space N, which consists of the 2^n possible binary addresses, where n is the dimension of the space. N′ will be called the set of hard locations to emphasize that they are physical locations. The distance between locations means the Hamming distance between the corresponding addresses.

 

The nearest N′-neighbor x′ to an element (address) x of N is the most similar element of N′ to x. If X ⊆ N, then X′ ⊆ N′ is the set of the nearest N′-neighbors of the elements of X:

 

X′ = { x′ | x ∈ X }

 

Distance to the nearest location, d(x,x′):

 

N(d) = Pr{ d(x,y) ≤ d }, for arbitrary points x and y of the binary space

 

N′(d) = Pr{ d(x,x′) ≤ d }

= 1 − [ 1 − N(d) ]^N′

= 1 − [ 1 − N′·N(d)/N′ ]^N′

≈ 1 − e^(−N′·N(d))

 

 

N(d) ≈ −ln[ 1 − N′(d) ] / N′, which implies that:

d ≈ N^-1[ −ln(1 − N′(d)) / N′ ]

 

The median of the distance d(x,x′) is N^-1[ −ln(0.5) / N′ ].

 

For n = 1000 and 1,000,000 hard locations, the median is 424 bits. That median is a good measure of the distance between a random point of N and the hard location nearest to it.
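As a quick numerical check of the median formula, note that N(d) is just the CDF of a Binomial(n, 1/2) distribution, since each of the n bits of two random words differs independently with probability 1/2. A minimal sketch, assuming Python with SciPy:

```python
# Numerical check of the median nearest-hard-location distance.
from scipy.stats import binom

n = 1000             # dimension of the binary space
n_hard = 1_000_000   # number of hard locations, N'

# N(d) = Pr{ d(x, y) <= d } is the Binomial(n, 1/2) CDF;
# N'(d) = 1 - (1 - N(d))^N' is the chance that some hard
# location lies within d bits of a random point.
for d in range(400, 450):
    if 1 - (1 - binom.cdf(d, n, 0.5)) ** n_hard >= 0.5:
        print("median distance to nearest hard location:", d)  # 424
        break
```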

 

The nearest-neighbor method

X is a random set of 10,000 words of {0,1}^1000. There are 10^6 hard locations to store X in. The goal is to find the stored word that best matches a test word. We store each word x of X in its nearest hard location x′.

 

However, to find the best match to a test word z, reading the nearest occupied hard location z′ would not, in general, yield the best match, or even, probably, a good match.

 

Fig 1: The distance from the test word z to the location x′ where the best-matching word x is stored. The "unknown" distance d(z,x′) is approximately 454 bits.

 

 

For example, in the above figure, z is a test word and x is the element of X most similar to z, with d(z,x) = 200 (i.e., they are quite similar). Assume that d(x,x′) = 424 (the median distance). Then, using the third-side-of-a-triangle rule with a = 200 and b = 424, d(z,x′) ≈ a + b − 2ab/n = 454 bits. About 0.0017·N′ = 1700 hard locations are within 454 bits of z, and about 10^-2 of them (17 locations) are occupied by elements of X. Hence,

Pr{ the nearest X′-neighbor of z is x′ | d(z,x) = 200 } ≈ 1/17
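The arithmetic of this example is easy to reproduce. The sketch below recomputes the triangle estimate and the 1/17 figure from the binomial CDF (which gives roughly 0.002 rather than the 0.0017 of Kanerva's table, close enough for the estimate):

```python
# Reproducing the best-match example (n, a, b as in the text).
from scipy.stats import binom

n, a, b = 1000, 200, 424
d = a + b - 2 * a * b / n               # third side of a triangle
print(d)                                # ~454 bits

frac = binom.cdf(454, n, 0.5)           # fraction of space within 454 bits
within = frac * 1_000_000               # hard locations within 454 bits of z
occupied = within * 10_000 / 1_000_000  # ~1% of them hold a word of X
print(round(occupied), 1 / occupied)    # ~17 candidates, ~1/17 success
```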

 

It can be shown, using the third-side-of-a-triangle rule, that two dissimilar addresses u and v can refer to the same hard location, i.e., u′ = v′.

 

Things become worse when data other than the addresses themselves are stored in memory (⟨x,h⟩ instead of ⟨x,x⟩). The probability of success in our example is 1/17, and we cannot even tell, by reading from memory, whether the outcome is a success or a failure (psychologically, there is no sense of knowing whether one knows).

 

How is it a Distributed Memory?

 

Many storage locations participate in a single write or read operation, so data are retrieved on the basis of similarity of address. If ⟨z,h⟩ is a stored address-data pair, then reading from an address x that is sufficiently similar to z retrieves a word y that is even more similar to h than x is to z. The similarities are comparable because the addresses to memory and the data are elements of the same metric space N.

 

How Does it Work?

 

The storage locations and their addresses are given from the start, and only the contents are modifiable. The threshold of the address decoders can even be fixed.

 

Access circle, O′(x), is the set of hard locations O′(r,x) in the circle O(r,x). In this notation we omit the access radius r, which is fixed to the permissible distance range of contributing (accessible) locations in read/write operations. Note that O′(x) = N′ ∩ O(x). For example, if the circle is chosen to cover a fraction p = 0.001 of the space, then 0.001 of N (and 0.001 of N′ on the average) is accessed at once. A circle of radius 451 bits covers 0.001 of the space, and so the access radius r_0.001 = 451 bits.

 

Most locations in the access circle are quite far from the center. The location closest to the center is 424 bits away on the average (the median distance).

 

The average distance from the center to the 1000 locations of the access circle is 448 bits (a circle with a 448-bit radius encloses 0.001/2 = 0.0005 of the space), which is only three bits short of the access radius.

 

Access overlap, I′(x,y), is the set of hard locations accessible from both x and y: I′(x,y) = O′(x) ∩ O′(y). The mean number of hard locations in this access overlap depends on:

1- the size of the access circle

2- the distance d(x,y) between the centers

Content of a location, C(x′), is the multiset of all the words that have ever been written to it, stored in some form.

Writing in x′ means adding the written word, h, to the multiset of words C(x′) contained in x′: C(x′) = C(x′) ∪ {h}. Writing word h at x means writing h in all the (1000) hard locations accessible from x.

 

Data at x, D(x), is the pooled contents (multiset union) of all locations accessible from x: D(x) = ∪ C(y′), y′ ∈ O′(x).

 

 

Table 1: Mean number of hard locations in the access overlap of two circles with radii r_0.001 = 451 in a 1,000-dimensional memory with 1,000,000 locations. d is the distance, in bits, between the centers of the two circles.

 

Source: Kanerva (1990)

 

If the word h has been written at the address x, the multiset D(z), read at z, contains |O′(x) ∩ O′(z)| copies of h, one from each location accessible from both x and z.

 

Reading at x means taking a representative (an element of N) of the data at x. Word at x, W(x), is a properly chosen representative of D(x).

 

Finding the best match requires:

1- that not too many words have been stored (the memory is sparse), and

2- that the first reading address (the test pattern) is sufficiently close to the writing address of the target word.

 

Storing the entire data set (10^4 words), each in 1000 locations, means that some 10^7 words are stored in memory. This gives our first estimate for the capacity of a storage location: 10 words per location.

 

Reading will pool the data of about 1000 locations, yielding a multiset D(x) of about 10,000 words.

 

|X| = 10,000

|O′(x)| ≈ 1000

|C(x′)| ≈ 10

|D(x)| ≈ 10,000

C(x′) = X ∩ O(x′)

 

A representative of the pooled data D(z), when reading at test word z, is obtained by computing an element of N that is an archetype of the pooled data D(z) but not necessarily an element of it. We take the average of the words of D(z) (the majority rule). This average is the best representative of D(z) in the sense that it is the word of N with the smallest mean distance to the words of D(z). The ith bit of the average is given by summing the ith bits of the words in the pooled data and then thresholding at half the size of the pooled data:

W_i(z) = 1 iff Σ x_i ≥ |D(z)|/2, x ∈ D(z)

That representative is a good one as long as the words written in memory are a random sample of N.
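In code, the majority rule is a column-wise vote over the pooled words; a minimal numpy sketch:

```python
# Majority-rule decoding of pooled data.
import numpy as np

def word_at(pooled):
    """pooled: |D(z)| x n array of 0/1 words read from the accessible
    locations; returns the bitwise majority vote W(z)."""
    return (pooled.sum(axis=0) >= pooled.shape[0] / 2).astype(np.uint8)

# Three noisy copies of a word out-vote stray bits:
D = np.array([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1]])
print(word_at(D))  # [1 0 1 1]
```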

 

Convergence to the Best-Matching Word

 

When we read at x, if we have previously written the word x at x, we retrieve 1000 copies of x in addition to about 10,000 copies of other words, for a total of 11,000 (10,000 if reading at a random address). However, the other words come mostly singly or in very small groups, since the intersection of the read circle O′(x) with the write circle O′(y) of another word, for most y in N and in X, is about 0.001 of O′(x), or just one hard location. Against such background noise, the weight of 1000 copies is sufficient for the retrieval (reconstruction) of x:

Pr{ guessing 1 bit correctly } ≈ 1 − 10^-22

Pr{ guessing all 1000 bits correctly } ≈ (1 − 10^-22)^1000 ≈ 1 − 10^-19

Thus, we are nearly certain that W(x) = x.

 

When we read starting from a test word z, the new distance to the target, d(W(z),x), depends on the old distance, d(z,x); see Figure 2. Iterated reading fails to converge to the best-matching word if the original distance, d(z,x), is too large. In our example, a test word more than 209 bits (the critical distance) from the target will not, in general, find its target; the reading sequence will diverge until the distance approaches 500 bits (the indifference distance).
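Iterated reading is a simple loop: read at z, then read at the result, and so on, stopping at a fixed point. A sketch, where `read` stands for any implementation of W(·):

```python
# Iterated reading until a fixed point or apparent divergence.
import numpy as np

def iterate_read(read, z, max_iters=10):
    """Convergence shows up as successive words becoming identical;
    as a rule this takes fewer than ten iterations when it happens."""
    prev = z
    for _ in range(max_iters):
        cur = read(prev)
        if np.array_equal(cur, prev):   # fixed point: converged
            return cur
        prev = cur
    return None  # no fixed point found: treat as (initial) divergence
```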

 

Chance convergence might happen if an initially diverging sequence converges to a random word of the data set X, but the expected time to such convergence is extremely long. In our example, the probability that a random point of N is within the critical distance of 209 bits of the nearest point of X is something like 10^-50, and so the expected time of chance convergence to some point of X is about 10^50 iterations.

 

Critical distance is the distance beyond which divergence is more likely than convergence (209 bits in our example). As more words are stored in memory, the critical distance decreases; once it reaches zero, stored words are no longer retrievable, i.e., there is no convergence anywhere.

 

Fig 2: New distance to target as a function of old distance

Source: Kanerva (1990)

 

Fig 3: A converging sequence (x → x′) and a diverging sequence (y → ?)

Source: Kanerva (1990)

 

Rates of convergence and divergence decrease as we get closer to the critical distance. When we are sufficiently far from it, convergence to the target, or divergence to random indifferent points, is rapid (fewer than ten iterations, as a rule). Comparing adjacent items of a sequence soon reveals whether it will converge or (initially) diverge.

 

Memory capacity T is the size of the data set for which the critical distance is zero:

T = N′ / H(n), where H(n) = [ Φ^-1(2^(-1/n)) ]^2

For n = 1000, H(n) = 10.22 ≈ 10. Hence, T = N′/10 (one-tenth of the number of hard locations). See Table 2 below.

 

Table 2: Capacity of a sparse distributed memory with N′ storage locations.

 

Source: Kanerva (1990)
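The constant H(n) is easy to verify numerically, since Φ^-1 is the inverse of the standard normal CDF; a quick check in Python with SciPy:

```python
# Checking H(n) = [Phi^{-1}(2^(-1/n))]^2 for n = 1000.
from scipy.stats import norm

n = 1000
H = norm.ppf(2 ** (-1 / n)) ** 2
print(H)  # ~10.2, so the capacity T = N'/H is about N'/10
```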

 

 

A full memory is a memory filled to capacity. When writing word x into a full memory, the probability of reading x at x is, by definition, 0.5. An overloaded memory is a memory filled beyond capacity; when writing word x into an overloaded memory, the probability of reading x at x is less than 0.5. In both full and overloaded memories, the probability of reading x from a point even one bit away is quite small (words are forgotten because of the increased noise), and a sequence of successively read words diverges rapidly.

 

The capacity of a storage location can be calculated as follows:

 

Total number of words in memory

= (average number of words stored in each hard location) × (number of hard locations)

= (T·p) · N′, where the access radius is r_p

 

Average number of words per location (capacity) = T·p = (N′/10)·p

= 100 words (for p = 0.001)

 

Since the average word of the pooled data can be computed from n bit sums, a bit location can be realized as a counter that is incremented by 1 to store a 1 and decremented by 1 to store a 0. A bit location that can store the integers in [-50, 50] will suffice. The range of values can be reduced to perhaps as little as [-10, 10] by:

1- reducing the size of the write circle

2- not attempting to fill the memory to its capacity

 

After writing word x, each subsequent write near x will modify some of the 1000 copies of x. A location survives one write operation with probability q = 1 − p = 1 − 0.001 = 0.999. Hence,

Pr{ a location survives L write operations } = q^L

 

Some Psychological Aspects

 

Knowing that one knows is indicated by fast convergence of read operations.

 

The tip-of-the-tongue state corresponds to being near the critical distance (a slow rate of convergence).

 

Rehearsal is done by writing an item many times in memory.

 

A full or overloaded memory could support momentary feelings of familiarity that would fade away rapidly, as if one could not maintain attention.

 

Sparse Distributed Memory Construction

 

Addressing: The memory is built of addressable storage locations. A location is activated whenever a read or write address is within a certain number of bits of the location’s address.

 

Storage: A storage location has n counters, one for every bit.

 

Writing: To write a 1 in a bit means to increment the counter. To write a 0 means to decrement the counter, or to do nothing.

 

Reading: Retrieval is done by pooling the contents of the storage locations activated by the read address and then finding, for every bit, whether zeros or ones are in the majority.

 

Address decoding: A storage location should be accessible from anywhere within r_p bits of the location’s address. Linear threshold neurons can therefore be used for address decoding. The threshold for each address-decoder neuron is fixed at r_p units below the maximum weighted sum S (obtained when the read/write address matches the decoder’s address exactly), i.e., at S − r_p.

 

In computing the average word, the pooled bit sums are compared with a threshold: the mean bit sum (+1 for a one, −1 for a zero) over all the data stored in memory if writing a 0 decrements the counter, or the mean count of ones over all the data stored in memory if writing a 0 does nothing.
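Pulling the pieces above together, here is a minimal sketch of the construction in Python. The parameters are toy values chosen for speed; Kanerva's example memory uses n = 1000 and 1,000,000 hard locations:

```python
# A minimal sparse distributed memory following the construction above.
import numpy as np

class SDM:
    def __init__(self, n=256, n_hard=2000, radius=112, seed=0):
        rng = np.random.default_rng(seed)
        self.n = n
        self.radius = radius                  # access radius r
        # Fixed, random hard-location addresses; only contents change.
        self.addresses = rng.integers(0, 2, size=(n_hard, n), dtype=np.uint8)
        # One counter per bit per location: +1 to store 1, -1 to store 0.
        self.counters = np.zeros((n_hard, n), dtype=np.int32)

    def _active(self, x):
        # Address decoding: activate locations within `radius` bits of x.
        dist = (self.addresses != x).sum(axis=1)   # Hamming distances
        return dist <= self.radius

    def write(self, x, w=None):
        # Auto-associative use writes the word at its own address.
        w = x if w is None else w
        self.counters[self._active(x)] += np.where(w == 1, 1, -1)

    def read(self, x):
        # Pool the counters of the active locations; majority rule per bit.
        pooled = self.counters[self._active(x)].sum(axis=0)
        return (pooled > 0).astype(np.uint8)

# Usage: write a word, then read back from a noisy cue.
rng = np.random.default_rng(1)
mem = SDM()
word = rng.integers(0, 2, size=256, dtype=np.uint8)
mem.write(word)
cue = word.copy()
cue[:20] ^= 1                         # flip 20 bits of noise
print((mem.read(cue) != word).sum())  # typically 0: the word is recovered
```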

 

Each bit location needs three lines: address-decoder select (in), write (in), and read (out).

 

Since many bit locations are pooled to form a single output bit from memory, they must be connected to a common output line.

 

Writing the data corresponds to using a matching network that takes one input line and distributes it to the same bit locations that are pooled for a single output bit.

 

We can use the same wire for both input and output. Alternatively, we can use a matched pair of separate, corresponding input and output lines.

 

Autonomous Learning System Organization

 

Our autonomous learning agent will function independently, interact with its environment and record its interaction to have the potential for learning and adaptation. It will use sparse distributed memory.

 

Binary vectors will stand for patterns of binary features. The mathematics generalizes to patterns of multivalued features. The most important requirement is that the number of features be large.

 

A pattern can be used both as an address and as a datum, so a sequence of patterns can be stored as a pointer chain.

 

Addressing the memory need not be exact. Address patterns that have been used as write addresses attract: reading within the critical distance of such an address retrieves a pattern that is closer to the written pattern than the read address is to the write address. Three to six iterations usually suffice to retrieve the original pattern.

 

When similar patterns (an object viewed from different angles and distances) have been used as write addresses, the individual patterns written with those addresses cannot be recovered exactly. What is recovered, instead, is a statistical average (an abstraction) of the patterns written in that neighborhood of addresses. The object comes to occupy a region of the pattern space with poorly defined boundaries (a concept).

 

Modeling the World

 

Many things appear to be learned by nothing more than repeated exposure to them (learning from experience). Learning is model building: we build an internal model of the world and then operate with the model. That modeling is so basic to our nature that we are hardly aware of it.

 

The modeling mechanism constructs objects and individuals. A person is constantly changing, and our views of him differ at different times, yet we perceive him as "that person".

 

Operating with the model is like operating with a scale model. The model mimics the actions and interactions of objects and individuals. The more experience we have, the more faithfully the model reproduces the dynamics of the world.

 

The model simply captures statistical regularities of the world, as mediated by the senses, and is able to reproduce them later. Our world model includes ourselves as a part. We can prepare ourselves for a situation by imagining ourselves in the situation.

 

The subjective experience produced by the outside world is of the same quality as that produced by the internal model of the world. Our internal and external "pictures" merge without our being aware of it. We scan our surroundings for overall cues and fill in much of the detail from the internal model. However, when something unusual happens, we begin to pay attention: we are alerted by the discrepancy between the external report of what is happening and the internal report of what should be happening on the basis of past experience. Moreover, the internal model affects our perception profoundly, without our being aware of it (prejudgments).

 

Storing the World Model in SDM

 

At any given moment, the individual is in some subjective mental state. A flow of these states (sequence) describes the individual’s subjective experience over time. The state space for the world is immense in comparison with that for an individual’s experience.

 

An individual’s sensory information at a moment is represented as a long vector of features. A sequence of such vectors represents the passage of time.

 

Since information supplied by the senses and information supplied by the memory can produce the same subjective experience, they are both fed into some common part of the architecture, the focus.

 

The sequence of patterns in the focus represents the system’s subjective experience of the world over time (see Figure 4).

 

Since sequences are stored as pointer chains, the patterns of a sequence are used both as addresses and as data; i.e., the focus combines the roles of a memory address register and a memory data register (Focus = MAR + MDR).

 

 

Fig 4: Senses, Memory, and Focus in SDM

Source: Kanerva (1990)


Fig 5: Organization of an autonomous system using SDM

Source: Kanerva (1990)

 

 

The world model is updated by writing into the memory as follows:

 

1. The pattern held in the focus at time t is used to address the memory, activating a set of memory locations.

 

2. The response read from those locations is the memory’s prediction of the sensory input at time t+1.

 

 

3. If the prediction agrees with the sensory input, there is no need to adjust the memory, and the read pattern simply becomes the contents of the focus at time t+1.

 

4. If the prediction disagrees with the sensory input, a third, corrected pattern is computed from the two (an average), and it becomes the contents of the focus at time t+1. Before it is used to address the memory at time t+1, however, it is written to the locations from which the faulty output was just read (the locations selected at time t).

 

As correction patterns are written into memory over time, the memory builds a better and better model of the world, constrained only by the senses’ ability to discriminate and the memory’s capacity to store information.
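This predict-compare-correct loop is compact in code. A sketch, reusing the SDM class from the earlier sketch; `sense` is a hypothetical stand-in for the sensory input at each time step, and breaking disagreements at random is just one simple way to "average" two binary patterns:

```python
# The four-step world-model update loop, as a sketch.
import numpy as np

def run(mem, sense, steps, rng=np.random.default_rng(2)):
    focus = sense(0)                      # initial contents of the focus
    for t in range(steps):
        prediction = mem.read(focus)      # steps 1-2: predict input at t+1
        actual = sense(t + 1)
        if np.array_equal(prediction, actual):
            focus = prediction            # step 3: agreement, no adjustment
        else:
            # step 4: corrected pattern = agreed bits kept, disagreements
            # broken at random; writing it at the time-t address selects
            # the very locations the faulty output was just read from.
            disagree = prediction != actual
            corrected = actual.copy()
            corrected[disagree] = rng.integers(0, 2, disagree.sum())
            mem.write(focus, corrected)
            focus = corrected
    return focus
```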

 

Including Action in the World Model

 

The system needs to act and learn from its interaction with the world. To act, the system needs motors (effectors). To learn, the system must model its own actions.

 

Learning to perform actions means learning to reproduce sequences of patterns that drive the muscles. Thus, the system’s own actions can be included in the world model by storing motor sequences in memory in addition to sensory sequences.

 

Since the way into and out of the memory is through the focus, the system’s motors should be driven from the focus.

 

As the system’s subjective experience is based on the information in the focus, deliberate action becomes part of the system’s subjective experience without the need for additional mechanisms (See figure 5).

 

Some components of the focus (50-80%) correspond to, and can be controlled by, the system’s sensors. Others (10-20%) drive the system’s motors. The focus can also have components with no immediate external significance (e.g., status and the preference function). All components of the focus can be controlled by the memory.

 

Retrieving well-behaved sequences from the memory to the motor part of the focus would cause the corresponding actions to be executed by the system.

 

Cued Behavior

 

Assume that 80% of the focus is for sensory input and 20% for motor output. Assume that the stimulus sequence <A,B,C> is to elicit the response sequence <X,Y,Z>, with A triggering action X after one time step and so on.

 

The pattern sequence that needs to be generated in the focus is <Aw, BX, CY, dZ>. In each pair, the first letter stands for the sensory-input section of the focus and the second for the motor-output section (lowercase letters denote unspecified contents).

 

If <AW, BX, CY, DZ> has been written in memory, and A is presented to the focus through the senses, then BX is likely to be retrieved from the memory into the focus (d(Aw, AW) ≤ 0.2n = 200 ≤ the critical distance of 209, for n = 1000).

 

This means that action X will be performed at the time at which B is expected (predicted) to be observed.

 

Now, if the sensory report agrees with B, then BX will be used as the next memory address, CY will be retrieved, causing action Y, and so on as long as the agreement persists.

If A controls significantly less than 80% of the focus, or if the cue is not exactly A but a similar pattern A′, then we may not be sufficiently close to the original write address AW to retrieve BX or something similar to it. To read BX, it is then important that the action part w be similar to W.

 

Suppose now that W means that the system is paying attention and waiting for a cue, while w means that the system is performing some other action. Then AW or A′W will retrieve BX (the system was waiting for the cue), but Aw or A′w will not (the system was not waiting). This means that the system will respond properly to a cue only if it is waiting for the cue, i.e., the response depends on the system’s state at the time the cue is presented.
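Storing such a chain is just a sequence of writes in which each pattern addresses the next. A sketch reusing the SDM class from the earlier sketch; the patterns here are random stand-ins, with roughly 80% of the focus sensory and 20% motor:

```python
# Storing the cue-response chain <AW, BX, CY, DZ> as a pointer chain.
import numpy as np

rng = np.random.default_rng(3)
n, n_sense = 256, 205                     # ~80% sensory, ~20% motor
mem = SDM(n=n)                            # SDM class from the sketch above

def focus(sensory, motor):
    return np.concatenate([sensory, motor])

A, B, C, D = (rng.integers(0, 2, n_sense, dtype=np.uint8) for _ in range(4))
W, X, Y, Z = (rng.integers(0, 2, n - n_sense, dtype=np.uint8) for _ in range(4))

chain = [focus(A, W), focus(B, X), focus(C, Y), focus(D, Z)]
for addr, nxt in zip(chain, chain[1:]):
    mem.write(addr, nxt)                  # each pattern points to the next

out = mem.read(focus(A, W))               # cue A arrives while waiting (W)
print((out != chain[1]).sum())            # ~0 bits wrong: BX is retrieved
```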

 

If, after BX has been read from memory, the sensory input is suppressed, the focus will be controlled entirely by the memory; the rest of the sequence will be recalled and the actions completed. But if the senses feed in the sequence <A,B,K,L>, where K and L are quite different from C and D (a sudden change), then BX will retrieve CY, action Y is executed (inertia), and the senses report K instead of the expected C. The next contents of the focus will be HY instead of CY, where H is some combination of C and K that, in general, is quite different from C. Thus DZ will not be retrieved, and action Z fails. This failure can be interpreted in two ways:

 

1. An environment-monitoring system ceases to act when the proper cues are no longer present.

 

2. A system that monitors the effects of its own actions stops acting when the effects no longer confirm the system’s expectations.

 

Since the pattern retrieved from the memory includes an expectation of the action’s results, the memory can be used to plan actions.

 

The system will initiate the "thought" in the focus and then block off the present (ignore environmental cues and suppress action execution). The memory will then retrieve into the focus the likely consequences of the contemplated actions.

 

Learning to Act

 

The model’s goodness is judged by how well it predicts the world; when it predicts incorrectly, it is adjusted. Correcting actions is much harder than correcting sensory predictions, since there is no external source feeding correct action sequences into the focus.

 

The action sequences have to be generated internally. They have to be evaluated for their desirability and stored in memory in a way that makes desirable actions likely to be carried out and undesirable actions likely to be avoided.

 

Initial Conditions for Learning

 

We are born with built-in preferences (e.g., the satisfaction of food) and dislikes (e.g., heat, pain), and with instinctive ways to act (automatic reflexes). The learning for such actions is passed to the individual as part of the individual’s genetic endowment. Given such preferences (desirable states) and dislikes (undesirable states), we can define desirable and undesirable actions according to the states to which they lead. Some patterns in the focus (states) are inherently good or bad, with most states being indifferent.

 

Learning to act means that the system will store in memory sequences of actions in a way that increases the likelihood of finding good states and of avoiding bad ones.

 

The system has a scalar preference function to evaluate patterns, or subjective states (and thereby indirectly evaluate action sequences). Good states have high (positive) preference-function values and bad states low (negative) ones; hence the problem becomes an optimization problem.

 

Indifferent states acquire values according to whether they are found on paths to desirable or undesirable states. Thus, learning to act can be looked at as assigning preferences to states that start out as indifferent states.

 

Realizing the Preference Function

 

Each memory location will have a counter for the preference function. The counter value is:

positive for good (desirable) patterns

negative for bad (undesirable) patterns

close to zero for indifferent or as-yet-undefined patterns

 

On reading, the counters for the preference function can be pooled in the same way as the pattern-component counters. The pooled sum is:

positive if the focus pattern is favorable

negative if the focus pattern is unfavorable

zero if the focus pattern is indifferent
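In the sketch developed earlier, this amounts to one extra counter per hard location, written with the sign of the state's value and pooled on reads (an illustrative extension of the SDM sketch, not a definitive design):

```python
# One scalar preference counter per hard location, pooled on read.
import numpy as np

class PreferenceSDM(SDM):                 # SDM class from the sketch above
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.preference = np.zeros(len(self.addresses), dtype=np.int32)

    def write(self, x, w=None, value=0):
        super().write(x, w)
        self.preference[self._active(x)] += value   # +: good, -: bad

    def read_preference(self, x):
        # Sign of the pooled counters: +1 favorable, -1 unfavorable,
        # 0 indifferent (or not yet defined).
        return int(np.sign(self.preference[self._active(x)].sum()))
```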

 

A built-in preference function means that some locations have nonzero preference counters from the start, and that such nonzero counters may even be unmodifiable.

 

Trial and Error Learning

 

We assume that the system can block off external input after accepting the initial input (the present situation) and that it can suspend the execution of actions until it has accepted some proposed action sequence.

 

If the present situation strongly resembles a past one, the system will propose an action by recalling memory. Otherwise, it will try a random action.

 

If the state reached after iterating is undesirable, then we have an error, and the system will try another action (backtracking). If the state reached is good, the system has reason to proceed with the proposed action.

 

Trial and error is reasonable if the number of situations and actions (the system state space) is small and simple, or if the proportion of desirable states is large.

 

The efficiency of searching and learning can be improved considerably if good paths are remembered and are used later to find inherently good states.

 

If an action sequence leads to a desirable pattern in the system’s focus, the positive preference is extended backward along the sequence with decreasing intensity (as in the bucket brigade of Holland’s classifier systems). Similarly, negative preference is extended backward from bad states.

 

Supervised Learning

 

Supervised learning substitutes an artificial (new) stimulus for a natural (old) one. The system already has a natural (old) response to the old stimulus and no response to the new one.

 

The trainer presents a new stimulus (e.g., a bell) followed by an old one (food) and the system responds.

 

After sufficient repetition, the new stimulus alone will elicit the old response, owing to the extension of preference.

 

Usually no learning occurs if the old stimulus is presented before the new one. The new stimulus can even take the place of the old stimulus in training another artificial stimulus for the old response.

 

Learning by Imitation

 

The system will use itself to model the behavior of other systems. The system must store an image of the behavior of others. It maps this image into actions of its own.

 

The system must observe the results of its own actions and compare them against its image of the behavior of others (i.e., the system must identify with the role model).

 

An internal reward mechanism, used to perfect the match between the system’s own behavior and that of the role model, is usually necessary if a system is going to learn by imitation, which, in turn, is primarily responsible for complicated social learning.

 

The Encoding Problem

 

The raw signal arriving at the sense organs is ill-suited for building a predictive model. Even if a number of regularities of the world are present in the signal, they appear in far-from-optimal form and are embedded in noise.

 

A sensory system has two functions:

1. to filter out noise, and

2. to transform the relevant information into a form that is useful in building and using the world model (the encoding problem).

 

Since patterns stored in memory attract similar patterns, the memory chunks things with similar encodings, forming objects and individuals from them.

 

A sensory system must express the sensory input in features that are relatively insensitive to scale (perturbations of objects), among other things.

 

 

A Cognitive Theory for Consciousness

 

The role of consciousness in human cognition is vital: it enables and enhances learning, allows for extra resource allocation, and deals with novel situations, among other things. Baars' theory of consciousness accommodates many of the features and constraints of human consciousness and cognition; see Baars (1995, 1997). Competing contexts, standing for various goals, exist. Players (processes represented as codelets), governed by the various contexts, also compete to gain access to the playing field, where they form candidate coalitions. Only one such coalition can be in consciousness at a time. A spotlight is pictured as shining upon the conscious coalition, forming the conscious experience; not all the players in the playing field are in the spotlight, only those who are members of the conscious coalition. There is also a large audience of unconscious players waiting outside the playing field. Once some coalition gets into consciousness, a broadcast of it takes place, making it accessible to everyone. This serves to recruit more resources, as some of the unconscious audience processes jump into the playing field when they find something relevant in the broadcast coming through consciousness.

 

 

CMattie, A Clerical Agent

 

CMattie is a software agent developed to manage the seminar announcement email list in the Mathematical Sciences Department at the University of Memphis. CMattie has a predecessor, VMattie, which lacks consciousness, among other things; for more details on VMattie, see Franklin, Graesser, Olde, Song, and Negatu (1997). Both VMattie and CMattie can handle a seminar list in an academic department: they receive various emails from seminar organizers, speakers, and attendees; handle their requests; respond to their queries; and compose the weekly department seminar list. While VMattie is limited in its learning capability and lacks some supporting modules, CMattie is augmented with a wide variety of control structures for autonomous agents.

 

Two main types of memory are used in CMattie: SDM and a case-based memory, referred to from now on as CBM. Both memories are used for suggesting actions and the emotional state of CMattie. SDM has precedence over CBM in terms of default extraction: when both memories converge on a read, the SDM focus registers are given precedence over those of CBM. Each memory has an input focus and an output focus, to prevent the overwriting of input cues, since the output from a read usually differs from the input.

 

Another major addition to CMattie over VMattie is the consciousness apparatus (Spotlight Controller, Coalition Manager, Playing Field, etc.); see Bogner and Franklin (1998) for more details.

 

 

SDM in CMattie

 

SDM is used in CMattie for two main functions:

 

1. Suggesting actions to take and behaviors to activate, as well as the emotional state of the agent, based on the incoming percepts (the values of the perception registers). The suggestion is based on previous experience and indicates what the agent's state should be, in terms of behavior activation and emotional state, in response to the values observed in the perception registers (PRs). The auto-associative SDM used here relies on the percepts dominating, in size, the behavior and emotion fields.

2. Providing defaults for missing perception registers (PRs), based on previous history.

 

 

How Does it Work?

 

The incoming message is dissected into a group of perception registers (PRs); see Zhang, Franklin, Olde, Graesser, and Wan (1996). Those registers form a first guess about the meaning and purpose of the message. The PRs are then copied in parallel into two places: the SDM input focus and the input focus of the CBM. Each memory goes into a reading cycle, which may converge or diverge. Upon convergence, the contents of each memory are considered, with precedence given to SDM when conflicts arise. The result of the reading operation is placed into the output focuses of SDM and CBM respectively, so we have five focuses in all, as depicted in figure 6 below.
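The precedence rule is simple to state in code. A hypothetical sketch; the register names and the dictionary representation are illustrative only, not CMattie's actual interfaces:

```python
# SDM-over-CBM precedence when both memories converge on a read.
def merge_focuses(sdm_out: dict, cbm_out: dict) -> dict:
    """Fill each output register from SDM when it supplies a value,
    falling back to CBM otherwise."""
    merged = {}
    for reg in set(sdm_out) | set(cbm_out):
        value = sdm_out.get(reg)
        merged[reg] = value if value is not None else cbm_out.get(reg)
    return merged

# e.g. merge_focuses({"speaker": "J. Doe", "room": None},
#                    {"speaker": "J. Do", "room": "Dunn 123"})
# -> SDM's "speaker" wins; CBM supplies the missing "room".
```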

 

When the output from SDM is considered, it has three different parts:

 

i. Defaults for PRs, which might come from reading SDM with an incomplete set of original PRs.

 

ii. Behavior activation (action selection) in response to the percept received.

 

iii. The emotional state of the agent, based on the percept received and the expected results of the action taken accordingly.

 

The various fields retrieved from SDM and CBM are used by different modules in the architecture. For example, the emotion mechanism relies on the emotional-state recommendations retrieved from SDM and CBM in determining the next emotional state for CMattie.

 

 

Results and Statistics

 

The following table gives the percentage of correct action selection and/or behavior activation, as well as correct emotional state, in response to a percept of sufficient length.

 

 

Table 3: Percentage of Correct action selection/behavior activation and emotional status.

 

 

              Action/Behavior     Emotional Status

Correct

Incorrect

 

 

 

The following table gives the percentage of correct default supplying of each perception register (PR) in response to a percept of sufficient length.

 

 

Table 4: Percentage of correct default extraction of each PR.

 

PR             Correct     Incorrect

Organizer

Speaker

Seminar

Day

Time

Room

Title

 

 

Conclusion

 

SDM proves to be a successful tool as an associative memory. It is capable of reconstructing previously encountered percepts from a part of the percept, and it can supply defaults for missing PRs. The evaluation of the suggested actions and emotional states is satisfactory so far. However, a complete evaluation once CMattie is up and running will shed more light on how well SDM is able to learn percept-action-emotion associations.

 

References

 

Albus, James S. (1981). Brains, Behavior, and Robotics. Byte Publications.

 

Albus, James S. (1991). Outline for a Theory of Intelligence. IEEE Transactions on Systems, Man, and Cybernetics, vol 21 no. 3, May/June.

Archibald, Colin, and Kwok, Paul. (1995). Research in Computer and Robot Vision. World Scientific Publishing Co.

 

Baars, Bernard J. (1995). A Cognitive Theory of Consciousness. Cambridge University Press.

 

Baars, Bernard J. (1997). In the Theater of Consciousness. Oxford University Press.

 

Bates, Joseph, Loyall, Bryan, and Reilly, W. Scott. (1991). Broad Agents. CMU.

 

Bates, Joseph, Loyall, Bryan, and Reilly, W. Scott. (1992). An Architecture for Action, Emotion, and Social Behavior. CMU.

 

Boden, Margaret A. (1988). Computer Models of Mind. Cambridge University Press.

 

Callari, Francesco G., and Ferrie, Frank P. (1996). Active Recognition: Using Uncertainty to Reduce Ambiguity. ICPR96.

 

Franklin, Stan. (1995). Artificial Minds. MIT Press.

 

Franklin, Stan, Graesser, Art, Olde, Brent, Song, Hongjun, Negatu, Aregahegn. (1997). Virtual Mattie-an Intelligent Clerical Agent. Institute for Intelligent Systems, The University of Memphis.

 

Franklin, Stan. (1997). Autonomous Agents as Embodied AI. Cybernetics and Systems, special issue on Epistemological Issues in Embedded AI.

 

Franklin, Stan. (1997). Global Workspace Agents. Institute for Intelligent Systems.

 

Franklin, Stan, and Graesser, Art. (1997). Is it an Agent, or just a Program? A Taxonomy for Autonomous Agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, published as Intelligent Agents III, Springer-Verlag, 21-35.

 

Glenberg, Arthur M. (1997). What Memory is For. Behavioral and Brain Sciences.

 

Kanerva, Pentti. (1990). Sparse Distributed Memory. MIT Press.

 

Karlsson, Roland. (1995). Evaluation of a Fast Activation Mechanism for the Kanerva SDM. RWCP Neuro SICS Laboratory.

 

Kosslyn, Stephen M., and Koenig, Olivier. (1992). Wet Mind. Macmillan Inc.

 

Kristoferson, Jan. (1995a). Some Comments on the Information Stored in SDM. RWCP Neuro SICS Laboratory.

 

Kristoferson, Jan. (1995b). Best Probability of Activation and Performance Comparisons for Several Designs of SDM. RWCP Neuro SICS Laboratory.

 

Russell, Stuart, and Norvig, Peter. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall Inc.

 

Scheier, Christian, and Lambrinos, Dimitrios. (1996). Categorization in a Real-World Agent Using Haptic Exploration and Active Perception. SAB 96 Proceedings.

 

Sjodin, Gunnar. (1995). Convergence and New Operations in SDM. RWCP Neuro SICS Laboratory.

 

Zhang, Zhaohua, Franklin, Stan, Olde, Brent, Graesser, Art, and Wan, Yun. (1996). Natural Language Sensing for Autonomous Agents. Institute for Intelligent Systems, The University of Memphis.

 

Zuech, Nello, and Miller, Richard K. (1987). Machine Vision. The Fairmont Press.