Constraint Satisfaction Neural Networks

Substance Use & Misuse, 33(2), 389–408, 1998

Constraint Satisfaction Neural Networks
Massimo Buscema, Dr., Semeion Research Center of Sciences of Communication, Viale di Val Fiorita 88, Rome, Italy

INTRODUCTION
A Constraint Satisfaction (CS) Artificial Neural Network (ANN) can be used to consider and analyze very different problems. The way in which a CS faces and tries to solve different problems becomes clear once its structural and functional characteristics are known. It is a Circuital ANN: each unit or node is similar to every other and is not characterized by a specific geography. The connections or weights among the different nodes are symmetric, therefore w_ij = w_ji. Furthermore, reflexive connections do not exist: w_ii = 0. Each node can have its own Bias. This means, in general, that a CS provided with N nodes will have a number M of connections (weights and biases) equal to:

\[
M = \frac{N \cdot (N - 1)}{2} + N
\]

where N·(N−1)/2 is the number of w_ij weights and N is the number of Bias_i terms.

A CS is an ANN that starts with a trained weights matrix, and updates the values of its own units on the basis of: a) the external inputs to which it is subjected; b) the relational constraints imposed by the weights that characterize it.

LEARNING THROUGH BACK PROPAGATION



The values of the weights matrix W that characterize a CS can be generated in different ways. An ANN of a different type can be charged with learning the weights matrix that characterizes all the Patterns, i.e. the values of the database about which the CS is questioned. Otherwise, one can resort to some of the traditional Bayesian equations for the probabilities that characterize the positive and/or negative co-occurrence of each couple of nodes of the CS, or, alternatively, to a reformulation of equations based on Hebb's hypotheses about the connections among neurons (Hebb, 1949). Utilizing a Back Propagation ANN in order to learn the CS weights has shown itself to be a fairly efficacious method. The procedure is simple. An Autoassociative Back Propagation ANN at maximum gradient with only 2 layers is designed: a layer of Input units and a layer of Output units. Autoassociative means that in this ANN the target vector is identical to the Input vector for every Pattern. The number of Input and Output nodes, which is the same, can be defined through two different strategies:
• Strategy A: each field of the Data Base (DB) is a node whose value varies in the interval [0,1] according to the variety of the field, that is, the number of options that the field admits.
• Strategy B: each field option is a node that is active or inactive depending on whether that option is or is not present in each Record. In this case the ANN is constituted exclusively by binary Inputs {0,1}, and the total number of the ANN's Input and Output nodes is given by the sum of all the options of every field of the whole DB.
Strategy B is the advisable one when the Back Propagation ANN is needed to generate the weights for a CS. This option is "heavier" in computation time than the first: it yields an ANN with more nodes and more complex modelling. Nevertheless, it allows one to consider clearly, through the CS, the dynamics of each DB option on all the others. The two codifications are synthesizable as follows (a sketch of the Strategy B encoding is given after the definitions below):

• codification A (values in [0,1]):
\[
\mathrm{NumInput} = \mathrm{NumOutput} = \sum_{i}^{N_f} \mathrm{field}_i
\]

• codification B (values in {0,1}):
\[
\mathrm{NumInput} = \mathrm{NumOutput} = \sum_{j}^{N_{op}} \mathrm{op}_j
\]

where: NumInput = number of Input nodes; NumOutput = number of Output nodes; field = vector of the DB's fields; op = vector of all the options of each DB field; Nf = total number of fields; Nop = total number of options.
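As a purely illustrative sketch of Strategy B (the field names, record layout and helper function below are hypothetical and not part of the original text), each option of each field can be expanded into one binary node:

```python
# Hypothetical sketch of Strategy B: every option of every DB field becomes
# one binary node, so a Record is mapped to a {0,1} vector whose length is
# the total number of options (Nop).

def encode_record_strategy_b(record, field_options):
    """record: dict field -> chosen option.
    field_options: dict field -> ordered list of all options of that field."""
    vector = []
    for field, options in field_options.items():
        for option in options:
            vector.append(1.0 if record.get(field) == option else 0.0)
    return vector

# Example with hypothetical fields (mirroring "Table R" later in the text):
field_options = {"sex": ["male", "female"], "age": ["20", "30", "40"]}
print(encode_record_strategy_b({"sex": "male", "age": "20"}, field_options))
# -> [1.0, 0.0, 1.0, 0.0, 0.0]   (NumInput = NumOutput = Nop = 5)
```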


Once designed, the ANN will have as learning Patterns all the DB Records on which it is intended to operate. The learning algorithm is the classical Back Propagation algorithm, provided with some heuristic suggestions deduced from the experiments carried out at the Semeion Research Center.
a. Forward Algorithm

\[
u_i = f(\mathrm{Net}_i) = f\!\left( \sum_{j}^{N} u_j \cdot w_{ij} + \mathrm{Bias}_i \right)
\]

Suggestion 1: it has been verified that, in order to generate a useful weights matrix for the CS network, the random initialization range R at the beginning of the learning has to be very small. In practice:

\[
R = \pm\frac{1}{\mathrm{NumInput}};
\]

Furthermore, it is suggested to set all Bias = 0.0 and not to randomize them.
Suggestion 2: it has been verified that the most efficacious transfer function f(Net_i) is the classic sigmoid; therefore:

\[
f(\mathrm{Net}_i) = \frac{1}{1 + e^{-\mathrm{Net}_i}}
\]

where e = 2.718281828459….

The Sine function makes the quantitative relations among the nodes ambiguous. The Hyperbolic Tangent function stresses the weights matrix too hard. The Arctangent is too soft on the strong differences among Records.
b. Backward Algorithm

\[
\delta\mathrm{out}_{i(n)} = (t_i - u_i) \cdot f'(u_i)
\]

\[
\mathrm{SelfMomentum}_{ij(n)} = \Delta w_{ij(n-1)} \cdot \delta\mathrm{out}_{i(n)} \cdot \frac{1}{0.5 + |w_{ij}|}
\]

where:wij= absolute value of connection wij (see Back Propagation on this

392

BUSCEMA

volume). 

\[
\Delta w_{ij(n)} = \mathrm{SelfMomentum}_{ij(n)} + \delta\mathrm{out}_{i(n)} \cdot u_j \cdot \mathrm{Rate}_i
\]
\[
\Delta b_{i(n)} = \Delta\mathrm{Bias}_{i(n-1)} \cdot \delta\mathrm{out}_{i(n)} \cdot \frac{1}{1 + |\mathrm{Bias}_{i(n)}|} + \delta\mathrm{out}_{i(n)} \cdot \mathrm{Rate}_i
\]
\[
\mathrm{Bias}_{i(n+1)} = \mathrm{Bias}_{i(n)} + \Delta b_{i(n)}
\]
\[
w_{ij(n+1)} = w_{ij(n)} + \Delta w_{ij(n)}
\]

Suggestion 3: it is useful to set the Rate to very low values (Rate < 1). The learning process will be longer but more precise, and the resulting weights will be smaller in size.
Suggestion 4: it is useful not to allow the ANN to correct the reflexive weights (i = j); the learning will be more complex and longer, but the generated connections matrix will be more "refined" and therefore more efficacious when it is adopted as a CS weights matrix.
After having concluded the learning step, it is necessary to translate the weights matrix W of the Back Propagation ANN into a new matrix, NewW, of the CS. In this translation the reflexive weights of the Back Propagation ANN are lost, while the bias values remain the same. The bidirectional connections of the Back Propagation, instead, are reduced to symmetric connections by calculating their mean value:

1. \( \mathrm{New}\,w_{ii} = 0.0 \)
2. \( \mathrm{New}\,w_{ij} = \dfrac{w_{ij} + w_{ji}}{2} \)
3. \( \mathrm{New}\,w_{ji} = \mathrm{New}\,w_{ij} \)
4. \( \mathrm{Bias}_i = \mathrm{Bias}_i \)

At this point the weights of the CS are defined.
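As an illustration of the whole procedure — a two-layer autoassociative Back Propagation with small random initialization, zero starting biases, frozen reflexive weights, and final symmetrization — here is a minimal numpy sketch. The learning rate, number of epochs and function names are assumptions for the example, and the SelfMomentum term is omitted for brevity:

```python
import numpy as np

def train_cs_weights(patterns, rate=0.1, epochs=500):
    """Two-layer autoassociative Back Propagation (target = input), followed
    by the symmetrization of points 1-4. Plain gradient steps are used here;
    the SelfMomentum correction of the text is not included."""
    X = np.asarray(patterns, dtype=float)
    n = X.shape[1]
    rng = np.random.default_rng(0)
    R = 1.0 / n                                  # Suggestion 1: small init range
    W = rng.uniform(-R, R, size=(n, n))
    np.fill_diagonal(W, 0.0)                     # Suggestion 4: no reflexive weights
    bias = np.zeros(n)                           # Suggestion 1: biases start at 0

    sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))
    for _ in range(epochs):
        for x in X:
            u = sigmoid(W @ x + bias)            # forward step
            delta = (x - u) * u * (1.0 - u)      # backward delta (sigmoid derivative)
            W += rate * np.outer(delta, x)       # Suggestion 3: small rate
            np.fill_diagonal(W, 0.0)             # keep reflexive weights frozen
            bias += rate * delta

    new_W = 0.5 * (W + W.T)                      # points 2-3: mean of w_ij and w_ji
    np.fill_diagonal(new_W, 0.0)                 # point 1: New w_ii = 0
    return new_W, bias                           # point 4: biases kept as they are

# patterns = list of binary record vectors produced by Strategy B
```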

LEARNING THROUGH SYMMETRIC BACK PROPAGATION
This learning technique is similar to the Back Propagation rule previously described. The only difference between the two techniques consists in the correction of the connection values. In classic BP the correction takes place without considering the symmetry existing among the weights. Here, instead, the new weights are updated directly during the learning process, by considering the error variation with respect to each couple of symmetric weights:

\[
\Delta w_{ij} = \Delta w_{ji} = \frac{1}{2}\left( \delta\mathrm{out}_i \cdot u_j + \delta\mathrm{out}_j \cdot u_i \right)
\]
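A minimal sketch of this symmetric correction (variable names are illustrative assumptions): each couple of weights receives the mean of the two error contributions, so the matrix stays symmetric throughout learning.

```python
import numpy as np

def symmetric_weight_update(W, delta, u, rate):
    """W: symmetric weight matrix; delta: output deltas; u: unit activations."""
    dW = 0.5 * (np.outer(delta, u) + np.outer(u, delta))  # (delta_i*u_j + delta_j*u_i) / 2
    W += rate * dW
    np.fill_diagonal(W, 0.0)          # reflexive weights stay at zero
    return W
```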

GENERATION OF THE WEIGHTS THROUGH THE CO-OCCURRENCE EQUATIONS
A more rapid but less efficacious system for generating the CS's weights matrix consists in using the Bayesian equations for the probability of the positive and/or negative co-occurrence of every couple of CS nodes over all the Records. The reference equation for generating the weights matrix is the following:

\[
w_{ij} = w_{ji} = -\ln \frac{p(x_i = 0 \ \mathrm{and}\ x_j = 1) \cdot p(x_i = 1 \ \mathrm{and}\ x_j = 0)}{p(x_i = 1 \ \mathrm{and}\ x_j = 1) \cdot p(x_i = 0 \ \mathrm{and}\ x_j = 0)}
\]

where x_i and x_j are the i-th and j-th nodes of the CS and p is the co-occurrence probability of a certain event. For the Bias, in the same way:

\[
\mathrm{Bias}_i = -\ln \frac{p(x_i = 0)}{p(x_i = 1)}
\]

These equations are utilizable for ANNs whose nodes have been designed both with Strategy A (number of nodes = number of the DB Record's fields) and with Strategy B (number of nodes = sum of all the options of every DB Record's field). In fact, if we indicate the 4 different co-occurrence probabilities defined in the weights equation in the following way:

p1: (x_i = 0 and x_j = 1)
p2: (x_i = 1 and x_j = 0)
p3: (x_i = 1 and x_j = 1)
p4: (x_i = 0 and x_j = 0)

then the specific probability for each couple of nodes can be calculated in the following way:

\[
p1_{ij} = \frac{\sum_{m=1}^{M} (1 - x_{m_i}) \cdot x_{m_j}}{M}
\]
\[
p2_{ij} = \frac{\sum_{m=1}^{M} x_{m_i} \cdot (1 - x_{m_j})}{M}
\]
\[
p3_{ij} = \frac{\sum_{m=1}^{M} x_{m_i} \cdot x_{m_j}}{M}
\]
\[
p4_{ij} = \frac{\sum_{m=1}^{M} (1 - x_{m_i}) \cdot (1 - x_{m_j})}{M}
\]

At this point the weights matrix W is calculable:

\[
w_{ij} = w_{ji} = -\ln \frac{p1_{ij} \cdot p2_{ij}}{p3_{ij} \cdot p4_{ij}}; \qquad \text{if } (i = j) \text{ then } w_{ij} = 0.0
\]

In the same way the Bias calculation can be carried out:

\[
p5_i = \frac{\sum_{m=1}^{M} (1 - x_{m_i})}{M}; \qquad
p6_i = \frac{\sum_{m=1}^{M} x_{m_i}}{M}
\]
\[
\mathrm{Bias}_i = -\ln \frac{p5_i}{p6_i}
\]

(For practical reasons, when a certain co-occurrence probability is 0, it is better to assume an artificial value such as 0.00001.) Through this procedure the whole weights matrix and the bias vector of the CS are generated. This also holds when the node values of each model are fuzzy values in the closed interval [0,1]. However, in our experience the weights generated by a Back Propagation ANN have proved to be more efficacious and selective.
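A minimal numpy sketch of these co-occurrence equations (the function name and the eps floor used for zero probabilities are illustrative assumptions):

```python
import numpy as np

def cooccurrence_weights(patterns, eps=1e-5):
    """Weights and biases from the co-occurrence probabilities p1..p6.
    patterns: M x N array of binary (or fuzzy, in [0,1]) node values."""
    X = np.asarray(patterns, dtype=float)
    M = X.shape[0]
    p1 = ((1 - X).T @ X) / M           # p(x_i = 0 and x_j = 1)
    p2 = (X.T @ (1 - X)) / M           # p(x_i = 1 and x_j = 0)
    p3 = (X.T @ X) / M                 # p(x_i = 1 and x_j = 1)
    p4 = ((1 - X).T @ (1 - X)) / M     # p(x_i = 0 and x_j = 0)
    p1, p2, p3, p4 = (np.maximum(p, eps) for p in (p1, p2, p3, p4))
    W = -np.log((p1 * p2) / (p3 * p4))
    np.fill_diagonal(W, 0.0)           # if i == j then w_ij = 0
    p5 = np.maximum((1 - X).mean(axis=0), eps)   # p(x_i = 0)
    p6 = np.maximum(X.mean(axis=0), eps)         # p(x_i = 1)
    bias = -np.log(p5 / p6)
    return W, bias
```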


GENERATION OF WEIGHTS THROUGH THE HE-HO RULE
At Semeion we have defined the "He-Ho Equations" as a specific way to calculate the probability (crisp or fuzzy) of the co-occurrence among couples of variables in an ANN. This equation is inspired by a mixture of the famous Hebb rule (Hebb, 1949) on the temporal optimization of synaptic relations among neurons and the equally famous Hopfield rule (Hopfield, 1982, 1984); the He-Ho Rule is derived from these two laws. Given the variables x_i and x_j, the master equation from which we started is the following:

\[
4 x_i x_j - x_i - x_j
\]

More explicitly, it appears in this way:

\[
x_j \cdot (x_i - 1) + x_i \cdot (x_j - 1) + 2\, x_i x_j = 4 x_i x_j - x_i - x_j
\]

It is interesting to compare the results generated through the He-Ho Rule with those obtainable from Hebb's rule, Hopfield's rule and the so-called Anti-Hebb rule. In the following table the four limit cases of the variables x_i and x_j (that is, 0 and 1) are considered.

Nodes Value          Value of their Connection
x_i   x_j     Hebb Rule (*)   Anti-Hebb Rule   Hopfield Rule   He-Ho Rule
 0     0           0                0                1              0
 0     1           0                0               -1             -1
 1     0           0               -1               -1             -1
 1     1           1                1                1              2

(*) For the Hebb and Anti-Hebb rules, x_i is an Output node and x_j an Input node.

Given M Patterns, the weights matrix w_ij is defined by the He-Ho Rule in the following way:

\[
w_{ij} = \frac{\sum_{m}^{M} \left( 4 x_{m_i} x_{m_j} - x_{m_i} - x_{m_j} \right)}{M}
\]


The He-Ho Rule has proved to be an interesting method for generating the weights matrix of a CS.
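A minimal numpy sketch of the He-Ho weight generation (the function name is an assumption; zeroing the diagonal follows the CS requirement that reflexive connections do not exist):

```python
import numpy as np

def heho_weights(patterns):
    """He-Ho rule: w_ij = (1/M) * sum_m (4*x_mi*x_mj - x_mi - x_mj)."""
    X = np.asarray(patterns, dtype=float)
    M, N = X.shape
    co = X.T @ X                               # sum_m x_mi * x_mj
    s = X.sum(axis=0)                          # sum_m x_mi
    W = (4.0 * co - s[:, None] - s[None, :]) / M
    np.fill_diagonal(W, 0.0)                   # no reflexive connections in the CS
    return W
```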

ALGORITHM OF THE CONSTRAINT SATISFACTION NETWORK
The updating algorithm of the CS nodes at each cycle is very elementary. Its philosophy is the following: it is assumed that each CS node is equivalent to a hypothesis. The weights that connect the nodes are then the relations of solidarity, contradiction or fuzzy indifference between every possible couple of hypotheses. Consequently, the bias of each node represents the fuzzy inclination of every hypothesis to contract, in general, solidarity or exclusion relations with the other hypotheses. The CS aims to maximize the activation degree of each of its hypotheses (nodes) with respect to the constraints that the relations between each hypothesis and every other (weights and bias) impose upon them. This means that a CS provided with 3 nodes presents:
• 3 hypotheses;
• 3 relations among different hypotheses;
• 3 thresholds, one for each hypothesis;
• 2^3 combinations of different answers, if each hypothesis could assume only the values 0 or 1.
If each combination (disposition) of binary answers (0 or 1) were a vertex of a cube, all the infinite solutions of the CS would be included in the volume of a three-dimensional cube (in fact, each node of the CS can assume values between 0 and 1, limits included):

[Figure: the unit cube with its eight vertices labelled 000, 001, 010, 011, 100, 101, 110, 111, representing the binary answer combinations of three hypotheses.]


The updating algorithm of the CS units tends to find a solution as close as possible to the vertex "111", but it considers the weights that connect each node to every other. Of course, if the number of nodes of the CS is 50, the solution space is a 50-dimensional hypercube with 2^50 different vertices. A practical utilization of the CS consists in treating all its nodes as units directly manipulatable from the outside, that is, as input units. From the outside an arbitrary value is given to one or more hypotheses; this value indicates the consistency that we intend to give to that hypothesis. This is done with the aim of seeing which other hypotheses will activate themselves once certain conditions have been fixed from the outside. In this case the updating algorithm of the CS units will still try to maximize the activation degree of each of its nodes, but this time its work will be constrained not only by the weights that connect the different hypotheses, but also by the external inputs that have been arbitrarily activated and are kept active during the work of the CS. This procedure allows one to test how different groups of hypotheses are optimized among themselves, considering the whole context in which they live. The updating algorithm of the CS units is composed of 4 steps:
• calculation of the NetInput that arrives at each unit;
• calculation of the updating Delta of each unit;
• updating of each unit;
• calculation of the reached maximization degree (Goodness).
The NetInput of each unit is calculated in the following way:

\[
\mathrm{Net}_i = \sum_{j}^{N} u_j \cdot w_{ij} + \mathrm{Bias}_i + \mathrm{InputExt}_i
\]

In practice, parameters between 0 and 1 are often utilized in order to scale the strength of both the internal and the external NetInput. If we define intr as the internal NetInput scaling and estr as the external NetInput scaling, the previous equation becomes:

\[
\mathrm{Net}_i = \mathrm{intr} \cdot \left( \sum_{j}^{N} u_j \cdot w_{ij} + \mathrm{Bias}_i \right) + \mathrm{estr} \cdot \mathrm{InputExt}_i
\]

Experience has so far taught us that good values for these two parameters are given by the following two equations:


\[
\mathrm{intr} = \frac{1}{N}; \qquad \mathrm{estr} = \frac{1}{N}
\]

where N = number of nodes. Nevertheless, at the moment we are in a questionable area: in order to modulate and/or contain the minimum values of the NetInput, other systems can be utilized in combination with, or as an alternative to, these two parameters. The most elementary method consists in normalizing, in a linear way, the maximum and the minimum of its values within predefined limits. There is also the possibility to manipulate the NetInput through a semilinear function. A transfer function of the CS NetInput that we have experimented with, with satisfactory results, is the hyperbolic tangent:

\[
\mathrm{Net}_i = \frac{e^{\mathrm{Net}_i} - e^{-\mathrm{Net}_i}}{e^{\mathrm{Net}_i} + e^{-\mathrm{Net}_i}}
\]

In this case the values of each NetInput vary in a logistic way between -1 and +1. Obviously, this function’s results can be normalized with some of the already considered equations. The equation for the calculation of each unit’s updating Delta is the following: 

\[
\text{if } (\mathrm{Net}_i > 0): \quad \Delta_i = \mathrm{Net}_i \cdot (1 - u_i); \qquad
\text{else}: \quad \Delta_i = \mathrm{Net}_i \cdot u_i
\]

At this point it is possible to update the units:

\[
u_{i(n+1)} = u_{i(n)} + \Delta_i
\]

The double branch of the penultimate equation should not be surprising: it is needed, practically, so that the units do not exceed the limits of the interval [0,1] within which it has been decided to make them move. The degree of goodness of the solution that the CS finds at every cycle is called Goodness, G(n), where n is the current cycle; in practice:

\[
G(n) = \sum_i \sum_j w_{ij} \cdot u_{i(n)} \cdot u_{j(n)} + \sum_i \left( \mathrm{Bias}_i + \mathrm{InputExt}_i \right) \cdot u_{i(n)}
\]

The question to consider is how much each node contributes to the maximization of the CS and to the respect of the external constraints (InputExt) established in the simulation. It is evident that the Goodness values will be floating-point values, because the CS moves within the n-dimensional volume defined by its nodes (for a closer treatment, compare Rumelhart, 1986; McClelland, 1988).
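The four steps above can be put together in a single cycle; the following numpy sketch is illustrative only (the function name is an assumption, and the default intr and estr follow the 1/N suggestion given earlier):

```python
import numpy as np

def cs_cycle(u, W, bias, ext, intr=None, estr=None):
    """One updating cycle of the CS: NetInput, Delta, unit update, Goodness.
    u: current unit values in [0,1]; W: symmetric weights; ext: external inputs."""
    N = len(u)
    intr = 1.0 / N if intr is None else intr          # internal scaling (suggested value)
    estr = 1.0 / N if estr is None else estr          # external scaling (suggested value)
    net = intr * (W @ u + bias) + estr * ext          # NetInput of every unit
    delta = np.where(net > 0, net * (1.0 - u), net * u)   # keeps the units inside [0,1]
    u_new = u + delta
    goodness = (u_new @ W @ u_new) + np.sum((bias + ext) * u_new)
    return u_new, goodness

# Typical use: clamp some hypotheses through `ext`, start from u = 0.5 everywhere,
# and iterate cs_cycle until the Goodness stops increasing.
```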

THE HIDDEN UNITS OF THE CONSTRAINT SATISFACTION NETWORK
In a CS, the hidden units are units that are not directly manipulatable from the outside. That is, they cannot receive input values from the experimenter, but they react to the NetInput produced on them by the other CS units; for example:

[Figure: three visible units U1, U2 and U3, each connected to one hidden unit H1.]

where U_i = Input/Output Units of the CS and H_i = Hidden Units of the CS. The hidden units of the CS have a similar structure but a different function with respect to the hidden units of a Feed Forward ANN. The reference equation for their NetInput calculation is the following:

\[
\mathrm{Net}_i = \left( \sum_{j}^{N} u_j \cdot w_{ij} + \mathrm{Bias}_i \right) \cdot \mathrm{intr}
\]

The only difference between these units and the input-provided units of a CS is that for the hidden units the external input term cannot be used (InputExt = 0). It is often useful to construct a CS provided with hidden units in order to codify the Records of a DB; this means assigning a hidden unit to each Record. The problem then consists in how to connect the hidden units among themselves and with all the other units of the CS.


We have seen that the solution of the second point also offers a solution to the first. There are several procedures for choosing the value of the weights between each visible unit of the CS and each hidden unit. Here are two.
Procedure 1: the simple transposition. In this case, each hidden unit that represents a Record is connected with all the visible units by a weight whose value is equal to the value that each visible unit had in the Pattern corresponding to that Record. Example (Table R):

                Male   Female   20 years old   30 years old   40 years old
Subject 1        1       0           1              0              0
Subject 2        0       1           0              1              0
Subject 3        0       1           0              0              1

In such an elementary case we can proceed in the following way:
a. Construction of an Autoassociative Feed Forward ANN with 2 layers in order to define the weights that connect the 5 options among themselves. It is an ANN provided with 5 Inputs, 5 Outputs and 3 Patterns to learn; after the learning process it will generate 20 weights (w_ij) and 5 biases (Bias_i).

[Figure: two-layer autoassociative network; the five option nodes M, F, 20, 30, 40 appear both as Input and as Output units.]

b. Reduction of the obtained weights to a new set of symmetric weights connecting all the nodes, in this way:

\[
\text{if } (i \neq j) \text{ then: } \quad w_{ij} = \frac{w_{ij} + w_{ji}}{2}; \quad w_{ji} = w_{ij}
\]


while the bias of the CS remains the same as the Feed Forward bias. In this way the real weights of the CS become 10, plus 5 biases.
c. Construction of the hidden units through the simple transposition. Because the DB Records are 3, the hidden units to be added to the CS will be 3. A weights matrix WH is thus defined, connecting each hidden unit with each of the 5 visible units. "Table R" becomes the content of the weights matrix WH, with the only nominal difference that each table row now contains the values of the weights WH that the hidden unit representing a specific Record entertains with the 5 visible units of the CS. The biases of the 3 hidden units are either set to zero or computed through one of the co-occurrence equations previously analyzed.
Procedure 2: the weighted transposition. In order to activate this procedure, the same steps of the previous procedure are followed. Nevertheless, at the last step, Table R is not directly identified with the weights matrix WH. If we identify Table R with a matrix called T_R, then the following equation regulates the passage of values from the matrix T_R to the weights WH:

\[
w^{H}_{ij} = f_m\!\left( T_{R_{ij}} \right)
\]

The function f_m is a function that rescales the values of the matrix T_R according to the values of the weights matrix W, which interconnects the visible units of the CS. Then:

\[
w^{H}_{ij} = f_m\!\left( T_{R_{ij}} \right) = \mathrm{Scale} \cdot T_{R_{ij}} + \mathrm{Offset};
\]

where:

\[
\mathrm{Scale} = \frac{\mathrm{high} - \mathrm{low}}{\mathrm{Max} - \mathrm{Min}}; \qquad
\mathrm{Offset} = \frac{\mathrm{Max} \cdot \mathrm{low} - \mathrm{Min} \cdot \mathrm{high}}{\mathrm{Max} - \mathrm{Min}}
\]

Parameters Max and Min indicate the maximum and minimum input values of function f m, while parameters high and low indicate the maximum and minimum Output values of function f m. The problem consists in how to determine these 4 parameters. The problem is already solved for the Input values Min and Max. Each value of T R can vary between 0 and 1 and then Max = 1.0 and Min = 0.0. There are at least three options for the values of the parameters high and low:


a. Maximization of the weights W:

\[
\mathrm{high} = \mathrm{Max}\,|w_{ij}|; \qquad \mathrm{low} = -\mathrm{high}
\]

Here we choose the largest absolute value of the weights matrix W that connects the visible units, and we assume it to be the maximum border, and its negative to be the minimum border, of the new weights WH.

b. Minimization of the weights W:

\[
\mathrm{high} = \mathrm{Min}\,|w_{ij}| \ \ (w_{ij} \neq 0.0); \qquad \mathrm{low} = -\mathrm{high}
\]

The procedure is the same as the previous one, with the only difference that the exit borders of the function f_m are given by the smallest nonzero absolute value of the weights matrix W.

c. Mean Weighting of the weights W:

\[
\mathrm{high} = \bar{x}; \qquad \mathrm{low} = -\mathrm{high}; \qquad
\text{where } \bar{x} = \frac{\sum_i^N \sum_j^N |w_{ij}|}{N^2}, \ \ i \neq j
\]

In this case the mean of the absolute values of the weights matrix W is calculated, leaving out the principal diagonal of the matrix W from this calculation. Hundreds of experiments have been carried out at Semeion with both of these procedures. Our experiments have convinced us that the weighed transposition procedure has been demonstrated to be the best strategy to the varying of the fields number and of the Records number of any DB. The reason for this could be that in the simple transposition, the hidden units function as passive units with respect to the visible nodes which correspond to properties that the record represented by the hidden unit doesn’t have; or they exercise a moderately excitatory strength over those visible units that the record that they represent has. In contrast, in the weighed transposition, the hidden units act more or less inhibitively. This should allow a greater filter capacity on the hidden and visible units when the CS has to manage particularly complex DBs. This is both of great dimension and contain records that present very fuzzy diversifications. At this point in time, we don’t feel a need to give an opinion about the

CONSTRAINT SATISFACTION NEURAL NETWORKS

403

efficacy of the three proposed options to make the weighed transposition. From our point of view they are three real options. Each one can be useful to analyze the answers that the CS provides to its hidden units’ different intensities of influence on the other visible units. The analysis of these two procedures permit deducing different ways in order to connect the hidden units among them with pertinent weights. Choosing the simple transposition, it is suitable to conceptualize the hidden units as a unique units pool being in competition among them. In this case the weights matrix WHH, which interconnects them, should be filled entirely of values that are the inverse of the maximum value which each visible node can have in input. Therefore: 

\[
w^{HH}_{kp} = -1
\]

where w_kp = w_pk and k ≠ p. A similar solution has been adopted by McClelland and Rumelhart for IAC ANNs (Interactive Activation and Competition; compare McClelland, 1988; Rumelhart, 1986). Otherwise, in the case in which we utilize the weighted transposition, four options can be utilized; the first 3 are similar to the three already described, Maximization, Minimization and Mean Weighting of the weights matrix W:

a. {Maximized Competition}
\[
w^{HH}_{kp} = -\mathrm{Max}_{kp}\,|w_{ij}|
\]

b. {Minimized Competition}
\[
w^{HH}_{kp} = -\mathrm{Min}_{kp}\,|w_{ij}|
\]

c. {Weighted Competition}
\[
w^{HH}_{kp} = -\frac{\sum_i^N \sum_j^N |w_{ij}|}{N^2}, \ \ i \neq j
\]

The fourth option consists in annulling the competition among the hidden units of the CS:

d. {Null Competition}
\[
w^{HH}_{kp} = 0.0
\]


These four options are also useful as a filter system for the distribution of the CS answers to the different questions.
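A minimal numpy sketch of the weighted transposition and of the hidden-unit competition matrix, purely illustrative (the function name, the `mode` parameter and the reuse of the chosen border value for the competition matrix are assumptions; the Null Competition option would simply be a zero matrix):

```python
import numpy as np

def hidden_unit_weights(table_r, W_visible, mode="max"):
    """Weighted transposition of Table R into hidden-unit weights W_H, plus a
    competition matrix W_HH among the hidden units (names are illustrative)."""
    T = np.asarray(table_r, dtype=float)          # records x visible-units, values in [0,1]
    absW = np.abs(W_visible)
    off_diag = absW[~np.eye(absW.shape[0], dtype=bool)]
    if mode == "max":                             # a. Maximization of the weights W
        high = off_diag.max()
    elif mode == "min":                           # b. Minimization (smallest nonzero value)
        high = off_diag[off_diag > 0].min()
    else:                                         # c. Mean Weighting
        high = off_diag.sum() / (absW.shape[0] ** 2)
    low = -high
    Max, Min = 1.0, 0.0                           # Table R values already lie in [0,1]
    scale = (high - low) / (Max - Min)
    offset = (Max * low - Min * high) / (Max - Min)
    W_H = scale * T + offset                      # f_m applied to every entry of Table R
    H = T.shape[0]
    W_HH = -high * (np.ones((H, H)) - np.eye(H))  # competition among the hidden units
    return W_H, W_HH
```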

FINAL CONSIDERATIONS
CS networks are ANNs that try to maximize the activation of their nodes starting from a set of constraints. The first step consists in understanding which problems are suitable for being treated with this type of ANN; more precisely, how any problem must be framed in order to be analyzed with a CS. It could be asserted that every problem of resource optimization is a problem to be explored with a CS. The efficacy of the solutions that such an ANN is able to produce depends on a series of factors.
The first of these factors is the representativity of the data in the simulation model; for this to take place, it is necessary to atomize the original problem into the smallest components possible, which will represent the set of atomic hypotheses of the problem itself. In addition to the atomization principle, already known in other contexts (Buscema, 1994a), it is also useful to take into consideration the principle of data variety: the hypotheses that become part of the CS must not only be those considered "more incisive" in defining the problem, but also all those hypotheses that appear in the problem's space-time even if they are considered not to be determinant (Buscema, 1994a). The ANN utilized to individuate the weights of the CS will be the one that establishes the more or less strong significance of each hypothesis with respect to every other. In sum, data representativity consists in predisposing in the CS the most complete possible model of the atomic variables that define the real problem.
The second factor is that the efficacy of the CS depends on the system with which its weights are generated. In this sense, the learning of the weights matrix carried out through an Autoassociative Feed Forward ANN has proved in our experimentation to be, at present, the most efficacious system. We are currently studying the effect that a weights matrix generated by a Self-Reflexive ANN could have on the CS's behaviour (Buscema, 1995).
The third factor which we believe is fundamental for the functioning of a CS is the type of unit-updating algorithm that is utilized. In these pages we have presented the classic algorithm proposed by Rumelhart and collaborators (Rumelhart, 1986; McClelland, 1988).


Notwithstanding the evident goodness of this procedure, it would be useful to give a more rigorous methodological status to some of the parameters that are utilized. For example: what kind of relation exists between the parameter that filters the NetInput coming from the CS units and the parameter that filters the values of the external input coming from the environment? Moreover, it seems evident that the first of these parameters can be connected to the number of nodes of the CS and to the type of values of its weights matrix. In other words, it would be useful if a CS could automatically set its parameters according to the situations in which it has to operate, without being regulated every time from the outside.
The fourth factor that we consider critical in deciding the quality of a CS's performance concerns the weights matrix that connects the CS hidden units among themselves and with the other visible units. In this connection we have tried to design a new algorithm which has already proved to give interesting results. However, we are sure that this approach can be improved. In fact, we are experimenting with the possibility of interpreting the CS hidden units as units which assume a precise geography with respect to one another at each cycle. Perhaps, in this way, it will be possible to differentiate the competitive influence that characterizes them as a function of the distance that each of these units occupies with respect to the others at each cycle. It would be an interesting case of ANNs with hidden units characterized by a dynamic geography.
It is also useful to emphasize the limits which we have noted through our experiments with the CS:
a. The monotonicity of the solutions: some activation consequences are scarcely negotiable for the units excluded from the activations from which they result. In brief, it is as if, in "climbing" toward the optimal solution, the CS made binary choices, exploring in depth a theoretical tree graph that contains all the possible combinations among its nodes. If a certain choice shows itself to be wrong (less maximized), the CS is able to retrace its steps and try another one, but the various branches of the graph tend to remain mutually exclusive.
b. The radicalness of the solutions: the CS is an ANN that tends to drive its nodes to the maximum value of excitation; as a consequence, the non-excited nodes are driven towards the minimum values. In reality this is implicit in its own algorithm and is also part of its aim. Nevertheless, two collateral effects are visible. In a CS, a packet of cigarettes can activate the frame of a perfectly furnished office with the same precision and speed with which this would be possible by activating in input a specific object of an office, such as a writing-desk.


This can take place if the packet of cigarettes is strongly associated with the writing-desk, even if it is scarcely associated with the other objects typical of an office. The reason is simple: the packet of cigarettes activates the object "writing-desk", which then "takes care" of activating the rest. This creates an incapacity of the CS to distinguish the salient features of any model from the occasional ones. The second undesired effect is that the different pertinent strength of each node with respect to an external stimulation is visible only during the answering process of the CS, and no longer once the CS has reached its local maximum: generally, at that point, all the excited nodes have reached the maximum activation value. This is useful for certain experiments and less useful for others.
c. The apparent learning: the CS has no possibility of modifying its own structure in relation to the solutions that it generates. It is an ANN that has already learned, but during the recall phase, or answering process, it does not learn from its own answers. Nevertheless, it is already possible to foresee a way to overcome this limit, which has been solved for other types of ANNs. The simplest solution, for example, could consist in feeding the answers that the CS provides at each cycle back into learning through Back Propagation. This would allow a recalibration of its weights matrix on the basis of the answers generated up to that moment.

REFERENCES
BUSCEMA, 1994a: M. Buscema, Squashing Theory. Modello a Reti Neurali per la Previsione dei Sistemi Complessi, Collana Semeion, Armando, Rome, 1994 [Squashing Theory: A Neural Network Model for the Prediction of Complex Systems, Semeion Collection, Armando Publisher].
BUSCEMA, 1994b: M. Buscema, Constraint Satisfaction Networks, in M. Buscema, G. Didoné, and M. Pandin, Reti Neurali AutoRiflessive. Teoria, Metodi, Applicazioni e Confronti, Quaderni di Ricerca Semeion, Armando, Rome, n. 1, pp. 93–126 [Self-Reflexive Networks. Theory, Methods, Applications and Comparisons, Semeion Research-book, Armando Publisher, n. 1].
BUSCEMA, 1995: M. Buscema, Self-Reflexive Networks. Theory, Topology, Applications, Quality & Quantity, Kluwer Academic Publishers, Dordrecht, The Netherlands, vol. 29(4), 339–403, November 1995.
HEBB, 1949: D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
HOPFIELD, 1982: J. J. Hopfield, Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proceedings of the National Academy of Sciences, USA, 79, pp. 2554–2558, 1982.
HOPFIELD, 1984: J. J. Hopfield, Neurons with Graded Response Have Collective Computational Properties like Those of Two-State Neurons, Proceedings of the National Academy of Sciences, USA, 81, pp. 3088–3092, 1984.
McCLELLAND, 1988: J. L. McClelland and D. E. Rumelhart, Explorations in Parallel Distributed Processing, The MIT Press, Cambridge, MA, 1988.
RUMELHART, 1986: D. E. Rumelhart, P. Smolensky, J. L. McClelland, and G. E. Hinton, Schemata and Sequential Thought Processes in PDP Models, in J. L. McClelland and D. E. Rumelhart (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. II, The MIT Press, Cambridge, MA, 1986.


THE AUTHOR
Massimo Buscema, Dr., computer scientist, expert in artificial neural networks and adaptive systems. He is the founder and Director of the Semeion Research Center of Sciences of Communication, Rome, Italy. He was formerly Professor of Science of Communication at the University of Charleston, Charleston, West Virginia, USA, and Professor of Computer Science and Linguistics at the State University of Perugia, Perugia, Italy. He is a member of the Editorial Board of Substance Use & Misuse, a faculty member of the Middle Eastern Summer Institute on Drug Use, co-editor of the Monograph Series Uncertainty, and co-creator and co-director of The Mediterranean Institute. He is a consultant to the Scuola Tributaria Vanoni (Ministry of Finance), Ufficio Italiano Cambi (Bank of Italy), ENEA (Public Oil Company), the Sopin Group (Computer Science Corporation) and many Italian Regions. He has published books and articles, among them: Prevention and Dissuasion, EGA, Turin, 1986; Expert Systems and Complex Systems, Semeion, Rome, 1987; The Brain within the Brain, Semeion, Rome, 1989; The Sonda Project: Prevention from Self- and Hetero-destructive Behaviors, Semeion, Rome, 1992; Gesturing Test: A Model of Qualitative Ergonomics, ATA, Bologna, 1992; The MQ Model: Neural Networks and Interpersonal Perception, Armando, Rome, 1993; Squashing Theory: A Neural Networks Model for Prediction of Complex Systems, Armando, Rome, 1994; Self-Reflexive Networks: Theory, Topology, Applications, Quality & Quantity, 29, Kluwer Academic Publishers, Dordrecht, Holland; Idee da Buttare, Edizioni Sonda, Turin, 1994; Artificial Neural Networks and Finance, Armando, Rome, 1997; A General Presentation of Artificial Neural Networks, Substance Use & Misuse, 32(1), Marcel Dekker, New York, 1997; The Sonda Project: Prevention, Prediction and Psychological Disorder, Substance Use & Misuse, 32(9), Marcel Dekker, New York, 1997.