S1 Appendix - PLOS


S1 Appendix

We will prove that if $p + b \neq 1$, then there is a transition effect on the results of model-based agents. As explained in the Methods, if each initial-state action transitions to a different final state with the same probability, then the probability $P(\text{left} \mid s_i)$ of choosing left at the initial state $s_i$ is given by

\[
P(\text{left} \mid s_i) = \frac{1}{1 + \exp[-K(p - b)]} = \operatorname{logit}^{-1} K(p - b), \tag{17}
\]

where $K \geq 0$ is a constant that depends on the transition probabilities and the exploration-exploitation parameter.
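
Equation 17 can be evaluated numerically. The following is a minimal Python sketch, assuming (as the later stay-probability equations suggest) that $p$ and $b$ are the agent's value estimates for the pink and blue final states. The function names `inv_logit` and `p_left` are illustrative and do not come from the original analysis.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def p_left(p: float, b: float, K: float) -> float:
    """Probability of choosing left at the initial state (Equation 17)."""
    return inv_logit(K * (p - b))


# Equal values give indifference; a larger p favors left.
print(p_left(0.5, 0.5, K=2.0))        # 0.5
print(p_left(0.8, 0.2, K=2.0) > 0.5)  # True
```

With $K = 0$ the choice is random regardless of $p$ and $b$, which is why the proof later requires $K > 0$.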

According to the model-based reinforcement learning rule (Equation 14), if the agent chooses left, then experiences a common transition to pink and receives 1 reward, the stay probability $p_{\text{stay}}$ (of choosing left again in the next trial) is given by

\[
p_{\text{stay}} = \operatorname{logit}^{-1} K[(1 - \alpha)p + \alpha - b]; \tag{18}
\]

if instead the agent experiences a rare transition to blue and receives 1 reward, $p_{\text{stay}}$ is given by

\[
p_{\text{stay}} = \operatorname{logit}^{-1} K[p - (1 - \alpha)b - \alpha]; \tag{19}
\]

if the agent experiences a common transition to pink and receives 0 rewards, $p_{\text{stay}}$ is given by

\[
p_{\text{stay}} = \operatorname{logit}^{-1} K[(1 - \alpha)p - b]; \tag{20}
\]

and if the agent experiences a rare transition to blue and receives 0 rewards, $p_{\text{stay}}$ is given by

\[
p_{\text{stay}} = \operatorname{logit}^{-1} K[p - (1 - \alpha)b]. \tag{21}
\]
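
The four cases in Equations 18 to 21 can be collected in one function. This is a hedged sketch rather than the authors' code: it assumes $\alpha$ is the learning rate of the model-based update and that $p$ and $b$ are the values of pink and blue before the update. The name `stay_probabilities` and the dictionary keys are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def stay_probabilities(p: float, b: float, alpha: float, K: float) -> dict:
    """Stay probability after choosing left, keyed by (reward, transition)."""
    return {
        (1, "common"): inv_logit(K * ((1 - alpha) * p + alpha - b)),  # Eq. 18
        (1, "rare"): inv_logit(K * (p - (1 - alpha) * b - alpha)),    # Eq. 19
        (0, "common"): inv_logit(K * ((1 - alpha) * p - b)),          # Eq. 20
        (0, "rare"): inv_logit(K * (p - (1 - alpha) * b)),            # Eq. 21
    }


# Example values chosen only for illustration.
for outcome, prob in stay_probabilities(p=0.6, b=0.3, alpha=0.5, K=3.0).items():
    print(outcome, round(prob, 3))
```

In each case the update touches only the value of the final state that was actually visited: $p$ after a transition to pink, $b$ after a transition to blue.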

The logistic regression model, on the other hand, determines $p_{\text{stay}}$ as a function of $x_r$ ($x_r = +1$ for 1 reward, $x_r = -1$ for 0 rewards in the previous trial) and $x_t$ ($x_t = +1$ for a common transition, $x_t = -1$ for a rare transition in the previous trial):

\[
p_{\text{stay}} = \operatorname{logit}^{-1}(\beta_0 + \beta_r x_r + \beta_t x_t + \beta_{r \times t} x_r x_t). \tag{22}
\]
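
As an illustration of the $\pm 1$ coding in Equation 22, the sketch below evaluates the regression model for a given set of coefficients. The function name `p_stay_regression` and its boolean arguments are assumptions made for this example only.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_stay_regression(beta_0: float, beta_r: float, beta_t: float,
                      beta_rt: float, rewarded: bool, common: bool) -> float:
    """Stay probability under the logistic regression model (Equation 22)."""
    x_r = 1 if rewarded else -1  # +1 for 1 reward, -1 for 0 rewards
    x_t = 1 if common else -1    # +1 for a common, -1 for a rare transition
    return inv_logit(beta_0 + beta_r * x_r + beta_t * x_t + beta_rt * x_r * x_t)


# With only the interaction term nonzero, staying is favored after
# rewarded-common and unrewarded-rare trials.
for rewarded in (True, False):
    for common in (True, False):
        print(rewarded, common,
              round(p_stay_regression(0.0, 0.0, 0.0, 1.0, rewarded, common), 3))
```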

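Since $\operatorname{logit}^{-1}$ is a one-to-one function, this implies that

\begin{align}
K[(1 - \alpha)p + \alpha - b] &= \beta_0 + \beta_r + \beta_t + \beta_{r \times t}, \tag{23} \\
K[p - (1 - \alpha)b - \alpha] &= \beta_0 + \beta_r - \beta_t - \beta_{r \times t}, \tag{24} \\
K[(1 - \alpha)p - b] &= \beta_0 - \beta_r + \beta_t - \beta_{r \times t}, \tag{25} \\
K[p - (1 - \alpha)b] &= \beta_0 - \beta_r - \beta_t + \beta_{r \times t}. \tag{26}
\end{align}

Solving this system for $\beta_0$, $\beta_r$, $\beta_t$, and $\beta_{r \times t}$ yields

\begin{align}
\beta_0 &= K\left(1 - \frac{\alpha}{2}\right)(p - b), \tag{27} \\
\beta_r &= 0, \tag{28} \\
\beta_t &= \frac{\alpha}{2} K (1 - p - b), \tag{29} \\
\beta_{r \times t} &= \frac{\alpha}{2} K, \tag{30}
\end{align}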
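
The solution in Equations 27 to 30 can be checked numerically. The sketch below, which is not part of the original proof, solves the system of Equations 23 to 26 by using the fact that the columns $(1, x_r, x_t, x_r x_t)$ of the design matrix are mutually orthogonal with squared norm 4, so each coefficient is a signed average of the four left-hand sides. The function names and the test values are hypothetical.

```python
def solve_coefficients(p: float, b: float, alpha: float, K: float) -> list:
    """Solve Equations 23-26 for [beta_0, beta_r, beta_t, beta_rxt]."""
    # Left-hand sides of Equations 23-26, in order.
    y = [
        K * ((1 - alpha) * p + alpha - b),
        K * (p - (1 - alpha) * b - alpha),
        K * ((1 - alpha) * p - b),
        K * (p - (1 - alpha) * b),
    ]
    # Each row is (1, x_r, x_t, x_r * x_t) for the corresponding outcome.
    design = [
        (1, 1, 1, 1),     # reward, common
        (1, 1, -1, -1),   # reward, rare
        (1, -1, 1, -1),   # no reward, common
        (1, -1, -1, 1),   # no reward, rare
    ]
    # Orthogonal columns: beta_j = (1/4) * sum_i design[i][j] * y[i].
    return [sum(design[i][j] * y[i] for i in range(4)) / 4 for j in range(4)]


def closed_form(p: float, b: float, alpha: float, K: float) -> list:
    """Closed-form coefficients from Equations 27-30."""
    return [
        K * (1 - alpha / 2) * (p - b),
        0.0,
        (alpha / 2) * K * (1 - p - b),
        (alpha / 2) * K,
    ]


numeric = solve_coefficients(p=0.6, b=0.3, alpha=0.5, K=3.0)
exact = closed_form(p=0.6, b=0.3, alpha=0.5, K=3.0)
print(all(abs(n - e) < 1e-12 for n, e in zip(numeric, exact)))
```

For example, adding Equations 23 and 25 and subtracting Equations 24 and 26 cancels every coefficient except $\beta_t$, which gives $4\beta_t = 2\alpha K(1 - p - b)$.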

which implies that if $\alpha > 0$, $K > 0$ and $p + b \neq 1$, then $\beta_t \neq 0$. This proof assumes that the agent chose left, but the same can be proved if the agent chose right, as in this example “left,” “right,” “pink,” and “blue” are arbitrary.
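
To make the conclusion concrete, the short sketch below evaluates the transition coefficient of Equation 29 for a few value pairs. The function name `beta_t` and the chosen parameter values are illustrative assumptions, not results from the paper.

```python
def beta_t(p: float, b: float, alpha: float, K: float) -> float:
    """Transition coefficient from Equation 29."""
    return (alpha / 2) * K * (1 - p - b)


# Illustrative value pairs with p + b below, equal to, and above 1.
for p, b in [(0.3, 0.3), (0.5, 0.5), (0.8, 0.6)]:
    print(f"p = {p}, b = {b}: beta_t = {beta_t(p, b, alpha=0.5, K=3.0):+.3f}")
```

For $\alpha > 0$ and $K > 0$, the coefficient is positive when $p + b < 1$, negative when $p + b > 1$, and vanishes only when $p + b = 1$.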