Abraham Nunes
The mind is what the brain does.
https://abrahamnunes.github.io/
Sun, 06 Aug 2017 23:14:50 +0000
Jekyll v3.4.5

Dalhousie Themed Plots in R

<p>I created a Beamer template (a modification of M-Theme) with Dalhousie-themed colours. To further embrace the Dalhousie style guide in my presentation materials, I have used the <code class="highlighter-rouge">scale_fill_manual()</code> function in R’s <code class="highlighter-rouge">ggplot2</code> to theme the plots with Dalhousie colours. I still don’t think Python’s <code class="highlighter-rouge">ggplot</code> implementation is up to snuff, and matplotlib is, well, not <code class="highlighter-rouge">ggplot2</code>… As such, even though I do most of my work in Python, I still import all of my data into R and make use of my favourite plotting library, courtesy of Hadley.</p>
<p>Here is a simple snippet…</p>
<div class="language-r highlighter-rouge"><pre class="highlight"><code>library(ggplot2)

dal_colors &lt;- c('#00bfff',  # Blue
                '#ee0701',  # Red
                '#3ba86b',  # Green
                '#8b008b',  # Purple
                '#fbe122')  # Yellow

d &lt;- data.frame(x = c('A', 'B', 'C', 'D', 'E'),
                y = c(1, 2, 3, 4, 5),
                colr = dal_colors)

ggplot(d, aes(x = x, y = y, fill = x)) +
    geom_bar(stat = 'identity', colour = 'black') +
    geom_text(aes(label = colr), nudge_y = 0.1) +
    scale_fill_manual(values = dal_colors) +
    theme_light() +
    ggtitle("Dalhousie Themed Bar Plot")
</code></pre>
</div>
<p>…and the resulting plot:</p>
<p><img src="/figures/dalplot.png" alt="dalhousie themed plot" /></p>
<p>Of course, one could simply use the abovementioned colour codes to create a colormap in other visualization libraries. I may do so for matplotlib at some point.</p>
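<p>As a rough sketch of what the matplotlib version might look like (an illustration, not a tested part of my workflow), <code class="highlighter-rouge">matplotlib.colors.ListedColormap</code> can hold the same hex codes:</p>

```python
from matplotlib.colors import ListedColormap

# Dalhousie colour codes from the R snippet above
dal_colors = ['#00bfff',  # Blue
              '#ee0701',  # Red
              '#3ba86b',  # Green
              '#8b008b',  # Purple
              '#fbe122']  # Yellow

# Discrete colormap for categorical data; pass cmap=dal_cmap to
# e.g. plt.scatter, or pass color=dal_colors directly to plt.bar
dal_cmap = ListedColormap(dal_colors, name='dalhousie')
```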
<p>Another useful thing to look up would be the best continuous colour map to fit in with the Dalhousie theme. I personally like viridis, and it might work well with the Dalhousie theme as is.</p>
Sun, 06 Aug 2017 00:00:00 +0000
https://abrahamnunes.github.io/2017/08/06/Dal-Plotting-Colors.html
https://abrahamnunes.github.io/2017/08/06/Dal-Plotting-Colors.html
visualization

The development of model-based control as it relates to fluid intelligence

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<p>I am quite interested in the natural history of behavioural control over the lifespan, and have been excited to see a couple of studies about this lately. The most recent paper I read on the topic is by Potter et al. (1) which just came up in my newsfeed. I’ve made some notes about it here.</p>
<p>Potter et al. (1) sought to determine whether age-related changes in statistical learning, working memory, and fluid reasoning contributed to the positive association between age and model-based (MB) control, which had been shown by Decker et al. (2).</p>
<h2 id="methods">Methods</h2>
<h3 id="subjects">Subjects</h3>
<table>
<thead>
<tr>
<th>Group</th>
<th>Definition</th>
<th>N</th>
<th>N Post-Exclusion</th>
</tr>
</thead>
<tbody>
<tr>
<td>Children</td>
<td>Age 9-12 years</td>
<td>22</td>
<td>19</td>
</tr>
<tr>
<td>Adolescents</td>
<td>Age 13-17 years</td>
<td>23</td>
<td>22</td>
</tr>
<tr>
<td>Adults</td>
<td>Age 18-25 years</td>
<td>24</td>
<td>23</td>
</tr>
</tbody>
</table>
<p>There were some additional missing data for the remaining subjects.</p>
<ul>
<li>One child did not have statistical learning data acquired</li>
<li>WASI matrix-reasoning not completed for 1 adolescent and 2 adults</li>
<li>14 children, 17 adolescents, and 18 adults also completed the listening-recall subtest of the Automated Working Memory Assessment</li>
</ul>
<h3 id="tasks">Tasks</h3>
<table>
<thead>
<tr>
<th>Domain Tested</th>
<th>Task</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reinforcement learning</td>
<td>Two-step task</td>
</tr>
<tr>
<td>Statistical learning</td>
<td><em>See Below</em></td>
</tr>
<tr>
<td>Fluid reasoning</td>
<td>WASI matrix reasoning and vocabulary</td>
</tr>
</tbody>
</table>
<h4 id="two-step-task">Two-step task</h4>
<ul>
<li>The authors used an interesting modification of the two-step task which has a story-line that facilitates completion by children. If I am not mistaken, this may be the same one available through Wouter Kool’s <a href="https://github.com/wkool/tradeoffs"><code class="highlighter-rouge">tradeoffs</code></a> repo.</li>
<li>The task structure is the usual one for the two-step task (3), with a couple of modifications:
<ul>
<li>Only 150 trials were done, rather than the usual 201
<ul>
<li><strong>This may have important implications for model-fitting</strong></li>
</ul>
</li>
</ul>
</li>
<li>Although neuroimaging data were not reported, the task was completed in an fMRI scanner
<ul>
<li><strong>I am unclear about why the neuroimaging data were not reported</strong></li>
</ul>
</li>
</ul>
<h5 id="models">Models</h5>
<p>The model fit to the behavioural data by Potter et al. (1) consists of an observation model</p>
<script type="math/tex; mode=display">P(a_t | \mathcal{Q}_t (s_t, a_t; w, \alpha, \lambda), \beta) = \frac{e^{\beta \mathcal{Q}_t (s_t, a_t; w, \alpha, \lambda) + p\cdot\mathrm{rep}(a_t)}}{\sum_{a' \in \mathcal{A}} e^{\beta \mathcal{Q}_t (s_t, a'; w, \alpha, \lambda)+ p\cdot\mathrm{rep}(a')}},</script>
<p>where $\beta$ is the inverse softmax temperature (decision randomness) and $\mathrm{rep}(a_t)$ is an indicator function taking value 1 if $a_t$ is the same action taken at the same step of the last trial (i.e. $I[a_t = a_{t-1}]$), weighted by a ‘perseveration’ parameter $p$. The function $\mathcal{Q}_t(s_t, a_t; w, \alpha, \lambda)$ is the hybrid learning model with MB and model-free (MF) components weighted by a parameter $w$</p>
<script type="math/tex; mode=display">\mathcal{Q}_t(s_t, a_t; w, \alpha, \lambda) = w \mathcal{Q}_t^{MB}(s_t, a_t) + (1-w) \mathcal{Q}_t^{MF}(s_t, a_t; \alpha, \lambda).</script>
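<p>To make the observation model concrete, here is a minimal Python sketch (my own illustration, not the authors’ code) of the hybrid-valuation softmax with the perseveration bonus:</p>

```python
import numpy as np

def hybrid_choice_probs(Q_mb, Q_mf, w, beta, p, last_action=None):
    """Hybrid-valuation softmax with a perseveration bonus.

    Q_mb, Q_mf: arrays of MB and MF values over the available actions;
    w: MB weight; beta: inverse temperature; p: perseveration weight;
    last_action: index of the action taken at this step on the last trial.
    (Argument names are hypothetical.)
    """
    # Hybrid valuation: w*Q_MB + (1-w)*Q_MF
    Q = w * np.asarray(Q_mb, float) + (1 - w) * np.asarray(Q_mf, float)
    u = beta * Q
    if last_action is not None:
        u[last_action] += p      # p * rep(a): bonus for repeating
    u = u - u.max()              # subtract max for numerical stability
    e = np.exp(u)
    return e / e.sum()
```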
<p>The MF component $\mathcal{Q}_t^{MF}$ is the SARSA($\lambda$) rule</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\mathcal{Q}_t^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) & = \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) + \alpha \Big(r_t - \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) \Big) \\
\mathcal{Q}_t^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) & = \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) + \alpha \Big(\lambda \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) - \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) \Big),
\end{aligned} %]]></script>
<p>where $\alpha$ is the learning rate and $\lambda$ is an eligibility-trace parameter that governs the amount by which value at the second step of trial $t$ backs up to the first step of trial $t$. <em>I am unclear whether the authors decayed the previous MF values by $1-\alpha$, i.e. $(1-\alpha)\mathcal{Q}_{t-1}^{MF} + \cdots$.</em></p>
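<p>A sketch of one trial’s MF updates exactly as the equations above are written (i.e. with no $1-\alpha$ decay of the previous values), using hypothetical argument names:</p>

```python
def mf_update(Q1, Q2, r, alpha, lam):
    """One trial's model-free SARSA(lambda)-style updates for the two-step task.

    Q1, Q2: current MF values of the chosen first- and second-step actions;
    r: reward at the end of the trial; alpha: learning rate;
    lam: eligibility-trace parameter. Returns updated (Q1, Q2).
    """
    # Second step: standard reward prediction error
    Q2_new = Q2 + alpha * (r - Q2)
    # First step: back up the (pre-update) second-step value, scaled by lambda
    Q1_new = Q1 + alpha * (lam * Q2 - Q1)
    return Q1_new, Q2_new
```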
<p>The MB learning rule was not explicitly reported by Potter et al. (1), but generally takes the form of Bellman’s equation:</p>
<script type="math/tex; mode=display">\forall a_t \in \mathcal{A}, \;\; \mathcal{Q}_t^{MB}(s_t^{(Step\,1)}, a_t) = \sum_{s_t^{(Step\,2)} \in \mathcal{S}^{(Step\,2)}} \mathcal{T}(s_{t}^{(Step\,2)}|s_t^{(Step\,1)}, a_t) \, \max_{a' \in \mathcal{A}} \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a'; \alpha, \lambda),</script>
<p>where $\mathcal{T}$ is a transition matrix. <strong>I am unclear whether the transition matrix was specified or learned by the model</strong>. With only 150 trials, this may not be a trivial matter. However, since the authors had subjects perform 50 trials before the main task, I suspect that the MB learning rule did not include a module for learning the transition probabilities.</p>
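<p>Assuming a fixed (known) transition matrix, the Bellman backup above can be sketched as:</p>

```python
import numpy as np

def mb_values(T, Q2):
    """First-step MB values via a one-step Bellman backup.

    T: (n_first_actions, n_second_states) matrix with T[a, s] = P(s | s1, a);
    Q2: (n_second_states, n_second_actions) second-step MF values.
    Returns one MB value per first-step action.
    """
    best_second = Q2.max(axis=1)   # max over second-step actions
    return T @ best_second         # expectation over second-step states
```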
<h5 id="model-fitting">Model-Fitting</h5>
<p>This was done in the same hierarchical Bayesian fashion as Daw et al. (3), which used empirical priors for the parameters, as follows:</p>
<table>
<thead>
<tr>
<th>Parameters</th>
<th>Empirical Prior Distribution</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\alpha, \lambda, w$</td>
<td>$Beta(a=1.1, b=1.1)$</td>
</tr>
<tr>
<td>$p$</td>
<td>$\mathcal{N}(\mu=0, \sigma=1)$</td>
</tr>
<tr>
<td>$\beta$</td>
<td>$Gamma(a=1.2, b=5)$</td>
</tr>
</tbody>
</table>
<p>The use of empirical priors differs from approaches such as that by Huys et al. (4), which computes the MAP estimates of parameters and MLE of hyperparameters jointly using expectation maximization. This is the approach employed in our group’s <a href="https://abrahamnunes.github.io/fitr"><code class="highlighter-rouge">fitr</code></a> toolbox.</p>
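<p>For illustration, the joint log-density of the empirical priors in the table could be written as follows (using SciPy; the paper does not specify whether $b$ in the Gamma prior is a rate or a scale, so treating it as a rate is my assumption):</p>

```python
import numpy as np
from scipy.stats import beta as beta_dist, gamma, norm

def log_prior(alpha, lam, w, p, beta):
    """Joint log-density of the empirical priors tabulated above.

    Assumes Gamma(a=1.2, b=5) uses b as a rate (scale = 1/b),
    which the paper does not specify.
    """
    return (beta_dist.logpdf(alpha, 1.1, 1.1)
            + beta_dist.logpdf(lam, 1.1, 1.1)
            + beta_dist.logpdf(w, 1.1, 1.1)
            + norm.logpdf(p, loc=0, scale=1)
            + gamma.logpdf(beta, 1.2, scale=1 / 5))
```

MAP estimation then amounts to maximizing the data log-likelihood plus this log-prior for each subject.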
<h5 id="model-comparison">Model-Comparison</h5>
<p>The authors did not fit multiple models to their data, and so model-comparison was not done. Since the models fit to the two-step task are similar in form, it is unclear whether this is of great consequence. However, with appropriate model-selection procedures that guard against overfitting, they may have been able to improve the fit of their models by including multiple variations on the form described above.</p>
<h5 id="theory-free-analysis">Theory-Free Analysis</h5>
<p>The authors also conducted the usual linear mixed-effects regression using the first-stage “stay vs. switch” as the response variable. In this case, one looks for an interaction between reward at the last trial (1 or 0), and whether the transition at the last trial was common or rare.</p>
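<p>A descriptive version of this analysis computes stay probabilities within each prior-trial reward $\times$ transition cell (a hypothetical helper of my own, not the authors’ regression code):</p>

```python
import numpy as np

def stay_table(actions, rewards, common):
    """P(stay) in each prior-trial reward x transition cell.

    actions: first-stage choices per trial; rewards: 0/1 outcomes;
    common: 1 if that trial's transition was the common one.
    Returns a 2x2 array indexed [prior reward, prior transition common].
    """
    actions, rewards, common = map(np.asarray, (actions, rewards, common))
    stay = (actions[1:] == actions[:-1]).astype(float)
    r, c = rewards[:-1], common[:-1]     # condition on the *previous* trial
    out = np.full((2, 2), np.nan)
    for i in (0, 1):
        for j in (0, 1):
            m = (r == i) & (c == j)
            if m.any():
                out[i, j] = stay[m].mean()
    return out
```

A reward-by-transition interaction in these cell means (more staying after rewarded common and unrewarded rare transitions) is the signature of MB control.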
<h4 id="statistical-learning-task">Statistical Learning Task</h4>
<p>This task involves twelve stimuli grouped into four “triplets” (i.e. each triplet has three stimuli which belong to it, and to none of the other triplets). For example, the triplets may be (A,B,C), (D,E,F), (G,H,I), (J,K,L). Subjects first complete a familiarization phase in which the stimuli are presented in sequence, one by one. The stimuli appear in a fixed order within each triplet, with the triplets themselves interleaved. For example, a sequence may have included <em>A$\to$B$\to$C$\to$G$\to$H$\to$I$\to$D$\to$E$\to$F…</em>, which preserves the order of symbols within their assigned triplet while interleaving the triplets themselves. The subjects are unaware of this structure throughout.</p>
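<p>A toy sketch of how such a familiarization stream could be generated (the paper’s actual randomization scheme may well differ):</p>

```python
import random

def familiarization_stream(triplets, n_reps, seed=0):
    """Interleave triplets randomly while preserving within-triplet order."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_reps):
        order = list(triplets)
        rng.shuffle(order)          # interleave the triplets...
        for trip in order:
            stream.extend(trip)     # ...but keep each triplet's internal order
    return stream
```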
<p>In the second part of this task, subjects are presented with 32 trials, each showing two triplets (with all symbols displayed at once). At this point, one of the two triplets had never been observed before, for example (D, E, F) versus (K, H, A). The never-before-seen triplet may have included entirely new symbols, but this was not specified in the paper. Subjects at this stage were required to identify which triplet was more familiar to them.</p>
<h4 id="fluid-reasoning-task">Fluid Reasoning Task</h4>
<p>These included the matrix reasoning (fluid reasoning) and vocabulary (crystallized intelligence) sections of the Wechsler Abbreviated Scale of Intelligence. The latter was included to determine whether any observed effects of fluid reasoning were due to a more broadly construed concept of intelligence.</p>
<h4 id="working-memory-task">Working Memory Task</h4>
<p>This task was the listening-recall subtest of the Automated Working Memory Assessment. In this task, subjects are read 8 single sentences and 7 pairs of sentences. At the end of each sentence, the subject states whether the sentence was true or false, then repeats the last word of the sentence. For the section with sentence pairs, the subject reports whether each sentence is true or false immediately after the respective sentence is read, but recalls the last word of each sentence once both sentences have been read. The score is computed by pooling both processing (i.e. correctly identifying true vs. false) and recall portions. Overall, this task is meant to assess the ability to hold information in memory (for recall) despite interference (processing).</p>
<h4 id="mediation-analysis">Mediation Analysis</h4>
<p>I am actually unfamiliar with this type of analysis, and will edit this post once I learn a bit more about it.</p>
<h2 id="results">Results</h2>
<h3 id="two-step-task-models">Two-step task models</h3>
<p>The distribution of parameter estimates across all subjects was as follows:</p>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Median</th>
<th>IQR</th>
</tr>
</thead>
<tbody>
<tr>
<td>$w$</td>
<td>0.52</td>
<td>0.16-0.71</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>0.42</td>
<td>0.05-0.76</td>
</tr>
<tr>
<td>$\lambda$</td>
<td>0.62</td>
<td>0.29-0.90</td>
</tr>
<tr>
<td>$\beta$</td>
<td>2.91</td>
<td>2.40-4.02</td>
</tr>
<tr>
<td>$p$</td>
<td>0.14</td>
<td>−0.04 to 0.36</td>
</tr>
</tbody>
</table>
<p>The authors found that MB control increased with age (correlation of age group with MB weight parameter $w$ was 0.30, p=0.01). This was also shown through the linear mixed-effects regression, where all age groups showed a significant main effect of prior-trial reward (which indicates MF control), but only adolescents and adults showed significant main effects of reward-transition interaction (which indicates use of MB control). That adults and adolescents used MB control was also reflected in analysis of reaction times following rare transitions. The idea here is that if one is ‘surprised’ by a rare transition, his or her reaction time at the second stage will slow. Thus, slower reaction times at the second stage choice may reflect increased use of MB control.</p>
<p>Importantly, all groups showed equal explicit understanding of the transition structure when asked specific questions about the transition probabilities.</p>
<h3 id="statistical-learning">Statistical learning</h3>
<p>All groups demonstrated performance on this task at above-chance levels, with accuracy improving with age. However, statistical learning did not mediate the relationship between age and MB control. The authors indicated that this may have been related to small sample size.</p>
<p>The authors commented—quite reasonably—that statistical learning may be involved in the process of learning a cognitive model of the task. To determine whether this was the case, they compared the second-stage reaction times following rare transitions against the statistical learning measures, finding a positive relationship (r=0.40, p=0.0008). However, it stands that in the study by Potter et al. (1), statistical learning had no mediating influence on the increased use of MB control with age.</p>
<h3 id="fluid-reasoning">Fluid Reasoning</h3>
<p>Fluid reasoning</p>
<ul>
<li>Increased with age (r=0.53, $p<0.0001$)</li>
<li>Correlated with MB control parameter $w$ (r=0.41, p=0.0001)</li>
<li>Fully mediated the relationship between age and MB control
<ul>
<li><strong>I need to look at this type of analysis more closely to learn exactly what this means</strong></li>
<li>The mediating role of fluid reasoning was robust to testing against crystallized intelligence, which, despite showing significant relationships with both age and MB control, did not mediate the relationship between age and MB control.</li>
</ul>
</li>
<li>Fully mediated the relationship between (A) statistical learning and MB control, and (B) age and statistical learning in a directionally specific fashion (i.e. statistical reasoning did not mediate the relationship between fluid reasoning and MB control)</li>
</ul>
<p>The authors concluded (A) that the age-related effect of fluid reasoning was a specific mediator (independent of general intelligence) of the relationship between age and MB control, (B) that fluid reasoning may account for the relationship between statistical learning ability and MB control, and (C) that fluid reasoning is an important factor in the development of statistical learning with age.</p>
<h3 id="working-memory">Working Memory</h3>
<ul>
<li>Unfortunately, 45% of the subjects reached ceiling-level performance on the working memory task, limiting the authors’ analysis of the effects of working memory on MB control
<ul>
<li>Notwithstanding, they still found a positive correlation of working memory task performance and the MB control parameter $w$ (r=0.31, p=0.03)</li>
<li>Working memory performance did not significantly correlate with age</li>
</ul>
</li>
</ul>
<h2 id="my-remarks-and-implications">My Remarks and Implications</h2>
<p>Fluid reasoning, and not crystallized intelligence, is likely an important component of—or contributor to—MB control. To this end, we must consider the utility of using certain common IQ tests, such as the North American Adult Reading Test (7), as covariates in analyses of MB and MF control modeling.</p>
<p>Working memory is likely important for MB control (8), but we must be careful to use probes that can account for ceiling effects. To this end, using a task such as Operation Span (OPSPAN; Refs. 9, 10), which can be extended in terms of recall and processing demands (i.e. longer sequences) may be beneficial to avoid ceiling effects.</p>
<p>The finding that children understood the transition dynamics of the task as well as adults did (reflected both in their answers to questions about the transition structure and in their reaction-time data) is interesting in light of their lesser use of MB control. Potter et al. (1) suggested that this was related to a reduced ability to use that knowledge in decision making. I wonder whether this may be associated with the pattern of developmental myelination of corticostriatal projections. Studying a similar task alongside diffusion tensor imaging may be of interest in the future.</p>
<p>MB control involves building an internal representation of environmental state dynamics and using that model to traverse potential sequences of states and actions during decision-making. Control of the balance between MB and MF learning, represented by the parameter $w$, has been called “arbitration” (5), although the mechanisms of this arbitration are unclear. It is possible that the balance of MB and MF control is determined by the precision of estimates from the MB and MF systems (6), in which case an individual’s ability to (A) maintain an accurate representation of the environmental transition model and (B) implement that model efficiently “online” during decision-making will significantly influence the expression of MB control. It may be that fluid reasoning influences this “model traversal” during decision-making. The relationship with working memory, however, would be of greater interpretability, since traversing a decision tree requires accurate maintenance of a representation specifying the current path taken down that tree. This would be necessary to back up value accurately.</p>
<h2 id="references">References</h2>
<ol>
<li>Potter, T. C., Bryce, N. V, & Hartley, C. A. (2016). Cognitive components underpinning the development of model-based learning. Developmental Cognitive Neuroscience, http://doi.org/10.1016/j.dcn.2016.10.005</li>
<li>Decker, J. H., Otto, A. R., Daw, N. D., & Hartley, C. A. (2016). From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning. Psychological Science, 27(6), 848–858. http://doi.org/10.1177/0956797616639301</li>
<li>Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69(6), 1204–1215. http://doi.org/10.1016/j.neuron.2011.02.027</li>
<li>Huys, Q. J. M., Cools, R., Gölzer, M., Friedel, E., Heinz, A., Dolan, R. J., & Dayan, P. (2011). Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding. PLoS Computational Biology, 7(4). http://doi.org/10.1371/journal.pcbi.1002028</li>
<li>O’Doherty, J. P., Lee, S. W., & McNamee, D. (2015). The structure of reinforcement-learning mechanisms in the human brain. Current Opinion in Behavioral Sciences, 1(April 2016), 94–100. http://doi.org/10.1016/j.cobeha.2014.10.004</li>
<li>Wan Lee, S., Shimojo, S., & O’Doherty, J. P. (2014). Neural Computations Underlying Arbitration between Model-Based and Model-free Learning. Neuron, 81(3), 687–699. http://doi.org/10.1016/j.neuron.2013.11.028</li>
<li>Uttl, B. (2002). North American Adult Reading Test: age norms, reliability, and validity. Journal of Clinical and Experimental Neuropsychology, 24(8), 1123–37. http://doi.org/10.1076/jcen.24.8.1123.8375</li>
<li>Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences of the United States of America, 110(52), 20941–20946. http://doi.org/10.1073/pnas.1312011110</li>
<li>Conway, A. R. A., Kane, M. J., et al. (2005). Working memory span tasks: A methodological review and user’s guide. Psychonomic Bulletin &amp; Review, 12(5), 769–786. http://doi.org/10.3758/BF03196772</li>
<li>Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37(3), 498–505. http://doi.org/10.3758/BF03192720</li>
</ol>
Mon, 14 Nov 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/11/14/MB-Control-Fluid-Intelligence.html
https://abrahamnunes.github.io/2016/11/14/MB-Control-Fluid-Intelligence.htmlreinforcement_learningcomputational_psychiatrycomputational_neuroscienceThe development of model-based control as it relates to fluid intelligence<p>I am quite interested in the natural history of behavioural control over the lifespan, and have been excited to see a couple of studies about this lately. The most recent paper I read on the topic is by Potter et al. (1) which just came up in my newsfeed. I’ve made some notes about it here.</p>
<p>Potter et al. (1) sought to determine whether age-related changes in statistical learning, working memory, and fluid reasoning contributed to the positive association between age and model-based (MB) control, which had been shown by Decker et al. (2).</p>
<h2 id="methods">Methods</h2>
<h3 id="subjects">Subjects</h3>
<table>
<thead>
<tr>
<th>Group</th>
<th>Definition</th>
<th>N</th>
<th>N Post-Exclusion</th>
</tr>
</thead>
<tbody>
<tr>
<td>Children</td>
<td>Age 9-12 years</td>
<td>22</td>
<td>19</td>
</tr>
<tr>
<td>Adolescents</td>
<td>Age 13-17 years</td>
<td>23</td>
<td>22</td>
</tr>
<tr>
<td>Adults</td>
<td>Age 18-25 years</td>
<td>24</td>
<td>23</td>
</tr>
</tbody>
</table>
<p>There were some additional missing data for the remaining subjects.</p>
<ul>
<li>One child did not have statistical learning data acquired</li>
<li>WASI matrix-reasoning not completed for 1 adolescent and 2 adults</li>
<li>14 children, 17 adolescents, and 18 adults also completed listening recall subtest of Automated Working Memory Assessment</li>
</ul>
<h3 id="tasks">Tasks</h3>
<table>
<thead>
<tr>
<th>Domain Tested</th>
<th>Task</th>
</tr>
</thead>
<tbody>
<tr>
<td>Reinforcement learning</td>
<td>Two-step task</td>
</tr>
<tr>
<td>Statistical learning</td>
<td><em>See Below</em></td>
</tr>
<tr>
<td>Fluid reasoning</td>
<td>WASI matrix reasoning and vocabulary</td>
</tr>
</tbody>
</table>
<h4 id="two-step-task">Two-step task</h4>
<ul>
<li>The authors used an interesting modification of the two-step task which has a story-line that facilitates completion by children. If I am not mistaken, this may be the same one available through Wouter Kool’s <a href="https://github.com/wkool/tradeoffs"><code class="highlighter-rouge">tradeoffs</code></a> repo.</li>
<li>The task structure is the usual one for the two-step task (3), with a couple of modifications:
<ul>
<li>Only 150 trials were done, rather than the usual 201
<ul>
<li><strong>This may have important implications for model-fitting</strong></li>
</ul>
</li>
</ul>
</li>
<li>Although neuroimaging data were not reported, the task was completed in an fMRI scanner
<ul>
<li><strong>I am unclear about why the neuroimaging data were not reported</strong></li>
</ul>
</li>
</ul>
<h5 id="models">Models</h5>
<p>The model fit to the behavioural data by Potter et al. (1) consists of an observation model</p>
<script type="math/tex; mode=display">P(a_t | \mathcal{Q}_t (s_t, a_t; w, \alpha, \lambda), \beta) = \frac{e^{\beta \mathcal{Q}_t (s_t, a_t; w, \alpha, \lambda) + p\cdot\mathrm{rep}(a_t)}}{\sum_{a' \in \mathcal{A}} e^{\mathcal{Q}_t (s_t, a'; w, \alpha, \lambda)+ p\cdot\mathrm{rep}(a')}},</script>
<p>where $\beta$ is the inverse softmax temperature (decision randomness) and $\mathrm{rep}(a_t)$ is an indicator function taking value 1 if $a_t$ is the same action taken at the same step of the last trial (i.e. $I[a_t = a_{t-1}]$) weighted by a ‘perseveration’ parameter $p$. The function $\mathcal{Q}_t(s_t, a_t; w, \alpha, \lambda)$ is the hybrid learming model with MB and model-free (MF) components weighted by a parameter $w$</p>
<script type="math/tex; mode=display">\mathcal{Q}_t(s_t, a_t; w, \alpha, \lambda) = w \mathcal{Q}_t^{MB}(s_t, a_t) + (1-w) \mathcal{Q}_t^{MF}(s_t, a_t; \alpha, \lambda).</script>
<p>The MF component $\mathcal{Q}_t^{MF}$ is the SARSA($\lambda$) rule</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\mathcal{Q}_t^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) & = \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) + \alpha \Big(r_t - \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) \Big) \\
\mathcal{Q}_t^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) & = \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) + \alpha \Big(\lambda \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a_t^{(Step\,2)}; \alpha, \lambda) - \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,1)}, a_t^{(Step\,1)}; \alpha, \lambda) \Big),
\end{aligned} %]]></script>
<p>where $\alpha$ is the learning rate and $\lambda$ is an eligibility trace parameter that governs the amount by which value at the second step of trial $t$ backs up to the first step of trial $t$. <em>_I am unclear whether the authors decayed the previous MF values by $1-\alpha$, i.e. $(1-\alpha)\mathcal{Q}</em>{t-1}^{MF} + \cdots$.</p>
<p>The MB learning rule was not explicitly reported by Potter et al. (1), but is generally take the form of Bellman’s equation:</p>
<script type="math/tex; mode=display">\forall a_t \in \mathcal{A}, \;\; \mathcal{Q}_t^{MB}(s_t^{(Step\,1)}, a_t) = \sum_{s_t^{(Step\,2)} \in \mathcal{S}^{(Step\,2)}} \mathcal{T}(s_{t}^{(Step\,2)}|s_t^{(Step\,1)}, a_t) \, \max_{a' \in \mathcal{A}} \mathcal{Q}_{t-1}^{MF}(s_t^{(Step\,2)}, a'; \alpha, \lambda),</script>
<p>where $\mathcal{T}$ is a transition matrix. <strong>I am unclear whether the transition matrix was specified or learned by the model</strong>. In the case of 150 trials, this may not be a trivial matter. However, I suspect that since the authors had subjects perform 50 trials before the main task, that the MB learning rule did not include a module for learning the transition probabilities.</p>
<h5 id="model-fitting">Model-Fitting</h5>
<p>This was done in the same hierarchical Bayesian fashion as Daw et al. (3), which used empirical priors for the parameters, as follows:</p>
<table>
<thead>
<tr>
<th>Parameters</th>
<th>Empirical Prior Distribution</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\alpha, \gamma, w$</td>
<td>$Beta(a=1.1, b=1.1)$</td>
</tr>
<tr>
<td>p</td>
<td>$\mathcal{N}(\mu=0, \sigma=1)$</td>
</tr>
<tr>
<td>$\beta$</td>
<td>$Gamma(a=1.2, b=5)$</td>
</tr>
</tbody>
</table>
<p>The use of empirical priors differs from approaches such as that by Huys et al. (4), which computes the MAP estimates of parameters and MLE of hyperparameters jointly using expectation maximization. This is the approach employed in our group’s <a href="https://abrahamnunes.github.io/fitr"><code class="highlighter-rouge">fitr</code></a> toolbox.</p>
<h5 id="model-comparison">Model-Comparison</h5>
<p>The authors did not fit multiple models to their data, and so model-comparison was not done. Since the models fit to the two-step task are similar in form, it is unclear whether this is of great consequence. However, with appropriate model-selection procedures that guard against overfitting, they may have been able to improve the fit of their models by including multiple variations on the form described above.</p>
<h5 id="theory-free-analysis">Theory-Free Analysis</h5>
<p>The authors also conducted the usual linear mixed-effects regression using the first-stage “stay vs. switch” as the response variable. In this case, one looks for an interaction between reward at the last trial (1 or 0), and whether the transition at the last trial was common or rare.</p>
<h4 id="statistical-learning-task">Statistical Learning Task</h4>
<p>This task involves twelve stimuli which are grouped into four “triplets” (i.e. each triplet has three stimuli which belong to it, but not to any of the other triplets). For example, the triplets may be (A,B,C), (D,E,F), (G,H,I), (J,K,L). The subjects initially complete a familiarization phase in which the stimuli are presented in sequence, one by one. However, the subjects are not aware of the triplet structure, and stimuli are presented in a fixed within-triplet order, with the triplets themselves interleaved. For example, a sequence may have included <em>A$\to$B$\to$C$\to$G$\to$H$\to$I$\to$D$\to$E$\to$F…</em>, which preserves the order of symbols within their assigned triplet but interleaves the triplets themselves. Again, the subjects are unaware of this structure.</p>
<p>In the second part of this task, subjects are presented with 32 trials of two triplets (with all symbols displayed at once). At this point, one of the two triplets had never been observed before; for example, (D, E, F) versus (K, H, A). It may be the case that the never-before-seen triplet included entirely new symbols, but this was not specified in the paper. Subjects at this stage were required to identify which triplet was more familiar to them.</p>
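The familiarization stream described above can be sketched as follows; the triplet labels and number of passes are illustrative, not the paper's actual stimulus parameters.

```python
import random

# Illustrative familiarization stream: within-triplet order is fixed,
# while the order of the triplets themselves is shuffled on each pass.
triplets = [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I'), ('J', 'K', 'L')]
random.seed(0)

stream = []
for _ in range(10):  # 10 passes through all four triplets
    for t in random.sample(triplets, k=len(triplets)):
        stream.extend(t)

print(stream[:9])  # three intact triplets in some shuffled order
```

Every consecutive block of three symbols in the stream is an intact triplet, which is precisely the statistical regularity subjects can learn without being told.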
<h4 id="fluid-reasoning-task">Fluid Reasoning Task</h4>
<p>These included the matrix reasoning (fluid reasoning) and vocabulary (crystallized intelligence) sections of the Wechsler Abbreviated Scale of Intelligence. The latter was included to determine whether any observed effects of fluid reasoning were due to a more broadly constructed concept of intelligence.</p>
<h4 id="working-memory-task">Working Memory Task</h4>
<p>This task was the listening-recall subtest of the Automated Working Memory Assessment. In this task, subjects are read 8 single sentences and 7 pairs of sentences. At the end of each sentence, the subject states whether the sentence was true or false, then repeats the last word of the sentence. For the section with sentence pairs, the subject reports whether each sentence is true or false immediately after the respective sentence is read, but recalls the last word of each sentence once both sentences have been read. The score is computed by pooling both processing (i.e. correctly identifying true vs. false) and recall portions. Overall, this task is meant to assess the ability to hold information in memory (for recall) despite interference (processing).</p>
<h4 id="mediation-analysis">Mediation Analysis</h4>
<p>I am actually unfamiliar with this type of analysis, and will edit this post once I learn a bit more about it.</p>
<h2 id="results">Results</h2>
<h3 id="two-step-task-models">Two-step task models</h3>
<p>The distribution of parameter estimates for all subjects were as follows:</p>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Median</th>
<th>IQR</th>
</tr>
</thead>
<tbody>
<tr>
<td>w</td>
<td>0.52</td>
<td>0.16-0.71</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>0.42</td>
<td>0.05-0.76</td>
</tr>
<tr>
<td>$\lambda$</td>
<td>0.62</td>
<td>0.29-0.90</td>
</tr>
<tr>
<td>$\beta$</td>
<td>2.91</td>
<td>2.40-4.02</td>
</tr>
<tr>
<td>$p$</td>
<td>0.14</td>
<td>-0.04 to 0.36</td>
</tr>
</tbody>
</table>
<p>The authors found that MB control increased with age (correlation of age group with MB weight parameter $w$ was 0.30, p=0.01). This was also shown through the linear mixed-effects regression, where all age groups showed a significant main effect of prior-trial reward (which indicates MF control), but only adolescents and adults showed significant main effects of reward-transition interaction (which indicates use of MB control). That adults and adolescents used MB control was also reflected in analysis of reaction times following rare transitions. The idea here is that if one is ‘surprised’ by a rare transition, his or her reaction time at the second stage will slow. Thus, slower reaction times at the second stage choice may reflect increased use of MB control.</p>
<p>Importantly, all groups showed equal explicit understanding of the transition structure when asked specific questions about the transition probabilities.</p>
<h3 id="statistical-learning">Statistical learning</h3>
<p>All groups demonstrated performance on this task at above-chance levels, with accuracy improving with age. However, statistical learning did not mediate the relationship between age and MB control. The authors indicated that this may have been related to small sample size.</p>
<p>The authors commented—quite reasonably—that statistical learning may be involved in the process of learning a cognitive model of the task. To determine whether this was the case, they compared the second-stage reaction times following rare transitions against the statistical learning measures, finding a positive relationship (r=0.40, p=0.0008). However, it stands that in the study by Potter et al. (1), statistical learning had no mediating influence on the increased use of MB control with age.</p>
<h3 id="fluid-reasoning">Fluid Reasoning</h3>
<p>Fluid reasoning</p>
<ul>
<li>Increased with age (r=0.53, p$<0.0001$)</li>
<li>Correlated with MB control parameter $w$ (r=0.41, p=0.0001)</li>
<li>Fully mediated the relationship between age and MB control
<ul>
<li><strong>I need to look at this type of analysis more closely to learn exactly what this means</strong></li>
<li>The mediating role of fluid reasoning was robust to testing against crystallized intelligence, which, despite showing significant relationships with both age and MB control, did not mediate the relationship between age and MB control.</li>
</ul>
</li>
<li>Fully mediated the relationship between (A) statistical learning and MB control, and (B) age and statistical learning in a directionally specific fashion (i.e. statistical reasoning did not mediate the relationship between fluid reasoning and MB control)</li>
</ul>
<p>The authors concluded that (A) fluid reasoning was a specific, age-related mediator (independent of general intelligence) of the relationship between age and MB control, (B) that fluid reasoning may account for the relationship between statistical learning ability and MB control, and (C) that fluid reasoning is an important factor in the development of statistical learning with age.</p>
<h3 id="working-memory">Working Memory</h3>
<ul>
<li>Unfortunately, 45% of the subjects reached ceiling-level performance on the working memory task, limiting the authors’ analysis of the effects of working memory on MB control
<ul>
<li>Notwithstanding, they still found a positive correlation of working memory task performance and the MB control parameter $w$ (r=0.31, p=0.03)</li>
<li>Working memory performance did not significantly correlate with age</li>
</ul>
</li>
</ul>
<h2 id="my-remarks-and-implications">My Remarks and Implications</h2>
<p>Fluid reasoning, and not crystallized intelligence, is likely an important component of—or contributor to—MB control. To this end, we must consider the utility of using certain common IQ tests, such as the North American Adult Reading Test (7), as covariates in analyses of MB and MF control.</p>
<p>Working memory is likely important for MB control (8), but we must be careful to use probes that can account for ceiling effects. To this end, using a task such as Operation Span (OPSPAN; Refs. 9, 10), which can be extended in terms of recall and processing demands (i.e. longer sequences) may be beneficial to avoid ceiling effects.</p>
<p>The finding that children appreciated the transition dynamics of the task as well as adults did (reflected in both their answers to questions about the transition structure and their reaction time data) is interesting in light of their lesser use of MB control. Potter et al. (1) suggested that this was related to a reduced ability to use that knowledge in decision making. I wonder whether this may be associated with the pattern of developmental myelination of corticostriatal projections. Studying a similar task in conjunction with diffusion tensor imaging may be of interest in the future.</p>
<p>MB control involves building an internal representation of environmental state dynamics and using that model to traverse potential sequences of states and actions during decision-making. Control of the balance between MB and MF learning, represented by the parameter $w$, has been called “arbitration” (5), although the mechanisms of this arbitration are unclear. It is possible that the balance of MB and MF control is determined by the precision of estimates from the MB and MF systems (6), in which case an individual’s ability to (A) maintain an accurate representation of the environmental transition model and (B) implement that model efficiently “online” during decision-making will significantly influence the expression of MB control. It may be the case that fluid reasoning influences the ability to traverse the model during decision-making. The relationship with working memory, however, would be more readily interpretable, since traversing a decision tree requires accurate maintenance of a representation specifying the current path taken down that tree. This would be necessary to back up values accurately.</p>
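To make the precision-based arbitration idea concrete, here is a toy, purely illustrative rule in the spirit of the precision-weighting account (6): the MB weight is the MB system's relative reliability. This is not the actual model from any of the cited papers.

```python
# Toy precision-weighted arbitration: w is the relative reliability of the
# model-based system's value estimates. Numbers are purely illustrative.
def arbitration_weight(precision_mb, precision_mf):
    return precision_mb / (precision_mb + precision_mf)

print(arbitration_weight(4.0, 1.0))  # 0.8: reliable MB estimates -> more MB control
```

Under such a rule, anything that degrades the accuracy of the internal transition model (or its online traversal) lowers MB precision and thereby shifts control toward the MF system.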
<h2 id="references">References</h2>
<ol>
<li>Potter, T. C., Bryce, N. V, & Hartley, C. A. (2016). Cognitive components underpinning the development of model-based learning. Developmental Cognitive Neuroscience, http://doi.org/10.1016/j.dcn.2016.10.005</li>
<li>Decker, J. H., Otto, A. R., Daw, N. D., & Hartley, C. A. (2016). From Creatures of Habit to Goal-Directed Learners: Tracking the Developmental Emergence of Model-Based Reinforcement Learning. Psychological Science, 27(6), 848–858. http://doi.org/10.1177/0956797616639301</li>
<li>Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69(6), 1204–1215. http://doi.org/10.1016/j.neuron.2011.02.027</li>
<li>Huys, Q. J. M., Cools, R., Gölzer, M., Friedel, E., Heinz, A., Dolan, R. J., & Dayan, P. (2011). Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding. PLoS Computational Biology, 7(4). http://doi.org/10.1371/journal.pcbi.1002028</li>
<li>O’Doherty, J. P., Lee, S. W., & McNamee, D. (2015). The structure of reinforcement-learning mechanisms in the human brain. Current Opinion in Behavioral Sciences, 1(April 2016), 94–100. http://doi.org/10.1016/j.cobeha.2014.10.004</li>
<li>Wan Lee, S., Shimojo, S., & O’Doherty, J. P. (2014). Neural Computations Underlying Arbitration between Model-Based and Model-free Learning. Neuron, 81(3), 687–699. http://doi.org/10.1016/j.neuron.2013.11.028</li>
<li>Uttl, B. (2002). North American Adult Reading Test: age norms, reliability, and validity. Journal of Clinical and Experimental Neuropsychology, 24(8), 1123–37. http://doi.org/10.1076/jcen.24.8.1123.8375</li>
<li>Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences of the United States of America, 110(52), 20941–20946. http://doi.org/10.1073/pnas.1312011110</li>
<li>Conway, A. R. A., Kane, M. J., et al. (2005). Working memory span tasks: A methodological review and user’s guide. Psychonomic Bulletin & Review, 12(5), 769–786. http://doi.org/10.3758/BF03196772</li>
<li>Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37(3), 498–505. http://doi.org/10.3758/BF03192720</li>
</ol>
Sun, 13 Nov 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/11/13/modelbased-rl-development-potter.html
https://abrahamnunes.github.io/2016/11/13/modelbased-rl-development-potter.htmlreinforcement_learningcomputational_psychiatrycomputational_neuroscienceQuick and Dirty Overview of the Bellman Equation<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h1 id="markov-decision-process">Markov Decision Process</h1>
<p>A Markov decision process (MDP) has the following elements:</p>
<ul>
<li>A set of states, $s_t \in \mathcal{S}$</li>
<li>A set of actions which depend on the current state, $a_t \in \mathcal{A}(s)$</li>
<li>A policy which maps from state to action, $\pi(s) \in \mathcal{A}(s)$</li>
<li>State transition probabilities, $\mathcal{T} (s_{t+1} \mid s_t, a_t)$</li>
<li>A reward function $\mathcal{R} (s_{t+1} \mid s_t, a_t)$. We can also denote the reward received at the current time step as $r_t$</li>
</ul>
<p>The agent operating within such a process will generally accumulate value at each state according to behaviour under some policy. This value function is typically denoted as $V^{\pi}(s)$, but for state-action pairs is typically denoted as $\mathcal{Q}^{\pi}(s, a)$.</p>
<p>The goal of the agent solving the MDP is to find the optimal policy such that total future value is maximized. The value functions under the optimal policy are typically denoted as either $V^\ast (s)$ or $\mathcal{Q}^\ast (s, a)$.</p>
<p>Over time, the rewards obtained accumulate. A reward received $k$ time steps into the future is discounted by a factor $\gamma^{k}$, where $0 < \gamma < 1$ is the discount factor. One can liken this, in behavioural terms, to a measure of impulsivity or impatience.</p>
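A quick numerical illustration of discounting: the discounted return is the sum of future rewards, each scaled by $\gamma^k$. The value of $\gamma$ and the reward sequence here are arbitrary.

```python
# Discounted return for a short reward sequence; gamma = 0.9 is an
# arbitrary illustrative choice.
gamma = 0.9
rewards = [1.0, 0.0, 0.0, 1.0]  # r_t, r_{t+1}, r_{t+2}, r_{t+3}
G = sum(gamma ** k * r for k, r in enumerate(rewards))
print(G)  # 1 + 0.9**3 = 1.729
```

Note how the reward three steps away contributes only $0.9^3 \approx 0.73$ of its face value: a more "impatient" agent (smaller $\gamma$) would discount it even more steeply.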
<h1 id="understanding-the-bellman-equations">Understanding the Bellman Equations</h1>
<p>If you were to measure the value of the current state you are in, how would you do this? The intuitive way would simply be to tally the value of any rewarding properties obtained at the present instant. This, however, misses an important fact: that the current state partially determines which states one can end up in later. As such, we can expand on the previous point by stating that the value of a current state would be the sum of the reward received in the moment, plus the total value of all future rewards expected as a result. This is closer to the true answer, but we must add one final touch: the future is worth less than the present (on account of the uncertainty of the future), so we must <em>discount</em> future rewards when adding them to the current reward. Adding current reward to a discounted total future reward results in what business-folk call the <em>net present value</em>.</p>
<p>You might have noticed a problem: we can’t know the future, especially <em>far</em> into the future. This is where the Bellman equations become interesting due to the property of <em>recursion</em>. Recall that the value of the present state represents the net present value of all states in the future. This necessarily means that the value of the next state, $s’$, represents the net present value of all future rewards thereafter. If we replace the prime (‘) notation with subscripts denoting the time step (i.e. $s_{0}$ is the initial state, $s_{2}$ is the state at time step 2, etc.), we can see this recursion in action:</p>
<script type="math/tex; mode=display">\begin{equation}
V(s_{0}) = r_{0} +
\gamma \sum_{s_{1} \in \mathcal{S}} \mathcal{T}(s_{1} \mid s_{0}, a_{0})\mathcal{R}(s_{1} \mid s_{0}, a_{0}) +
\gamma^{2} \sum_{s_{2} \in \mathcal{S}} \mathcal{T}(s_{2} \mid s_{1}, a_{1})\mathcal{R}(s_{2} \mid s_{1}, a_{1}) + \cdots
\end{equation}</script>
<p>which can be summarized as follows:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
V(s) & = \sum_{t = 0}^{T} \gamma^{t} \sum_{s_{t+1} \in \mathcal{S}} \mathcal{T}(s_{t+1} \mid s_{t}, a_{t})\mathcal{R}(s_{t+1} \mid s_{t}, a_{t}) \\
& = \sum_{t = 0}^{T} \gamma^{t} \Bigg\langle \mathcal{R}(s_{t+1} \mid s_{t}, a_{t}) \Bigg\rangle_{\mathcal{T}},
\end{aligned} %]]></script>
<p>and where the angled brackets $\langle \cdot \rangle_{\mathcal{T}}$ denote an expectation under probability measure $\mathcal{T}$. As such,</p>
<script type="math/tex; mode=display">\begin{equation}
V(s_{t}) = r_{t} +
\gamma \sum_{s_{t+1} \in \mathcal{S}} \mathcal{T}(s_{t+1} \mid s_{t}, a_{t})V(s_{t+1}).
\end{equation}</script>
<p>Bellman was concerned with finding the <em>optimal policy</em>, which in plain language means choosing the best possible action at each given state, where “best possible action” means the action that maximizes total future reward. The optimal control policy is thus</p>
<script type="math/tex; mode=display">\begin{equation}
\pi^\ast (s_{t}) = {\mathrm{arg} \max}_{a_{t}} \Bigg[\sum_{s_{t+1} \in \mathcal{S}} \mathcal{T}(s_{t+1} \mid s_{t}, a_{t})V^\ast(s_{t+1}) \Bigg] ,
\end{equation}</script>
<p>and it serves to maximize the value $V^{*}(s_{t})$ at the present state:</p>
<script type="math/tex; mode=display">V^\ast(s_{t}) = r_{t} + \max_{a_{t}} \Bigg[\gamma \sum_{s_{t+1} \in \mathcal{S}} \mathcal{T}(s_{t+1} \mid s_{t}, a_{t})V^\ast(s_{t+1}) \Bigg]</script>
<p>Except in a few special cases, this must be solved numerically.</p>
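Value iteration is the simplest such numerical method: repeatedly apply the Bellman optimality update until the values stop changing. Below is a sketch on a made-up two-state, two-action MDP; for simplicity the reward depends only on the state entered, $\mathcal{R}(s')$, a common special case of the $\mathcal{R}(s' \mid s, a)$ above.

```python
import numpy as np

# Made-up MDP: T[a, s, s'] = P(s' | s, a); R[s'] = reward on entering s'.
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
R = np.array([0.0, 1.0])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    Q = T @ (R + gamma * V)  # Q[a, s] = sum_{s'} T(s'|s,a)(R(s') + gamma V(s'))
    V_new = Q.max(axis=0)    # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

pi_star = (T @ (R + gamma * V)).argmax(axis=0)  # greedy (optimal) policy
print(V, pi_star)
```

Because the Bellman backup is a contraction for $\gamma < 1$, the loop converges to the unique fixed point $V^\ast$, and the greedy policy with respect to $V^\ast$ is optimal.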
Fri, 30 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/30/Bellman-Equation-Quick-n-Dirty.html
https://abrahamnunes.github.io/2016/09/30/Bellman-Equation-Quick-n-Dirty.htmlreinforcement_learningcomputational_psychiatrycomputational_neuroscienceA tutorial on ordinary and weighted least squares<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h1 id="introduction">Introduction</h1>
<p>Least squares methods are commonly used and are important foundations for understanding many concepts in statistics, machine learning, and computational neuroscience. This tutorial will provide an introduction to ordinary and weighted least squares.</p>
<h1 id="ordinary-least-squares">Ordinary Least Squares</h1>
<p>The brain must continuously infer the causes of its sensory experiences. Indeed, the brain has no direct access to the world around it, but rather relies on sequences of action potentials sent from sensory receptors, which carry correlative information regarding the causes of those spike trains. These observed data will be denoted by the vector $\mathbf{x}$. We assume these data are generated by some unobservable (“hidden”) causes in the world, which we will denote by the vector $\boldsymbol\theta$. The generative model can be represented as in the following equation:</p>
<script type="math/tex; mode=display">\begin{equation}
\mathbf{x} = \boldsymbol\beta \boldsymbol\theta
\end{equation}</script>
<p>where $\boldsymbol\beta$ denotes a matrix of parameters which transform $\boldsymbol\theta$ to $\mathbf{x}$. We can define the error in this model as</p>
<script type="math/tex; mode=display">\begin{equation}
\mathbb{\xi} = \boldsymbol\beta\boldsymbol\theta - \mathbf{x},
\end{equation}</script>
<p>and the loss function as</p>
<script type="math/tex; mode=display">\begin{equation}
|| \mathbb{\xi} ||^{2} = [\boldsymbol\beta\boldsymbol\theta - \mathbf{x}]^\top [\boldsymbol\beta\boldsymbol\theta - \mathbf{x}].
\end{equation}</script>
<p>Let’s expand on the above Equation:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
|| \mathbb{\xi} ||^{2} & = [\boldsymbol\beta\boldsymbol\theta - \mathbf{x}]^\top [\boldsymbol\beta\boldsymbol\theta - \mathbf{x}] \\
& = [\boldsymbol\theta^\top \boldsymbol\beta^\top \boldsymbol\beta\boldsymbol\theta - \mathbf{x}^\top \boldsymbol\beta\boldsymbol\theta - \boldsymbol\theta^\top \boldsymbol\beta^\top \mathbf{x} + \mathbf{x}^\top \mathbf{x}]
\end{aligned} %]]></script>
<p>Taking the derivative of this function with respect to the causes $\boldsymbol\theta$,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{d}{d\boldsymbol\theta} || \mathbb{\xi} ||^{2} & = \frac{d}{d\boldsymbol\theta} [\boldsymbol\theta^\top \boldsymbol\beta^\top \boldsymbol\beta\boldsymbol\theta - \mathbf{x}^\top \boldsymbol\beta\boldsymbol\theta - \boldsymbol\theta^\top \boldsymbol\beta^\top \mathbf{x} + \mathbf{x}^\top \mathbf{x}] \\
& = 2\boldsymbol\beta^\top \boldsymbol\beta\boldsymbol\theta - 2 \boldsymbol\beta^\top \mathbf{x}.
\end{aligned} %]]></script>
<p>Setting this to zero, we can solve for $\boldsymbol\theta$:</p>
<script type="math/tex; mode=display">\begin{equation}
\boldsymbol\theta = (\boldsymbol\beta^\top \boldsymbol\beta)^{-1} \boldsymbol\beta^\top \mathbf{x},
\end{equation}</script>
<p>which is the ordinary least squares equation.</p>
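The derivation above can be verified numerically. The sketch below uses synthetic data in the notation of this post ($\mathbf{x} = \boldsymbol\beta\boldsymbol\theta$ plus noise); the sizes and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: x = B @ theta + noise, with B playing the role of the
# parameter/design matrix and theta the hidden causes.
B = rng.normal(size=(100, 3))
theta_true = np.array([2.0, -1.0, 0.5])
x = B @ theta_true + 0.1 * rng.normal(size=100)

# Normal equations: theta = (B^T B)^{-1} B^T x, solved without forming
# the inverse explicitly.
theta_hat = np.linalg.solve(B.T @ B, B.T @ x)

# np.linalg.lstsq computes the same estimate and is numerically preferable.
theta_lstsq, *_ = np.linalg.lstsq(B, x, rcond=None)
print(theta_hat)  # close to theta_true
```

Solving the normal equations with `np.linalg.solve` (rather than inverting $\boldsymbol\beta^\top\boldsymbol\beta$) is the standard numerically stable choice.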
<h1 id="weighted-least-squares">Weighted Least Squares</h1>
<p>Consider that some measurements of $\mathbf{x}$ may have less importance than others. In this case, we may want to account for differential value of the measurements of $\mathbf{x}$. We can vary the contribution of any given observation by assigning a <em>weight</em> value. We thus introduce a weight matrix $\mathbf{W}$ into $\mathbf{x} = \boldsymbol\beta\boldsymbol\theta$,</p>
<script type="math/tex; mode=display">\begin{equation}
\mathbf{W}\mathbf{x} = \mathbf{W}\boldsymbol\beta\boldsymbol\theta,
\end{equation}</script>
<p>and consequently into $\mathbb{\xi} = \boldsymbol\beta\boldsymbol\theta - \mathbf{x}$:</p>
<script type="math/tex; mode=display">\begin{equation}
\mathbf{W}\mathbb{\xi} = \mathbf{W}\boldsymbol\beta\boldsymbol\theta - \mathbf{W}\mathbf{x}.
\end{equation}</script>
<p>Consider the case of $n$ independent and identically distributed (iid) measurements, $\mathbf{x} \in \lbrace x_{1}, x_{2}, \ldots, x_{n} \rbrace$. Each measurement will have an associated variance, which we will assume is the same given the iid property; this will obviate the need for subscripting, and the variance will be denoted as $\sigma^2$. Let us suppose, for now, that</p>
<script type="math/tex; mode=display">% <![CDATA[
\mathbf{W} = \begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & & \\
\vdots & & \ddots & \vdots \\
0 & & \cdots & \sigma^2 \\
\end{bmatrix}. %]]></script>
<p>This is one particular instance of a weight matrix you might choose. Typically, however, one would want to assign <em>higher</em> weights to measurements with greater precision, i.e. weights of $1/\sigma^2$.</p>
<p>Just as in the OLS case, we take the derivative of the cost function and set it to zero to solve for $\boldsymbol\theta$.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
|| \mathbb{\mathbf{W}\xi} ||^{2} & =
[\mathbf{W}\boldsymbol\beta\boldsymbol\theta -
\mathbf{W}\mathbf{x}]^\top[\mathbf{W}\boldsymbol\beta\boldsymbol\theta - \mathbf{W}\mathbf{x}]\\
\frac{d}{d\boldsymbol\theta} || \mathbf{W}\xi ||^{2} & = \frac{d}{d\boldsymbol\theta}[\boldsymbol\theta^\top \boldsymbol\beta^\top \mathbf{W}^\top \mathbf{W} \boldsymbol\beta \boldsymbol\theta - \boldsymbol\theta^\top \boldsymbol\beta^\top \mathbf{W}^\top \mathbf{W}\mathbf{x} - \mathbf{x}^\top \mathbf{W}^\top \mathbf{W} \boldsymbol\beta \boldsymbol\theta + \mathbf{x}^\top \mathbf{W}^\top \mathbf{W} \mathbf{x}] \\
0 & = 2 \boldsymbol\beta^\top \mathbf{W}^\top \mathbf{W} \boldsymbol\beta \boldsymbol\theta - 2\boldsymbol\beta^\top \mathbf{W}^\top\mathbf{W}\mathbf{x} \\
& \\
\boldsymbol\theta & = (\boldsymbol\beta^\top \mathbf{W}^\top \mathbf{W} \boldsymbol\beta)^{-1}\boldsymbol\beta^\top \mathbf{W}^\top\mathbf{W}\mathbf{x}
\end{aligned} %]]></script>
<p>Note that setting $\mathbf{W} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix, results in ordinary least squares:</p>
<script type="math/tex; mode=display">(\boldsymbol\beta^\top \mathbf{I}^\top \mathbf{I} \boldsymbol\beta)^{-1}\boldsymbol\beta^\top \mathbf{I}^\top \mathbf{I} \mathbf{x} = (\boldsymbol\beta^\top \boldsymbol\beta)^{-1}\boldsymbol\beta^\top \mathbf{x}</script>
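The WLS estimator, and its reduction to OLS when $\mathbf{W} = \mathbf{I}$, can be checked numerically. The sketch below weights each observation by its inverse noise standard deviation (so $\mathbf{W}^\top\mathbf{W}$ is the diagonal precision matrix suggested above); all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Heteroscedastic synthetic data: each observation has its own noise level.
B = rng.normal(size=(50, 2))
theta_true = np.array([1.0, -2.0])
sigma = rng.uniform(0.05, 1.0, size=50)
x = B @ theta_true + sigma * rng.normal(size=50)

# Precision weighting: W = diag(1/sigma), so W^T W = diag(1/sigma^2)
# up-weights the precise observations.
W = np.diag(1.0 / sigma)
A = W @ B
theta_wls = np.linalg.solve(A.T @ A, A.T @ (W @ x))

# With W = I, the same formula reduces to ordinary least squares.
theta_ols = np.linalg.solve(B.T @ B, B.T @ x)
print(theta_wls, theta_ols)
```

With noisy observations down-weighted, the WLS estimate typically recovers $\boldsymbol\theta$ with lower variance than the unweighted OLS estimate on the same data.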
Thu, 29 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/29/Tutorial-OLS-WLS.html
https://abrahamnunes.github.io/2016/09/29/Tutorial-OLS-WLS.htmlmachine_learningstatisticsregressionInflammation and Reward Sensitivity<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<p>This post includes some brief notes on the recent paper by Harrison, Voon, et al. [1] in <em>Biological Psychiatry</em>, which studied the effects of systemic inflammation on reinforcement learning in humans.</p>
<p>Harrison et al [1] used a task previously implemented by Pessiglione et al. [2] whose reward structure is represented by the following table:</p>
<table>
<thead>
<tr>
<th>State Name</th>
<th>Action 1 Value</th>
<th>Action 2 Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>“Gain”</td>
<td><code class="highlighter-rouge">+1*binornd(1, 0.8)</code></td>
<td><code class="highlighter-rouge">+1*binornd(1, 0.2)</code></td>
</tr>
<tr>
<td>“Neutral”</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>“Loss”</td>
<td><code class="highlighter-rouge">-1*binornd(1, 0.8)</code></td>
<td><code class="highlighter-rouge">-1*binornd(1, 0.2)</code></td>
</tr>
</tbody>
</table>
<p>Each state consisted of two unique visual stimuli, from which the subject was required to select one of the two. As such, in the above table, we refer to the stimuli as “Action 1” and “Action 2”, since they can be considered representative of the actions available to the subject in each given state.</p>
<p>The authors modeled subjects’ behavioural data using a two-parameter Rescorla-Wagner rule as the learning model</p>
<script type="math/tex; mode=display">Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha \delta_t,</script>
<p>where $\delta_t = R_t - Q_t(s_t, a_t)$. The free parameters $\alpha$ and $R_t$ represent learning rate and subjective reward, respectively. The observation model consisted of a standard softmax with inverse temperature parameter $\beta$. Inverse temperature is quite well named by Harrison et al. as “choice randomness” [1].</p>
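The learning and observation models just described can be sketched as a simple simulation. Parameter values below are illustrative, not estimates from the paper, and the reward probabilities are those of the "Gain" state in the table above.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    """Softmax observation model with inverse temperature beta."""
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

# Rescorla-Wagner learner on the "Gain" state: action 0 pays off with
# probability 0.8, action 1 with probability 0.2.
alpha, beta, R = 0.3, 3.0, 1.0  # learning rate, inverse temp., subjective reward
Q = np.zeros(2)
for t in range(100):
    a = rng.choice(2, p=softmax(Q, beta))
    r = R * rng.binomial(1, [0.8, 0.2][a])  # numpy analogue of binornd
    Q[a] += alpha * (r - Q[a])              # delta_t = r - Q_t(s_t, a_t)

print(Q)  # Q[0] should settle near 0.8, Q[1] near 0.2
```

Scaling the outcome by the free parameter $R$ is what lets the model express altered subjective reward (or punishment) sensitivity while leaving $\alpha$ and $\beta$ untouched.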
<p>The authors found no association between inflammation and the learning rate or inverse temperature parameters, but did observe a statistically significant association between subjective reward $R_t$ and inflammation. Specifically, the authors observed an increase in the magnitude of subjective value of the punishment stimuli during the inflammation condition. These results were consistent with those of Huys et al. [12], who found that anhedonia was related almost exclusively to reward sensitivity, rather than learning rate or otherwise.</p>
<p>Using model-based fMRI, Harrison et al. [1] replicated findings from [2] demonstrating correlations with reward prediction error in the ventral striatum, as well as punishment prediction error correlation (a negative correlation with reward prediction error) in the left insula. The authors then went on to demonstrate that the inflammation condition was associated with the following statistically significant changes on fMRI:</p>
<ul>
<li>Reduced encoding of reward prediction error in the ventral striatum</li>
<li>Increased right insula encoding of punishment prediction error</li>
</ul>
<blockquote>
<p>NB: I do not quite understand how to interpret the change involving the right insula, given that punishment prediction error initially correlated only with the left insula. To this end, I need to further develop an understanding of model-based fMRI, which I am currently working on for the <code class="highlighter-rouge">fitr</code> package.</p>
</blockquote>
<p>The study by Harrison et al. [1] suggests that mild systemic inflammation may be associated with heightened punishment sensitivity, and that these neurocomputational processes may be the result of inflammation-related changes at the ventral striatum and insulae. They review the relevance of their findings by noting that impaired reward appraisal is observed in so-called “sickness behaviour,” which is characteristic of several conditions, including depression. As such, the results of Harrison et al. [1] may be a starting point for bridging neuroimmunological theories of depression with observable phenotypes [9, 10]. This study also highlights that ventral striatal encoding of reward prediction error may be sensitive to systemic inflammation; the authors suggest that this may “afford one element of an efficient mechanism for the rapid reorientation of behaviour in the face of acute infection.”</p>
<p>This shift in sensitivity from reward to punishment may be relevant to the learned helplessness model of depression, in which abnormal reward vs. punishment sensitivity is implicated [11]</p>
<p>Given the known association between mesolimbic dopaminergic signalling and reward prediction error, one may assume that the changes observed by Harrison et al. [1] have dopaminergic underpinnings. However, the authors astutely note that their study could not address this question directly. Notwithstanding, haloperidol administration during the same task by Pessiglione et al. [2] yielded results similar to the present study with respect to differences in ventral striatal reward prediction error signalling. Harrison et al. therefore hypothesize that inflammation may have altered dopamine release at the ventral striatum. This hypothesis is supported by rodent studies, in which systemic inflammation produced abnormal dopaminergic tone at the ventral striatum [3], and by human studies, in which presynaptic dopamine synthesis and release were reduced after systemic inflammation [4].</p>
<h2 id="some-interesting-references-made-by-harrison-et-al-1">Some interesting references made by Harrison et al. [1]</h2>
<ul>
<li>Some cytokines, such as IFN-$\alpha$ inhibit dopamine synthesis by limiting the amount of tetrahydrobiopterin in the CNS [5].
<ul>
<li>Tetrahydrobiopterin is an essential cofactor for the rate limiting enzyme of dopamine synthesis: tyrosine hydroxylase</li>
</ul>
</li>
<li>Inflammation can increase the expression of monoamine transporters [6-8] and indoleamine 2,3-dioxygenase (a tryptophan-degrading enzyme) [6]</li>
</ul>
<h1 id="references">References</h1>
<ol>
<li>Harrison et al. (2016) A Neurocomputational Account of How Inflammation Enhances Sensitivity to Punishments Versus Rewards. <em>Biol Psychiatry</em>. 80:73-81</li>
<li>Pessiglione et al. (2006) Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. <em>Nature</em>. 442:1042-45</li>
<li>Borowski et al. (1998) Lipopolysaccharide, central in vivo biogenic amine variations, and anhedonia. <em>Neuroreport</em>. 9:3797-3802</li>
<li>Capuron et al. (2012) Dopaminergic mechanisms of reduced basal ganglia responses to hedonic reward during interferon alfa administration. <em>Arch Gen Psychiatry</em>. 69:1044-1053</li>
<li>Kitagami et al. (2003) Mechanism of systemically injected interferon-alpha impeding monoamine biosynthesis in rats: Role of nitric oxide as a signal crossing the blood-brain barrier. <em>Brain Res</em>. 978: 104-114</li>
<li>Felger et al. (2012) Cytokine effects on the basal ganglia and dopamine function: The subcortical source of inflammatory malaise. <em>Front Neuroendocrinol</em>. 33:315-327</li>
<li>Kamata et al. (2000) Effect of single intracerebrovascular injection of alpha-interferon on monoamine concentrations in the rat brain. <em>Eur Neuropsychopharmacol</em> 10:129-132</li>
<li>Shuto et al. (1997) Repeated interferon-alpha administration inhibits dopaminergic neural activity in the mouse brain. <em>Brain Res</em> 747:348-351</li>
<li>Dantzer et al. (2008) From inflammation to sickness and depression: When the immune system subjugates the brain. <em>Nat Rev Neurosci</em> 9:46-56</li>
<li>Dowlati et al. (2010) A meta-analysis of cytokines in major depression. <em>Biol Psychiatry</em>. 67:446-457</li>
<li>Seligman, ME (1972) Learned helplessness. <em>Annu Rev Med</em>. 23:407-412</li>
<li>Huys et al. (2013) Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. <em>Biol Mood Anxiety Disord</em> 3:1-16</li>
</ol>
Fri, 16 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/16/Inflammation-Reward-Sensitivity.html
A simple example of model fitting with `fitr`<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h2 id="introduction">Introduction</h2>
<p>Here I present a simple example of how to fit reinforcement learning models to behavioural data using our <a href="https://abrahamnunes.github.io/fitr"><code class="highlighter-rouge">fitr</code> package</a>, which is still under active development. The plots generated herein use code I have compiled into a toolbox called <a href="https://abrahamnunes.github.io/pqplot"><code class="highlighter-rouge">pqplot</code></a>.</p>
<p>This post won’t review how tasks are structured. Rather, it will show the simplest example of model fitting, assuming that the tasks and likelihood functions are built into the <code class="highlighter-rouge">fitr</code> toolbox.</p>
<h2 id="generating-synthetic-data-from-a-go-nogo-task">Generating Synthetic Data from a Go-Nogo Task</h2>
<p>If you have not already collected behavioural data from real subjects, you can simulate subjects on a task to generate synthetic data. We will use a simple two-state, two-action task with the following reward structure:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Action 1</th>
<th>Action 2</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>State 1</strong></td>
<td><code class="highlighter-rouge">1*Binomial(n=1, p=0.7)</code></td>
<td><code class="highlighter-rouge">-1*Binomial(n=1, p=0.7)</code></td>
</tr>
<tr>
<td><strong>State 2</strong></td>
<td><code class="highlighter-rouge">-1*Binomial(n=1, p=0.7)</code></td>
<td><code class="highlighter-rouge">1*Binomial(n=1, p=0.7)</code></td>
</tr>
</tbody>
</table>
<p>This can be thought of as a go-nogo task with two states. There will be 200 trials in the task, and the transition probability across trials (i.e. $P(s_1 \to s_2)$ or $P(s_2 \to s_1)$) is 0.5. These task parameters are specified as follows:</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code>
<span class="c1">% Task parameters</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">ntrials</span> <span class="o">=</span> <span class="mi">200</span><span class="p">;</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">nstates</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">nactions</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">preward</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">;</span> <span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.7</span><span class="p">];</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">rewards</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">taskparams</span><span class="o">.</span><span class="n">ptrans</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.5</span><span class="p">;</span> <span class="mf">0.5</span><span class="p">];</span>
</code></pre>
</div>
<p>Now we will generate a group of subjects. The subjects’ learning rule will be a simple Rescorla-Wagner rule with learning rate $\alpha$ and softmax inverse temperature $\beta$:</p>
<script type="math/tex; mode=display">Q_t(s_t, a_t; \alpha) = Q_{t-1}(s_t, a_t; \alpha) + \alpha \Big(r_t - Q_{t-1}(s_t, a_t; \alpha)\Big)</script>
<script type="math/tex; mode=display">P(a_t|Q_t(s_t, a_t); \beta) = \frac{e^{\beta Q_t(s_t, a_t; \alpha)}}{\sum_{a'} e^{\beta Q_t(s_t, a'; \alpha)}}</script>
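<p>In Python-style pseudocode (illustrative only; the actual learning rule lives in the MATLAB task class, and the function names here are my own), these two rules amount to:</p>

```python
import numpy as np

def softmax(q, beta):
    """Action probabilities from Q-values with inverse temperature beta."""
    e = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return e / e.sum()

def rw_update(Q, s, a, r, alpha):
    """One Rescorla-Wagner update for state s, action a, reward r."""
    rpe = r - Q[s, a]                  # reward prediction error
    Q[s, a] += alpha * rpe
    return Q, rpe

Q = np.zeros((2, 2))                   # two states, two actions
Q, rpe = rw_update(Q, s=0, a=1, r=1.0, alpha=0.1)
probs = softmax(Q[0], beta=5.0)        # choice probabilities in state 1
```

<p>A simulation loop would simply apply the update once per trial and sample the next action from the softmax probabilities.</p>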
<p>For the present example, the learning rule is encoded in the task code, which we will present below. Setting up a group of subjects is straightforward: just specify the number of subjects, initialize an $N_{subject} \times K_{parameter}$ array within the subject structure, and populate each column with parameters from the distributions of your choice. Here, we use a beta distribution for the learning rate and a gamma distribution for the inverse temperature.</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code><span class="n">subjects</span><span class="o">.</span><span class="n">N</span> <span class="o">=</span> <span class="mi">50</span><span class="p">;</span>
<span class="n">subjects</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="nb">zeros</span><span class="p">(</span><span class="n">subjects</span><span class="o">.</span><span class="n">N</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="n">subjects</span><span class="o">.</span><span class="n">params</span><span class="p">(:,</span><span class="mi">1</span><span class="p">)</span> <span class="o">=</span> <span class="n">betarnd</span><span class="p">(</span><span class="mf">1.1</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">,</span> <span class="p">[</span><span class="n">subjects</span><span class="o">.</span><span class="n">N</span><span class="p">,</span> <span class="mi">1</span><span class="p">]);</span> <span class="c1">%learning rate</span>
<span class="n">subjects</span><span class="o">.</span><span class="n">params</span><span class="p">(:,</span><span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">gamrnd</span><span class="p">(</span><span class="mf">5.</span><span class="p">,</span> <span class="mf">1.</span><span class="p">,</span> <span class="p">[</span><span class="n">subjects</span><span class="o">.</span><span class="n">N</span><span class="p">,</span> <span class="mi">1</span><span class="p">]);</span> <span class="c1">%inverse temperature</span>
</code></pre>
</div>
<p>The current task we are implementing can be found in the <code class="highlighter-rouge">gonogobandit.m</code> class within the <code class="highlighter-rouge">fitr-matlab/tasks</code> folder. To generate data, simply run</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code><span class="n">results</span> <span class="o">=</span> <span class="n">gonogobandit</span><span class="o">.</span><span class="n">vanilla</span><span class="p">(</span><span class="n">subjects</span><span class="p">,</span> <span class="n">taskparams</span><span class="p">);</span>
</code></pre>
</div>
<p>The <code class="highlighter-rouge">results</code> structure is as follows:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>results =
1x50 struct array with fields:
S
rpe
A
R
</code></pre>
</div>
<p>where <code class="highlighter-rouge">results(i).S</code> is a vector of states for subject $i$, <code class="highlighter-rouge">results(i).A</code> is a vector of actions, <code class="highlighter-rouge">results(i).R</code> is a vector of rewards received, and <code class="highlighter-rouge">results(i).rpe</code> is a vector of reward prediction errors. Only <code class="highlighter-rouge">S</code>, <code class="highlighter-rouge">A</code>, and <code class="highlighter-rouge">R</code> are required for model fitting, though.</p>
<h2 id="composing-models">Composing models</h2>
<p>Models are simple structures with likelihood functions and the respective parameters required for those likelihood functions. Parameters in this example are specified by built-in functions that return a structure with fields <code class="highlighter-rouge">name</code> and <code class="highlighter-rouge">rng</code>. The <code class="highlighter-rouge">rng</code> field can take the value “unit” (the interval [0, 1]), “pos” (the interval [0, +infinity)), or another range specified by the user. These parameter-generating functions are found in the <code class="highlighter-rouge">fitr-matlab/utils/rlparam.m</code> class.</p>
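<p>For intuition, a parameter spec of this shape might look like the following in Python (the field names mirror the description above; these dictionaries are hypothetical, not part of <code class="highlighter-rouge">fitr</code>):</p>

```python
# Hypothetical parameter specs mirroring rlparam's `name`/`rng` fields
learning_rate = {"name": "learning rate", "rng": "unit"}       # [0, 1]
inverse_temp = {"name": "inverse temperature", "rng": "pos"}   # [0, +inf)

params = [learning_rate, inverse_temp]
```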
<p>Let’s generate five models:</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code><span class="n">model1</span><span class="o">.</span><span class="n">lik</span> <span class="o">=</span> <span class="o">@</span><span class="n">gnbanditll</span><span class="o">.</span><span class="n">lrbeta</span><span class="p">;</span>
<span class="n">model1</span><span class="o">.</span><span class="n">param</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model1</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">inversetemp</span><span class="p">();</span>
<span class="n">model2</span><span class="o">.</span><span class="n">lik</span> <span class="o">=</span> <span class="o">@</span><span class="n">gnbanditll</span><span class="o">.</span><span class="n">lrbetarho</span><span class="p">;</span>
<span class="n">model2</span><span class="o">.</span><span class="n">param</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model2</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">inversetemp</span><span class="p">();</span>
<span class="n">model2</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">rewardsensitivity</span><span class="p">();</span>
<span class="n">model3</span><span class="o">.</span><span class="n">lik</span> <span class="o">=</span> <span class="o">@</span><span class="n">gnbanditll</span><span class="o">.</span><span class="n">lr2beta</span><span class="p">;</span>
<span class="n">model3</span><span class="o">.</span><span class="n">param</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model3</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model3</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">inversetemp</span><span class="p">();</span>
<span class="n">model4</span><span class="o">.</span><span class="n">lik</span> <span class="o">=</span> <span class="o">@</span><span class="n">gnbanditll</span><span class="o">.</span><span class="n">lr2betarho</span><span class="p">;</span>
<span class="n">model4</span><span class="o">.</span><span class="n">param</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model4</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">learningrate</span><span class="p">();</span>
<span class="n">model4</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">inversetemp</span><span class="p">();</span>
<span class="n">model4</span><span class="o">.</span><span class="n">param</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">rewardsensitivity</span><span class="p">();</span>
<span class="n">model5</span><span class="o">.</span><span class="n">lik</span> <span class="o">=</span> <span class="o">@</span><span class="n">gnbanditll</span><span class="o">.</span><span class="n">randmodel</span><span class="p">;</span>
<span class="n">model5</span><span class="o">.</span><span class="n">param</span> <span class="o">=</span> <span class="n">rlparam</span><span class="o">.</span><span class="n">inversetemp</span><span class="p">();</span>
</code></pre>
</div>
<p>Note that each model’s likelihood function is specified in the <code class="highlighter-rouge">gnbanditll.m</code> class, located in <code class="highlighter-rouge">fitr-matlab/likelihood_functions</code>.</p>
<h2 id="fitting-models">Fitting Models</h2>
<p>We can now run the model fitting procedures, which are drawn from Huys et al. (2011).</p>
<p>First, we specify the model fitting options: <code class="highlighter-rouge">maxiters</code>, the maximum number of fitting iterations; <code class="highlighter-rouge">nstarts</code>, the number of random parameter initializations for optimization; and <code class="highlighter-rouge">climit</code>, the stopping criterion. Model fitting stops when the absolute change in log-posterior probability between successive iterations falls below <code class="highlighter-rouge">climit</code>.</p>
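<p>The stopping logic can be sketched as follows (a rough Python sketch of the loop just described, not <code class="highlighter-rouge">fitr</code>’s actual implementation; <code>fit_map</code> and its signature are illustrative):</p>

```python
import numpy as np
from scipy.optimize import minimize

def fit_map(nlp, nparams, nstarts=2, maxiters=1000, climit=10.0, seed=0):
    """Sketch of iterated MAP fitting with random restarts.

    `nlp` is a negative log-posterior function. Fitting stops once at
    least `nstarts` random initializations have run and the best value
    changes by less than `climit` between iterations.
    """
    rng = np.random.default_rng(seed)
    best_val, best_theta = np.inf, None
    for it in range(maxiters):
        res = minimize(nlp, rng.normal(size=nparams), method="BFGS")
        change = best_val - res.fun
        if res.fun < best_val:
            best_val, best_theta = res.fun, res.x
        if it + 1 >= nstarts and change < climit:
            break
    return best_theta, best_val

# Toy negative log-posterior with minimum at theta = (2, 2)
theta, val = fit_map(lambda th: np.sum((th - 2.0) ** 2), nparams=2)
```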
<p>Model fitting is done with the <code class="highlighter-rouge">fitmodel()</code> function, which accepts the experimental results, the model specification, and the fitting options as arguments.</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code><span class="n">fitoptions</span><span class="o">.</span><span class="n">maxiters</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">;</span>
<span class="n">fitoptions</span><span class="o">.</span><span class="n">nstarts</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">fitoptions</span><span class="o">.</span><span class="n">climit</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="n">fit1</span> <span class="o">=</span> <span class="n">fitmodel</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">model1</span><span class="p">,</span> <span class="n">fitoptions</span><span class="p">);</span>
<span class="n">fit2</span> <span class="o">=</span> <span class="n">fitmodel</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">model2</span><span class="p">,</span> <span class="n">fitoptions</span><span class="p">);</span>
<span class="n">fit3</span> <span class="o">=</span> <span class="n">fitmodel</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">model3</span><span class="p">,</span> <span class="n">fitoptions</span><span class="p">);</span>
<span class="n">fit4</span> <span class="o">=</span> <span class="n">fitmodel</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">model4</span><span class="p">,</span> <span class="n">fitoptions</span><span class="p">);</span>
<span class="n">fit5</span> <span class="o">=</span> <span class="n">fitmodel</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="n">model5</span><span class="p">,</span> <span class="n">fitoptions</span><span class="p">);</span>
</code></pre>
</div>
<h2 id="model-selection">Model Selection</h2>
<p>We implement the Bayesian Model Selection of Rigoux et al. (2014), the code of which was drawn from Samuel Gershman’s <a href="https://github.com/sjgershm/mfit"><code class="highlighter-rouge">mfit</code> package</a>. It can be implemented as follows:</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code>
<span class="n">models</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Model 1'</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span> <span class="o">=</span> <span class="n">fit1</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Model 2'</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span> <span class="o">=</span> <span class="n">fit2</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Model 3'</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span> <span class="o">=</span> <span class="n">fit3</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Model 4'</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span> <span class="o">=</span> <span class="n">fit4</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'Model 5'</span><span class="p">;</span>
<span class="n">models</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span> <span class="o">=</span> <span class="n">fit5</span><span class="p">;</span>
<span class="n">bms</span> <span class="o">=</span> <span class="n">BMS</span><span class="p">(</span><span class="n">models</span><span class="p">);</span>
</code></pre>
</div>
<h2 id="plotting-results">Plotting results</h2>
<p>We can now look at how well the model selection and parameter estimation procedures worked in this simple case:</p>
<p>Model selection results are as follows:</p>
<p><img src="/figures/rwmodelselres.svg" width="100%" /></p>
<p>The model selection metric in the plot above is the “protected exceedance probability,” described by Rigoux et al. (2014).</p>
<p>The model parameter fits are presented in the plot below. Here, each column is a model parameter, $\alpha$ being the learning rate and $\beta$ being the softmax parameter. Dark lines represent the actual parameter values (in unconstrained space), and light lines represent the parameter estimates (in unconstrained space). Note that Model 1, which was specified by the Bayesian Model Selection procedures as having the highest protected exceedance probability, has the best parameter fit overall across each parameter in the model.</p>
<p><img src="/figures/rwdemoaeplots.svg" width="100%" /></p>
<h2 id="next-steps">Next Steps</h2>
<p>I hope to build some documentation soon describing the overall structure of the <code class="highlighter-rouge">fitr</code> package, and I hope to expand the number of built-in models. First, I am working on speeding up convergence and improving parameter estimates by allowing the model fitting procedure to make use of analytical gradients.</p>
<h2 id="references-and-further-reading">References and Further Reading</h2>
<ol>
<li>Huys, Q. J. M., Cools, R., Gölzer, M., Friedel, E., Heinz, A., Dolan, R. J., & Dayan, P. (2011). Disentangling the roles of approach, activation and valence in instrumental and Pavlovian responding. PLoS Computational Biology, 7(4). http://doi.org/10.1371/journal.pcbi.1002028</li>
<li>Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6. http://doi.org/10.1016/j.jmp.2016.01.006</li>
<li>Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection for group studies - Revisited. NeuroImage, 84, 971–985. http://doi.org/10.1016/j.neuroimage.2013.08.065</li>
<li>Daw, N. D. (2011). Trial-by-trial data analysis using computational models. Decision Making, Affect, and Learning: Attention and Performance XXIII, 1–26. http://doi.org/10.1093/acprof:oso/9780199600434.003.0001</li>
</ol>
Sat, 10 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/10/Simple-fitr-example.html
A medical objective function<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<p>The ultimate goal—or “objective function”—of any health-care professional is to maximize patients’ expected quality-weighted longevity. At this introductory stage, we are not assuming any particular measure of quality-weighted longevity (e.g. quality-adjusted life years, disability-adjusted life years, years lived with disability, etc.). Rather, we are simply assuming that the relevant outcome metric is some combination of quality of life and longevity. Since (A) one must often sacrifice quality of life for longevity or vice-versa, and (B) quality of life is naturally conceptualized as a measure spanning the 0 to 1 (or 0\% to 100\%) range, it is natural to construct the clinical objective function with the central feature of quality-weighted duration of life. We will denote quality-weighted longevity by $\mathcal{Q}$, and note that it is a function that depends on a set of variables corresponding to patient characteristics (we’ll denote the set of these variables as $\mathcal{D}$ for “data”), and a set of parameters $\boldsymbol\Theta$ which reflect the effects that each variable has on quality-weighted longevity. The function $\mathcal{Q}$ can thus be represented as follows:</p>
<script type="math/tex; mode=display">\mathcal{Q}(\mathcal{D}; \boldsymbol\Theta),</script>
<p>which can be decomposed into the quality-weight component, $\omega(t)$, and survival probability component $S(t)$, both of which are functions of time, $0 < t < T$, where $T$ denotes the individual’s time of death. In survival analysis, the survival function $S(t)$ is commonly defined as</p>
<script type="math/tex; mode=display">S(t) = P(T > t),</script>
<p>where $P(T>t)$ is the probability that an individual’s time of death $T$ will occur after time $t$. This can be intuitively understood by considering that at an initial measurement time $t=0$, the individual’s probability of dying after that time is 100\%, but this probability decreases the further one looks ahead beyond $t = 0$.</p>
<p>Since both the quality of life $\omega(t)$ and the survival probability $S(t)$ at future times are uncertain, a clinician who seeks to maximize quality-weighted longevity must estimate the <em>net-present quality-weighted longevity</em>, which we will denote by $\mathcal{Q}_0$, where the subscript 0 denotes that the estimate is made at time $t=0$:</p>
<script type="math/tex; mode=display">\begin{equation}
\mathcal{Q}_0(\mathcal{D}; \boldsymbol\Theta) = \int_0^{\infty} \omega(t; \boldsymbol\theta_\omega) S(t; \boldsymbol\theta_S) \;\; dt.
\end{equation}</script>
<p>Although it is easy to formulate this equation for the general case at time $t$, we are only concerned with the $t=0$ case (i.e. at the present clinical interaction). This equation states that the individual’s expected total quality-weighted longevity as of time $t=0$ (the present) is equal to the product of quality and survival probability from the present onward. The subscripts on $\boldsymbol\theta_\omega$ and $\boldsymbol\theta_S$ denote the components of $\boldsymbol\Theta = \lbrace \boldsymbol\theta_\omega, \boldsymbol\theta_S \rbrace$ that parameterize the quality-weight function $\omega$ and the survival function $S$, respectively.</p>
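<p>As a purely illustrative instance of this integral (the numbers below are mine, not clinical), take a constant quality weight $\omega(t) = \omega_0$ and exponential survival $S(t) = e^{-\lambda t}$; the integral then has the closed form $\mathcal{Q}_0 = \omega_0 / \lambda$, which is easy to confirm numerically:</p>

```python
import numpy as np
from scipy.integrate import quad

omega0 = 0.8   # constant quality weight (illustrative)
lam = 0.05     # hazard rate, so S(t) = exp(-lam * t)

# Net-present quality-weighted longevity: integral of omega(t) * S(t)
Q0, _ = quad(lambda t: omega0 * np.exp(-lam * t), 0, np.inf)
# Closed form: omega0 / lam quality-weighted years
```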
<p>The estimate of $\mathcal{Q} _0$ is a random variable, and so we must incorporate some measure of uncertainty. If one assumes that $\mathcal{Q} _0$ is normally distributed with mean $\mu _\mathcal{Q}$ and variance $\sigma _\mathcal{Q}^2$,</p>
<script type="math/tex; mode=display">\mathcal{Q}_0 \sim \mathcal{N}(\mathcal{Q}_0 ; \mu_\mathcal{Q}, \sigma_\mathcal{Q}^2) = P(\mathcal{Q}_0|\mu_\mathcal{Q}, \sigma_\mathcal{Q}^2),</script>
<p>then the expected value of $\mathcal{Q}_0$, which we will denote by the conventional hat $\hat{\mathcal{Q}}_0$ would be computed as</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\hat{\mathcal{Q}}_0 & = \int P(\mathcal{Q}_0|\mu_\mathcal{Q}, \sigma_\mathcal{Q}^2) \;\; \mathcal{Q}_0(\mathcal{D}, \boldsymbol\Theta) \;\; d\mathcal{Q}_0 \\
& = \int P(\mathcal{Q}_0|\mu_\mathcal{Q}, \sigma_\mathcal{Q}^2) \Bigg( \int_0^{\infty} \omega(t; \boldsymbol\theta_\omega) S(t; \boldsymbol\theta_S) \;\; dt \Bigg) \;\; d\mathcal{Q}_0 \\
& = \Bigg\langle \int_0^{\infty} \omega(t; \boldsymbol\theta_\omega) S(t; \boldsymbol\theta_S) \;\; dt \Bigg\rangle_{P(\mathcal{Q}_0|\mu_\mathcal{Q}, \sigma_\mathcal{Q}^2)},
\end{aligned} %]]></script>
<p>where the angled brackets $\langle f(x) \rangle$ denote the expectation of $f(x)$ under its respective probability measure. Two clinical goals naturally emerge from this formulation. First, one seeks to accurately estimate $\hat{\mathcal{Q} _{0}}(\mathcal{D}; \boldsymbol\Theta)$, which requires inferring the parameters $\boldsymbol\Theta = \lbrace \boldsymbol\theta _{\omega}, \boldsymbol\theta _{S} \rbrace$ and minimizing $\sigma _{\mathcal{Q}}^2$. Plainly speaking, this amounts to accurately predicting the patient’s prognosis (which is determined by the parameters that govern the quality and survival functions, $\boldsymbol\Theta$ ), as well as reducing the uncertainty in that estimate (which can be done by minimizing $\sigma _{\mathcal{Q}}^2$ ). The second goal is to develop some plan to maximize $\hat{\mathcal{Q} _0}(\mathcal{D}; \boldsymbol\Theta)$: that is, to intervene such that the patient’s predicted quality-weighted longevity is maximized. Note that prognosis after the clinical encounter (the estimate of $\hat{\mathcal{Q} _0}(\mathcal{D}; \boldsymbol\Theta)$) depends on management (i.e. the method by which $\hat{\mathcal{Q} _0}(\mathcal{D}; \boldsymbol\Theta)$ is maximized), which in turn depends on diagnosis. These dependencies are depicted in the figure below. Estimating and optimizing $\hat{\mathcal{Q} _0}(\mathcal{D}; \boldsymbol\Theta)$ is thus a complex process by which a clinician must continually evaluate diagnosis, management, and prognosis jointly.</p>
<p><img src="/figures/dx-px-mgmt.png" alt="The relationship between diagnosis, prognosis, and management." /></p>
<p>A natural approach to the above problem is one in which the clinician iteratively estimates and maximizes $\hat{\mathcal{Q}_0}(\mathcal{D}; \boldsymbol\Theta)$ over time as more information is gathered.</p>
Thu, 08 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/08/Medical-Objective-Function.html
Computing gradients of reinforcement learning models for optimization<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<h2 id="introduction">Introduction</h2>
<p>When fitting reinforcement learning models to behavioural data, the efficiency of optimization algorithms can be improved dramatically by providing analytical gradients to the optimization function (e.g. <code class="highlighter-rouge">fminunc</code> in Matlab, or <code class="highlighter-rouge">scipy.optimize.minimize</code> in Python).</p>
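<p>The mechanics of supplying a gradient are straightforward; in SciPy, for instance, the derivative is passed via the <code class="highlighter-rouge">jac</code> argument. Here is a toy quadratic objective standing in for a negative log-posterior (not an RL likelihood):</p>

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective: theta -> 0.5 * ||theta - mu||^2, with its gradient.
mu = np.array([0.3, 1.2])

def f(theta):
    d = theta - mu
    return 0.5 * d @ d

def grad(theta):
    return theta - mu          # analytical gradient of f

# jac=grad lets the optimizer skip finite-difference gradient estimates
res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
```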
<h2 id="procedures">Procedures</h2>
<p>Consider the case when one seeks to find the maximum a posteriori estimate (MAP) for a vector of parameters $\boldsymbol\theta$</p>
<script type="math/tex; mode=display">\hat{\boldsymbol\theta} = \underset{\boldsymbol\theta}{\text{argmax}}\;\;P(\mathcal{A}|\boldsymbol\theta)P(\boldsymbol\theta|\boldsymbol\phi),</script>
<p>where $\mathcal{A}$ is a set of actions taken by a group of subjects, $\boldsymbol\theta$ are the model parameters, and $\boldsymbol\phi$ are the parameters of the prior distribution of $\boldsymbol\theta$. Since the logarithm is a monotonically increasing function, the above equation can be written as</p>
<script type="math/tex; mode=display">\hat{\boldsymbol\theta} = \underset{\boldsymbol\theta}{\text{argmax}}\;\; \log P(\mathcal{A}|\boldsymbol\theta) + \log P(\boldsymbol\theta|\boldsymbol\phi).</script>
<p>The right hand side of the above equation must be differentiated with respect to the parameters $\boldsymbol\theta$:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \boldsymbol\theta} \log P(\mathcal{A}, \boldsymbol\theta | \boldsymbol\phi) & = \frac{\partial}{\partial \boldsymbol\theta} \Big[ \log P(\mathcal{A}|\boldsymbol\theta) + \log P(\boldsymbol\theta|\boldsymbol\phi) \Big] \\
& = \frac{\partial}{\partial \boldsymbol\theta} \log P(\mathcal{A}|\boldsymbol\theta) + \frac{\partial}{\partial \boldsymbol\theta} \log P(\boldsymbol\theta|\boldsymbol\phi),
\end{aligned} %]]></script>
<p>where the elements of the right hand side can be considered individually.</p>
<h2 id="the-prior-distribution">The Prior Distribution</h2>
<p>As per Huys et al. (2011), we assume that the parameters $\boldsymbol\theta$ are distributed according to a multivariate Gaussian with hyperparameters $\boldsymbol\phi = \lbrace \phi_\mu, \phi_\Sigma \rbrace$, where $\phi_\mu$ is the prior mean, and $\phi_\Sigma$ is the prior covariance matrix:</p>
<script type="math/tex; mode=display">P(\boldsymbol\theta|\boldsymbol\phi) = \frac{1}{(2\pi)^{\frac{K}{2}}|\phi_\Sigma|^{\frac{1}{2}}} \exp \Big\lbrace - \frac{1}{2}(\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu) \Big\rbrace,</script>
<p>where $K$ is the dimension of $\boldsymbol\theta$. For example, if the model you are fitting has parameters $\boldsymbol\theta = \lbrace \alpha, \beta, \omega \rbrace$, then $K=3$. Taking the log of the multivariate Gaussian probability density function (pdf) yields</p>
<script type="math/tex; mode=display">\log P(\boldsymbol\theta|\boldsymbol\phi) = -\frac{K}{2}\log 2\pi - \frac{1}{2} \log |\phi_\Sigma| - \frac{1}{2}(\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu).</script>
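This log density is easy to check numerically, term by term. A quick Python sketch (the prior mean and covariance below are arbitrary illustrative values, not fitted quantities):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative hyperparameters for K = 2 parameters (arbitrary values)
phi_mu = np.array([0.5, 2.0])
phi_Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
theta = np.array([0.2, 1.5])
K = theta.size

# Log of the multivariate Gaussian pdf, term by term as in the equation above
diff = theta - phi_mu
log_prior = (-0.5 * K * np.log(2.0 * np.pi)
             - 0.5 * np.log(np.linalg.det(phi_Sigma))
             - 0.5 * diff @ np.linalg.inv(phi_Sigma) @ diff)

# Cross-check against scipy's reference implementation
assert np.isclose(log_prior, multivariate_normal(phi_mu, phi_Sigma).logpdf(theta))
```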
<p>Taking the derivative with respect to $\boldsymbol\theta$ proceeds as follows:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \boldsymbol\theta} \log P(\boldsymbol\theta|\boldsymbol\phi) & = \frac{\partial}{\partial \boldsymbol\theta} \Bigg[ -\frac{K}{2}\log 2\pi - \frac{1}{2} \log |\phi_\Sigma| - \frac{1}{2}(\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu) \Bigg] \\
& = - \frac{\partial}{\partial \boldsymbol\theta} \frac{1}{2}(\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu).
\end{aligned} %]]></script>
<p>I like to break things down to facilitate simpler notation and use a very simple looking chain rule. Let</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
f & = - \frac{1}{2}(\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu), \\
g & = (\boldsymbol\theta - \phi_\mu)^\top \phi_\Sigma ^{-1} (\boldsymbol\theta - \phi_\mu), \text{ and} \\
X & = (\boldsymbol\theta - \phi_\mu),
\end{aligned} %]]></script>
<p>and thus,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
f & = - \frac{1}{2}g, \\
g & = X^\top \phi_\Sigma ^{-1} X, \text{ and} \\
X & = (\boldsymbol\theta - \phi_\mu),
\end{aligned} %]]></script>
<p>The chain rule is then easily represented as</p>
<script type="math/tex; mode=display">\frac{\partial f}{\partial \boldsymbol\theta} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial X} \frac{\partial X}{\partial \boldsymbol\theta},</script>
<p>and, since $\phi_\Sigma$ is symmetric, we can compute each derivative individually as follows</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial f}{\partial g} & = - \frac{1}{2}, \\
\frac{\partial g}{\partial X} & = 2 \phi_\Sigma^{-1}X, \text{ and} \\
\frac{\partial X}{\partial \boldsymbol\theta} & = 1.
\end{aligned} %]]></script>
<p>Combining these, we find that the derivative of the prior distribution with respect to the parameters $\boldsymbol\theta$, when distributed according to a multivariate Gaussian, is</p>
<script type="math/tex; mode=display">\frac{\partial}{\partial \boldsymbol\theta} \log P(\boldsymbol\theta|\boldsymbol\phi) = - \phi_\Sigma^{-1} (\boldsymbol\theta - \phi_\mu).</script>
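This gradient is easy to verify against central finite differences. A minimal Python sketch (hyperparameter values are arbitrary):

```python
import numpy as np

# Arbitrary illustrative hyperparameters
phi_mu = np.array([0.5, 2.0])
phi_Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
Sigma_inv = np.linalg.inv(phi_Sigma)

def log_prior(theta):
    # Terms constant in theta are omitted; they do not affect the gradient
    d = theta - phi_mu
    return -0.5 * d @ Sigma_inv @ d

theta = np.array([0.2, 1.5])
grad_analytic = -Sigma_inv @ (theta - phi_mu)

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([
    (log_prior(theta + eps * e) - log_prior(theta - eps * e)) / (2 * eps)
    for e in np.eye(theta.size)
])
assert np.allclose(grad_analytic, grad_numeric, atol=1e-6)
```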
<h2 id="the-log-likelihood-function">The Log-Likelihood Function</h2>
<p>The log-likelihood function is perhaps the more interesting derivative to compute because it varies with the model and the number of parameters in each model. An RL model consists of two components: an <em>observation model</em> and a <em>learning model</em>.</p>
<h3 id="observation-model">Observation Model</h3>
<p>The observation model consists of the process by which the agent selects actions. Classically—and herein—we implement the observation model as a softmax function characterizing the probability of action $a_t$ given the current state of the learning model, $\mathcal{Q}_t(s_t, a_t)$ and the model parameters $\boldsymbol\theta$:</p>
<script type="math/tex; mode=display">P(a _t | \mathcal{Q} _t(s _t, a_t); \boldsymbol\theta) = \frac{e^{\beta \mathcal{Q} _t(s _t, a _t)}}{\sum _{a'} e^{\beta \mathcal{Q} _t(s _t, a')}}</script>
<p>The probability of the entire series of actions selected by subject $i$ is</p>
<script type="math/tex; mode=display">P(\lbrace a _t \rbrace _{t = 1}^T | \lbrace \mathcal{Q} _t(s _t, a _t) \rbrace _{t = 1}^T; \boldsymbol\theta).</script>
<p>Assuming that, conditioned on $\mathcal{Q}$, the actions are independent across trials, one can express this as the product of individual observation probabilities at each time $t$</p>
<script type="math/tex; mode=display">P(\lbrace a _t \rbrace _{t = 1}^T | \lbrace \mathcal{Q} _t(s _t, a _t) \rbrace _{t = 1}^T; \boldsymbol\theta) = \prod _{t = 1}^T P( a _t | \mathcal{Q} _t(s _t, a _t) ; \boldsymbol\theta).</script>
<p>Taking the log of both sides,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\log P(\lbrace a _t \rbrace _{t = 1}^T | \lbrace \mathcal{Q} _t(s _t, a _t) \rbrace _{t = 1}^T; \boldsymbol\theta) & = \sum _{t = 1}^T \log P( a _t | \mathcal{Q} _t(s _t, a _t) ; \boldsymbol\theta) \\\\
& = \sum _{t = 1}^T \Big[ \beta \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \log \sum _{a'} e^{\beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}})} \Big]
\end{aligned} %]]></script>
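In code, this log-likelihood is just a sum of log-softmax terms over trials. A toy Python sketch (the Q values and chosen actions below are randomly generated placeholders, not the output of a learning model):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_actions = 10, 2
beta = 2.0                                    # inverse temperature
Q = rng.normal(size=(T, n_actions))           # placeholder Q values, one row per trial
actions = rng.integers(0, n_actions, size=T)  # placeholder chosen actions

# log P(a_t | Q_t) = beta*Q_t(s_t, a_t) - log sum_a' exp(beta*Q_t(s_t, a'))
logits = beta * Q
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loglik = log_probs[np.arange(T), actions].sum()

assert loglik < 0.0  # a sum of log-probabilities is negative
```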
<p>we have the log-likelihood of the participants’ actions, which we would like to maximize with respect to the parameters $\boldsymbol\theta$:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \boldsymbol\theta} \log P(\lbrace a _t \rbrace _{t = 1}^T | \lbrace \mathcal{Q} _t(s _t, a _t) \rbrace _{t = 1}^T; \boldsymbol\theta) & = \frac{\partial}{\partial \boldsymbol\theta} \sum _{t = 1}^T \log P( a _t | \mathcal{Q} _t(s _t, a _t) ; \boldsymbol\theta) \\\\
& = \sum _{t = 1}^T \frac{\partial}{\partial \boldsymbol\theta} \log P( a _t | \mathcal{Q} _t(s _t, a _t) ; \boldsymbol\theta) \\\\
& = \sum _{t = 1}^T \frac{\partial}{\partial \boldsymbol\theta} \Big[ \beta \mathcal{Q} _t(s _t, a _t;\boldsymbol\theta _{\mathcal{Q}}) - \log \sum _{a'} e^{\beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}})} \Big] \\\\
& = \sum _{t = 1}^T \Big[ \frac{\partial}{\partial \boldsymbol\theta} \beta \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \frac{\partial}{\partial \boldsymbol\theta} \log \sum _{a'} e^{\beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}})} \Big] \\\\
& = \sum _{t = 1}^T \Big[ \frac{\partial}{\partial \boldsymbol\theta} \beta \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \frac{e^{\beta \mathcal{Q} _t(s _t, \lbrace a' \rbrace _{k = 1}^K; \boldsymbol\theta _{\mathcal{Q}})}}{\sum _{a'} e^{\beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}})}} \frac{\partial}{\partial \boldsymbol\theta} \beta \mathcal{Q} _t(s _t, \lbrace a' \rbrace _{k = 1}^K; \boldsymbol\theta _{\mathcal{Q}}) \Big] \\\\
& = \sum _{t = 1}^T \Big[ \frac{\partial}{\partial \boldsymbol\theta} \beta \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \sum _{a'} P(a' | \mathcal{Q} _t(s _t, a'), \boldsymbol\theta) \frac{\partial}{\partial \boldsymbol\theta} \beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}}) \Big] \\\\
\end{aligned} %]]></script>
<h3 id="a-rescorla-wagner-learning-rule">A Rescorla-Wagner Learning Rule</h3>
<p>Consider the following learning rule</p>
<script type="math/tex; mode=display">\mathcal{Q}_{t}(s_t, a_t; \alpha) = \mathcal{Q}_{t-1}(s_t, a_t; \alpha) + \alpha \Big(r_t - \mathcal{Q}_{t-1}(s_t, a_t; \alpha)\Big) = (1-\alpha) \mathcal{Q}_{t-1}(s_t, a_t; \alpha) + \alpha r_t.</script>
<p>We will be computing the derivatives of our RL model using this Rescorla-Wagner learning rule.</p>
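As an update rule, this is a one-liner. A minimal Python sketch; with a constant reward, the value estimate converges to that reward:

```python
def rw_update(q_prev, reward, alpha):
    """One Rescorla-Wagner (delta-rule) step for the visited (state, action)."""
    return q_prev + alpha * (reward - q_prev)

# With a constant reward of 1.0, the estimate converges toward 1.0
q = 0.0
for _ in range(200):
    q = rw_update(q, 1.0, alpha=0.1)
assert abs(q - 1.0) < 1e-6
```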
<h3 id="specific-derivatives">Specific Derivatives</h3>
<p>Now that we have specified the base form of the derivative of the observation model and a basic learning model with respect to the parameter vector $\boldsymbol\theta$, we can compute the derivatives with respect to the individual parameters.</p>
<p>For notational simplicity, let</p>
<script type="math/tex; mode=display">\log P(\lbrace a _t \rbrace _{t = 1}^T | \lbrace \mathcal{Q} _t(s _t, a _t) \rbrace _{t = 1}^T; \boldsymbol\theta) = \log \mathcal{L}(\boldsymbol\theta)</script>
<p>Then the derivative with respect to the learning rate $\alpha$ is</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial \log \mathcal{L}(\boldsymbol\theta)}{\partial \alpha} & = \sum _{t = 1}^T \Bigg[ \beta \frac{\partial}{\partial \alpha} \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \sum _{a'} P(a' | \mathcal{Q} _t(s _t, a'), \boldsymbol\theta) \beta \frac{\partial}{\partial \alpha} \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}}) \Bigg], \\
\end{aligned} %]]></script>
<p>where</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \alpha} \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) & = \frac{\partial}{\partial \alpha} \Bigg[ (1-\alpha)\mathcal{Q}_{t-1}(s_t, a_t; \alpha) + \alpha r_t \Bigg] \\
& = \frac{\partial}{\partial \alpha} \Big[ (1-\alpha)\mathcal{Q}_{t-1}(s_t, a_t; \alpha) \Big] + r_t, \\
\end{aligned} %]]></script>
<p>and using the product rule,</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \alpha} \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) & = \Bigg[ \Bigg(\frac{\partial}{\partial \alpha} (1-\alpha) \Bigg) \mathcal{Q}_{t-1}(s_t, a_t; \alpha) + (1-\alpha) \Bigg( \frac{\partial}{\partial \alpha} \mathcal{Q}_{t-1}(s_t, a_t; \alpha) \Bigg) \Bigg] + r_t \\
& = (1-\alpha) \frac{\partial}{\partial \alpha} \mathcal{Q}_{t-1}(s_t, a_t; \alpha) + \Big(r_t - \mathcal{Q}_{t-1}(s_t, a_t; \alpha) \Big).
\end{aligned} %]]></script>
<p>The derivative with respect to the inverse temperature $\beta$ is</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial \log \mathcal{L}(\boldsymbol\theta)}{\partial \beta} & = \sum _{t = 1}^T \Big[ \frac{\partial}{\partial \beta} \beta \mathcal{Q}_t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \sum _{a'} P(a' | \mathcal{Q} _t(s _t, a'), \boldsymbol\theta) \frac{\partial}{\partial \beta} \beta \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}}) \Big] \\
& = \sum _{t = 1}^T \Big[ \mathcal{Q} _t(s _t, a _t; \boldsymbol\theta _{\mathcal{Q}}) - \sum _{a'} P(a' | \mathcal{Q}_t(s _t, a'), \boldsymbol\theta) \mathcal{Q} _t(s _t, a'; \boldsymbol\theta _{\mathcal{Q}}) \Big]. \\
\end{aligned} %]]></script>
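The $\beta$ derivative can likewise be verified numerically. A toy Python sketch (the Q values and actions are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_actions = 20, 3
Q = rng.normal(size=(T, n_actions))           # placeholder Q values
actions = rng.integers(0, n_actions, size=T)  # placeholder chosen actions

def loglik(beta):
    logits = beta * Q
    log_norm = np.log(np.exp(logits).sum(axis=1))
    return (logits[np.arange(T), actions] - log_norm).sum()

beta = 1.5
probs = np.exp(beta * Q)
probs /= probs.sum(axis=1, keepdims=True)

# d loglik / d beta = sum_t [ Q_t(a_t) - sum_a' P(a') Q_t(a') ]
grad_analytic = (Q[np.arange(T), actions] - (probs * Q).sum(axis=1)).sum()

eps = 1e-6
grad_numeric = (loglik(beta + eps) - loglik(beta - eps)) / (2 * eps)
assert abs(grad_analytic - grad_numeric) < 1e-5
```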
<h3 id="combining-the-derivatives-of-the-likelihood-and-priors">Combining the Derivatives of the Likelihood and Priors</h3>
<p>Combining the results above, the gradient of the log posterior is</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
\frac{\partial}{\partial \boldsymbol\theta} \log P(\mathcal{A}, \boldsymbol\theta | \boldsymbol\phi) & = \frac{\partial}{\partial \boldsymbol\theta} \log P(\mathcal{A}|\boldsymbol\theta) + \frac{\partial}{\partial \boldsymbol\theta} \log P(\boldsymbol\theta|\boldsymbol\phi) \\
& = \left[ {\begin{array}{c} \frac{\partial }{\partial \alpha} \log \mathcal{L}(\boldsymbol\theta) \\ \frac{\partial }{\partial \beta} \log \mathcal{L}(\boldsymbol\theta) \end{array}} \right] - \phi_\Sigma^{-1} (\boldsymbol\theta - \phi_\mu)
\end{aligned} %]]></script>
<p>If using a function such as <code class="highlighter-rouge">fminunc</code>, the objective function would return both the log posterior and its gradient (note that <code class="highlighter-rouge">fminunc</code> minimizes, so in practice one returns the <em>negative</em> log posterior and its negative gradient). For instance:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
& \text{function} \;\; \Bigg[ \log P(\mathcal{A}, \boldsymbol\theta | \boldsymbol\phi), \;\; \frac{\partial}{\partial \boldsymbol\theta} \log P(\mathcal{A}, \boldsymbol\theta | \boldsymbol\phi) \Bigg] = \text{logposterior}(\boldsymbol\theta) \\
& ... \\
& \text{end}
\end{aligned} %]]></script>
<p>Then one could apply <code class="highlighter-rouge">fminunc</code> as usual</p>
<div class="language-matlab highlighter-rouge"><pre class="highlight"><code><span class="p">[</span><span class="n">theta</span><span class="p">,</span> <span class="nb">varargout</span><span class="p">]</span> <span class="o">=</span> <span class="n">fminunc</span><span class="p">(</span><span class="o">@</span><span class="n">logposterior</span><span class="p">,</span> <span class="n">theta_initial</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
</code></pre>
</div>
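A Python analogue of this pattern uses <code class="highlighter-rouge">scipy.optimize.minimize</code> with <code class="highlighter-rouge">jac=True</code>, so the objective returns both the value and its gradient; since the optimizer minimizes, one supplies the negative log posterior. The objective below is a toy quadratic stand-in, not a fitted RL model:

```python
import numpy as np
from scipy.optimize import minimize

phi_mu = np.zeros(2)     # toy prior mean
Sigma_inv = np.eye(2)    # toy prior precision

def neg_log_posterior(theta):
    """Toy quadratic stand-in for -log P(A, theta | phi);
    returns both the objective value and its gradient."""
    d = theta - phi_mu
    return 0.5 * d @ Sigma_inv @ d, Sigma_inv @ d

res = minimize(neg_log_posterior, x0=np.array([1.0, -2.0]), jac=True, method="BFGS")
assert res.success and np.allclose(res.x, phi_mu, atol=1e-5)
```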
Wed, 07 Sep 2016 00:00:00 +0000
https://abrahamnunes.github.io/2016/09/07/Computing-Gradient-RL-Model.html
computational_psychiatry, reinforcement_learning
Implementing the Izhikevich Neuron in R<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<p>The following is an implementation of the Izhikevich Neuron in R. More details can be found at <a href="http://www.izhikevich.org/publications/spikes.htm">Eugene Izhikevich’s webpage</a>.</p>
<p>Essentially, the Izhikevich model seeks to balance realistic modeling of neuronal membrane dynamics with computational efficiency. The model is a set of coupled differential equations:</p>
<script type="math/tex; mode=display">\frac{dv}{dt} = 0.04v^{2} + 5v + 140 - u + I</script>
<p>and</p>
<script type="math/tex; mode=display">\frac{du}{dt} = a(bv - u).</script>
<p>where</p>
<ul>
<li>$v$ is a dimensionless variable representing membrane potential</li>
<li>$u$ is a dimensionless variable representing membrane potential recovery</li>
<li>$a$ is a dimensionless parameter representing the time scale of $u$</li>
<li>$b$ is a dimensionless parameter representing the sensitivity of $u$ to subthreshold fluctuations in $v$</li>
<li>$c$ is a dimensionless parameter representing the after-spike reset value of $v$ (caused by fast high-threshold $K^{+}$ conductance)</li>
<li>$d$ is a dimensionless parameter representing the after-spike reset value of $u$ (caused by slow high-threshold $Na^{+}$ and $K^{+}$ conductances)</li>
<li>$I$ is an externally applied current</li>
</ul>
<p>Setting the parameters $a = 0.02$, $b = 0.2$, $c = -65$, and $d = 2$, we will generate a fast-spiking neuron simulation.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1">#SPECIFY PARAMETERS
</span><span class="n">nsteps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="c1">#number of time steps over which to integrate
</span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.02</span><span class="w">
</span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.2</span><span class="w">
</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-65</span><span class="w">
</span><span class="n">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="n">I</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nsteps</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">I</span><span class="p">[</span><span class="m">100</span><span class="o">:</span><span class="m">400</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="w">
</span><span class="n">v_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">30</span><span class="w">
</span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nsteps</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">u</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nsteps</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">v</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-70</span><span class="w">
</span><span class="n">u</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">v</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="c1">#IZHIKEVICH NEURON FUNCTION
</span><span class="n">Izhikevich</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">u</span><span class="p">,</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">I</span><span class="p">,</span><span class="w"> </span><span class="n">v_res</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">v_res</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="w">
</span><span class="n">u</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">d</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">dv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="m">0.04</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="o">^</span><span class="m">2</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="m">5</span><span class="o">*</span><span class="n">v</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">140</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">I</span><span class="w">
</span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">dv</span><span class="w">
</span><span class="n">du</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">b</span><span class="o">*</span><span class="n">v</span><span class="p">)</span><span class="o">-</span><span class="n">u</span><span class="p">)</span><span class="w">
</span><span class="n">u</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">u</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">du</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="nf">return</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">u</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#INTEGRATION VIA THE EULER METHOD
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="p">(</span><span class="n">nsteps</span><span class="m">-1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Izhikevich</span><span class="p">(</span><span class="n">v</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">I</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">v_res</span><span class="p">)</span><span class="w">
</span><span class="n">v</span><span class="p">[</span><span class="n">i</span><span class="m">+1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">v_res</span><span class="p">)</span><span class="w">
</span><span class="n">u</span><span class="p">[</span><span class="n">i</span><span class="m">+1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">#PLOT
</span><span class="n">plot</span><span class="p">(</span><span class="n">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">nsteps</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="n">v</span><span class="p">,</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'l'</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Izhikevich Neuron"</span><span class="p">,</span><span class="w">
</span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Time (ms)"</span><span class="p">,</span><span class="w">
</span><span class="n">ylab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Membrane Potential (mV)"</span><span class="p">)</span></code></pre></figure>
<p><img src="/figures/unnamed-chunk-1-1.png" alt="plot of chunk unnamed-chunk-1" /></p>
<p>Of particular interest is that other neuron types can be simulated by adjusting the model parameters. For instance, setting the parameters $a = 0.02$, $b = 0.2$, $c = -50$, and $d = 2$ will generate a chattering neuron.</p>
<p><img src="/figures/unnamed-chunk-2-1.png" alt="plot of chunk unnamed-chunk-2" /></p>
<p>Parameters for further neuron types may be found at Eugene Izhikevich’s page (link above).</p>
Tue, 23 Jun 2015 00:00:00 +0000
https://abrahamnunes.github.io/2015/06/23/Implementing-the-Izhikevich-Neuron-in-R.html
Computational_Neuroscience