Is your task really 2AFC?

Probably not.

You probably know that 2AFC stands for “two-alternative forced choice.” However, you may not know that this means 2AFC tasks involve the presentation of two stimuli on each trial!

Let’s say that you want your participants to distinguish between faces and houses. On each trial, you present an image and ask participants to decide if it’s a face or a house. This is a fantastic design, just not a 2AFC task. Rather, it’s a discrimination task, though some psychophysicists would also call it a detection task. The same is true if your subjects are judging the direction of motion in a random dot kinematogram, deciding whether an item is old or new, or otherwise judging a single stimulus per trial.

Example of tasks that are and aren’t 2AFC.

So, what are 2AFC tasks? There are two variants: spatial 2AFC and temporal 2AFC (also called two-interval forced choice, or 2IFC). In the spatial 2AFC, on each trial you’d present a face and a house in different locations (usually left and right of fixation). In 2IFC tasks, on each trial you’d present first the face and then the house (or vice versa) in the same spatial location. In both cases, participants’ task is to determine which of the two stimuli was the face.

Despite my best efforts, I haven’t been able to find out how the confusion about what counts as a 2AFC task emerged. According to this Wikipedia page, Fechner developed 2AFC back in the 19th century. Confusing 1-stimulus discrimination tasks with 2AFC seems to have already been the rule by the middle of the 20th century (and perhaps much earlier). It appears to be the rule now, at least outside of the vision science world.

What is so special about 2AFC tasks? They are supposed to be ‘unbiased.’ In other words, subjects are often assumed not to have much of a bias between left/right or first/second interval decisions. Such lack of bias is great for measuring performance and generally makes distributional assumptions (like assuming Gaussians in SDT) unnecessary. The problem is that this assumption has been shown to be empirically false (Yeshurun, Carrasco, & Maloney, 2008). Yeshurun and colleagues even recommend against using 2IFC tasks altogether.

A note on comparing 1-stimulus discrimination and 2AFC tasks: because 2AFC tasks present two stimuli on each trial, participants have more information to work with and thus are expected to perform better. In fact, under standard SDT assumptions, their signal-to-noise ratio (d’) should be exactly √2 times higher in 2AFC tasks. Empirically, however, the ratio has been found to be either substantially higher or substantially lower. Math is prettier than actual human performance!
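If you want to see where the √2 comes from, a quick simulation helps. The sketch below is a toy, not an analysis of real data: it assumes equal-variance Gaussian noise, independent noise on the two stimuli, and a perfectly unbiased observer. The function name and all parameter values are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def apparent_dprimes(d_prime=1.0, n_trials=100_000, seed=1):
    """Simulate an unbiased observer in a 1-stimulus task and in 2AFC.

    Returns (d' from the 1-stimulus task, d' computed the same way from 2AFC).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf
    half = d_prime / 2.0
    correct_single = 0
    correct_2afc = 0
    for _ in range(n_trials):
        # 1-stimulus task: one noisy sample, respond "face" if it is above 0.
        is_face = rng.random() < 0.5
        x = rng.gauss(half if is_face else -half, 1.0)
        correct_single += (x > 0) == is_face

        # 2AFC: one sample per stimulus, pick the one that looks more face-like.
        x_face = rng.gauss(half, 1.0)
        x_house = rng.gauss(-half, 1.0)
        correct_2afc += x_face > x_house

    pc_single = correct_single / n_trials
    pc_2afc = correct_2afc / n_trials
    # For an unbiased observer, d' = 2 * z(proportion correct).
    return 2 * z(pc_single), 2 * z(pc_2afc)


d_single, d_2afc = apparent_dprimes()
print(f"1-stimulus d' = {d_single:.2f}, 2AFC d' = {d_2afc:.2f}")
print(f"ratio = {d_2afc / d_single:.2f}, sqrt(2) = {math.sqrt(2):.2f}")
```

The reason is that the 2AFC decision depends on the difference between two noisy samples: the means differ by d’ but the noise variances add, so the difference has standard deviation √2. Pushing the resulting proportion correct through the 1-stimulus formula then gives a d’ that is √2 times larger.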

In the end, perhaps it’s not a bad thing if you’ve used a simple discrimination task in your experiment. Just don’t call it 2AFC.


Has the drift diffusion model made any new predictions?

The drift diffusion model (DDM) is one of the best-known and most widespread models of perceptual decision making. I also happen to think that it is wrong. The reasons for this opinion are many and varied, but I’ll keep those for another post. However, while the theoretical arguments for or against DDM are important, it is arguably even more important whether it “works”. Practically, there is a reason why DDM is so popular and it is that DDM provides very good fits to behavioral data. The problem is: it may very well be that the model fits well because it is overly flexible, and some researchers have made that argument before (Jones & Dzhafarov, 2014). The question of flexibility is also a tough one: all models are at least somewhat flexible and it’s hard to say when a model is “too” flexible. Again, I have opinions on this issue but others have equally strong opinions in the opposite direction, and the flexibility of the model is not what this post is about.

Of course, the issues of whether a model is a good approximation of reality and whether it is too flexible are not specific to DDM; the same issues come up with virtually any model in cognitive science. So, what would be truly convincing evidence that a model is good? There is a pretty standard answer and it’s called predictions.

So, that leads us to the central question of this post: Has DDM made any new predictions?

I think that I’m reasonably aware of the literature on DDM and have read all major reviews from the last decade. There is a lot of talk about what DDM “explains” (read “fits”) but I haven’t seen a single case where DDM has made a new prediction that has subsequently been confirmed empirically. The timeline is of course critical here: historically, DDM has been modified a few times to fit several effects that had already been known in the literature, such as error trials being slower than correct trials in some conditions but faster in others. So, the question is whether DDM predicted any effects that were not known when it was developed. I would assume that a case like that would be well known, which makes me think that there is no such case.

But perhaps I just missed something. So, if I did, please let me know in the comments. I’ll update this blog post with a discussion of any predictions that anyone mentions in the comments (or via email, Twitter, etc).

Finally, should DDM be expected to make new predictions? Well, I think so. DDM is not a simple model. It postulates a specific structure for the mechanism underlying perceptual decision making and it has both choice and RT as outputs. If it indeed captures the process of perceptual decision making well, then it is hard to imagine that it wouldn’t make a prediction about an effect that we hadn’t already discovered.
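For readers who haven’t worked with the model, here is a minimal sketch of what “a specific structure” with “both choice and RT as outputs” means in practice. It is a stripped-down version that I am assuming for illustration: symmetric bounds, an unbiased starting point, no across-trial variability, and a crude fixed-step simulation. The function names and parameter values are hypothetical and not taken from any particular DDM toolbox.

```python
import random


def simulate_ddm_trial(drift, bound, rng, noise_sd=1.0, dt=0.001, non_decision=0.3):
    """Simulate one trial; return (choice, rt), where choice 1 = upper bound."""
    evidence = 0.0  # unbiased starting point, halfway between the bounds
    t = 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with sqrt(dt)
    while abs(evidence) < bound:
        # Each time step adds a bit of drift plus Gaussian noise.
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    choice = 1 if evidence > 0 else 0
    # Observed RT = decision time + a fixed non-decision time.
    return choice, t + non_decision


def simulate_ddm(drift=1.0, bound=1.0, n_trials=2000, seed=0):
    """Simulate many trials with the same parameters."""
    rng = random.Random(seed)
    return [simulate_ddm_trial(drift, bound, rng) for _ in range(n_trials)]


trials = simulate_ddm()
accuracy = sum(choice for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(f"accuracy = {accuracy:.3f}, mean RT = {mean_rt:.3f} s")
```

Here the upper bound stands for the correct response (the drift is positive). The point is simply that a handful of parameters jointly determine the choice proportions and the full RT distributions, which is what gives the model room to make predictions in the first place.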

So, has the drift diffusion model made any new predictions?


Based on discussions on Twitter, here are several predictions that have been mentioned. Examples 1-2 from the section “Successful predictions?” and Examples 2-4 from the section “Failed predictions” come from this Twitter thread by @PeterKvam.

Successful predictions?

  1. “The distribution of evidence sampled by decision makers following a diffusion strategy should not match the true distribution of evidence – it should be more extreme” (Kvam et al., 2021). My reaction: yes, but this prediction doesn’t seem specific to DDM or other accumulation-to-bound models.
  2. “Absorbing choice boundaries imply that once a choice is made, subsequent evidence is ignored.” Some evidence that this is correct (Peixoto et al., 2021). However, many popular models (e.g., 2DSD, Pleskac & Busemeyer, 2010) propose that subsequent evidence could be considered and provide suggestive evidence for it. In the end, I don’t think that what happens after a choice is made can either falsify or support DDM.

Note: many people suggested different effects where DDM was used to express a cognitive prediction not directly related to the model. I am not including such examples here because they are not predictions of DDM itself; for example, if such predictions fail, this would have no implication about DDM and only about the cognitive effect.

Failed predictions

  1. DDM predicts that placing more emphasis on accuracy should lead to monotonically increasing or inverted-U shaped functions for RT skewness and the ratio SD(RT)/mean(RT). Recently, we’ve shown that both of these predictions are false and the functions are instead U-shaped (Rafiei & Rahnev, 2021). One criticism that we received repeatedly: DDM only predicts this under the selective influence assumption (namely, that manipulations of SAT and difficulty should only affect the boundary and drift rate, respectively). If you don’t buy this, then this is not really a “prediction of DDM”. However, throwing away the selective influence assumption robs the model of its ability to predict (or even explain) many cognitive effects.
  2. “A choice can only be triggered after sampling a piece of evidence for the to-be-chosen option”. Difficult to evaluate in perception tasks but incorrect in purchasing decisions (Busemeyer & Rapoport, 1988).
  3. “Choice should be determined by the balance of evidence and not by the support for any individual option”. In other words, the magnitude of support / total evidence for different options should not affect choice as long as balance is maintained. Doesn’t seem correct (Steverson et al., 2019).
  4. “Evidence represented by decision makers should not be autocorrelated across time if the underlying information itself isn’t autocorrelated.” Some evidence exists that this is incorrect (Kelly et al., 2001).
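The two quantities in failed prediction 1 (RT skewness and SD(RT)/mean(RT)) are easy to compute from raw response times. The sketch below is one way to do it; it uses population (not sample) estimators, which is my choice for simplicity and may differ from the estimators used in the paper. The function name and the example RTs are made up.

```python
from statistics import mean, pstdev


def rt_shape_stats(rts):
    """Return (skewness, SD/mean) for a list of response times."""
    m = mean(rts)
    sd = pstdev(rts)  # population standard deviation
    # Skewness: average cubed deviation from the mean, in units of SD^3.
    skewness = mean([(rt - m) ** 3 for rt in rts]) / sd ** 3
    return skewness, sd / m


# Made-up RTs (in seconds) from one hypothetical speed-accuracy condition.
example_rts = [0.42, 0.48, 0.51, 0.55, 0.63, 0.90]
skewness, sd_over_mean = rt_shape_stats(example_rts)
print(f"skewness = {skewness:.2f}, SD/mean = {sd_over_mean:.2f}")
```

Computing these two numbers separately for each level of accuracy emphasis and plotting them against that level is what reveals whether the functions are monotonic, inverted-U shaped, or U-shaped.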

Interim verdict

DDM clearly can make predictions. However, its “successful” predictions are either not specific to the model or not predictions at all. Will update this verdict as I receive more examples.

Does expectation change the sensory signal?

Expecting to see something makes us more likely to report seeing it. Imagine that I show you an ambiguous green/blue color. As I am showing it to you, I inform you that you’re about to see green. And, no surprise here, you’re more likely to say that you saw green.


The fascinating question is why. In particular, there are two very different possibilities for what expecting green did to influence your report:
Option 1: Expectation changed the sensory signal itself
Option 2: Expectation changed the decision processes but left the sensory signal unaltered

Several lines of evidence suggest that expectation alters the sensory signal (Option 1). First, popular computational theories, such as predictive coding, suggest that top-down signals affect how the forthcoming stimulus is actually processed, thus altering the signal itself. Second, neuroimaging studies have repeatedly found that expectations affect the activity in early sensory areas such as V1. Such results seem to fit best with Option 1 since areas such as V1 are assumed to reflect the sensory signal. The problem is that all of this evidence is very indirect: computational theories can be wrong, while activity in sensory areas does not always have to reflect only the feedforward signal.

Therefore, distinguishing between Options 1 and 2 would be most convincing if based on behavioral data. If we can find behavioral evidence that the sensory signal is processed differently in the presence of expectations, then that would be solid evidence for Option 1. If we can’t, then we should prefer Option 2.

This is exactly the problem we tackled in our recent article in Scientific Reports (Bang & Rahnev, 2017).

Full disclosure: we went into this project fully expecting to provide the killer evidence for Option 1. Instead, we ended up with (what seems to us to be) strong evidence for Option 2.

Here’s briefly what we did. We asked subjects to discriminate between left- and right-tilted Gabor patches. Before or after the presentation of the Gabor patch, we provided a cue about the likely identity of the stimulus. We reasoned that a cue coming before the stimulus (pre cue) would induce an expectation that could change the sensory signal. On the other hand, a cue presented long after the stimulus (post cue) can’t change the sensory signal but must act through affecting the decision. Now, we only need to compare how the sensory information was used in the presence of pre vs. post cues – any difference that we can find would necessarily stem from a sensory signal change induced by the pre but not the post cue.

So, was there a difference in the use of sensory information between pre and post cues? Not a trace.

We ran several types of sensitive analyses including temporal reverse correlation (to check for a change in how signals at different time points influence the decision) and feature-based reverse correlation (to check for a change in how different stimulus orientations influence the decision). All of our results were perfectly explained by a simple model in which both the pre and post cue only affected the decision criterion but left the sensory signal unaltered.
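To make the criterion-only idea concrete, here is a toy signal detection sketch. It is not the model we fit in the paper, just an illustration under simple assumptions: equal-variance Gaussian evidence, left and right tilts equally likely within each simulated condition, and a cue that does nothing except move the decision criterion. The function name, the d’ of 1.5, and the criterion values of ±0.5 are all hypothetical.

```python
import random
from statistics import NormalDist


def simulate_cue_condition(d_prime, criterion, n_trials, rng):
    """Return (estimated d', estimated criterion c) for one cue condition."""
    z = NormalDist().inv_cdf
    hits = false_alarms = n_right = n_left = 0
    for _ in range(n_trials):
        is_right = rng.random() < 0.5
        # The sensory signal is identical in every cue condition.
        x = rng.gauss(d_prime / 2 if is_right else -d_prime / 2, 1.0)
        # The cue acts only here, by shifting where the criterion sits.
        says_right = x > criterion
        if is_right:
            n_right += 1
            hits += says_right
        else:
            n_left += 1
            false_alarms += says_right
    hit_rate = hits / n_right
    fa_rate = false_alarms / n_left
    d_est = z(hit_rate) - z(fa_rate)
    c_est = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_est, c_est


rng = random.Random(0)
for label, criterion in [("cue says left", 0.5), ("cue says right", -0.5)]:
    d_est, c_est = simulate_cue_condition(1.5, criterion, 20_000, rng)
    print(f"{label}: d' = {d_est:.2f}, c = {c_est:.2f}")
```

In this toy observer, the cue changes how often “right” is reported while the estimated sensitivity stays put. That is the signature of Option 2: the report changes, the signal doesn’t.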

Of course, this is a single study and we need more evidence before completely rejecting the idea that expectations alter the sensory signal (Option 1). But right now, I’d put my money on expectations leaving the sensory signal unchanged and simply altering our decision processes (Option 2).

In other words, if our results generalize, it means that you’re not actually seeing the ambiguous color as green; you’re just deciding that it must be green.

Confidence leak

What determines if you are confident in your decisions? In the perceptual domain, the standard answer to this question is usually “signal strength.” The idea is that we are more confident for larger, brighter, less noisy stimuli. Beyond that, however, our previous decisions also have an impact: after making a decision with high confidence, we are more likely to be very confident again. This is an example of the ubiquitous sequential effect that turns up everywhere in perception.

But what if your previous decision was for a different task altogether? Does your confidence on a completely different task still influence you when you are deciding on your current task?

In a paper that just came out in Psychological Science (data and code here), we show that sequential effects of confidence appear even between different tasks. We called this effect confidence leak.

The task used in 2 of the 5 experiments.

Our design was simple: present 40 letters (Xs and Os) in different colors (red or blue), and then ask participants to judge whether there are more Xs or Os, and whether there are more blue or red letters. Even though the processing of letter form and color tends to be separated in the visual system, we find that, on a trial-by-trial basis, confidence on one task predicts the confidence on the other.
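Here is a toy illustration of what a trial-by-trial relationship of this kind looks like. To be clear, this is not the analysis or the model from the paper: it is a made-up generative sketch in which the evidence for the two tasks is completely independent, and confidence on the second task is pulled toward the confidence just given on the first by a hypothetical `leak` weight. All names and values are assumptions for illustration.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confidence_pairs(leak, n_trials=5000, seed=0):
    """Return per-trial confidence for the letter task and the color task."""
    rng = random.Random(seed)
    conf_letter, conf_color = [], []
    for _ in range(n_trials):
        # Confidence on the first task reflects only its own evidence.
        first = rng.gauss(0.0, 1.0)
        # The second task has its own, independent evidence ...
        own = rng.gauss(0.0, 1.0)
        # ... but its confidence is partly pulled toward the first rating.
        second = (1 - leak) * own + leak * first
        conf_letter.append(first)
        conf_color.append(second)
    return conf_letter, conf_color


for leak in (0.0, 0.3):
    letter, color = simulate_confidence_pairs(leak)
    print(f"leak = {leak}: r = {pearson_r(letter, color):.2f}")
```

With no leak, the two confidence ratings are uncorrelated because nothing is shared between the tasks; any positive leak produces a positive trial-by-trial correlation even though the stimuli themselves carry no common signal.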

A number of analyses and control experiments confirmed that the effect is not due to trivial reasons such as motor priming, attentional fluctuations, or confidence drifts. What really convinced us, though, was our causal manipulation. We increased the difficulty of one of the tasks, so that confidence decreased. Lo and behold, we found a corresponding decrease in the confidence in the other task. In other words, the lowered confidence in the first task leaked into the second task.

Why does confidence leak happen? We think that people believe that the environment is consistent (autocorrelated) and thus an easy trial ought to be followed by another easy trial, even if the tasks are different. We constructed a Bayesian model that demonstrates how this assumption would naturally produce the observed confidence leak.

Who are the people that show more of this (suboptimal) confidence leak? In a separate experiment, we found that people with lower gray matter volume in the anterior prefrontal cortex (aPFC) exhibited higher confidence leak and so did people with lower metacognitive sensitivity.

It is interesting to consider the limits of the phenomenon. Would confidence leak appear for tasks that involve different modalities (e.g., vision and audition)? Would it happen for memory or general knowledge tasks? If our model is correct, as long as people are willing to assume that the difficulty of different tasks is correlated in the natural environment, confidence leak would invariably follow.