Category Archives: Evaluation


At least in a professional setting, my default position on assessment is that low stakes assessment should be the first option (where “low stakes” means there are no consequences for getting questions wrong, as with polling). Professionals should not need external incentives to take assessments seriously, and high stakes assessments carry a lot of overhead costs. The point of assessment should be to let participants and instructors gauge the participants’ level of understanding, and that can usually be done with low stakes assessments. (High stakes testing is still useful or necessary sometimes; it just shouldn’t be the default.)

This passage in Make It Stick gave me pause:

Make quizzing and practice exercises count toward the course grade, even if for very low stakes. Students in classes where practice exercises carry consequences for the course grade learn better than those in classes where the exercises are the same but carry no consequences. (p. 227)

Courses in professional environments don’t necessarily have grades, but that still raises the question of whether attaching some kind of consequence to practice questions in professional training would increase learning, as the authors assert.

To the extent that professionals who are participating in training should understand the direct relationship between what they are learning and their ability to grow in their careers, I’d imagine that external incentives or accountability shouldn’t be necessary. The exception is professionals who are only in the training because they have been required to attend, or to pick up credits for their licensure. In that case external incentives might make sense, but there are also bigger issues to solve.



I direct learning for a CPA firm. I’m not a CPA, but I feel like I learn a lot from them.

One concept that auditors talk a lot about is controls. Controls are processes, tools, and checkpoints that businesses have in place to guard against error and fraud. For instance, if a large transaction requires the signature of the CFO, that’s a control. Password-protecting critical financial systems is a control.

In short, controls are a concept that auditors understand because auditors know that businesses with poor controls in place are going to be a lot harder to audit.

I’ve used controls as a way to explain the importance of measuring mastery of learning objectives. When an auditor–indeed, most any professional–is asked to design a course for less experienced professionals, their default is typically to treat it more like a presentation than a course, with little interactivity and no means for instructors to assess how well learners grasp the material before moving on to the next topic.

One could argue in good faith that it is the learner’s responsibility to learn: that a professional who is struggling should recognize that reality and take steps to address it. In reality, relying on that puts the firm at risk.

So when I talk about introducing checkpoints and polling questions and case studies, I sometimes talk about them in terms of controls. Without those elements built into the course, we have no way of knowing if a course was effective (and more formatively, instructors will have no way of knowing whether what they are doing is working or whether they need to do something else).

Auditors know what separates a strong control from a weak one, so this becomes a powerful way to make the case for investing in classroom activities that provide evidence of learning.

Writing Exam Questions Changes Your Perspective

It’s interesting to think about and debate the effects of exams on learners, but what are the effects on course developers?

I was helping a couple of SMEs create exam questions recently, and the dialog between the SMEs was really interesting. “Do you think we do enough in the course to really help learners understand that concept? Maybe we’ll need another example. Do you think we make that point clearly enough?” And so on. Crafting exam questions really made them think hard about what they were teaching and how they were teaching it. I don’t always see this kind of introspection sparked, but it is a lot of fun when I do.

Back when I was first learning how to create instructional designs, I was taught to write the objectives first, then the exam questions, and only then start to design the course. The idea is that if you have difficulty writing exam questions, you may lack clarity around your objectives. It was great advice, and it has saved me a ton of design time over the years.

The Future: Brain Monitoring

Michael Allen brings up the possibility (p. 146) that brain monitoring during instruction may not be that far away. There’s an interesting thought. What if we could monitor the brain directly during instruction to tell true engagement levels?

This may sound invasive or creepy, but what about as a personal learning tool? Metacognition is not easy; we don’t always realize when we aren’t learning very efficiently, so what if there was a machine that could measure our current level of learning? It could signal us that it’s time to take a break, or shut off distractions, or try a different learning approach.

From there, it’s not a long leap to elearning that can respond to real time monitoring of learning efficiency to make instructional choices for us, or gently suggest it’s time to take a break.

In terms of classrooms, it’s not hard to imagine a classroom setting where learners would want instructors to have (probably aggregated, anonymized) access to real time data about engagement if it could lead to a better classroom experience. Maybe! It’s interesting to think about (acknowledging that it is also interesting to think about the myriad privacy concerns, slippery slope possibilities, dangers of blurring the line between thought and algorithms, etc.).

Application-Based Assessment Questions

We’ve been spending time at the firm experimenting with high stakes testing–that is, end-of-course exams that carry consequences for failure. I’m leading a working group to revise our policies based on what we’ve learned.

One recommendation I brought to the group was that all high stakes tests have to feature application-based questions for at least half the questions. The idea here is that application-based questions better test usable knowledge. Also, we want to test knowledge acquired, not the ability to run keyword searches in the participant guide (our tests are open book).

(An application-based question is one that asks test takers to apply knowledge to realistic situations. For instance, the question, “Which of these is the best response to an angry customer who claims that she did make a reservation even though none shows up in the system and there are no tables open?” is application-based. Asking, “All of these are critical principles for dealing with angry customers EXCEPT:” is not, because it is abstract. You could get the question right by memorizing a list.)

Developing good assessment questions is hard, and this recommendation will create significant work for some of the busiest people at the firm. However, very much to their credit, they quickly gelled around the position that testing shouldn’t be a compliance exercise; if we are going to test learners, we should write tests that, to the best of our ability, actually assess whether learners can apply what they learned in class to real world situations.

Course Assessment Based on Appreciative Inquiry

Last week I wrote about using solution-focused brief therapy as inspiration for the kinds of questions leaders should be asking their direct reports, according to author David Rock. I was following up on a second model that inspired Rock, called appreciative inquiry, when I came across this article talking about using appreciative inquiry as a model for course evaluation.

Appreciative inquiry is built around using questions to focus on the positive: what have you accomplished, and how can we build on those areas of success?

As a course evaluation model, appreciative inquiry suggests that course evaluation doesn’t have to be an inventory of good and bad, but a critical search for what was accomplished and how even greater successes can be built on that foundation. The model as applied by the evaluators in the article above doesn’t mean ignoring problems, but instead viewing them critically as obstacles to doing more of the good.

Meaningfully, the model suggests that all, or at least as many as possible, learners should participate in this dialog–both, I think, to get learners to reflect on the positives in the course, leaving them presumably more likely to feel good about the knowledge they acquired and thus more likely to apply it, and to create a sense that learning is an active partnership between learners and instructors/designers.

An interesting twist that puts an inherently positive and optimistic spin on course evaluation…

Distracted Learning Index

For fun at one of our internal conferences last week I started measuring how many learners in the course were visibly displaying non-course information on a device–in other words, how many participants were multitasking. I initially called this measure the partial-tasking index, but it seems silly to invent another word for multitasking, even if that word is misleading because it implies success doing more than one thing at a time, which is very difficult to do unless the multitasker has achieved automaticity in one of the tasks. One of my colleagues pointed out a parallel to distracted driving, suggesting there should be a distracted learning index.

I took two readings per class of the percentage of multitaskers, then averaged the two scores. The best multitasking score I saw for a class was 3%. The worst score was 29%.
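For anyone who wants to try the same measurement, here is a rough sketch of the arithmetic in Python. It assumes each reading is a simple head count of visibly multitasking learners against everyone in the room. The function names and the sample counts are made up for illustration; they are not data from the conference.

```python
from typing import Iterable, Tuple


def multitasking_share(distracted: int, present: int) -> float:
    """Percentage of learners visibly showing non-course content on a device."""
    if present <= 0:
        raise ValueError("present must be a positive head count")
    if not 0 <= distracted <= present:
        raise ValueError("distracted must be between 0 and present")
    return 100.0 * distracted / present


def distracted_learning_index(readings: Iterable[Tuple[int, int]]) -> float:
    """Average the per-reading percentages for one class.

    Each reading is a (distracted, present) pair of head counts.
    """
    shares = [multitasking_share(d, p) for d, p in readings]
    if not shares:
        raise ValueError("at least one reading is required")
    return sum(shares) / len(shares)


# Hypothetical class: 2 of 40 learners multitasking early on, 4 of 40 later.
example_readings = [(2, 40), (4, 40)]
print(distracted_learning_index(example_readings))
```

Averaging the percentages rather than pooling the raw counts keeps each reading equally weighted even if people drift in or out between readings.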

I should acknowledge here that the correlation between learning and a good distracted learning index score is probably pretty low. Just because learners are not interacting with a device doesn’t mean they are learning. On the other hand, a poor index score probably is indicative of a problem, particularly if the score gets worse during the class. It’s just a data point, an easy one to gather that is interesting to compare against other courses.

One of my fears in taking this measurement at all is that it could be misinterpreted as a call to eliminate connected technology in the classroom (if there are no distractions, learners will be forced to pay attention). It is true that courses at this conference that had electronic participant materials had, on average, more multitasking. However, some courses with electronic materials scored well on the index, so electronic materials don’t doom a course to distraction. Nor does a lack of laptops guarantee engagement, particularly since everyone at the conference has a little computer in their pocket that they can pull out whenever they are bored. Besides, technology-enhanced materials have too much upside for me to advocate a return to paper. Also, it wasn’t a fair comparison because the courses with no electronic participant guides were more often the ones with professional keynote-level speakers.

Another interesting point in the data is that large courses (>100 participants) had, on average, similar scores to small courses (around 30 participants). This is counterintuitive, as I’d expect the larger classes to offer a kind of anonymity that might encourage multitasking. Again, though, this might be an apples-to-oranges comparison, as the larger classes tended to feature professional speakers.