When Thinking Twice Helps — And When It Doesn't

Sat, 21 Mar 2026 00:00:00 +0000

The Saturday Morning Experiment

Last Saturday, I installed a Python library, pointed it at Amazon Bedrock, and asked a model each of three questions three times: once with zero, once with one, and once with three rounds of self-reflection.
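For readers who want a picture of what "rounds of self-reflection" can look like in code, here is a minimal sketch. It is not the library I used: the function name `answer_with_reflection`, the prompts, and the `generate` callable (any function that sends a prompt to a model and returns text) are all assumptions made for illustration.

```python
from typing import Callable


def answer_with_reflection(
    question: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
    rounds: int,
) -> str:
    """Return an answer after `rounds` critique-and-revise passes.

    rounds=0 is the straight answer; each extra round asks the model to
    critique its current answer and then rewrite it.
    """
    answer = generate(question)
    for _ in range(rounds):
        # Step 1: ask for a critique of the current answer.
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any errors or omissions in this answer."
        )
        # Step 2: ask for a revised answer that addresses the critique.
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer
```

In this sketch every round costs two extra model calls, so three rounds means seven calls in total. A real library may structure the loop differently.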

The results surprised me.

Q  Refl  Time  Acc  Comp  Nuan  Total
1  0     3.0s  4    3     3     10
1  1     5.5s  4    2     3     9
1  3     8.8s  4    3     4     11
2  0     2.6s  4    2     2     8
2  1     5.7s  4    2     2     8
2  3     8.5s  4    2     2     8
3  0     3.1s  1    1     1     3
3  1     5.2s  1    1     1     3
3  3     8.6s  1    1     1     3

Q is the question number, Refl the number of self-reflection rounds (0 = straight answer, 1 = one revision, 3 = three revisions). Acc, Comp, and Nuan are the judge’s scores for Accuracy, Completeness, and Nuance, each on a 1-5 scale, 15 max total.
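To make the comparison concrete, the short script below recomputes each Total from the three sub-scores and shows how much each question gained between zero and three reflection rounds. The rows are copied from the table above; the names `ROWS`, `totals`, and `gain_per_question` are mine and exist only for this illustration.

```python
# Rows copied from the table: (question, reflections, seconds, acc, comp, nuan)
ROWS = [
    (1, 0, 3.0, 4, 3, 3),
    (1, 1, 5.5, 4, 2, 3),
    (1, 3, 8.8, 4, 3, 4),
    (2, 0, 2.6, 4, 2, 2),
    (2, 1, 5.7, 4, 2, 2),
    (2, 3, 8.5, 4, 2, 2),
    (3, 0, 3.1, 1, 1, 1),
    (3, 1, 5.2, 1, 1, 1),
    (3, 3, 8.6, 1, 1, 1),
]


def totals(rows):
    """Map (question, reflections) to Accuracy + Completeness + Nuance."""
    return {(q, r): acc + comp + nuan for q, r, _, acc, comp, nuan in rows}


def gain_per_question(rows, base=0, top=3):
    """Change in total score when going from `base` to `top` reflection rounds."""
    t = totals(rows)
    questions = sorted({row[0] for row in rows})
    return {q: t[(q, top)] - t[(q, base)] for q in questions}


print(gain_per_question(ROWS))  # {1: 1, 2: 0, 3: 0}
```

Only question 1 moved at all, and only by one point out of 15, while every question took more than five extra seconds with three rounds.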

schristoph.online
