Field Sobriety Tests and the Problem with “Accuracy” in Michigan DUI Cases
When I evaluate a Michigan drunk driving case, I often begin with a simple question: what did the officer actually observe, and what do those observations reliably prove? That question matters because field sobriety testing often takes on a life of its own in an OWI case. A driver may be nervous, tired, confused by instructions, standing on an uneven roadside, dealing with lights and traffic, or simply performing poorly on unfamiliar physical exercises. Yet the police report may reduce all of that human complexity to a set of “clues” and a conclusion that the driver “failed” the Standardized Field Sobriety Test battery.
The Standardized Field Sobriety Test battery, commonly called the SFST battery, generally refers to three tests: Horizontal Gaze Nystagmus, the Walk and Turn, and the One Leg Stand. These tests are taught in police training as tools to help officers decide whether a motorist may have a bodily alcohol content above a target level. In court, however, the testimony is often presented in a more forceful way. Officers may repeat familiar accuracy numbers from older validation studies, suggesting that the SFST battery is “91 percent accurate,” or that individual tests reliably identify intoxicated drivers. Those numbers can sound scientific. The problem is that they may not mean what jurors, judges, lawyers, or even officers assume they mean.
Greg Kane, MD, and Elizabeth Kane addressed this problem in a 2021 peer-reviewed article published by Oxford University Press in Law, Probability and Risk. Their thesis is direct: the high reported accuracy of the SFST battery is not really a property of the test. It is a property of the statistic used to describe the test. Stated more plainly, the familiar courtroom accuracy numbers can be produced by the way the data is counted, rather than by the test’s actual ability to separate impaired drivers from unimpaired drivers.
That distinction matters in Michigan because an OWI case is not an abstract statistics exercise. Under MCL 257.625(1), operating while intoxicated requires proof that the defendant operated a motor vehicle on a highway or other place open to the general public or generally accessible to motor vehicles while under the influence of alcohol, a controlled substance, or a combination of substances, or while having an unlawful bodily alcohol content. People v Hyde, 285 Mich App 428, 447-448 (2009). Michigan also recognizes operating while visibly impaired under MCL 257.625(3), which requires proof that, because of alcohol, a controlled substance, another intoxicating substance, or a combination, the person’s ability to operate the vehicle was visibly impaired. These are legal standards, not training slogans.
The Kane article is important because it reexamines the original Stuster and Burns 1998 data set, one of the central studies used to support SFST accuracy claims. As the authors report, they obtained the original data set directly from Dr. Jack Stuster and cross-verified it against a second copy obtained by Dr. Michael Hlastala through a FOIA request to NHTSA. The two files were identical. That point is significant because the analysis did not depend on speculation, reconstruction, or selective use of outside data. It used the actual data underlying the validation claims.
The data set contained records for 297 drivers, with complete HGN, Walk and Turn, and One Leg Stand results for 261 drivers. Of those 261 drivers, 242 failed at least one subtest under the Stuster and Burns criteria, and only 19 passed all three. Kane and Kane then applied Stuster and Burns’ own pass-fail criteria. They did not move the goalposts. They used the same clue-count cutoffs (HGN at four or more clues, Walk and Turn at two or more clues, and One Leg Stand at two or more clues) and recalculated reported accuracy across target BAC thresholds from 0.00 to 0.30 percent.
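For readers who want to see the scoring rule spelled out, the clue-count cutoffs described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample driver are hypothetical, and the sketch assumes that failing any one subtest counts as failing the battery, which is how the 242 and 19 figures above are framed.

```python
# Clue-count cutoffs as described in the text: a subtest is "failed"
# when the observed clue count meets or exceeds its cutoff.
CUTOFFS = {"hgn": 4, "walk_and_turn": 2, "one_leg_stand": 2}


def failed_subtests(clues):
    """Return the names of the subtests whose clue count reaches the cutoff."""
    return [name for name, cutoff in CUTOFFS.items() if clues[name] >= cutoff]


def fails_battery(clues):
    """Assumption: failing at least one subtest counts as failing the battery."""
    return bool(failed_subtests(clues))


# Hypothetical driver, not a record from the Stuster and Burns data set.
driver = {"hgn": 2, "walk_and_turn": 3, "one_leg_stand": 0}
print(failed_subtests(driver), fails_battery(driver))
```

Under this rule a single subtest at or above its cutoff is enough to be counted as a failure, which helps explain why so few of the 261 drivers passed all three.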
The result is the kind of finding that should make any careful lawyer pause. The reported arrest accuracy of the SFST changed dramatically depending on the target BAC threshold. At a target BAC of 0.00 percent, the arrest accuracy was reported as 100 percent. At 0.04 percent, it was 93 percent. At 0.08 percent, it was 78 percent. At 0.10 percent, it was 69 percent. At 0.15 percent, it fell to 34 percent. At 0.24 percent, it was 5 percent. At 0.30 percent, it was 1 percent.
That pattern is backwards from what most people would expect. A test supposedly designed to detect alcohol-related impairment should not appear more accurate at very low thresholds and less accurate at extremely high thresholds. If field sobriety testing truly measured alcohol impairment in a straightforward way, one would expect the test to perform better as the BAC increased. Kane and Kane explain that the paradox arises because the statistic being used, often positive predictive value or overall accuracy, depends heavily on the prevalence of the condition in the tested population and on how the “true positive” category is defined.
This is not just a technical debate for statisticians. It has direct courtroom consequences. If an officer tells a jury that a field sobriety test is highly accurate, the jury may hear that statement as meaning the test has strong diagnostic power. But Kane and Kane show that similar accuracy patterns can be produced by a coin toss or even by a test in which everyone fails. Their “Randomized Sobriety Test” randomly assigned simulated drivers to high or low BAC status. Their “San Diego All-Fail Sobriety Test” treated every driver as failing. Both controls produced comparable patterns. That means the reported accuracy figures do not necessarily demonstrate that the SFST battery itself is doing meaningful diagnostic work.
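The logic behind those controls can be illustrated with a short simulation. The sketch below is not a reproduction of Kane and Kane's analysis. It uses made-up BAC values drawn uniformly between 0.00 and 0.30, and the names `arrest_accuracy`, `all_fail`, and `coin_toss` are hypothetical. It assumes "arrest accuracy" means the share of drivers who failed the test whose BAC is at or above the target threshold.

```python
import random

THRESHOLDS = (0.00, 0.04, 0.08, 0.10, 0.15, 0.24, 0.30)


def arrest_accuracy(bacs, failed, threshold):
    """Share of drivers flagged as failing whose BAC is at or above the threshold."""
    flagged = [bac for bac, did_fail in zip(bacs, failed) if did_fail]
    if not flagged:
        return float("nan")
    return sum(bac >= threshold for bac in flagged) / len(flagged)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical BAC values, NOT the Stuster and Burns data.
bacs = [round(rng.uniform(0.0, 0.30), 3) for _ in range(1000)]

all_fail = [True] * len(bacs)                    # every driver "fails"
coin_toss = [rng.random() < 0.5 for _ in bacs]   # pass or fail by coin flip

for threshold in THRESHOLDS:
    a = arrest_accuracy(bacs, all_fail, threshold)
    c = arrest_accuracy(bacs, coin_toss, threshold)
    print(f"target {threshold:.2f}  all-fail {a:.0%}  coin-toss {c:.0%}")
```

Neither of these sham tests looks at the driver at all, yet both report 100 percent at a target of 0.00 and fall steadily as the target rises. For any fixed set of failures, the share at or above the threshold can only shrink as the threshold goes up, so the downward pattern reflects who was in the tested group rather than what the test detected.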
The more useful statistical measure discussed by Kane and Kane is the likelihood ratio. A likelihood ratio is less dependent on prevalence and is commonly used in diagnostic reasoning to evaluate how much a test result changes the probability of a condition. Every likelihood ratio reported for the SFST battery fell between 1.08 and 1.87. At a target BAC of 0.08 percent, the likelihood ratio was 1.35. In medical diagnostic terms, a likelihood ratio in that range produces only a small change, and rarely an important one. In ordinary courtroom terms, a failed SFST may add much less information than the traditional “accuracy” testimony implies.
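To make the contrast concrete, here is a minimal sketch of how a positive likelihood ratio is computed from a two-by-two table, using the standard definition of sensitivity divided by one minus specificity. The counts are invented for illustration and are not taken from the Kane article or the Stuster and Burns data. The function names are hypothetical.

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity)."""
    sensitivity = tp / (tp + fn)          # share of over-threshold drivers who fail
    false_positive_rate = fp / (fp + tn)  # share of under-threshold drivers who fail
    return sensitivity / false_positive_rate


def positive_predictive_value(tp, fp):
    """Share of failing drivers who are actually over the threshold."""
    return tp / (tp + fp)


# Hypothetical group A: 100 drivers over the threshold, 100 under it.
lr_a = positive_likelihood_ratio(tp=90, fn=10, fp=60, tn=40)
ppv_a = positive_predictive_value(tp=90, fp=60)

# Hypothetical group B: same test behavior, but ten times as many
# over-threshold drivers in the tested population.
lr_b = positive_likelihood_ratio(tp=900, fn=100, fp=60, tn=40)
ppv_b = positive_predictive_value(tp=900, fp=60)

print(f"group A: LR+ {lr_a:.2f}, PPV {ppv_a:.0%}")
print(f"group B: LR+ {lr_b:.2f}, PPV {ppv_b:.0%}")
```

In this idealized example the test behaves identically in both groups, so the likelihood ratio stays the same, while the predictive value climbs simply because more of the tested drivers were over the threshold to begin with.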
Kane and Kane’s Bayesian analysis reinforces the point. If 10 percent of drivers being tested actually have a BAC above 0.08 percent, failing the SFST raises the probability to only 13 percent. If 50 percent of drivers tested have an elevated BAC, failing the SFST raises the probability to only 57 percent. Those changes are modest. They do not support the common courtroom impression that a failed SFST, standing alone, strongly identifies a person as intoxicated, over the legal limit, or visibly impaired.
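The arithmetic behind those figures can be checked with the odds form of Bayes' rule: convert the starting probability to odds, multiply by the likelihood ratio, and convert back. The sketch below uses the 1.35 likelihood ratio reported at a 0.08 target. The function name is hypothetical.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio using odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


for prior in (0.10, 0.50):
    post = posterior_probability(prior, 1.35)
    print(f"prior {prior:.0%} -> posterior {post:.0%}")
# prints:
# prior 10% -> posterior 13%
# prior 50% -> posterior 57%
```

The same function can be used to test other starting assumptions about how many tested drivers are over the limit.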
The article also addresses an issue I consider especially important in Michigan practice: the difference between BAC and impairment. The Stuster and Burns validation work measured accuracy against BAC, not against actual driving impairment. That distinction matters. A per se alcohol case and an impairment-based case are related, but they are not the same thing. In an OWI prosecution under MCL 257.625(1), the prosecutor may proceed under an unlawful bodily alcohol theory or an under-the-influence theory. In an OWVI case under MCL 257.625(3), the question is visible impairment of the ability to operate. A test allegedly validated to estimate whether a person is above a BAC threshold is not automatically validated to prove that a person’s driving ability was visibly impaired.
In my practice, this distinction affects how I evaluate police testimony. A field sobriety test may have some relevance to an officer’s investigation. It may be part of the totality of circumstances. But relevance is not the same as reliability, and reliability is not established by repeating a percentage from a training manual. If a prosecution witness wants to present field sobriety testing as scientific, technical, or specialized evidence, then the reliability of the method matters.
Michigan law supplies the framework for that challenge. MRE 702 requires expert testimony to assist the trier of fact, to be based on sufficient facts or data, to be the product of reliable principles and methods, and to reflect a reliable application of those principles and methods to the facts of the case. In Gilbert v DaimlerChrysler Corp, 470 Mich 749 (2004), the Michigan Supreme Court explained that expert testimony must be evaluated for assistance to the trier of fact, qualification, reliable data, reliable principles and methods, and reliable application. A field sobriety “accuracy” claim that rests on prevalence-dependent statistics may deserve close scrutiny under that rule.
The same analysis can matter before trial. Michigan recognizes different levels of police-citizen encounters. An informational encounter requires no level of cause. An investigatory detention requires specific and articulable facts sufficient to give rise to reasonable suspicion. An arrest requires probable cause. People v Shabaz, 424 Mich 42, 56-59 (1985); Terry v Ohio, 392 US 1 (1968). In a DUI investigation, field sobriety testing is often used to move the case from roadside suspicion to arrest and chemical testing. If the SFST battery adds only modest diagnostic value, then the defense should carefully examine what other facts allegedly supplied probable cause.
This does not mean that field sobriety evidence is automatically inadmissible, automatically irrelevant, or automatically insufficient in every case. That would overstate the argument. Courts evaluate facts case by case. An officer may observe driving behavior, odor, speech, admissions, appearance, divided attention issues, physical coordination, open containers, crash facts, or other evidence. The legal question is not whether a defense lawyer dislikes field sobriety testing. The legal question is what the evidence reliably proves and whether the officer, prosecutor, or expert is overstating it.
Cross-examination should therefore be concrete. The officer can be asked what accuracy statistic was used, what population produced that statistic, whether the officer knows the prevalence of elevated BAC in that study population, whether the study measured BAC rather than impairment, whether the defendant’s roadside conditions matched the validation conditions, and whether the officer understands the difference between positive predictive value and likelihood ratio. The point is not to turn the trial into a mathematics lecture. The point is to prevent a statistical artifact from being presented as a scientific fact.
The Walk and Turn and One Leg Stand tests also require careful factual review. These are divided-attention exercises performed in artificial circumstances. A person is asked to listen, remember, balance, count, turn, and follow instructions while under police observation. Performance can be affected by footwear, surface, lighting, weather, age, injury, fatigue, nervousness, language comprehension, and instruction quality. Even when a test is standardized in a training manual, the actual roadside administration may not be standardized at all.
The Horizontal Gaze Nystagmus test raises a different set of issues. HGN is often treated as the most scientific of the three tests, but that treatment can lead to overstatement. If the prosecution offers HGN testimony as evidence of alcohol-related nystagmus, the defense should examine training, administration, medical exclusions, timing, lighting, stimulus position, equal tracking, resting nystagmus, maximum deviation, onset angle, and whether the witness is drawing a conclusion beyond the limits of the test. A courtroom should not permit the phrase “validated test” to substitute for a careful foundation.
The Kane article also has implications for drugged driving cases and Drug Recognition Expert testimony. The DRE protocol incorporates field sobriety exercises as part of a broader evaluation. If the underlying SFST validation claims are weaker than commonly represented, then that weakness does not disappear merely because the tests are embedded inside a larger protocol. In an OUID case, the defense should examine whether the prosecution is using alcohol-based validation studies to support conclusions about drugs, impairment, or categories of substances.
For clients, the practical lesson is straightforward. A police report saying that the driver “failed field sobriety tests” is not the end of the case. It is the beginning of the analysis. I want to know which tests were given, whether they were properly instructed, whether they were properly demonstrated, whether the scoring was accurate, whether the video supports the report, whether the officer counted clues that do not exist, whether the officer ignored normal performance, and whether the test was used to prove more than it can fairly prove.
For lawyers, the lesson is equally direct. The defense should not accept SFST accuracy testimony at face value. The Kane and Kane analysis gives counsel a principled, peer-reviewed basis to challenge inflated claims about the SFST battery. That challenge may arise in a motion in limine, an MRE 702 hearing, a suppression motion, cross-examination, or expert testimony. The best use will depend on the facts, the charge, the judge, the available video, the chemical test evidence, and the precise way the prosecution intends to use the field sobriety evidence.
Michigan DUI cases are often built from small observations stacked into a conclusion. Some observations may be meaningful. Others may be ambiguous. Field sobriety testing sits directly in that zone. It may look scientific because it uses standardized language and numerical clue counts. But the appearance of standardization is not the same as proof of diagnostic accuracy. Kane and Kane’s work reminds us that the legal system should ask a more disciplined question: does this evidence actually make the prosecution’s claim meaningfully more probable, or does it merely sound more certain than it is?
When I review a Michigan OWI or OWVI case, I do not assume that field sobriety tests are meaningless. I also do not assume that they prove intoxication, impairment, or a bodily alcohol level. They must be examined like any other evidence, with attention to the law, the science, the statistics, the officer’s training, the roadside conditions, and the video. A careful defense does not begin by trusting the label “failed.” It begins by testing whether the label is accurate, whether the science supports the conclusion, and whether the prosecution is asking the evidence to carry more weight than it can bear.


