Josh Johnston is a research scientist and visualization expert for the Division of Research at Boise State. He holds an M.S. in robotics from Carnegie Mellon University.
The Boise State Broncos, picked in the preseason to win the Mountain West Conference, took the court at Colorado State on February 10, having lost three of their last four league games in the midst of an uncharacteristic shooting slump and a slew of turnovers. Despite a choppy game and losing two big men to fouls, the Broncos appeared to have exorcised their mid-season demons and put themselves in position to win on the road, possibly salvaging their last chance at an NCAA at-large bid.
A well-defended CSU miss at the end of regulation was followed by even play and another tie almost five minutes later. Boise State’s defense again held off CSU, resulting in an over-and-back violation that occurred with about a second and a half left, but wasn’t whistled until the clock showed 0.8 seconds, with the score tied 84-84. The sport scientists tell us 0.4 seconds is enough for a tip and 0.6 enough for a catch and shoot, so there was still a chance to avoid a second overtime on the road.
A Bronco 3-pointer went in at the buzzer and was initially ruled to be good. After video review, however, the referees overturned their own decision, deciding it took 1.3 seconds to take the shot. An official statement by the Mountain West Conference cited numerous independent reviewers and affirmed the 1.3 second time. Many at home and in the media, however, timed the shot at around 0.65 seconds, agreeing with the New York Post that it was “The worst college basketball call you’ll see all season.”
In this analysis, I’ll explain how expert systems — game clocks, robotics, driverless cars — bias our perception of information, often leading us to incorrectly override our judgment and experience. In the referees’ decision and other examples, I’ll show how we often trust the machine even when our role as overseers of technology is explicitly to second-guess and apply our experience. We’ll see how this weakness undermines “human in the loop” safeguards for weaponized robots. Finally, I’ll apply these lessons to why the “human as the backup” paradigm for self-driving cars pursued by many developers is a dead end, rather than a shortcut while we wait for full driverless autonomy to arrive.
Back to the game: With 0.8 sec on the clock, Coach Leon Rice calls the play, which has super-senior Anthony Drmic, the smartest player on the team, feed dunk-machine James Webb III, the most athletic player to ever take the court in a Broncos jersey, up the lane. Unable to get open, Webb instead runs up the right side where Drmic hits him for a flying 3-point shot that miraculously goes in, the third Bronco 3-pointer of the game to use the backboard. Having regained the confidence fractured by tough early losses against Michigan State and Arizona (twice), the Broncos are poised to repeat last year’s late run, where they came from behind to win the Mountain West championship.
Except, after an eternity spent reviewing the seemingly simple play, the officials wave the shot off. The deflated Broncos start the second overtime with a foul on the tip and give up a 9-0 CSU run. After the loss, the team is left even more disappointed and confused, without answers as to why Rice’s most talented team can’t meet its own lofty expectations.
Somehow, the referees overturned a shot so obvious that any number of us at home and, later, media outlets and amateurs on the internet timed it at somewhere between 0.6 and 0.7 seconds. The statement by the officials after the game inexplicably claimed this shot took 1.2 to 1.3 seconds. The following day, the Mountain West issued a statement reaffirming that “[i]t is clear 1.2 to 1.3 seconds elapsed…” and that:
[t]he Mountain West Coordinator of Officials, the NCAA National Coordinator of Officials, the NCAA Secretary-Rules Editor and the MW Conference office have reviewed the play extensively and consulted on the administration of the video review. It has been determined the game officials executed the appropriate protocol and made the correct call.
How could all of these highly trained professionals get this so wrong? As we look at how trust of technology blinded these officials to their own better judgment, we’ll see an important lesson for automation under human supervision and understand why Google is removing steering wheels from their prototype driverless cars.
This story is moving fast, and at the time of writing the Mountain West has admitted no hardware malfunctions. The reader, however, likely has the benefit of knowing why the timer in the conference’s own video evidence runs at twice the rate of the game clock. I’m left with my theories.
The replay system appears to be counting each frame as lasting twice as long as it should. I believe it involves mishandling the differences between interlaced and progressive scan video, the largely ignored “i” or “p” in formats like 1080i and 720p. An interesting technological meme, interlaced video finds its roots in the 1920s, when the low frame rate of motion pictures was masked by flashing each frame on the screen three times, eliminating visible flicker.
Later, CRT television feeds were standardized to display half the lines of resolution in one frame, then the other half in the next. In the US, the rate was set at 60 Hz per half frame (30 Hz per complete frame) to match the rate of studio lights powered by 60 Hz AC power. The European broadcast standard, like the European electrical grid standard, was 50 Hz/25 Hz.
Now that we don’t use CRTs for our displays, the standard interlaced signal is deinterlaced by the television to form complete frames at half the rate. LCDs display whole frames in a process called progressive scan. A 60 Hz interlaced signal becomes a 30 Hz progressive scan signal when deinterlaced. Doing this right is particularly important during slow motion or pausing. Those of us old enough will remember pausing a VHS tape during motion and seeing a smeared image as if stuck between two frames. When playing at normal speed, we perceive this interlaced feed as continuous motion.
Depending on the implementation of the video replay system and the cameras it is connected to, there are several opportunities to mismatch expectations about whether video has been deinterlaced and whether frames were duplicated in that process. This could create double counting and make the stopwatch run at twice the speed it should.
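To make the double-counting hypothesis concrete, here is a minimal sketch in Python. It assumes a 60 Hz interlaced feed and a hypothetical replay stopwatch that advances by one full frame period (1/30 of a second) for every field it receives. The function names and the 0.65-second shot duration (the rough figure timed by viewers) are illustrative assumptions, not details of the actual replay system.

```python
FIELD_RATE_HZ = 60.0   # interlaced fields (half frames) per second in the US
FRAME_RATE_HZ = 30.0   # complete frames per second after deinterlacing


def count_fields(true_duration_s: float) -> int:
    """Number of interlaced fields captured during a play of the given length."""
    return round(true_duration_s * FIELD_RATE_HZ)


def correct_stopwatch(fields: int) -> float:
    """Elapsed time when each field is counted as 1/60 of a second."""
    return fields / FIELD_RATE_HZ


def double_counting_stopwatch(fields: int) -> float:
    """Hypothetical bug: each field is treated as a complete 1/30 s frame."""
    return fields / FRAME_RATE_HZ


if __name__ == "__main__":
    fields = count_fields(0.65)  # assumed true duration of the shot
    print(f"fields captured: {fields}")
    print(f"correct stopwatch: {correct_stopwatch(fields):.2f} s")
    print(f"double-counting stopwatch: {double_counting_stopwatch(fields):.2f} s")
```

Under these assumptions, a shot that really took 0.65 seconds spans 39 fields and would read 1.30 seconds on the faulty stopwatch, which is the size of the gap between what viewers timed and what the replay showed.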
Regardless of the cause of this malfunction, the takeaway is that technology results in complex systems that are full of incompletely-defined interfaces, unstated assumptions, and opportunities to be used in unexpected ways.
Autonomous and artificial intelligence systems are faster, more vigilant and have access to much more knowledge than a human. They are brittle, however, and tend not to fail gracefully when faced with unforeseen situations.
Humans, on the other hand, are adaptable and robust to unexpected or new challenges, though we tend to be distractible or let our focus wander. The temptation is to match automation and humans together, with the AI performing the dull, repetitive tasks and the human stepping in to resolve complex situations as they arise.
This is the approach of driverless cars in the mold of Tesla’s Autopilot, a mostly-hands-free steering system that occasionally requires the driver to step in and quickly resume driving. I’ve detailed my concerns about the technical readiness of this approach elsewhere, but am going to focus here specifically on the transition to human control. In driverless car circles, a mixed autonomy model is often called “human as the backup.” In military circles, it’s “human in the loop,” meaning a human is always providing oversight and has the ability and obligation to step in and correct a bad decision. We often hear human in the loop invoked as a firewall to prevent robots from making lethal decisions. Google “human in the loop warfare” and almost every result is a person saying some variant of “we must keep humans in the loop.”
In his book Wired for War, P.W. Singer ably demolishes the concept that human in the loop puts people in control as responsible overseers. The whole book is worth reading for anyone interested in this topic; here I repeat one vignette from it, about how sailors on the Aegis-equipped USS Vincennes shot down Iran Air Flight 655 despite its completely innocent flight profile.
[T]he Vincennes’s radars spotted Iran Air Flight 655, an Airbus passenger jet. The jet was on a consistent course and speed and was broadcasting a radar and radio signal that showed it to be civilian. The automated Aegis system, though, had been designed for managing battles against attacking Soviet bombers in the open North Atlantic, not for dealing with skies crowded with civilian aircraft like those over the Gulf. The computer system registered the plane with an icon on the screen that made it seem to be an Iranian F-14 fighter (a plane half the size), and hence an “Assumed Enemy.”
Even though the hard data were telling the human crew that the plane wasn’t a fighter jet, they trusted what the computer was telling them more. Aegis was [in] Semi-Automatic mode, giving it the least amount of autonomy. But not one of the 18 sailors and officers on the command crew was willing to challenge the computer’s wisdom. They authorized it to fire.
We are highly influenced by expert systems. In fact, while supervising an autonomous system flawlessly performing its repetitive tasks for hours on end, we’re being trained to trust its judgment. When our situational awareness is provided partially or entirely by this system, we’re further influenced by its assumptions, and willingly adopt them without necessarily understanding them.
We may believe we’re exercising judgment, but the data we use (an F-14 icon on a screen, or a stopwatch on a replay) are already prejudiced by the very technology we don’t trust to make the decision. A referee, or any other official reviewer of the Boise State shot, has spent his career with a whistle and stopwatch and trusts both without question.
The specific technology failure of the replay system will be sorted out, but there are more important implications for the design of human/machine systems.
Consider the referees, who watched Webb’s shot in real time and considered it good. Maybe they could look at the replay and decide it was a hair over, at 0.8 or 0.9 seconds. But how can they become so easily convinced that it actually took 1.3 seconds, more than 60 percent longer than the time on the clock, and twice the time they saw it take with their own eyes?
As a Boise State employee and Duke graduate scarred by the ending of the Duke-Miami football game last fall, I’d love to blame the refs. However, the robot scientist in me recognizes that most of us will disregard our own experience and direct information when they are contradicted by an expert system we trust.
In this case, the effect was strong enough for the referees to ignore their own eyes, the clock operator’s skill and their knowledge that a quick catch and shoot should take half the time the replay system claimed. As they know from the famous Duke-Kentucky game, Christian Laettner had time to catch a 3/4 court lob, dribble, leisurely turn around and shoot with less than 2.1 seconds elapsing.
Obviously Webb’s quick shot without turning wouldn’t take more than 60 percent of that time. Even the head of the NCAA officials can believe it does, however, when presented with a simple stopwatch at the bottom of the screen.
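The comparison is simple arithmetic, sketched here in Python using only the figures quoted in this article. The variable names are mine, and 0.65 seconds is the approximate time measured by viewers rather than an official number.

```python
replay_time_s = 1.3      # upper figure reported from the replay stopwatch
viewer_time_s = 0.65     # approximate time measured by viewers and media
laettner_time_s = 2.1    # clock time for Laettner's catch, dribble, turn and shot

# How far apart are the replay and what people timed by eye?
print(f"replay vs. viewers: {replay_time_s / viewer_time_s:.1f}x")

# What share of Laettner's much longer play does the replay claim Webb used?
print(f"replay as share of Laettner's play: {replay_time_s / laettner_time_s:.0%}")
```

The replay figure comes out at twice the viewers' figure and roughly 62 percent of the time available for Laettner's entire play.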
An example worthy of an article in its own right is Air France Flight 447, which crashed into the Atlantic Ocean when its pilots failed to safely take control of the aircraft after the autopilot disengaged. The pitot tubes were likely obstructed, providing faulty airspeed information to the autopilot. The pilots became disoriented after taking control because they relied on the same information for their situational awareness.
Unable to adapt to the control transition or gain situational awareness, the pilots unwittingly commanded a nose up orientation, entered a high altitude stall, and dropped into the ocean believing they were actually nose down.
Given these failures by trained professionals on the job, what are the reasonable expectations we can have for a “human as the backup” in a driverless car? My belief is that we cannot have any. A car driving itself will by design or by side effect allow the supervising driver to disengage from the process. When he must take control, the transition requires him to regain focus, build situational awareness of his vehicle and its surroundings, identify a strategy to overcome the situation that confused his autopilot, and execute it well. Some argue this can be overcome by longer transition times, but a car able to drive itself long enough to cover all drivers and circumstances is effectively a car that can operate on its own indefinitely.
Supervised autonomy, having the autonomous system perform routine tasks under oversight of a human operator, is in many ways harder than purely autonomous operation.
Many car companies are finding success by inverting the formula, with the human driver operating the car most of the time, only to be corrected by a supervising autonomous agent. Represented by active safety technologies like lane keeping assist and collision mitigation braking, this approach captures the safety benefits of autonomy but not the convenience and leisure.
Companies that responsibly seek to take the driver’s hands off the wheel (sorry Tesla) are warming to the idea that they must remove the wheel itself. This is Google’s position. Their engineers are even skeptical of an override button, likening it to the one in an elevator that only ever gets pressed by accident or by small children.
After Iran Air Flight 655 was shot down and after Air France Flight 447 crashed, the Navy and airlines introduced training to restore skepticism toward technological decision aids and to reemphasize direct information and human reasoning.
The Mountain West and NCAA should also consider how technology and the review process fit into their decision processes both in-game and in post-game evaluation. (As a fan, may I vote for fewer and shorter reviews?) Those developing technology or public policy for driverless cars should also reconsider the temptation to expect the operator to be a rational and ready replacement for brittle technology.
And regardless of how the Mountain West chooses to count the result, the Broncos can know that according to human judgment, they got the shot off.