Robotics: How machines see the world

Can you tell the difference between a human and a soda can? For most of us, distinguishing an average-sized adult from a five-inch-high aluminium can isn’t a difficult task. But to an autonomous robot, they can both look the same. Confused? So are the robots.

Last month, the UK government announced that self-driving cars would hit the roads by 2015, following in the footsteps of Nevada and California. Soon autonomous robots of all shapes and sizes – from cars to hospital helpers – will be a familiar sight in public. But in order for that to happen, the machines need to learn to navigate our environment, and that requires a lot more than a good pair of eyes.

Robots like self-driving cars don’t only come equipped with video cameras for seeing what we can see. They can also have ultrasound – already widely used in parking sensors – as well as radar, sonar, laser and infrared. These machines are constantly sending out flashes of invisible light and sound, and carefully studying the reflections to build a picture of their surroundings – pedestrians, cyclists and other motorists. You’d think that would be enough to get a comprehensive view, but there’s a big difference between seeing the world and understanding it.
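
To get a feel for what those reflections actually tell a robot, here is a minimal sketch of the idea behind active sensing. The propagation speeds are real physical constants, but the echo times and the function name are invented for illustration; real sensors do far more filtering than this.

```python
# Minimal sketch of active sensing: emit a pulse, time the echo, and convert
# the delay into a distance. The echo times below are invented for illustration.

SPEED_OF_SOUND_M_S = 343.0          # ultrasound pulses travel at the speed of sound in air
SPEED_OF_LIGHT_M_S = 299_792_458.0  # radar and laser pulses travel at the speed of light

def range_from_echo(round_trip_seconds: float, propagation_speed_m_s: float) -> float:
    """The pulse travels out to the object and back, so the one-way
    distance is half of the round trip."""
    return propagation_speed_m_s * round_trip_seconds / 2.0

# An ultrasonic parking sensor hears its echo 6 milliseconds after the ping:
print(range_from_echo(0.006, SPEED_OF_SOUND_M_S))    # ~1.0 m to the obstacle
# A laser pulse comes back 200 nanoseconds after it was fired:
print(range_from_echo(200e-9, SPEED_OF_LIGHT_M_S))   # ~30 m to the obstacle
```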

Which brings us back to the confusion between cans and pedestrians. When an autonomous car scans a person with its forward-facing radar, the return shows much the same reflectivity as a soda can, explains Sven Beiker, executive director of the Center for Automotive Research at Stanford University. “That tells you that radar is not the best instrument to detect people. The laser or especially camera are more suited to do that.”
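
A hypothetical sketch of what Beiker is getting at: radar reliably reports that something reflective is out there, but it takes a camera-based classifier to say what it is. The threshold, labels and function below are invented purely to illustrate that division of labour, not any carmaker’s actual logic.

```python
# Toy illustration: radar confirms that an obstacle exists, but its reflectivity
# reading looks much the same for a person and a soda can, so the label has to
# come from another sensor. All values and names here are invented.

RADAR_NOISE_FLOOR = 0.1  # hypothetical threshold for "something is out there"

def describe_object(radar_reflectivity: float, camera_label: str) -> str:
    if radar_reflectivity <= RADAR_NOISE_FLOOR:
        return "road ahead looks clear"
    # Radar alone cannot tell what it is reflecting off.
    if camera_label == "unknown":
        return "obstacle present, type unknown"
    return f"obstacle present: {camera_label}"

print(describe_object(0.4, "person"))    # obstacle present: person
print(describe_object(0.4, "soda can"))  # obstacle present: soda can (same radar reading)
print(describe_object(0.4, "unknown"))   # obstacle present, type unknown
```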

Just an illusion

The hard part is getting a robot to intelligently identify what it has detected. We take for granted what goes into creating our own view of the road. We tend to think of the world falling onto our retinas like a picture through a camera lens, but sight is much more complicated. “The whole visual system shreds images, breaks them up into maps of colour, maps of motion, and so on, and somehow then manages to reintegrate that,” explains Peter McOwan, a professor of computer science at Queen Mary, University of London. How the brain performs this trick is still a mystery, but it’s one he’s trying to replicate in robot brains by studying what happens when our own vision glitches.
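
As a loose analogy for that “shredding”, a vision pipeline might split a video feed into separate colour planes and a crude motion map before trying to put the pieces back together. The NumPy sketch below uses random stand-in frames and shows only the decomposition step, not how a real system (or a brain) reintegrates it.

```python
# Rough sketch of splitting a scene into "maps of colour" and a "map of motion".
# The frames are random stand-ins for two consecutive video frames.
import numpy as np

frame_prev = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
frame_now  = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)

# Maps of colour: the red, green and blue planes of the current frame.
red_map, green_map, blue_map = (frame_now[:, :, c] for c in range(3))

# Map of motion (crudest possible version): pixel-wise change between frames.
motion_map = np.abs(frame_now.astype(int) - frame_prev.astype(int)).sum(axis=2)

# A robot's vision system then has to reintegrate these separate maps into
# something like "pedestrian crossing from the left" -- the part that stays hard.
print(red_map.shape, motion_map.shape)
```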

There are some images that our brains consistently put together incorrectly, and these are what we call optical illusions. McOwan is interested in optical illusions because if his mathematical models of vision can predict new ones, it’s a useful indicator that the model is reflecting human vision accurately. “Optical illusions are intrinsically fascinating magic tricks from nature but at the same time they are also a way to test how good your model is,” he says.

Most robots, for example, would not be fooled by the Adelson checkerboard illusion, in which we perceive two identical grey squares as different shades: “Humans looking at this illusion process the image and remove the effect of the shadow, which is why we end up seeing the squares as different shades of grey,” explains McOwan. Although it might seem like the machine wins this round, robots have real problems recognising shadows and accounting for the way they change the landscape. “Computer vision suffers really badly when there are variations in lighting conditions, occlusions and shadows,” says McOwan. “Shadows are very often considered to be real objects.”
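
To see why “shadows are very often considered to be real objects”, consider the kind of naive brightness rule a simple vision system might fall back on. The pixel values below are made up, but the failure mode is the one McOwan describes: a patch of shadowed road and a genuinely dark obstacle look the same to a threshold.

```python
# Naive segmentation sketch: anything much darker than the road is flagged as
# an object. Pixel values are invented; the point is that shadowed road and a
# dark obstacle are indistinguishable to a simple brightness threshold.
import numpy as np

road = np.full((6, 6), 200, dtype=np.uint8)  # bright, empty stretch of road
road[1:3, 1:3] = 60                          # a dark obstacle lying on the road
road[3:6, 3:6] = 90                          # the same road surface, but in shadow

objects = road < 120                         # the naive "is something there?" rule

print(int(objects.sum()), "pixels flagged as obstacle")  # 13: the shadow counts too
# The human visual system discounts the shadow before judging the surface, which
# is exactly the correction the Adelson checkerboard illusion exposes.
```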

Hijack alert

This is why autonomous vehicles need more than a pair of suitably advanced cameras. Radar and laser scanners are necessary because machine intelligences need much more information to recognise an object than we do. It’s not just places and objects that robots need to recognise. To be faithful assistants and useful workers, they need to recognise people and our intentions. Military robots need to correctly distinguish enemy soldiers from frightened civilians, and care robots need to recognise not just people but their emotions – even if (perhaps especially if) we’re trying to disguise them. All of these are pattern-recognition problems.

The contextual awareness needed to safely navigate the world is not to be taken lightly. Beiker gives the example of a plastic ball rolling into the road. Most human drivers would expect that a child might follow it, and slow down accordingly. A robot can be programmed to do the same, but distinguishing between a ball and a plastic bag is difficult, even with all of its sensors and algorithms. And that’s before we start thinking about people who might set out to intentionally distract or confuse a robot, tricking it into driving onto the pavement or falling down a staircase. Could a robot recognise a fake road diversion that might be a prelude to a theft or a hijacking?

McOwan isn’t overly worried by the prospect of criminals sabotaging autonomous machines. “It’s more important that a robot acts predictably to the environment and social norms rather than correctly,” he says. “It’s all about what you would do, and what you would expect a robot to do. At the end of the day if you step into a self-driving car, you are at the mercy of the systems surrounding you.”

No technology on Earth is 100% safe, says Beiker, but he questions why so much effort goes into making sure everything works safely rather than into what makes it work in the first place. “I found it amazing how much time the automotive industry spends on things they don’t want to happen compared to the time they spend on things they do want to happen,” he says.

He admits that, for the foreseeable future, a human will have to monitor the system. “It’s not realistic to say any time soon computers will take over and make all decisions on behalf of the driver. We’re not there yet.”

Frank Swain