Stereo Vision

Stereo Geometry
The Focus Robotics nDepth™ stereo vision processor provides depth measurements
using a pair of camera sensors and a technology called computational stereo
vision. The basis of the technology lies in the fact that a single physical
point in 3D space projects to a pair of unique image locations when observed by
two cameras. This can be seen in the diagram, where point P in space projects
to a unique location OL in the left image and OR in the right image. If these
corresponding points can be located in the camera images, then the 3D location
of the physical point P can be computed by triangulation.
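
For the simplified case of two identical, parallel cameras (a common textbook
idealization, not necessarily the exact nDepth™ geometry), triangulation
reduces to a one-line formula: depth equals focal length times baseline divided
by the horizontal offset between the two projections. A minimal sketch:

```python
# Triangulation for an idealized parallel stereo rig (a sketch, not the
# nDepth implementation). Two pinhole cameras are separated by a baseline
# B (meters) and share a focal length f (pixels). A point at depth Z
# projects to x_left and x_right; with d = x_left - x_right, Z = f * B / d.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Recover depth in meters from the pixel offset of a matched point."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return f_px * baseline_m / disparity_px

# Example: f = 500 px, baseline = 0.1 m, offset = 25 px -> depth = 2.0 m
```

Note the inverse relationship: halving the depth doubles the pixel offset,
which is why nearby objects are measured more precisely than distant ones.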

Camera Calibration

In order to triangulate the location, accurate estimates of camera external
geometry (the position and orientation of each camera) and internal geometry
(lens distortion, focal lengths, optical centers) must be calculated. This is
necessary to relate camera information expressed in pixels to the external
world coordinate system expressed in meters. This process is referred to as
camera calibration. The nDepth™ processor is programmed
with calibration information for a particular camera setup and then uses that
information to calibrate each new image on the fly in real time. Calibration
information is determined at the factory for the camera, or the camera can be
recalibrated in the field using a simple software tool and a printed
checkerboard called a calibration object.
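
To illustrate the internal geometry, a bare pinhole model (ignoring the lens
distortion that real calibration also estimates; the parameter names here are
the conventional ones, not taken from the nDepth™ tools) relates camera-frame
coordinates in meters to pixel coordinates:

```python
# Pinhole projection: a sketch of how intrinsics connect meters to pixels.
# fx, fy are focal lengths in pixels; cx, cy is the optical center.
# Real calibration also models lens distortion, which is omitted here.

def project(point_xyz, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z), in camera coordinates (meters),
    to (u, v) pixel coordinates."""
    X, Y, Z = point_xyz
    return (fx * X / Z + cx, fy * Y / Z + cy)

# A point straight ahead on the optical axis lands on the optical center:
# project((0.0, 0.0, 1.0), 500, 500, 376, 240) -> (376.0, 240.0)
```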

Stereo Correspondence

Finding the points in the left and right images which correspond to the same
physical point in space is called the correspondence problem.
This is a very difficult problem to solve and is the main function of the
nDepth™ vision processor. From a theoretical point of view, no general
solution can exist given the ambiguity which results from textureless regions,
occlusion, specularities, and the like. From a computational standpoint, trying
to match each pixel in one image against every pixel in the other image is
prohibitively expensive given the massive number of comparisons required.

The nDepth™ processor reduces the amount it needs to search by first
rotating and aligning the left and right images so that it only needs to search
along one horizontal scanline for a corresponding pixel rather than the entire
image. This process is referred to as rectifying the image and
has the added benefit of reducing false matches.
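
A toy sketch of why rectification pays off: after rectification the
corresponding pixel is known to lie on the same row, at or to the left of the
original column, so the search collapses to a short 1-D scan. (This matches
single pixels for clarity; as described below, the processor actually matches
blocks.)

```python
# After rectification, the match for left_row[x_left] lies on the same
# scanline of the right image, shifted left by at most max_disparity
# pixels. A 1-D brute-force search over intensity differences:

def match_along_scanline(left_row, right_row, x_left, max_disparity):
    """Return the right-image column whose intensity best matches
    left_row[x_left], scanning only max_disparity pixels."""
    best_x, best_cost = x_left, float("inf")
    lo = max(0, x_left - max_disparity)
    for x in range(lo, x_left + 1):  # match lies at or left of x_left
        cost = abs(int(left_row[x_left]) - int(right_row[x]))
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```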

Block Matching

To increase the likelihood of a correct match, the nDepth™ processor
also looks at a region of 9x9 pixels around each pixel in the left image and
searches for the best matching region of equal size in the right image.
This is called block matching. The actual metric used by the
nDepth™ processor for comparing matches is called the Sum of Absolute
Differences, or SAD for short. The horizontal distance d
searched across the image is called the disparity and is
inversely proportional to distance. The greater the matched disparity is, the
closer the object is to the camera. The nDepth™ vision processor is capable
of searching up to 124 disparity levels.
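
The SAD block-matching step can be sketched in a few lines (an illustrative
software version over a single pixel; the nDepth™ hardware pipeline itself is
not described here):

```python
import numpy as np

def sad_block_match(left, right, y, x, block=9, max_disp=124):
    """Return the disparity whose block SAD is lowest for the pixel at
    (y, x): compare the 9x9 block around (y, x) in the left image against
    candidate blocks shifted left by d pixels in the right image."""
    h = block // 2
    patch_l = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
    best_d, best_cost = 0, None
    for d in range(0, min(max_disp, x - h) + 1):
        patch_r = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.int32)
        cost = np.abs(patch_l - patch_r).sum()  # Sum of Absolute Differences
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Shifting a textured image left by a few pixels and matching it against the
original recovers exactly that shift as the disparity.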

This same basic algorithm is also used by NASA's twin Mars Exploration Rovers
Spirit and Opportunity for navigation on Mars. The difference is that the rovers
use software and only achieve a few frames per second on 256x256 images whereas the
Focus Robotics nDepth™ processor provides 30 frames per second on
752x480 images with 92 disparity levels.

Consistency Checking

To further reduce the possibility of errors, the nDepth™ processor also
computes correspondence starting from the right image and searching in the left
image. If the results differ from the original left to right search, they are
marked as invalid so they can be ignored by higher level systems. This is called
left-right consistency checking. Examples where this is
highly effective are object borders and occlusions (points visible in only one
camera and thus impossible to triangulate).
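
A toy one-row version of the cross-check (the tolerance and the sentinel value
are illustrative choices, not nDepth™ specifics):

```python
import numpy as np

INVALID = -1  # sentinel for pixels that fail the cross-check

def lr_consistency(disp_l, disp_r, tol=1):
    """Cross-check a left-based disparity row against a right-based one.
    A left pixel x matched at disparity d corresponds to right pixel
    x - d; that right pixel's own disparity should agree with d, or the
    match is marked invalid."""
    out = np.full_like(disp_l, INVALID)
    for x, d in enumerate(disp_l):
        xr = x - d
        if 0 <= xr < len(disp_r) and abs(disp_r[xr] - d) <= tol:
            out[x] = d
    return out
```

Pixels whose two searches disagree (typically at borders and occlusions) come
out as INVALID and can be ignored by higher-level systems.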

In total, the nDepth™ processor completes an amazing 2 billion pixel-disparity
operations per second.
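
A back-of-the-envelope check of that figure, under the assumption (mine, not
stated above) that both the left-to-right and right-to-left passes of the
consistency check count toward the total:

```python
# Rough throughput arithmetic from the numbers quoted above:
# 752x480 images, 92 disparity levels, 30 frames per second.
width, height, disparities, fps = 752, 480, 92, 30

per_direction = width * height * disparities * fps   # roughly 1.0e9
both_directions = 2 * per_direction                  # roughly 2.0e9
```

Counting both search directions, the product works out to just under
2 billion pixel-disparity operations per second, matching the quoted figure.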