[Beowulf] cluster for doing real time video panoramas?

Wed Dec 21 16:41:07 PST 2005

[ I think that most of what I write below is quite OT for this list...  
Apologies to those that don't enjoy the subject! ]

On Wed, 21 Dec 2005, Jim Lux wrote:

> I've got a robot on which I want to put a half dozen or so video
> cameras (video in that they capture a stream of images, but not
> necessarily that put out analog video..)

It's not entirely clear to me what you mean above... How does the
video reach the computer? You later mention 1394 cameras, so I assume
something similar to the DV output of common camcorders.

> I've also got some telemetry that tells me what the orientation of
> the robot is.

Does it also give info about the orientation of the camera assembly?
I'm thinking of the usual problem: is the horizon line going down or
is my camera tilted? Although if you really mean spherical (read
below), this might not be a problem anymore, as you might not care
where the horizon is anyway.

> I want to take the video streams and stitch them (in near real time)
> into a spherical panorama

Do you really mean spherical or only circular (the example that you
gave being what I call circular)? IOW: do the focal axes of the
cameras all lie in one plane (or approximately, given alignment
precision)?

Given that I have a personal interest in video, I thought about
something similar: not a circular or spherical coverage, but at least
a large field of view from which I can choose when editing the
"window" that is then given to the viewer - this comes from a very
mundane need: I'm not such a good camera operator, so I always miss
things or frame them badly. So this would be similar to how
Electronic (as opposed to optical) Image Stabilization (EIS) works,
although EIS does not need any transformation or calibration as the
whole big image comes from one source.

All that I write below starts from the assumption that the cameras are
mounted on an assembly in a "permanent" position, such that their
relative positions (one camera with respect to another) do not change.  
Also that you don't zoom or that you can control the zoom on all
cameras simultaneously; otherwise putting all the movies together is
probably hard (in the circular setup; but doable probably with motion
vectors or related stuff that is already used in MPEG4 compression) or
impossible (in the spherical setup where you'd miss parts of the
space).

The first step should be the calibration of the cameras with respect
to each other. In the COTS world, I don't think that you'd be able to
mount the cameras such that they split the space equally between them
(so that the overlap between any 2 cameras would be the same); then
you also need color calibration, sound level calibration (with
directional mics, otherwise it makes no sense) and so on -
multi-camera setups are rather difficult to master for an amateur
(like me, at least :-)). Another problem that you might face is frame
synchronization between the cameras, which might come into play for
moving objects.

Talking about moving, IMHO you need to have progressive output from
the cameras. Interlaced movies would probably create artifacts when
joined together; deinterlacing several video streams at once might be
a nice application (but very coarse grained - f.e. one stream per CPU)
for a cluster, but the results might still not be as "perfect" as
with progressive output, because the deinterlacing results for the
same part of the scene taken from several cameras might be different.

If the position of the cameras can be finely adjusted, it might be a
good idea to try to get them close to the ideal equal splitting of the
space by just looking at their output. But in any case, if the cameras
are fixed with respect to each other, you don't need to calculate the
overlapping regions for every frame - which means that the final frame
size can also be known at calibration time; knowing the overlap also
makes it easy to set the blending parameters. If you want a spherical
setup, it's quite likely that more than 2 cameras overlap in a given
place, so the calibration will likely be more difficult, but once it's
done you don't need to do it again...
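
As a rough sketch of the "compute it once" idea above: with fixed
cameras, the blending weights across an overlap strip and the final
panorama width can both be derived at calibration time and reused for
every frame. The linear "feathering" ramp, the function names and the
pixel numbers are assumptions made for illustration (a closed ring of
cameras, grayscale samples); they are not something proposed in this
thread.

```python
def feather_weights(overlap_width):
    """Left-camera weights across an overlap strip (linear ramp 1 -> 0).

    Computed once at calibration time; the right camera gets 1 - w.
    """
    if overlap_width < 1:
        raise ValueError("overlap must be at least one pixel wide")
    if overlap_width == 1:
        return [0.5]
    return [1.0 - i / (overlap_width - 1) for i in range(overlap_width)]


def blend_row(left_row, right_row, weights):
    """Blend one row of the overlap strip (grayscale samples assumed)."""
    return [w * l + (1.0 - w) * r
            for l, r, w in zip(left_row, right_row, weights)]


def panorama_width(frame_widths, overlap_widths):
    """Panorama width in pixels for a closed ring of cameras.

    overlap_widths[i] is the overlap between camera i and camera i+1,
    with the last camera wrapping around to the first (an assumption).
    """
    if len(frame_widths) != len(overlap_widths):
        raise ValueError("a ring needs one overlap per camera")
    return sum(frame_widths) - sum(overlap_widths)


# Hypothetical numbers: six 720-pixel-wide cameras, 40-pixel overlaps.
WEIGHTS = feather_weights(5)
WIDTH = panorama_width([720] * 6, [40] * 6)
```

Per frame, only blend_row() has to run; the weights and the output
size never change as long as the cameras stay put.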

What makes sense to me as a next step would be to map the "camera
space" to the "real world" - for example, for the circular setup, by
placing a circle with degrees marked on it around the camera assembly.
In a spherical setup, you probably have to use 3 marked circles, one
for each axis. This way you can find a correspondence between a pixel
(let's say in the middle of the frame) and its real world angle, such
that when you later look for a certain angle you know which pixel to
put in the center of the image. If the whole camera assembly rotates
by a certain angle (and that's the reason for my second question
earlier in this message), you simply add this to (or subtract it from)
the angle that you're looking for.
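
A minimal sketch of that pixel/angle lookup for one camera in the
circular case. The calibration table, the function name and the sign
convention (the telemetry rotation is subtracted) are all assumptions
made for illustration; a real table would come from reading the marked
circle.

```python
from bisect import bisect_left

# Hypothetical calibration table for one camera, read off the marked
# circle: (pixel column, real-world angle in degrees). Invented values.
CALIBRATION = [(0, -30.0), (180, -15.0), (360, 0.0), (540, 15.0), (719, 30.0)]


def column_for_angle(world_angle_deg, assembly_rotation_deg=0.0,
                     table=CALIBRATION):
    """Return the pixel column showing world_angle_deg, or None if that
    angle is outside this camera's field of view."""
    # Compensate for the rotation of the whole assembly (from telemetry).
    # Assumed convention: rotation is subtracted from the wanted angle.
    camera_angle = world_angle_deg - assembly_rotation_deg
    # Wrap into [-180, 180) so angles compare sensibly.
    camera_angle = (camera_angle + 180.0) % 360.0 - 180.0

    angles = [angle for _, angle in table]
    if camera_angle < angles[0] or camera_angle > angles[-1]:
        return None

    i = bisect_left(angles, camera_angle)
    if angles[i] == camera_angle:
        return float(table[i][0])
    # Linear interpolation between the two nearest calibration marks.
    (c0, a0), (c1, a1) = table[i - 1], table[i]
    return c0 + (c1 - c0) * (camera_angle - a0) / (a1 - a0)
```

Linear interpolation between marks ignores lens distortion, so a real
setup would probably want denser marks or a proper lens model.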

To come back to cluster usage, I think that you can treat the whole
thing by doing something like a spatial decomposition, such that
either:
- each computer gets the same amount of video data (to avoid
overloading). This way it's clear which computer takes video from
which camera, but the amount of "real world" space that each of them
covers is not equal, so putting them together might be difficult.
- each computer takes care of the same amount of the "real world"
space, so each computer provides the same amount of output data.
However, splitting the video streams between the CPUs might be a
problem, as each frame might need to be distributed to several CPUs.
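
To illustrate the second option for a circular setup: give each node
an equal angular sector of the output and work out which cameras it
needs frames from. The camera layout (six cameras, 70 degrees of
field of view each), the node count and all names are hypothetical.

```python
def node_sectors(num_nodes, total_deg=360.0):
    """Split the output circle into equal (start, end) sectors in degrees."""
    step = total_deg / num_nodes
    return [(i * step, (i + 1) * step) for i in range(num_nodes)]


def cameras_for_sector(sector, camera_fovs):
    """Names of the cameras whose field of view overlaps the sector.

    camera_fovs maps a name to (start_deg, end_deg) within [0, 360];
    start > end means the field of view wraps around 0 degrees.
    """
    lo, hi = sector
    needed = []
    for name, (c_lo, c_hi) in camera_fovs.items():
        if c_lo <= c_hi:
            overlaps = c_lo < hi and c_hi > lo
        else:
            # Wrapping view covers [c_lo, 360) and [0, c_hi).
            overlaps = c_lo < hi or c_hi > lo
        if overlaps:
            needed.append(name)
    return needed


# Hypothetical layout: 6 cameras centered every 60 degrees, 70 degree FOV.
CAMERAS = {
    "cam0": (325.0, 35.0), "cam1": (25.0, 95.0), "cam2": (85.0, 155.0),
    "cam3": (145.0, 215.0), "cam4": (205.0, 275.0), "cam5": (265.0, 335.0),
}
SECTORS = node_sectors(4)
PLAN = [cameras_for_sector(s, CAMERAS) for s in SECTORS]
```

With these made-up numbers every node needs frames from three cameras,
which shows the distribution problem mentioned in the second bullet.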

> But, then, how do you do the real work... should the camera
> recalibration be done all on one processor?

It's not clear to me what you call "recalibration". Do you mean color
correction, perspective change and so on? These could be done in
parallel, but if the cameras don't move relative to each other, the
transformations are always the same, so it's probably easy to
partition them across CPUs, even well enough to get a balanced load.

> Should each camera (or pair) gets its own cpu, which builds that
> part of the overall spherical image, and hands them off to yet
> another processor which "looks" at the appropriate part of the video
> image and sends that to the user?

Well, first of all, your words suggest to me that you are talking 
about a circular setup. In a real spherical one, you should have some 
parts that overlap from at least 3 cameras (where their edges look 
like a T) so you can't talk about pairs.

Secondly, all my thoughts above try to cover the case where you want
to get a complete image out of the system at each moment, like when
different people are watching maybe different parts of the output. If
there's only one "window" that should be seen, then the image would
probably come from at most 2-3 cameras; the transformations could
probably be done on different CPUs (as in the case of full output),
but putting them together (blending) would be easy enough to do even
on one CPU, so there's not much use for a cluster there... unless the
transformations are so CPU intensive that they can't be done in
realtime, in which case you could send each frame to a different CPU
and get the output with a small delay (equal to the time needed to
transform one frame).
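
A back-of-the-envelope sketch of that last case: frames are handed out
round-robin, and each one becomes available one transform time after
it arrives. The function names and the timing numbers are invented,
and network transfer time is ignored, which a real cluster could not
do.

```python
import math


def workers_needed(transform_time_s, frame_rate_hz):
    """Smallest number of CPUs that keeps up with the incoming frames
    when transforming one frame takes transform_time_s seconds."""
    return max(1, math.ceil(transform_time_s * frame_rate_hz))


def schedule(num_frames, transform_time_s, frame_rate_hz):
    """Return (frame, worker, ready_time_s) for round-robin dispatch."""
    n = workers_needed(transform_time_s, frame_rate_hz)
    plan = []
    for frame in range(num_frames):
        arrival_s = frame / frame_rate_hz
        # With n workers, a worker's next frame arrives n / frame_rate_hz
        # seconds later, which is no sooner than it finishes this one,
        # so the only delay is the transform time itself.
        plan.append((frame, frame % n, arrival_s + transform_time_s))
    return plan


# Hypothetical numbers: 25 frames/s, 0.1 s to transform one frame.
N_WORKERS = workers_needed(0.1, 25)
PLAN = schedule(4, 0.1, 25)
```

The output still comes out at the full frame rate; it is just shifted
by the transform time.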

That's it, I hope that I made sense... given that it's well past
midnight ;-)

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De