[Beowulf] cluster for doing real time video panoramas?

Thu Dec 22 21:47:03 PST 2005

At 08:40 PM 12/22/2005, Gerry Creager wrote:
>I'll take a stab at some of this... the parts that appear intuitively
>obvious to me.
>
>On 12/21/05, Bogdan Costescu <Bogdan.Costescu at iwr.uni-heidelberg.de> wrote:
> >
> > On Wed, 21 Dec 2005, Jim Lux wrote:
> >
> > > I've got a robot on which I want to put a half dozen or so video
> > > cameras (video in that they capture a stream of images, but not
> > > necessarily ones that put out analog video...)
>
>Streaming video from 'n' cameras, or streams of interleaved images
>(full frames).

6 "channels" of video from unsynchronized cameras (unless there's a 
convenient way to genlock the inexpensive 1394 cameras), so, presumably, at 
some level, you've got 6 streams of full video frames, all at the same 
rate, but not necessarily synchronized in arrival.
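
One way to deal with the unsynchronized arrival, sketched below, is to timestamp every frame and, for each output instant, pick the frame from each camera that is closest in time. This is only an illustration: the function names, the camera names, and the assumption that each camera delivers a sorted list of timestamps in seconds are all made up for the example.

```python
from bisect import bisect_left

def nearest_frame(timestamps, t):
    """Index of the timestamp closest to t (timestamps sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer in time.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def frame_set(streams, t):
    """For each camera, choose the frame index nearest to output time t.

    streams: dict of camera name -> sorted list of arrival timestamps.
    """
    return {cam: nearest_frame(ts, t) for cam, ts in streams.items()}

# Two hypothetical cameras at about 30 frames/s with a 10 ms phase offset.
streams = {
    "cam0": [0.000, 0.033, 0.067, 0.100],
    "cam1": [0.010, 0.043, 0.077, 0.110],
}
chosen = frame_set(streams, 0.040)
```

With same-rate cameras the worst-case mismatch is half a frame period, which may or may not matter depending on how fast the robot moves.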

> > > I've also got some telemetry that tells me what the orientation of
> > > the robot is.
> >
> > Does it also give info about the orientation of the assembly of
> > cameras ? I'm thinking of the usual problem: is the horizon line going
> > down or is my camera tilted ? Although if you really mean spherical
> > (read below), this might not be a problem anymore as you might not
> > care about what horizon is anyway.
>
>The only sane, rational way to do this I can see is if the camera
>reference frame is appropriately mapped and each camera's orientation
>is well-known.

This would be the case.  The camera/body assembly is a rigid body, so
knowing the orientation of the body tells you the orientation of the
cameras.  More to the point, the relative orientation of all cameras is
known, in body coordinates.  Also, the camera/lens optical parameters are
known (aberrations, exposure variation, etc.)
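
In matrix terms that's just a product of rotations: the telemetry gives body-to-world, the mounting gives camera-to-body, and multiplying them gives camera-to-world. A minimal sketch follows, assuming (purely for illustration) six cameras spaced every 60 degrees about the body z axis and a telemetry rotation that is a simple yaw. The helper names and the layout are hypothetical.

```python
import math

def rot_z(deg):
    """3x3 rotation about the z axis by deg degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    """Rotate 3-vector v by matrix r."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

# Fixed mounting, measured once: maps camera-frame vectors into body frame.
# Hypothetical layout: six cameras every 60 degrees about body z.
R_BODY_CAM = [rot_z(60.0 * i) for i in range(6)]

def camera_rotations(r_world_body):
    """Camera-to-world rotation for every camera, given body-to-world."""
    return [matmul(r_world_body, r_bc) for r_bc in R_BODY_CAM]

# Example: telemetry says the body is yawed 30 degrees.
r_world_cam = camera_rotations(rot_z(30.0))
# World-frame pointing direction of camera 1, taking camera x as boresight.
boresight = apply(r_world_cam[1], [1.0, 0.0, 0.0])
```

Only the body-to-world matrix changes from frame to frame; the six mounting matrices are constants.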

 >
> > Given that I have a personal interest in video, I thought about
> > something similar: not a circular or spherical coverage, but at least
> > a large field of view from which I can choose when editing the
> > "window" that is then given to the viewer - this comes from a very
> > mundane need: I'm not such a good camera operator, so I always miss
> > things or make a wrong framing. So this would be similar to how
> > Electronic (as opposed to optical) Image Stabilization (EIS) works,
> > although EIS does not need any transformation or calibration as the
> > whole big image comes from one source.
>
>With the combination of on-plane and off-axis image origination, one
>has the potential for a stereoscopic and thus distance effect.  A
>circular coverage wouldn't provide this.  Remapping a spherical
>coverage into an immersive planar or cave coverage could accomplish
>this.

There's some literature out there on this. Notably, a system with 6 
cameras, each with >120 deg FOV, pairs at each vertex of a triangle, so you 
essentially have a stereo pair pointing every 120 degrees, with 
overlap.  There's some amount of transformation that can synthesize (at 
least for things reasonably far away) a stereo pair from any arbitrary azimuth.

>First step is to rigidly characterize each camera's optical path: the
>lens.  Once its characteristics are known and correlated with its
>cohort, the math for the reconstruction and projection becomes
>somewhat simpler (took a lot to not say, "Trivial" having been down
>this road before!).  THEN one characterizes the physical frame and
>identifies the optical image overlap of adjacent cameras.  Evenly
>splitting the overlap might not necessarily help here, if I understand
>the process.

Fortunately, this is exactly what things like panotools and PTGui are for.

>Interlacing would have to be almost camera-frame sequential video and
>at high frame rates.  I agree that deinterlaced streams would offer
>better results.  One stream per CPU might be taxing:  This might
>require more'n one CPU per stream at 1394 image speeds!

And that's why I posted on the Beowulf list!

 > once it's done you don't need to do it again...

>I'm not so sure this is beneficial if you can appropriately model the
>lens systems of each camera and the optical qualities of the imager
>(CMOS or CCD...)  I think you can apply the numerical mixing and
>projection in near real time if you have resolved the models early.

Seems like this should be true.  Once all the transformations are 
calculated, it's just lots of arithmetic operations.
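
Roughly what I have in mind is a lookup table: evaluate the (expensive) lens and projection model once per output pixel, store which source pixel it lands on, and then every frame is nothing but table lookups. The sketch below uses nearest-neighbour sampling and a trivial mirror mapping as a stand-in for the real lens model; the function names are made up for the example.

```python
def build_lut(out_w, out_h, mapping):
    """Evaluate the geometric model once.

    mapping(x, y) returns the (sx, sy) source pixel feeding output pixel
    (x, y), or None if no source pixel covers it.
    """
    return [mapping(x, y) for y in range(out_h) for x in range(out_w)]

def remap(src, lut, out_w, out_h, fill=0):
    """Per-frame work: pure lookups, no trigonometry or lens math.

    src is a list of rows; the result is a list of out_h rows of out_w pixels.
    """
    flat = [src[s[1]][s[0]] if s is not None else fill for s in lut]
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]

# Stand-in for a real lens model: mirror a 3x2 image left to right.
lut = build_lut(3, 2, lambda x, y: (2 - x, y))
frame = [[1, 2, 3],
         [4, 5, 6]]
warped = remap(frame, lut, 3, 2)
```

A real version would probably store fractional source coordinates and interpolate, which multiplies the arithmetic per pixel but doesn't change the structure.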

> > To come back to cluster usage, I think that you can treat the whole
> > thing by doing something like a spatial decomposition, such that
> > either:
> > - each computer gets the same amount of video data (to avoid
> > overloading). This way it's clear which computer takes video from
> > which camera, but the amount of "real world" space that each of them
> > gives is not equal, so putting them together might be difficult.
> > - each computer takes care of the same amount of the "real world" space,
> > so each computer provides the same amount of output data. However the
> > video streams splitting between the CPUs might be a problem as each
> > frame might need to be distributed to several CPUs.
>
>I would vote for decomposition by hardware device (camera).

In that scheme, you'd have each camera feed a CPU which transforms its
images into a common space (applying that camera's exposure and lens
corrections).
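
As a toy illustration of that split, the sketch below gives each camera its own worker that applies an exposure gain and places the pixels in the common space, followed by a compositor that averages wherever cameras overlap. Everything here is an assumption made for the example: a thread pool stands in for the cluster nodes, a single row of pixels stands in for a frame, and a column offset stands in for the real lens and projection warp.

```python
from concurrent.futures import ThreadPoolExecutor

def warp_camera(job):
    """Per-camera work: exposure correction, then placement in common space.

    job is (row, gain, offset). Returns {common_space_index: value}.
    """
    row, gain, offset = job
    return {offset + i: px * gain for i, px in enumerate(row)}

def composite(parts, width):
    """Merge per-camera results, averaging where cameras overlap."""
    total = [0.0] * width
    count = [0] * width
    for part in parts:
        for idx, val in part.items():
            total[idx] += val
            count[idx] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]

def stitch(jobs, width):
    """Run one worker per camera, then composite the results."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        parts = list(pool.map(warp_camera, jobs))
    return composite(parts, width)

# Two hypothetical cameras overlapping at index 2; the second one is
# underexposed by a factor of two, so its gain is 2.0.
jobs = [([10, 10, 10], 1.0, 0), ([5, 5, 5], 2.0, 2)]
panorama = stitch(jobs, 5)
```

The per-camera step needs nothing from the other cameras, so the only communication is shipping each warped result to whoever does the compositing.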

>  And, I'd
>have some degree of preprocessing with consideration that the cluster
>might not necessarily be our convenient little flat NUMA cluster we're
>all so used to.  If I had a cluster of 4-way nodes I'd be considering
>reordering the code to have preprocessing of the image-stream on one
>core, making it in effect a 'head' core, and letting it decompose the
>process to the remaining cores.  I'm not convinced the typical CPU can
>appropriately handle a feature-rich environment imaged using a decent
>DV video stream.

I don't know that the content of the images makes any difference.  They're
all just arrays of pixels that need to be mapped from one space into another.

> > > But, then, how do you do the real work... should the camera
> > > recalibration be done all on one processor?
> >
> > It's not clear to me what you call "recalibration". You mean color
> > correction, perspective change and so on ? These could be done in
> > parallel, but if the cameras don't move relative to each other, the
> > transformations are always the same so it's probably easy to partition
> > them on CPUs even as much as to get a balanced load.
>
>Agreed.

Lots of good ideas, all.

Thanks,
Jim