[Beowulf] cluster for doing real time video panoramas?

Joe Landman landman at scalableinformatics.com
Thu Dec 22 22:15:24 PST 2005


I seem to remember SGI doing things like this in Reality Engine texture 
memory many moons ago.  We would map multiple video streams onto the 
exterior of a mirrored sphere for one of the demos.  The graphics 
hardware was very good at calculations like these even then, and it is 
even better at them today.

Jim Lux wrote:
> At 08:40 PM 12/22/2005, Gerry Creager wrote:
>> I'll take a stab at some of this... the parts that appear intuitively
>> obvious to me.
>>
>> On 12/21/05, Bogdan Costescu <Bogdan.Costescu at iwr.uni-heidelberg.de> 
>> wrote:
>> >
>> > On Wed, 21 Dec 2005, Jim Lux wrote:
>> >
>> > > I've got a robot on which I want to put a half dozen or so video
>> > > cameras (video in that they capture a stream of images, but not
>> > > necessarily that put out analog video..)
>>
>> Streaming video from 'n' cameras, or streams of interleaved images
>> (full frames).
> 
> 6 "channels" of video from unsynchronized cameras (unless there's a 
> convenient way to genlock the inexpensive 1394 cameras), so, presumably, 
> at some level, you've got 6 streams of full video frames, all at the 
> same rate, but not necessarily synchronized in arrival.
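
To make the synchronization point concrete: one low-tech approach is to 
buffer a few frames per camera and group them by nearest capture 
timestamp.  A rough Python sketch; the Frame layout and the 5 ms 
tolerance are assumptions on my part, not anything the cameras dictate:

from collections import namedtuple

Frame = namedtuple("Frame", ["camera_id", "timestamp", "pixels"])

def match_frames(buffers, tolerance=0.005):
    """buffers: one FIFO list of Frame per camera, oldest first.
    Return one frame per camera whose timestamps agree to within
    `tolerance` seconds, or None if no such set is buffered yet."""
    if any(not b for b in buffers):
        return None
    # Use the latest of the "oldest buffered" timestamps as the reference.
    ref = max(b[0].timestamp for b in buffers)
    chosen = []
    for b in buffers:
        best = min(b, key=lambda f: abs(f.timestamp - ref))
        if abs(best.timestamp - ref) > tolerance:
            return None   # streams too far apart; wait for more frames
        chosen.append(best)
    return chosen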
> 
> 
>> > > I've also got some telemetry that tells me what the orientation of
>> > > the robot is.
>> >
>> > Does it also give info about the orientation of the assembly of
>> > cameras? I'm thinking of the usual problem: is the horizon line going
>> > down, or is my camera tilted? Although if you really mean spherical
>> > (read below), this might not be a problem anymore, as you might not
>> > care where the horizon is anyway.
>>
>> The only sane, rational way to do this I can see is if the camera
>> reference frame is appropriately mapped and each camera's orientation
>> is well-known.
> 
> This would be the case.  The camera/body is a rigid body, so knowing 
> orientation of the body tells you the orientation of the cameras. More to 
> the point, the relative orientation of all cameras is known, in body 
> coordinates.  Also, the camera/lens optical parameters are known 
> (aberrations, exposure variation, etc.)
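
If the mounting rotations are calibrated once in body coordinates, 
turning the telemetry attitude into per-camera world orientations is 
just one matrix product per camera per frame.  A rough numpy sketch; 
the angle convention and the 60-degree azimuth spacing are purely 
illustrative assumptions:

import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0., s], [0., 1., 0.], [-s, 0., c]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

# Body attitude from telemetry (yaw, pitch, roll in radians) -- example values.
R_world_body = rot_z(0.30) @ rot_y(0.05) @ rot_x(-0.02)

# Fixed mounting rotations of the six cameras in body coordinates,
# e.g. spaced every 60 degrees in azimuth (calibrated once, offline).
R_body_cam = [rot_z(np.deg2rad(60 * k)) for k in range(6)]

# World orientation of each camera is a single matrix product per frame.
R_world_cam = [R_world_body @ R for R in R_body_cam]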
> 
> 
> 
>  >
>> > Given that I have a personal interest in video, I thought about
>> > something similar: not a circular or spherical coverage, but at least
>> > a large field of view from which I can choose when editing the
>> > "window" that is then given to the viewer - this comes from a very
>> > mundane need: I'm not such a good camera operator, so I always miss
>> > things or get the framing wrong. So this would be similar to how
>> > Electronic (as opposed to optical) Image Stabilization (EIS) works,
>> > although EIS does not need any transformation or calibration as the
>> > whole big image comes from one source.
>>
>> With the combination of on-plane and off-axis image origination, one
>> has the potential for a stereoscopic and thus distance effect.  A
>> circular coverage wouldn't provide this.  Remapping a spherical
>> coverage into an immersive planar or cave coverage could accomplish
>> this.
> 
> There's some literature out there on this. Notably, a system with 6 
> cameras, each with >120 deg FOV, pairs at each vertex of a triangle, so 
> you essentially have a stereo pair pointing every 120 degrees, with 
> overlap.  There's some amount of transformation that can synthesize (at 
> least for things reasonably far away) a stereo pair from any arbitrary 
> azimuth.
> 
> 
>> First step is to rigidly characterize each camera's optical path: the
>> lens.  Once its characteristics are known and correlated with its
>> cohort, the math for the reconstruction and projection becomes
>> somewhat simpler (it took a lot not to say "trivial," having been down
>> this road before!).  THEN one characterizes the physical frame and
>> identifies the optical image overlap of adjacent cameras.  Evenly
>> splitting the overlap might not necessarily help here, if I understand
>> the process.
> 
> 
> Fortunately, this is exactly what things like panotools and PTGui are for.
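
For anyone who hasn't used those tools: the per-lens model they fit can 
be as simple as a low-order radial polynomial in normalized image 
coordinates.  A toy Python sketch of that kind of model, with entirely 
invented coefficients (this is not PTGui's actual parameterization):

import numpy as np

def ideal_to_distorted(xy_ideal, k1, k2, fx, fy, cx, cy):
    """Map ideal (pinhole) pixel coordinates to the distorted pixel
    coordinates actually seen by the sensor, using a two-term radial
    model.  All coefficients below are made up for illustration."""
    x = (xy_ideal[:, 0] - cx) / fx      # normalize
    y = (xy_ideal[:, 1] - cy) / fy
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2    # radial distortion factor
    return np.column_stack([x * s * fx + cx, y * s * fy + cy])

# Example 640x480 camera with mild barrel distortion (invented numbers).
pts = np.array([[10.0, 20.0], [320.0, 240.0], [630.0, 470.0]])
print(ideal_to_distorted(pts, k1=-0.12, k2=0.01,
                         fx=500.0, fy=500.0, cx=320.0, cy=240.0))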
> 
> 
>> Interlacing would have to be almost camera-frame sequential video and
>> at high frame rates.  I agree that deinterlaced streams would offer
>> a better result.  One stream per CPU might be taxing: this might
>> require more'n one CPU per stream at 1394 image speeds!
> 
> And why I posted on the beowulf list!
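
A quick back-of-envelope on the input side, assuming 640x480 YUV 4:2:2 
at 30 fps per camera (an assumption about the cameras, not something 
stated in this thread):

# Per-camera raw data rate, assuming 640x480, 2 bytes/pixel (YUV 4:2:2), 30 fps.
width, height, bytes_per_pixel, fps = 640, 480, 2, 30
per_camera = width * height * bytes_per_pixel * fps      # bytes/second
print(per_camera / 1e6, "MB/s per camera")               # ~18.4 MB/s
print(6 * per_camera / 1e6, "MB/s for six cameras")      # ~110 MB/s aggregate

So the raw input alone is on the order of 100 MB/s aggregate, before 
any per-pixel math.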
> 
>  > once it's done you don't need to do it again...
> 
>> I'm not so sure this is beneficial if you can appropriately model the
>> lens systems of each camera and the optical qualities of the imager
>> (CMOS or CCD...)  I think you can apply the numerical mixing and
>> projection in near real time if you have resolved the models early.
> 
> Seems like this should be true.  Once all the transformations are 
> calculated, it's just lots of arithmetic operations.
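
One way to picture that: precompute, per camera, a lookup table that 
maps every output-panorama pixel back to a source pixel; the per-frame 
work is then a single gather through that table.  A numpy sketch; the 
table below is just a placeholder so the example runs, where a real one 
would come from the lens model and camera orientation:

import numpy as np

H_out, W_out = 480, 1200   # panorama tile this node is responsible for
H_in,  W_in  = 480, 640    # source frame size (assumed)

# Precompute ONCE: for every output pixel, which source pixel feeds it.
map_y, map_x = np.meshgrid(np.arange(H_out), np.arange(W_out), indexing="ij")
map_x = map_x % W_in
map_y = np.clip(map_y, 0, H_in - 1)

def warp(frame):
    """Per-frame work is a single gather through the precomputed tables."""
    return frame[map_y, map_x]

# Example: warp one fake frame.
frame = np.random.randint(0, 256, (H_in, W_in, 3), dtype=np.uint8)
panorama_tile = warp(frame)
print(panorama_tile.shape)   # (480, 1200, 3)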
> 
> 
>> > To come back to cluster usage, I think that you can treat the whole
>> > thing by doing something like a spatial decomposition, such that
>> > either:
>> > - each computer gets the same amount of video data (to avoid
>> > overloading). This way it's clear which computer takes video from
>> > which camera, but the amount of "real world" space that each of them
>> > covers is not equal, so putting them together might be difficult.
>> > - each computer takes care of the same amount of the "real world" space,
>> > so each computer provides the same amount of output data. However the
>> > video streams splitting between the CPUs might be a problem as each
>> > frame might need to be distributed to several CPUs.
>>
>> I would vote for decomposition by hardware device (camera).
> 
> In that, you'd have each camera feed a CPU which transforms its images 
> into a common space (applying that camera's exposure and lens corrections).
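
That decomposition maps fairly directly onto MPI: one rank per camera 
doing the warp, one rank collecting and blending.  A bare-bones sketch 
using mpi4py; grab_frame, warp_to_common, and the max-blend are 
placeholders, not a real capture or blending pipeline:

import numpy as np
from mpi4py import MPI

N_CAM = 6
N_FRAMES = 100
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def grab_frame(cam_id):                 # placeholder for 1394 capture
    return np.zeros((480, 640, 3), dtype=np.uint8)

def warp_to_common(frame, cam_id):      # placeholder for the precomputed remap
    return frame

if rank == 0:                           # compositor node
    for _ in range(N_FRAMES):
        tiles = [comm.recv(source=i, tag=0) for i in range(1, N_CAM + 1)]
        panorama = np.maximum.reduce(tiles)   # stand-in for real blending
elif rank <= N_CAM:                     # one camera per rank
    cam = rank - 1
    for _ in range(N_FRAMES):
        comm.send(warp_to_common(grab_frame(cam), cam), dest=0, tag=0)

Something like "mpirun -np 7 python pano.py" would run it with one 
compositor rank and six camera ranks.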
> 
>>  And, I'd
>> have some degree of preprocessing, bearing in mind that the cluster
>> might not necessarily be our convenient little flat NUMA cluster we're
>> all so used to.  If I had a cluster of 4-way nodes I'd be considering
>> reordering the code to have preprocessing of the image-stream on one
>> core, making it in effect a 'head' core, and letting it decompose the
>> process to the remaining cores.  I'm not convinced the typical CPU can
>> appropriately handle a feature-rich environment imaged using a decent
>> DV video stream.
> 
> I don't know that the content of the images makes any difference; 
> they're all just arrays of pixels that need to be mapped from one space 
> into another.
> 
>> > > But, then, how do you do the real work... should the camera
>> > > recalibration be done all on one processor?
>> >
>> > It's not clear to me what you call "recalibration". You mean color
>> > correction, perspective change and so on? These could be done in
>> > parallel, but if the cameras don't move relative to each other, the
>> > transformations are always the same, so it's probably easy to partition
>> > them across CPUs in a way that balances the load.
>>
>> Agreed.
> 
> Lots of good ideas, all.
> 
> Thanks,
> Jim
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


