[Beowulf] cluster for doing real time video panoramas?

Thu Dec 22 07:31:16 PST 2005

At 06:01 AM 12/22/2005, Bogdan Costescu wrote:
>On Wed, 21 Dec 2005, Jim Lux wrote:
>
>>http://industrialcomponent.com/unibrain/ubwebcam.html
>
>It's not clear from the description what you need to do to get the video 
>stream. It's connected through FireWire, but it doesn't say anything about 
>DV - and given the resolution which is different from standard NTSC or PAL...

To do the image transformation, all I need is an array of pixels, which is 
what the camera spits out: VGA frames at 640x480. Yes, they use the usual 
R,G,B scheme with a Bayer layout, so the color resolution is half the frame 
resolution, but that's typical for color cameras (even electronic 
ones).  The resolution is better than NTSC or PAL (not by much, but there 
it is).
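
A small sketch of what "half the frame resolution" means in sample counts.
It assumes an RGGB tile order, which is only one common Bayer arrangement;
the actual order used by this camera is not stated here, and the function
name is made up for illustration.

```python
# Count how many photosites of each color a Bayer sensor provides.
# ASSUMPTION: RGGB 2x2 tile; the real camera may use another ordering.
BAYER_TILE = (("R", "G"),
              ("G", "B"))

def channel_counts(height=480, width=640):
    """Return the number of R, G and B samples in one raw frame."""
    counts = {"R": 0, "G": 0, "B": 0}
    for y in range(height):
        for x in range(width):
            counts[BAYER_TILE[y % 2][x % 2]] += 1
    return counts

if __name__ == "__main__":
    total = 640 * 480
    for channel, count in channel_counts().items():
        print(channel, count, "%.0f%%" % (100.0 * count / total))
```

Green gets half of the photosites and red and blue a quarter each, so the
full-color image has to be interpolated from a sparser grid per channel.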

>>For instance, if the cameras pointed along the x axis have the long 
>>direction of the frame in the y direction and the short in the z, and the 
>>cameras on the y axis have the long direction in z, and the short in x, 
>>and the cameras on z have the long direction in x and short in y.  IF the 
>>short direction covers 90 degrees, then you'll have overlap (I think..)
>
>Your setup would have to match the short side from one frame with the long 
>side from another frame which I think is not so easy.

You're going to be resampling anyway, and the odds of a camera pixel lining 
up exactly with an output pixel are vanishingly small.  Once you've got the 
transformation, it works fairly well.
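
Since output pixels almost never land on camera pixels, each output value
has to be interpolated from its neighbours. Below is a minimal sketch using
bilinear interpolation, which is one common choice and not necessarily the
method used for the panoramas discussed here. The image is assumed to be a
list of rows of numbers, at least 2x2 in size.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a single-channel image at a fractional (x, y) position.

    image is a list of rows (image[row][col]); it must be at least 2x2.
    Coordinates outside the image are clamped to the border.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    # Top-left corner of the 2x2 neighbourhood, kept inside the image.
    x0 = min(int(math.floor(x)), width - 2)
    y0 = min(int(math.floor(y)), height - 2)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1.0 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1.0 - fy) * top + fy * bottom

if __name__ == "__main__":
    tiny = [[0.0, 10.0],
            [20.0, 30.0]]
    print(bilinear_sample(tiny, 0.5, 0.5))
```

In a panorama renderer the (x, y) fed to this function would come from the
camera-to-output transformation, evaluated once per output pixel.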

>IMHO it would be easier to have something like 8 cameras, 4 disposed in a 
>horizontal plane at 90 degrees from each other such that their short sides 
>match, then have 2 cameras pointing upwards each directed 60 degrees from 
>the horizontal plane and from the other camera; then other 2 cameras 
>pointing downwards like in a mirror. The long sides from all the 4 last 
>cameras would correspond to the long sides of 2 opposite cameras from the 
>horizontal plane.

But that makes my robot cost $300 more, plus the extra 
computation!<grin>...  There are a whole raft of optics issues I'm looking 
into.  For instance, if you can cover 120 degrees, you can do it with 4 
cameras (arranged at the vertices/faces of a tetrahedron)
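
A rough way to check how wide each camera has to see for a given layout.
The sketch below assumes an idealized circular (conical) field of view and
scans the sphere on a one-degree grid for the direction farthest from every
camera axis. The function names and the grid resolution are illustrative
choices. Real rectangular frames and lens distortion will shift the numbers.

```python
import math

def unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Camera axes for two layouts: tetrahedron vertices and the six cube faces.
TETRA_AXES = [unit(v) for v in
              [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
CUBE_AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def worst_case_half_angle(axes, steps=180):
    """Largest angle (degrees) from any sampled direction to its nearest axis.

    ASSUMPTION: circular field of view, so a camera covers everything
    within this half angle of its axis.
    """
    worst = 0.0
    for i in range(steps + 1):
        polar = math.pi * i / steps
        for j in range(2 * steps):
            azimuth = math.pi * j / steps
            d = (math.sin(polar) * math.cos(azimuth),
                 math.sin(polar) * math.sin(azimuth),
                 math.cos(polar))
            best = max(sum(a * b for a, b in zip(d, axis)) for axis in axes)
            angle = math.acos(max(-1.0, min(1.0, best)))
            worst = max(worst, angle)
    return math.degrees(worst)

if __name__ == "__main__":
    for name, axes in (("tetrahedron", TETRA_AXES), ("cube", CUBE_AXES)):
        print(name, "full cone angle ~", 2.0 * worst_case_half_angle(axes))
```

Under the circular-cone assumption the four-camera layout needs a full cone
of about 141 degrees (2*acos(1/3)) just to touch at the seams, somewhat more
than 120, and the six-camera layout needs about 109 degrees. A rectangular
frame measured along its diagonal behaves differently, so treat these as
ballpark figures only.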

However, the cheap cameras don't have the field of view one really 
needs.  But maybe 3-4 really wide-angle cameras (perhaps pointing each 
camera into a curved reflector, think Christmas tree ornament) would give 
spherical situational awareness, with a remotely controlled narrow-FOV 
camera for "foveal vision".  However, I have tried to drive a robot by 
remote control using a single tilt/pan camera and it's a mindbendingly 
difficult chore.  You really need that peripheral vision and the ability to 
turn your head to anticipate the turn.

For those who don't believe this, here's Jim Lux's exercise for would-be 
teleoperators:

Get a pair of safety goggles (or a motorcycle helmet).  Cover the faceplate 
with tape, leaving an aperture corresponding to the field of view of your 
proposed camera setup. If you have a helmet, roll up a towel, put it 
around your neck, and tape the helmet to it so you can't turn your 
head.  Try to walk around the room.  Try to ride a bike, drive a cart, 
etc.  You will crash, unless you go excruciatingly slowly.

Or, for the more technically inclined.. go get one of those nifty 2.4GHz 
wireless cameras and tape it to the top of a toy radio controlled 
car.  Attempt to drive the car by looking only at the video screen, without 
having any visibility of the car.

>>http://www.luxfamily.com/pano1.htm
>
>404, but don't bother, I got the idea.

oops   http://www.luxfamily.com/pano.htm

>>e.g., the cameras on the Mars rovers)

But those panoramas are done in non-real time.  Not only the rendering, but 
the picture taking too (they use frame sequential color, which is what they 
also used for color TV from the moon some 30+ years ago).  If you look at 
the "movies" of the dust devils you can see the color fringe 
artifacts.  However, they also have a fairly high res camera too (1000x1000 
pixels, I think)

Since it can take up to about 20 minutes for the signals to get to Earth, 
there's not much point in real-time rendering; you're not going to be 
"joysticking" the rover around from Earth.  The rover cam stuff is fairly 
sophisticated and they've got some really nifty stereo vision software to 
take two viewpoints and turn them into a terrain map.  I'll probably poke 
around for that as well.

>I thought that you want to have such a nice machine for yourself :-)
>
>>It just occurred to me that another way might be to interleave frames 
>>among CPUs (there'd be some lag, and there's the sync problem).. Say it 
>>takes 0.1 seconds to process all the cameras into a single image.  Then, 
>>with three processors, I could keep up at 30 frames/second.
>
>Indeed, I mentioned this as a possibility in the end of my message. 
>However, don't forget that a cluster is not only about CPUs but also about 
>communication. A typical NTSC frame (720x480) with 24bits per pixel takes 
>about 1MB (you don't want to lose quality with lossy compression and CPU 
>time with any compression in general). With 100Mbit/s networking you can 
>transfer only about 10 frames per second - and this while adding anyway 
>delay to the video stream (0.1s which translates into 3 NTSC frames). So 
>GigE or better should be chosen... or some delay added to the audio stream.

Good point. I wonder how one could hook up all the cameras to all the nodes 
(cheaply). If you wanted to parallelize the rendering on a frame-by-frame 
basis, you'd need to stream the video from all cameras to all processors.
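
The numbers in this exchange can be checked with a few lines of arithmetic.
The sketch below ignores protocol overhead, and the six-camera, 8-bit raw
Bayer case at the end is an assumed example, not a measured configuration.
The function names are made up for illustration.

```python
import math

def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8

def max_frames_per_second(link_bits_per_second, width, height, bits_per_pixel):
    """Upper bound on frame rate over a link, ignoring protocol overhead."""
    return link_bits_per_second / (frame_bytes(width, height, bits_per_pixel) * 8)

def nodes_needed(seconds_per_frame, frames_per_second):
    """Nodes required to keep up when whole frames are dealt out round-robin."""
    return math.ceil(round(seconds_per_frame * frames_per_second, 9))

def aggregate_bits_per_second(cameras, width, height, bits_per_pixel, fps):
    """Raw data rate if every camera's stream reaches a single node."""
    return cameras * frame_bytes(width, height, bits_per_pixel) * 8 * fps

if __name__ == "__main__":
    print("NTSC frame bytes:", frame_bytes(720, 480, 24))
    print("fps over 100 Mbit/s:", max_frames_per_second(100e6, 720, 480, 24))
    print("nodes for 0.1 s/frame at 30 fps:", nodes_needed(0.1, 30))
    # ASSUMPTION: six VGA cameras sending 8-bit raw Bayer data at 30 fps.
    print("all cameras, Mbit/s:",
          aggregate_bits_per_second(6, 640, 480, 8, 30) / 1e6)
```

The 720x480x24-bit frame comes to just over a million bytes, which gives a
ceiling of about 12 frames/s on 100 Mbit/s before overhead, consistent with
the "about 10" estimate. Six raw VGA streams at 30 frames/s add up to
roughly 440 Mbit/s, which already argues for GigE if every node sees
everything.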

>>I envision a scheme where you generate a series of full spherical frames 
>>that you could record (e.g. onto a hard disk), and then, later you could 
>>play it back looking in any direction.  And, in real time (while driving 
>>the robot), you could look at a subset.
>
>When looking at a subset, you would actually not care about reconstructing 
>the whole spherical image, so you could handle only the 1-3 cameras that 
>point approximately in the direction that you are interested in, which 
>would be decided based on the mapping of camera space to real space. This 
>might mean a serious decrease in computational and communication needs, 
>especially as you might not need to make the full transformations even for 
>these 1-3 cameras (f.e. if the "window" corresponds exactly to one camera, 
>you don't need any transformation, except maybe for color correction) and 
>then maybe even one computer is able to do the whole work.

Hmm, good point. One could record all cameras raw (for later 
reconstruction), and render in real time only the ones you need.
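
Picking "the ones you need" could look something like the sketch below. It
assumes circular fields of view for both the cameras and the viewing
window, and a hypothetical six-camera layout along the coordinate axes; the
names and the 55/20 degree half angles are placeholders, not measured
values.

```python
import math

# ASSUMPTION: six cameras looking along +/-x, +/-y, +/-z.
CAMERA_AXES = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def angle_between_deg(a, b):
    """Angle in degrees between two non-zero 3-vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def cameras_for_view(view_dir, camera_axes, camera_half_fov_deg,
                     view_half_fov_deg):
    """Names of cameras whose field of view can overlap the viewing window.

    Two cones overlap when the angle between their axes is smaller than
    the sum of their half angles.
    """
    limit = camera_half_fov_deg + view_half_fov_deg
    return [name for name, axis in camera_axes.items()
            if angle_between_deg(view_dir, axis) < limit]

if __name__ == "__main__":
    print(cameras_for_view((1, 0, 0), CAMERA_AXES, 55.0, 20.0))
    print(cameras_for_view((1, 1, 0), CAMERA_AXES, 55.0, 20.0))
```

Only the selected cameras would go through the transformation for the live
view, while all raw streams still get written to disk for later full
spherical reconstruction.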

Jim Lux