[Beowulf] GPU boards and cluster servers
Vincent Diepeveen diep at xs4all.nl
Wed Sep 10 16:27:35 PDT 2008
- Previous message: [Beowulf] Re: Lustre failover
- Next message: [Beowulf] Re: GPU boards and cluster servers
John,

I'd go for the AMD option. Think about it: more than 4x more cache per stream processor, since AMD has 64 of them, each doing 5 instructions a cycle or so, versus Nvidia's 240. By Seymour Cray's law, AMD has the better balance of the two.

Additionally, it will be easier to find documentation and information about AMD, as they are a processor manufacturer used to giving out information about their hardware; Nvidia still has to learn that.

As for speed: today Nvidia is of course faster with its new GPU, by the end of this year it will be AMD, next year who knows. Usually it is a coin flip each time. You can assume each newer GPU will be faster. Adding cores is relatively easy for those guys, in contrast to CPUs.

In case you plan to write an algorithm that's not embarrassingly parallel, Nvidia has a problem that AMD doesn't: it has 2 layers of parallelism versus AMD's single one. AFAIK in AMD/ATI you've got 64 processors that each get the same instruction stream, just like 1 block on Nvidia; but Nvidia additionally has a grid of blocks. That means you also have to design a parallel algorithm between blocks, which is different from the parallelism of just 64 stream processors that execute instructions 5 units at a time.

Additionally, debugging multiple blocks is going to be tougher than debugging 1 block. If you have 1 block in which everything executes the same code at the same time, that's reasonably deterministic (though memory writes to the same address may not be deterministic, in case you plan to do those).

Think about it: 4x more cache per stream processor (assuming cards have the same amount of cache and potential, which averaged over a few years will be the case). That is crucial for FFT-type workloads.

Vincent
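To make the "2 layers of parallelism" point concrete, here is a rough sketch of a sum done in two levels. It is plain Python standing in for the hardware, not real GPU code: `BLOCK_SIZE`, `block_sum` and `grid_sum` are names made up for this illustration, and the lockstep behaviour of a block is only simulated by a loop. The first level is the reduction inside one block of 64 lanes; the second level is the extra pass you need to combine results between blocks.

```python
# Hypothetical illustration only: plain Python, not a GPU API.
BLOCK_SIZE = 64  # lanes per block, matching the 64 processors mentioned above


def block_sum(chunk):
    """Level 1: tree reduction inside a single block (lanes run in lockstep)."""
    # Pad with zeros so the block is always full.
    vals = list(chunk) + [0] * (BLOCK_SIZE - len(chunk))
    stride = BLOCK_SIZE // 2
    while stride > 0:
        # In one step, every active lane i adds the value of lane i + stride.
        for i in range(stride):
            vals[i] += vals[i + stride]
        stride //= 2
    return vals[0]


def grid_sum(data):
    """Level 2: split the data over blocks, then combine the block results."""
    partials = [block_sum(data[start:start + BLOCK_SIZE])
                for start in range(0, len(data), BLOCK_SIZE)]
    # Blocks do not share the lockstep of level 1, so combining their
    # partial results is a separate step with its own logic.
    return sum(partials), partials


total, partials = grid_sum(list(range(1000)))
print(total, len(partials))  # 499500 16
```

With only one block, `grid_sum` degenerates to `block_sum` and the second step disappears; with a grid of blocks, the combination step is additional code that has to be designed and debugged separately.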
