[Beowulf] Infiniband Subnet Manager
Nifty niftyompi Mitch niftyompi at niftyegg.com
Sat Aug 30 12:20:02 PDT 2008
- Previous message: [Beowulf] Infiniband Subnet Manager
- Next message: [Beowulf] Stroustrup regarding multicore
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Aug 28, 2008 at 08:41:18AM -0400, Prentice Bisbal wrote:
>
> Since an infiniband fabric needs a subnet manager, should the master
> node have an IB HCA and be connected to the IB network in order to run
> the subnet manager?
>
> My logic behind this is that the master node will be full
> enterprise-level hardware (redundant everything), and should never go
> down or be rebooted during normal use. I expect the nodes to go down
> more frequently (not fully redundant hardware, higher operating loads,
> etc.).
>
> Exactly what functions does the subnet manager perform, and what happens
> if it disappears from the IB fabric?
>
> I've been doing research into IB all day yesterday, and I'm continuing
> today, so please no RTFM answers.

How big a fabric?

The subnet manager (SM) manages the fabric. The most obvious functions are:

 * assign LIDs (local IDs)
 * set up routing (routing is static, BTW)
 * notice changes, i.e. discovery, configuration and continuous monitoring of the fabric

Once a fabric is live and correctly set up, nothing bad happens if the subnet manager dies, unless something changes. The assigned LIDs remain valid and the routes remain valid. You only lose monitoring.

Some vendor switches can manage the fabric with a built-in subnet management card (extra $). In many cases this is the best solution...

If the SM runs on the head node, it might be easier to keep an eye on it...

The subnet management specification has provisions for failover... It is possible to have a second subnet manager running on the fabric. The second SM should sit idle (standby) and only become active if the first one goes silent.

Caution #1 -- failover is hard to test, and multiple SMs may introduce instability, so test, test, but do not tinker on a production fabric. Do monitor -- gently is fine.

Caution #2 -- do not mix subnet managers. If you run a second SM, run one that is identical! Do not mix OpenSM and a managed switch's built-in SM without vendor approval and testing, and do not mix versions of any SM...

Caution #3 -- like so many things, one is good (required, in this case), two might be nice, but many is just wrong.

This is a good URL to read and bookmark:
http://infiniband.sourceforge.net/SM/overview.htm

Google for OpenSM; the Cisco pages have some good stuff too.

--
T o m M i t c h e l l
Got a great hat... now what.
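The three SM jobs listed above (discovery, LID assignment, static routing) can be made concrete with a toy simulation. This is a sketch under simplifying assumptions, not how OpenSM or any vendor SM is implemented: the fabric and all names (`FABRIC`, `discover`, `assign_lids`, `build_forwarding_tables`, `forward`) are hypothetical; every node gets a single LID (real HCAs get one per port); LIDs are handed out in discovery order; routing is plain min-hop; and a table entry names the next-hop node rather than an output port number as a real switch forwarding table does.

```python
from collections import deque

# Hypothetical toy fabric: node name -> directly cabled neighbours.
# Names starting with "sw" are switches; everything else is an HCA.
FABRIC = {
    "head": ["sw1"],
    "node1": ["sw1"],
    "node2": ["sw2"],
    "sw1": ["head", "node1", "sw2"],
    "sw2": ["sw1", "node2"],
}


def is_switch(name):
    return name.startswith("sw")


def discover(fabric, sm_node):
    """Sweep the fabric breadth-first, starting at the node running the SM."""
    seen, order, queue = {sm_node}, [], deque([sm_node])
    while queue:
        node = queue.popleft()
        order.append(node)
        if node != sm_node and not is_switch(node):
            continue  # end nodes (HCAs) do not lead any further into the fabric
        for peer in fabric[node]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return order


def assign_lids(order):
    """Hand out LIDs 1, 2, 3, ... in discovery order (one LID per node here)."""
    return {node: lid for lid, node in enumerate(order, start=1)}


def build_forwarding_tables(fabric, lids):
    """Static min-hop routing: each switch maps destination LID -> next hop."""
    tables = {node: {} for node in fabric if is_switch(node)}
    for dest, dlid in lids.items():
        # BFS outward from the destination; parent[x] is x's next hop toward dest.
        parent, queue = {dest: None}, deque([dest])
        while queue:
            node = queue.popleft()
            if node != dest and not is_switch(node):
                continue  # never route transit traffic through an HCA
            for peer in fabric[node]:
                if peer not in parent:
                    parent[peer] = node
                    queue.append(peer)
        for sw, table in tables.items():
            if sw != dest and sw in parent:
                table[dlid] = parent[sw]
    return tables


def forward(fabric, tables, lids, src, dest):
    """Follow the programmed tables hop by hop; the SM is not consulted."""
    dlid, node, path = lids[dest], src, [src]
    while node != dest:
        # An HCA sends to its single uplink; a switch looks up the destination LID.
        node = tables[node][dlid] if node in tables else fabric[node][0]
        path.append(node)
    return path


# The SM's work: discovery, LID assignment, routing (done once per sweep).
lids = assign_lids(discover(FABRIC, "head"))
tables = build_forwarding_tables(FABRIC, lids)

# From here on, traffic only needs `lids` and `tables`.
print(lids)  # {'head': 1, 'sw1': 2, 'node1': 3, 'sw2': 4, 'node2': 5}
print(forward(FABRIC, tables, lids, "head", "node2"))  # ['head', 'sw1', 'sw2', 'node2']
```

Because `forward` reads only the tables written during the sweep, the toy mirrors the point made above: if the SM disappears afterwards, existing LIDs and routes keep working. If a cable is pulled, however, nothing recomputes the tables, which is the monitoring you lose.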
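The master/standby arrangement described above can be modelled in a few lines. The sketch assumes the InfiniBand spec's SMInfo rules as commonly described: each SM advertises a priority from 0 to 15, the live SM with the highest priority becomes master, and a tie goes to the lowest GUID; check that against the spec before relying on it. The host names, GUIDs and the `elect_master` helper are hypothetical, and the sketch ignores everything that makes real failover hard (polling intervals, handover when a higher-priority SM rejoins, re-sweeping the fabric), which is why Caution #1 still applies.

```python
from dataclasses import dataclass


@dataclass
class SubnetManager:
    host: str       # hypothetical label for where this SM instance runs
    guid: int       # port GUID of the SM (values below are made up)
    priority: int   # assumed range 0-15, higher is preferred
    alive: bool = True


def elect_master(sms):
    """Return the SM that should be master: highest priority among the live
    SMs, ties broken by the lowest GUID. Returns None if no SM is running."""
    live = [sm for sm in sms if sm.alive]
    if not live:
        return None
    return min(live, key=lambda sm: (-sm.priority, sm.guid))


sms = [
    SubnetManager("head", guid=0x0002C90300001000, priority=15),
    SubnetManager("node17", guid=0x0002C90300002000, priority=14),
]

first = elect_master(sms)   # the head node wins; node17 stays in standby
sms[0].alive = False        # head node is rebooted: the master goes silent
second = elect_master(sms)  # the standby takes over
print(first.host, "->", second.host)  # head -> node17
```

Keeping both SMs identical (Caution #2) matters here: the election only works cleanly if every SM on the fabric agrees on these rules.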
More information about the Beowulf mailing list
