[Beowulf] Re: UPS & power supply instability - ongoing discussions

Maurice Hilarius maurice at harddata.com
Thu Sep 29 10:54:01 PDT 2005


David Kewley wrote:

> ..
>
>>That sounds like positive progress..
>>Did they name a reason this is happening, or are they taking steps to
>>send someone down with a scope to see what is happening?
>>    
>>
>Yes, they're sending people out.  Liebert engineers say they have basic 
>understandings of what's going wrong, and have some possible ways to work 
>around it or solve it.  I'll not say more at this time, to give them time 
>to work it out.
>
>  
>
Fair.

> ..
>
>I apologize for making a snide comment.
>
>  
>
No problem, you are obviously under some stress and pressure on this, so
emotions do run a bit high.

>> ..
>>
>>You have obviously decided, in advance, that the problem is with the
>>Liebert equipment.
>>    
>>
>
>Maurice, in your two replies to this thread you've made lots of incorrect 
>inferences and assumptions, including this one. 
>
Perhaps so.
I was basing what I said on what I read; however, I have to apologize,
as I probably over-reacted, and have not heard the "whole story".

OTOH, I am appalled by the fact that it has been "4 weeks" since you
reported the problem to Dell and Liebert, and apparently they have done
close to nothing about it.
I know that on a job of this size, if you had bought from us, and
reported it, we would be all over it.
Even Dell cannot afford this kind of bad PR.

> Please show me the respect 
>of *not* assuming what I think, what I've done, or what others have done at 
>our site for this problem.  If I've not *stated* some fact that you think 
>is important, simply ask me rather than assuming.
>
>  
>
OK, fair comment.
So, DID you get any useful response from either Dell or Liebert 4 weeks
ago, or in the interim?

>>You mention absolutely nothing about testing the power supplies.
>>That step should be the first, and fortunately is the easiest.
>>Almost any modern scope will do the job.
>>As it is low frequency it does not have to be an expensive or
>>specialized scope.
>>Instead of trusting "Kill-a-Watt clones" why not check the actual power
>>supply response, on a standard 115V single phase power input circuit?
>>    
>>
>
>That's an excellent suggestion, and is in accord with my usual 
>troubleshooting & experimental inclinations.  But because I have the 
>responsibility for *all* the aspects of commissioning this brand-new, large 
>cluster, I've had to leave lots of details to others, Liebert in this case.
>
>  
>
So, delays are partially because the staffing at your site is short and
you simply do not have enough time to do what it takes to make it run?
If so, I offer sympathy.
I see this far too often.
A budget of a million dollars for a cluster, but no cash to implement it
or maintain it.
That must be very frustrating!

>To the best of my knowledge, Liebert has not studied these exact power 
>supplies, but they say they understand PSes that are similar enough that 
>they can work out a model of our specific problem.  Until I have time to 
>run experiments myself, I am going to trust them to cover these bases.
>
>  
>
I would; in my experience they have a heck of a good rep.

>>I have seen power regulation equipment fail in a similar fashion before,
>>where the power supplies are pulling down too much current to the
>>neutral phase,
>>and making the power feed overload on one phase, driving it into
>>instability.
>>This is a classic symptom of cheap, poorly designed and made power
>>supplies. Or bad room wiring, with undersized neutral lines.
>>    
>>
>
>The PDUs have a front panel that displays lots of diagnostic measurements, 
>and they sound a rather piercing alarm when any measurement goes over its 
>Liebert-defined limit (they are the only alarms I've heard in that room 
>that can reliably be heard over the room noise, from any part of the 
>room :).  The PDUs also have suitably sized breakers and suitably sized 
>conductors on each of the 93 branch circuits.
>
>The three output phase currents all stay well under their limits, even when 
>they begin to become unstable (at the low-power end of the instability, and 
>well into the instability domain).  Toward the high-power end of the 
>instability domain that we've tested, the current oscillations become large 
>enough, and sit on top of a large enough average current, so the PDUs *do* 
>give overcurrent alarms (plus other alarms due to the wild oscillations).
>
>Unless something is going on that is not alarmed for, the PDUs and the
>Liebert techs who've been onsite don't indicate any problem with the
>neutral wiring or the power supplies per se.
>
>  
>
So, what DO they think is causing this? I am really curious..

>>Liebert make big UPS and power units, and those are their "bread &
>>butter"
>>
>>Frankly I am surprised they have not yet dispatched a tech down to your
>>site with test equipment by now..
>>    
>>
>
>When did I say they haven't dispatched a tech to our site?  In fact they 
>have, multiple times; I just hadn't mentioned that up to this point in this 
>thread.  
>
Ah.. that paints one very different picture.
So basically Liebert are on it, you have not mentioned what, if
anything, Dell have done, but you are coming to this list because after
some weeks you still are not seeing a solution happening?

>My concern was not that they aren't sending techs, but that they 
>have no solution yet, and that I wasn't getting a warm-fuzzy feeling that 
>they really were treating this problem as critically as we need them to.
>
>  
>
OK, so that tells me more..
Have they identified which piece of equipment they think is causing
the problem yet?

>After yesterday's conference call, I feel better about their efforts.  Even 
>so, the proof is still in the outcome, and the outcome is far from certain.
>
>  
>
No kidding.

>>When you say "Liebert has been on this case for something like 4 weeks
>>now." what does that mean?
>>    
>>
>
>That's when we first demonstrated this problem to their onsite tech & 
>engaged their help in solving it.
>
>  
>
>>>Can anyone here offer ideas, or better yet, experience?
>>>      
>>>
>>I was trying to.
>>Apparently you do not appreciate suggestions, except ones that support
>>your distrust of Liebert.
>>    
>>
>
>I appreciate all constructive suggestions.  My appreciation does not extend 
>to insinuations.
>
>Thanks for trying to offer ideas & experience.  I *do* appreciate some of 
>what you've written in this email.  I appreciate *none* of what you wrote 
>in your first reply to this thread -- if you like, go back and read it and 
>see if you can understand why.
>
>  
>
OK, I apologize.
Some of it is still because of what I will call a "selective application
of information".
If you had mentioned things like what you say about Liebert's actions to
date, as you have in this message, it would have painted a different
story entirely.

>>Why not test the power supplies?
>>If doing it yourself is not something you are comfortable with, there
>>are many electrical inspection labs in your region that provide this
>>service, usually for under $150.
>>Look in the yellow pages under "testing" or similar.
>>
>>Many will allow you to stand there and watch and ask questions as they
>>do it.
>>    
>>
>
>Now *that* is a very good suggestion.  Thank you.  I did not know testing 
>could be this easy.  (By the way, I'm comfortable with testing / measuring 
>the power supplies, although I don't have the equipment on hand to do it 
>properly, and I don't have the full range of knowledge to interpret all of 
>what I measure.)
>
>  
>
We have to do it regularly for custom equipment.
To meet CSA, CE, and UL one gets what is called a "site inspection".
Often the best and cheapest way is to take the piece to a certified test
lab; they do the test, provide a short report, and a sticker
certifying it is electrically safe and acceptable.
It is not an FCC radio emissions test and certification, but you can ask
for that too, albeit at a higher cost.
They measure power characteristics, PF, current leakage, consumption,
stability, load maximum, etc.
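
As a rough illustration of the PF figure in such a report, here is a
minimal sketch of how power factor can be worked out from sampled
voltage and current waveforms, such as a scope capture. This is not the
lab's actual procedure. The function names, the 115 V / 5 A values and
the synthetic waveforms are all assumptions made up for the example.

```python
import math


def power_factor(voltage, current):
    """Return (real_power_W, apparent_power_VA, pf) for two sample lists.

    Both lists must have the same length and should span a whole number
    of mains cycles, otherwise the averages are biased.
    """
    if not voltage or len(voltage) != len(current):
        raise ValueError("need two non-empty sample lists of equal length")
    n = len(voltage)
    # Real power is the mean of the instantaneous product v(t) * i(t).
    real = sum(v * i for v, i in zip(voltage, current)) / n
    # Apparent power is Vrms * Irms.
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    apparent = v_rms * i_rms
    pf = real / apparent if apparent else 0.0
    return real, apparent, pf


def synth_cycle(n=1000, v_rms=115.0, i_rms=5.0, phase_deg=0.0,
                third_harmonic=0.0):
    """Build one synthetic mains cycle (hypothetical test data).

    i_rms is the RMS of the current's fundamental; third_harmonic is the
    3rd-harmonic amplitude as a fraction of the fundamental, a crude
    stand-in for a non-linear power supply input stage.
    """
    vp = v_rms * math.sqrt(2)
    ip = i_rms * math.sqrt(2)
    phi = math.radians(phase_deg)
    voltage, current = [], []
    for k in range(n):
        theta = 2 * math.pi * k / n
        voltage.append(vp * math.sin(theta))
        current.append(ip * (math.sin(theta - phi)
                             + third_harmonic * math.sin(3 * theta)))
    return voltage, current


if __name__ == "__main__":
    for label, kwargs in (("60 degree lag", {"phase_deg": 60.0}),
                          ("80% 3rd harmonic", {"third_harmonic": 0.8})):
        real, apparent, pf = power_factor(*synth_cycle(**kwargs))
        print(f"{label}: P={real:.1f} W  S={apparent:.1f} VA  PF={pf:.3f}")
```

The second case is there because harmonic distortion lowers PF even
when the current is in phase with the voltage, which a plain phase
measurement would miss.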


>For now, I'm going to continue to let Liebert run with this problem; we've 
>offered to get them a power supply to take apart and/or measure, but so far 
>they seem to believe they understand it well enough.  I'm also going to 
>trust Dell, that their power supplies are of good quality, just of poor 
>interaction with the rest of our power infrastructure.
>
>Meanwhile, I have several other things to take care of on the cluster, 
>before users can get more than minimal use out of it, so I'm not yet going 
>to get into detailed measurements myself.
>
>David
>  
>
Good luck!
--

With our best regards,

Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.  FAX:       01-780-456-9772
11060 - 166 Avenue         email:maurice at harddata.com
Edmonton, AB, Canada       http://www.harddata.com/
   T5X 1Y3
