<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Nov 30, 2018 at 9:44 PM John Hearns via Beowulf <<a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">John, your reply makes so many points which could start a whole series of debates.</div></blockquote><div><br></div><div>I would not deny partaking of the occasional round of trolling.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div> > Best use of our time now may well be to 'rm -rf SLURM' and figure out how to install kubernetes. <br></div><div>...</div></div></blockquote><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>My own thoughts on HPC for a tightly coupled, on-premises setup are that we need a lightweight OS on the nodes, which does the bare minimum. No general-purpose utilities, no GUIs, nothing but network and storage. And container support.<br></div><div>The cluster will have the normal login nodes of course, but will present itself as a 'black box' to run containers.</div><div>But - given my herd analogy above - will we see that? Or will we see private OpenStack setups?</div></div></blockquote><div><br></div><div>Ten years ago, maybe even five, I would have agreed with you wholeheartedly. I was never much impressed by early LXC, but for my first year of exposure to Docker hype I was thinking exactly what you are saying here. And then I tried CoreOS and started missing having a real OS. And then I started trying to do things with containers. 
And then I realized that I was seeing software which was "easier to containerize" and that "easier to containerize" really meant "written by people who can't figure out './configure; make; make install' and who build on a sand-like foundation of fragile dependencies to the extent that it only runs on their Ubuntu laptop, so you have to put their Ubuntu laptop in a container." Then I started asking myself "do I want to trust software of that quality?" And after that, "do I want to trust the tools written to support that type of poor-quality software?" And then I started to notice how much containers actually *increased* the amount of time/complexity it took to manage software. And then I started enjoying all the container engine bugs... At that point, reality squished the hype for me because I had other stuff I needed to get done and didn't have the budget to hire a devops person to sit around mulling these things over.</div><div><br></div><div>From the perspective of the software being containerized, I'm even more skeptical. In my world (bioinformatics), I install a lot of crappy software. We're talking stuff resulting from "I read the first three days of 'learn python in 21 days' and now I'm an expert, just run this after installing these 17 things from pypi...and trust the output." I'm good friends with crappy software; we hang out together a lot. To me it just doesn't feel like making crappy software more portable is the *right* thing to do. When I walk my dog, I follow him with a bag and "containerize" what drops out. It makes it easier to carry around, but doesn't change what it is. As of today, the biggest benefit I see in containers is that they force a developer to actually document the install procedure somewhere in a way that actually has to work, so we can see firsthand how ridiculous it is (*cough* tensorflow *cough*). </div><div><br></div><div>I got sidetracked on a rant again. 
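To put one concrete face on that "forced documentation" point: a hypothetical Dockerfile for one of these tools (the base image choice, package list, and repo URL are all invented for illustration, not any actual project) lays the install procedure out in the open:

```dockerfile
# Hypothetical recipe for a typical bioinformatics tool. Every fragile
# dependency the README never mentioned now has to be written down,
# or the image build simply fails.
FROM ubuntu:18.04

# System packages the tool silently assumed were on the author's laptop
RUN apt-get update && apt-get install -y \
        python3 python3-pip git zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*

# "...just run this after installing these 17 things from pypi"
RUN pip3 install numpy scipy pandas pysam

# The tool itself, pinned to whatever commit happened to work that day
RUN git clone https://example.org/lab/sometool.git /opt/sometool

ENTRYPOINT ["python3", "/opt/sometool/run.py"]
```

Reading one of these is often the fastest way to see how deep the sand-like foundation goes.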
Your proposed solution works fine in an IT-style computing world: it needs exactly the staff IT wants to grow these days, and instead of just a self-directed sysadmin it has the potential to need a project manager. I don't see it showing up on many lab/office clusters anytime soon, though, because it's a model that embraces hype first; in an environment not focused on publishing or press releases around hype, it's a lot of extra work/cost/complexity for very little real benefit. While you (and many on this list) might be interested in exploring the technical merits of the approach, its actual utility really hits home for people who require that extra complexity and layered abstraction to justify themselves. The understaffed/overworked among us will just write a shell/job script and move along to the next raging fire to put out. </div><div><br></div><div>griznog</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, 30 Nov 2018 at 23:04, John Hanks <<a href="mailto:griznog@gmail.com" target="_blank">griznog@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Nov 29, 2018 at 4:46 AM Jon Forrest <<a href="mailto:nobozo@gmail.com" target="_blank">nobozo@gmail.com</a>> wrote:<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I agree completely. There is and always will be a need for what I call<br>
"pretty high performance computing", which is the highest performance<br>
computing you can achieve, given practical limits like funding, space,<br>
time, ... Sure there will always be people who can figure out how to go<br>
faster, but PHPC is pretty good.<br><br></blockquote><div><br></div><div>What a great term, PHPC. That probably describes the bulk of all "HPC"-oriented computing being done today, if you consider all cores in use down to the lab/workbench level of clustering. Certainly for my userbase (bioinformatics), the computational part of a project is often a small subset of the total time spent on it, and time-to-solution is the most important metric for them. It's rare for us to chase that last 10% or 20% of performance gain. </div><div><br></div><div><rant>This has been a great thread overall, but I think no one is considering the elephant in the room. Technical arguments are not winning out in any of these technologies: CI/CD, containers, "devops", etc. All these things pile on arbitrary layers of abstraction in an attempt to cover up for the underlying, really, really crappy software development practices/models and the resulting code. They aren't successful because they are *good*; they are successful because they are *popular*. </div><div><br></div><div>As HPC admins, we tend to report to research-oriented groups. Not always, but more often than "normal" IT folks, who are often insulated from negative user feedback by ticket systems, metrics, etc. Think about the difference in that reporting chain:</div><div><br></div><div>A PI/researcher gets her next grant, tenured position, brilliant new post-doc, etc., based on her research. Approach them about expanding the sysadmin staff by 10x and they'll laugh you out of the room. Ask for an extra 100% budget to buy Vendor B storage rather than whitebox and they'll laugh you out of the room. They want as much raw computation/storage as cheaply as possible and would rather pay a grad student than a sysadmin to run it, because a grad student is more likely to stumble over a publication and boost the PI's status. Sysadmins are dead weight in this world, only tolerated. 
</div><div><br></div><div>A CIO or CTO gets his next job based on the headcount and budget under his control. There is no incentive to be efficient in anything they do. Of course, there is the *appearance* of efficiency to maintain, but the CIO 101 class's first lecture is on creative accounting and metrics. Pay more for Vendor B? Of course, they pay for golf and lunch, great people. Think about all those "migrate/outsource to the cloud" projects you've seen that were going to save so much money. More often than not, staff *expands* with "cloud engineers", extra training is required, sysadmin work gets inefficiently distributed to end users, err, I mean developers. Developers now need to fork into new FTEs who need training...and so it goes. More head count, more budget, more power: happy CIO. Time to apply to a larger institution/company, rinse and repeat.</div><div><br></div><div>Think about it from the perspective of your favorite phone app, whatever it may be:</div><div> - app is released, wow this is useful!</div><div> - app is updated, wow this is still useful and does 2 more things</div><div> - app is updated, ummm..., it's still useful but these 4 new things really make what I need hard to get to</div><div> - app is updated, dammit, my feature has been split and replaced with 8 new menus, none of which do what I want?!?!?</div><div><br></div><div>No one goes to the yearly performance review and says "I removed X features, Y lines of code and simplified the interface down to just the useful functions, there's nothing else to be done" and gets a raise. People get raises for *adding* stuff, for *increasing* complexity. You can't tie your name to a simplification, but an addition goes on the CV quite nicely. It doesn't matter if in the end any benefit is dwarfed by the extra complexity and inefficiency.</div><div><br></div><div>Ultimately I blame us, the sysadmins. 
</div><div><br></div><div>We could have installed business-oriented software and worked with schools of business, but we laughed at them because they didn't use MPI. Now we have the Hadoop and Spark abominations to deal with. <br></div><div><br></div><div>We could have handed out a little sudo here and there to give people *measured* control, but we coveted root and drove them to a more expensive instance in the cloud where they could have full control.</div><div><br></div><div>We could have rounded out node images with a useful set of packages, but we prided ourselves on optimizing node images to the point that users had to pretty much rebuild the OS in $HOME to get anything to run, and so now: containers.</div><div><br></div><div>We could have been in a position to say "hey, that's a stupid idea" (*cough* systemd *cough*), but we squandered our reputation on neckbeard BOFH pursuits and the enemies of simplicity stormed the gates. </div><div><br></div><div>Disclaimer: I'm confessing here. I recognize I played a role in this, so don't think I didn't throw the first stone at myself. Guilty as charged.</div><div><br></div><div>Enjoy the technical arguments, but devops and cloud and containers and whatever next abstraction layers arise don't care. They have crept up on us under a fog of popularity and fanbois-ism and overwhelmed HPC with sheer numbers of "developers". Not because any of it is better or more efficient, but because no one really cares about efficiency. They want to work and eat, and if adding and supporting a half-dozen more layers of abstraction and APIs keeps the paychecks coming, no one is simplifying anything. I call it "devops masturbation". The fact that pretty much all of it could be replaced with a small shell script is irrelevant. devops needs CI/CD, containers, and cloud to justify existence, and they will not go quietly into that good night when offered a simpler, more efficient and cheaper solution which puts them out of a job. 
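For reference, the "small shell script" in question is usually nothing fancier than a plain batch job; a sketch (the scheduler directives, module version, tool, and file names here are all made up for illustration):

```shell
#!/bin/bash
# Hypothetical batch job: everything the CI/CD + container + cloud stack
# is selling, in half a dozen lines. Load the environment, run the tool,
# write the results somewhere sane.
#SBATCH --job-name=align
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00

module load bwa/0.7.17
bwa mem -t "$SLURM_CPUS_PER_TASK" ref.fa reads.fq > aligned.sam
```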
Best use of our time now may well be to 'rm -rf SLURM' and figure out how to install kubernetes. Console yourself with the realization that people are willing to happily pay more for less if the abstraction is appealing enough, and start counting the fat stacks of cash.</div><div></rant></div><div><br></div><div>griznog</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
_______________________________________________<br>
Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
</blockquote></div></div>
</blockquote></div>
</blockquote></div></div>