[Beowulf] Hyper Convergence Infrastructure

Joshua Mora joshua_mora at usa.net
Sat Oct 3 15:43:50 PDT 2015


Hello David.
Please visit this link for some answers to your questions.
https://www.nersc.gov/research-and-development/archive/cloud-computing/

I have worked on HPC, Cloud, Virtualization, Hadoop and hyperconverged
projects.
I think the challenge is to understand where each of these "computing
frameworks" makes sense.

The larger the overlap of features, the more challenging it is to choose.
The convergence of technologies between HPC and Cloud makes the choice
harder still.

Additionally, understanding the business helps you understand where the hype
is. For example, people want to store their media files (pictures and videos)
"forever" and have them "readily available" anywhere. Because of that, there
is huge interest in object-based storage that is geo-distributed and highly
available. That data can then be leveraged for data analysis using
computing frameworks such as OpenStack/CloudStack/..
I am just making this up as an example of where there is a lot of hype.
Understanding tiered storage for hot/cold data and how you connect it with
the computing infrastructure, and having "flexible" storage and networking
that connects things efficiently (ie. software defined storage and network)
with QoS, is a hot area as well.

Certainly, just reading marketing-oriented information is not going to
help you. Personally, I play with the technologies (from several vendors if
possible) to understand how they work, and then I can better understand the
marketing information.

Attending Supercomputing, OpenStack, Hadoop/Strata/Spark conferences, and
why not a big data conference as well, will help you link a certain set of
problems with a certain set of computing frameworks or solutions.

You need to define your own key metrics and find out whether those key
metrics are also defined, and make sense, under these computing frameworks.

I am sure I didn't give you a precise answer to your question, but hopefully
it will make you think about what kind of problems you want to solve. You
could seek a special-purpose, custom-built solution (likely 100% customized,
hence likely within the HPC world, and on premise).

Or you could choose a general-purpose, very affordable, highly available
solution with disaster recovery (eg, during the course of the execution, the
VM was stopped, moved somewhere else, restarted...), with non-deterministic
performance that you are ok with (eventually it finishes), and that you can
use with little effort (Cloud computing frameworks/Virtualization, RESTful
APIs).

Or you need to process so much "junk data, ~20TB" to find the needle (eg. a
recommendation) that you cannot host the entire thing in main memory, so you
have to pump it in and out between disk and main memory, across several racks
of computers with slow local disks and a slow interconnect that will likely
fail. Redundancy is built into the algorithms so that the big data task still
completes overnight (eg. Hadoop MR), because a failed task can be restarted on
another node in another rack.
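
As a rough illustration of that last pattern, here is a minimal Python sketch
of chunked processing where a failed task is retried on a different node. It
is not Hadoop code: the node names, the simulated failure and the functions
(run_with_retries, chunks, count_needles, find_needles) are all hypothetical,
made up only to show the idea.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(records: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield slices small enough to fit in one node's memory."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def run_with_retries(
    tasks: List[Sequence[str]],
    nodes: List[str],
    worker: Callable[[str, Sequence[str]], int],
    max_attempts: int = 3,
) -> List[int]:
    """Run every task; on failure, retry it on the next node in the list."""
    results = []
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results.append(worker(node, task))
                break
            except RuntimeError:
                continue  # this node failed, try the task elsewhere
        else:
            raise RuntimeError(f"task {i} failed after {max_attempts} attempts")
    return results


def count_needles(node: str, chunk: Sequence[str]) -> int:
    """Hypothetical worker: one node is assumed to be broken."""
    if node == "rack1-node0":
        raise RuntimeError("simulated node failure")
    return sum(1 for record in chunk if record == "needle")


def find_needles(records: Sequence[str], chunk_size: int, nodes: List[str]) -> int:
    """Split the data, process each chunk with retries, combine the counts."""
    partial = run_with_retries(list(chunks(records, chunk_size)), nodes, count_needles)
    return sum(partial)
```

Calling find_needles(records, 4, ["rack1-node0", "rack2-node0"]) still
completes even though every task first scheduled on rack1-node0 fails; it just
takes longer, which is the "eventually it finishes overnight" trade-off.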

On the hyperconverged front, the main challenge that I see is QoS (I guess
I am performance biased). That is, guaranteeing I/O performance to the disks
and over the network between the computing nodes when the network pipes are
also shared with storage traffic.
Economical solutions will compromise on those network pipes, so for I/O
intensive workloads you have to be careful with the setup. Understanding the
I/O requirements of the applications is therefore fundamental to knowing
whether the hyperconverged solution of choice is going to choke.
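
A back-of-the-envelope way to reason about this is to add up the traffic that
shares one node's network link. The Python sketch below assumes a very
simplified model in which every local write is copied to (replicas - 1)
remote nodes over the same link used for compute traffic; real products
differ, and the function names, parameters and the 80% headroom threshold are
my own assumptions, not anything vendor specific.

```python
def link_utilization(
    link_gbps: float,
    compute_gbps: float,
    storage_write_gbps: float,
    replicas: int,
) -> float:
    """Fraction of one node's link used by compute plus replication traffic."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Assumption: each local write is also sent to (replicas - 1) remote nodes.
    replication_gbps = storage_write_gbps * (replicas - 1)
    return (compute_gbps + replication_gbps) / link_gbps


def will_choke(
    link_gbps: float,
    compute_gbps: float,
    storage_write_gbps: float,
    replicas: int,
    headroom: float = 0.8,
) -> bool:
    """True when the estimated utilization exceeds the chosen headroom."""
    utilization = link_utilization(
        link_gbps, compute_gbps, storage_write_gbps, replicas
    )
    return utilization > headroom
```

For example, with hypothetical numbers of 4 Gb/s of compute traffic, 4 Gb/s of
writes and 3 replicas, will_choke(10, 4, 4, 3) flags a 10 Gb/s link as
oversubscribed, while will_choke(40, 4, 4, 3) does not flag a 40 Gb/s link.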

Best regards,
Joshua Mora. 
------ Original Message ------
Received: 05:19 AM CDT, 10/03/2015
From: "Lechner, David A." <dlechner at mitre.org>
To: "beowulf at beowulf.org" <beowulf at beowulf.org>
Subject: [Beowulf] Hyper Convergence Infrastructure

> Hi
> I am wondering if anyone on this list has benchmarked the impact of an HCI
> solution on performance, or how this newest "next big thing" compares to a
> new Linux/Intel commodity solution?
> Is there some performance penalty from the virtualization?
> How do price points per FLOP compare?
> Is the advantage in the systems administration, and is there a comparable
> open source solution?
> 
> Thanks in advance for  any insights.
> Dave Lechner
> 
