[Beowulf] MPI mysql

Fri Jun 23 08:17:41 PDT 2006

On Wed, 21 Jun 2006, Felipe Duran wrote:

> Hello All.
>
> 	Has anyone managed to make mysql run over mpich on a Beowulf cluster?
>
> 	I have a monitoring tool that uses mysql to store host status, but I have
> more than 2000 devices to monitor and the DB burns CPU and memory. To give
> an idea of the problem, I have a dual Xeon 3.2 with 4GB of memory going to 0
> idle on top access. Because of this I need to run the DB over a cluster
> to gain power and scalability.

I don't know exactly what you mean by this.  AFAIK, mysql is not an MPI
parallelized application in native code, although I did find one bit of
evidence here:

   http://forums.mysql.com/read.php?117,81258,81258

that somebody is trying to turn it into an MPI application.

Parallelizing a DB in a scalable way is nontrivial.  For example, simply
sharing the actual DB >>file<< via e.g. NFS mount between N hosts running
the daemon is a recipe for total disaster -- you will encounter
network/disk bottlenecks, serious locking problems, distribution of work
problems, and more.  Giving N hosts their own subset of the DB on LOCAL
disk, with their OWN instance of the daemon, should work perfectly and
does not need MPI -- mysql already provides a network interface for e.g.
performing queries and so on, with internal locking as required.  All
you need then is a sufficiently smart front end that connects to the N
mysql servers (each with their own subset of the 2000 devices to track)
and distributes the queries etc as needed.  If you really want to get
fancy and run in "real" parallel, you can have that application fork N
threads to do the queries in and deal with their returns.
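
As a rough illustration of such a front end, here is a minimal Python
sketch. The server names (db01 to db04) are hypothetical, and
query_server() is only a placeholder standing in for a real mysql client
call; nothing here comes from an existing tool. It maps each device to
one server with a stable hash and fans a query out to all N servers in
parallel threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical server names; each would run its own mysqld on local disk.
SERVERS = ["db01", "db02", "db03", "db04"]


def server_for(device, servers=SERVERS):
    """Map a device name to one server with a stable hash (crc32)."""
    return servers[zlib.crc32(device.encode("utf-8")) % len(servers)]


def query_server(server, sql):
    """Placeholder for a real client call to the mysqld on `server`.

    Assumption: a real front end would open a network connection here
    and return rows; this stub only echoes its arguments.
    """
    return (server, sql)


def fan_out(sql, servers=SERVERS, query=query_server):
    """Run the same query on every server in parallel, one thread each."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map returns results in the same order as `servers`.
        return list(pool.map(lambda s: query(s, sql), servers))
```

Writes for a given device would go only to server_for(device), while a
cluster-wide report would call fan_out() and merge the N result sets.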

Using MPI to just initiate those N instances of mysqld (and to otherwise
do nothing) is probably possible but is not at all the way I'd do it
myself.  I'd likely just put the requisite mysql monitor app on each
host configured to watch its subset of the cluster and start it at boot
time, then (as noted) just connect to it from outside when I needed to
collect the aggregated information in some way.

I do have to ask, though, whether you really need mysql (or any real
DB).  SQL DBs are good when you have to use SQL commands to do
significant processing -- JOIN, SELECT etc.  That is, do you really need
to be able to do complex queries or build complex relational tables out
of the data?  DBs do a lot of work, as you are discovering, when poking
data into a table in such a way that queries of this sort are
facilitated and can be managed efficiently.

If you're just storing data so that you can make infrequent graphs of
e.g.  "load average of host b38 on June 16 2006" or "blocks of time that
user joe was running a job on host b1066" or the like, you may well find
it much easier to just store a simple array of timestamped data and
process it sequentially to extract the information.  This makes the
STORAGE side absolutely trivial and scalable -- the "work" is done at
the display side.  Assuming that you don't spend hours and hours a day
building graphs of usage over the cluster lifetime, but rather (say)
once a month want to generate a few nice graphs, it is much smarter to
occupy your one host streaming through the last month's data and
building the graphs without mysql at all even if it takes an hour than
to spend weeks of your time screwing around with the load and hassle of
parallelizing mysql or even using it at all.
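
To make the flat-file idea concrete, here is a minimal Python sketch. The
three-column record layout (host, timestamp, load1) and the function
names are assumptions made for illustration only. Storage is a plain
append; the "query" is a single sequential pass over the file.

```python
import calendar


def append_sample(path, host, timestamp, load1):
    """Append one 'host timestamp load1' record; no indexing needed."""
    with open(path, "a") as f:
        f.write("%s %.2f %.2f\n" % (host, timestamp, load1))


def samples_for(path, host, start, end):
    """Stream the file once and yield (timestamp, load1) for one host
    with start <= timestamp < end."""
    with open(path) as f:
        for line in f:
            name, ts, load1 = line.split()
            ts = float(ts)
            if name == host and start <= ts < end:
                yield ts, float(load1)


def day_bounds(year, month, day):
    """Return (start, end) in epoch seconds for one UTC calendar day."""
    start = calendar.timegm((year, month, day, 0, 0, 0))
    return start, start + 86400
```

With these, "load average of host b38 on June 16 2006" would be
samples_for(path, "b38", *day_bounds(2006, 6, 16)), with the result fed
to whatever plotting tool you like.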

FWIW, wulflogger will do this for you already, although I haven't tested
it up to 2000+ nodes.  Just run xmlsysd on all the nodes, pick a monitor
node and run wulflogger with a suitably coarse-grained sample interval
(fifteen minutes, an hour, whatever) and pipe its output into a file.
You'll get a trace like:

rgb@lilith|B:1001>wulflogger -t 1 -d 900
#     Name       Status    Timestamp    load1  load5 load15
lilith             up   1151075430.41    0.33   0.26   0.31
uriel              up   1151074466.65    0.00   0.00   0.00
(the rest of the cluster...)
#     Name       Status    Timestamp    load1  load5 load15
lilith             up   1151076330.42    0.30   0.25   0.31
uriel              up   1151075366.66    0.00   0.00   0.00
...

This is fairly trivial to parse with e.g. perl to extract all of
lilith's load averages over umpty days and graph them, or to sum the
aggregate load average of the cluster, or do whatever you like, no sql
needed.  It is straight ascii, easily human readable as well, and highly
compressible.  wulflogger can also include a lot more information, btw
-- this is the simplest load average display.
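
A minimal parsing sketch, in Python rather than perl, might look like
the following. It assumes the six-column load average layout shown in
the trace above and keeps only rows whose status is "up"; how other
states are printed is not shown here, so those rows are simply skipped.
The function names are made up for this example.

```python
from collections import defaultdict


def parse_trace(lines):
    """Return {host: [(timestamp, load1, load5, load15), ...]} from a
    load-average trace, skipping header and filler lines."""
    loads = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Data rows have exactly six columns; '#' marks a header row.
        if len(fields) != 6 or fields[0].startswith("#"):
            continue
        name, status, ts, l1, l5, l15 = fields
        if status != "up":
            continue  # assumption: only "up" rows carry usable loads
        try:
            row = (float(ts), float(l1), float(l5), float(l15))
        except ValueError:
            continue  # not a numeric data row
        loads[name].append(row)
    return dict(loads)


def mean_load1(loads, host):
    """Average 1-minute load for one host, or None if it has no samples."""
    samples = loads.get(host, [])
    if not samples:
        return None
    return sum(s[1] for s in samples) / len(samples)
```

Reading the logged file is then just parse_trace(open("trace.txt")),
where trace.txt is whatever file the wulflogger output was piped into.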

There are also two tools available that can be used to monitor the
cluster directly in real time -- wulfstat (command line in an xterm tty)
and wulfweb, which is basically a perl script that can be used to update
a web-based display of this information every umpty minutes (probably 15
in the case at hand) from wulflogger data.  I keep dreaming of writing
gwulfstat (a real gtk display of the stat data) but so far the time to
do so is just a dream...

    rgb

>
> I used the following configuration.
>
> Mysql
> ./configure --prefix=/usr/local/mysql CC=/usr/mpi-beowulf/bin/mpicc
> CCFLAGS=-I/usr/mpi-beowulf/include/ CXX=/usr/mpi-beowulf/bin/mpiCC
> CXXFLAGS=-I/usr/mpi-beowulf/include/ F77=/usr/mpi-beowulf/bin/mpif77
> F77FLAGS=-I/usr/mpi-beowulf/include/ F90=/usr/mpi-beowulf/bin/mpif90
>
> When I mpirun the database, it returns
> 060621 22:53:56  mysqld started
> 060621 22:53:56 [ERROR] /usr/local/mysql/libexec/mysqld: unknown option '-p'
> 060621 22:53:56  mysqld ended
>
>
> bpstat shows the nodes.
> beostatus shows the procs and memory.
> mpi_mandel runs OK, and beostatus shows the sharing of CPU.
>
> I don't know, but I think my beowulf version is old. The disk says it is
> version 2.0, but I couldn't find a way to show the actual version I'm
> running.
>
>
> Thanks in Advance.
> Felipe Duran
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu