It's Mosix what I need! (more help)

Mon May 22 02:10:22 PDT 2000

Javi Ortiz <postjoh at evalga.geneti.uv.es> wrote:
> 
I've been lurking in the hope this wouldn't be needed. Javi, I can switch
to Spanish if you wish (being in Madrid/Spain helps a bit).

Anyway, the first thing you need to do is think seriously about what you
really want to accomplish, and that requires paying attention to what
you need.

So, to answer:

	I assume your jobs only need to be started and later give back
an answer. If they won't need interactive facilities (e.g. results are
returned as files on disk) the simplest solution is using a queueing
system, like DQS or NQS. This requires less administrative/learning
burden and will do the job.

	Moreover, in such instances, even running MOSIX might be a problem
since process migration is costly. Er... how much?

	If you have big jobs, whether big code, big data or both,
moving them around might kill your performance. I've got some processes
here eating up 500 MB of memory on average from the start. Migrating
these is rather expensive. Make a few migrations and your performance
will go down the drain. Who knows better than you the kind of jobs
you'll be running?

	If they are small or middle-sized there is still a penalty. A
faster network connection helps, but the penalty does not go away.

	Point is: before using MOSIX you need a forecast of how many
processes you expect to run per unit of time on the machines (let's not
call this a cluster yet, a farm might do), their size and how often
you might need to migrate them.
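
	To put rough numbers on such a forecast, here is a minimal sketch
in Python. It treats a migration as copying the whole memory image over
the network, which is a simplifying assumption and not a description of
what MOSIX actually does. The function names, the 0.7 link efficiency
and the five migrations per one-hour job are all made up for
illustration; plug in your own figures.

```python
def migration_seconds(image_mb, link_mbit_per_s, efficiency=0.7):
    """Rough time to ship a process image of image_mb megabytes.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable; it is a guess, not a measurement.
    """
    usable_mb_per_s = link_mbit_per_s / 8.0 * efficiency
    return image_mb / usable_mb_per_s


def migration_overhead(image_mb, link_mbit_per_s, migrations, run_seconds):
    """Fraction of total wall time spent moving the job instead of computing."""
    moving = migrations * migration_seconds(image_mb, link_mbit_per_s)
    return moving / (run_seconds + moving)


if __name__ == "__main__":
    # Hypothetical workload: a 500 MB job that computes for one hour
    # and gets migrated five times along the way.
    for link in (10, 100, 1000):
        per_move = migration_seconds(500, link)
        lost = migration_overhead(500, link, migrations=5, run_seconds=3600)
        print(f"{link:5d} Mbit/s: {per_move:7.1f} s per migration, "
              f"{lost:6.1%} of wall time lost")
```

	If the last column stays small for your real job sizes and
migration rates, migration is probably affordable; if not, a queueing
system looks better.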

	So, the next question is: do you need migration? If you don't,
just use a queueing system, unless you foresee that your needs will grow
or change in the near future in a way that requires it.

	Centralize data? In principle yes. But maybe not. If you can
partition data and jobs in a more or less even way, that might reduce
communication overhead on data-intensive jobs. By the way, can you?

	It's YOU who needs to assess the data dependency of your
programs. As an example, I have another set of jobs: FASTA/BLAST can use
partitioned databases. Most users search all the data, so one can
partition the data, assign an even part of it to each local disk, run the
processes on that local data and in the end gather the top 40 from each
to select the overall top 40. That way there will be NO communication
whatsoever among processors until they are all finished, and then only a
small amount in the final gathering step. For these there's no need at
all for Mosix or a queueing system: load balancing is explicit.
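
	The partition-and-gather idea can be sketched as follows. This is
an illustration in Python, not how FASTA or BLAST are driven in
practice: the function names are hypothetical, the "database" is a list
of made-up (id, score) pairs, and it assumes that a hit's score does not
depend on which partition it was found in.

```python
import heapq


def split_evenly(records, n_parts):
    """Deal records round-robin into n_parts partitions of near-equal size."""
    return [records[i::n_parts] for i in range(n_parts)]


def local_top(partition, score, n=40):
    """Best n hits within one partition; this is the per-node work."""
    return heapq.nlargest(n, partition, key=score)


def global_top(local_results, score, n=40):
    """Final gathering step: pool the per-node lists and keep the best n."""
    pooled = [hit for hits in local_results for hit in hits]
    return heapq.nlargest(n, pooled, key=score)


if __name__ == "__main__":
    # Stand-in data: (sequence id, score) pairs instead of real search hits.
    database = [(f"seq{i}", (i * 37) % 1000) for i in range(1000)]

    def by_score(hit):
        return hit[1]

    parts = split_evenly(database, 4)          # one partition per node
    per_node = [local_top(p, by_score) for p in parts]
    best = global_top(per_node, by_score)
    print(len(best), "hits kept; best five:", best[:5])
```

	Only the short per-node lists travel over the network, at most 40
entries from each node, which is why the gathering step is so cheap.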

	So, the actual answer is again "look into yourself" (or your
own data). You won't find a good answer out here.

	Same for your network connections. Is your data easily partitionable?
If not, you may still be better off with a centralized point. But what
about availability? If your centralized server goes down, everything will.
And speed? Maybe you will get better performance with two servers.
So, the next thing to ask yourself is whether this worries you. And what
amounts of data will your processes be using? And in what way?

	It's not enough to know that your processes use 500 MB of data.
They might benefit from a gigabit connection in principle, and they will
in many cases... unless, like some other jobs I have here, they do so in
many small chunks (read a tiny bit, save some data, read another tiny bit,
save some more data, etc...). Yes, that's probably badly written software,
but that's what we have. Now, with a high number of short communication
pieces, latency becomes an issue too. Our FASTA jobs here read huge
amounts of data and ran faster on remote, newer, faster disks over a
high-speed network connection, while on the same setup the jobs above ran
better with the much slower, older but local disks. And yes, I was as
surprised as anyone else might be.
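
	A back-of-the-envelope model shows how that can happen. The sketch
below (Python) charges one fixed latency per request plus the raw
transfer time. The latency and bandwidth figures are invented for
illustration and are not measurements from the machines mentioned here.

```python
def transfer_seconds(total_bytes, chunk_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move total_bytes in chunk_bytes pieces, paying latency per piece."""
    requests = -(-total_bytes // chunk_bytes)  # ceiling division
    return requests * latency_s + total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    total = 500 * 10**6  # 500 MB of data

    # Hypothetical storage paths: (latency per request, bytes per second).
    remote_fast = (1e-3, 100 * 10**6)  # fast disks behind a network round trip
    local_slow = (5e-5, 10 * 10**6)    # old local disks, no network hop

    for chunk in (10**6, 512):
        remote = transfer_seconds(total, chunk, *remote_fast)
        local = transfer_seconds(total, chunk, *local_slow)
        print(f"chunk {chunk:>8d} B: remote {remote:8.1f} s, local {local:8.1f} s")
```

	With these made-up figures the remote path wins for large reads
and loses for tiny ones, because the per-request latency term ends up
dominating the total.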

	Using a higher-speed connection efficiently may imply gathering
many small packets into a bigger one... but then processes may spend a
good deal of their wall time waiting for this bigger one to build up or
for a timeout to happen. If you have one big process issuing many tiny
send/receive requests, its performance might suffer unless you decrease
the window size, and with it the network throughput.

	So, again, how do we know? The answer, once more, is not out
here; you should look for it inside you (or in your insider knowledge of
your needs).

	Then, I don't understand the need for the PCs to be tower,
half-tower or pizza-box. Has it got anything to do with your physical
environment? As for the DAT: what is it for? If all your data will fit on
a single DAT tape, you may as well replicate everything on disks with
mirroring and have it easier/cheaper. If not, you'll be better off with a
tape library. Unless you're storing genetic databases and can recover them
overnight from your local EMBnet node.

	As for the easy one: no, you do not need any graphics card to run
X Window applications if they are to be displayed on the user's own
screen. X Window apps are _client_ applications; the server is the
program on the user's machine that controls the screen and displays the
windows. It's the server (the machine that will display the windows) that
needs the graphics card, not the client (the machine running the
application). So your nodes do NOT need a graphics card at all. Your
users' machines do.

	If I were you, I'd meet the users, learn as much as possible about
your processes and their behaviour/needs, read a few FAQs, visit a few
web sites, clear up a few basic concepts and _think a lot_.

	Anyway, if you need more detailed help and would rather have it in
Spanish (sorry, my knowledge of Valencià is close to nil), I'll be delighted
to lend you a hand.

<OFFTOPIC>
Would you mind presenting my best wishes to Andrés Moya if he's around? Thanks.
</OFFTOPIC>

				j
