[Beowulf] Re: [Bioclusters] servers for bio web services setup
David Mathog mathog at mendel.bio.caltech.edu
Thu Jan 13 15:18:27 PST 2005
- Previous message: [Beowulf] PVFS or NFS in a Beowulf cluster?
- Next message: [Beowulf] Cooling vs HW replacement
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> He wants to set up a bio facility which provides web/grid services
> (probably Axis or GT3/4) to a substantial user community (UK-wide but
> with access control, so probably in the region of hundreds or perhaps
> thousands of potential users). Services will include the usual
> things like BLAST, ClustalW, protein structure analysis etc. -- probably
> a small subset of what EBI offers.

A couple of things to consider in general:

1. Some of these back end jobs can generate enormously large output files. If you let somebody queue up a 1000-entry FASTA file and use the default BLAST format with 50 alignments each to search the nt database - Ugh!! You definitely don't want those coming back through your front end machines if at all possible. You might, for instance, set up the back end nodes to email the results directly, or to email a page with a link to the results. Unless a job's results are tiny, the most you're probably going to want the front end machine to present is a page that looks like:

    Your XXXXX job finished at 21:09 GMT
    Results (link)
    Error messages (link)
    Other (link)
    Parameters (link)

where all the links go out to different machines, to spread the load around.

2. Even if the result is only a million bytes or so, you do not want the users to be loading those pages directly in their browsers. Browsers can take a really long time to open a file like that, but they can typically download it very fast. Have them right-click to download it and then open it in a faster text viewer. (Most of the results will be text.) This may not change the load on your server much, but it can make a big difference in the end users' perception of the speed of your service.

3. Sanity check everything for valid parameters and expected run times. Let's say you provide an interface to Phylip. Do you really want to let somebody stuff a 200 sequence alignment into DNAPENNY? Not unless you want to lock up the back end machine for the next hundred years.
It can be pretty tricky figuring out ahead of time how long a job may run, but do the best you can so that at least in some cases the web interface can tell the users up front to change the job parameters. And on the back end, absolutely set some maximum CPU time limit for jobs. Better an email saying "your job was terminated after one hour" than annoyed end users constantly emailing you asking where their jobs went.

4. If at all possible, provide the run time parameters back to the end users. People tend to just print the result off the web page and, if the program doesn't echo the parameters, when they go back later they can never remember how they ran a particular program. It's also useful for catching bugs in the web interface.

5. If the load is really significant you're going to want at least two, and maybe more, front end web servers. I.e., www.yourservice.org connects at random to www01.yourservice.org, www02.yourservice.org, etc. That will both split the load and reduce the effect of a downed front end server. If all the computation is going out onto a grid, these machines won't need much local storage but would presumably need reasonably fast network connections.

> Would a single high-spec machine be sufficient for this kind of thing?
> Or would one have several servers doing the same thing in parallel?

Depends on what the front end server is doing. If it's just shuffling smallish requests off to the end compute nodes, it needn't be very large. If it's spooling hundreds of 10 MB result files per second and then sending those off to the end users interactively, it's going to have to be monstrously large (ditto for your network connections). That is, we can't really answer that question specifically until you tell us how much data needs to be stored locally, processed locally, and shipped in and out through the network.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
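As a rough illustration of point 3 above (a maximum CPU time limit on back end jobs, plus a clear message to the user when the limit is hit), here is a minimal Python sketch. It is not taken from any particular facility's setup: the function names `run_with_cpu_limit` and `status_message`, the message wording, and the busy-loop demo job are all assumptions made for the example, and it relies on the Unix-only `resource` module. A real installation would more likely set the limit in the batch or queueing system.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv as a child process with a cap on CPU time (Unix only).

    Returns (completed_process, was_killed_by_signal).
    """
    def set_limit():
        # Runs in the child just before exec. Only the soft limit is lowered;
        # the kernel sends SIGXCPU when the child exceeds it.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    proc = subprocess.run(argv, preexec_fn=set_limit,
                          capture_output=True, text=True)
    # A negative return code means the child died from a signal.
    return proc, proc.returncode < 0


def status_message(job_name, was_killed, cpu_seconds):
    """Build the text a back end node might email to the user (hypothetical wording)."""
    if was_killed:
        return f"Your {job_name} job was terminated after {cpu_seconds} s of CPU time."
    return f"Your {job_name} job finished."


if __name__ == "__main__":
    # Hypothetical runaway job: a busy loop standing in for an oversized run.
    runaway = [sys.executable, "-c", "while True: pass"]
    proc, killed = run_with_cpu_limit(runaway, 1)
    print(status_message("demo", killed, 1))
```

The limit here is on CPU time, not wall-clock time, so a job that sits waiting on I/O is not cut off by it; a wall-clock timeout would need to be handled separately.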
