[Beowulf] Accelerator for data compressing
bill at cse.ucdavis.edu
Thu Oct 2 18:11:10 PDT 2008
Xu, Jerry wrote:
> Currently I generate nearly one TB data every few days and I need to pass it
> along enterprise network to the storage center attached to my HPC system, I am
> thinking about compressing it (most tiff format image data)
tiff uncompressed, or tiff compressed files? If uncompressed I'd guess that
bzip2 might do well with them.
> as much as I can, as
> fast as I can before I send it crossing network ... So, I am wondering whether
> anyone is familiar with any hardware based accelerator, which can dramatically
> improve the compressing procedure..
Improve? You mean compression ratio? Wall clock time? CPU utilization?
Adding forward error correction?
> suggestion for any file system architecture
> will be appreciated too..
Er, hard to imagine a reasonable recommendation without much more information.
Organization, databases (if needed), filenames and related metadata are rather
specific to the circumstances. Access patterns, retention time, backups, and
many other issues would need consideration.
> I have couple of contacts from some vendors but not
> sure whether it works as I expected, so if anyone has experience about it and
> want to share, it will be really appreciated !
Why hardware? I have some Python code that managed 10MB/sec per CPU (or
80MB/sec on 8 CPUs if you prefer) that compresses with zlib, hashes with
sha256, and encrypts with AES (256 bit key). Assuming the compression you want
isn't substantially harder than doing zlib, sha256, and AES, a single core from
a dual or quad core chip sold in the last few years should do fine.
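For illustration only, here is a minimal sketch of that kind of pipeline. It is not the code mentioned above: the function name compress_and_hash and its parameters are made up for this example, it hashes the compressed output (the original may well hash something else), and it leaves out the AES step because the Python standard library has no AES implementation.

```python
import hashlib
import io
import zlib


def compress_and_hash(src, dst, chunk_size=1 << 20, level=6):
    """Stream src to dst through zlib and hash the compressed bytes.

    src and dst are binary file-like objects. Returns a tuple of
    (bytes read, bytes written, sha256 hex digest of the compressed output).
    Hypothetical helper for illustration; encryption is deliberately omitted.
    """
    compressor = zlib.compressobj(level)
    digest = hashlib.sha256()
    bytes_in = 0
    bytes_out = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        bytes_in += len(chunk)
        out = compressor.compress(chunk)
        digest.update(out)
        dst.write(out)
        bytes_out += len(out)
    # Flush whatever the compressor is still buffering.
    tail = compressor.flush()
    digest.update(tail)
    dst.write(tail)
    bytes_out += len(tail)
    return bytes_in, bytes_out, digest.hexdigest()
```

To use it on real files, open the input and output in binary mode ("rb" and "wb") and pass the two file objects in. Running one such process per core is the simplest way to scale it across CPUs.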
1TB every 2 days = 6MB/sec, or approximately 15% of a quad core or 60% of a
single core for my compress, hash and encrypt in Python. Considering how
cheap cores are (quad desktops are often under $1k) I'm not sure what would
justify an accelerator card. Not to mention that the choice of algorithm
could make a huge difference to both CPU usage and the compression ratio
achieved. I'd recommend taking a stack of real data and trying out different
compression tools and settings.
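One way to run that kind of trial, sketched with the compressors in the Python standard library. The benchmark function, the codec labels, and the chosen levels are assumptions made for this example; real numbers depend entirely on the data and the machine.

```python
import bz2
import lzma
import time
import zlib


def benchmark(data, codecs=None):
    """Compress data with each codec; return (name, ratio, MB/sec) tuples.

    Hypothetical helper: the default codec set and levels are arbitrary
    starting points, not recommendations.
    """
    if codecs is None:
        codecs = {
            "zlib-1": lambda d: zlib.compress(d, 1),
            "zlib-6": lambda d: zlib.compress(d, 6),
            "bz2-9": lambda d: bz2.compress(d, 9),
            "lzma-6": lambda d: lzma.compress(d, preset=6),
        }
    results = []
    for name, fn in codecs.items():
        start = time.perf_counter()
        out = fn(data)
        elapsed = time.perf_counter() - start
        ratio = len(data) / len(out)
        mb_per_sec = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
        results.append((name, ratio, mb_per_sec))
    return results
```

Feed it the bytes of a few representative image files (for example, open("sample.tif", "rb").read(), where sample.tif stands in for one of your own files) and compare ratio against MB/sec for each entry.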
In any case 6MB/sec of compression isn't particularly hard these days.... even
in python on a 1-2 year old mid range cpu.
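The arithmetic behind those figures, as a small sketch. It assumes decimal units (1TB = 10^12 bytes) and takes the 10MB/sec per core figure quoted above as given; the function name is made up for this example.

```python
def required_mb_per_sec(terabytes, days):
    """Sustained rate in MB/sec needed to process `terabytes` every `days` days."""
    seconds = days * 86400
    return terabytes * 1e12 / seconds / 1e6


rate = required_mb_per_sec(1, 2)   # about 5.8 MB/sec, rounded to 6 in the text
core_fraction = rate / 10.0        # share of one core at 10 MB/sec per core
quad_fraction = core_fraction / 4  # share of a quad core
```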