<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi folks:</p>
<p> Quick post for the day job. AMD (my employer) is looking for
expert systems administrators to work on a mix of our internal HPC
systems and to help customers stand up their AI and HPC clusters.</p>
<p> AMD systems include a small version of Frontier, some El
Cap-adjacent nodes, and a variety of large GPU-accelerator-based
nodes. Customer systems range from small 64-node systems
to systems multiple orders of magnitude larger.<br>
</p>
<p> Needed skills/attributes include:</p>
<ul>
<li>5+ years in an HPC systems admin/HPC SRE role<br>
</li>
<li>expert Linux knowledge, debugging, problem resolution<br>
</li>
<li>strong hardware debugging experience</li>
<li>Slurm management, setup, configuration</li>
<li>development experience in Python, Bash, C/C++</li>
<li>RDMA network setup/config/testing</li>
<li>Benchmarking and performance measurement</li>
<li>Monitoring systems</li>
<li>Storage systems, including Lustre, NFS, BeeGFS, etc.</li>
<li>Installing and configuring device drivers for advanced
hardware: GPUs and networks<br>
</li>
<li>Modules and configuration (HPE/Cray and Lmod)<br>
</li>
<li>ability to work in and around AMD and customer data centers,
with occasional travel to those DCs</li>
</ul>
<p> Desired experience/attributes include:</p>
<ul>
<li>Proximity to the Austin, TX or Santa Clara/San Jose offices,
though remote is possible<br>
</li>
<li>CUDA and/or ROCm experience</li>
<li>HPE/Cray programming environment and modules</li>
<li>familiarity with AI frameworks</li>
<li>US Citizenship or green card</li>
</ul>
<p> I don't have a job req to point to yet, but should have one
soon. You can reach me here or on
<a class="moz-txt-link-freetext" href="https://linkedin.com/in/joelandman">https://linkedin.com/in/joelandman</a>. I am the hiring manager.</p>
<p> Regards</p>
<p>Joe<br>
</p>
</body>
</html>