[Beowulf] Heterogeneous Parallel Programming

Eugen Leitl eugen at leitl.org
Sat Dec 1 06:34:15 PST 2012


Heterogeneous Parallel Programming
Wen-mei W. Hwu

This course teaches the use of CUDA/OpenCL, OpenACC, and MPI for programming heterogeneous parallel computing systems. It is application oriented and introduces only the technical background needed to solidify understanding.
Current Session:
Nov 28th 2012 (6 weeks long)
Workload: 6-8 hours/week 
About the Course
All computing systems, from mobile to supercomputers, are becoming heterogeneous parallel computers using both multi-core CPUs and many-thread GPUs for higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge about the low-level programming interfaces in these systems. This course is designed for students in all disciplines to learn the essence of these programming interfaces (CUDA/OpenCL, OpenMP, and MPI) and how they should orchestrate the use of these interfaces to achieve application goals.

The course is unique in that it is application oriented and introduces only the underlying computer science and computer engineering knowledge needed for understanding. It covers the data parallel execution model, memory models for locality, parallel algorithm patterns, overlapping computation with communication, and scalable programming using joint MPI-CUDA in large-scale computing clusters. It has been offered as a one-week intensive summer school for the past four years. In the past two years, there have been ten video-linked academic sites with a total of more than two hundred students each year.
About the Instructor(s)
Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the area of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute and director of the IMPACT research group (impact.crhc.illinois.edu). He is a co-founder and CTO of MulticoreWare. For his contributions in research and teaching, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Eta Kappa Nu Holmes MacDonald Outstanding Teaching Award, the ACM/IEEE ISCA Influential Paper Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the $208M NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

In 2007, Hwu teamed up with then NVIDIA Chief Scientist David Kirk to create a course called Programming Massively Parallel Processors (https://ece408.hwu.crhc.illinois.edu). Thousands of students worldwide follow the course through the web site each semester. The course material has also been used by numerous universities including MIT, Stanford, and Georgia Tech. Hwu and Kirk have also been teaching a VSCSE Summer School version to science and engineering graduate students from all over the world. In 2008, 50 graduate students from 17 countries and three continents attended the summer school (http://www.greatlakesconsortium.org/events/GPUMulticore/) in Urbana, with another 60 participating remotely. Students in the summer school come from diverse disciplines. In 2009, the summer school was again fully subscribed with 160 students from multiple continents. The 2010 offering was attended by 220 students at four sites linked with HD video. The 2011 attendance further increased to 280 at 10 sites across the U.S. The 2012 summer school is projected to have more than 320 students at 10 linked sites. Due to popular demand, Hwu and Kirk have also been teaching abbreviated versions of their course globally, most recently at the Chinese Academy of Sciences in 2008, Berkeley in 2010, Braga, Portugal in 2010, and Chile in 2011. They have also collaborated with UPC/Barcelona Supercomputing Center to offer an EU PUMPS summer school in Barcelona every year since 2010. In February 2010, Hwu and Kirk published a textbook on programming massively parallel processors. The book has been extremely popular, with more than 10,000 copies sold to date. International editions and translations are available in China, India, Japan, Russia, Spain, Portugal and Latin America. The second edition is already in production.
Course Syllabus

    Week One: Introduction to Heterogeneous Computing and a Quick Overview of CUDA C and MPI, with lab setup and programming assignment of vector addition in CUDA C.
    Week Two: Kernel-Based Data Parallel Programming and Memory Model for Locality, with programming assignment of simple and tiled matrix multiplication.
    Week Three: Performance Considerations and Task Parallelism Model, with programming assignment in performance tuning.
    Week Four: Parallel Algorithm Patterns – Reduction/Scan, Stencil Computation, and Sparse Computation, with programming assignment of reduction tree.
    Week Five: MPI in a Heterogeneous Computing Cluster: domain partitioning, data distribution, data exchange, and using heterogeneous computing nodes, with programming assignment of an MPI-CUDA application.
    Week Six: Related Programming Models – OpenACC, CUDA Fortran, C++ AMP, Thrust, and important trends in heterogeneous parallel computing, with final exam.
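
Two of the assignments named in the syllabus, vector addition and the reduction tree, can be illustrated on the CPU in plain C. The following is only a rough sketch and is not taken from the course material: the function names vec_add and tree_reduce_sum are hypothetical, and the loops run sequentially, whereas in CUDA C each inner-loop iteration would typically be carried out by a separate GPU thread.

```c
#include <assert.h>
#include <stddef.h>

/* Vector addition: every index is independent of the others, so a
   CUDA C version would typically assign one thread per element.
   (Hypothetical helper, sequential CPU sketch.) */
void vec_add(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* Sum by reduction tree, in place. Each pass adds element i + stride
   into element i and then doubles the stride, so the number of passes
   grows with log2(n). All additions within one pass touch disjoint
   elements and could run in parallel. The input array is overwritten.
   (Hypothetical helper, sequential CPU sketch.) */
float tree_reduce_sum(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0.0f;
    }
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

For example, calling vec_add on {1, 2, 3, 4} and {10, 20, 30, 40} and then tree_reduce_sum on the result returns 110. The reduction also handles lengths that are not a power of two, because an element without a partner in a given pass is simply carried forward to a later pass.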

Recommended Background
Programming experience in C/C++.
Suggested Readings

Although the class is designed to be self-contained, students wanting to expand their knowledge beyond what we can cover in a one-quarter class can find much more extensive coverage of this topic in the book Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series), by David Kirk and Wen-mei Hwu, published by Morgan Kaufmann (Elsevier), ISBN 0123814723.

In December 2012, Morgan Kaufmann will publish an updated second edition of Programming Massively Parallel Processors with new material tied to the course syllabus including new chapters on parallel patterns and the latest CUDA 5.0 features. Get exclusive preview access to selected chapters from the revised edition when you pre-order the second edition today. You’ll save up to 40% on the second edition and a selection of related titles. You must order using the promotion codes at this page in order to receive the preview chapters. Register for the course and visit the forum for further details.
Course Format
The class will consist of lecture videos, each between 15 and 20 minutes in length. There will also be standalone homework assignments that are not part of the video lectures, optional programming assignments, and a mandatory final exam.

    Will I get a certificate for this course?

    Yes. Students who successfully complete the class will receive a certificate signed by the instructor.

    What resources will I need for this class?

    A laptop or desktop computer, preferably with a CUDA-enabled GPU. There are more than 200 million laptops and desktops with CUDA-enabled GPUs. However, we will provide a teaching kit to allow you to take the class if you don’t have one.

    What is the coolest thing I'll learn if I take this class?

    You will learn how to unleash the massive computing power of systems ranging from mobile processors to supercomputers.

Computer Science: Systems, Security, Networking
Computer Science: Programming & Software Engineering
