Advisor(s)

David R. Kaeli

Contributor(s)

Miriam E. Leeser, Stefano Basagni (1965-)

Date of Award

2009

Date Accepted

8-2009

Degree Grantor

Northeastern University

Degree Level

M.S.

Degree Name

Master of Science

Department or Academic Unit

College of Engineering. Department of Electrical and Computer Engineering.

Keywords

GPGPU, Performance Prediction, System Modeling

Subject Categories

Graphics processing units

Disciplines

Electrical and Computer Engineering

Abstract

Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains, because for many parallel applications a single GPU can approach or exceed the performance of a large cluster of CPUs. Obtaining high performance on a single GPU has been widely researched, and researchers typically present speedups on the order of 10-100X for applications that map well to the GPU programming model and architecture. Progressing further, we now wish to utilize multiple GPUs to obtain still larger speedups, or to allow applications to work with more or finer-grained data.

Although existing work has utilized multiple GPUs as parallel accelerators, a study of the overheads and benefits of using multiple GPUs has been lacking. Since the overheads affecting GPU execution are not as obvious or well-known as those affecting CPUs, developers may be hesitant to invest the time to create a multiple-GPU implementation, or to invest in additional hardware, without knowing whether execution will benefit. This thesis investigates the major factors of multi-GPU execution and creates models which allow them to be analyzed. The ultimate goal of our analysis is to allow developers to easily determine how a given application will scale across multiple GPUs.

Using the scalability (including communication) models presented in this thesis, a developer is able to predict the performance of an application with a high degree of accuracy. For the applications evaluated in this work, we saw an 11% average difference and 40% maximum difference between predicted and actual execution times. The models can represent different numbers and configurations of GPUs, as well as different data sizes, all without having to purchase hardware or fully implement a multiple-GPU version of the application. The performance predictions can then be used to select the optimal cost-performance point, allowing the appropriate hardware to be purchased for the given application's needs.

Document Type

Master's Thesis

Rights Information

Copyright 2009

Rights Holder

Dana Schaa


