This page is the first assignment for CS 240A at UCSB in Winter 2014. The assignment is to find an application of parallel computing and describe in detail how the application takes advantage of parallel processors.
Machine learning is a branch of artificial intelligence used to solve problems such as speech recognition and handwriting recognition. The “MPI-Based Parallel and Distributed Machine Learning Platform” developed by Microsoft researchers is a high-performance computing (HPC) platform that uses a master-slave model of communication.
The example problem used by the Microsoft group is an Expectation-Maximization (EM) problem, in which every processing node starts from the same set of initial model parameters. The API developed by the research group uses a master/slave design across a high-performance computing cluster: MPI broadcasts the initial model parameters to the processing nodes, the nodes perform their computations, and the master aggregates the results and rebroadcasts the new model parameters.
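The broadcast, compute, aggregate, rebroadcast cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's API: the nodes are simulated with an ordinary loop instead of MPI, the model is a one-dimensional Gaussian mixture chosen for brevity, and the names `local_e_step`, `master_m_step`, and `run_em` are hypothetical. In a real MPI program, handing `params` to every shard would roughly correspond to a broadcast, and collecting `all_stats` to a gather or reduction.

```python
import math


def local_e_step(shard, params):
    """Node side: accumulate EM sufficient statistics for one data shard."""
    weights, means, variances = params
    k = len(weights)
    resp = [0.0] * k    # sum of responsibilities per component
    sum_x = [0.0] * k   # responsibility-weighted sum of x
    sum_x2 = [0.0] * k  # responsibility-weighted sum of x squared
    for x in shard:
        dens = [
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total
            resp[j] += r
            sum_x[j] += r * x
            sum_x2[j] += r * x * x
    return resp, sum_x, sum_x2


def master_m_step(all_stats):
    """Master side: add up the per-node statistics and compute new parameters."""
    k = len(all_stats[0][0])
    resp = [sum(s[0][j] for s in all_stats) for j in range(k)]
    sum_x = [sum(s[1][j] for s in all_stats) for j in range(k)]
    sum_x2 = [sum(s[2][j] for s in all_stats) for j in range(k)]
    n = sum(resp)
    weights = [r / n for r in resp]
    means = [sx / r for sx, r in zip(sum_x, resp)]
    # Floor the variance so a very tight component cannot collapse to zero.
    variances = [max(sx2 / r - m * m, 1e-6) for sx2, r, m in zip(sum_x2, resp, means)]
    return weights, means, variances


def run_em(shards, params, rounds=20):
    """Simulated master loop: every round all nodes see the same parameters."""
    for _ in range(rounds):
        # "Broadcast" params; each shard would be processed on its own node.
        all_stats = [local_e_step(shard, params) for shard in shards]
        # "Aggregate" on the master, then reuse the result in the next round.
        params = master_m_step(all_stats)
    return params


if __name__ == "__main__":
    shards = [[-0.5, 0.0, 0.5, 9.5], [10.0, 10.5, -0.2, 10.2]]
    initial = ([0.5, 0.5], [1.0, 8.0], [1.0, 1.0])
    print(run_em(shards, initial))
```

The point of the design is that each node returns only a handful of sums per mixture component, so the data sent back to the master does not grow with the size of the shard. The sketch does not guard against a component that receives no responsibility at all.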
The structure of the Machine Learning platform is shown in the following figure:
For benchmarking, the researchers used the k-means clustering algorithm because its data consumption demands are the highest.
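To make the data flow of a distributed k-means concrete, here is a minimal Python sketch. It assumes the data is split into shards, one per node, and that each node reports per-cluster coordinate sums and counts to the master; the researchers' actual implementation may differ. The nodes are simulated sequentially rather than run over MPI, and the function names `assign_and_sum`, `update_centroids`, and `parallel_kmeans` are hypothetical.

```python
def assign_and_sum(shard, centroids):
    """Node side: assign each point to its nearest centroid and
    return per-cluster coordinate sums and point counts."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in shard:
        nearest = min(
            range(k),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def update_centroids(partials, centroids):
    """Master side: merge the partial sums and counts into new centroids."""
    k = len(centroids)
    dim = len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            # No point was assigned to this cluster; keep the old centroid.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
            )
    return new_centroids


def parallel_kmeans(shards, centroids, rounds=10):
    """Simulated master loop: send centroids out, merge partial results."""
    for _ in range(rounds):
        # Each call below would run on a separate node in a real cluster.
        partials = [assign_and_sum(shard, centroids) for shard in shards]
        centroids = update_centroids(partials, centroids)
    return centroids


if __name__ == "__main__":
    shards = [
        [(0, 0), (0, 2), (10, 10)],
        [(2, 0), (2, 2), (10, 12), (12, 10), (12, 12)],
    ]
    print(parallel_kmeans(shards, [[0.0, 0.0], [10.0, 10.0]]))
```

Every node has to scan its whole shard in every round, while only k sums and k counts travel back to the master, so the work is dominated by reading data rather than by communication.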
The application runs on Microsoft’s HPC cluster. HPC Server 2008 ranks #3 and #5 on the Little Green 500 and is capable of petaflop speeds.
References of Interest:
Data Mining of Text
A Fast, scalable machine learning C++ Library
Microsoft Machine Learning Group
Machine Reading – A Freely available data set of 660 short stories
“Designing an MPI-Based Parallel and Distributed Machine Learning Platform on Large-Scale HPC Clusters”
International Workshop on Statistical Machine Learning for Speech Processing, IWSML 2012
Scaling Up Machine Learning
Large Scale image classification
HPC Windows Server Performance Benchmarking