Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify the key components underlying spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
Computational analysis of high-dimensional data continues to be a challenging problem, spurring the development of numerous computational techniques. An important and emerging class of methods for dealing with such data is dimensionality reduction. In many applications, features of interest can be preserved while mapping the high-dimensional data to a small number of dimensions. These mappings include popular techniques such as principal component analysis (PCA) [
Linear manifold learning techniques, for example, PCA or multidimensional scaling [
Dimensionality reduction techniques are often compute-intensive and do not easily scale to large datasets. Recent advances in high-throughput measurements from physical devices, such as sensors, together with the results of complex numerical simulations, are generating data of extremely high dimensionality. It is becoming increasingly difficult to process such data sequentially.
In this paper, we propose a parallel framework for dimensionality reduction. Rather than focusing on a particular method, we consider the class of spectral dimensionality reduction methods. To date, few efforts have been made to develop parallel implementations of these methods, other than a parallel version of PCA [
We perform a systematic analysis of spectral dimensionality reduction techniques and provide a unified view that can be exploited by dimensionality reduction algorithm designers. We identify the common computational building blocks required to implement spectral dimensionality reduction methods and use these abstractions to derive a common parallel framework. We implement this framework and show that it can handle large datasets and scales to thousands of processors. We demonstrate the advantages of our software by analyzing 75,000 images of morphology evolution during the manufacturing of organic solar cells, which enables us to visually inspect and correlate fabrication parameters with morphology.
The remainder of this paper is organized as follows. In Section
The problem of dimensionality reduction can be formulated as follows: Consider a set
DR techniques have been extensively researched over the last decade [
The goal of DR is to identify a lowdimensional representation
We summarize the general idea of spectral DR in Algorithm
Comparison of selected spectral dimensionality reduction methods (PCA, Isomap, LLE), contrasting the parameter each method takes, the function used to relate pairs of points (for Isomap, the length of the shortest path between two points), and the normalization applied.
(1) For each point, find its k nearest neighbors.
(2) Define a directed weighted graph G, where vertices correspond to the input points and edges connect each point with its nearest neighbors; usually k is much smaller than n.
(3) Let W be the weight matrix derived from graph G.
(4) Normalize W.
(5) Find eigenvectors of the normalized matrix.
(6) Identify the latent dimensionality d.
(7) Return the d-dimensional embedding spanned by the selected eigenvectors.
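The pipeline above can be sketched sequentially. The sketch below assumes Euclidean distances, Gaussian edge weights, and symmetric degree normalization, which is one common choice among spectral DR methods, not the only variant the framework supports; all function names and parameters are illustrative:

```python
import math

def pairwise_distances(X):
    """Dense Euclidean distance matrix for points stored as lists."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            D[i][j] = D[j][i] = d
    return D

def knn(D, k):
    """Step (1): indices of the k nearest neighbors of every point."""
    n = len(D)
    # Sorting by distance puts the point itself first; slice it off.
    return [sorted(range(n), key=lambda j: D[i][j])[1:k + 1] for i in range(n)]

def weight_matrix(D, neighbors, sigma=1.0):
    """Steps (2)-(3): Gaussian weights on graph edges, symmetrized."""
    n = len(D)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            w = math.exp(-D[i][j] ** 2 / (2.0 * sigma ** 2))
            W[i][j] = W[j][i] = w
    return W

def normalize(W):
    """Step (4): symmetric normalization deg^(-1/2) W deg^(-1/2)."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Step (5) then reduces to an eigendecomposition of the normalized matrix, which is where the bulk of the parallel effort described later goes.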
The abstract representation of spectral DR methods in Algorithm
We used the above presentation of DR methods to identify their basic computational kernels. To better understand how these kernels contribute to the overall performance of different DR methods, we performed a set of experiments using a domain-specific implementation in MATLAB. Experiments were carried out for varying
Run time (in seconds) of different DR components for varying problem sizes.

    Component     n = 100    n = 1000    n = 2000    n = 4000
    …             0.08640    1.34998     5.66768     27.91930
    …             —          —           —           —
    …             0.06470    14.9030     130.130     1153.30
    …             0.08960    0.12601     0.24609     0.49253
    Normalize     0.00195    0.11875     0.74934     5.56630
    Eigensolve    0.02916    0.05536     0.23267     0.85211
    Extract       0.00020    0.00014     0.00016     0.00022
As can be seen, the run time of the analyzed methods is dominated by two steps, namely,
Another significant DR component is normalization. Although the implementation of this step varies between methods, it is invariably dominated by matrix-matrix multiplication. Therefore, we assume the overall normalization complexity to be
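As one concrete instance (an assumption about which variant is meant, since the text does not single one out): classical MDS and Isomap normalize by double centering the squared-distance matrix, B = -(1/2) J D² J with J = I - (1/n)11ᵀ, which expands to the per-entry form sketched below:

```python
def double_center(D2):
    """Double centering of a squared-distance matrix (classical MDS/Isomap):
    B[i][j] = -0.5 * (D2[i][j] - rowmean_i - colmean_j + totalmean)."""
    n = len(D2)
    rowmean = [sum(r) / n for r in D2]
    colmean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(rowmean) / n
    return [[-0.5 * (D2[i][j] - rowmean[i] - colmean[j] + total)
             for j in range(n)] for i in range(n)]
```

Written as J D² J it is two matrix products, which is why the matrix-multiplication cost dominates this step at scale.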
A final key factor we have to consider is memory complexity of the described kernels. Here, the main contributing structures are matrices
One important caveat that affects the above analysis is the relationship between
Dimensionality reduction very quickly becomes both memory- and compute-intensive, irrespective of the particular method. Memory consumption arises from the size of the input data and the auxiliary matrices created in the process. The computational cost is dominated by pairwise computations and weight matrix construction. The goal of our framework is to scale DR methods to very large datasets that could be analyzed on large parallel machines.
We designed our parallel DR package following the general outline presented in Algorithm
The graph construction procedure is based on identifying
Let
Given pairwise distances, the second step is to identify neighbors of individual points (i.e., vertices of
The computational complexity of the entire procedure can be decomposed into
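To make the decomposition concrete, here is a toy row-block parallelization of the pairwise-distance step. Python threads stand in for processors; the block layout, worker count, and function names are illustrative, not the framework's actual MPI scheme:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def row_block_distances(X, lo, hi):
    """Distances from points lo..hi-1 (one processor's row block) to all points."""
    return [[math.dist(X[i], X[j]) for j in range(len(X))]
            for i in range(lo, hi)]

def parallel_distances(X, workers=4):
    """Compute the full distance matrix as `workers` independent row blocks."""
    n = len(X)
    step = (n + workers - 1) // workers            # ceil(n / workers) rows per block
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        blocks = list(ex.map(lambda b: row_block_distances(X, *b), bounds))
    # Concatenating the row blocks reassembles the full matrix.
    return [row for block in blocks for row in block]
```

Each block is computed independently, so the work divides evenly across processors at the price of replicating (or communicating) the point coordinates.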
Given graph
Recall that the formulation of DR methods proposed in Algorithm
The function
Graph
Taking into account the above requirements we obtain the following procedure of constructing
Complexity of this phase is
The goal of normalization is to transform matrix
Computing eigenvalues is the final step in the dimensionality reduction process. Although parallel eigensolvers are readily available, they are usually designed for shared-memory and multi-/many-core architectures [
For these reasons we decided to implement a custom eigenvalue solver that exploits special properties of matrix
(Parallel eigensolver; only fragments of the original pseudocode are recoverable.)
(10) Perform row-wise allreduce to obtain …
(15) Compute …
(16) Replicate entire vector …
(19) Deflate local block of …
In general, our approach follows the standard scheme of the power method (lines 3–18), repeated
Extracting eigenvalue and eigenvector in iteration
To conclude this section we would like to emphasize that our solver operates under the same assumptions as any power method. It requires that the first
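As a point of reference, the sequential power method with deflation that the parallel scheme mirrors can be sketched as follows. This is not the paper's implementation: the tolerance, iteration cap, start vector, and function names are illustrative assumptions:

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as lists of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def power_method(A, num_eigs, tol=1e-12, max_iter=1000):
    """Dominant eigenpairs of a symmetric matrix A via power iteration
    with deflation. Returns a list of (eigenvalue, eigenvector) pairs."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy: deflation modifies A
    pairs = []
    for _ in range(num_eigs):
        x = [1.0] * n                  # simple deterministic start vector
        prev = 0.0
        for _ in range(max_iter):
            y = matvec(A, x)           # the step parallelized by allreduce
            nrm = sum(v * v for v in y) ** 0.5
            x = [v / nrm for v in y]
            if abs(nrm - prev) < tol:
                break
            prev = nrm
        lam = sum(a * b for a, b in zip(x, matvec(A, x)))  # Rayleigh quotient
        pairs.append((lam, x))
        for i in range(n):             # deflate: A <- A - lam * x x^T
            for j in range(n):
                A[i][j] -= lam * x[i] * x[j]
    return pairs
```

In the distributed setting, the matrix-vector product is computed over local row blocks and combined with an allreduce, the vector is replicated across processors, and deflation touches only each processor's local block.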
To assess scalability of our framework and test its performance in real-life applications, we performed a set of experiments using the TACC Ranger cluster [
In the first set of experiments we measured how problem size influences performance of our solution. We created a collection of synthetic datasets consisting of
Run time in seconds for different processor counts p and problem sizes n.

    p        n = 4096   n = 8192   n = 16384   n = 32768
    16       404.63     3492.28    45288.93    —
    64       101.72     761.75     6906.64     —
    256      33.99      263.24     1655.39     14613.33
    1024     39.06      124.19     682.91      3964.65
Relative speedup for different problem sizes
The results show that our framework provides very good scalability for large problem sizes in the entire range of tested processor configurations. The superlinear speedup observed for
To further understand how different components of the framework perform, we measured their run time obtained for changing problem sizes. Table
Component-wise run time in seconds for varying problem sizes n, and

    Component     n = 2048   n = 4096   n = 8192   n = 16384   n = 32768
    …             0.623      1.389      5.721      22.254      86.706
    …             9.132      56.517     128.225    457.306     1697.124
    Normalize     0.160      0.905      6.526      223.240     2546.11
    Eigensolve    0.050      0.155      0.188      0.699       2.838
Run time in seconds of … for different processor counts p.

    p        100      1000     10000    100000
    16       0.053    0.530    5.466    92.014
    64       0.015    0.115    1.373    22.984
    256      0.005    0.027    0.349    5.860
    1024     0.002    0.007    0.682    1.875
In the final test we compared our parallel eigensolver with SLEPc.
Table
Comparison of our eigensolver with SLEPc.

    p        Our solver   SLEPc     Our solver   SLEPc
    16       0.0444       4.8159    2.5315       0.7049
    64       0.0088       2.1666    0.6056       0.8134
    256      0.0705       8.5538    0.1251       2.4143
    1024     0.0742       *         0.1320       *
    4096     0.0411       N/A       0.2024       10.9992
Organic solar cells, also known as plastic solar cells, are manufactured from organic blends, that is, a blend of two polymers, and represent a promising low-cost, rapidly deployable strategy for harnessing solar energy. While highly cost-effective and flexible, their low power conversion efficiency makes them less competitive commercially than conventional inorganic solar cells. A key factor determining the power conversion efficiency of organic solar cells is the morphological distribution of the two polymer regions in the device. Recent studies reveal that significant improvement in power conversion efficiency is possible through better control of the morphology of the organic thin-film layer during the manufacturing process [
The input dataset consists of
Snapshots of microstructures representing final morphologies from 50 different manufacturing processes.
Figure
Scree plot of the ten largest eigenvalues (a) and their proportional energy covered (b).
Figure
Morphology evolution as captured by the first three principal components of the original data. Different colors represent different patterning frequency
Morphology evolution for
Finally, the low-dimensional plots illustrate the ability to achieve the same morphology using different processing conditions. For instance, in Figure
Multiple pathways of morphology evolution.
In this work we present a systematic analysis of dimensionality reduction techniques and recast them into a unified view that can be exploited by dimensionality reduction algorithm designers. We then identify the common computational building blocks required to implement a spectral dimensionality reduction method. We use this insight to design and implement a parallel framework for dimensionality reduction that handles large datasets and scales to thousands of processors. We demonstrate the capability and scalability of this framework on several test datasets. Finally, we showcase the applicability and potential of the framework for unraveling complex process-morphology relationships in the manufacturing of plastic solar cells.
The authors declare that there is no conflict of interest regarding the publication of this paper.
This research was supported in part by the National Science Foundation through XSEDE resources provided by TACC under Grant no. TG-CTS110007, and in part by NSF Grants PHY-0941576 and 1149365.