Visualization of Distributed Data Structures for High Performance Fortran-Like Languages

This article motivates the usage of graphics and visualization for efficient utilization of High Performance Fortran's (HPF's) data distribution facilities. It proposes a graphical toolkit consisting of exploratory and estimation tools which allow the programmer to navigate through complex distributions and to obtain graphical ratings with respect to load distribution and communication. The toolkit has been implemented in a mapping design and visualization tool which is coupled with a compilation system for the HPF predecessor Vienna Fortran. Since this language covers a superset of HPF's facilities, the tool may also be used for visualization of HPF data structures. © 1997 John Wiley & Sons, Inc.


INTRODUCTION
Distributed memory multicomputers are increasingly being used in scientific computing for high-performance calculations as they offer many advantages over shared memory architectures concerning scalability and costs. On the other hand, the programmer is faced with the nontrivial problem of data distribution. Since these architectures lack a global data space, computational data must be distributed among the local memories of the single processors in such a way that good load balancing is achieved and interprocessor communication is kept minimal. Depending on the size and structure of the computational data, resulting data distributions can become very complex and thus difficult to rate.

Received May 1995
Revised February 1996

Since the quality of a data distribution may have a crucial impact on the efficiency of the computation, and research in automatic parallelization [1, 2] has not shown satisfactory results up to now, vendors and researchers have introduced language extensions to Fortran 77 and Fortran 90 which allow explicit specification of distribution layouts for data arrays. The advantages of this approach include minor parallelization efforts and increased portability. High Performance Fortran [3] is the most popular representative of these languages as it offers a portable standard and allows graceful migration of codes, i.e., HPF codes can be run on sequential architectures without modifications.
Following [4], besides performance analysis tools a well-designed HPF programming environment also includes an interactive data-mapping assistant whose purpose is to choose a suitable data distribution by utilizing knowledge from both the compiler and the user. The use of graphics and visualization is expected to increase the tool's value.
In this article we introduce a graphical tool which provides visualization of HPF array objects such as distributed data arrays and logical processor arrays. In its current form our graphical data distribution tool (GDDT) aims at the Fortran 77 extension Vienna Fortran [5], which can be regarded as one of HPF's predecessors. The tool extracts distributed array objects from Vienna Fortran source texts, evaluates distribution specifications, and creates a range of graphical displays including viewers for data arrays, processor arrays, and diagrams relating to load distribution and communication.
In Section 2 we motivate the advantages of visual support for data distribution and point out the benefits of this approach for important programming issues.
Section 3 describes the concepts and facilities of GDDT. The tool implements our concepts by means of a graphical toolkit consisting of exploratory and estimation tools. Furthermore, a clear separation between compile-time and post-mortem support is made. This separation is essential in order to define the scope of visualization data, which affects the tool's capabilities. Section 4 surveys related work and segregates our tool from other research systems by means of a short classification. A general view of design goals for graphical data distribution systems, together with a brief look at our future activities, concludes this article.

VISUAL SUPPORT FOR DATA DISTRIBUTION
In HPF, the programmer specifies data parallelism implicitly by means of data distribution directives. While directives are characterized by a quite simple syntax, from the semantic point of view they have a considerable impact on the parallelization process and on the optimizations performed by the compiler. Distribution directives affect two key issues to be considered with distributed memory computers:
1. Load distribution. A distribution specification assigns each element of a given data array to one or more logical processors. Assuming the owner computes rule [6], all computations on a given array element are performed by its owner. Computational load in a parallel loop can be

Distribution Layouts
The layout of array distributions can become very complex and hard to imagine, especially when the programmer uses alignments or replications in connection with high-dimensional arrays. Showing an array distribution as the correspondence between array elements and logical processors gives valuable information on load distribution. Our approach provides navigation through high-dimensional arrays and information retrieval based on selections of array elements, distribution blocks, and groups of both.
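The correspondence between array elements and logical processors can be illustrated by a minimal sketch. It is not taken from GDDT; it assumes the usual HPF definitions of BLOCK (contiguous blocks of size ceil(n/p)) and CYCLIC (round-robin) distributions, and the function names and array sizes are hypothetical.

```python
def block_owner(i, n, p):
    """Owner (1-based) of index i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division gives the block size
    return (i - 1) // block + 1


def cyclic_owner(i, p):
    """Owner (1-based) of index i when elements are dealt round-robin to p processors."""
    return (i - 1) % p + 1


def owner_2d(i, j, n, p, q):
    """Owner coordinates of element (i, j) of a two-dimensional array.

    Assumption: dimension 1 (extent n) is distributed BLOCK over p processors,
    dimension 2 is distributed CYCLIC over q processors.
    """
    return (block_owner(i, n, p), cyclic_owner(j, q))


# Two adjacent elements of a hypothetical 16 x 16 array on a 4 x 4 processor array
print(owner_2d(5, 6, 16, 4, 4), owner_2d(6, 6, 16, 4, 4))
```

Comparing the owners of adjacent elements in this way corresponds to checking whether neighboring elements reside on the same logical processor.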

Communication Points and Volume
Since the location of communication points and the involved communication primitives is determined by the compiler and thus not available from the source text, information concerning these issues should be made accessible to the user by means of source text annotations [8] or separate diagrams. Here, a clear separation between static and dynamic communication aspects must be made.
Communication caused by redistributions and computations on input data is mainly data dependent, thus its amount and effects do not show before runtime. Here we may only obtain a post-mortem visualization based on trace data. By way of contrast, overhead through special communication events, such as overlap communication [6] (see Section 3.2, subsection "Estimation Tools"), can be predicted at compile-time. This allows appropriate visualization before the program run.
We consider visual support at compile-time most effective due to the increase of productivity, i.e., the program tuning cycle is shortened significantly by eliminating unnecessary execution steps. Our concepts cover both compile-time and post-mortem visualization and concentrate on overlap communication and redistributions, respectively.

OVERVIEW OF GDDT
In order to rate the value of graphics and visualization for the task of data distribution, we have developed a software tool which covers most of the issues discussed in the previous section. It implements individual concepts and techniques and shows our preliminary results regarding graphical-aided data distribution. Figure 1 gives a global overview of the tool's look and feel. Concerning the input language used, the facilities of our GDDT are currently based on Vienna Fortran [5], one of HPF's predecessors. We chose this data-parallel Fortran extension due to the following reasons:

1. Major contributions to HPF come from Vienna Fortran. Roughly speaking, Vienna Fortran's language extensions for data-parallel programming can be regarded as a superset of those provided by HPF. Thus, all facilities provided by GDDT can also be applied in context with HPF. Furthermore, language-specific parts of the tool's implementation are encapsulated into well-defined components in order to simplify future adaptations, primarily to HPF.
2. The tool is tightly coupled with the Vienna Fortran Compilation System (VFCS) [9], in particular with the Vienna Fortran compiler, for the purpose of information exchange. The VFCS features an ideal information source for GDDT by exporting basic information for visualization such as distribution specifications, overlap descriptions (see Section 3.2, Subsection "Exploratory Tools"), and communication points.
The following two sections give an overview of the graphical facilities provided by GDDT. At first we survey the extent of information available for visualization where a clear distinction between compile-time and post-mortem information is necessary. Based on this classification, two interacting suites of graphical displays and tools are introduced in the sequel.

Visualization Data
We classify basic data for visualization into static and dynamic data. In our context, the term static characterizes data which can be collected before the program run, while dynamic data are available only during or after run-time.
The most important kind of static data is a Vienna Fortran compilation unit consisting of several subroutines, functions, and a main program. Definitions of processor shapes and distributed data arrays and DISTRIBUTE statements are of major interest for visualization. There are two ways for GDDT to obtain the contents of a compilation unit:
1. Vienna Fortran source text. If only the source text is available, distributed data arrays and related information can be extracted from it by means of a rudimentary parser. The source text may be shown in a built-in text editor during visualization where feedback between displays and corresponding source text locations is permanently maintained (see Fig. 1).

Visualization Toolkit
GDDT provides a set of visualization tools which allow exploration of graphical array representations and diagrams by means of a few point-and-click operations. The major aims of the toolkit are to assist the user during exploration of large arrays with complex distributions and to provide appropriate graphical interpretations derived from data concerning load distribution and communication. Following these aims we classify GDDT's facilities into two interacting groups: exploratory tools and estimation tools.

Exploratory Tools
This group covers both static and dynamic data and concentrates on facile navigation within distributed data arrays and logical processor arrays, emphasizing mapping relationships between data and processors.

Exploration of Mappings. Visualization of array structures plays an important role here. GDDT displays arrays with arbitrary numbers of dimensions where up to three dimensions can be shown simultaneously and remaining dimensions are hidden. Figure 3 shows examples of array representations for each rank.
The hatched area within array A3D is called the selection plane. Since selection of array elements in three dimensions may be ambiguous due to loss of information during projection onto the two-dimensional display screen, a two-dimensional slice is provided in order to enable unambiguous selections. The plane may be positioned in parallel to the XY, XZ, or YZ plane and moved along the third axis by means of a slider (Fig. 4). In most cases the user cannot perceive characteristics of distribution layouts directly in the global representation, so navigation tools are required to perform basic view manipulation. For this purpose GDDT offers primitives for translation, zooming, and rotation.
Control of the tools conforms to the AVS environment [15], so users familiar with this system can easily utilize GDDT's facilities.
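The idea of the selection plane can be sketched in a few lines. The following is only an illustration under our own assumptions, not GDDT code: the array is held as a nested list indexed as array3d[x][y][z], the slider position is a 0-based index along the axis perpendicular to the plane, and the function name is hypothetical.

```python
def selection_plane(array3d, orientation, position):
    """Return the two-dimensional slice of array3d[x][y][z] that lies in the
    plane with the given orientation at the given slider position (0-based)."""
    nx, ny, nz = len(array3d), len(array3d[0]), len(array3d[0][0])
    if orientation == "XY":  # plane moves along the Z axis
        return [[array3d[x][y][position] for y in range(ny)] for x in range(nx)]
    if orientation == "XZ":  # plane moves along the Y axis
        return [[array3d[x][position][z] for z in range(nz)] for x in range(nx)]
    if orientation == "YZ":  # plane moves along the X axis
        return [[array3d[position][y][z] for z in range(nz)] for y in range(ny)]
    raise ValueError("orientation must be 'XY', 'XZ', or 'YZ'")


# Each element stores its own coordinates, so a selection is easy to check.
cube = [[[(x, y, z) for z in range(4)] for y in range(3)] for x in range(2)]
plane = selection_plane(cube, "XZ", 2)
print(plane[1][3])
```

Because every element of the slice has exactly two free coordinates, a click on the plane identifies one array element without ambiguity.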
Data arrays and processor arrays are shown in separate windows called array viewers. Array viewers can be opened by selecting arrays in the DSE window and pressing the View button. Currently only one data array viewer and one processor array viewer can be shown simultaneously. It must be noted that data and processor arrays can be combined separately, i.e., it is possible to show a data array's distribution on another processor array shape than the one given in the distribution specification.
When the user has opened data array and processor array viewers and adjusted the representation according to his or her needs, the distribution can be explored by means of selections. Figure 4 shows relationships between a three-dimensional data array and a two-dimensional processor array.
The data array viewer shows the structure of a particular distributed data array. The user selects array elements in order to know how they are mapped to logical processors. Nearest-neighbor mappings can be verified by evaluating the mappings of adjacent elements. Whenever an element is selected, a colored cube (or a square in two dimensions) appears and the processor array viewer highlights all logical processors which own the element in the same color. We refer to the data element as the source; the owning processors are called targets. In Figure 4 on the left, two arbitrary neighboring elements, named C(17, 12, 1) and C(18, 12, 1), have been selected. Since they are located on the same processor (P2D(7, 4)), the two sources and their target are highlighted in the same color.
The processor array viewer shows the shape of a particular logical processor array and provides the same facilities as the data array viewer. The selection of a logical processor lets the data array viewer highlight all data blocks assigned to this processor. This facility shows characteristic patterns for different distributions. In Figure 4 on the right, processor P2D(3, 4) and its four direct neighbors have been selected.
1. The cyclic distribution along the Y axis is characterized by three replications of the target shape (horizontally).
2. The block distribution appears as an enlargement of the target shape in the X direction (vertically).

While elementwise selection might suffice for small arrays, it is not reasonable for large data arrays or processor arrays that represent massively parallel systems. In order to promote scalability of the exploration facilities, multiple selection levels are provided:
5. Global selections (Select All in Fig. 4).
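The characteristic patterns that appear when a logical processor is selected can be reproduced with a small sketch of the inverse mapping, from a processor to the indices it owns. This is an illustration only, assuming the usual HPF BLOCK and CYCLIC definitions along a single dimension; the function names and sizes are hypothetical.

```python
def owned_indices_block(proc, n, p):
    """Indices (1-based) owned by processor proc under a BLOCK distribution
    of n elements over p processors: one contiguous range."""
    block = -(-n // p)  # ceiling division gives the block size
    lo = (proc - 1) * block + 1
    hi = min(proc * block, n)
    return list(range(lo, hi + 1))


def owned_indices_cyclic(proc, n, p):
    """Indices (1-based) owned by processor proc under a CYCLIC distribution
    of n elements over p processors: every p-th element."""
    return list(range(proc, n + 1, p))


print(owned_indices_block(2, 10, 4))   # one enlarged contiguous range
print(owned_indices_cyclic(2, 10, 4))  # several scattered repetitions
```

A block distribution yields one contiguous range per processor, whereas a cyclic distribution yields a repeated pattern, which is the kind of difference the highlighted data blocks make visible.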
Furthermore, a slicing facility for the graphical representation is needed in order to allow detailed investigation of arbitrary array portions.

Replay of Dynamic Distributions. Arrays whose distribution may change at run-time are called dynamic in the terminology of HPF and Vienna Fortran. Promising uses for such distributions include dynamic load balancing for codes that are executed on different architectures or where the size or structure of computational data varies. A good example of the latter case is particle-in-cell codes [16].
In order to provide an insight into the amount of redistributions performed during the program run, the Vienna Fortran Engine [17] records all events relating to data migration and forwards them to GDDT. The distribution sequence can then be replayed by the animator tool. If the source text is available to GDDT, associations between events and corresponding lines of the source text are shown.

... been gained from the source text by consecutive execution of redistribution statements. Sequences derived from actual trace data require sophisticated measurement. Facilities for their generation are expected to be available in the future.
Regarding Figure 7, it must be noted that the executable directives REDISTRIBUTE and REALIGN of HPF are both subsumed by the DISTRIBUTE statement in Vienna Fortran. The animator tool shows time, location, and specification of dynamic distributions, provides controls for stepwise and continuous replay, and allows the user to exclude events from replay, which may be useful for source-based sequences gained from source texts containing conditional statements.
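The replay of a distribution sequence, including the exclusion of individual events, can be sketched as follows. The event record and its fields (time, source line, array name, distribution specification) are our own assumptions chosen to mirror the information the animator tool is described as showing; they do not reproduce the actual trace format of the Vienna Fortran Engine.

```python
from dataclasses import dataclass


@dataclass
class RedistributionEvent:
    time: float        # time stamp of the event (hypothetical unit)
    line: int          # source line of the redistribution statement
    array: str         # name of the redistributed array
    distribution: str  # textual distribution specification


def replay(events, excluded=frozenset()):
    """Step through the events in time order, skipping excluded event indices.

    Yields (event, state) pairs, where state maps each array name to the
    distribution that is current after the event has been applied.
    """
    state = {}
    for index, event in sorted(enumerate(events), key=lambda pair: pair[1].time):
        if index in excluded:
            continue
        state[event.array] = event.distribution
        yield event, dict(state)


events = [
    RedistributionEvent(2.0, 40, "A", "(CYCLIC)"),
    RedistributionEvent(1.0, 12, "A", "(BLOCK)"),
    RedistributionEvent(3.0, 55, "B", "(BLOCK)"),
]
for event, state in replay(events, excluded={0}):
    print(event.time, event.line, state)
```

Consuming the generator one item at a time corresponds to stepwise replay, while iterating over it in a loop corresponds to continuous replay.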

Estimation Tools
From our point of view, the main purpose of visualization in the area of scientific parallel computing is to reduce large amounts of complex data to meaningful displays and so to facilitate perception of patterns. Such displays should assist programmers to make clear decisions between alternative data distributions, assuming that they know how to interpret the graphical representation. As soon as a tool lets the programmer recognize a possibility to improve a distribution, it has proven to be useful. With GDDT, estimations are guided by diagrams relating to the key issues of load distribution and communication. Since we concentrate on compile-time estimations at the moment, the information base consists of static data including HPF data structures and overlap descriptions. In order to round out the usefulness of the visual toolkit, the user may return changes to the information base to the VFCS, where they can be utilized for further analysis or for code generation.
Load Distribution. For each data array, GDDT can create a two-dimensional chart called a data load diagram which illustrates how evenly the array has been distributed among all logical processors. Figure 8 shows a data load diagram for array C from Figure 4. For each processor, a bar depicts how many array elements have been mapped to it with respect to the maximal processor load. The coloring tries to simplify the distinction of single bars, especially for the case of large processor numbers. Here the representation can also be sliced with respect to processors. The info browser below shows statistical information, e.g., absolute element numbers if the user selects a single bar. Load imbalances as given in Figure 8 characterize regular mappings to two-dimensional processor arrays where processors with small index values receive more elements than others.
Our experience has shown that the restricted set of distribution functions provided by HPF hardly produces serious load imbalances. By way of contrast, concerning Vienna Fortran, the availability of such tools has proven valuable since irregular distribution functions (blocks of arbitrary sizes or distribution by means of mapping arrays) may produce distributions which are hard to understand.
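The quantity shown in a data load diagram can be approximated by counting, for each logical processor, the elements mapped to it and relating the count to the maximal processor load. The sketch below is an illustration under our own assumptions: a two-dimensional array distributed (BLOCK, BLOCK) with block size ceil(n/p) per dimension, and hypothetical function names and sizes.

```python
def data_load(n_rows, n_cols, p_rows, p_cols):
    """Number of array elements per processor for a (BLOCK, BLOCK)
    distribution of an n_rows x n_cols array on a p_rows x p_cols processor array."""
    block_r = -(-n_rows // p_rows)  # ceiling division: block size per dimension
    block_c = -(-n_cols // p_cols)
    load = {(r, c): 0 for r in range(1, p_rows + 1) for c in range(1, p_cols + 1)}
    for i in range(n_rows):
        for j in range(n_cols):
            load[(i // block_r + 1, j // block_c + 1)] += 1
    return load


def relative_load(load):
    """Bar heights: each processor's load relative to the maximal processor load."""
    peak = max(load.values())
    return {proc: count / peak for proc, count in load.items()}


load = data_load(10, 10, 4, 4)
for proc, height in sorted(relative_load(load).items()):
    print(proc, load[proc], round(height, 2))
```

For a 10 x 10 array on a 4 x 4 processor array, processors with small index values hold full blocks while those with the largest index values hold only the remainder, so the imbalance described above shows up directly in the bar heights.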
Overlap Communication. Overlap areas around distribution blocks increase the efficiency of local computations based on regular patterns. Communication resulting from updates of overlap regions is determined by the compiler and depends on the chosen overlap description. During the update phase, each processor receives contributions to the overlap area of its block from the owning processors of adjacent blocks. Since the programmer can neither control the computation of overlap descriptions nor estimate the impact of overlap communication on the overall execution time, display of overlap communication shows inefficiencies resulting from unsuitable data distributions or ill-chosen overlap descriptions in an early phase of development.
Overlap communication between pairs of processors is shown in GDDT by means of a three-dimensional chart called overlap communication display. For each processor pair (Sender, Receiver) a bar indicates the volume of data elements to be exchanged. Also here the coloring helps the user to distinguish single bars. In order to cope with the large amount of information, the display can be zoomed, rotated, translated, and sliced. Bars can be selected to obtain detailed information about communication. Displays based on regular distributions (as shown in Fig. 9) exhibit characteristic stencils, which show differences with respect to the involved overlap area widths, distribution functions, replications, and the processor array rank. Stencils with equal bar heights (as shown in Fig. 9) depict well-balanced overlap communication. By way of contrast, ill-chosen overlap descriptions or block distributions with varying block sizes (possible in Vienna Fortran) may produce bars with different heights, warning the user of imbalanced communication.
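The volumes behind such a display can be estimated with a simple model. The following sketch is our own simplification, not the computation performed by the VFCS: it assumes a two-dimensional array whose first dimension is block-distributed over a one-dimensional processor array, equal block sizes, and blocks that are at least as large as the overlap widths. All names are hypothetical.

```python
def overlap_volumes(num_procs, row_length, lower, upper):
    """Elements exchanged per (sender, receiver) pair during an overlap update.

    lower / upper are the overlap widths (in rows) toward smaller and larger
    row indices; each overlap row holds row_length elements.
    """
    volumes = {}
    for receiver in range(1, num_procs + 1):
        if receiver > 1 and lower > 0:
            # rows just below the block's lower bound are owned by the previous processor
            volumes[(receiver - 1, receiver)] = lower * row_length
        if receiver < num_procs and upper > 0:
            # rows just above the block's upper bound are owned by the next processor
            volumes[(receiver + 1, receiver)] = upper * row_length
    return volumes


for pair, volume in sorted(overlap_volumes(4, 100, 1, 1).items()):
    print(pair, volume)
```

With symmetric overlap widths all bars have the same height, which corresponds to a well-balanced stencil; asymmetric widths or varying block sizes would lead to bars of different heights.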

RELATED WORK
Compared to other activities occurring during parallel software development, such as process mapping, performance analysis, and debugging, data distribution has not been supported adequately by visual tools yet. Furthermore, the lack of a taxonomy for existing research tools often causes confusion with respect to their applicability. In order to rate the contribution of our work we classify some well-known systems for data distribution visualization according to the data structure level they operate on.
1. Problem-related structure. This class of tools visualizes mappings of scientific data structures such as meshes and matrices. Data distribution is performed automatically by partitioning algorithms and mainly aims at selected target architectures. Most popular representatives of this class are DecTool [18], GraphTool [19], and DDT [20]. MATRIX [21] also visualizes runtime states.
2. Computational structure. Tools that operate on this level are associated with a concrete programming language. Existing work mainly concentrates on run-time behavior of data structures. Well-known representatives are VIPS [22] for dynamic data structures in Ada and VISTA [12], which shows contents of program

CONCLUSIONS
In this article we motivated visual support for efficient utilization of HPF's data distribution facilities. Our approach proposes a graphical toolkit consisting of exploratory and estimation tools that assist the programmer in designing and rating data distributions. We presented a software tool which implements our concepts and showed several opportunities where visualization and navigation can contribute to the understanding of complex distributions.
An important design issue is the separation of static and dynamic concerns. On the one hand, it specifies to which extent the user can contribute to generation of efficient parallel programs by means of graphical operations. On the other hand, it defines the amount of information available to the user for graphical evaluation. In each case, interaction with external systems such as compilers or execution systems is necessary in order to create meaningful displays and to obtain sensible ratings.
Summarizing our experiences with GDDT and following [10], we define the design goals for graphical systems with focus on data distribution as follows:
1. Ease of understanding.
3. Portability. Data distribution systems should not depend on a particular target architecture. Systems aiming at HPF-like languages fulfill this requirement implicitly since data are distributed to virtual topologies. Concerning such systems, flexibility is an essential property in order to provide support for some important predecessors and derivatives of HPF and to cope with future language extensions.
For the future, we plan adaptation of GDDT to HPF and increased utilization of communication information provided by the compiler. In order to visualize HPF arrays, an additional parser will be integrated which handles HPF's syntactical characteristics and creates an internal model based on Vienna Fortran's equivalent facilities. Regarding visualization of communication we plan to incorporate timings and sophisticated source-text associations.

FIGURE 4 Exploration of a distribution.

3. The replication of the two-dimensional distribution along the Z axis is indicated by means of bars (ascending from the selection plane).
Figure 5 shows an example of array slicing. For the purpose of simplicity, the control elements of the data array viewer before and after slicing were omitted.

Display of Overlap Areas. Overlap areas [6] are a means to optimize communication in connection with regular computations on data arrays whose dimensions are distributed blockwise. Each block can be associated with an overlap area which denotes the smallest rectilinear contiguous area around the block containing all nonlocal elements being accessed by the block's process. The extent of the overlap area is determined from an overlap description. Each overlap description is specific to one single array and computed by the compiler during parallelization by means of the array's distribution and reference patterns given in the code. The overlap description is made available to GDDT together with the array's distribution. When the block-selection level is chosen, as shown in Figure 6, a block's overlap area can be shown by means of a key-click combination. Furthermore, the overlap description can be modified and returned to the VFCS.
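How an overlap description might follow from reference patterns can be illustrated with a small sketch. It is a simplification under our own assumptions and does not reproduce the analysis performed by the Vienna Fortran compiler: references are given as constant index offsets relative to the computed element, for example A(i-1, j+2) as (-1, 2), and the overlap description holds a (lower, upper) width per dimension. All names are hypothetical.

```python
def overlap_description(offsets):
    """Per-dimension (lower, upper) overlap widths covering all reference offsets."""
    rank = len(offsets[0])
    widths = []
    for d in range(rank):
        lower = max(0, -min(off[d] for off in offsets))  # reach toward smaller indices
        upper = max(0, max(off[d] for off in offsets))   # reach toward larger indices
        widths.append((lower, upper))
    return widths


def overlap_area(block, widths, shape):
    """Bounds of the block extended by its overlap area, clipped to the array.

    block holds 1-based inclusive (lo, hi) bounds per dimension.
    """
    return [
        (max(1, lo - w_lo), min(n, hi + w_hi))
        for (lo, hi), (w_lo, w_hi), n in zip(block, widths, shape)
    ]


five_point = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
widths = overlap_description(five_point)
print(widths)
print(overlap_area([(5, 8), (1, 4)], widths, (16, 16)))
```

The extended bounds describe the smallest rectilinear area that contains every element the block's process may touch under these assumptions; modifying the widths corresponds to editing the overlap description before it is returned to the compilation system.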

FIGURE 6  Display and modification of overlap areas.

FIGURE 8 Display of load distribution.
This work is partially supported by the Austrian Ministry of Science, Research, and Art (BMWFK) project CEI-PACT, subproject Work Package 1: "Advanced Compiler Technology."

considered balanced if array elements have been distributed to the processors evenly.
2. Communication between processors. An array distribution implicitly defines communication requirements according to the array's usage in the code. Communication can be kept minimal

... [14]. In our work, we concentrate on visualization of redistributions and realignments of data arrays together with the impact of such events on the overall execution time. Hereby dynamic visualization data cover timings of redistributions and realignments causing data migration together with their locations in the source text. GDDT utilizes such data, which are provided by the VFE, in order to replay migration of data arrays.

1. Single-element selections via mouse clicks or textual specification.
2. Multielement selections via "rubber-banding" or textual specification by means of slices.
3. Distribution block selections via mouse clicks (see Fig. 5).
4. Plane selections in 3D (Select Pick Plane in Fig. 4).

The user's ability to perceive distribution patterns, mapping relationships, and anomalies in distribution, data load, or communication should be promoted by means of appropriate displays.
A. Besides displays for distributed data structures a separate display for the target architecture should be provided.
B. Primitives for basic view manipulation (including rotation, zooming, and slicing) are essential in order to choose suitable settings for large or complex data structures.
C. Relationships between data of various detail and processors shall be stressed by means of correlative linking, e.g., by using equal colors.
D. The user should be able to temporarily intersperse graphical representations with additional information concerning transformation and optimization, e.g., overlap areas around data blocks or communication points within the source text.
2. Ease of usage. This issue is particularly important for exploratory tools as here the user does not know a priori which setting will promote understanding.
A. The user should be able to carry out basic view manipulations quickly by means of mouse operations or key-mouse combinations.
B. Several selection levels should allow the user to choose at which level of detail a distributed data structure shall be explored (single elements, slices, distribution blocks, groups of elements or blocks).
C. Data distribution systems should be seamlessly integrated into compilation systems and so produce environments that provide permanent associations between source code, data structure mapping, and estimation displays.