Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high-level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation; it can also assist developers with different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram helps developers examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and some of its limitations.
Computational research has been referred to as the third pillar of scientific and engineering research, along with experimental and theoretical research [
In this critical type of software, Fortran is still a very widely used programming language [
One of the greatest challenges faced by CSE developers is the ability to effectively maintain their software over its generally long lifetime [
To address this objective, we developed and evaluated the
The contributions of this paper are as follows: (1) the ForUML tool, which helps CSE developers extract UML design diagrams from OO Fortran code so that they can make good decisions about software development and maintenance tasks; (2) a description of the transformation process used to develop ForUML, which may help other tool authors create tools for the CSE community; (3) the results of the evaluation and our experiences using ForUML on real CSE projects, which highlight its benefits and limitations; and (4) workshop feedback that should help the SE community develop practices and tools that are suitable for use in the CSE domain.
The rest of this paper is organized as follows. Section
This section first describes important CSE characteristics that impact the development of tool support. Next, it presents two important concepts used in the development of ForUML, reverse engineering and OO Fortran. Finally, because one of the benefits of using ForUML is the ability to recognize and maintain design patterns, the last subsection provides some background on design patterns.
This section highlights three characteristics of CSE software development that differentiate it from traditional software development. First, CSE developers typically have a strong background in the theoretical science but often do not have formal training about SE techniques that have proved successful in other software areas. More specifically, because the complexity of the problems addressed by CSE generally requires a domain expert (e.g., a Ph.D. in physics or biology) to even understand the problem, and that domain expert generally must learn how to develop software [
Second, some SE tools are difficult to use in a CSE development environment [
Third, CSE software typically lacks adequate development-oriented documentation [
Reverse engineering is a method that transforms source code into a model [
The transformation process in ForUML is based on the XMI format, which provides a standard method of mapping an object model into XML. XMI is an open standard that allows developers and software vendors to create, read, manage, and generate XMI tools. Transforming the model (Fortran code) to XMI requires use of the model driven architecture (MDA) technology [
The basic idea of using an XMI file to maintain the metadata for UML diagrams was drawn from four reverse engineering tools. Alalfi et al. developed two tools that use XMI to maintain the metadata for the UML diagrams: a tool that generates UML sequence diagrams for web application code [
Doxygen is a documentation tool that can use Fortran code to generate either a simple, textual representation with procedural interface information or a graphical representation. The only OOP class relationship Doxygen supports is inheritance. With respect to our goals, Doxygen has two primary limitations. First, it does not support all OOP features within Fortran (e.g., type-bound procedures and components). Second, the diagrams generated by Doxygen only include class names and class relationships but do not contain other important information typically included in UML class diagrams (e.g., methods, properties). Our work expands upon Doxygen by adding support for OO Fortran and by generating UML diagrams that include all relevant information about the included classes (e.g., properties, methods, and signatures).
There are a number of available tools (both open source and commercial) that claim to transform OO code into UML diagrams (e.g., Altova UModel, Enterprise Architect, StarUML, and ArgoUML). However, these tools do not support OO Fortran. Although they cannot directly create UML diagrams from OO Fortran code, most of these tools are able to import the metadata describing UML diagrams (i.e., the XMI file) and generate the corresponding UML diagrams. ForUML takes advantage of this feature to display the UML diagrams described by the XMI files it generates from OO Fortran code.
This previous work has contributed significantly to reverse engineering tools for traditional software. ForUML specifically offers a method to reverse engineer code implemented in modern Fortran, including features in the Fortran 2008 standard. Moreover, the tool was deliberately designed to support important features of Fortran, such as coarrays, procedure overloading, and operator overloading.
Fortran is an imperative programming language. Traditionally, Fortran code has been developed through a procedural programming approach that emphasizes the procedures and subroutines in a program rather than the data. A number of studies discuss approaches for expressing OOP principles in Fortran 90/95. For example, Decyk described how to express the concepts of data encapsulation, function overloading, classes, objects, and inheritance in Fortran 90 [
Object-oriented Fortran terms (adapted from [
Fortran | OOP equivalent | Fortran keywords |
---|---|---|
Module | Package | Module |
Derived type | Abstract data type (ADT) | Type |
Component | Attribute | — |
Type-bound procedure | Method | Procedure |
Parent type | Parent class | — |
Extended type | Child class | Extends |
Intrinsic type | Primitive type | For example, real, integer |
The Fortran 2003 language standard added support for OOP, including the following OOP principles: dynamic and static polymorphism, inheritance, data abstraction, and encapsulation. Currently, a number of Fortran compiler vendors support all (or almost all) of the OOP features included in the Fortran 2003 standard. These compilers include NAG, GNU Fortran, IBM XL Fortran, Cray, and Intel Fortran.
Fortran 2003 supports procedure overriding where developers can specify a type-bound procedure in a child type that has the same binding name as a type-bound procedure in the parent type. Fortran 2003 also supports user-defined constructors that can be implemented by overloading the intrinsic constructors provided by the compiler. The user-defined constructor is created by defining a generic interface with the same name as the derived type.
Algorithm
Data abstraction is the separation between the interface and implementation of the program. It allows developers to provide essential information about the program to the outside world. In Fortran, the
With the increase in parallel computing, the CSE community needs to utilize the full processing power of all available resources. Fortran 2008 improves the performance for a parallel processing feature by introducing the Coarray model [
A design pattern is a generic solution to a common software design problem that can be reused in similar situations. Design patterns are made of the best practices drawn from various sources, such as building software applications, developer experiences, and empirical studies. Generally, we can classify the design patterns of the software into classical and novel design patterns. The 23 classical design patterns were introduced by the “Gang of Four” (GoF) [
In general, a design pattern includes a section known as
Several researchers have proposed design patterns for computational software implemented with Fortran. For example, Weidmann [
This section describes the rationale and benefits of developing ForUML and details the transformation process used by ForUML.
The CSE characteristics described in Section
Although there are a number of reverse engineering tools [
This work is primarily targeted at CSE developers who develop OO Fortran. The ForUML tool will provide the following benefits to the CSE community. The extracted UML class diagrams should support software maintenance and evolution and help maintainers ensure that the original design intentions are satisfied. The developers can use the UML diagrams to illustrate software design concepts to their team members. In addition, UML diagrams can help developers visually examine relationships among objects to identify code smells [ Because SE tools generally improve productivity, ForUML can reduce the training time and learning curve required for applying SE practices in CSE software development. For instance, ForUML will help developers perform refactoring activities by allowing them to evaluate the results of refactoring using the UML diagrams rather than inspecting the code manually.
Since Fortran 2003 provides all of the concepts of OOP, tools like ForUML can help to place Fortran on an equal footing with other OOP languages.
The primary goal of ForUML is to reverse engineer UML class diagrams from Fortran code. By extracting information from a set of source files, it builds a collection of objects associated with syntactic entities and relations. Object-based features were first introduced in the Fortran 90 language standard. Accordingly, ForUML supports Fortran 90 and all later versions, which encompasses most platforms and compiler vendors. We implemented ForUML using Java Platform SE 6 so that it can run on any client computing system.
The UML object diagram in Figure
The Fortran model.
Figure
The transformation process.
The Fortran code is parsed by the Open Fortran Parser (OFP) (
We have customized the ANTLR libraries to translate particular AST nodes (i.e., type, component, and type-bound procedure) into objects. These AST nodes are only the basic elements of UML class diagrams. In fact, a UML class diagram includes classes, attributes, methods, and relations. The parsing actions include two steps. The first step verifies the syntax in the source file and eliminates source files that have syntax problems. It also eliminates source files that do not contain any instances of type and module. For example, ForUML will eliminate modules that contain only subroutines or functions. After this step, ForUML reports the results to the user via a GUI. In the second step, the parser manipulates all AST nodes, relying on the model described earlier. Note that ForUML only manipulates the selected input source files. Any associated type objects that exist in files not selected by the user are not included in the class diagram.
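The two parsing steps, and the exclusion of types that live in unselected files, can be sketched as follows. This is only an illustration in Python, not ForUML's implementation: ForUML is written in Java and works on the AST produced by OFP and ANTLR, whereas this sketch substitutes simple regular expressions, skips syntax verification, and checks only for derived types. All names in it (`extract_types`, `build_model`, the sample file names, and the abbreviated Fortran fragments) are hypothetical.

```python
import re

# Heuristic patterns: an assumption of this sketch, standing in for the OFP/ANTLR AST.
# TYPE_START matches "type :: t", "type, extends(p) :: t", and "type t",
# but not declarations such as "type(t) :: var" or "end type t".
TYPE_START = re.compile(
    r"^\s*type\b\s*(?:,(?P<attrs>[^:]*))?(?:::)?\s*(?P<name>[a-z_]\w*)\s*$", re.I)
END_TYPE = re.compile(r"^\s*end\s*type\b", re.I)
CONTAINS = re.compile(r"^\s*contains\b", re.I)
EXTENDS = re.compile(r"extends\s*\(\s*(\w+)\s*\)", re.I)
COMPONENT_REF = re.compile(r"^\s*(?:type|class)\s*\(\s*(?P<ref>[a-z_]\w*)\s*\)", re.I)


def extract_types(source):
    """Return {type_name: {"parent": name or None, "refs": [component type names]}}."""
    types, current, in_components = {}, None, False
    for line in source.splitlines():
        line = line.split("!", 1)[0]  # drop comments (ignores '!' inside strings)
        if current is None:
            match = TYPE_START.match(line)
            if match:
                ext = EXTENDS.search(match.group("attrs") or "")
                current = match.group("name").lower()
                types[current] = {"parent": ext.group(1).lower() if ext else None,
                                  "refs": []}
                in_components = True
        elif END_TYPE.match(line):
            current = None
        elif CONTAINS.match(line):
            in_components = False  # type-bound procedures follow, not components
        elif in_components:
            match = COMPONENT_REF.match(line)
            if match:
                types[current]["refs"].append(match.group("ref").lower())
    return types


def build_model(files):
    """Step 1: drop files without derived types. Step 2: relate the selected types."""
    kept, skipped, types = [], [], {}
    for filename, source in files.items():
        found = extract_types(source)
        if found:
            kept.append(filename)
            types.update(found)
        else:
            skipped.append(filename)
    relations = []
    for name, info in types.items():
        if info["parent"] in types:
            relations.append(("generalization", name, info["parent"]))
        for ref in info["refs"]:
            if ref in types:  # types from unselected files are left out
                relations.append(("composition", name, ref))
    return kept, skipped, relations


# Hypothetical, abbreviated input files used only to exercise the sketch.
SAMPLE_FILES = {
    "shapes.f90": """
module shapes
  use points
  type :: shape
    type(point) :: origin
    type(logger) :: log   ! defined in a file that was not selected
  end type shape
  type, extends(shape) :: circle
    real :: radius
  contains
    procedure :: area
  end type circle
end module shapes
""",
    "points.f90": """
module points
  type point
    real :: x, y
  end type point
end module points
""",
    "utils.f90": """
module utils
contains
  subroutine swap(a, b)
    real, intent(inout) :: a, b
  end subroutine swap
end module utils
""",
}

kept, skipped, relations = build_model(SAMPLE_FILES)
print(kept, skipped, relations)
```

In this sample, `utils.f90` is skipped because it contains only a subroutine, and the `logger` component produces no relationship because no selected file defines that type.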
During the extraction process, ForUML excerpts the objects and identifies their relationships. ForUML determines the type of each extracted relationship and maps each relationship to a specific relationship’s type object. Based on the example code in Algorithm Composition represents the whole-part relationship. The lifetime of the part classifier depends on the lifetime of the whole classifier. In other words, a composition describes a relationship in which one class is composed of many other classes. In our case, the composition association will be produced when a type object refers to another type object in the component. The association refers to a type not provided by the user and as a result it does not appear in the class diagram. In the UML class diagram, a composition relationship appears as a solid line with a filled diamond at the association end that is connected to the whole class. Generalization represents an
We developed the XMI generator module to convert the extracted objects into XMI notation based on our defined rules for mapping the extracted objects to the proper XMI notation. The rules for mapping the extracted objects and XMI document are specified in Table
Fortran to XMI conversion rules.
Fortran | XMI elements |
---|---|
Derived type | UML: class |
Type-bound Procedure | UML: operation |
Dummy argument | UML: parameter |
Component | UML: attribute |
Intrinsic type | UML: DataType |
Parent type | UML: Generalization.parent |
Extended type | UML: Generalization.child |
Composite | UML: association |
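The conversion rules in the table can be illustrated with a short sketch. It is written in Python with the standard `xml.etree.ElementTree` module, whereas ForUML's generator is part of a Java tool, so nothing here reflects the actual implementation. The input dictionaries, the namespace URI, and the attribute names (`name`, `type`, `whole`, `part`) are simplifying assumptions; a real XMI document uses identifiers and cross-references that this sketch omits, so the output is not claimed to be importable by a UML tool.

```python
import xml.etree.ElementTree as ET

UML_NS = "org.omg.xmi.namespace.UML"  # assumed namespace URI for this sketch
ET.register_namespace("UML", UML_NS)

INTRINSIC_TYPES = {"integer", "real", "complex", "logical", "character"}


def uml(tag):
    """Qualify a tag name with the UML namespace."""
    return f"{{{UML_NS}}}{tag}"


def to_xmi(types):
    """Apply the table's mapping rules to a list of extracted type descriptions."""
    root = ET.Element("XMI")
    content = ET.SubElement(root, "XMI.content")
    known = {t["name"] for t in types}
    datatypes = set()
    for t in types:
        # Derived type -> UML:Class
        cls = ET.SubElement(content, uml("Class"), name=t["name"])
        for comp_name, comp_type in t["components"]:
            # Component -> UML:Attribute
            ET.SubElement(cls, uml("Attribute"), name=comp_name, type=comp_type)
            if comp_type in INTRINSIC_TYPES:
                datatypes.add(comp_type)
            elif comp_type in known:
                # Composite -> UML:Association (only for types in the input set)
                ET.SubElement(content, uml("Association"),
                              whole=t["name"], part=comp_type)
        for proc_name, dummy_args in t["procedures"]:
            # Type-bound procedure -> UML:Operation, dummy argument -> UML:Parameter
            op = ET.SubElement(cls, uml("Operation"), name=proc_name)
            for arg in dummy_args:
                ET.SubElement(op, uml("Parameter"), name=arg)
        if t["parent"] in known:
            # Parent and extended type -> UML:Generalization.parent / .child
            gen = ET.SubElement(content, uml("Generalization"))
            ET.SubElement(gen, uml("Generalization.parent"), name=t["parent"])
            ET.SubElement(gen, uml("Generalization.child"), name=t["name"])
    for dt in sorted(datatypes):
        # Intrinsic type -> UML:DataType
        ET.SubElement(content, uml("DataType"), name=dt)
    return ET.tostring(root, encoding="unicode")


# Hypothetical extracted objects used only to exercise the mapping.
SAMPLE_TYPES = [
    {"name": "shape", "parent": None,
     "components": [("origin", "point")],
     "procedures": [("area", ["this"])]},
    {"name": "circle", "parent": "shape",
     "components": [("radius", "real")],
     "procedures": [("area", ["this"])]},
    {"name": "point", "parent": None,
     "components": [("x", "real"), ("y", "real")],
     "procedures": []},
]

print(to_xmi(SAMPLE_TYPES))
```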
Figure
Sample code snippet of Fortran supported by ForUML.
To visually represent the extracted information as a UML class diagram, we import the XMI document into a UML modeling tool. We decided to include a UML modeling tool directly in ForUML to prevent the user from having to install or use a second application for visualization. We chose to include ArgoUML as the UML visualization tool in the current version of ForUML. We had to modify the ArgoUML code to allow it to automatically import the XMI document. Of course, if a user would prefer to use a different modeling tool, he or she can manually import the generated XMI file into any tool that supports the XMI format.
After importing the XMI file, ArgoUML’s default view of the class diagram does not show any entities in the editing pane. Like the WYSIWYG (“what you see is what you get”) concept, the user needs to drag the target entity from a hierarchical view to the editing pane. To help with this problem, we added features so that ArgoUML will show all entities in the editing pane immediately after successfully importing the XMI document. Note that the XMI document does not specify how to present the elements graphically, so ArgoUML automatically adjusts the diagram when rendering the graphics. Each graphical tool may have its own method for generating the graphical layout of diagrams. The key reasons why we chose to integrate ArgoUML into ForUML are that (1) it has seamless integration properties as an open source and Java implementation; (2) it has sufficient documentation; and (3) it provides sufficient basic functions required by the users (e.g., export graphics, import/export XMI, zooming).
ForUML provides a Java-based user interface for executing the command. To create a UML class diagram, the user performs these steps: (1) select the Fortran source code; (2) select the location to save the output; (3) open the UML diagram.
Figures
A graphical user interface of ForUML.
Selection of the Fortran code.
Generating the XMI.
Viewing the UML class diagram.
To assess the effectiveness of ForUML, we conducted some small experiments to gather data about its accuracy in extracting UML constructs from code. This section also provides some lessons learned from the studies and feedback from the SE-HPCCSE’13 workshop audience.
The following subsections provide the details of a controlled experiment to evaluate ForUML. The accompanying website (
We evaluated the
We performed the evaluations as follows. We manually inspected the source code to document the number of relevant objects in each package. Note that we performed this step multiple times to ensure that the numbers were not biased by human error. We ran ForUML on each software package and documented the number of relevant objects included in the generated class diagram. To compute To compute We investigated whether the generated class diagrams could present the design pattern classes existing in the subject systems.
The five software packages we used in the experiments were (1) ForTrilinos ( ForTrilinos: ForTrilinos consists of an OO Fortran interface to expand the use of Trilinos (
Table
Evaluation of ForUML: recall (extracted data/actual data).
Packages | Subpackages | Type | Procedure | Component | Inheritance | Composition |
---|---|---|---|---|---|---|
ForTrilinos | Epetra | 16/16 | 304/304 | 17/17 | 12/12 | 2/2 |
ForTrilinos | Aztecoo | 1/1 | 12/12 | 1/1 | 0/0 | 0/0 |
ForTrilinos | Amesos | 1/1 | 7/7 | 1/1 | 0/0 | 0/0 |
ForTrilinos | ForTrilinos | 48/48 | 11/11 | 139/139 | 4/4 | 4/4 |
CLiiME | Model | 23/23 | 167/167 | 61/61 | 32/32 | 32/32 |
PSBLAS | Modules | 50/50 | 1309/1309 | 160/160 | 34/34 | 28/28 |
PSBLAS | prec | 20/20 | 208/208 | 28/28 | 24/24 | 12/12 |
MLDP4 | miprec | 11/11 | 0/0 | 67/67 | 0/0 | 10/10 |
MPFlows | Spray | 10/10 | 55/55 | 29/29 | 2/2 | 3/3 |
Total | | 180/180 (100%) | 2073/2073 (100%) | 503/503 (100%) | 108/108 (100%) | 91/91 (100%) |
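As a worked example of the recall figures, the sketch below recomputes the per-column totals for the four ForTrilinos rows, using recall = extracted/actual as stated in the table caption. The counts are copied from the table. The function names are hypothetical, and treating an empty category (0/0) as a recall of 1.0 is an assumption of this sketch rather than something the evaluation states.

```python
# (extracted, actual) counts copied from the ForTrilinos rows of the table.
FORTRILINOS = {
    "Epetra": {"Type": (16, 16), "Procedure": (304, 304), "Component": (17, 17),
               "Inheritance": (12, 12), "Composition": (2, 2)},
    "Aztecoo": {"Type": (1, 1), "Procedure": (12, 12), "Component": (1, 1),
                "Inheritance": (0, 0), "Composition": (0, 0)},
    "Amesos": {"Type": (1, 1), "Procedure": (7, 7), "Component": (1, 1),
               "Inheritance": (0, 0), "Composition": (0, 0)},
    "ForTrilinos": {"Type": (48, 48), "Procedure": (11, 11), "Component": (139, 139),
                    "Inheritance": (4, 4), "Composition": (4, 4)},
}


def recall(extracted, actual):
    """Recall = extracted / actual; 0/0 is treated as 1.0 (assumption)."""
    return 1.0 if actual == 0 else extracted / actual


def column_totals(rows):
    """Sum the extracted and actual counts of every column over all subpackages."""
    totals = {}
    for counts in rows.values():
        for column, (extracted, actual) in counts.items():
            e, a = totals.get(column, (0, 0))
            totals[column] = (e + extracted, a + actual)
    return totals


for column, (extracted, actual) in column_totals(FORTRILINOS).items():
    print(f"{column}: {extracted}/{actual} = {recall(extracted, actual):.0%}")
```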
Figure
The class diagram (partial): MPFlows.
The following subsections describe our experiences using ForUML on a real CSE project and discuss feedback on ForUML we received during the SE-HPCCSE’13 workshop.
ForUML played a significant role in the development of the CLiiME package [
This project also deployed three design patterns. Figure
The class diagram (partial): CLiiME.
The UML diagram must be properly arranged to foster design comprehension. A large class diagram that contains several classes and relationships requires additional effort from users as they try to assimilate all the information. Unfortunately, the built-in layout function in ArgoUML does not refine the layout of diagrams that contain numerous elements. Although ArgoUML provides the ability to zoom in or zoom out, large diagrams can still be difficult to view. Figure
Example of larger classes.
In addition to our own experiences, we can make some observations based on the discussions during the SE-HPCCSE’13 workshop regarding the use of UML in CSE applications. UML helps partition the coding workloads in large projects. For larger projects, especially libraries, the work starts with dwelling on the “use cases” and designing an interface, perhaps with UML. Then feature coding tasks can be distributed to other developers. In contrast, CSE has been reluctant to adopt object-oriented design, even though in standard mathematics linear algebra design bears some similarity to OOP in that it considers larger mathematical structures as objects. Many audience members believed that better SE practices, including the adoption of ForUML, could lead to a better adaptation of codes to multiple architectures. However, one reason for the lack of advanced SE in CSE is that CSE developers try to use UML for everything. The audience suggested that other domain specific languages (DSLs) could be useful targets for generating information from legacy code. Further, during the workshop’s discussion, there were some questions that inspired us to study the impact of ForUML on the CSE community. We believe that we can find answers to these questions by conducting human-based studies of ForUML. The questions that arose during the workshop are as follows: (1) Is UML really useful for CSE developers? (2) Can ForUML and UML support larger application sizes and multiple developers? (3) Many graphical design models serve multiple purposes. Some users want to convey a high-level design for discussion, and others want to display the low-level design. In the context of CSE software development, does UML serve all these needs well? (4) Which aspects of the CSE application should be documented in the UML?
Based on the experimental results, ForUML provided quite precise outputs. ForUML was able to automatically transform the source code into correct UML diagrams. To illustrate the contributions of ForUML, Table
A brief comparison between UML tools.
Features | Rose enterprise | Doxygen | Libthorin | ForUML + ArgoUML | Rigi |
---|---|---|---|---|---|
Visualization | UML | Graph | UML | UML | Graph |
Reverse eng. (Fortran) | No | No | Ver.90 | Yes | No |
Hide/show detail | Yes | No | Yes | No | No |
Inheritance | Yes | No | Yes | Yes | No |
Layout | A/M | A | A | A/M | A |
Note: automatically adjusted (A) and manually adjusted (M).
We believe that ForUML can be used by three types of people during the software development process, especially for CSE software. (1) Stakeholders or customers: ForUML generates documentation that describes the high-level structure of the software. This documentation should make communication between developers and the stakeholders or customers more efficient. (2) Developers: ForUML helps developers extract design diagrams from their code. Developers might need to validate whether the code under development conforms to the original design. Similarly, when developers refactor the code, they need to ensure that the refactoring does not break existing functionality or degrade the architecture. (3) Maintainers: they need a document that provides adequate design information to enable them to make good decisions. In particular, maintainers who are familiar with other OOP languages can understand a system implemented with OO Fortran.
However, ForUML has a few limitations that must be addressed in the future. (1) Provide more relationships: two other relationships that we frequently found in Fortran applications are dependency and realization. Dependency: in practice, dependency is most commonly used between elements (e.g., packages, folders) that contain other elements located in different packages. The relationship is represented by a dashed line with an arrow pointing toward a class that is an argument in a procedure that is bound to another class. Realization: it refers to the links between either an interface or an abstract class and its implementing classes. A dashed line is connected to an open triangle for a type that extends an abstract type. Note that although the current version of ForUML does not support these relationship types, users can edit the relationships in ArgoUML after importing the XMI document. (2) Incorporation of other UML visualization tools: currently, ForUML integrates ArgoUML as the CASE tool. We plan to build different interfaces to integrate with other UML tools, so users can select their tool of preference. Although many UML CASE tools support the use of XMI documents, there are several XMI versions defined by the Object Management Group (OMG), and different tools support different versions. We also plan to develop a plugin for Photran. (3) Generate UML sequence diagrams: a single diagram does not sufficiently describe the entire software system. Sequence diagrams are widely used to represent the interactive behavior of the subject system [
This paper presents and evaluates the ForUML tool, which can be used to extract UML class diagrams from Fortran code. Fortran is one of the predominant programming languages used in the CSE software domain. ForUML generates a visual representation of software implemented in OO Fortran in the same way as is done in other, more traditional OO languages. Software developers and practitioners can use ForUML to improve the program comprehension process. ForUML will help CSE developers adopt better SE approaches for the development of their software. Similarly, software engineers who are not familiar with scientific principles may be able to understand a CSE software system based only on the information in the generated UML class diagrams. Currently, ForUML can produce an XMI document that describes the UML class diagrams. The tool supports the inheritance and composition relationships, which are the most common relationships found in software systems. The tool integrates ArgoUML, an open source UML modeling tool, to allow users to view and modify the UML diagrams without installing a separate UML modeling tool.
We have run ForUML on five CSE software packages to generate class diagrams. The experimental results showed that ForUML generates highly accurate UML class diagrams from Fortran code. Based on the UML class diagrams generated by ForUML, we identified a few limitations of its capabilities. To augment the results of experiments, we have created a website that contains all of the diagrams generated by ForUML along with a video demonstrating the use of ForUML. We plan to add more diagrams to the website as we run ForUML on additional software packages. We believe that ForUML conforms to Chikofsky and Cross II [
In the future, we plan to address the limitations we have identified. We also plan to conduct human-based studies to evaluate the effectiveness and usability of ForUML by other members of the CSE software developer community. To encourage wider adoption and use of ForUML, we are investigating the possibility of releasing it as open source software. This direction can help us to get more feedback about the usability and correctness of the tool. Demonstrating that ForUML is a realistic tool for large-scale computational software will make it an even more valuable contribution to both the SE and CSE communities.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors gratefully thank Dr. Damian W. I. Rouson, at Stanford University, and Dr. Hope A. Michelsen, member of the Combustion Chemistry Department at Sandia National Laboratories, for their useful comments and helpful discussions which were extremely valuable.