A Fortran-Keras Deep Learning Bridge for Scientific Computing

Implementing artificial neural networks is commonly achieved via high-level programming languages like Python, and easy-to-use deep learning libraries like Keras. These software libraries come pre-loaded with a variety of network architectures, provide autodifferentiation, and support GPUs for fast and efficient computation. As a result, a deep learning practitioner will favor training a neural network model in Python where these tools are readily available. However, many large-scale scientific computation projects are written in Fortran, which makes them difficult to integrate with modern deep learning methods. To alleviate this problem, we introduce a software library, the Fortran-Keras Bridge (FKB). This two-way bridge connects environments where deep learning resources are plentiful, with those where they are scarce. The paper describes a number of unique features offered by FKB, such as customizable layers, loss functions, and network ensembles. The paper concludes with a case study that applies FKB to address open questions about the robustness of an experimental approach to global climate simulation, in which subgrid physics are outsourced to deep neural network emulators. In this context, FKB enables a hyperparameter search of one hundred plus candidate models of subgrid cloud and radiation physics, initially implemented in Keras, to then be transferred and used in Fortran to assess their emergent behavior, i.e. when fit imperfections are coupled to explicit planetary scale fluid dynamics. The results reveal a previously unrecognized strong relationship between offline validation error and online performance, in which the choice of optimizer proves unexpectedly critical; this in turn helps identify a new optimized NN that demonstrates a 500 fold improvement of model stability compared to previously published results for this application.


Introduction
The Fortran programming language was originally developed in the 1950s and published in 1957. It was created to help programmers implement solutions for scientific and engineering problems on the IBM 704 computer, for which programs at the time had to be written in machine or assembly language. Fortran has been regarded as revolutionary and possibly one of the most influential software products in history [18]. It has evolved many times since its creation, with each version adding new features and capabilities; the most recent release was in 2018. Fortran initially gained popularity and remains a widely used language due to its fast and efficient computational ability. Additionally, a strength of Fortran is its backwards compatibility, which allows modern compilers to build code written in the 60s and 70s.
Though not as popular as it once was, Fortran is still used in specialized fields including oceanography, computational physics, climate modeling, and aerospace. Because of Fortran's continued use, a great deal of legacy code, as well as new code, exists. Unfortunately, not all existing code bases can be rewritten in more mainstream languages, due to their size and complexity. Therefore, when algorithms and extensive libraries are created in modern languages, backwards compatible methods must be developed in order to make them available in older legacy code, like Fortran. In recent years, the rise of machine learning and deep learning has led to successful applications in a host of domains. Substantial improvements in the size of the training sets and available computing power have led to a new wave of implementations [32,45]. In turn, this success has increased the usage and dissemination of deep learning. These methods have been applied to a variety of domains, ranging from remote sensing [56,33] to computer vision [38,51,39,37,52], and to games [2,46]. The success and popularity of deep learning has inspired the creation of powerful software libraries written in several modern programming languages. However, Fortran is not among the languages that benefit from these deep learning libraries. This absence leaves Fortran programmers with few options to implement deep neural networks.
The implementation of deep neural networks in Fortran may be achieved via two primary pathways. One solution is to rewrite all existing deep learning libraries in Fortran. The second solution is to leverage existing frameworks and bridge available functionalities to Fortran. The former is extremely arduous and time consuming, considering the size and scope of existing deep learning packages and the dizzying pace of their evolution [14,1,40]. The latter approach, which this paper describes, is to allow users to leverage the power of existing frameworks while providing a bridge between paradigms where deep learning resources are plentiful and those where they are scarce. In this way, we can leverage aspects of currently available deep learning software libraries, like Keras [14], and bring them to large-scale scientific computing packages written in Fortran. To this end, we propose the Fortran-Keras Bridge (FKB), a two-way bridge that connects models in Keras with ones available in Fortran. The source code is publicly available and can be found here: https://github.com/scientific-computing/FKB. We begin by reviewing existing Fortran projects that would benefit from the integration of FKB.

Fortran Projects
FKB can be integrated with many existing large-scale and computationally intensive projects written in Fortran. These projects will benefit from the easy integration of neural network models, which FKB makes possible.
Fortran is used for a great deal of work in climate and ocean modelling. For instance, the US-produced Community Earth System Model [25], the most widely used climate model in the world, is written in object-oriented Fortran 90. So are the other climate simulation codes used by the US Department of Energy [21] and the National Oceanic and Atmospheric Administration's Geophysical Fluid Dynamics Laboratory [23]. Meanwhile, the Nucleus for European Modelling of the Ocean (NEMO) engine, which is used for studying ocean circulation problems on regional and global scales [49] and for making future predictions, is also written in Fortran. The Hybrid Coordinate Ocean Model (HYCOM) [55], also used for ocean modelling, extends traditional ocean models to allow for a smooth transition from the deep ocean to coastal regimes. Researchers have also developed models of waves and wind stress [16]. The Weather Research and Forecasting Model (WRF) is arguably the most widely used numerical weather prediction model for regional decision support [42]. Since its release in 2000, the number of WRF registrations has grown to over 36,000. WRF produces atmospheric simulations with support for special applications including air chemistry, hydrology, wildland fires, hurricanes, and regional climate, and is again a Fortran-based model.
The list goes on. Code Saturne [3], developed by Électricité de France, and NEK5000 [41], are Fortran open-source computational fluid dynamics packages. Code Saturne allows for user customization via Fortran subroutines, which is just one application domain for FKB. NEK5000 is actively used in the Center for Exascale Simulation of Advanced Reactors (CESAR) projects. Fortran has also been continually used for molecular modeling within chemistry and physics. The Chemistry at Harvard Macromolecular Mechanics (CHARMM) Development Project has produced a powerful molecular simulation program in Fortran [12]. This simulation program primarily targets biological systems but can also be used for inorganic materials. A similar tool, NWChem, has been developed by the Molecular Sciences Software Group at the Pacific Northwest National Laboratory [53]. NWChem is a computational chemistry software which includes quantum chemical and molecular dynamics functionalities. Within the molecular physics domain, Fluktuierende Kaskade (FLUKA) is a proprietary tool for calculations pertaining to particle transport and interactions with matter [17].
The aforementioned models and projects can use the FKB library to bring neural networks into their code bases. For example, neural networks have proven useful in modeling sea surface temperature cooling for typhoon forecasting [27]; the integration of FKB with tools like NEMO, HYCOM, or WRF is therefore a possibility. In a recent study of computational fluid dynamics, Ling et al. solve the Reynolds-averaged Navier-Stokes equations, similar to Code Saturne and NEK5000. The authors report that implementing deep neural networks improved prediction accuracy [35]. Finally, the FLUKA tool contains a wide range of molecular physics applications [54]. For global climate simulation, there is proof that deep neural networks can offer skillful alternatives to assumption-prone approximations of sub-grid cloud and turbulence physics in the atmosphere [44,10]. We hope that the FKB library enables Fortran users to expand their research and projects to include neural networks.
Having reviewed a number of Fortran based projects that can leverage FKB, we now introduce the two sides of this bridge. The following sections will develop the foundations on which to anchor each side of this two-way bridge. We start by introducing the deep learning anchor.

The Python Anchor (Deep Learning)
Many programming languages offer tools and libraries for implementing artificial neural networks. However, in recent years, Python has emerged as the clear favorite within this domain. Metrics in Figure 1a display Python's dominance. Python is used nearly 50% more than the second most popular language, R. Python's ubiquitous presence in machine learning makes it the obvious choice to leverage existing libraries for Fortran. The question then becomes, which available software library, within Python, is best suited to bridge with Fortran?
Of the available deep learning libraries, Keras [14] is the most popular among practitioners (Figure 1b). Keras is an Application Programming Interface (API) built on top of TensorFlow [1] that provides users the ability to implement, train, and test networks quickly. This convenience encapsulates much of the low-level complexity one must manage when implementing deep networks from scratch. Keras is able to abstract many of the complicated aspects of TensorFlow while still providing customizability and ease of use. This combination makes Keras the first choice of many for deep learning applications. As a result of its popularity and ease of use, Keras is the clear choice on which to build one end of the two-way bridge. Figure 2 depicts the positioning of the Python anchor, FKB/P, in relation to the deep learning ecosystem. The Keras API leverages Python to build deep neural networks. FKB/P resides on top of Keras, so that it can access models produced from Keras and transmit them to the Fortran anchor, FKB/F. This allows for integration with Fortran applications that wish to leverage deep neural network architectures. Having described the deep learning anchor within Python, the next section develops the foundation on which to anchor the bridge in relation to Fortran.

The Fortran Anchor (Scientific Computing)
Several attempts have been made to implement neural networks in Fortran, with some success [15,6,7,11,36]. However, many implementations resort to hacking a single-use neural network by hand, or binding code from other languages [36]. Along these lines, one may consider accessing Python functionality directly from Fortran, by running a Python instance within Fortran. While providing flexibility and ease of use, this is vulnerable to extreme deficiencies in speed and computational resources. As a result, this solution becomes untenable for large-scale computation projects like the ones described in Section 2.
There are a small number of existing neural network libraries in Fortran [36,34,15]. The most recent and well developed library is Neural Fortran [15], a lightweight neural network library written natively in Fortran. The Neural Fortran library provides the ability to implement artificial neural networks of arbitrary size with data-based parallelism. Additionally, in benchmark studies Neural Fortran was shown to have comparable compute performance with Keras while maintaining a lower memory footprint. This library offers a foundation to anchor the Fortran side of the two-way bridge, FKB/F. By extending and building on top of Neural Fortran, we are able to convert Keras models to ones readily available in Fortran and implement them in existing Fortran projects.
The positioning of FKB within the scientific computing ecosystem is shown in Figure 2. The Fortran anchor, FKB/F, is able to use models originally constructed and trained in Keras, which can then be transferred to Fortran via FKB/P. In order to use these models, the Fortran side of FKB implements a neural network library. This portion of FKB can be used within large-scale scientific computation software, like the projects identified in Section 2.
By leveraging FKB it becomes seamless to train networks in Python and transfer them to Fortran, to run inside large-scale simulations. Similarly, neural network models constructed in Fortran can be transferred to Python for additional analysis, expansion, and optimization, including hyperparameter searches using available tools in Python [24,47,5]. As both sides of the bridge have been properly introduced, the following section will describe specific features and functionalities of FKB.

Features of FKB
Once a neural network is trained in high-level APIs like Keras, the practitioner has few practical avenues for using this model in Fortran-based projects. One approach may be to hard code network operations inside Fortran while manually moving parameters from the Keras model. Several examples of this can be seen in climate modeling [44,10,20,19].
To provide one specific example, in [44], the authors trained a DNN in Keras to represent sub-grid cloud and convective energy transport processes. To assess its credibility, they needed to test the DNN's two-way interactions when thousands of replicates of it were embedded within a coarse-resolution global atmospheric model written in Fortran, i.e. neural network emulated clouds interacting with deterministic physical calculations of planetary geophysical fluid dynamics. As the global atmospheric simulator does not offer native neural network support, the authors hard coded their DNN model into the global simulation software framework. This approach has obvious disadvantages. Every minor change made to the model in Keras requires rewriting the Fortran code. If one wishes to test a suite of models in Fortran, this approach becomes untenable: each network may require different hyperparameters, which necessitates rewriting and compiling the Fortran code for every new model. This drastically limits the breadth of available models to be tested within the simulator. This is currently a major roadblock to ongoing debates in the climate simulation community more broadly, about whether or not to use DNN representations of subgrid physics in next-generation climate modeling. Insufficient testing of diverse candidate NNs means that little is known about how minor imperfections in the fit of one NN can amplify when the NN is coupled to fluid dynamics, an issue that is just beginning to be explored [9].
These issues demand a solution, in the form of a bridge between Keras and Fortran. The FKB software solves these issues via two key elements. First, it provides a neural network library implemented in Fortran (FKB/F). Second, it offers the ability to parse existing Keras models into formats consistent with the Fortran neural network library (FKB/P). As a result, users can switch, seamlessly, back and forth between Python and Fortran. This context provides a way for iterative neural network tuning (Python) and testing (Fortran), with a simple way to translate between the two software environments. Additionally, FKB offers currently unavailable Fortran specific features for neural networks. It will be useful to highlight those new features while documenting the format to which FKB adheres. The following subsections describe the features of the Python and Fortran anchors, FKB/P and FKB/F respectively.

FKB/P
Keras models, once built, trained, and saved, are stored in Hierarchical Data Format 5 (HDF5) files. These files contain the network architecture, weights, biases, and additional information (optimizers, learning rates, gradients, etc.). From the HDF5 file, FKB/P parses the network architecture, extracting the number of layers, activation functions, nodes per layer, and all weights and biases. This information is converted to match the Fortran neural network configuration in FKB/F. This allows users to build an equivalent network in Fortran which can easily be loaded and used within a Fortran environment. If any modifications to the model are made inside Fortran, FKB/P will parse this back into the equivalent HDF5 file to be used in Keras once again.
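The round trip described above can be pictured with a minimal sketch. The plain-text layout used here (a layer count, then for each layer a header line followed by its weights and biases) is a hypothetical format chosen only for illustration, not FKB's actual file format, and the dictionary is a stand-in for what would be read from an HDF5 file.

```python
# Hypothetical stand-in for the architecture and parameters read from an HDF5 file.
model = {
    "layers": [
        {"units": 2, "activation": "relu",
         "weights": [[0.5, -1.0], [0.25, 0.75], [1.0, 0.0]],  # 3 inputs x 2 units
         "biases": [0.1, -0.2]},
        {"units": 1, "activation": "linear",
         "weights": [[2.0], [-3.0]],
         "biases": [0.0]},
    ]
}


def to_text(model):
    """Serialize the model to a plain-text layout (illustrative format only)."""
    lines = [str(len(model["layers"]))]
    for layer in model["layers"]:
        n_in = len(layer["weights"])
        lines.append(f"{n_in} {layer['units']} {layer['activation']}")
        for row in layer["weights"]:
            lines.append(" ".join(repr(w) for w in row))
        lines.append(" ".join(repr(b) for b in layer["biases"]))
    return "\n".join(lines)


def from_text(text):
    """Rebuild the model dictionary from the layout written by to_text."""
    rows = text.split("\n")
    n_layers = int(rows[0])
    pos = 1
    layers = []
    for _ in range(n_layers):
        n_in, units, activation = rows[pos].split()
        n_in, units = int(n_in), int(units)
        pos += 1
        weights = [[float(w) for w in rows[pos + i].split()] for i in range(n_in)]
        pos += n_in
        biases = [float(b) for b in rows[pos].split()]
        pos += 1
        layers.append({"units": units, "activation": activation,
                       "weights": weights, "biases": biases})
    return {"layers": layers}


text = to_text(model)
restored = from_text(text)
```

Because only architecture, weights, and biases are written, training-only information such as optimizer state is deliberately left out of this sketch.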
On the other hand, networks may be initially constructed in Fortran. After initial training and testing a user can switch to Keras for further evaluation. From Keras, users can conduct additional testing or hyperparameter tuning where these tools are readily available [24].
The ability to seamlessly pass neural network architectures between Python and Fortran is essential for any practitioner working in this space. This bridge allows users to take advantage of the high-level Keras API (training on computationally efficient GPUs) and then to insert their trained model into a Fortran code base. The functionality provided bridges the chasm between Keras and Fortran.

FKB/F
The Fortran anchor of FKB leverages and extends the original Neural Fortran library. Below we introduce newly implemented features that make Neural Fortran more flexible and able to communicate on the two-way bridge.

Custom Layers
In order to implement neural networks in Fortran, FKB leverages and extends the Neural Fortran library [15]. The format of the prototype Neural Fortran library that we build on was only capable of implementing a fully connected layer. Forward and backward operations occurred outside this layer, in the network module. An example of this is shown in Listing 1. From the listing, one can observe hard coded matrix multiplication of layer weights, addition of biases, and the use of activation functions inside the network module. This network level subroutine accesses and modifies individual layer attributes. Not only is this rigid format inconsistent with modern neural network implementation paradigms [14,1,40], but it makes it impossible to implement other layers or custom operations. In order to increase the flexibility of the library, operations must be encapsulated inside the layer, consistent with current practice. In FKB we introduce an extendable layer type module (Listing 2). In order to implement a layer, one simply extends the layer type and specifies the construction of the forward and backward functions. Adhering to this format offers several advantages. By restructuring the format of the library, we offer the ability to implement arbitrary layers. Additionally, in the network module all layers are stored in an array of pointers. This leads to the encapsulated version shown in Listing 2 wherein a forward pass, in the network module, calls the layer-specific forward function. In this way, all operations are confined to the layer module, and the output from one layer may be passed as input to the next.

    function output(self, input) result(last_layer_output)
      ...
      ! iterate through layers passing activation forward
      do n = 1, size(layers)
        call layers(n) %
      end do
      ! get output from last layer
      last_layer_output = layers(size(layers)) %
    end function output

Listing 2: Forward pass in the FKB network module. Each layer simply calls its own forward function.
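The encapsulation idea can also be sketched outside Fortran. The following is a minimal Python illustration, not FKB's implementation: the class names `Layer`, `Dense`, and `Network` are assumptions made for this example. The point it shows is that the network only chains `forward` calls and never touches weights or activations itself.

```python
class Layer:
    """Base type: every layer supplies its own forward operation."""

    def forward(self, x):
        raise NotImplementedError


class Dense(Layer):
    def __init__(self, weights, biases, activation):
        self.weights = weights      # n_in rows x n_out columns
        self.biases = biases        # length n_out
        self.activation = activation

    def forward(self, x):
        n_out = len(self.biases)
        z = [sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.biases[j]
             for j in range(n_out)]
        return [self.activation(v) for v in z]


class Network:
    def __init__(self, layers):
        self.layers = layers        # any mix of Layer subtypes

    def output(self, x):
        # The network only chains forward calls; the math lives in the layers.
        for layer in self.layers:
            x = layer.forward(x)
        return x


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


net = Network([
    Dense([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0], relu),
    Dense([[1.0], [1.0]], [0.5], identity),
])
result = net.output([2.0, 1.0])
```

Adding a new kind of layer then requires only a new subtype with its own `forward`; `Network.output` stays unchanged.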
The technical operations occur within each layer. FKB supports fully connected or dense layers, dropout [48,4], and batch normalization [26]. An example of extending the layer type in order to implement a Batch Normalization layer is shown in Listing 3. This format translates to increased functionality and customizability to the user. As a result, more standard layers from Keras are available, while giving users the flexibility to implement their own custom operations.
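To make the idea of extending the layer type concrete, here is a hedged Python sketch of an inference-mode batch normalization layer. It is not a transcription of Listing 3; the class and attribute names are assumptions, and only the standard formula y = gamma * (x - mean) / sqrt(var + epsilon) + beta is taken as given.

```python
import math


class Layer:
    def forward(self, x):
        raise NotImplementedError


class BatchNorm(Layer):
    """Inference-mode batch normalization using stored moving statistics."""

    def __init__(self, gamma, beta, moving_mean, moving_var, epsilon=1e-3):
        self.gamma = gamma
        self.beta = beta
        self.moving_mean = moving_mean
        self.moving_var = moving_var
        self.epsilon = epsilon      # assumed default, for illustration

    def forward(self, x):
        # Normalize each feature with its stored statistics, then scale and shift.
        return [
            g * (v - m) / math.sqrt(s + self.epsilon) + b
            for v, g, b, m, s in zip(x, self.gamma, self.beta,
                                     self.moving_mean, self.moving_var)
        ]


bn = BatchNorm(gamma=[2.0, 1.0], beta=[0.5, 0.0],
               moving_mean=[1.0, -1.0], moving_var=[4.0, 1.0], epsilon=0.0)
normalized = bn.forward([3.0, -1.0])
```

Setting `epsilon=0.0` in the usage lines only keeps the arithmetic easy to follow by hand; a small positive value would normally be kept to avoid division by zero.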

Training in Fortran
It is necessary to distinguish between the terms offline and online for the following section. These terms serve to distinguish two different settings in which a neural network can be used in a Fortran computing package. Both settings can make use of historical or simulated data to train an artificial network. The distinguishing feature is how the predictions of a model are used. In an online setting, predictions from the model are used to evolve a physical process. The predictions at one time step affect how the system acts at the following time step. As a result, inputs to the model will change based on how the model acted in the past. In offline settings, this is not the case. Predictions made in the past do not affect the input to the model in the future.
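The distinction can be illustrated with a toy example. The scalar system and the deliberately imperfect emulator below are hypothetical, chosen only so that the feedback effect is visible: offline, the emulator always receives inputs from reference data, whereas online its own predictions become its next inputs.

```python
def true_step(x):
    return 0.9 * x          # reference dynamics (hypothetical)


def model_step(x):
    return 1.05 * x         # imperfect emulator (hypothetical)


# Reference trajectory generated by the true dynamics.
steps = 50
truth = [1.0]
for _ in range(steps):
    truth.append(true_step(truth[-1]))

# Offline: every input comes from the reference data, so errors never feed back.
offline_errors = [abs(model_step(truth[t]) - truth[t + 1]) for t in range(steps)]

# Online: each prediction becomes the next input, so errors can compound.
state = truth[0]
online_errors = []
for t in range(steps):
    state = model_step(state)
    online_errors.append(abs(state - truth[t + 1]))
```

In this toy setting the offline error stays small at every step, while the online error grows without bound, which is the kind of behavior that offline metrics alone cannot reveal.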
In many cases offline training may be sufficient to learn a model, if enough prior data is available. However, in some cases online training may be the method of choice. To this end, FKB is equipped to handle backpropagation for gradient descent optimization of a specified cost function.
The aforementioned layer encapsulation of forward and backward operations (Section 5.2.1) becomes extremely valuable in training. Instead of all operations occurring within the network module [15], they are contained in layer-specific functions. Much like the forward pass, backward operations occur in the layer. In this fashion, each layer is responsible for computing its own gradients with respect to its parameters and returning the gradient with respect to the layer below it.
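A minimal sketch of this division of labor is given below, in Python rather than Fortran and with assumed names. Each `Dense` layer caches its input during `forward`; during `backward` it first computes the gradient to hand to the layer below, then updates its own parameters with plain stochastic gradient descent. The two-layer linear network and the squared-error objective are illustrative choices only.

```python
class Dense:
    """Linear dense layer that owns both its forward and backward operations."""

    def __init__(self, weights, biases):
        self.weights = weights      # n_in rows x n_out columns
        self.biases = biases

    def forward(self, x):
        self.x = x                  # cache the input for the backward pass
        return [sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.biases[j]
                for j in range(len(self.biases))]

    def backward(self, grad_out, learning_rate):
        # Gradient handed to the layer below (with respect to this layer's input),
        # computed before the weights are modified.
        grad_in = [sum(self.weights[i][j] * grad_out[j] for j in range(len(grad_out)))
                   for i in range(len(self.x))]
        # Gradients with respect to this layer's own parameters, applied as an SGD step.
        for i in range(len(self.x)):
            for j in range(len(grad_out)):
                self.weights[i][j] -= learning_rate * self.x[i] * grad_out[j]
        for j in range(len(grad_out)):
            self.biases[j] -= learning_rate * grad_out[j]
        return grad_in


layers = [Dense([[0.5], [-0.5]], [0.0]), Dense([[1.0]], [0.0])]
x, target = [1.0, 2.0], 1.0

losses = []
for _ in range(20):
    out = x
    for layer in layers:
        out = layer.forward(out)
    losses.append((out[0] - target) ** 2)
    grad = [2.0 * (out[0] - target)]        # derivative of the squared error
    for layer in reversed(layers):
        grad = layer.backward(grad, learning_rate=0.05)
```

The training loop never inspects a weight: it passes activations forward and gradients backward, which is what allows new layer types to participate in training without changes to the network code.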
Online training can serve a variety of purposes. First, a neural network model may be learned entirely in Fortran, for instance based on the evolving state variables during the integration of a physical dynamical system simulation, and then transferred to Keras after the fact. In this setting the ground truth, from the simulator, is passed to the network for it to calculate its errors and update its parameters accordingly through backpropagation. Second, online training could serve to provide gentle corrections to an imperfect pretrained model, for instance to hedge against the amplifications of its imperfections that are only revealed once the NN is coupled to other physical calculations. Here a model is trained offline in Keras and transferred to Fortran (Section 5.1). In some cases, for a variety of reasons, the offline training data may have a differing distribution than that of the online data. In such a setting, it proves beneficial to offer slight corrections to the network. Finally, a secondary model may be constructed to learn and compensate for the deficiencies in the primary model. In this way the two networks work together to balance out any instability issues.
The ease of use and proper format directly results from the encapsulation of layer operations. Online training offers a solution to tackle a suite of potential problems. As a result, models may be updated with slight corrections or learned entirely online.

Custom Loss Functions
In many applications practitioners may wish to optimize a unique quantity, that is, a function other than a mean squared error or cross entropy. This is common when target variables interact or additional information is known about their relationship in a desired application. For example, in modeling any physical system, predictions from a neural network must not violate physical constraints: energy cannot be created or destroyed in the system. To satisfy this restriction a loss function can be written so as to quantify the amount of violation of physical properties. This construction can then be minimized to alleviate constraint infractions [8].
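As an illustration of such a construction, the sketch below adds a penalty term to a mean squared error. The constraint used here (the predicted values should sum to zero) is a hypothetical stand-in for a conservation law, and the function names and the `penalty_weight` parameter are assumptions for this example, not part of FKB or of [8].

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def conservation_violation(pred):
    # Hypothetical constraint: the predicted tendencies should sum to zero.
    return sum(pred) ** 2


def constrained_loss(pred, target, penalty_weight=1.0):
    # Data misfit plus a weighted measure of how badly the constraint is broken.
    return mse(pred, target) + penalty_weight * conservation_violation(pred)


target = [1.0, -1.0]
conserving = [0.5, -0.5]        # sums to zero: no penalty
violating = [1.5, -0.5]         # same squared error, but sums to 1.0

loss_conserving = constrained_loss(conserving, target)
loss_violating = constrained_loss(violating, target)
```

The two predictions have identical mean squared error, so only the penalty term separates them; minimizing the combined loss therefore favors the prediction that respects the constraint.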
The implementation of custom loss functions is common enough that high-level APIs like Keras, TensorFlow, and PyTorch provide this ability in their code base [14,1,40]. As FKB is designed for those working in the physical sciences where environmental, physical, or application-specific constraints are common, it provides the ability to implement custom loss functions. In order to take advantage of this functionality, users must implement their desired loss function, just as they would in Keras. As FKB does not provide automatic differentiation, the derivatives with respect to the input are also required for training. Once these functions have been specified they can be dropped into the existing framework and run normally, much like Keras. This capability is demonstrated through the implementation of the crossentropy loss function in Listing 4. In order to implement this previously unavailable loss function, we first declare two functions. First, the crossentropy scalar loss is computed. Second, the derivative of the loss with respect to the input logits is derived. These two functions are then referenced as the loss and d loss, respectively. By providing this functionality users may leverage a variety of loss functions which can be used to minimize application-specific quantities. Once described, they may be included with the existing framework and used during online training.
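The pairing of a loss with its hand-written derivative can be sketched as follows. This is a Python illustration rather than a reproduction of Listing 4; the function names are assumptions. It uses the standard result that, for a softmax followed by cross entropy with labels that sum to one, the derivative with respect to logit z_k is softmax(z)_k - y_k, and it checks that result against finite differences.

```python
import math


def softmax(logits):
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def crossentropy_loss(logits, one_hot):
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def d_crossentropy_loss(logits, one_hot):
    # For softmax followed by cross entropy, dL/dz_k = softmax(z)_k - y_k.
    return [p - t for p, t in zip(softmax(logits), one_hot)]


logits, labels = [2.0, 0.5, -1.0], [0.0, 1.0, 0.0]
analytic = d_crossentropy_loss(logits, labels)

# Central finite differences as an independent check of the hand-written derivative.
h = 1e-6
numeric = []
for k in range(len(logits)):
    up = logits[:k] + [logits[k] + h] + logits[k + 1:]
    down = logits[:k] + [logits[k] - h] + logits[k + 1:]
    numeric.append((crossentropy_loss(up, labels) - crossentropy_loss(down, labels)) / (2 * h))
```

A finite-difference comparison of this kind is a cheap safeguard whenever derivatives are supplied by hand instead of by automatic differentiation.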

Ensembles
Ensembles consist of different models, each trained on the same, or bootstrapped, data. The output of the ensemble is an average of all its members' predictions. In machine learning, an ensemble of models typically performs better than any one of its members alone. The ensemble strategy exploits the fact that each model will make different errors. Therefore, when averaged together these predictions become more accurate, as certain errors get smoothed out. A general consensus among machine learning practitioners is that ensembling gives a 1-2% improvement in performance [13].
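The smoothing effect can be made explicit under idealized assumptions that are ours, not the paper's: suppose each of the $M$ members predicts $f_m = y + \varepsilon_m$, where the errors have zero mean, common variance $\sigma^2$, and are mutually uncorrelated. The expected squared error of the ensemble average is then reduced by a factor of $M$.

```latex
\bar{f} = \frac{1}{M}\sum_{m=1}^{M} f_m
        = y + \frac{1}{M}\sum_{m=1}^{M} \varepsilon_m ,
\qquad
\mathbb{E}\!\left[(\bar{f} - y)^2\right]
  = \frac{1}{M^2}\sum_{m=1}^{M} \mathbb{E}\!\left[\varepsilon_m^2\right]
  = \frac{\sigma^2}{M}.
```

In practice the members' errors are correlated, so the cross terms $\mathbb{E}[\varepsilon_m \varepsilon_n]$ do not vanish and the gain is smaller than this bound suggests.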
As a result of this averaging, ensembles provide a boost in performance as well as additional robustness. In domains where physical constraint violations yield stability issues, ensembles may be applied to dampen these problems. By averaging across many networks, the instability of any one model will be drastically reduced in the presence of more sound predictions.
The functionality provided requires the user to specify a directory that contains the models of interest and a desired amount of noise. The ensemble type will read in each model and construct a network corresponding to each of them. To get a prediction from the ensemble, an input vector is passed to it. For non-zero amounts of noise, Gaussian noise is applied to the input vector each time it is passed to an ensemble member. This allows each member to see a slightly different variant of the input, increasing the robustness of prediction around that point. This operation runs in parallel using OpenMP, where each network can be given its own thread to expedite computation; such an approach could easily be adapted via OpenACC for GPU-based threading of large ensemble network calculations. Following the computation, the predictions are averaged together and the final output is given.
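The sequence of operations (perturb the input per member, evaluate members concurrently, average) is sketched below in Python. This is an illustration of the structure only: a thread pool from the standard library is swapped in for OpenMP, the members are stand-in linear functions rather than loaded networks, and all names are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_member(slope, intercept):
    """Stand-in for a loaded network: a simple scalar-output linear model."""
    def predict(x):
        return slope * sum(x) + intercept
    return predict


def ensemble_predict(members, x, noise_std=0.0, seed=0):
    rng = random.Random(seed)
    # Draw one perturbed copy of the input per member before dispatching work,
    # so the random draws do not depend on thread scheduling.
    inputs = [
        [v + rng.gauss(0.0, noise_std) for v in x] if noise_std > 0.0 else list(x)
        for _ in members
    ]
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        predictions = list(pool.map(lambda pair: pair[0](pair[1]), zip(members, inputs)))
    # Final output: the mean of all member predictions.
    return sum(predictions) / len(predictions)


members = [make_member(1.0, 0.0), make_member(1.1, -0.2), make_member(0.9, 0.2)]
clean = ensemble_predict(members, [1.0, 2.0])
noisy = ensemble_predict(members, [1.0, 2.0], noise_std=0.01)
```

Generating the noisy inputs up front keeps the result reproducible for a given seed, a design choice that is convenient for testing but not something the source specifies.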

Case Study
The following section provides a case study demonstrating an application of FKB to experimental next-generation climate modeling. The Superparameterized Community Atmospheric Model version 3.0 (SP-CAM3) is used for all simulations in this study. Superparameterization is an approach that confronts the decades-long problem of representing subgrid cloud physics in climate models by embedding thousands of limited-domain explicit sub-models of moist convection within a conventional planetary-scale model of the large-scale atmosphere [22,31,30,50]. This approach tends to involve two orders of magnitude more computational intensity per unit area of the simulated earth, but recently Rasp et al. used a deep neural network to emulate all of the expensive subgrid cloud resolving models' (CRM) influence on the planetary host at drastically reduced computational expense [44]. This study, along with others in the emerging climate modeling literature [10], has demonstrated the potential advantages of a data-driven approach for addressing the important unresolved effects of clouds and convection on planetary climate, as compared to previous, heuristic-based approximations to subgrid physics. However, the idea of emulating turbulence in climate simulation is still an emerging one with unclear trade-offs, including frequent instabilities when NN emulators are coupled with fluid dynamics, which the community is seeking to learn how to control [10]. It has even been questioned whether offline skill of such emulators during their training is actually predictive of their online performance [43], an important open question.
These questions are understudied largely due to the lack of the simple software interface that FKB now enables for climate scientists to test diverse candidate neural networks, and ensembles thereof, within planetary climate models.
To illustrate an advance on this front we now apply FKB to shed new light on two related questions currently in debate:
1. Is the offline skill of a neural network emulator predictive of its online performance?
2. Which neural network hyperparameters most affect online performance?
Using FKB, the study can be broken into two stages. First, a suite of over a hundred candidate neural network models of convection are trained, via Keras, on simulated data from the SPCAM3. Second, the models are converted to Fortran and run online (i.e. coupled to planetary fluid dynamics) in the SPCAM3 simulator, where a preliminary metric of performance is evaluated by the number of steps until catastrophic failure. It is clear that in the absence of the FKB library, running one hundred or more candidate neural network submodels of convection within the Fortran based model of the rest of the planet's atmosphere would be nearly impossible, because each network contains various hyperparameters, each with different weights and biases learned during training. In order to leverage the FKB library with SPCAM3, we simply compile the neural network library in advance and link it to the compilation of SPCAM3. Documentation steps for the implementation of this case study are provided here: https://github.com/scientific-computing/FKB/SPCAM_Instructions.md.
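The online metric, the number of steps until catastrophic failure, can be sketched generically. The failure criterion below (a non-finite value or a magnitude above a fixed limit) and the two toy step functions are assumptions for illustration; the source does not specify how failure is detected in SPCAM3.

```python
import math


def steps_until_failure(step_fn, state, max_steps, limit=1e6):
    """Count completed steps before the state becomes non-finite or exceeds `limit`."""
    for step in range(max_steps):
        state = step_fn(state)
        if any(not math.isfinite(v) or abs(v) > limit for v in state):
            return step
    return max_steps


def stable(state):
    return [0.5 * v for v in state]      # decays: never fails


def unstable(state):
    return [2.0 * v for v in state]      # doubles every step: eventually blows up


stable_steps = steps_until_failure(stable, [1.0], max_steps=100)
unstable_steps = steps_until_failure(unstable, [1.0], max_steps=100)
```

Runs that survive the full integration are assigned `max_steps`, so the metric is capped at the length of the test.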
The input to this neural network model is a 94-dimensional vector. Features include vertically resolved vectors representing the large-scale (host model) temperature, humidity, and meridional wind vertical structure, as well as surface pressure, incoming solar radiation, sensible heat flux, and latent heat flux scalars. The output of the network is a 65-dimensional vector composed of the embedded models' influence on their host, i.e. the sum of the CRM and radiative heating rates, the CRM moistening rate, the net radiative fluxes at the top of the atmosphere and surface of the earth, and the precipitation. The training data come from an enhanced version of the CRM training data that was applied successfully in [44], in which each embedded CRM is quadrupled in horizontal extent (from 8 km to 32 km) to improve its physical realism; machine learning in this limit of data quality has never been coerced to produce successful online results beyond a few simulated weeks (see discussion of "NN-unstable" by [10] for details).
Our working hypothesis is that historical instabilities in this limit of higher quality CRM training data simply reflect a broader issue of insufficient hyperparameter tuning in climate model applications.
To address this, we conducted neural network optimization via a random search using SHERPA [24], a Python library for hyperparameter tuning. We detail the hyperparameters of interest in Table 1, as well as the range of available options during the search. The hyperparameters of interest consisted of whether or not to use batch normalization, the amount of dropout, the leaky ReLU coefficient, learning rate, nodes per layer, the number of layers, and the optimizer. The random search algorithm has the advantage of making no assumptions about the structure of the hyperparameter search problem and is ideal to explore a variety of hyperparameter settings.
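The essence of a random search over these hyperparameters can be sketched with the standard library alone. This does not use SHERPA's API, and the ranges and optimizer names below are hypothetical placeholders; the actual search space is the one given in Table 1.

```python
import random

# Hypothetical ranges; the actual search space is the one given in Table 1.
SEARCH_SPACE = {
    "batch_norm": lambda rng: rng.choice([True, False]),
    "dropout": lambda rng: rng.uniform(0.0, 0.25),
    "leaky_relu": lambda rng: rng.uniform(0.0, 0.4),
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, -2),   # log-uniform
    "nodes_per_layer": lambda rng: rng.randint(200, 300),
    "num_layers": lambda rng: rng.randint(4, 11),
    "optimizer": lambda rng: rng.choice(["adam", "rmsprop", "sgd"]),
}


def sample_configuration(rng):
    """Draw every hyperparameter independently, as a random search does."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


rng = random.Random(42)
trials = [sample_configuration(rng) for _ in range(100)]
```

Each entry of `trials` would then define one candidate network to train; because the draws are independent, no structure is assumed about how the hyperparameters interact.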
More than one hundred candidate neural network model configurations were thus obtained, each trained for 25 epochs with early stopping monitoring the validation loss. Following the offline training stage, the neural network models were converted into their Fortran counterparts and run inside SPCAM3. We underscore that this critical step would have been prohibitive using standard tools, which have required manual translation of each candidate model. By leveraging the FKB library, each model was instead loaded independently into Fortran and run as the subgrid physics emulator inside SPCAM3's host planetary model of the large-scale atmospheric state, i.e. coupled to fluid dynamics, to run a wide ensemble of prognostic tests across an unprecedented diversity of candidate neural network architectures. Each of the over one hundred candidate neural network models, with their various numbers of layers, layer-specific settings (batch normalization, ReLU magnitude, etc.), nodes per layer, weights, and biases, was run online, all without rewriting any Fortran code.
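The stopping rule used during training can be illustrated with a framework-free sketch. It takes a sequence of per-epoch validation losses and stops either at the epoch limit or once the loss has failed to improve for a given number of epochs. The 25-epoch limit comes from the text; the `patience` value is an assumption, since the text does not state it. In Keras, the same behavior is typically obtained with the `EarlyStopping` callback monitoring `val_loss`.

```python
def run_with_early_stopping(val_losses, max_epochs=25, patience=3):
    """Apply an early-stopping rule to a sequence of validation losses.

    Returns (best_epoch, best_loss, epochs_run), with epochs counted from 0.
    patience=3 is an assumed value, not one reported in the study.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epochs:
            break  # hard limit on the number of training epochs
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run


# Made-up losses for illustration: the minimum is reached at epoch 2.
example = run_with_early_stopping([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6])
```

In this made-up example the run stops after six epochs, before the later improvement to 0.6 is seen, which is the usual trade-off of a small patience value.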
In order to address the first question and evaluate a neural network model's performance, we compare its validation MSE during training with the time-to-failure of the online tests, in which thousands of instances of the NN are coupled interactively to their host global atmospheric physical model. This yields Figure 3a, which sheds new light on the offline vs. online relationship. Results in this figure demonstrate a clear negative relationship between offline MSE and online stability (Spearman correlation of −0.73; p = 4.961e−19). Importantly, this contradicts the recent speculation by [43] that such a relationship might not exist, or might be easily obscured. Clearly, sufficient hyperparameter tuning is critical to solving problems of chronic instability in climate model applications of DNNs for subgrid physics. The second question is which of the hyperparameters most strongly affect online performance. To assess this, Figure 3b-i decomposes the sensitivity of the baseline relationship to individual hyperparameter choices. The choice of optimizer is shown to correlate most strongly with online performance (Figure 3i). This finding is confirmed by the Spearman correlation values shown in Table 2: the optimizer hyperparameter has the largest absolute correlation with online performance. No other hyperparameter shows the clear distinction in correlation that is evident in the choice of optimizer, including the network depth and total number of parameters, which are known to be important to offline fits for this problem [gentine2018], but which are surprisingly not as predictive of coupled skill as the choice of optimizer, whose impact has not previously been isolated (for this application).
Table 2: Spearman correlation of each hyperparameter with online performance, and associated p-value.
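For reference, the Spearman correlation is the Pearson correlation computed on ranks, with tied values receiving their average rank. The sketch below implements it in plain Python. The sample values are made up for illustration and are not results from this study; in practice a library routine such as `scipy.stats.spearmanr` would also provide the p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values (not data from the study).
offline_mse = [0.10, 0.12, 0.15, 0.20, 0.30]
steps_to_failure = [50000, 20000, 30000, 900, 100]
rho = spearman(offline_mse, steps_to_failure)
```

A negative `rho`, as in this toy example, means that models with lower offline error tend to run for more steps before failing.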
Further investigation into the specific optimizer used reveals that the SGD optimizer performs poorly; NNs fit with SGD never run longer than 1,000 steps when coupled online (Figure 3i). Again the visual intuition from
Finally, after answering the two questions motivating this case study, we are able to compare the best-performing model with the previously published model of [44] when applied to the challenging limit of CRMs with 32-km horizontal extent. The model proposed by Rasp et al. was a single deep neural network. The hyperparameter space of this model was not fully explored online, in large part due to the laborious process required to transfer those models into Fortran. The Rasp et al. model (provided by the authors) ran for 128 steps before crashing due to instability issues. The best model achieved in this study ran for more than 50,000 steps. This 500-fold improvement is a direct result of the ease with which a wide variety of models (identified by SHERPA) can be transferred between Python and Fortran (thanks to FKB). We also note that this method is preferable to another approach that was recently proposed to begin stabilizing the same model, through small-amplitude Gaussian input perturbation [9], a strategy that, while promising, adds computational expense and introduces out-of-sample extrapolation issues that can be avoided with the brute-force optimization and wide-ensemble prognostic testing path to stabilization we have outlined here.
This case study has investigated two closely entangled questions: 1) Does offline performance correspond to online model performance? 2) Which neural network hyperparameters most affect online performance? Both of these questions were answered by leveraging the FKB library. The library offers the ability to expeditiously transfer models trained in Keras to Fortran, where they may be run online in existing simulators. In the absence of FKB, neither of these questions could be approached without unreasonable human intervention.

Conclusion
The ubiquity of deep learning has resulted from extensive free and open source libraries [14,1,40]. Deep learning's success and popularity merit its integration in large-scale computing packages, like those written in Fortran. Instead of rewriting all existing libraries in Fortran, we introduced a two-way bridge between low-level Fortran and Python through the FKB library. The library gives researchers the ability to implement neural networks in Fortran code bases while being able to transfer them back and forth with Keras. Fortran, which has been a staple within computationally intensive fields for decades, will no doubt see continued use due to its fast computational ability and vast amounts of legacy code. The FKB library enables users to access many features of the Keras API directly in Fortran, including the ability to create custom layers and loss functions to suit their needs. We demonstrate the integrability of FKB through our case study involving the SPCAM3 simulator. An advantage of FKB is its ease of use: it can be compiled in advance and, once linked, easily leveraged in existing large-scale simulators, as we have illustrated for the application of multi-scale physical simulations of the global atmosphere.