A Simple Method for Causal Analysis of Return on IT Investment

This paper proposes a method for examining the causal relationship between investment in information technology (IT) and the organization’s productivity. In this method, first a strong relationship among (1) investment in IT, (2) use of IT and (3) the organization’s productivity is verified using correlations. Second, the assumption that IT investment preceded improved productivity is tested using partial correlation. Finally, the assumption of what may have happened in the absence of IT investment, the so-called counterfactual, is tested through forecasting productivity at different levels of investment. The paper applies the proposed method to investment in the Veterans Health Information Systems and Technology Architecture (VistA) system. Results show that the causal analysis can be done, even with limited data. Furthermore, because the procedure relies on the organization’s overall productivity, it might be more objective than when the analyst picks and chooses which costs and benefits should be included.


INTRODUCTION
Within one organization and for a single investment, the typical approach to evaluation of return on investment is to use financial ratios such as Return On Investment (ROI). This ratio divides the current value of future returns by the current value of future investments. The return on investment procedure is fraught with difficulties that could lead to erroneous results. The key problem is that managers and vendors selectively use possible costs and benefits in ways that distort the findings. For example, in analysis of returns on investment for an electronic health record, one might assume savings due to reduction in use of paper, savings from hiring fewer medical-record clerks and savings due to smaller need for storage space [1,2,3,4]. The costs and benefits included depend on who is asked [5], and often important costs and benefits (e.g., impact on productivity and mortality [6]) are ignored or based on less precise self reports [7]. Selective inclusion leads to contradictory situations, where some costs, e.g., cost of training, are included and other related costs, e.g., cost of employees sitting in training sessions, are ignored. The resulting ROI ratio is a rosy forecast of what might happen [8,9]. One study may forecast the breakeven point for the investment in 3 years and another study, based on what costs and returns are included, may forecast the breakeven point a decade later [10]. It is not surprising that many of the claimed returns on information technology (IT) projects fail to materialize and many projects have cost overruns. In this paper, we provide a method of calculating return on investment that examines the impact of IT projects based on productivity of the organization and, as such, does not selectively choose a subset of likely savings. This approach sidesteps the need to pick specific costs or benefits, as the total organization's revenue reflects many tangible and intangible costs and benefits. The portion of the organization's revenue per patient that can be allocated to the investment in IT can then be calculated and reported.
This paper demonstrates how a firm's overall productivity can be used to analyze return on investment. The idea of examining the value of IT by examining its impact on the organization's total productivity is not new, and in fact most economic studies do so [11-12, 13, 14, 15, 16, 17, 18]. These studies examine the impact of IT across firms and not within a firm. The current paper is one of the few papers that examine productivity within one firm. When the analysis is restricted to one firm, since productivity and cost of IT are observational data, it is difficult to infer whether one causes the other. Until recently, it was not possible to decipher a causal relationship between two streams of observational data. This situation has changed. It is now possible to analyze observational data and make causal inferences from these data [19,20]. We propose to use these new tools to test assumptions behind claims that investment in IT has led to improved productivity within one firm.

METHODS
We examine the causal impact of IT investment on productivity within one firm assuming that the quality of services remains constant. We do so in four steps.
Step 1: Data Collection
Data are collected over at least three time periods (months, quarters or years). On one hand, the more data points the stronger the possible conclusions. Statisticians typically insist on having large data sets. On the other hand, the more distant time periods may be less relevant to today's information systems. Statisticians select the number of data points based on power of the test. Financial analysts typically rely on ratios derived from the lifecycle of the information system [21].
If an organization is implementing a new system, it has no data to rely upon. Analysts for these organizations should rely on the experience of peer organizations that have implemented the same system. Many vendors provide contact information for non-competing organizations that have implemented their systems; these organizations are excellent sources of the needed data. In addition, HIMSS Analytics (http://www.himssanalytics.org/) has surveyed hospitals across a geographic area to report their success in implementing information systems. These data can be examined to report the impact of systems [22].
In order to work out the causal relationship between investment in IT and the organization's productivity, we need to introduce a third variable that describes the mechanism by which the improved productivity occurs. This is typically a variable that measures use of IT. The use of IT can be measured through a number of surrogate measures, such as the employee's time on the machine, the time on the network, the number of calls to specific software, or the number of records in a database. These are likely to be highly correlated variables and the final choice of one among them is left to the analyst. The purpose of introducing the third variable is to identify a "catalyst" (in the statistical literature, a mediating variable). Proper causal inferences require experiments in which the cause is introduced and withdrawn, and the effect is observed. In the absence of such an experiment, the catalyst variable should be selected in such a manner that a natural experiment is seen in the data. We prefer "use of IT" as our third variable because a natural experiment emerges where the effect of IT is not seen unless it is used.
Sometimes, longitudinal data on the impact of IT on organizational outcomes are not available. In these circumstances, the analysis can be performed by using data from the experience of a cross-section of various functional units within the organization. In this case, at least three functional units that are at different stages of implementing the technology should be examined. One unit should be included that does not use the IT systems at all. Another unit should be included that is heavily using IT systems. A third unit somewhere in between these functional units should also be included. The variations in the cost of operating the technology, the use of the technology, and the productivity of the unit can be used to examine the relationship among these three variables. If the functional units are chosen appropriately, then the cross-sectional data can be used to verify what would have happened to productivity when a unit did not receive similar investment in IT (the so-called counterfactual assumption, to be discussed later in Step 3). Moreover, unlike longitudinal data, these units can be used to control for possible external factors that might distort the causal relationships.
Step 2: Examination of Association Among Investment, Use, and Productivity
After gathering the appropriate data, the associations among IT costs, IT use, and organizational productivity should be determined. These relationships are demonstrated visually by a scatter plot and by the calculation of the correlation coefficients. Three correlations are needed:
• Correlation between the investment in IT and use of IT, $\hat{r}_{i,u}$
• Correlation between the use of IT and the organization's productivity, $\hat{r}_{u,p}$
• Correlation between investment in IT and the organization's productivity, $\hat{r}_{i,p}$
A strong correlation among cost of IT, use of IT and productivity is necessary but not sufficient for establishing a causal link between IT and profitability. In examining the impact of IT over a long period of time, external factors (general economy, change in management) may affect an organization's productivity and therefore distort the relationship between use of information systems and increased productivity. In these circumstances, it is important to explicitly model the changes or to shift from a longitudinal to a cross-functional unit analysis, where the impact of historical events is the same across all functional units and therefore the estimated correlations are independent of the recent events.
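As a minimal sketch of the correlation step, the three pairwise correlations can be computed with the Pearson formula. The series below are invented for illustration only (they are not the data analyzed in this paper), and the function and variable names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, one value per time period (not from the paper).
investment = [10.0, 12.0, 15.0, 19.0, 24.0]    # IT budget
use = [100.0, 130.0, 150.0, 210.0, 260.0]      # e.g. records in the system
productivity = [1.0, 1.4, 1.5, 2.2, 2.4]       # productivity index

r_iu = pearson(investment, use)            # investment vs. use
r_up = pearson(use, productivity)          # use vs. productivity
r_ip = pearson(investment, productivity)   # investment vs. productivity
print(f"r_iu={r_iu:.2f}  r_up={r_up:.2f}  r_ip={r_ip:.2f}")
```

With so few periods, any such coefficient is only a rough indication of association; a scatter plot of each pair remains a useful companion check.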
Step 3: Verification of Causality
The following criteria distinguish a causal relationship from a mere association between two variables [23]:
1. Cause and effect must be associated.

2. There should be a clear mechanism from a cause to an effect through the identification of a catalyst. In our case, we take advantage of available data on use of IT to describe the mechanism by which IT leads to increased productivity.

3. Cause must precede effect. The sequence of the cause and effect can be established through the examination of conditional independence, also known as a vanishing partial correlation, among any of the three variables. In any serial arrangement of a triplet of nodes, the beginning and end node are independent given the middle node. This principle is demonstrated by listing each permutation as it relates to the three variables: IT investment (cost), IT use, and organization's productivity. Then, data and logic are applied to each permutation to determine if the sequence of events is feasible. The six permutations are the following:
   1. IT investment leads to IT use, which leads to productivity gains.
   2. IT investment leads to productivity gains, which lead to use.
   3. Productivity gains lead to IT investment, which leads to use of IT.
   4. Productivity gains lead to use of IT, which leads to IT investment.
   5. IT use leads to IT investment, which leads to productivity gain.
   6. IT use leads to productivity gain, which leads to IT investment.
Among these permutations, there are specific situations that do not make logical sense. For example, one would expect that IT investment precedes its use, because one cannot use a technology that is not yet available. Therefore, permutations where use precedes investment, namely permutations "4," "5" and "6," are not included in the analysis. Permutation "2" is also excluded from the analysis because it is not logical to expect that productivity will be the direct cause of IT use. The only viable ways in which these three variables might have an effect on each other are through permutations "1" and "3." Both of these permutations will be evaluated. Depending on which permutation has more support from the data, one of them will be selected to be used in the ROI calculation.
If the effect of IT investment on productivity gains is through IT use, then the partial correlation between productivity and IT investment for given values of IT use should be zero. In other words, if we know the level of IT use, we should be able to predict the productivity gain independent of any information about IT cost. Knowing the extent of use of IT should be sufficient in understanding the impact of IT investment on productivity. If this were the case, then the partial correlation should be zero:

$$\hat{r}_{i,p \mid u} = \frac{\hat{r}_{i,p} - \hat{r}_{i,u}\,\hat{r}_{u,p}}{\sqrt{\left(1 - \hat{r}_{i,u}^{2}\right)\left(1 - \hat{r}_{u,p}^{2}\right)}} = 0 \qquad (1)$$

Mathematically, this is the case if:

$$\hat{r}_{i,p} = \hat{r}_{i,u}\,\hat{r}_{u,p} \qquad (2)$$

If the reverse situation is true, meaning that productivity gains have financed expansion of IT, which has then affected the use of IT, then the following formula should be used:

$$\hat{r}_{u,p} = \hat{r}_{i,u}\,\hat{r}_{i,p} \qquad (3)$$

If the data support one of these two equations, then they support the causal assumption of independence behind the equation. We could use Fisher's z-transform of the partial correlation to test the hypothesis that the partial correlation is significantly different from zero [24,25]. If the partial correlation is significantly different from zero, then we reject the assumption of independence. Once this assumption is violated, the associated causal model is no longer supported by the data.

4. Counterfactual. This assumption says that if it were not for the investment in IT, there would not have been an increased use of IT and, subsequently, there would not have been an improved productivity. The easiest test is an experiment in which IT is removed and the impact on revenues is observed. Such experiments can be done but organizations find these experiments disruptive. Therefore, it is important to assess what would have happened to productivity if it were not for recent IT investment. Such scenarios can be examined by comparing productivity from the time period when IT investment was low to the time period when IT investment was high. The difference in observed productivity is then a test of the counterfactual assumption. An easy method of accomplishing this test is to divide IT investment into two equal-frequency sections of low and high investment. Likewise, productivity is categorized into two equal-frequency levels of low and high productivity. In the model where investment is assumed to lead to productivity gains, we can test the counterfactual assumption by comparing the conditional probability of high productivity gains given high investment in IT to the conditional probability of high productivity gains given low investment in IT.
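The vanishing-partial-correlation check in criterion 3 can be sketched as follows, using the standard first-order partial correlation formula. The correlation values are hypothetical and were chosen so that the chain investment, use, productivity holds exactly; the function names are illustrative and do not come from the paper.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z fixed."""
    denom = sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

def chain_partials(r_iu, r_up, r_ip):
    """Partial correlation that should vanish under each candidate ordering."""
    return {
        # Permutation 1: investment and productivity independent given use.
        "investment -> use -> productivity": partial_corr(r_ip, r_iu, r_up),
        # Permutation 3: productivity and use independent given investment.
        "productivity -> investment -> use": partial_corr(r_up, r_iu, r_ip),
    }

# Hypothetical pairwise correlations with r_ip = r_iu * r_up.
result = chain_partials(r_iu=0.8, r_up=0.5, r_ip=0.4)
for ordering, value in result.items():
    print(f"{ordering}: partial correlation = {value:.3f}")
```

In this constructed case the first partial correlation is zero and the second is not, so the data would favor permutation 1. With real data neither value is exactly zero, which is why a significance test such as Fisher's z-transform is needed.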

Step 4: Calculation of Return on Investment (ROI)
If a causal relationship between IT investment and the organization's productivity has been determined, then the ROI can be calculated.But when a causal relationship is not found, ROI calculations will be misleading.

RESULTS: IMPACT OF VETERANS ADMINISTRATION'S OFFICE OF INFORMATION
The proposed method was applied to evaluate the impact of the Veterans Administration (VA) electronic health record: VistA. It has been reported that during VistA's growth period, the quality of care improved and the cost of health services was reduced [26,27,28]. However, these reports did not examine the causal relationship between the implementation of VistA and the resulting improvements in outcomes. Table 1 provides the data from 1998 through 2004. The data report the budget of the Office of Information. For the years in which the Office of Information did not exist, data were estimated by combining the budgets of units that eventually were absorbed into the Office of Information. Our method of synthetically combining budgets does not reflect cost savings that resulted from combining the different pieces of the organization that ultimately became the Office of Information. We also assume that this office had its major impact on productivity of the Veterans Administration through its development of VistA and not through the many other activities and responsibilities of the office. As a measure of use of VistA, Table 1 exhibits the number of patient records within VistA. The question is whether the money spent on the Office of Information has paid off in better productivity for the entire system. The claim is that the mechanism through which improved productivity occurred was increased use of VistA. The scatter plot in Figure 1 indicates the relationship between the budget of the IT office and the Cost-Per-Patient (CPP) Index, a measure of productivity calculated as the percent change from the previous year in cost per patient. This plot shows a strong association between these two variables (correlation of 0.796), but which is the cause and which is the effect?
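The CPP Index described above can be illustrated with a short sketch. The cost figures are hypothetical (they are not the Table 1 data), and the sign convention, in which a negative value means cost per patient fell, is an assumption rather than something the text specifies.

```python
def cpp_index(cost_per_patient):
    """Percent change from the previous period, for each period after the first."""
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(cost_per_patient, cost_per_patient[1:])
    ]

# Hypothetical cost per patient in dollars for three consecutive years.
costs = [5000.0, 4800.0, 4560.0]
index = cpp_index(costs)
print(index)  # one value fewer than the number of years supplied
```

Because the index is a year-over-year change, a series of n yearly costs yields n - 1 index values, which matters when only a handful of years are available.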
In order to answer the question of causality, we analyzed the relationship of these two variables with the number of records in VistA. The correlation between the size of the VistA database and the CPP index was 0.57, which shows a moderate relationship between the two variables. The correlation between the size of the VistA database and the budget of the IT office was 0.76, indicating a large association between the two variables.
Our hypothesis was that the budget of the Office of Information led to use of VistA, which led to improved productivity. If this were the case, then the CPP index should be independent of the IT budget for given levels of the size of the VistA database, and the partial correlation should be zero. The partial correlation between IT investment and productivity given use was 0.86. The z-transform of this correlation was 1.29. The hypothesis that the partial correlation was zero was rejected at an alpha of 0.05.
The alternative hypothesis was that the improvements in productivity led to a higher budget for the Office of Information, which led to more use of the VistA system. If this were the case, then use of VistA should be independent of the CPP index for a given budget of the Office of Information. The partial correlation was −0.36, which was not statistically different from zero (z = −0.38, alpha = 0.05). Neither partial correlation is exactly zero, but the data lend more support to the assumption that the reduction in the CPP index led to the growth in the IT budget and subsequent use of VistA. In the years examined, IT investment did not lead to increased productivity but the reverse.
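The two tests above can be sketched with Fisher's z-transform applied to the reported partial correlations. The paper does not state the sample size or the exact form of the test statistic, so the sketch assumes seven annual observations (1998 through 2004), one controlled variable, and a two-sided normal test; the function name is hypothetical.

```python
from math import atanh, sqrt

Z_CRIT_05 = 1.96  # two-sided standard normal critical value at alpha = 0.05

def fisher_z_test(partial_r, n, n_controls=1):
    """Return (z, statistic, reject) for H0: the partial correlation is zero."""
    dof = n - 3 - n_controls
    if dof <= 0:
        raise ValueError("too few observations for the Fisher z test")
    z = atanh(partial_r)         # Fisher's z-transform
    statistic = z * sqrt(dof)    # approximately standard normal under H0
    return z, statistic, abs(statistic) > Z_CRIT_05

N_YEARS = 7  # assumption: one observation per year, 1998 through 2004

for label, r in [("investment, productivity | use", 0.86),
                 ("use, productivity | investment", -0.36)]:
    z, stat, reject = fisher_z_test(r, N_YEARS)
    print(f"{label}: z = {z:.2f}, statistic = {stat:.2f}, reject H0: {reject}")
```

Under these assumptions the transformed values round to the reported 1.29 and −0.38, and only the first hypothesis of independence is rejected. With so few observations the normal approximation is rough, so the outcome should be read as indicative.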
We can also test whether the limited data that we have support the counterfactual assumption for the accepted causal structure. Using the median, we divided IT investment into low and high levels, and did the same for the productivity level. Table 2 provides the conditional probability of investment in IT for different levels of productivity.
The conditional probability of high investment in IT drops from 0.33 to zero when productivity drops from high to low. These data support the counterfactual scenario. Ceteris paribus, if productivity was low, we would have been unlikely to see the high level of IT investment that we observe in later years.
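A minimal sketch of this median-split comparison follows. The two series are hypothetical (they are not the Table 1 data), and treating values strictly above the median as "high" is an assumption about how ties at the median are handled.

```python
from statistics import median

def high_flags(values):
    """True where a value lies strictly above the median of its series."""
    m = median(values)
    return [v > m for v in values]

def p_high_investment_given(productivity_high, investment, productivity):
    """P(high investment | productivity level), estimated by counting periods."""
    inv_high = high_flags(investment)
    prod_high = high_flags(productivity)
    selected = [i for i, p in zip(inv_high, prod_high) if p == productivity_high]
    if not selected:
        raise ValueError("no periods at the requested productivity level")
    return sum(selected) / len(selected)

# Hypothetical series, one value per year.
investment = [10, 11, 12, 13, 20, 22, 25]
productivity = [1.0, 1.1, 1.2, 1.3, 1.9, 2.0, 2.4]

p_high = p_high_investment_given(True, investment, productivity)
p_low = p_high_investment_given(False, investment, productivity)
print(f"P(high investment | high productivity) = {p_high:.2f}")
print(f"P(high investment | low productivity)  = {p_low:.2f}")
```

A large gap between the two conditional probabilities is what the counterfactual check looks for. With an odd number of periods the median observation falls in the "low" group, so the two groups are not exactly equal in size.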

DISCUSSION
This paper examines how causation can be built explicitly into calculations of return on investment within one firm. Although researchers usually stay away from causal language, return on investment calculations make sense only if one assumes, at least implicitly, that the returns were caused by the investment. The practice has typically been to assume causality but not to test for it. This is unfortunate because not testing fundamental assumptions leaves room for large errors. Causal claims are different from association claims. Association claims can mostly be tested through classical statistics, such as correlation and regression analysis. Causal claims need experimental studies where the cause is introduced and withdrawn to measure the impact of the cause on the effect. In the absence of such direct experiments, causal claims can be made from observational studies if four criteria are met: (1) causes must be associated with effects, (2) causes precede effects, (3) in the absence of the cause there should be no effect, and (4) there is a clear mechanism through which the cause leads to the effect. All of these assumptions can be empirically tested.
In our view, this is what is accomplished by introducing the "use of IT" as a third variable. This variable is a mechanism through which one expects the investment in IT to affect productivity. It also allows us to examine the sequence of cause and effect by examining the partial correlations. Finally, one can examine if failure to use IT leads to lower productivity. Thus, the introduction of the third variable allows us to test various assumptions of causality.
While we focused on only one causal mechanism, it is possible to hypothesize multiple processes. For example, we measured use of VistA by the number of records in VistA. It is also possible to measure use by the number of calls to the system or the number of clinicians using the system. Alternative measures, and perhaps multiple measures, should be tried. There is certainly a degree of subjectivity in the choice of a measure of IT use. What matters in the choice of the third variable is that it produces natural experiments where the impact of IT investment disappears if it is not used. Since it is generally agreed that the impact of IT depends on its use, our choice of the intermediary variable should be reasonable.
In causal analysis, it is always possible to claim that some other time-varying variable is correlated with all variables studied, and therefore the observed correlations are not real but a function of a hidden cause (e.g., improvements in the economy). This is generally known as the well-recognized but not always well-handled issue of confounding. Confounding issues can be corrected for when these variables are known and measured and can be directly made part of causal analysis [29,30]. Critics may say that this is not always possible. We agree that science cannot analyze all possible confounding variables, and trying to include all possible confounding is not analysis but paralysis. Perhaps other important factors should and could improve the analysis. Some may argue that our approach to analysis of business value of information technology is too simple. Keep in mind that the current approach to return on investment is a ratio; as such, it is even simpler than our proposed causal analysis.

Table 2. Conditional probability of investment in IT for different productivity levels. Low levels are below and high levels are above the median.

                         Productivity level
                         Unknown   High   Low
Low investment in IT     0.86      0.67   1.00
High investment in IT    0.14      0.33   0.00
There are many ways our proposed causal analysis can be improved and made more complex. We could measure factors associated with the external environment and control for the influence of changes in regulation or the general economy. We could assume a lag between IT investment, use and generation of new revenues for the organization. We could empirically test the correctness of counterfactual assumptions within causal models. In particular, we could empirically examine revenues generated when IT was not in use. We could rely on larger data sets; in particular, we could rely on data from multiple organizations and many time periods. All these are possible and we believe that, at times, it is desirable to do so. This paper has tried to show how, with little data, key aspects of causal analysis of return on investment can be carried out. More sophisticated approaches are possible but the approach we are proposing is easy and less speculative than the current method.

CONCLUSIONS
Financial analysts have been using ROI ratios for many years. This paper has presented an alternative to calculation of ratios. This alternative relies on causal analysis of the data. We showed that various causal models can be proposed and tested against observational data. The approach we have proposed does not require extensive data collection. It is possible to conduct the analysis with as few as three time periods. An application of the method to evaluation of the Veterans Administration's electronic health record showed that, contrary to general assumption, during the interval examined, the investment in IT followed gains in productivity and not vice versa. Ratio analysis may be misleading because it allows selective use of possible financial benefits and costs. In contrast, we proposed to rely on the impact of the IT project on the organization's overall productivity, no longer picking and choosing specific benefits or costs. The measures of the organization's overall productivity are readily available. Overall organization cost is also widely available. The use of these measures will guarantee that all projects will have the same end points and thus there will be less gaming of the calculation of return on investment. The approach we have proposed is more objective because it removes the speculative elements of which costs or benefits are included.

Figure 1. Relationship between IT investment (in million dollars) and Cost-Per-Patient Index.