An Online Causal Inference Framework for Modeling and Designing Systems Involving User Preferences: A State-Space Approach

We provide a causal inference framework to model the effects of machine learning algorithms on user preferences. We then use this mathematical model to prove that the overall system can be tuned to alter those preferences in a desired manner. A user can be an online shopper or a social media user, exposed to digital interventions produced by machine learning algorithms. A user preference can be anything from inclination towards a product to a political party affiliation. Our framework uses a state-space model to represent user preferences as latent system parameters which can only be observed indirectly via online user actions such as a purchase activity or social media status updates, shares, blogs, or tweets. Based on these observations, machine learning algorithms produce digital interventions such as targeted advertisements or tweets. We model the effects of these interventions through a causal feedback loop, which alters the corresponding preferences of the user. We then introduce algorithms in order to estimate and later tune the user preferences to a particular desired form. We demonstrate the effectiveness of our algorithms through experiments in different scenarios.


Introduction
Recent innovations in communication technologies, coupled with the increased use of the Internet and smartphones, have greatly enhanced institutions' ability to gather and process an enormous amount of information on individual users of social networks or consumers on different platforms [1][2][3][4]. Today, many sources of information, from shares on social networks to blogs and from intelligent device activities to security camera recordings, are easily collectable. Efficient and effective processing of this "big data" can significantly improve the quality of many real life applications or products, since this data can be used to accurately profile and then target particular users [5][6][7]. In this sense, the abundance of new sources of information and previously unimaginable ways of accessing consumer data have the potential to substantially change classical machine learning approaches, which are tailored to extract information from rather limited data using relatively complex algorithms [8][9][10][11].
Furthermore, there are applications where machine learning algorithms are used as mere tools for processing and inferring from the available data, such as predicting the best movie for a particular user [12]. Unlike these, the new generation of machine learning systems employed by enormously large and powerful data companies and institutions has the potential to change the underlying problem framework, that is, the user, by design [8,13]. Consider the Google search engine platform and its effects on user preferences. The Google search platform not only provides the most relevant search results. It also gathers information on users and provides well-tuned and targeted content, from carefully selected advertisements to specifically selected news, that may be used to change user behavior, inclinations, or preferences [14].
Online users are exposed to persuasive technologies and are continually immersed in digital content and interventions in various forms, such as advertisements, news feeds, and recommendations [15]. User decisions and preferences are affected by these interventions [16]. We define a feedback framework in which these interventions can be selected systematically to steer users in a desired manner. In Figure 1, we introduce "The Digital Feedback Loop," on which we base our model. In this paper, we are particularly interested in the causal effects of machine learning algorithms on users [17,18].

Specifically, we introduce causal feedback loops to accurately describe the effects of machine learning algorithms on users, in order to design more functional and effective machine learning systems [18,19]. We model the latent preferences and/or inclinations of a user as an unknown state in a real life causal system. We then build novel algorithms to estimate this underlying unobservable state and to alter it in an intentional and preferred manner. In particular, we model the evolution of this state using a state-space model. The latent state is only observed through the behavior of the user, such as his/her tweets and Facebook status shares. The internal state is causally affected by the outputs of the algorithm (or the actions of the company), which can be derived from past observations of the user or from outputs of the system.

The purpose of the machine learning algorithm can be, for example:
(i) to drive the internal system state towards a desired final state, for example, trying to change the opinion of the population towards a newly introduced product;
(ii) to maximize some utility function associated with the system, for example, enticing users towards a new and more profitable product; or
(iii) to minimize some regret associated with the disclosed information, for example, minimizing the effects of unknown system parameters.
Alternatively, the machine learning system may try to achieve a combination of these objectives.
This problem framework readily models a wide range of real life applications and scenarios [18,19]. As an example, an advertiser may aim to direct the preferences of his/her target audience towards a desired product by designing advertisements using data collected from consumer behavior surveys [18]. This framework is substantially different from the classical problem of targeted advertisement based on user profiling. In targeted advertising, the goal is to match the best advertisement to the current user based on the user's profile. Another part of the classical problem is to measure the true impact of an ad (a "treatment" or an "intervention" in the general case). This determines the ad's effectiveness, both to guide ad selection for the next time or the next user and for billing purposes. Here, we assume that the underlying state, that is, the preferences of the consumers, is not only used to recommend a particular product but is also intentionally altered by our algorithm. As in some of the earlier works [12,17,20], we use a causal framework for our modeling. We then take it a step further and mathematically prove that the impact of a treatment can be predesigned, so that the user can, in theory, be swayed in accordance with the designer's intent. To the best of our knowledge, this is unique to our work.

We can further articulate the difference between our work and earlier works with an example from news recommendation. The classical approach shows the user news articles he/she might be interested in reading, based on his/her profile and possibly some other contextual data. A separate process collects information on whether the user clicked on a particular news item and what that item's context is. This collected data is then used to augment the user's profile, so that the recommendation part of the process makes a better decision the next time or for the next user. The connection between separate decisions is mainly the enhanced user profile. In reality, however, the recommended news articles have impacted the user's news preferences to some degree. This is a classical counterfactual problem [8]. The user preferences themselves are latent and cannot be directly measured, but the impact manifests itself in a number of observable ways. For instance, the user might tweet about that news with a particular sentiment, or buy a book online that is related to the topic of the news item. What we prove with our framework is that, using the observable data and our model, one can produce a sequence of actions that influences and steers the user's preferences in a pattern intended by the recommender system. These actions can take the form of content served to the user, such as news articles, social media feeds, and search results.
In different applications, the preferences can be the state, and the advertisements (content, the medium of the advertisement, the frequency, etc.) can be the actions, or outputs, of the machine learning algorithm. In a different context, the opinions of Facebook users about a particular event or a new product can be represented as a state. Advertisers collect data on consumers such as spending patterns, demographics, age, gender, and polls. Our model therefore also includes a side information vector that collectively represents the relevant information on the user, such as his/her age, gender, demographics, and residency.
A summary of our work in this paper is as follows, with the last item being our key contribution:
(i) We model the effects of machine learning algorithms, such as recommendation engines, on users through a causal feedback loop. We introduce a complete state-space formulation modeling (1) the evolution of preference vectors, (2) the observations generated by users, and (3) the causal feedback effects of the actions of algorithms on the system. All these parameters are jointly estimated through an Extended Kalman Filtering framework.
(ii) We introduce algorithms to estimate the unknown system parameters with and without feedback. In both cases, all the parameters are estimated jointly. We emphasize that we provide a complete set of equations covering all the scenarios considered.
(iii) To tune the preferences of users towards a desired sequence, we also introduce a linear regression algorithm and an optimization framework based on stochastic gradient descent. Previous works only use the observations to predict certain desired quantities. Unlike them, and to the best of our knowledge for the first time in the literature, we specifically design outputs to "update" the internal state of the system in a desired manner.
The rest of the paper is organized as follows. In the next section, we present a comprehensive state-space model that includes the evolution of the latent state vector, the underlying observation model, and side information. In the same section, we also introduce the causal feedback loop and possible variations to model different real life applications. We then introduce the Extended Kalman Filtering framework to estimate the unknown system parameters. We investigate different real life scenarios, including the system with and without feedback, and present all update and estimation equations. In the following section, we introduce an online learning algorithm that tunes the underlying state vector, that is, the preferences vector, towards a desired vector sequence through a linear regression and a causal feedback loop. We then demonstrate the validity of our algorithms under different scenarios via simulations. These simulations show that the estimates of the unknown parameters converge when designing a system that can steer user preferences. The final section includes conclusions and the scope of future work.

A Mathematical Model for User Preferences with Causal Feedback Effects
In this paper, all vectors are column vectors and are denoted by lowercase letters. Matrices are represented by uppercase letters. For a vector $u$, $\|u\| = \sqrt{u^T u}$ is the $\ell_2$-norm, where $u^T$ is the ordinary transpose. For vectors $a \in \mathbb{R}^{n}$ and $b \in \mathbb{R}^{m}$, $a^T$ is the transpose and $[a; b] \in \mathbb{R}^{n+m}$ is the concatenated vector. Here, $I$ represents an identity matrix, $0$ represents a vector or a matrix of all zeros, and $1$ represents a vector or a matrix of all ones, where the size is determined from the context. The time index is given in the subscript; that is, $x_t$ is the sample at time $t$. $\delta_t$ is the Kronecker delta function.
We represent the preferences of a user as a state vector $p_t$. This state vector is latent; that is, its entries are unknown to the system designer. The state vector can represent the affinity or opinions of the underlying social network user for different products or on controversial issues such as privacy. The actual length and values of the preferences depend on the application and context. As an example, for the mood of a person in a context of 6 feelings (happy, excited, angry, scared, tender, and sad), the preference vector might be $[0, 1, 0, 0, 0, 0]^T$.
The relevant information on the user, such as his/her age, gender, demographics, and residency, is collectively represented by a side information vector $s_t$. The side information on users of social networks can be collected from their profiles or their friendship networks. We assume that the side information is known to the designer and, naturally, changes slowly, so that $s_t = s$ is constant in time.
The machine learning system collects data on the user, say $x_t$, such as Facebook shares, comments, status updates, and spending patterns. These data are a function of his/her preferences $p_t$ and the side information $s$, given by
$$x_t = \mathcal{O}(p_t, s), \tag{2}$$
where the functional relationship $\mathcal{O}(\cdot)$ will be made clear in the following. The information collection process may be prone to errors or misinformation, for example, untruthful answers in surveys. We therefore extend (2) to include these effects as
$$x_t = \mathcal{O}(p_t, s) + n_t, \tag{3}$$
where $n_t$ is a noise process independent of $p_t$ and $s$. We could use other approaches instead of an additive noise model; however, the additive noise model has been found to accurately model unwanted observation noise effects [21]. To facilitate the analysis, we use a time varying linear state-space model:
$$x_t = F_t p_t + n_t, \tag{4}$$
where $F_t$ is the observation matrix [22] corresponding to the particular user and $n_t$ is i.i.d. with
$$E[n_t] = 0, \qquad E[n_t n_\tau^T] = R\,\delta_{t-\tau}, \tag{5}$$
where $R$ is the autocorrelation matrix. The autocorrelation matrix $R$ is assumed to be known, since it can be readily estimated from the data [22].
We do not explicitly show the effect of s on F for notational simplicity.
The preferences of the user change depending on prior preferences, different user effects, and trends. We represent this change as
$$p_{t+1} = g_t(p_t) \tag{6}$$
with an appropriate function $g_t(\cdot)$. To facilitate the analysis, we also use a state-space model
$$p_{t+1} = G_t p_t + k_t, \tag{7}$$
where $G_t$ is the state update matrix. $G_t$ is usually close to an identity matrix, since the preferences of a user cannot change rapidly [19,20]. Here, $k_t$ models the random fluctuations or independent changes in the preferences of users and is i.i.d. with
$$E[k_t] = 0, \qquad E[k_t k_\tau^T] = Q\,\delta_{t-\tau}, \tag{8}$$
where $Q$ is the autocorrelation matrix. The autocorrelation matrix $Q$ is assumed to be known, since it can be readily estimated from the data [22]. The model without the feedback effects is shown in Figure 2.
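The no-feedback model can be made concrete with a short simulation. The following NumPy sketch draws one trajectory from (4) and (7), assuming time-invariant $F$ and $G$ and Gaussian noise. The function name `simulate_no_feedback` is hypothetical, and the numerical values are borrowed from the first experimental setting purely for illustration.

```python
import numpy as np

def simulate_no_feedback(G, F, Q, R, p0, T, rng):
    """Simulate p_{t+1} = G p_t + k_t and x_t = F p_t + n_t.

    Assumes time-invariant G and F and i.i.d. Gaussian noise with
    k_t ~ N(0, Q) and n_t ~ N(0, R). Returns the (T, m) array of hidden
    preferences and the (T, n) array of observations.
    """
    m, n = G.shape[0], F.shape[0]
    P = np.zeros((T, m))
    X = np.zeros((T, n))
    p = np.asarray(p0, dtype=float)
    for t in range(T):
        P[t] = p
        X[t] = F @ p + rng.multivariate_normal(np.zeros(n), R)   # observation (4)
        p = G @ p + rng.multivariate_normal(np.zeros(m), Q)      # evolution (7)
    return P, X

rng = np.random.default_rng(0)
G = 0.95 * np.eye(2)      # close to identity: preferences drift slowly
F = np.eye(2)
Q = 3e-3 * np.eye(2)
R = 3e-3 * np.eye(2)
P, X = simulate_no_feedback(G, F, Q, R, p0=[1.0, -1.0], T=500, rng=rng)
```

Only `X` would be available to the designer in practice. `P` is kept here because it is needed to measure estimation errors in simulation.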
Remark 1. To include local trends and seasonality effects, one can use $k_t = B_t u_t$, where $u_t$ is an i.i.d. noise process. $B_t$ may not be full rank when local trends exist, since local trends can cause some data points to be derived from others. Our derivations in the next sections can be generalized to this case by considering an extended parameter set.
In the following, we model the effect of the actions of the machine learning algorithm in the "observation" (4) and "evolution" (7) equations.

Causal Inference through the Actions of the Machine Learning System.
Based on the collected data $x_t$, the algorithm takes an action represented by $y_t$. The action of the machine learning system or the platform can be either discrete or continuous valued, depending on the application [21]. As an example, if the action represents a campaign advertisement to be sent to a particular Facebook user, then the set of campaign ads is finite. On the other hand, the action of the machine learning system can be continuous, such as providing monetary incentives to particular users to perform certain tasks like filling out questionnaires. We model the action as a function of the observations:
$$y_t = \rho(x_t), \tag{9}$$
where $\rho(\cdot)$ may correspond to different regression methods [21]. To facilitate the analysis, we model the action generation using a linear regression model:
$$y_t = w_t^T x_t. \tag{10}$$
If we have a finite set of actions, that is, $y_t \in \{1, \ldots, K\}$, we replace (10) by
$$y_t = \mathcal{Q}(w_t^T x_t), \tag{11}$$
where $\mathcal{Q}(\cdot)$ is an appropriate quantizer. This is similar to saturation or sigmoid models [23]. The linear model in (11) can be replaced by more complex models, since $x_t$ can contain discrete entries such as gender and age. However, such complex relations can be closely approximated by piecewise linear models [24]. The piecewise linear extension of (11) is straightforward [24].
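The action generation in (10) and (11) can be sketched as follows. The source only requires "an appropriate quantizer." The nearest-level rule, the `levels` argument (one representative value per action), and the function names below are assumptions made for illustration.

```python
import numpy as np

def linear_action(w, x):
    """Continuous action y_t = w_t^T x_t, as in the linear model (10)."""
    return float(np.dot(w, x))

def quantized_action(w, x, levels):
    """Finite action set {1, ..., K}: map w_t^T x_t to the nearest level.

    levels[k] is an assumed representative value of action k + 1; the
    nearest-level rule is one possible choice of the quantizer in (11).
    """
    levels = np.asarray(levels, dtype=float)
    z = linear_action(w, x)
    return int(np.argmin(np.abs(levels - z))) + 1   # 1-based action index
```

For example, with `w = [0.5, 1.0]` and `x = [2.0, 1.0]`, the regression output is 2.0. With `levels = [1, 2, 3]`, this selects action 2.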
We assume that, driven by the actions of the machine learning algorithm (and by prior preferences), the preferences of the user change in a linear state-space form with an additive model for the causal effect [18][19][20]. This yields the following state model:
$$p_{t+1} = G_t p_t + c_t y_t + k_t, \tag{12}$$
where $c_t$ is the unknown causal effect. The complete linear state-space model is illustrated in Figure 3. Other models for the feedback exist. However, linear feedback has been found to accurately model a wide range of real life scenarios, provided that causal effects are moderate [19]. This is typically the case for social networks; that is, advertisements usually do not have drastic effects on user preferences [19,20]. Our linear feedback model can be extended in a straightforward manner to piecewise linear models that approximate smoothly varying nonlinear models.
Remark 2. We can also use a jump state model to represent the causal effects when $y_t$ comes from a finite set. In this case, as an example, the causal effects change the state behavior of the overall system through a jump state model:
$$p_{t+1} = G_t^{(y_t)} p_t + k_t, \qquad y_t \in \{1, \ldots, K\}, \tag{13}$$
where $G_t^{(i)}$ is the state update matrix in effect when action $i$ is taken. Our estimation derivations in the following sections can also be extended to cover this case using a jump state model [22].
Remark 3. For certain causal inference problems, the action sequence $y_t$ may be required to be predictive of some reference sequence $d_t$. For example, in a traffic prediction context, the goal may be to sway driver preferences $p_t$ in a certain direction by disclosing estimates $y_t$ of the traffic $d_t$ on a certain road, using some publicly available data $x_t$. To account for these types of scenarios, we complement the model in (12) by introducing
$$d_t = y_t + v_t, \tag{14}$$
where $v_t$ is i.i.d. In this case, the feedback loop is designed to tune $d_t$ to a particular value.
In the following, we introduce algorithms that optimize $w_t$ so that the overall system, as described by the mathematical model above, behaves in a desired manner. However, we emphasize that the overall system parameters, including the feedback loop parameters, are not known and must be estimated only from the available observations $x_t$. Hence, we carry out the estimation and design procedures together for a complete system design.

Design of the Overall System with Causal Inference
We consider the problem of designing a sequence of actions $\{y_t\}_{t\ge1}$ to influence users based on our observations $\{x_t\}_{t\ge1}$, where the behavior of the user is governed by his/her hidden preference sequence $\{p_t\}_{t\ge1}$. The machine learning system must choose the sequence $\{w_t\}_{t\ge1}$ to accomplish its specific goal, which naturally depends on the application. As an example, in social networks, the goal can be to change the opinions of users about a new product by sending the most appropriate content, such as news articles and/or targeted tweets. In its most general form, we can represent this goal as a utility function and optimize the cumulative gain:
$$\max_{\{w_t\}_{t\ge1}} \sum_{t=1}^{T} r_t, \tag{15}$$
where $r_t = r_t(p_t)$ is an appropriate utility function for a specific application. To facilitate the analysis, we choose the utility function as the negative of the squared Euclidean distance between the actual consumer preference $p_t$ and some desired state $q_t$. As shown later in the paper, our optimization framework relies on stochastic gradient updates. It can therefore be used with any utility function that has continuous first-order derivatives. In this case, (15) can be written as
$$\max_{\{w_t\}_{t\ge1}} \sum_{t=1}^{T} -\|p_t - q_t\|^2. \tag{16}$$
The overall system parameters, $\{F, G, c\}$, are not known and must be estimated from our observations. We introduce an Extended Kalman Filtering (EKF) approach to estimate the unknown parameters of the system. We separately consider the estimation framework without the feedback loop, that is, $w = 0$, and with the feedback loop, that is, $w \neq 0$.
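For the squared-distance choice, the objective in (16) is a sum of negative squared errors. Below is a minimal sketch, assuming the actual and desired preference sequences are stored as arrays with one row per time step. The function name is hypothetical.

```python
import numpy as np

def cumulative_gain(P, Qd):
    """Cumulative utility sum_t -||p_t - q_t||^2 for the choice in (16).

    P holds the actual preferences p_t and Qd the desired states q_t,
    both as (T, m) arrays with one row per time step.
    """
    P = np.asarray(P, dtype=float)
    Qd = np.asarray(Qd, dtype=float)
    return -float(np.sum((P - Qd) ** 2))
```

The gain is zero only when every $p_t$ equals its target $q_t$, and becomes more negative as the preferences drift away from the targets.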
Clearly, the estimation task for $\{F, G\}$ can be carried out before we produce our suggestions $w$. In this case, we can estimate these parameters more accurately without the feedback effects, since we need to estimate fewer parameters under less complicated noise processes. However, for scenarios where the feedback loop is already active, we also introduce a joint estimation framework for all parameters. A system with feedback is more general, realistic, and comprehensive, and feedback is needed to tune or influence the preferences of a user in a desired manner. However, a system with feedback is more complex to design and analyze. Therefore, we first provide the analysis for a system without feedback and then build on it to analyze a system with feedback. After we obtain the estimated system parameters, we introduce online learning algorithms to tune the corresponding system to a particular target internal state sequence, which can be time varying, nonstationary, or even chaotic [23,25].

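Estimating the Unknown Parameters of the System without Feedback.
Without the feedback loop, the system is described by
$$x_t = F_t p_t + n_t, \qquad p_{t+1} = G_t p_t + k_t, \tag{17}$$
where $k_t$ and $n_t$ are assumed to be Gaussian with correlation matrices $Q$ and $R$, respectively. We then define
$$\theta_t \triangleq [F_t(:); G_t(:)],$$
where $G_t(:)$ is the vectorized $G_t$; that is, the columns of $G_t$ are stacked one after another to form a single column vector, and similarly for $F_t(:)$. To jointly estimate $p_t$ and $\theta_t$, we formulate an EKF framework by considering
$$\theta_{t+1} = \theta_t + \epsilon_t, \tag{20}$$
where $\epsilon_t$ is the noise in estimating $\theta_t$ through the EKF. Then, using (17) and (20) and considering $z_t \triangleq [p_t; \theta_t]$ as the joint state vector, we get
$$z_{t+1} = f(z_t) + \begin{bmatrix} k_t \\ \epsilon_t \end{bmatrix}, \qquad x_t = h(z_t) + n_t, \tag{21}$$
where
$$f(z_t) = \begin{bmatrix} G_t p_t \\ \theta_t \end{bmatrix}, \qquad h(z_t) = F_t p_t$$
are the corresponding nonlinear (bilinear in $p_t$ and $\theta_t$) equations; hence, we require the EKF framework. The corresponding EKF equations to estimate the augmented states are recursively given as
$$\begin{aligned}
L_t &= P_{t|t-1} H_t^T \left(H_t P_{t|t-1} H_t^T + R\right)^{-1},\\
\hat z_{t|t} &= \hat z_{t|t-1} + L_t \left(x_t - h(\hat z_{t|t-1})\right),\\
P_{t|t} &= (I - L_t H_t) P_{t|t-1},\\
\hat z_{t+1|t} &= f(\hat z_{t|t}),\\
P_{t+1|t} &= D_t P_{t|t} D_t^T + \begin{bmatrix} Q & 0 \\ 0 & Q_\epsilon \end{bmatrix},
\end{aligned} \tag{23}$$
where $\hat z_{t|t}$ and $\hat z_{t+1|t}$ are EKF terms that approximate the optimal "linear" MSE estimates in the linearized case, and $Q_\epsilon$ is the correlation matrix of $\epsilon_t$. The matrices
$$H_t = \frac{\partial h}{\partial z}\Big|_{\hat z_{t|t-1}} = \left[\hat F,\ \hat p_{t|t-1}^T \otimes I,\ 0\right], \qquad D_t = \frac{\partial f}{\partial z}\Big|_{\hat z_{t|t}} = \begin{bmatrix} \hat G & 0 & \hat p_{t|t}^T \otimes I \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}$$
are the gradients for the first-order Taylor expansions needed to linearize the nonlinear equations in (21), where the hatted quantities are the corresponding entries of $\hat z_{t|t-1}$ and $\hat z_{t|t}$, respectively. Here, $L_t$ is the gain of the EKF and $P_{t|t}$ is the error covariance of the augmented state. The complete set of equations in (23) defines the EKF update on the parameter vectors. We next consider the case with feedback.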
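A minimal NumPy sketch of the joint EKF in (23) is given below. It assumes time-invariant $F$ and $G$, column-major vectorization, and an isotropic random-walk covariance `q_eps * I` for $\epsilon_t$. All function names are hypothetical. The Jacobians use $\partial (Fp)/\partial F(:) = p^T \otimes I$ and $\partial (Gp)/\partial G(:) = p^T \otimes I$. This is a standard EKF under these assumptions, not the authors' implementation.

```python
import numpy as np

def unpack(z, m, n):
    """Split z = [p; F(:); G(:)] with F of size n x m and G of size m x m."""
    p = z[:m]
    F = z[m:m + n * m].reshape((n, m), order="F")   # columns stacked
    G = z[m + n * m:].reshape((m, m), order="F")
    return p, F, G

def h(z, m, n):
    """Measurement map h(z) = F p."""
    p, F, _ = unpack(z, m, n)
    return F @ p

def f(z, m, n):
    """State map f(z) = [G p; theta]; the parameters follow a random walk."""
    p, _, G = unpack(z, m, n)
    return np.concatenate([G @ p, z[m:]])

def jac_h(z, m, n):
    """H = [F, p^T kron I_n, 0]."""
    p, F, _ = unpack(z, m, n)
    H = np.zeros((n, z.size))
    H[:, :m] = F
    H[:, m:m + n * m] = np.kron(p[None, :], np.eye(n))
    return H

def jac_f(z, m, n):
    """D = [[G, 0, p^T kron I_m], [0, I, 0], [0, 0, I]]."""
    p, _, G = unpack(z, m, n)
    D = np.eye(z.size)
    D[:m, :m] = G
    D[:m, m + n * m:] = np.kron(p[None, :], np.eye(m))
    return D

def ekf_step(z_pred, P_pred, x, m, n, Q, R, q_eps):
    """Measurement update with x_t, then time update.

    Returns z_{t|t}, z_{t+1|t}, and P_{t+1|t}.
    """
    H = jac_h(z_pred, m, n)
    S = H @ P_pred @ H.T + R
    L = P_pred @ H.T @ np.linalg.inv(S)                 # EKF gain L_t
    z_filt = z_pred + L @ (x - h(z_pred, m, n))
    P_filt = (np.eye(z_pred.size) - L @ H) @ P_pred
    D = jac_f(z_filt, m, n)
    Qbar = np.zeros_like(P_pred)
    Qbar[:m, :m] = Q                                    # covariance of k_t
    Qbar[m:, m:] = q_eps * np.eye(z_pred.size - m)      # covariance of epsilon_t (assumed isotropic)
    return z_filt, f(z_filt, m, n), D @ P_filt @ D.T + Qbar

# Toy run with the first experimental setting (G = 0.95 I, F = I, Q = R = 3e-3 I).
rng = np.random.default_rng(1)
m = n = 2
G_true, F_true = 0.95 * np.eye(m), np.eye(n)
Q, R = 3e-3 * np.eye(m), 3e-3 * np.eye(n)
p = np.array([1.0, -1.0])
z = np.concatenate([np.zeros(m), np.eye(n, m).ravel(order="F"), 0.5 * np.eye(m).ravel(order="F")])
P = np.eye(z.size)
sq_err = []
for t in range(200):
    x = F_true @ p + rng.multivariate_normal(np.zeros(n), R)
    z_filt, z, P = ekf_step(z, P, x, m, n, Q, R, q_eps=1e-4)
    sq_err.append(float(np.sum((z_filt[:m] - p) ** 2)))   # squared error of p_{t|t}
    p = G_true @ p + rng.multivariate_normal(np.zeros(m), Q)
```

The toy run only illustrates the mechanics of the recursion. It makes no claim about how accurately $F$ and $G$ are recovered.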

Estimating the Unknown Parameters of the System with Feedback.
For estimating the parameters of the feedback loop, that is, $c_t$ (see Figure 3), we have two different scenarios. In the first scenario, we can control $w$. We then set $w = 0$ and estimate $\{F, G\}$, and subsequently estimate $c$ for a fixed, known $w$. In the second scenario, the feedback loop is already present (or we cannot control it), that is, $w \neq 0$. Here we need to estimate all the system parameters under the feedback loop. Naturally, in this case the estimation process is more prone to errors due to the compounding effects of the feedback loop on the noise processes. We consider both cases separately.
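Using (10) in (12), we get
$$p_{t+1} = G_t p_t + c_t w_t^T x_t + k_t = \left(G_t + c_t w_t^T F_t\right) p_t + c_t w_t^T n_t + k_t. \tag{28}$$
Hence, the complete state-space description with the causal loop is given by
$$x_t = F_t p_t + n_t, \qquad p_{t+1} = \left(G_t + c_t w_t^T F_t\right) p_t + c_t w_t^T n_t + k_t. \tag{29}$$
In (29), $w_t$ is known; however, all the other parameters, including $c_t$, are unknown. We have two cases.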
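The substitution leading to (28), and the stability requirement used later in the experiments, can be checked numerically. The sketch below uses hypothetical function names. It computes the closed-loop matrix $G + c\,w^T F$, compares one step of (12) against the closed-loop form, and tests whether the spectral radius is below one.

```python
import numpy as np

def closed_loop_matrix(G, c, w, F):
    """A = G + c w^T F from (28); c and w are 1-D arrays."""
    return G + np.outer(c, w) @ F

def feedback_step(p, n_t, k_t, G, c, w, F):
    """One step of (12) with the action y_t = w^T x_t and x_t = F p_t + n_t."""
    x = F @ p + n_t
    y = w @ x                      # scalar action
    return G @ p + c * y + k_t

def is_stable(A):
    """Discrete-time stability: all eigenvalues strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)
```

For instance, with $G = 0.9I$, $F = I$, $c = [0.05, 0]^T$, and $w = [1, 1]^T$, the closed-loop eigenvalues are 0.95 and 0.9, so the loop is stable. Increasing the first entry of $c$ to 0.5 pushes one eigenvalue to 1.4.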
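Case 1. Since we can control $w$, we set $w = 0$ and obtain the estimates $\hat F$ and $\hat G$ as in the case without feedback. Using these estimated parameters in (29) then yields
$$x_t = \hat F p_t + n_t, \qquad p_{t+1} = \left(\hat G + c_t w_t^T \hat F\right) p_t + c_t w_t^T n_t + k_t. \tag{30}$$
To estimate $c_t$, we introduce an EKF framework by considering $c_t$ as another state vector:
$$c_{t+1} = c_t + \eta_t, \tag{31}$$
where $\eta_t$ is the noise in the estimation process. This yields
$$\begin{bmatrix} p_{t+1} \\ c_{t+1} \end{bmatrix} = f_t\!\left(\begin{bmatrix} p_t \\ c_t \end{bmatrix}\right) + \begin{bmatrix} c_t w_t^T n_t + k_t \\ \eta_t \end{bmatrix}, \qquad x_t = \hat F p_t + n_t, \tag{32}$$
where
$$f_t\!\left(\begin{bmatrix} p_t \\ c_t \end{bmatrix}\right) = \begin{bmatrix} \left(\hat G + c_t w_t^T \hat F\right) p_t \\ c_t \end{bmatrix} \tag{33}$$
is the corresponding nonlinearity in the system. Unlike the previous EKF formulation, the process noise in the state update equation (32) depends on $c_t$ through the term $c_t w_t^T n_t$, and $c_t$ is unknown and part of the estimated state vector. Hence, the EKF formulation is more involved.
After several steps, we derive the EKF equations to estimate the augmented state $z_t = [p_t; c_t]$ for this case:
$$\begin{aligned}
L_t &= P_{t|t-1} H_t^T \left(H_t P_{t|t-1} H_t^T + R\right)^{-1},\\
\hat z_{t|t} &= \hat z_{t|t-1} + L_t \left(x_t - \hat F \hat p_{t|t-1}\right),\\
P_{t|t} &= (I - L_t H_t) P_{t|t-1},\\
\hat z_{t+1|t} &= f_t(\hat z_{t|t}),\\
P_{t+1|t} &= D_t P_{t|t} D_t^T + \bar Q_t,
\end{aligned}$$
where $\hat z_{t|t}$ and $\hat z_{t+1|t}$ are EKF terms that approximate the optimal "linear" MSE estimates in the linearized case. $H_t$ and $D_t$ are the gradients for the first-order Taylor expansions needed to linearize the nonlinear state equations in (32):
$$H_t = \left[\hat F,\ 0\right], \qquad D_t = \begin{bmatrix} \hat G + \hat c_{t|t} w_t^T \hat F & \left(w_t^T \hat F \hat p_{t|t}\right) I \\ 0 & I \end{bmatrix}.$$
Here, $L_t$ is the gain of the EKF and $P_{t|t}$ is the error covariance of the augmented state.
To obtain an expression for $\bar Q_t$ in terms of $w_t$, we define the composite error vector $b_t$ for the state update equation so that
$$z_{t+1} = f_t(z_t) + b_t, \qquad b_t = \begin{bmatrix} c_t w_t^T n_t + k_t \\ \eta_t \end{bmatrix}.$$
After straightforward algebra, assuming that $n_t$, $k_t$, and $\eta_t$ are zero mean, mutually independent, and independent of $c_t$, we get
$$\bar Q_t = E\left[b_t b_t^T\right] \approx \begin{bmatrix} \left(w_t^T R w_t\right) \hat c_{t|t} \hat c_{t|t}^T + Q & 0 \\ 0 & Q_\eta \end{bmatrix},$$
where $Q_\eta$ is the correlation matrix of $\eta_t$ and $c_t$ is replaced by its current estimate $\hat c_{t|t}$. These updates provide the complete EKF formulation with feedback. In the sequel, we introduce the complete estimation framework, in which we estimate all the parameters jointly.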
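The Case 1 process noise covariance $\bar Q_t$ can be sketched as below. The sketch assumes zero-mean, mutually independent $n_t$, $k_t$, and $\eta_t$, and replaces $c_t$ by its current estimate. The function name and argument order are hypothetical. The key fact used is that $c\,w^T n_t$ has covariance $(w^T R w)\,c\,c^T$, because $w^T n_t$ is a scalar with variance $w^T R w$.

```python
import numpy as np

def case1_process_noise_cov(c_hat, w, R, Q, Q_eta):
    """Approximate covariance of b_t = [c_t w_t^T n_t + k_t; eta_t].

    c_t is replaced by the estimate c_hat; n_t, k_t, and eta_t are assumed
    zero mean and mutually independent, so the off-diagonal blocks vanish.
    """
    m = c_hat.size
    top = float(w @ R @ w) * np.outer(c_hat, c_hat) + Q   # (w^T R w) c c^T + Q
    Qbar = np.zeros((2 * m, 2 * m))
    Qbar[:m, :m] = top
    Qbar[m:, m:] = Q_eta
    return Qbar
```

With $c = [1, 2]^T$, $w = [1, 0]^T$, $R = \mathrm{diag}(2, 3)$, and $Q = 0.1I$, the scalar $w^T R w$ equals 2. The upper-left block is then $2\,c\,c^T + 0.1 I$.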
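Case 2. We can define a superset of parameters and formulate an EKF framework for this augmented parameter vector:
$$\phi_t \triangleq \left[F_t(:); G_t(:); c_t\right], \qquad \phi_{t+1} = \phi_t + \epsilon_t.$$
With the joint state $z_t = [p_t; \phi_t]$, this yields
$$z_{t+1} = f_t(z_t) + b_t, \qquad x_t = h(z_t) + n_t,$$
where
$$f_t(z_t) = \begin{bmatrix} \left(G_t + c_t w_t^T F_t\right) p_t \\ \phi_t \end{bmatrix}, \qquad h(z_t) = F_t p_t$$
are the corresponding nonlinear equations; hence, we require the EKF.
After some algebra, we get the complete EKF equations. They have the same form as (23), with $\bar Q_t$ in place of the constant process noise covariance, and with
$$H_t = \left[\hat F,\ \hat p_{t|t-1}^T \otimes I,\ 0,\ 0\right],$$
$$D_t = \begin{bmatrix} \hat G + \hat c\, w_t^T \hat F & \hat c \left(\hat p_{t|t}^T \otimes w_t^T\right) & \hat p_{t|t}^T \otimes I & \left(w_t^T \hat F \hat p_{t|t}\right) I \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \\ 0 & 0 & 0 & I \end{bmatrix},$$
where the hatted quantities are the corresponding entries of the current state estimate. To obtain an expression for $\bar Q_t$ in terms of $w_t$, we define the composite error vector $b_t$ for the state update equation:
$$b_t = \begin{bmatrix} c_t w_t^T n_t + k_t \\ \epsilon_t \end{bmatrix}.$$
After straightforward algebra, we get
$$\bar Q_t = E\left[b_t b_t^T\right] \approx \begin{bmatrix} \left(w_t^T R w_t\right) \hat c\, \hat c^T + Q & 0 \\ 0 & Q_\epsilon \end{bmatrix}.$$
Given that the system parameters are estimated through the EKF formulation, we next introduce learning algorithms on $w_t$ that change the behavior of the users in a desired manner.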

Designing a Causal Inference System to Tune User Preferences
After the parameters are estimated through the methods described in the previous sections, the complete system framework is given by
$$x_t = \hat F p_t + n_t, \qquad p_{t+1} = \left(\hat G + \hat c\, w_t^T \hat F\right) p_t + \hat c\, w_t^T n_t + k_t,$$
with the estimated parameters $\hat F$, $\hat G$, and $\hat c$. Our goal in this section is to design $w_t$ such that the sequence of preferences $p_t$ is tuned towards a desired sequence of preferences $q_t$; for example, one may wish to sway the preferences of a user towards a certain product.
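To tune the user preferences, we design $w_t$ so that the difference between the preferences $p_t$ and the desired $q_t$ is minimized. We define this difference as the loss between the preference and desired vectors:
$$\ell_t = \ell(q_t, p_t),$$
where $\ell(\cdot,\cdot)$ is any differentiable loss function. As an example, the square error loss yields
$$\ell(q_t, p_t) = \|q_t - p_t\|^2.$$
To minimize the difference between these two sequences, we introduce a stochastic gradient approach where $w_t$ is learned sequentially:
$$w_{t+1} = w_t - \mu_t \nabla_{w}\, \ell(q_t, p_t),$$
where $\mu_t > 0$ is an appropriate learning rate. The learning rate is usually selected as time varying and satisfying two conditions:
$$\sum_{t=1}^{\infty} \mu_t = \infty, \qquad \sum_{t=1}^{\infty} \mu_t^2 < \infty,$$
for example, $\mu_t = 1/t$. If these two conditions are met, then the parameters $w_t$ obtained through the gradient approach converge to the optimal $w$, provided that such an optimal point exists [21]. To facilitate the analysis, we use the square error loss and treat $w$ as common to all past actions when differentiating. This gives
$$w_{t+1} = w_t + 2\mu_t \left(\frac{\partial p_t}{\partial w}\right)^T (q_t - p_t). \tag{58}$$
In (58), since $p_t$ is unknown, we use the prediction $\hat p_{t|t-1}$ from the causal loop case, that is, with feedback, and get
$$w_{t+1} = w_t + 2\mu_t \Psi_t^T \left(q_t - \hat p_{t|t-1}\right), \qquad \Psi_t \triangleq \frac{\partial \hat p_{t|t-1}}{\partial w}. \tag{59}$$
To obtain $\Psi_t$, we use the EKF recursion
$$\hat p_{t|t-1} = \left(\hat G + \hat c\, w^T \hat F\right) \hat p_{t-1|t-1}, \tag{60}$$
where
$$\hat p_{t-1|t-1} = \hat p_{t-1|t-2} + L_{t-1}\left(x_{t-1} - \hat F \hat p_{t-1|t-2}\right) \tag{61}$$
and $L_{t-1}$ is the EKF gain. Using (61) and neglecting the dependence of $L_{t-1}$ on $w$, we get a recursive update on the gradient:
$$\frac{\partial \hat p_{t-1|t-1}}{\partial w} = \left(I - L_{t-1} \hat F\right) \Psi_{t-1}, \tag{62}$$
$$\Psi_t = \hat c \left(\hat F \hat p_{t-1|t-1}\right)^T + \left(\hat G + \hat c\, w^T \hat F\right) \frac{\partial \hat p_{t-1|t-1}}{\partial w}. \tag{63}$$
From (59), (61), and (63), we get the complete recursive update:
$$\Psi_t = \hat c \left(\hat F \hat p_{t-1|t-1}\right)^T + \left(\hat G + \hat c\, w_{t-1}^T \hat F\right)\left(I - L_{t-1} \hat F\right) \Psi_{t-1}, \qquad w_{t+1} = w_t + 2\mu_t \Psi_t^T \left(q_t - \hat p_{t|t-1}\right). \tag{64}$$
This completes the derivation of the stochastic gradient update for online learning of the tuning regression vector.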
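The gradient recursion and the stochastic gradient step can be sketched in NumPy. For simplicity, the sketch holds the gain `L` fixed; in the derivation, $L_t$ comes from the EKF and its dependence on $w$ is neglected. The function names are hypothetical. The matrix `Psi` stands for $\partial\hat p_{t|t-1}/\partial w$ (size $m \times n$), and the estimated $\hat F$, $\hat G$, $\hat c$ are passed in as `F`, `G`, `c`.

```python
import numpy as np

def predict_and_sensitivity(p_pred, Psi, x, w, F, G, c, L):
    """Advance the preference prediction and its gradient with respect to w by one step.

    p_pred: prediction p_hat_{t-1|t-2};  Psi: d p_pred / d w (m x n);
    x: observation x_{t-1};  w: regression vector used for the action.
    Returns p_hat_{t|t-1} and its gradient Psi_t.
    The gain L is held fixed and its dependence on w is ignored (assumption).
    """
    p_filt = p_pred + L @ (x - F @ p_pred)              # filtering step, cf. (61)
    dfilt = (np.eye(p_pred.size) - L @ F) @ Psi          # gradient of p_filt, cf. (62)
    A = G + np.outer(c, w) @ F                           # closed-loop matrix G + c w^T F
    p_next = A @ p_filt                                  # prediction step, cf. (60)
    Psi_next = np.outer(c, F @ p_filt) + A @ dfilt       # gradient recursion, cf. (63)
    return p_next, Psi_next

def sgd_step(w, Psi, q, p_pred, mu):
    """Stochastic gradient step on ||q_t - p_hat_{t|t-1}||^2, cf. (59)."""
    return w + 2.0 * mu * Psi.T @ (q - p_pred)
```

The sensitivity `Psi` is carried forward alongside the prediction, so each update costs only a few small matrix products. This is the reason for using a recursive gradient rather than differentiating the whole trajectory at every step.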

Experiments
In this section, we share our simulation results. They show that the estimated parameters of the system converge to their true values, which supports the claim that a system can be designed with the right parameters so that a sequence of actions or interventions tunes the preferences of a user in a desired manner. Our goal is mainly to establish a pathway towards designing a system that can steer user preferences in a desired manner. Given the mathematical derivations provided in the form of the EKF formulations, we therefore consider this basic set of simulations sufficient. The true parameters of the system are known to us since we run our experiments as simulations. In particular, the preferences of the user, which are not directly observable in real life, are known in simulations. We run simulations for the EKF formulations derived in the previous sections to show that our estimates of the preferences converge to the true preference values. We illustrate the convergence of our algorithms under different scenarios.
In the first scenario, the corresponding system has no feedback. As the true system, we choose a second-order linear state-space model with $G = 0.95I$, $F = I$, $Q = 3 \times 10^{-3} I$, and $R = 3 \times 10^{-3} I$. For the EKF formulation, we choose two different variances for $\epsilon_t$, namely $10^{-3}$ and $10^{-4}$, to demonstrate the effect of this design parameter on the system. We emphasize that neither $F$ nor $G$ is known. Hence, as long as the system is observable, particular choices of $F$ and $G$ only change the convergence speed and the final MSE. However, we choose $G$ so that the system is stable.
In Figure 4, we plot the squared error between the estimated and the true preferences with respect to the number of iterations, where the MSE curves are produced by averaging over 100 independent trials. We also plot the cumulative MSE normalized with respect to time, that is, $\frac{1}{t}\sum_{\tau=1}^{t} \|p_\tau - \hat p_\tau\|^2$. This shows that the averaged MSE steadily converges as the iteration count increases. The plot includes both the average MSE and the time-normalized cumulative MSE for the estimation of $F$ and $G$. We observe that the estimation of $F$ and $G$ is more prone to errors due to the multiplicative uncertainty, the single observation, and the state update equations. However, both the estimated preference vectors and the system parameters converge.
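The time-normalized cumulative MSE plotted in Figure 4 can be computed as follows. The sketch assumes the squared errors are stored as an array with one row per independent trial and one column per iteration. The function name is hypothetical.

```python
import numpy as np

def time_normalized_cumulative_mse(sq_errors):
    """Average squared errors e[trial, t] over trials, then return
    (1/t) * sum_{tau <= t} of the trial-averaged error for t = 1, ..., T."""
    mse = np.mean(np.asarray(sq_errors, dtype=float), axis=0)   # average over independent trials
    return np.cumsum(mse) / np.arange(1, mse.size + 1)
```

For two trials with squared errors `[[1, 3], [3, 5]]`, the trial-averaged MSE is `[2, 4]`, and the time-normalized cumulative MSE is `[2, 3]`.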
In the second set of experiments, feedback is present; that is, $w \neq 0$. For this case, we use the same parameters as in the first set of experiments, except $G = 0.9I$, which gives more decay due to the presence of feedback. We consider two different scenarios, where $w_t$ and $c_t$ are either fixed or randomly chosen, provided that the overall system stays stable after the feedback; that is, $(G + c\,w^T F)$ corresponds to a stable system. Note that this can always be enforced by choosing an appropriate $w$. However, we use a randomly initialized $w$ to avoid any bias in our experiments. Here, although $w$ is known to us, the feedback amount $c$ and the hidden preferences are unknown. In Figure 5, we plot the MSE between the estimated preference vectors and the true ones. The feedback produces a multiplicative uncertainty in the state equation and greatly increases the nonlinearity in the update equation. Nevertheless, we observe from these simulations that we are able to recover the true values through the EKF formulation. Likewise, although the feedback makes the noise in the state equation more colored, we recover the true values due to the whitening effects of the EKF. The MSE between the estimated feedback and the true one is also plotted, where the MSE curves are produced after 100 independent realizations.

Journal of Electrical and Computer Engineering

Figure 5: Estimation of the underlying vector of preferences and the feedback parameters when there is feedback (horizontal axis: data length). The results are averaged over 100 independent trials. Two different configurations are simulated for the feedback as well as for the linear control parameters, namely the fixed and random initial cases. For both scenarios, our estimation process converges to the true underlying processes.

Conclusions
In this paper, we model the effects of machine learning algorithms, such as recommendation engines, on users through a causal feedback loop. To this end, we introduce a complete state-space formulation modeling (1) the evolution of preference vectors, (2) the observations generated by users, and (3) the causal feedback effects of the actions of machine learning algorithms on the system. All these parameters are jointly estimated through an Extended Kalman Filtering framework. We introduce algorithms to estimate the unknown system parameters with and without feedback. In both cases, all the parameters are estimated jointly. We emphasize that we provide a complete set of equations covering all the scenarios considered. To tune the preferences of users towards a desired sequence, we also introduce a linear feedback and an optimization framework based on stochastic gradient descent. Unlike previous works that only use the observations to predict certain desired quantities, we specifically design outputs to "update" the internal state of the system in a desired manner. Through a set of experiments, we demonstrate the convergence behavior of our proposed algorithms in different scenarios.
We consider our work a significant theoretical first step towards designing a system with the right parameters, one that allows a sequence of actions or interventions to tune the preferences of a user in a desired manner. We emphasize that the main goal of our study is to establish a pathway to designing such a system. We achieve this by first providing a mathematical proof and then through a basic set of simulations. A next step in future studies can be to make the system more stable and to make the design process easy and practical for system designers. Further analysis of the convergence of the system, and more simulations, experiments, and numerical analyses, are needed to take our results to the next level. A direct comparison to previous studies is not possible for this first step of our study since, to the best of our knowledge, this is the first time a task of this nature has been undertaken. Our main success criterion is that the estimated parameters converge to the true parameter values. However, as our framework evolves, we will be able to track its relative performance.
Another area of focus for future studies is the optimal selection of action sequences.This can be particularly challenging since user preferences can change over time due to the abundance of new products and services.Algorithms to optimally select actions may require online learning and decision making in real time to accommodate these changes.

Figure 1: The Digital Feedback Loop.

Figure 2: A state-space model representing the evolution of the user preferences without feedback effects.

Figure 3: A complete state-space model of the system with action generation and feedback effects.

Figure 4: Estimation of the underlying preferences vector when there is no feedback. The results are averaged over 100 independent trials. Here, there is no feedback, and the parameters of both the state equation and the observation equation are unknown. The results are shown for two different noise variances for the EKF formulation.