1. The Data Sharing Situation and the Data Cost Game
This paper broadens the game theoretic approach to the data sharing situation initiated by Dehez and Tellone [1]. Their mathematical study originates in the data and cost sharing problem faced by the European chemical industry. Under the regulation imposed by the European Commission, known by the acronym “REACH” (Registration, Evaluation, Authorization and restriction of Chemical substances), manufacturers and importers are required to collect safety information on the properties of their chemical substances. There are about 30,000 substances and an average of 100 parameters for each substance. Chemical firms are required to register the information in a central database run by the European Chemicals Agency (ECHA). By 2018, REACH requires submission of a detailed analysis of the chemical substances produced or imported. Chemical firms are encouraged to cooperate by sharing the data they have collected in the past. To implement such data sharing, a compensation mechanism is needed.
This data sharing problem can be specified as follows. A finite group of firms agrees to undertake a joint venture that requires the combination of various complementary inputs held by some of them. These inputs are nonrival but excludable goods, that is, public goods with exclusion such as knowledge, data or information, and patents or copyrights (the consumption of which by individuals can be controlled, measured, and subjected to payment or other contractual limitations). In what follows we use the common term data to cover these goods generically. Each firm owns a subset of data. No a priori restrictions are imposed on the individual data sets. In addition, each type of data has an associated replacement cost, for example, the present cost of duplicating the data (or the cost of developing alternative technologies). Because these public goods are already available, their costs are sunk. In summary, the data sharing situation involves a finite group of agents, the data sets owned by the individual agents, and a discrete list of costs of data.
Given that the chemical firms are willing to cooperate, the main question is how to compensate the firms for the data they contribute. The design of a compensation mechanism, however, is fully equivalent to the selection among existing solution concepts in the mathematical field called cooperative game theory. In fact, the solution part of cooperative game theory aims at solving any allocation problem by proposing rules based on certain fairness properties. For that purpose, the data and cost sharing situation needs to be interpreted as a mathematical model called a cooperative game by specifying its fundamental characteristic cost function. We adopt Dehez and Tellone’s game theoretic model, in which the cost associated with any nonempty group of agents is simply the sum of the costs of the missing data, that is, the total cost of the data the group does not own. In this framework, no costs are charged to the whole group of agents. The so-called data cost games are therefore compensation games to which standard cost allocation rules can be applied, such as the Shapley value [2, 3], the nucleolus [4], and the core. The determination of these game theoretic solution concepts may be strongly simplified whenever the underlying characteristic cost function happens to satisfy one or another appealing property. The main purpose of this paper is to establish the so-called 1-concavity property for the class of data cost games, which has not been observed before. The 1-concavity property is fundamental for the uniform determination of two solution concepts, the core and the nucleolus [5].
Definition 1 (see [1] with adapted notation).
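(i) A data and cost sharing situation is given by the 3-tuple $\mathcal{DC}=(N,\mathcal{D},\mathcal{C})$, where $N$ is the finite set of agents, $\mathcal{D}=(D_i)_{i\in N}$ a collection of sets $D_i\subseteq D$, $i\in N$, of data, and $\mathcal{C}=(c_j)_{j\in D}$ a collection of costs of data. So, $D=\bigcup_{i\in N}D_i$ denotes the whole data set.
(ii) Given the set $N$ of agents, let $\mathcal{P}(N)=\{S\mid S\subseteq N\}$ denote the power set of $N$. For every coalition $S\subseteq N$, $S\neq\emptyset$, let $D_S=\bigcup_{i\in S}D_i$ denote the data set of $S$. For every subset $A\subseteq D$ of data, let $c(A)=\sum_{j\in A}c_j$ denote its additive cost, whereas $c(\emptyset)=0$.
(iii) With every data and cost sharing situation $\mathcal{DC}=(N,\mathcal{D},\mathcal{C})$, there is the associated data cost game $\langle N,C_{\mathcal{DC}}\rangle$, of which the characteristic cost function $C_{\mathcal{DC}}:\mathcal{P}(N)\to\mathbb{R}$ is given by $C_{\mathcal{DC}}(\emptyset)=0$ and, for all $S\subseteq N$, $S\neq\emptyset$,
$$C_{\mathcal{DC}}(S)=\sum_{j\in D\setminus D_S}c_j. \tag{1}$$
In short, $C_{\mathcal{DC}}(S)=c(D\setminus D_S)=c(D)-c(D_S)$.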
By (1), the so-called data cost $C_{\mathcal{DC}}(S)$ of coalition $S$ equals the additive cost of duplicating the missing data, that is, the cost of the data the coalition does not own. Without loss of generality, it is tacitly supposed that there are no overall missing data, that is, $D=D_N$; otherwise the data cost of every nonempty coalition $S$ would increase by the same amount $c(D\setminus D_N)=c(D)-c(D_N)$. In our framework, no data costs are charged to the whole set of agents; that is, $C_{\mathcal{DC}}(N)=0$. Obviously, every data cost game $\langle N,C_{\mathcal{DC}}\rangle$ satisfies both (decreasing) monotonicity (i.e., $C_{\mathcal{DC}}(S)\ge C_{\mathcal{DC}}(T)$ for all $S\subseteq T\subseteq N$, $S\neq\emptyset$, due to $D_S\subseteq D_T$) and subadditivity (i.e., $C_{\mathcal{DC}}(S\cup T)\le C_{\mathcal{DC}}(S)+C_{\mathcal{DC}}(T)$ for all $S,T\subseteq N$ with $S\cap T=\emptyset$).
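As a minimal sketch of Definition 1, the following Python snippet builds the data cost game (1) for a hypothetical three-firm situation. The firm labels, data items, costs, and the helper names `data_cost` and `nonempty_coalitions` are illustrative assumptions and do not come from [1].

```python
from itertools import combinations

def data_cost(coalition, data_sets, costs):
    """Return C_DC(S) = c(D minus D_S), the cost of the data coalition S lacks."""
    if not coalition:
        return 0  # C_DC(empty set) = 0 by definition
    all_data = set().union(*data_sets.values())               # D = D_N
    owned = set().union(*(data_sets[i] for i in coalition))   # D_S
    return sum(costs[j] for j in all_data - owned)

def nonempty_coalitions(players):
    """Yield every nonempty subset of the player set as a frozenset."""
    players = sorted(players)
    for size in range(1, len(players) + 1):
        for combo in combinations(players, size):
            yield frozenset(combo)

# Hypothetical situation (illustrative numbers only).
data_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}
costs = {"a": 4, "b": 2, "c": 3, "d": 5}

game = {S: data_cost(S, data_sets, costs) for S in nonempty_coalitions(data_sets)}
for S in sorted(game, key=lambda S: (len(S), sorted(S))):
    print(sorted(S), game[S])
```

In this example $c(D)=14$, firm 1 alone faces the cost $3+5=8$ of the items it lacks, and the grand coalition faces cost 0, in line with the compensation assumption.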
Definition 2 (see [5–7]).
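A cooperative cost game $\langle N,C\rangle$ with player set $N$ is said to satisfy the 1-concavity property if its characteristic cost function $C:\mathcal{P}(N)\to\mathbb{R}$ satisfies
$$C(N)\le C(S)+\sum_{i\in N\setminus S}\Delta_i(N,C)\quad\forall S\subseteq N,\ S\neq N,\ S\neq\emptyset, \tag{2}$$
$$C(N)\ge\sum_{i\in N}\Delta_i(N,C), \tag{3}$$
where $\Delta_i(N,C)=C(N)-C(N\setminus\{i\})$ for all $i\in N$.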
Condition (2) requires that the cost $C(N)$ of the formation of the grand coalition $N$ can be covered by any coalitional cost $C(S)$ together with the marginal costs $\Delta_i(N,C)$, $i\in N\setminus S$, of all the complementary players. According to condition (3), all these marginal costs together are weakly insufficient to cover the overall cost $C(N)$. In the framework of data cost games, condition (3) holds trivially due to the compensation assumption $C_{\mathcal{DC}}(N)=0$.
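Conditions (2) and (3) can be checked by brute force for small games. The sketch below is one way to do so; the function name `is_one_concave`, the sample data sets and costs, and the made-up cost function `quadratic_cost` (included only as a non-example) are all assumptions for illustration.

```python
from itertools import combinations

def is_one_concave(players, cost):
    """Check conditions (2) and (3) for a cost function defined on frozensets."""
    N = frozenset(players)
    marginal = {i: cost(N) - cost(N - {i}) for i in N}   # Delta_i(N, C)
    if cost(N) < sum(marginal.values()):                  # condition (3) fails
        return False
    for size in range(1, len(N)):                         # nonempty S with S != N
        for combo in combinations(sorted(N), size):
            S = frozenset(combo)
            if cost(N) > cost(S) + sum(marginal[i] for i in N - S):
                return False                              # condition (2) fails
    return True

# Hypothetical data cost game (illustrative data sets and costs).
data_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}
costs = {"a": 4, "b": 2, "c": 3, "d": 5}

def data_cost(S):
    """C_DC(S): total cost of the items not owned by any member of S."""
    if not S:
        return 0
    owned = set().union(*(data_sets[i] for i in S))
    return sum(c for j, c in costs.items() if j not in owned)

def quadratic_cost(S):
    """Made-up cost function |S|**2, which violates condition (3)."""
    return len(S) ** 2

print(is_one_concave(data_sets, data_cost))        # True
print(is_one_concave({1, 2, 3}, quadratic_cost))   # False
```

For `quadratic_cost` with three players, $C(N)=9$ while each marginal cost equals $9-4=5$, so the sum of marginal costs is 15 and (3) fails.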
For 1-concave or convex games $(N,v)$, the core and the nucleolus have a particularly simple structure. The core is the convex hull of its extreme points, which are given by $\vec{b}^{\,v}-g^v(N)\cdot\vec{e}_i$, $i\in N$, where $b_i^v=v(N)-v(N\setminus\{i\})$ and $g^v(N)=b^v(N)-v(N)$, while the nucleolus coincides with the center of gravity of the core.
The next section is devoted to a proof of the 1-concavity property for data cost games.
2. 1-Concavity of the Data Cost Game
Theorem 3.
Every data cost game 〈N,C𝒟𝒞〉 of the form (1) satisfies 1-concavity.
Proof.
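Let $\langle N,C_{\mathcal{DC}}\rangle$ be a data cost game. Fix a coalition $S\subseteq N$, $S\neq N$, $S\neq\emptyset$. We establish the 1-concavity inequality (2) applied to $\langle N,C_{\mathcal{DC}}\rangle$. Because of the compensation assumption $C_{\mathcal{DC}}(N)=0$, condition (2) reduces to
$$C_{\mathcal{DC}}(S)\ge\sum_{i\in N\setminus S}C_{\mathcal{DC}}(N\setminus\{i\})\quad\text{or equivalently, by (1),}\quad c(D)-c(D_S)\ge\sum_{i\in N\setminus S}\bigl[c(D)-c(D_{N\setminus\{i\}})\bigr]. \tag{4}$$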
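Write $N\setminus S=\{i_1,i_2,\ldots,i_{n-s}\}$, where $n-s$ denotes the cardinality of $N\setminus S$. Define, for every $0\le k\le n-s$, the data set $A_{i_k}=D_S\cup\bigcup_{\ell=1}^{k}D_{i_\ell}$, so that $A_{i_0}=D_S$ and $A_{i_{n-s}}=D_N=D$. In this setting, using a telescoping sum, (4) is equivalent to
$$\sum_{k=1}^{n-s}\bigl[c(A_{i_k})-c(A_{i_{k-1}})\bigr]\ge\sum_{k=1}^{n-s}\bigl[c(D)-c(D_{N\setminus\{i_k\}})\bigr]. \tag{5}$$
In view of (5), it suffices to show the following: for all $1\le k\le n-s$,
$$c(A_{i_k})-c(A_{i_{k-1}})\ge c(D)-c(D_{N\setminus\{i_k\}}) \tag{6}$$
or equivalently,
$$\sum_{j\in A_{i_k}\setminus A_{i_{k-1}}}c_j\ge\sum_{j\in D\setminus D_{N\setminus\{i_k\}}}c_j. \tag{7}$$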
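In view of (7), in turn, it suffices to show the inclusion $D\setminus D_{N\setminus\{i_k\}}\subseteq A_{i_k}\setminus A_{i_{k-1}}$ for all $1\le k\le n-s$ (assuming, as is implicit throughout, that the costs $c_j$ are nonnegative). Finally, note that $j\in D\setminus D_{N\setminus\{i_k\}}$ means $j\in D_{i_k}$ but $j\notin D_i$ for all $i\in N$, $i\neq i_k$, so that $j\notin D_T$ for any coalition $T$ not containing $i_k$. Thus, $j\notin A_{i_{k-1}}$ and $j\in A_{i_k}$.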
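The chain of sets used in the proof can be traced on a small example. The sketch below, with a hypothetical helper `proof_chain` and illustrative data sets, lists for each player $i_k$ outside $S$ the data owned by $i_k$ alone (the set $D\setminus D_{N\setminus\{i_k\}}$) next to the increment $A_{i_k}\setminus A_{i_{k-1}}$, so that the inclusion behind (7) can be read off directly.

```python
def proof_chain(S, data_sets):
    """For each i_k outside S, return (i_k, data unique to i_k, A_{i_k} minus A_{i_{k-1}})."""
    players = sorted(data_sets)
    D = set().union(*data_sets.values())
    A_prev = set().union(*(data_sets[i] for i in S))          # A_{i_0} = D_S
    steps = []
    for i_k in (i for i in players if i not in S):
        A_k = A_prev | data_sets[i_k]                         # A_{i_k}
        others = set().union(*(data_sets[i] for i in players if i != i_k))
        steps.append((i_k, D - others, A_k - A_prev))         # unique data, increment
        A_prev = A_k
    return steps

# Hypothetical situation (illustrative data sets only).
data_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}

for i_k, unique, increment in proof_chain({3}, data_sets):
    print(i_k, sorted(unique), sorted(increment), unique <= increment)
```

For $S=\{3\}$ the first step adds player 1, whose unique data $\{a\}$ is contained in the increment $\{a,b\}$; the second step adds player 2, whose unique data and increment are both $\{c\}$.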
Notice that the equivalence of (6) and (7) in the proof of Theorem 3 is due to the additive cost assumption $c(A)=\sum_{j\in A}c_j$ for any data subset $A\subseteq D$. We claim that the 1-concavity property is still valid when the characteristic cost function $C_{\mathcal{DC}}:\mathcal{P}(N)\to\mathbb{R}$ is of the following generalized form: there exists a real number $\beta\in\{1,1/2,1/3,\ldots\}$ such that
$$C_{\mathcal{DC}}(S)=\Bigl[\sum_{j\in D}c_j\Bigr]^{\beta}-\Bigl[\sum_{j\in D_S}c_j\Bigr]^{\beta}\quad\forall S\subseteq N,\ S\neq\emptyset. \tag{8}$$
By (8), the data cost of coalition $S$ equals the surplus of the costs of the data that the coalition does not own, where the surplus is measured by a concave utility function $u(x)=x^{1/\alpha}$ with $\alpha=1/\beta$ any natural number (the case $\alpha=1$ agrees with the additive cost setting).
Theorem 4.
Every generalized data cost game 〈N,C𝒟𝒞〉 of the form (8) satisfies the 1-concavity property.
Proof.
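It suffices to prove the following counterpart of (6): for all $1\le k\le n-s$,
$$\Bigl[\sum_{j\in A_{i_k}}c_j\Bigr]^{\beta}-\Bigl[\sum_{j\in A_{i_{k-1}}}c_j\Bigr]^{\beta}\ge\Bigl[\sum_{j\in D}c_j\Bigr]^{\beta}-\Bigl[\sum_{j\in D_{N\setminus\{i_k\}}}c_j\Bigr]^{\beta}. \tag{9}$$
Write $\alpha=1/\beta$. We make use of the algebraic identity
$$x-y=\bigl[x^{\beta}-y^{\beta}\bigr]\cdot\Bigl[\sum_{p=0}^{\alpha-1}(x^{\beta})^{\alpha-1-p}\cdot(y^{\beta})^{p}\Bigr]\quad\forall x,y\ge 0. \tag{10}$$
Fix $1\le k\le n-s$. Applying this identity to both sides of the valid inequality (6) yields
$$\Bigl[\Bigl[\sum_{j\in A_{i_k}}c_j\Bigr]^{\beta}-\Bigl[\sum_{j\in A_{i_{k-1}}}c_j\Bigr]^{\beta}\Bigr]\cdot A\ \ge\ \Bigl[\Bigl[\sum_{j\in D}c_j\Bigr]^{\beta}-\Bigl[\sum_{j\in D_{N\setminus\{i_k\}}}c_j\Bigr]^{\beta}\Bigr]\cdot B, \tag{11}$$
where the two real numbers $A$ and $B$ are given by
$$A=\sum_{p=0}^{\alpha-1}\Bigl[\sum_{j\in A_{i_k}}c_j\Bigr]^{(\alpha-1-p)/\alpha}\cdot\Bigl[\sum_{j\in A_{i_{k-1}}}c_j\Bigr]^{p/\alpha},\qquad B=\sum_{p=0}^{\alpha-1}\Bigl[\sum_{j\in D}c_j\Bigr]^{(\alpha-1-p)/\alpha}\cdot\Bigl[\sum_{j\in D_{N\setminus\{i_k\}}}c_j\Bigr]^{p/\alpha}. \tag{12}$$
Note that $A\le B$, because $A_{i_k}\subseteq D$ and $A_{i_{k-1}}\subseteq D_{N\setminus\{i_k\}}$ and every summand is a product of nondecreasing functions $x^{q}$, where $q\ge 0$. Since the bracketed difference on the right-hand side of (11) is nonnegative, (11) together with $A\le B$ shows that the left-hand side of (11) is at least that difference multiplied by $A$. Dividing by $A>0$ (the case $A=0$ being trivial), we conclude that (9) holds.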
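Theorem 4 can be sanity-checked numerically. The sketch below draws random data sharing situations and tests condition (2), which for $C_{\mathcal{DC}}(N)=0$ reads $C_{\mathcal{DC}}(S)\ge\sum_{i\in N\setminus S}C_{\mathcal{DC}}(N\setminus\{i\})$, for the generalized form (8) with $\alpha\in\{1,2,3\}$. The sampling scheme, the helper names, and the floating-point tolerance `tol` are assumptions made for illustration; such a check supports the statement but is no substitute for the proof.

```python
import random
from itertools import combinations

def generalized_cost(S, data_sets, costs, alpha):
    """C(S) = c(D)**(1/alpha) - c(D_S)**(1/alpha) for nonempty S, as in (8)."""
    if not S:
        return 0.0
    D = set().union(*data_sets.values())
    D_S = set().union(*(data_sets[i] for i in S))
    beta = 1 / alpha
    return sum(costs[j] for j in D) ** beta - sum(costs[j] for j in D_S) ** beta

def satisfies_condition_2(data_sets, costs, alpha, tol=1e-9):
    """Check C(S) >= sum of C(N minus {i}) over i outside S, for all proper S."""
    N = frozenset(data_sets)
    for size in range(1, len(N)):
        for combo in combinations(sorted(N), size):
            S = frozenset(combo)
            rhs = sum(generalized_cost(N - {i}, data_sets, costs, alpha) for i in N - S)
            if generalized_cost(S, data_sets, costs, alpha) < rhs - tol:
                return False
    return True

def random_situation(rng, n_players=4, n_items=6):
    """Draw integer costs in 1..10 and let each firm own each item with probability 0.4."""
    costs = {j: rng.randint(1, 10) for j in range(n_items)}
    data_sets = {i: {j for j in costs if rng.random() < 0.4}
                 for i in range(1, n_players + 1)}
    return data_sets, costs

rng = random.Random(0)  # fixed seed so the run is reproducible
checks = [satisfies_condition_2(*random_situation(rng), alpha)
          for alpha in (1, 2, 3) for _ in range(100)]
print(all(checks))
```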
Corollary 5.
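According to the theory developed for $n$-person 1-concave cost games $\langle N,C\rangle$ [5], the so-called nucleolus cost allocation $\vec{y}=(y_i)_{i\in N}\in\mathbb{R}^N$ for any data cost game $\langle N,C_{\mathcal{DC}}\rangle$ is given by
$$y_i=\Delta_i(N,C_{\mathcal{DC}})-\frac{1}{n}\cdot\Bigl[\sum_{j\in N}\Delta_j(N,C_{\mathcal{DC}})-C_{\mathcal{DC}}(N)\Bigr]. \tag{13}$$
Because $C_{\mathcal{DC}}(N)=0$, it holds that $\Delta_i(N,C_{\mathcal{DC}})=-C_{\mathcal{DC}}(N\setminus\{i\})$ for all $i\in N$, and so (13) simplifies as follows: for all $i\in N$,
$$y_i=-C_{\mathcal{DC}}(N\setminus\{i\})+\frac{\Delta(N,C_{\mathcal{DC}})}{n},\quad\text{where }\Delta(N,C_{\mathcal{DC}})=\sum_{j\in N}C_{\mathcal{DC}}(N\setminus\{j\}). \tag{14}$$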
According to (14), player $i$ receives a compensation equal to $C_{\mathcal{DC}}(N\setminus\{i\})$ and is charged the average of the total coalitional cost, which amounts to $(1/n)\cdot\sum_{j\in N}C_{\mathcal{DC}}(N\setminus\{j\})$. In particular, $y_i<0$ if and only if $C_{\mathcal{DC}}(N\setminus\{i\})>\Delta(N,C_{\mathcal{DC}})/n$. In words, according to the nucleolus, player $i$ receives a net compensation if and only if the coalitional cost $C_{\mathcal{DC}}(N\setminus\{i\})$ strictly exceeds the average of such expressions; that is, the $(n-1)$-person coalition not containing player $i$ owns sufficiently few data.
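Formula (14) is straightforward to evaluate. The following sketch computes it for a hypothetical three-firm situation using exact rational arithmetic; the data sets, costs, and the function name `nucleolus_data_cost` are illustrative assumptions.

```python
from fractions import Fraction

def nucleolus_data_cost(data_sets, costs):
    """Evaluate (14): y_i = -C(N minus {i}) + (1/n) * sum over j of C(N minus {j})."""
    players = sorted(data_sets)
    n = len(players)
    D = set().union(*data_sets.values())

    def cost_without(i):
        # C_DC(N minus {i}): cost of the data owned by player i alone
        owned = set().union(*(data_sets[k] for k in players if k != i))
        return sum(costs[j] for j in D - owned)

    without = {i: cost_without(i) for i in players}
    average = Fraction(sum(without.values()), n)
    return {i: average - without[i] for i in players}

# Hypothetical situation (illustrative numbers only).
data_sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}
costs = {"a": 4, "b": 2, "c": 3, "d": 5}

allocation = nucleolus_data_cost(data_sets, costs)
print(allocation)
```

Here the costs $C_{\mathcal{DC}}(N\setminus\{i\})$ are 4, 3, and 5 with average 4, so firm 3, the sole owner of the most expensive item, receives a net compensation of 1, firm 2 pays 1, and firm 1 breaks even; the allocation sums to $C_{\mathcal{DC}}(N)=0$.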
3. 1-Concavity of Bicycle Cost Games
Throughout this section write $N=\{1,2,\ldots,n\}$ and suppose that the individual data sets $D_i\subseteq D$, $i\in N$, are nested, which fits particular situations like, for instance, joint ventures between firms whose R&D programs are at different stages of progress [8].
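Let us consider a decreasing sequence of individual data sets, $D_1\supseteq D_2\supseteq\cdots\supseteq D_n$. Under these circumstances, the data cost game $\langle N,C_{\mathcal{DC}}\rangle$ of the form (1) satisfies the increasing sequence $0=C_{\mathcal{DC}}(\{1\})\le C_{\mathcal{DC}}(\{2\})\le\cdots\le C_{\mathcal{DC}}(\{n\})$, as well as $C_{\mathcal{DC}}(S)=0$ for all $S\subseteq N$ with $1\in S$; in particular, $C_{\mathcal{DC}}(N\setminus\{i\})=0$ for all $i\in N\setminus\{1\}$, whereas $C_{\mathcal{DC}}(N\setminus\{1\})=c(D_1\setminus D_2)$. Additionally, this type of data cost game satisfies the following relationship (which remains valid in case of an increasing sequence of individual data sets):
$$C_{\mathcal{DC}}(S)=\min_{i\in S}C_{\mathcal{DC}}(\{i\})\quad\forall S\subseteq N,\ S\neq\emptyset. \tag{15}$$
Indeed, write $S=\{i_1,i_2,\ldots,i_s\}$ such that $i_1<i_2<\cdots<i_s$. Because $D_{i_1}\supseteq D_{i_2}\supseteq\cdots\supseteq D_{i_s}$, it holds that $C_{\mathcal{DC}}(\{i_1\})\le C_{\mathcal{DC}}(\{i_2\})\le\cdots\le C_{\mathcal{DC}}(\{i_s\})$. Moreover, $D_S=D_{i_1}$, and therefore $C_{\mathcal{DC}}(S)=C_{\mathcal{DC}}(\{i_1\})=\min_{i\in S}C_{\mathcal{DC}}(\{i\})$. The purpose of the remainder of this section is to show that the 1-concavity property remains valid for cost games $\langle N,C\rangle$ of the form (15) with arbitrary (not necessarily zero) stand-alone costs $C(\{i\})$, $i\in N$.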
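Relationship (15) can be verified on a small nested example. In the sketch below the three nested data sets and their costs are hypothetical, and `data_cost` is an illustrative helper implementing (1).

```python
from itertools import combinations

def data_cost(S, data_sets, costs):
    """C_DC(S) = c(D minus D_S) for nonempty S, and 0 for the empty coalition."""
    if not S:
        return 0
    D = set().union(*data_sets.values())
    D_S = set().union(*(data_sets[i] for i in S))
    return sum(costs[j] for j in D - D_S)

# Hypothetical nested data sets: D_1 contains D_2, which contains D_3.
data_sets = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a"}}
costs = {"a": 1, "b": 2, "c": 3}

standalone = {i: data_cost({i}, data_sets, costs) for i in data_sets}
coalitions = [frozenset(c) for size in (1, 2, 3)
              for c in combinations(sorted(data_sets), size)]

# Relationship (15): the coalitional cost equals the smallest stand-alone cost in S.
relation_15_holds = all(
    data_cost(S, data_sets, costs) == min(standalone[i] for i in S)
    for S in coalitions
)
print(standalone, relation_15_holds)
```

The stand-alone costs come out as 0, 3, and 5, and $C_{\mathcal{DC}}(N\setminus\{1\})=3=c(D_1\setminus D_2)$, as stated above.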
Definition 6.
A cooperative cost game $\langle N,C\rangle$ with player set $N$ is called a bicycle cost game, respectively an airport cost game [9], if its characteristic cost function $C:\mathcal{P}(N)\to\mathbb{R}$ satisfies
$$C(S)=\min_{i\in S}C(\{i\}),\quad\text{respectively}\quad C(S)=\max_{i\in S}C(\{i\}),\quad\forall S\subseteq N,\ S\neq\emptyset. \tag{16}$$
In the setting of bicycle owners, a group of cyclists is not willing to spend more than the cheapest repair cost among its members, that is, the repair cost of its best bicycle. In the setting of landings by different types of airplanes on some runway, the largest type needs the longest runway, yielding the highest stand-alone cost.
Theorem 7.
Every bicycle cost game 〈N,C〉 of the form (16) satisfies 1-concavity.
Proof.
Let $\langle N,C\rangle$ be a bicycle cost game. Without loss of generality, suppose that the stand-alone costs are ordered such that $0\le C(\{1\})\le C(\{2\})\le\cdots\le C(\{n\})$. We establish the 1-concavity inequalities (2) and (3) applied to the bicycle cost game. Firstly, $C(N)=C(\{1\})$ and secondly, the marginal costs satisfy $\Delta_i(N,C)=C(N)-C(N\setminus\{i\})=0$ for all $i\in N\setminus\{1\}$, whereas $\Delta_1(N,C)=C(N)-C(N\setminus\{1\})=C(\{1\})-C(\{2\})$.
Fix a coalition $S\subseteq N$, $S\neq N$, $S\neq\emptyset$. We distinguish two types of coalitions $S$. In case $1\in S$, then $\Delta_i(N,C)=0$ for all $i\in N\setminus S$, whereas $C(S)=C(\{1\})=C(N)$ and, in turn, the 1-concavity condition (2) is met with equality. In case $1\in N\setminus S$, then (2) reduces to $C(N)\le C(S)+C(N)-C(N\setminus\{1\})$ or, equivalently, $C(S)\ge C(\{2\})$, which holds because $S\subseteq N\setminus\{1\}$; hence, the 1-concavity property holds too if $1\notin S$. This proof technique illustrates that the largest stand-alone costs $C(\{k\})$, $3\le k\le n$, do not matter for the 1-concavity property as long as their truncation remains above the second smallest stand-alone cost $C(\{2\})$. In this setting, (3) holds trivially, since it reads $C(\{1\})\ge C(\{1\})-C(\{2\})$ and $C(\{2\})\ge 0$.
Corollary 8.
According to the nucleolus cost allocation (13) applied to bicycle cost games, the second smallest stand-alone cost $C(\{2\})$ is charged equally to all players, except that the player with the smallest stand-alone cost additionally receives a compensation amounting to the difference between both stand-alone costs. In formula, writing $\mu(N,C)$ for the nucleolus cost allocation, $\mu_i(N,C)=C(\{2\})/n$ for all $i\in N\setminus\{1\}$ and $\mu_1(N,C)=\mu_2(N,C)-[C(\{2\})-C(\{1\})]$.
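The allocation of Corollary 8 can be reproduced by evaluating formula (13) directly on a bicycle cost game. The sketch below does so with exact fractions; the stand-alone costs 2, 5, 9 and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction

def bicycle_cost(S, standalone):
    """C(S) = smallest stand-alone cost among the members of S, and 0 for empty S."""
    return min(standalone[i] for i in S) if S else 0

def nucleolus_formula_13(standalone):
    """Evaluate (13): y_i = Delta_i - (1/n) * (sum of all Delta_j - C(N))."""
    players = sorted(standalone)
    n = len(players)
    N = set(players)
    grand = bicycle_cost(N, standalone)                                # C(N)
    marginal = {i: grand - bicycle_cost(N - {i}, standalone) for i in players}
    correction = Fraction(sum(marginal.values()) - grand, n)
    return {i: marginal[i] - correction for i in players}

# Hypothetical stand-alone repair costs with C({1}) <= C({2}) <= C({3}).
standalone = {1: 2, 2: 5, 3: 9}
mu = nucleolus_formula_13(standalone)
print(mu)
```

With these numbers, players 2 and 3 are each charged $C(\{2\})/3=5/3$, while player 1 is charged $5/3-(5-2)=-4/3$, and the charges add up to $C(N)=2$.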
A basis consisting of a subclass of 1-concave $n$-person games, called complementary unanimity cost games, has been introduced and developed in [10].
Definition 9 (see [10] with adapted notation).
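With every coalition $T\subseteq N$, $T\neq N$, $T\neq\emptyset$, there is the associated complementary unanimity cost game $\langle N,C_T\rangle$ given by
$$C_T(S)=\begin{cases}1 & \text{if } S\neq\emptyset,\ S\cap T=\emptyset;\\ 0 & \text{if } S=\emptyset \text{ or } S\cap T\neq\emptyset.\end{cases} \tag{17}$$
In addition, the complementary unanimity cost game $\langle N,C_\emptyset\rangle$ is given by $C_\emptyset(\emptyset)=0$ and $C_\emptyset(S)=1$ otherwise. Note that $C_T(N)=0$ for all $T\subsetneq N$, $T\neq\emptyset$, whereas $C_\emptyset(N)=1$.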
Corollary 10.
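As shown in [10], the well-known Shapley cost allocation charged to the agents of any $n$-person complementary unanimity cost game $\langle N,C_T\rangle$ amounts to
$$\mathrm{Sh}_i(N,C_T)=\frac{1}{n}\quad\forall i\in N\setminus T,\qquad \mathrm{Sh}_i(N,C_T)=\frac{1}{n}-\frac{1}{|T|}\quad\forall i\in T. \tag{18}$$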
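Formula (18) can be compared with a brute-force evaluation of the Shapley value, taken here in its standard form as the average marginal cost over all arrival orders. The sketch below is a small check under that assumption; the function names and the choice $n=4$, $T=\{1,2\}$ are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def shapley(players, cost):
    """Average each player's marginal cost over all orders in which players arrive."""
    players = list(players)
    totals = {i: Fraction(0) for i in players}
    orders = list(permutations(players))
    for order in orders:
        before = frozenset()
        for i in order:
            after = before | {i}
            totals[i] += cost(after) - cost(before)
            before = after
    return {i: totals[i] / len(orders) for i in players}

def complementary_unanimity(T):
    """Return C_T from (17): cost 1 for nonempty coalitions disjoint from T, else 0."""
    T = frozenset(T)
    return lambda S: 1 if S and not (S & T) else 0

def formula_18(n, T):
    """Closed form (18) for a nonempty proper coalition T of the players 1..n."""
    return {i: Fraction(1, n) - (Fraction(1, len(T)) if i in T else 0)
            for i in range(1, n + 1)}

n, T = 4, {1, 2}
brute = shapley(range(1, n + 1), complementary_unanimity(T))
print(brute == formula_18(n, T), brute)
```

For $n=4$ and $T=\{1,2\}$, both computations give $-1/4$ to the members of $T$ and $1/4$ to the other two players, which sums to $C_T(N)=0$.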
Theorem 11.
Suppose without loss of generality that $0\le C(\{1\})\le\cdots\le C(\{n\})$. Every $n$-person bicycle cost game $\langle N,C\rangle$ can be decomposed as the following linear combination of complementary unanimity cost games with nonnegative coefficients:
$$C=\sum_{j=0}^{n-1}\bigl[C(\{j+1\})-C(\{j\})\bigr]\cdot C_{L_j},\quad\text{where } L_0=\emptyset,\ L_j=\{1,2,\ldots,j\}\ \forall j\in N, \tag{19}$$
with the convention $C(\{0\})=0$.
The Shapley cost allocation $\mathrm{Sh}(N,C)$ for an $n$-person bicycle cost game $\langle N,C\rangle$ equals
$$\mathrm{Sh}_i(N,C)=-\sum_{j=i}^{n}\frac{C(\{j+1\})-C(\{j\})}{j}\quad\forall i\in N,\ \text{where } C(\{n+1\})=0. \tag{20}$$
The Shapley cost allocation $\mathrm{Sh}(N,C)$ for an $n$-person airport cost game $\langle N,C\rangle$ equals
$$\mathrm{Sh}_i(N,C)=\sum_{j=0}^{i-1}\frac{C(\{j+1\})-C(\{j\})}{n-j}\quad\forall i\in N,\ \text{where } C(\{0\})=0. \tag{21}$$
Proof.
Fix a coalition $S\subseteq N$, $S\neq\emptyset$. Write $C(S)=C(\{k\})$ such that $k\in S$ and $\ell\notin S$ for all $1\le\ell<k$. Given any $0\le j\le n-1$, the following equivalences hold: $C_{L_j}(S)=1$ if and only if $S\cap L_j=\emptyset$ if and only if $0\le j<k$. From this, we derive the validity of (19). The validity of (20) is left to the reader; it follows by applying the additivity property of the Shapley cost allocation to (19) and taking into account (18) as listed in Corollary 10.
Because of the relationship $\max_{i\in S}C(\{i\})=C(\{n\})-\min_{i\in S}\bigl[C(\{n\})-C(\{i\})\bigr]$ for all $S\subseteq N$, $S\neq\emptyset$, every $n$-person airport cost game with stand-alone costs $C(\{i\})$, $i\in N$, ordered as an increasing sequence, is associated with a bicycle cost game with adapted stand-alone costs $C(\{n\})-C(\{i\})$, $i\in N$, to be ordered as an increasing sequence. In this setting, (21) is a direct consequence of (20) applied to this latter bicycle cost game.
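The closed forms (20) and (21) can be compared with a brute-force Shapley computation (average marginal cost over all arrival orders, the standard definition assumed here). In the sketch below, the integer stand-alone costs 2, 5, 9 and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def shapley_brute_force(n, cost):
    """Average marginal cost of each player 1..n over all n! arrival orders."""
    totals = {i: Fraction(0) for i in range(1, n + 1)}
    orders = list(permutations(range(1, n + 1)))
    for order in orders:
        before = frozenset()
        for i in order:
            after = before | {i}
            totals[i] += cost(after) - cost(before)
            before = after
    return [totals[i] / len(orders) for i in range(1, n + 1)]

def shapley_bicycle(standalone):
    """Formula (20); standalone = [C({1}), ..., C({n})] in increasing order."""
    n = len(standalone)
    c = [0] + list(standalone) + [0]      # c[j] = C({j}), with C({n+1}) = 0
    return [-sum(Fraction(c[j + 1] - c[j], j) for j in range(i, n + 1))
            for i in range(1, n + 1)]

def shapley_airport(standalone):
    """Formula (21); standalone in increasing order, with C({0}) = 0."""
    n = len(standalone)
    c = [0] + list(standalone)            # c[j] = C({j})
    return [sum(Fraction(c[j + 1] - c[j], n - j) for j in range(i))
            for i in range(1, n + 1)]

standalone = [2, 5, 9]  # hypothetical integer stand-alone costs

def bicycle(S):
    return min(standalone[i - 1] for i in S) if S else 0

def airport(S):
    return max(standalone[i - 1] for i in S) if S else 0

print(shapley_bicycle(standalone) == shapley_brute_force(3, bicycle))
print(shapley_airport(standalone) == shapley_brute_force(3, airport))
```

For these costs the bicycle allocation is $(-2,1,3)$, summing to $C(N)=2$, and the airport allocation is $(2/3,13/6,37/6)$, summing to $C(N)=9$.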
Remark 12.
It is left to the reader to check that the Shapley value of the form (20) can be written alternatively as follows:
$$\mathrm{Sh}_i(N,C)=\frac{C(\{i\})}{i}-\sum_{k=i+1}^{n}\frac{C(\{k\})}{k\cdot(k-1)}\quad\forall i\in N. \tag{22}$$
The Shapley value of bicycle cost games can be interpreted as follows. Order the cyclists such that $C(\{1\})\le C(\{2\})\le\cdots\le C(\{n\})$. In the beginning, there is only cyclist 1, whose repair cost is $C(\{1\})$. Then player 2 joins, which lowers the cost of player 1 by $C(\{2\})/2$, while the cost of player 2 is $C(\{2\})/2$. After that, player 3 joins, which lowers the costs of players 1 and 2 by a total amount of $C(\{3\})/3$, divided equally between players 1 and 2, while the cost of player 3 is $C(\{3\})/3$. The procedure continues in this way until, finally, player $n$ joins: his cost equals $C(\{n\})/n$, while the same amount is deducted in equal parts from the costs of the other $n-1$ players.
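The sequential reading of the Shapley value can be turned into a short procedure and compared with the closed form (22). The sketch below assumes integer stand-alone costs listed in increasing order; the function names and the sample costs 2, 5, 9 are hypothetical.

```python
from fractions import Fraction

def shapley_bicycle_sequential(standalone):
    """Let cyclists 1, 2, ..., n join one at a time and update the cost shares."""
    shares = []
    for k, cost_k in enumerate(standalone, start=1):
        entry_fee = Fraction(cost_k, k)               # the newcomer k pays C({k}) / k
        # The same amount is deducted in equal parts from the k - 1 earlier cyclists
        # (the list is empty when k = 1, so no division by zero occurs).
        shares = [s - entry_fee / (k - 1) for s in shares]
        shares.append(entry_fee)
    return shares

def shapley_bicycle_closed_form(standalone):
    """Formula (22): C({i}) / i minus the sum over k > i of C({k}) / (k * (k - 1))."""
    n = len(standalone)
    return [Fraction(standalone[i - 1], i)
            - sum(Fraction(standalone[k - 1], k * (k - 1)) for k in range(i + 1, n + 1))
            for i in range(1, n + 1)]

standalone = [2, 5, 9]  # hypothetical costs C({1}) <= C({2}) <= C({3})
print(shapley_bicycle_sequential(standalone))
print(shapley_bicycle_closed_form(standalone))
```

With costs 2, 5, 9 the shares evolve from $(2)$ to $(-1/2,\,5/2)$ to $(-2,\,1,\,3)$, which agrees with (22).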