N-Tier Soft Set Data Model: An Approach to Combine the Logicality of SQL and the Flexibility of NoSQL

To process huge amounts of data, computing resources need to be organized in clusters that can be scaled out easily. However, traditional SQL databases built on the relational data model are difficult to deploy in such clusters, which has motivated the movement named NoSQL. NoSQL databases, in turn, are limited by their own data models. In this paper, the original soft set theory is extended, and a new theory system called n-tier soft set is proposed. We systematically construct its concepts, definitions, and operations, establishing it as a novel soft set algebra. Some features of this algebra show its natural advantages as a data model that could combine the logicality of the SQL model (also known as the relational model) with the flexibility of NoSQL models. This data model provides a unified and normative logical perspective for organizing and manipulating data, combines metadata (semantics) and data to form a self-describing structure, and combines index and data to realize fast locating and correlating.


Background.
After entering the 21st century, with the outbreak of Internet applications, the total amount and complexity of digital information possessed by human beings have witnessed an explosive increase at an unprecedented speed, showing many new features. Some professionals believe that we have entered into the era of big data [1,2]. Databases, as the core part of information infrastructures, play a key role in this historical change. However, relational databases, which previously dominated the market, begin to appear inadequate to cope with some problems of big data [3].
In order to quickly process large-volume, fast-flowing, and complex data in a limited time to generate value, more computing resources must be acquired. There are usually two schemes: scale up and scale out.
Scale up means configuring higher-performance hardware for a single computer, such as more and stronger CPUs and larger and faster memories and disks, without increasing the number of computers. However, the performance of computer hardware available on the market at any given time has an upper limit, and the performance-price ratio of high-end products is usually low, which incurs high cost.
By increasing the number of computers rather than the performance of a single computer, scale out incorporates a large number of cost-effective, low-end (or mid-range) computers into a cluster to increase computing power. That not only reduces costs by comparison but also makes the cluster more resilient; namely, even if some of the computing nodes fail, the entire cluster can continue to provide services.
Firstly, according to the definition of the relational model and its normalization theory, a tuple is an ordered list of atomic values that cannot be nested or contain collection types (set, list, and so on), which makes it difficult to represent complex structures. There is no such restriction on the variables used in application programming, which results in an "impedance mismatch" (a metaphor for the mismatch between the data forms of the relational data model and the application programming model). At present, this problem is usually mitigated by a middle layer called ORM (object-relational mapping).
Secondly, the relational model uses normalization to reduce redundancy, avoid anomalies, and ensure the integrity of databases. In a relational database that follows the third (or higher) normal form, the data involved in a unit process of an application are typically scattered across different tables. In order to ensure the ACID requirements of a transaction (the four basic elements of the correct execution of a database transaction, namely, atomicity, consistency, isolation, and durability) and the integrity constraints required by the normal form, a series of locks and other resources are consumed. Under high concurrency or huge volume, this can severely affect the performance and availability of the database.
Moreover, the relational model is algebraically based on relations rather than mappings, so it cannot express an index by itself (whereas a mapping is a natural mathematical abstraction of an index). That renders indexes external structures, separated from data in implementation, which not only increases the demand for storage space but also makes it difficult for data items to locate each other on their own. To correlate the data between different tables, it is necessary to write complex SQL queries and use expensive Join operations. And in order to support Join operations between tables, the related tables must be placed on the same node, which is not conducive to data dispersion in a cluster and usually requires manual sharding design, making relational databases difficult to scale out.
Meanwhile, a relational database needs a rigid predefined schema. One has to predefine the structures and constraints of tables. And the schema is very difficult to change in reality, falling short of dealing with changing sources and requirements.

NoSQL Databases.
Those problems of relational databases have motivated the development of database products called NoSQL and inspired a new round of innovation in database theory and practice [3,6,7]. Different NoSQL products try to solve the problems of relational databases from different aspects. According to the data models they use, NoSQL products can be divided into four main types: key-value store, column family, document, and graph [7,8]. Except for graph databases, which use a graph as the data model, the data models of the first three are based on key-value structures. Key-value store databases are composed of simple key-value pairs; column family databases organize data into two-level (or deeper) key-value mappings by row keys, column keys, etc.; and document databases organize key-values into documents with accessible internal structures that can be nested within each other. The main reason why these databases convert the view of data from relations to key-value structures (including simple key-value, column, and document) is to deal with aggregates. Unlike tuples in relational databases, aggregates are usually designed and used by upper applications (not by databases). An aggregate organizes all the data needed in a single processing unit so that they can be accessed together, eliminating expensive and complex SQL queries and table Joins. Aggregates, as natural and independent units of data distribution, also make it easy to disperse data in a cluster. The form of an aggregate is also free, so content can easily be added or deleted. Hence, impedance mismatch can be solved without ORM intermediate layers.
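As a rough illustration of the three key-value-based shapes described above, the following Python sketch holds one hypothetical order in each form. The field names and values are invented for illustration and are not taken from any particular NoSQL product.

```python
# Key-value store: an opaque value looked up by a single key.
kv_store = {"order:1001": '{"customer": "Joe", "total": 35.5}'}

# Column family: a two-level mapping, row key -> column key -> value.
column_family = {
    "order:1001": {"customer": "Joe", "total": 35.5},
}

# Document: key-values with an accessible, nestable internal structure.
document = {
    "_id": 1001,
    "customer": {"name": "Joe", "address": "New York"},
    "items": [
        {"product": "pen", "quantity": 2},
        {"product": "notebook", "quantity": 1},
    ],
}


def total_quantity(order):
    """Read the whole aggregate in one access, without any join."""
    return sum(item["quantity"] for item in order["items"])
```

The document form is the aggregate in the sense used above: everything one processing unit needs travels together, so it can be placed on any node as a single unit.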
Although key-value typed databases have partly solved some problems of relational databases, they do not have rigorous mathematical foundations, and there is no connectivity between aggregates, which makes complex querying and understanding the connections among data difficult. On the other hand, relational databases, with their rigorous and precise algebraic foundation, can use a powerful query language based on relational algebra to analyze and reason about data freely and logically when the amount of data is small and held on a single machine. However, in the case of big data or in a cluster, it is also difficult to dig out value from the connections among data by using Join operations. So, the graph database, based on graph theory, is designed to explore the connections among data expediently. The graph model represents data as a set of nodes, node attributes, and edges, providing fast and efficient traversal of the whole graph with index-free adjacency. However, the graph model focuses on connections and networks, and it is not good at expressing entities and their attributes (mathematically, nodes in a graph have no attributes, and in implementations, simple key-value pairs are used to store attributes), so it has a specialized range of application and lacks generality [9,10].
At present, the mainstream database models are the relational model (SQL) and the NoSQL models (key-value, column family, document, graph, etc.). The NoSQL models were proposed to solve the problem that the relational model is too rigid for the database schema to change (especially with vast amounts of data) and is difficult to distribute. However, the new NoSQL models sacrifice the mathematical rigor of the relational model and the freedom of query expression.
A model that rests on the same kind of mathematical logic foundation as the relational model while using a key-value style data structure urgently requires study. Such a model would be easy to distribute and would also allow the schema to change. We think that this improvement can use the "key-value pair" data structure in a distributed environment to realize a database with rigorous algebraic logic, which combines the advantages of SQL and NoSQL and has specific practical significance.
Our Approach.
All these problems motivate us to explore a new data model that not only maintains the merits of key-value structures, gives data the ability to describe itself, and allows data to be easily located and moved in a cluster, but also has an appropriate normalization and a rigorous algebraic basis like the relational model, so that a powerful, product-independent query language can be applied freely and logically. In the end, we focused on an algebraic theory called soft set. Soft set theory is a mathematical theory proposed by the Russian mathematician Molodtsov in 1999 to solve uncertainty problems. The basic idea is to provide semantically parameterized sets by using a generalized set-valued mapping [11].
Because a soft set is a mapping that allows fuzzy semantics for its parameters and sets for its return values, because mappings in mathematics have a natural connection with key-value structures, and because sets as return values can have internal structures that can be manipulated, we finally saw the hope that soft set could be used as a mathematical abstraction for an intricate key-value structure [10,12-15].
Molodtsov gave the initial definition of soft set and a general operation and introduced several possible applications in [11]. Maji et al. studied the theory of soft sets in more detail [16], introduced the concepts of subset, intersection, union, and complement of soft sets, and discussed their properties (but Yang and Ali et al. pointed out that these properties were incorrect and improved them [17,18]). Subsequently, a variety of operations and algebraic properties of soft sets have been proposed and studied [18-21]. The original soft set has been extended by combining it with other uncertainty theories such as fuzzy set and rough set [22-31], and by using algebraic properties of soft sets, new algebraic structures have been constructed [21,32-36]. Cagman and Enginoglu gave a new definition of soft set in the form of an extension of set-valued mapping, which is different from the original one. Based on that, several related operations have been proposed, a new theory system has been constructed, and a new decision-making method has been presented [37]. At present, soft set theory is widely used in parameter reduction and decision making [38], and a large number of methods for parameter reduction [39-43] and decision making [44-46] have been developed.
In the second section, we will review the soft set theory. Because previous soft set theories are not suitable to be the algebraic basis of the data model we need, we will extend the original soft set theory from the basic structure and systematically introduce a new soft set algebra called n-tier soft set, including its definitions, operations, and related concepts, which will form a complete system and provide the theoretical basis for the later data model. In the third section, we will illustrate why and how to use n-tier soft set to build a data model, define the infrastructure and modeling principles, and finally, explain its features and advantages.

N-Tier Soft Set Theory
Definition 1. Let a nonempty set $U$ be a universal set and $P(U)$ be the power set of $U$. A nonempty set $X$ is called a parameter set, and a mapping $F: X \to P(U)$ is called a soft set.
This definition is slightly different from Molodtsov's initial one [11] and is more similar to Cagman's definition [37]. Generally, we prefer to define a soft set directly as a special mapping rather than as an ordered pair consisting of a mapping and a parameter set.
A mapping can also be treated as a set of ordered pairs, so an equivalent definition is given.
Definition 2. Let a nonempty set $U$ be a universal set and $P(U)$ be the power set of $U$. A nonempty set $X$ is called a parameter set, and $X \times P(U)$ is the Cartesian product of $X$ and $P(U)$. $F$ is called a soft set if and only if $F \subset X \times P(U)$ and each $x \in X$ appears exactly once as the first item of an ordered pair in $F$, that is, for every $x \in X$ there is exactly one $A \in P(U)$ such that $(x, A) \in F$.
The definitions above point out that mapping and set are two equivalent views of a soft set. So, for soft sets, the general properties and operations of sets are also applicable (for example, intersection, union, and complement in the sense of general sets). However, the results of these operations may not be closed in soft sets (for instance, the union operation $\cup$ of general sets may destroy the mapping condition of soft sets). When applying these general set operations, we treat a soft set as a general set directly. In addition, in the following discussion, we will frequently apply both mapping operations and set operations to soft sets in order to avoid introducing too many notations. For example, let $F, G: X \to P(U)$ be two soft sets. $F(x)$ is the image of $x$ (an element of $X$) under the mapping rule of the soft set $F$, while $F \cap G$ is the intersection of the two soft sets as sets of ordered pairs, and $(F \cap G)(x)$ is the image of $x$ under that intersection (note that the intersection of two soft sets preserves the mapping property).
Those notations are concise and enable us to see an important property of soft sets clearly, that is, the ability to maintain a mapping after some splitting, merging, or deformation operations.
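The two views of a soft set can be tried out with a small Python sketch. It assumes a soft set is stored as a `dict` from parameters to `frozenset` values; the parameter names and elements are hypothetical. It shows that the plain intersection of two soft sets, taken as sets of ordered pairs, still has each parameter at most once, while the plain union may not.

```python
def as_pairs(soft_set):
    """View a soft set (dict: parameter -> frozenset) as a set of ordered pairs."""
    return set(soft_set.items())


def is_functional(pairs):
    """True if every parameter appears at most once as a first item."""
    keys = [x for x, _ in pairs]
    return len(keys) == len(set(keys))


# Hypothetical soft sets over the same parameter set {"cheap", "modern"}.
F = {"cheap": frozenset({"h1", "h2"}), "modern": frozenset({"h3"})}
G = {"cheap": frozenset({"h1", "h2"}), "modern": frozenset({"h2", "h3"})}

intersection = as_pairs(F) & as_pairs(G)  # only the shared pair survives
union = as_pairs(F) | as_pairs(G)         # "modern" now has two different images
```

Here the intersection is a mapping only on the parameters where `F` and `G` agree, which is why the sketch checks "at most once" rather than "exactly once".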
Meanwhile, a soft set can be seen as a set-valued mapping, and a soft set can itself be considered as a set. This provides a crucial recursive way to construct a new structure, which furnishes soft set theory with new and richer content. Next, we will introduce a new notation $S$ to represent a kind of set of soft sets and define the n-tier soft set.
Firstly, we define n-tuple, n-ary Cartesian product, and some other related concepts and introduce some notations to facilitate the following discussion.
In this paper, we use $n(x)$ to denote the arity of a tuple $x$. For $i \in \{1, \dots, n(x)\}$, $x[i]$ denotes the $i$-th component of $x$ counted from right to left, and $x[\setminus i]$ denotes the new tuple obtained from $x$ by removing that component.
Definition 4. Let $X = (X_n, \dots, X_1)$ be an $n$-tuple composed of $n$ sets. The $n$-ary Cartesian product $\Pi X$ is defined as follows:
$$\Pi X = \begin{cases} X_1, & n = 1, \\ \{(x_n, \dots, x_1) \mid x_i \in X_i,\ i = 1, \dots, n\}, & n > 1. \end{cases}$$
Using the usual notation $\times$, it can also be denoted as $X_n \times \cdots \times X_1$. The $n$-ary Cartesian product defined here is flat, noncommutative, and associative. Namely, for three sets $X$, $Y$, $Z$, we have $(X \times Y) \times Z = X \times (Y \times Z) = X \times Y \times Z$, whereas in general $X \times Y \neq Y \times X$. $(X_n, \dots, X_1)$ is called the underlying sets of the $n$-ary Cartesian product $\Pi X$, and a subset of an $n$-ary Cartesian product is called an $n$-ary relation.
In particular, when $n = 1$, $\Pi(X_1) = X_1$, so the unary Cartesian product with only one set is equal to the set itself. Its elements and subsets are called unary tuples and unary relations, respectively ($(x_1)$ and $x_1$ are different representations of the same element). And if $X_i = \emptyset$ for some $i \in \{1, \dots, n\}$, then $\Pi(X_n, \dots, X_1) = \emptyset$ follows directly from the definition.
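The stated properties of the flat Cartesian product (associative, noncommutative, unary product equal to the set itself, empty whenever one factor is empty) can be checked on small examples. The sketch below is only an illustration: `cartesian` and `flatten` are hypothetical helper names, and Python's `itertools.product` is used as the underlying construction.

```python
from itertools import product


def cartesian(*sets):
    """Flat n-ary Cartesian product; a single set is returned unchanged."""
    if len(sets) == 1:
        return set(sets[0])
    return set(product(*sets))


def flatten(t):
    """Flatten nested tuples such as ((x, y), z) into (x, y, z)."""
    out = []
    for item in t:
        if isinstance(item, tuple):
            out.extend(flatten(item))
        else:
            out.append(item)
    return tuple(out)


X, Y, Z = {1, 2}, {"a"}, {"p", "q"}

# Grouping on the left or on the right gives the same flat tuples.
left = {flatten(t) for t in product(product(X, Y), Z)}
right = {flatten(t) for t in product(X, product(Y, Z))}
```

Flattening is what makes the product associative here; Python's own nested tuples `((x, y), z)` and `(x, (y, z))` would otherwise be distinct objects.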
When $n = 1$, we define $S(U_1) = P(U_1)$, where $P(X)$ refers to the power set of $X$, that is, the set of all subsets of $X$.
When $n > 1$, we define $S(U_n, \dots, U_1) = [U_n \to S(U_{n-1}, \dots, U_1)]$, where $[X \to Y]$ refers to the set of all mappings from the domain set $X$ to the codomain set $Y$.
For $F \in S(U_n, \dots, U_1)$, the tuple $(U_n, \dots, U_1)$ is called the underlying domains of $F$, denoted as $\mathrm{und}(F)$.
In this paper, $n(F)$ refers to the arity of the soft set $F$, that is, the arity of $\mathrm{und}(F)$, and $\mathrm{dom}(F) = U_n$, $\mathrm{cod}(F) = S(U_{n-1}, \dots, U_1)$, and $\mathrm{ran}(F) = \{F(x) \mid x \in U_n\}$ are the domain, codomain, and range of $F$, respectively.
When $n = 2$, any element of $S(U_2, U_1)$, which is a mapping $F: U_2 \to S(U_1)$, is a binary soft set as defined in Definition 1.
When $n = 1$, $F \in P(U_1)$, so $F$ degenerates into a subset of $U_1$. Following the name of unary relation, we call it a unary soft set, and its underlying domain is a unary tuple, that is, $\mathrm{und}(F) = U_1$.
For example, $S(C, B, A) = [C \to [B \to P(A)]]$ is the set consisting of the ternary soft sets whose underlying domains are $(C, B, A)$.
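The recursive structure of $S(U_n, \dots, U_1)$ can be made concrete with a membership check. The sketch below assumes that a unary soft set is a Python set, that a soft set of higher arity is a `dict` defined on every element of its first domain, and that `domains` lists $U_n$ first and $U_1$ last. The names `in_S` and `arity` are hypothetical.

```python
def in_S(F, domains):
    """Check whether F belongs to S(U_n, ..., U_1)."""
    if len(domains) == 1:
        # Unary case: F is simply a subset of U_1.
        return isinstance(F, (set, frozenset)) and F <= domains[0]
    # n > 1: F maps every element of U_n to a member of S(U_{n-1}, ..., U_1).
    head, rest = domains[0], domains[1:]
    return (
        isinstance(F, dict)
        and set(F) == set(head)
        and all(in_S(value, rest) for value in F.values())
    )


def arity(F):
    """Number of tiers of a well-formed nested representation."""
    if not isinstance(F, dict):
        return 1
    return 1 + arity(next(iter(F.values())))


# A ternary soft set with underlying domains (C, B, A); all values are made up.
C, B, A = {"c1", "c2"}, {"b1"}, {1, 2, 3}
F = {"c1": {"b1": {1, 2}}, "c2": {"b1": set()}}
```

With these definitions, `in_S(F, [C, B, A])` holds and `arity(F)` is 3, matching the ternary case $S(C, B, A)$.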
Next, we will define some other important concepts related to soft set.
Here, $f: X \to Y,\ x \mapsto f(x)$ refers to a mapping $f$ whose domain and codomain are $X$ and $Y$, respectively, and which maps each element $x$ of $X$ to $f(x)$. Sometimes, we simply denote it as $x \mapsto f(x)$.
Definition 7. Soft universal set $O$: let $n$ be a positive integer and $U = (U_n, \dots, U_1)$ be an $n$-tuple consisting of nonempty domains. $OU$ is called the soft universal set of $SU$ if and only if $OU = U_1$ when $n = 1$, and $OU(x) = O(U_{n-1}, \dots, U_1)$ for every $x \in U_n$ when $n > 1$.
Definition 8. Soft subset $\subset$: let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains. We call $F$ a soft subset of $G$, denoted as $F \subset G$, if and only if $F$ is a subset of $G$ in the ordinary sense when $n = 1$, and $F(x)$ is a soft subset of $G(x)$ for every $x \in U_n$ when $n > 1$.
It is important to note here that in earlier soft set theory, the condition for a soft subset can be summarized as follows: the parameter set of $F$ is contained in that of $G$, and $F$ and $G$ take identical values on the parameters of $F$ [16]. By regarding soft sets as sets of ordered pairs, this definition means that every ordered pair of $F$ is also in $G$, which can be expressed directly by the subset relation $\subset$ of general sets. However, the soft subset is a special kind of inclusion relation between soft sets. When the mapping values of a soft set are themselves soft sets (rather than simple sets), we compare them pairwise and recurse into the values until the mapping values are general sets. In addition, it is also important to note that, at present, we do not consider the infinite situation but only n-tier soft sets related to finite $n$-tuples of domains, so all recursive judgments are bound to end. How to generalize this to the infinite situation will be discussed in a future study.
The two relations should be distinguished. A subset relation between two soft sets in the sense of general sets requires the ordered pairs in the first set to all be in the second set (such a pair of soft sets automatically satisfies the definition of soft subset as well). By contrast, two soft sets can stand in the soft subset relation because the mapping values determined by the first soft set are subsets of the corresponding values of the second soft set, even though none of the ordered pairs of the first set is in the second set.
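The difference between the two inclusion relations can be seen on a small example. The sketch below assumes both soft sets share the same underlying domains and are stored as nested `dict`s with sets at the leaves; the function names and the data are hypothetical.

```python
def soft_subset(F, G):
    """Recursive soft subset test on nested dicts with sets at the leaves."""
    if not isinstance(F, dict):
        return set(F) <= set(G)  # unary case: ordinary subset
    # Higher arity: compare the mapping values parameter by parameter.
    return all(soft_subset(F[x], G[x]) for x in F)


def pair_subset(F, G):
    """Ordinary subset test on the ordered pairs of two binary soft sets."""
    return all(x in G and G[x] == F[x] for x in F)


# Every value of F is contained in the matching value of G,
# yet no ordered pair (x, F(x)) occurs in G.
F = {"e1": {"a"}, "e2": {"b"}}
G = {"e1": {"a", "b"}, "e2": {"b", "c"}}
```

For these data `soft_subset(F, G)` is true while `pair_subset(F, G)` is false, and whenever `pair_subset` holds, `soft_subset` holds as well.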
Definition 9. Let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains. We consider that $F$ is equivalent to $G$, denoted as $F = G$, if and only if
Proof 1. We proceed by induction and first prove the base case. According to the definition, when $n = 1$, $F, G \in S(U_1)$, then
Then, when $n = 1$, the proposition is true. Next, we prove the inductive step: if the proposition is true when $n = k$, then, according to the definition, when $n = k + 1$,
According to the inductive hypothesis, and then

Mobile Information Systems
So, if the proposition is true when $n = k$, then it is also true when $n = k + 1$. Hence, by the induction principle, the proposition is true for any positive integer $n$, q.e.d.

□
Definition 10. Soft power set $P$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains. The set $\{F' \mid F' \subset F\}$ of all soft subsets of $F$ is called the soft power set of $F$, denoted as $P(F)$.
It is easy to prove the following properties of soft subset and soft power set by using inductive methods similar to those in Proof 1.
For any $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains, we have $F \subset F$, $\emptyset \subset F \subset O$, $F \in P(O)$, $SU = P(O)$, and for any $F_1$,
The specific proof is similar to Proof 1 and will not be repeated.

The Operations of N-Tier Soft Set
Definition 11. Soft union $\cup$: let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $F \cup G$ is called the soft union of $F$ and $G$ if and only if
In addition, let $S \subset SU$; then $\cup S$ is called the soft arbitrary union of $S$ if and only if
Definition 12. Soft intersection $\cap$: let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $F \cap G$ is called the soft intersection of $F$ and $G$ if and only if
In addition, let $S \subset SU$; then $\cap S$ is called the soft arbitrary intersection of $S$ if and only if
Definition 13. Soft difference $\setminus$: let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $F \setminus G$ is called the soft difference set of $F$ and $G$ if and only if
Definition 14. Soft complement $c$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $OU \setminus F$ is called the soft complement of $F$, denoted as $F^c$.
Definition 15. Soft symmetric difference $\triangle$: let $F, G \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $(F \setminus G) \cup (G \setminus F)$ is called the soft symmetric difference of $F$ and $G$, denoted as $F \triangle G$.
The above operations of n-tier soft set have the following properties.
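One way to experiment with these operations is the sketch below. It rests on an assumption that should be stated plainly: each operation is taken to act pointwise and recursively, applying the ordinary set operation once the values are plain sets, in the same spirit as the recursive comparison described for the soft subset. Soft sets are nested `dict`s over the same underlying domains, and all function names are hypothetical.

```python
def soft_op(F, G, leaf_op):
    """Apply leaf_op at the unary leaves, recursing through the shared keys."""
    if not isinstance(F, dict):
        return leaf_op(frozenset(F), frozenset(G))
    return {x: soft_op(F[x], G[x], leaf_op) for x in F}


def soft_union(F, G):
    return soft_op(F, G, frozenset.union)


def soft_intersection(F, G):
    return soft_op(F, G, frozenset.intersection)


def soft_difference(F, G):
    return soft_op(F, G, frozenset.difference)


def soft_symmetric_difference(F, G):
    # Mirrors Definition 15: (F \ G) united with (G \ F).
    return soft_union(soft_difference(F, G), soft_difference(G, F))


def soft_complement(F, universal):
    # Mirrors Definition 14: the soft universal set minus F.
    return soft_difference(universal, F)


# Hypothetical binary soft sets over domains ({"x", "y"}, {1, 2, 3, 4}).
F = {"x": {1, 2}, "y": {3}}
G = {"x": {2}, "y": {3, 4}}
O = {"x": {1, 2, 3, 4}, "y": {1, 2, 3, 4}}
```

Under this pointwise reading every result is again a mapping on the same first domain, so the operations stay inside $SU$, unlike the plain union of ordered pairs.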
Definition 16. Soft range $\mathrm{ran}$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $\mathrm{ran}(F)$ is called the soft range of $F$ if and only if
Definition 17. Key set $\mathrm{key}$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $\mathrm{key}(F)$ is called the key set of $F$ if and only if
Definition 18. Value set $\mathrm{val}$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; then $\mathrm{val}(F)$ is called the value set of $F$ if and only if
Definition 19. Selection: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains, and $P(t_n, \dots, t_1)$ is an $n$-ary predicate, so
Please note that an $n$-ary predicate is reduced to an $(n-1)$-ary predicate when one of its variables is fixed. For example, suppose a 3-ary predicate $P(t_3$
Because there is no ambiguity, we use the same token for the n-tier soft set and the $n$-tuple, and the reader can distinguish them from each other by context.
In particular, when $F$ is a binary soft set, the only domain raise $F\langle 1$ is called the reverse of $F$.

Definition 22. Uncurrying $\mathrm{uc}$: let $F \in SU$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains; $\mathrm{uc}(F)$ is called the uncurrying of $F$ if and only if
Uncurrying transforms an n-tier soft set into an $(n-1)$-ary mapping.
Definition 23. Currying $\mathrm{cu}$: let $n$ be a definite positive integer, $U = (U_n, \dots, U_1)$ be an $n$-tuple consisting of nonempty domains, and $f \in [U_n \times \cdots \times U_2 \to P(U_1)]$; $\mathrm{cu}(f)$ is called the currying of $f$ if and only if
Note here that an $n$-ary function is reduced to an $(n-1)$-ary function when one of its variables is fixed. For example, $f(x, y) = x/y$ is reduced to $f(1, y) = 1/y$ when the value of $x$ is fixed at 1. Generally, we obtain an $(n-i)$-ary function, which can be denoted as $f_{(c_1, \dots, c_i)}$, by successively fixing $i$ values of the $n$-ary function from left to right, and
Currying transforms an $(n-1)$-ary mapping into an n-tier soft set.
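The round trip between the nested form and the flat form can be sketched as follows. The code assumes an n-tier soft set is a nested `dict` with sets at the leaves and that the corresponding $(n-1)$-ary mapping is a `dict` keyed by $(n-1)$-tuples; for $n = 2$ the keys are 1-tuples, a Python convenience. The function names and the sample data are hypothetical.

```python
def uncurry(F, arity):
    """Turn an n-tier soft set into a mapping keyed by (n-1)-tuples."""
    if arity == 2:
        return {(x,): frozenset(value) for x, value in F.items()}
    return {
        (x,) + key: value
        for x, inner in F.items()
        for key, value in uncurry(inner, arity - 1).items()
    }


def curry(f):
    """Rebuild the nested soft set from a mapping keyed by tuples."""
    nested = {}
    for key, value in f.items():
        level = nested
        for x in key[:-1]:
            level = level.setdefault(x, {})  # descend, creating tiers on demand
        level[key[-1]] = frozenset(value)
    return nested


# A ternary soft set: person -> attribute -> set of values.
F = {
    "p1": {"name": {"Joe"}, "sex": {"male"}},
    "p2": {"name": {"Ann"}, "sex": {"female"}},
}
f = uncurry(F, 3)  # keys are (person, attribute) pairs
```

Applying `curry` to `f` gives back `F`, so under these assumptions the two functions are inverse to each other on well-formed inputs.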
Particularly, $F$ is a set when $n = 1$, and $F + G$, directly denoted as $G(F)$, is also called the restriction of $G$ under $F$.
Definition 25. Soft direct product $\times$: let $F \in SU$ and $G \in SV$, in which $U = (U_n, \dots, U_1)$ is an $n$-tuple consisting of nonempty domains and $V = (V_m, \dots, V_1)$ is an $m$-tuple consisting of nonempty domains. We call $F \times G$ the soft direct product of $F$ and $G$ if and only if
Definition 26. Soft mapping product $\infty$: let $(U_2, U_1)$, $(V_2, V_1)$ be binary tuples consisting of two
We call $F \infty G$ the soft mapping product of $F$ and $G$ if and only if
According to the definition, the soft mapping product is associative,
Definition 27. Soft relation: let $F_1, \dots, F_n$ be $n$ binary soft sets, and $R$ is called the soft relation whose underlying domain is
Definition 29. Associated soft set of a relation: let $R \in P(U_n \times \cdots \times U_1)$ be an $n$-ary relation (if $R$ is an empty set, then, according to its assumption and context, $R \in P(U_n \times \cdots \times U_1)$ can be considered as an $n$-ary empty relation whose underlying domain is $(U_n, \dots, U_1)$). $\mathrm{fun}(R)$ is called the associated soft set of the relation $R$ if and only if
Note here that we indirectly used the inductive definition of a tuple; that is, any $n$-ary tuple can be considered as a nested binary tuple when $n > 1$. For an $n$-ary relation and a definite value $c \in \mathrm{dom}(R)$, $\{y \mid (c, y) \in R\}$ is an $(n-1)$-ary relation consisting of $(n-1)$-tuples (if $\{y \mid (c, y) \in R\}$ is an empty set, it can be seen as an $(n-1)$-ary empty relation).
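The remark about treating an $n$-tuple as a nested pair $(c, y)$ suggests a direct way to turn a relation into a nested mapping: group the tuples by their first component and repeat on the remainders. The sketch below follows that reading. The function name `fun` echoes the notation above, but the body is an illustrative assumption, and first components that do not occur in `R` are omitted rather than mapped to an empty relation.

```python
def fun(R, arity):
    """Nested soft-set view of an n-ary relation given as a set of n-tuples."""
    if arity == 1:
        # A unary relation is just a set; accept 1-tuples or bare values.
        return frozenset(t[0] if isinstance(t, tuple) else t for t in R)
    grouped = {}
    for first, *rest in R:
        # {y | (c, y) in R} is an (n-1)-ary relation for each fixed c.
        grouped.setdefault(first, set()).add(tuple(rest))
    return {c: fun(rows, arity - 1) for c, rows in grouped.items()}


# A hypothetical ternary relation (name, sex, address).
R = {
    ("Joe", "male", "New York"),
    ("Joe", "male", "Boston"),
    ("Ann", "female", "Paris"),
}
nested = fun(R, 3)
```

In the result each outer key acts as an index entry for the tuples that start with it, which is the sense in which a mapping can carry both data and index.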
From a mathematical point of view, the n-tier soft set defined in this section and its operations offer a wealth of content to be studied. They have nice properties: soft intersection, soft union, soft complement, and other operations satisfy all the properties of common set operations (commutative law, associative law, etc.). However, this paper does not focus on the discussion of the mathematics. Next, we will focus on explaining why and how to use n-tier soft set as a data model for databases in the era of big data.

N-Tier Soft Set Data Model
There is no natural expression for the existence of things or events. Only by purposeful selection, abstraction, and simplification can we transform some specific aspects of irregular fields into structured and manipulable objects. A data model describes the static characteristics and dynamic behavior of a database system at the abstract level, provides a logical abstract framework for data representation and operation, and fundamentally determines how data are stored, organized, and manipulated.
Therefore, the data model is the core and foundation of the database system, and all database systems must be based on a certain data model. The data model also constitutes a bridge between the upper applications, the database system itself, and its underlying physical implementation, which enables them to view and use the data in a unified way.
We have already explained the problems of the relational data model and the most popular NoSQL data models in the Introduction. In the second section, n-tier soft set is defined as a nested set-valued mapping that makes it possible to express complex key-value structures. Next, we will set up a new data model by using n-tier soft set algebra.
Just as a table is often used to represent the relational model, for ease of illustration we first introduce a plain text representation of n-tier soft set. It is similar to JSON and independent of specific programming languages, and it is called SSSN (soft set serialization notation). The basic construction rules are as follows (just for a demo; the strict definition and parsing method are not discussed in this paper):
(1) Strings are represented with double quotation marks, numerical values with literal numbers, and Boolean values with true/false, for example,
    "hello World this is SSSN"    #String
    12345678                      #Number
    true                          #Boolean
(2) Tuples are represented with contents enclosed in parentheses and separated by commas, for example,
    ("Joe", "Male", "New York")
(3) Sets are represented with contents enclosed in braces and separated by commas, in which the elements cannot be repeated, for example,
    {"Elephant", "Monkey", "Zebra", "Panda"}
(4) Mappings are represented with contents enclosed in braces, separated by commas, and matched by colons (many-to-one ordered pairs; the left side of a colon cannot be duplicated), for example,
    {"name": "Joe", "sex": "male", "address": "New York"}
In the following discussion, we can see that when we express and pass n-tier soft sets in this way, we can not only express and pass the semantics and data integrated by the n-tier soft set but also express some important logical constraints.
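The four demo rules are simple enough to mimic with a short serializer. The following Python sketch maps `str`, numbers, `bool`, `tuple`, `set`, and `dict` to SSSN-like text. It is only an illustration of the rules quoted above: the function name `to_sssn`, the backslash escaping of inner quotation marks, and the sorted output order for sets are all assumptions, since the strict definition is not given in this paper.

```python
def to_sssn(value):
    """Serialize a Python value following the four demo rules (a sketch)."""
    if isinstance(value, bool):  # bool must be tested before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Escaping inner quotes with a backslash is an assumption.
        return '"' + value.replace('"', '\\"') + '"'
    if isinstance(value, tuple):
        return "(" + ", ".join(to_sssn(v) for v in value) + ")"
    if isinstance(value, (set, frozenset)):
        # Sets are unordered; sorting only makes the output deterministic.
        return "{" + ", ".join(sorted(to_sssn(v) for v in value)) + "}"
    if isinstance(value, dict):
        body = ", ".join(f"{to_sssn(k)}: {to_sssn(v)}" for k, v in value.items())
        return "{" + body + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For instance, `to_sssn({"name": "Joe", "sex": "male"})` yields `{"name": "Joe", "sex": "male"}`. Note that an empty set and an empty mapping both come out as `{}` in this sketch, an ambiguity a strict definition would have to settle.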

Definitions of N-Tier Soft Set Data Model.
Next, we will define the n-tier soft set data model.
Therefore, all the objects defined in this section can be represented as an n-tier soft set consisting of a semantic set $S$ and a dataset $A$. So, the semantics, data, and relations can be fully expressed in an n-tier soft set system without the participation of external information. All the operations defined in Section 2 can be directly and conveniently applied to a model or an instance.
For example, for the domain relation soft set $R$ in Example 5 above, we can transform the binary function into nested soft sets by the currying operation (Definition 23) and raise the second domain (Definition 21), which can be denoted as $\mathrm{cu}(R)\langle 2$. The result is a multilevel mapping [47] with the domain "person" as row keys and the domains "name", "sex", etc. as column keys. We can also select some of them to form a new 4-ary soft set by selection (Definition 19), delete a certain key level to make it a 3-ary soft set by the domain remove operation (Definition 20), or form a deeper, larger n-tier soft set by product operations such as the soft direct product (Definition 25), the concatenate product (Definition 24), and so on. It should be noted that all these operations are defined recursively just for the rigor of mathematical logic and the convenience of proof, which does not imply that they must be implemented by recursive algorithms.

Modeling with N-Tier Soft Set.
Through the above definitions, we get the basic components needed to build the n-tier soft set data model. Next, we use an example to demonstrate the evolution from the relational model to the four popular NoSQL models and then to the n-tier soft set model, to show why and how to use the n-tier soft set data model for modeling.
In the traditional modeling process for relational databases, the initial stage of modeling is to understand conceptual entities in the modeling domain and the relationship between them.
Through discussion among domain experts, system architects, and data architects, these understandings often end up forming a so-called conceptual model, which is often represented by an ER diagram. Although the ER diagram is often used in the modeling process of the relational model, it can also provide a common conceptual starting point for all other models in our discussion.
We suppose that we have designed a conceptual model, as shown in Figure 1. It shows a simplified scenario of a common E-shopping site, which contains four entities: customer, order, order item, and product, each represented by a rectangle. Ellipses connected to an entity with undirected edges represent the attributes of each entity, and the underlined ID attribute uniquely identifies a particular entity. A customer can place multiple orders, each of which contains multiple order items, and each order item relates to a particular product. The relationships between these entities are represented by diamonds with undirected edges, and the quantitative relations are represented by n or 1 on both sides of the diamond. Finally, customers can follow each other and see what their friends have bought on this shopping site. We represent this relationship by a diamond with both ends connected to the customer entity, and we use m and n on both sides of the diamond to indicate a many-to-many relationship. In the following diagrams, we use rounded rectangles to represent data blocks, which can be atomic (if no internal structure is indicated) or composite (larger rectangles wrapping other rectangles). Red represents semantics, straight lines represent undirected relations, and arrows represent directed relations.

Modeling with Relational Model.
First, let us see how to model the scenario with the relational model.
After obtaining a suitable conceptual model, the relational model transforms it into the structures and constraints of tables. As shown in Figure 2, the ID attributes that uniquely identify entities become the primary keys of the tables. The 1 : 1 and n : 1 relations among entities are represented by foreign keys inserted into the corresponding tables (such as customer_id in the order table, or order_id in the order_item table), and the m : n relationship is implemented by adding a new relation table.
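The mapping from the conceptual model to tables can be tried out with Python's built-in `sqlite3` module. The sketch below covers only part of the scenario; apart from `customer_id` in the order table, the table and column names (for example `follow`, `follower_id`, `followee_id`, `name`) are assumptions made for illustration and are not taken from Figure 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "order" (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)  -- n : 1 via foreign key
);
CREATE TABLE follow (                                      -- m : n via relation table
    follower_id INTEGER NOT NULL REFERENCES customer(id),
    followee_id INTEGER NOT NULL REFERENCES customer(id),
    PRIMARY KEY (follower_id, followee_id)
);
INSERT INTO customer VALUES (1, 'Joe'), (2, 'Ann');
INSERT INTO "order" VALUES (10, 1), (11, 1);
INSERT INTO follow VALUES (2, 1);
""")

# Orders placed by the customers that customer 2 follows: the link between
# the two tables is not stored anywhere and has to be computed by a join.
rows = conn.execute(
    """
    SELECT o.id
    FROM follow AS f
    JOIN "order" AS o ON o.customer_id = f.followee_id
    WHERE f.follower_id = ?
    ORDER BY o.id
    """,
    (2,),
).fetchall()
```

Even this small query has to reconnect the tables through matching key values at query time.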
The advantages of the relational model lie in its simple and intuitive expression, its strict and elegant mathematical foundation, and the freedom that comes from separating the logical level from the physical level. Without any underlying implementation information, a relational database can freely express and obtain the information contained in an existing dataset with a small number of concise operations (relational algebra has been proved to be equivalent to first-order predicate calculus restricted to safe expressions). However, we can also observe several problems with the relational model: as can be seen from Figure 2, a table is a regular two-dimensional rectangular array. It consists of tuples that contain the same number of indivisible atomic elements, and a single header provides the semantic interpretation for the tuples.
This form is simple and regular but can lead to the following problems:
(i) Flat: a tuple is a flat and restricted structure that can only contain indivisible elements. These elements are regarded as atoms at the model level. They have no internal structure and cannot be nested, which restricts the model's ability to express complex objects and brings about the so-called impedance mismatch problem.
(ii) Rigid: in a table, every tuple must contain the same fixed number of elements, and each element is rigidly coupled with its position, so even if there is actually no value in a position, its place must be filled with the null value.
(iii) Semantic and data separation: table headers, as semantics, and table bodies, as data, are separated. In the theory of the relational data model, table names and column names are defined by a metalanguage, and in a specific implementation, a relational database uses a data dictionary separated from the data to store these metadata. That makes it necessary to process metadata separately before transferring data. This separation of semantics and data makes it difficult to transmit data over a network, whereas data formats such as XML or JSON combine semantics with data and can transmit complete information in one piece.
(iv) Index and data separation: the relational model does not express information about how tuples are located or sorted. To find tuples containing certain values in a table, one has to scan and compare them one by one. This makes the relational model heavily reliant on external index structures in real use. However, indexing is not part of the relational model; it consumes considerable storage space and also incurs maintenance costs.
(v) Data and data separation: whether in the same table or in different tables, tuples are separated from one another. Their connections are implicit in the values of specific data and need to be computed dynamically. Conceptually, this shows that the relational model does not directly express the relationships between entities. To find links between entities, it is necessary to connect tables with the Join operation, which is usually very time-consuming.
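The relational design discussed above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch, not the schema of Figure 2: the table is named orders because ORDER is an SQL keyword, and the sample rows (Alice, Bob, Kettle) and the helper friends_purchases are hypothetical. It shows that answering "what have my friends bought?" requires chaining several joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE orders (id TEXT PRIMARY KEY,
                     customer_id TEXT REFERENCES customer(id));
CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE order_item (id TEXT PRIMARY KEY,
                         order_id TEXT REFERENCES orders(id),
                         product_id TEXT REFERENCES product(id));
-- The m:n "follow" relationship needs its own relation table.
CREATE TABLE follow (follower_id TEXT REFERENCES customer(id),
                     followee_id TEXT REFERENCES customer(id),
                     PRIMARY KEY (follower_id, followee_id));
INSERT INTO customer VALUES ('0001', 'Alice');
INSERT INTO customer VALUES ('0002', 'Bob');
INSERT INTO product VALUES ('p1', 'Kettle');
INSERT INTO orders VALUES ('o1', '0002');
INSERT INTO order_item VALUES ('i1', 'o1', 'p1');
INSERT INTO follow VALUES ('0001', '0002');
""")


def friends_purchases(connection, customer_id):
    """Names of products bought by the customers that customer_id follows."""
    rows = connection.execute(
        """
        SELECT p.name
        FROM follow f
        JOIN orders o ON o.customer_id = f.followee_id
        JOIN order_item i ON i.order_id = o.id
        JOIN product p ON p.id = i.product_id
        WHERE f.follower_id = ?
        ORDER BY p.name
        """,
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(friends_purchases(conn, "0001"))
```

The connections between tuples exist only as matching values, so every such question is answered by computing joins at query time.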

Modeling with NoSQL Models.
These problems in the relational model have prompted the development of NoSQL data models and database products.
(1) Key-Value Store. Let us first look at the simplest of these, the key-value store. Its data model is very simple. As shown in Figure 3, the whole database can be divided into two parts: the set of keys on the left and the set of values on the right. We use arrows to indicate the corresponding one-directional access. In our case, we use the order id as the key, and all information related to an order id is placed in its value. The specific content of a value is determined by the upper application, and the database is only responsible for storing and retrieving it. In theory, a key-value store focuses only on efficient access to data, and values are opaque to the database, so users must parse them by themselves. If only a part of a value is required, the entire value must be extracted and the unwanted content filtered out, which may be inefficient. For this reason, the column family model and the document model add more internal structure to the values.
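As a minimal sketch of the opaque-value behavior described above, the following Python fragment models a key-value store as a dictionary from order id to serialized bytes. The function names (kv_put, kv_get, product_names), the JSON encoding, and the sample order are assumptions made for illustration; a real key-value store may use any value format.

```python
import json

kv_store = {}  # order_id -> opaque bytes; the store never looks inside


def kv_put(order_id, order):
    """Serialize the whole order into one opaque value."""
    kv_store[order_id] = json.dumps(order).encode("utf-8")


def kv_get(order_id):
    """Return the raw value; interpreting it is the caller's job."""
    return kv_store[order_id]


def product_names(order_id):
    """Even for one field, the whole value is fetched and parsed first."""
    order = json.loads(kv_get(order_id).decode("utf-8"))
    return [item["product_name"] for item in order["items"]]


kv_put("o1", {
    "customer_id": "0001",
    "order_time": "2019-01-01T10:00:00",
    "items": [{"product_id": "p1", "product_name": "Kettle"}],
})
print(product_names("o1"))
```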
(2) Column Family. Logically, a column family model can be regarded as a key-value store whose values are themselves made of pairs addressed by a secondary column name, and these secondary pairs can be grouped into column families. As shown in Figure 4, the primary keys on the left are also called row keys, each of which locates a virtual row. On the right, column name strings (characters enclosed in quotation marks) act as secondary keys that locate the values (technically, tertiary keys such as time stamps or version stamps may also be included, but skipping them does not affect our discussion). The prefixes in the column name strings divide them into different column families. The column family model can be regarded as a huge sparse two-dimensional table, which is more expressive than the key-value model. Because columns are represented by key-value pairs, they can be added and deleted freely. In our case, as in the key-value store model, we use the order id as the row key, but the value has a richer structure. We store all customer information in the customer column family and all order items in the order_item column family, and we merge product information into the latter (because each order item relates to exactly one product). Different order items are distinguished by a number in the column key.
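The row-key and column-name structure described above can be sketched as a dictionary of dictionaries. This is an illustrative sketch only: the helper names cf_put and cf_get_family, the colon-separated column names, and the sample values are assumptions, and time stamps are omitted as in the text.

```python
cf_table = {}  # row key -> {column name -> value}


def cf_put(row_key, column, value):
    """Columns are plain key-value pairs, so any row can gain new ones."""
    cf_table.setdefault(row_key, {})[column] = value


def cf_get_family(row_key, family):
    """Read only the columns whose name starts with the family prefix."""
    prefix = family + ":"
    row = cf_table.get(row_key, {})
    return {col: val for col, val in row.items() if col.startswith(prefix)}


cf_put("o1", "customer:id", "0001")
cf_put("o1", "customer:name", "Alice")
cf_put("o1", "order_item:1:product_id", "p1")
cf_put("o1", "order_item:1:product_name", "Kettle")
cf_put("o1", "order_item:2:product_id", "p2")  # sparse: no product_name yet

print(cf_get_family("o1", "customer"))
```

Unlike the opaque value of a key-value store, a part of the row (one column family) can be read without parsing the rest.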
(3) Document. The document model has an even richer value structure than the column family model. As shown in Figure 5, a document database stores and retrieves documents like a file cabinet. These documents contain simple key-value pairs (similar to a key-value store), nested key-value pairs (similar to the combination of row keys and column keys in column families), lists (accessed by sequential numeric subscripts rather than keys), and other nestable contents. This makes the document model even more expressive, and a document can be easily converted into a programming object in an upper application. As with all key-value typed models, the form of documents is flexible, and structures in documents can be added or deleted freely. In our case, all information about customers, orders, and products is included in one document, which looks like an actual order list.
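A document of the kind described above maps directly onto nested Python dictionaries and lists. The sketch below is illustrative: the field names (including quantity and e-mail), the sample values, and the helper get_path are assumptions rather than the content of Figure 5.

```python
order_doc = {
    "order_id": "o1",
    "customer": {"id": "0001", "name": "Alice"},       # nested key-value pairs
    "items": [                                          # list, read by subscript
        {"product": {"id": "p1", "name": "Kettle"}, "quantity": 2},
        {"product": {"id": "p2", "name": "Teapot"}, "quantity": 1},
    ],
}


def get_path(doc, path):
    """Follow a path of keys (for dicts) and integer subscripts (for lists)."""
    node = doc
    for step in path:
        node = node[step]
    return node


# Structures can be added freely, without declaring them anywhere first.
order_doc["customer"]["e-mail"] = ["alice@example.com"]

print(get_path(order_doc, ["items", 1, "product", "name"]))
```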
Generally speaking, all three models above use key-value pairs as the basic structure for organizing data. The models differ in how they structure values, which provides different ways of aggregating information.
Key-value pairs are simple but essential. Keys provide semantics for the values, which decouples data from their positions and eliminates the rigidity of the system. A key-value pair is a self-described whole that does not depend on other pairs in form. At the same time, keys help locate values so that they can be accessed quickly.
This allows key-value pairs to be easily dispersed across a cluster, and their contents and forms can be very free and flexible. We can therefore predetermine all the required content according to the convenience of the upper application and aggregate it for fast access without the Join operation.
That partly solves the problems of the relational model. However, key-value typed models also have some problems:
(i) Values can only be accessed one way, by keys; keys cannot be retrieved from values (the arrows in the figures show the directions and granularities of access for the different models). To find specific key-value pairs by value, it is necessary to build external indexes or use external frameworks such as MapReduce for scanning.
(ii) There is no connection between key-value pairs. Discrete key-value pairs have many advantages, since they can be formed and operated on independently, but we also hope that they can maintain their logical connections (we will see how to achieve this in the subsequent discussion of the n-tier soft set model).
(iii) The form of key-value typed databases is flexible (they are known as schemaless databases), but querying and reasoning (which is what the relational model is good at) are not. The contents of aggregates are prepared and stored for specific needs, and aggregates designed for one application are not necessarily suitable for others, which becomes another kind of inflexibility.
(iv) Key-value typed models have no rigorous mathematical basis. A strict mathematical foundation not only makes the definition and expression of a model more rigorous but also facilitates the theoretical study of the model, the deduction of its properties and theorems (or the use of existing results), and the understanding of its logical soundness and completeness. It also makes it easier to design a concise and general query language (for example, the relational model achieves powerful logical expression with a few operations).
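Problem (i) in the list above can be illustrated with a small sketch. The store contents and the helper names find_by_scan and build_index are hypothetical; the point is only that finding keys from a value needs either a full scan or a separately built and maintained index.

```python
from collections import defaultdict

orders = {
    "o1": {"customer_id": "0001"},
    "o2": {"customer_id": "0002"},
    "o3": {"customer_id": "0001"},
}


def find_by_scan(store, field, wanted):
    """Without an index, every pair must be inspected."""
    return sorted(key for key, value in store.items()
                  if value.get(field) == wanted)


def build_index(store, field):
    """External inverted index: field value -> set of keys.

    It lives outside the store and must be updated on every write."""
    index = defaultdict(set)
    for key, value in store.items():
        if field in value:
            index[value[field]].add(key)
    return index


print(find_by_scan(orders, "customer_id", "0001"))
```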
(4) Graph. Graph models focus on solving the lack of connections in the relational model and key-value typed models. As shown in Figure 6, the graph model consists of nodes and edges. Nodes are connected by edges, which can be directed or undirected. Nodes and edges can have attributes, which makes each look like a row in the column family model or a document in the document model. However, nodes are not isolated but linked together by edges. The main point of graph modeling is not to express the attributes of nodes or edges but to describe the connections between nodes. In our case, in the upper part of Figure 6, the followship network can be clearly expressed and easily queried by using a graph model, which is difficult to achieve with the relational model and other NoSQL models. Being based on graph theory, the graph model has a mature mathematical foundation and a large body of ready-made results (theorems and algorithms), which enables it to deal with connections easily and solve complex problems such as finding the shortest connection path between two nodes. However, when it comes to issues that focus mainly on entities and their attributes (for example, classification or statistical reports), graph models have the same problems as other NoSQL models. For example, in order to compute the proportion of male and female users in a followship network, we still need an external index to locate nodes from attributes, or we must count nodes by scanning the whole network.

Figure 3: Key-value model for E-shopping (key: order_id; value: customer_id, order_time, …, product_id, product_name).

[Figure 4 content: row key order_id; column family customer with columns "customer:id" and "customer:name"; column family order_item with columns "order_item:1:product_id" and "order_item:1:product_name".]
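The two sides of the graph model discussed above can be sketched in a few lines of Python. The followship edges, the gender attribute values, and the function names are hypothetical. Breadth-first search is used here as one standard way to find a shortest path in an unweighted graph; the attribute count, by contrast, has to visit every node.

```python
from collections import Counter, deque

# Directed "follows" edges between customer ids (sample data).
follows = {
    "0001": ["0002", "0003"],
    "0002": ["0004"],
    "0003": ["0004"],
    "0004": ["0005"],
    "0005": [],
}
attributes = {
    "0001": {"gender": "F"}, "0002": {"gender": "M"},
    "0003": {"gender": "F"}, "0004": {"gender": "M"},
    "0005": {"gender": "F"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def count_by_attribute(node_attributes, name):
    """Attribute statistics still require scanning all nodes."""
    return Counter(attrs.get(name) for attrs in node_attributes.values())


print(shortest_path(follows, "0001", "0005"))
print(count_by_attribute(attributes, "gender"))
```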

Modeling with N-Tier Soft Set Model.
Various models and their problems have been discussed above. Now, let us take a look at how to model with the n-tier soft set model (hereafter referred to as the NTSS model).
[…] represented by a pair of domain relations whose names are reverse tuples (such as ("customer_id", "e-mail") and ("e-mail", "customer_id")) and whose values are reverse binary soft sets. The connection is undirected, and the data on both sides of the connection can be accessed symmetrically. For example, in Figure 9, the relation between "customer_id" and "e-mail" is a 1:n relation (one customer can have multiple e-mail addresses). So, the value of the domain relation ("customer_id", "e-mail") is a common soft set, and the value of the domain relation ("e-mail", "customer_id") is a single-valued soft set.
(2) Features and Advantages of the NTSS Model. The whole picture of converting the ER model in Figure 1 to the NTSS model is shown in Figure 10. […] So, if we use a hashtable as the underlying implementation of an NTSS database, the information contained in the keys will be implied in the storage addresses, and the values will be hashed while the logical structure of the database is maintained. In usage, through our formal definitions, an NTSS database appears to upper application programmers simply as a function with a set of well-defined operations and uniform specifications. Referring to the example above, let B be the database soft set that contains the "E-Shopping" database. In an upper programming language, B is just a function whose return values are also functions. Given the parameter "E-Shopping", B("E-Shopping") returns the value (a domain relation soft set) of the database named "E-Shopping", which can still be regarded as a function. Given the parameter ("customer_id", "name"), B("E-Shopping")("customer_id", "name") returns the value of the domain relation between "customer_id" and "name" (still a function). Given a "customer_id" such as "0001", B("E-Shopping")("customer_id", "name")("0001") returns the name of the customer. This is very natural in languages that support functional programming and naturally constitutes a concise query language.
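The function-style access described above can be imitated in Python with a small callable wrapper around a dictionary. This is a sketch under stated assumptions, not the authors' prototype: the class SoftSet, the helper relation_pair, and the sample data are hypothetical, and for simplicity every lookup returns a set, even where the paper would use a single-valued soft set.

```python
class SoftSet:
    """A callable mapping: parameter -> value; a value may be a SoftSet too."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def __call__(self, *parameter):
        # One argument is used as the key itself; several form a tuple key,
        # so B("E-Shopping")("customer_id", "name") reads as in the text.
        key = parameter[0] if len(parameter) == 1 else parameter
        return self._mapping[key]


def relation_pair(a, b, pairs):
    """Build both directions of one connection from (x, y) pairs."""
    forward, reverse = {}, {}
    for x, y in pairs:
        forward.setdefault(x, set()).add(y)
        reverse.setdefault(y, set()).add(x)
    return {(a, b): SoftSet(forward), (b, a): SoftSet(reverse)}


relations = {}
relations.update(relation_pair(
    "customer_id", "name", [("0001", "Alice"), ("0002", "Bob")]))
relations.update(relation_pair(
    "customer_id", "e-mail",
    [("0001", "a@example.com"), ("0001", "alice@example.org")]))

B = SoftSet({"E-Shopping": SoftSet(relations)})

print(B("E-Shopping")("customer_id", "name")("0001"))
print(B("E-Shopping")("e-mail", "customer_id")("a@example.com"))
```

Because each connection is stored as a pair of reverse relations, the same call chain works from either side of the connection.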
First, we show the performance advantages of the NTSS database over a relational database through a comparative experiment. We implemented a prototype database based on NTSS (in Python) and compared it with MySQL 8.0.15 on a computer with a 2.6 GHz Intel Core i7 CPU, 16 GB of 1600 MHz DDR3 memory, and a 512 GB PCI SSD. We built three experimental data tables, Customer, Product, and Buy, to record customers purchasing products. Each time 10,000 records are written, the time consumed by the write is recorded; then the names of the customers who purchased 5 randomly chosen products are queried, and the time consumed by the read is recorded.
When MySQL is indexed, its insertion and read times are O(log n) in theory (because MySQL indexes are usually implemented as B+ trees), while those of NTSS are O(1). From the actual test data, we can see that the insertion and read times of MySQL grow as the amount of data increases, while those of NTSS fluctuate within a stable range.
In terms of space usage, NTSS needs about 2.73 times as much space as a nonindexed MySQL database to store the same data (NTSS: 402 MB; MySQL: 147 MB). However, if MySQL is to be queried more freely (with all columns indexed), its indexes take up about 307 MB, so it occupies 147 + 307 = 454 MB in total, which is more than NTSS.
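The space figures quoted above can be checked with a few lines of arithmetic. The variable names are ours; the three sizes are the ones reported in the text.

```python
ntss_mb = 402          # NTSS prototype
mysql_data_mb = 147    # MySQL without indexes
mysql_index_mb = 307   # MySQL indexes when all columns are indexed

ratio_vs_unindexed = ntss_mb / mysql_data_mb
mysql_total_mb = mysql_data_mb + mysql_index_mb
ratio_vs_indexed = ntss_mb / mysql_total_mb

print(round(ratio_vs_unindexed, 2), mysql_total_mb, round(ratio_vs_indexed, 2))
```

On these numbers, NTSS uses roughly 2.73 times the space of the unindexed tables but somewhat less than the fully indexed MySQL database.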
We do not compare performance with current NoSQL databases. As a prototype implemented in Python, NTSS is not comparable in performance with mature NoSQL databases that have evolved over many years. Compared with current NoSQL databases, the advantages of NTSS are query freedom and mathematical logicality. Take MongoDB as an example: it is a popular database that is widely used in everyday applications and performs extremely well on some queries, but it lacks mathematical logicality and cannot be queried freely (it is strong at querying from key to value but weak at querying from value to key). So, obtaining the relationships between values is costly (it requires an index structure or a traversal scan). The NTSS model, in contrast, has complete mathematical logicality and can be queried freely between keys and values. From an implementation perspective, the NTSS database cannot yet compete with MongoDB because it remains at the prototype level; we expect it to gradually approach current mainstream NoSQL databases through future improvements.
Figure 11: Performance comparison between NTSS and unindexed MySQL (panel: write, MySQL not indexed; series: NTSS write and MySQL write).

Based on the above experiments and previous discussions, we can clearly see that the NTSS model has the following advantages:
(i) Efficient performance: as we have seen in the comparison experiment, MySQL is a relational database whose data and indexes are separate, and its performance depends on the design of its indexes; write and read performance and query convenience cannot all be achieved at the same time. An NTSS database, however, can be transformed into key-value pairs and implemented directly as a hashtable; therefore, any datum in it can be written or read with an average time complexity of O(1).
(ii) Schemaless: the NTSS model represents entities or aggregates as interconnections between domains, rather than as a fixed table. Connections in the NTSS model are logically represented by n-tier soft sets and implemented by key-value pairs in the underlying layer, which are independent of each other and can be added or deleted at will without mutual influence. This solves the flat and rigid problems of the relational model. For example, if we want to split the "name" domain, which is connected to "customer_id", into "first_name" and "last_name", we only need to add two new connections from "first_name" and "last_name" to "customer_id" and delete the original one. This does not affect other parts of the database, either logically or physically.
(iii) Semantic and data integration: the NTSS model represents semantics and data in an integrated way, which makes the data easier to move and disperse. It is no longer necessary to process metadata separately.
(iv) Index and data integration: an instance of the NTSS model is a nested index structure, and each atomic datum has a unique logical access path. The data stored in a database formed by the NTSS model constitute a complete index system in themselves, and every domain in it can be used as an index key to locate data in other domains connected to it, which solves the problem of index and data separation in the relational model. This becomes the key to efficient performance and sufficient connections.
(v) Sufficient connections: the atomic data in an NTSS database are no longer isolated but form a network. In the NTSS model, entity domains are connected to each other, and attribute domains are connected to entity domains. These connections are stored statically in the model, and each connection is bidirectional. This solves the lack of connections in the relational model and the key-value typed models, as well as the one-directional access of key-value typed models.
(vi) Rigorous mathematical foundation: based on n-tier soft set theory, the NTSS model has a rigorous formal definition, which other key-value typed models do not have. This not only makes the NTSS model more precise in definition and expression but also facilitates more in-depth theoretical research. It enables us to infer richer properties (or to use the existing mathematical results on soft sets) and to understand its logical soundness and completeness. It also makes it convenient to design a concise and general query language and to achieve complete logical expressive power with as few operations as the relational model.
(vii) Powerful query ability: through the rigorously defined operations, the fast access brought by index and data integration, and the sufficient connections between data, the NTSS model can be queried as freely and completely as the relational model, but in a big data environment. In the comparison experiment with MySQL, we not only wrote and read key-value pairs but also wrote the same logical structure as in the relational model and implemented the same query as the multi-table join SELECT statement in SQL.
(viii) Convenient for programming: from a programming perspective, all the structures that make up the NTSS model, including tuples, sets, and dictionaries, are built into most programming languages and can be processed natively.
(ix) Easy modeling: the similarity between the NTSS model and the ER model shows that the macroscopic view of the NTSS model is close to the way people naturally think and model, so that modeling can be carried out intuitively.
(x) Convenient for statistical use: each domain can be used as a statistical dimension, and most of the values related to it already form sets that can be obtained directly. For these sets, counts, sums, averages, and other statistical indicators are easy to calculate.
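Several of the advantages listed above, namely hashtable-based access (i), schema changes by adding and deleting connections (ii), bidirectional connections (v), and set-based statistics (x), can be illustrated with a flat dictionary. This is a hypothetical sketch, not the authors' prototype: the class FlatNTSS, its key layout (domain_a, domain_b, value_a), and the sample data are assumptions, and drop_relation scans all keys for brevity where a real implementation would presumably track the keys of each relation.

```python
class FlatNTSS:
    """Flat hashtable: (domain_a, domain_b, value_a) -> set of value_b."""

    def __init__(self):
        self._table = {}

    def connect(self, a, x, b, y):
        """Store the connection x (in domain a) <-> y (in domain b) both ways."""
        self._table.setdefault((a, b, x), set()).add(y)
        self._table.setdefault((b, a, y), set()).add(x)

    def get(self, a, b, x):
        """Average O(1) lookup of everything in domain b connected to x."""
        return self._table.get((a, b, x), set())

    def drop_relation(self, a, b):
        """Delete one connection in both directions; other keys are untouched."""
        doomed = [k for k in self._table if k[:2] in ((a, b), (b, a))]
        for key in doomed:
            del self._table[key]


db = FlatNTSS()
db.connect("customer_id", "0001", "name", "Alice Smith")
db.connect("customer_id", "0001", "gender", "F")
db.connect("customer_id", "0002", "gender", "M")
db.connect("customer_id", "0003", "gender", "F")

# Schema change from item (ii): split "name" into two new connections.
db.connect("customer_id", "0001", "first_name", "Alice")
db.connect("customer_id", "0001", "last_name", "Smith")
db.drop_relation("customer_id", "name")

# Statistics from item (x): the set for a domain value is stored directly.
female_count = len(db.get("gender", "customer_id", "F"))
print(female_count)
```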
Using the conclusions in [3,5–8], we summarize the differences between the relational model, the four NoSQL models, and the NTSS model in Table 1. Through the discussion above, we can see that the NTSS model is indeed a data model suitable for dealing with big data and its 4 Vs. For Volume, an NTSS database is a discrete key-value structure and has natural support for distributed clusters. For Velocity, the underlying key-value implementation provides fast and flexible data processing. For Variety, as a schemaless model, it can be altered at will, making it easy to respond to changing requirements or different data sources. For Value, the complete logical structure between the data is preserved and can be queried freely, and storing set values also facilitates statistics and data mining. Moreover, based on the features of the NTSS model, it is possible to realize an implementation with intelligent data distribution, which can automatically adapt to the status of the cluster, intelligently divide the soft aggregations, and still maintain the semantic and logical structure between the data, without manual sharding design or aggregation design.

Conclusion
The n-tier soft set theory and the n-tier soft set data model have been proposed. We defined them in a strictly formalized way and illustrated the process and design considerations. We explained why and how to use the n-tier soft set model for modeling and described its features and advantages. However, many details have not been covered, such as richer algebraic properties and detailed implementation aspects, which will be addressed progressively in the future.
Nevertheless, we believe that through this paper, we have not only expanded the frontier of soft set theory but also shed light on the promising prospect of developing a new database product based on the NTSS model to meet the challenge of big data. In the future, the database, which currently exists as a Python implementation for theoretical verification, will be rewritten in Scala and open-sourced to improve its capabilities.

Data Availability
The data used to support the findings of this study are available upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.