Grouping Optimization Based on Social Relationships

Grouping based on social relationships is a complex problem since the social relationships within a group usually form a complicated network. To solve the problem, a novel approach which uses a combined sociometry and genetic algorithm CSGA is presented. A new nonlinear relation model derived from the sociometry is established to measure the social relationships, which are then used as the basis in genetic algorithm GA program to optimize the grouping. To evaluate the effectiveness of the proposed approach, three real datasets collected from a famous college in Taiwan were utilized. Experimental results show that CSGA optimizes the grouping effectively and efficiently and students are very satisfied with the grouping results, feel the proposed approach interesting, and show a high repeat intention of using it. In addition, a paired sample t-test shows that the overall satisfaction on the proposed CSGA approach is significantly higher than the random method.


Introduction
Grouping optimization based on social relationships attracts more and more attention as social networks 1-11 have been growing rapidly in recent years.A social network is a special structure made of individuals or organizations .It includes the ways in which individuals are connected through various social familiarities 1 .Through the analysis of relational network and the measurement of social relationships, grouping optimization is obtainable and can achieve a group objective.
In dealing with the grouping optimization problem concerning social relationships, some important issues should be addressed.Firstly, there is a deep discrepancy between the official and the secret behaviour of members 12 .A good approach to the grouping optimization should be able to find out the hidden information behind a group.Relations can be conceptualized at three levels of social complexity: individual, dyad, and group 13 .The approach should be able to disclose the hidden information in these levels.In addition, the approach should be cost-effective.Consequently, a simple method which can explore indeed thinking of individuals is needed.Another issue is that most social networks such as Facebook employ a linear model to measure the links between group members.They use the number of connections that a node has to evaluate degree, betweenness, closeness, and network centralization 6 .However, a nonlinear relation model should be used since the real relationship tends to be nonlinear.For example, suppose that two nodes have the same number of connections.The connections for the first node are all its first choices, while for the second one the connections are its, for example, second and third choices, respectively.The disappointment in being grouped with a second choice partner over a first is unlikely to be linearly related to that of being grouped with a fourth choice over a third.A nonlinear model should be developed to suitably describe the relative importance of relationships.Thirdly, the approach should be able to help organizations improve their performances.Finally, but not the last, many grouping optimization problems subject to some constraints are known as NPhard 14-16 .In addition, the structure of social relations usually forms a complex network, causing the problem to be very difficult to deal with.Another cause to make the grouping optimization more complicated is the constraints that require each group to include some different attributes, such as ability, specialty, gender, and position.
To tackle the above-mentioned complex problem, a new approach based on a combined sociometry 17-21 and genetic algorithm GA 22-28 is employed.For convenience, we call this novel approach CSGA in short.To measure the social relationships between members and to explore the hidden information behind a group, a choice-making method 29 is employed.Group members are first asked to indicate their choices or preferences to other members in an index system that a one stands for the first choice, two the second choice, and the like until an allowed maximum number of choices is reached.Based on the choices, a sociogram can be drawn by using a sociometry tool 17-21 , as illustrated in Figure 1.Previous studies showed that sociometric choices do tend to predict such performance criteria as productivity, combat effectiveness, training ability, and leadership 30 .Moreover, some indices of social status can be calculated.In this paper, a new nonlinear model for measuring social status is presented.From the sociogram and social status indices, the socially vulnerable or the potential bullying members can be found.Then the grouping optimization can be done according to the social status of individuals and the objective of the organization.Results from this study show that the proposed approach is effective in dealing with the grouping problem mentioned above.Furthermore, experiments founded on real data also show high satisfaction and repeat intention with this method.
The remainder of this paper is structured as follows.The grouping problem is briefly introduced in Section 2. In the subsequent section, the sociometry is briefly introduced and a new modified model for measuring relations between members is presented.Then proposed approach is presented in Section 4. Subsequently, results and discussion are presented.Finally, concluding remarks are drawn in Section 6.

The Grouping Problem
The grouping problem belongs to a large family of problems which partitions a set of items U into a collection of mutually disjoint subsets U i of U, such that ∪U i U and U i ∩ U j ∅, i / j.In this study, the grouping is to optimally assign m members individuals to g subgroups with n members in each subgroup.Let M {1, 2, . . ., m} be a set of members, G {1, 2, . . ., g} a set of subgroups, P {1, 2, . . ., n − 1} a set of partners of a member, A {1, 2, . . ., a l } a set of attributes or position , and C {1, 2, . . ., N p } a set of choices, where m is the number of members in a group, g is the total number of subgroups, n is the number of members in each subgroup, and a l is the number of attribute l.The members can list a number of preferred members with whom they would like to work with or sit beside by an index system that a "1" stands for the first choice, "2" for the second choice, and the like, up to a preassigned maximum integer N p .Note that a member cannot select himself or herself as his or her own partner.Slavin suggested that the best group size is from two to six 31 .For i ∈ M, k ∈ P , define c ik as the preference given by member i to being grouped with his/her partner k and c ki as the preference given by partner k to being grouped with member i.If partner k is the first choice of member i, c ik is equal to one, the second choice, c ik is equal to two, and the like.A lower coefficient of preference means that a member has more preference with his/her partner.If member i does not include his/her partner k in their list of preferences, then c ik is assigned a relatively large penalty value b.The mathematical formulation, hence, can be expressed as follows: z il 1 if member i has an attribute l, 0, otherwise, 2.5 where R i k∈P {f c ik f c ki }, for all i ∈ M. The objective is to minimize the total scoring value, as shown in 2.1 .Note that the scoring value is composed of two main parts: the priority weight w i of member i and scoring function R i .The priority weight can be assigned by two ways: directly assigned by the manager of the organization or calculated based on the social relationships.As for the scoring function f • , a popular function used is the squared function 29 .Equation 2.2 requires that each member is assigned exactly one subgroup.Equation 2.3 ensures that one subgroup is composed of n members.Equation 2.4 requires that there are exact a l members with attribute l in a subgroup.Note that a l depends on attribute l.For example, if a basketball team is composed of a center attribute 1 , two forwards attribute 2 , and two guards attribute 3 , then a 1 1, a 2 2, and a 3 2, respectively.
As the number of members in a group increases, the number of possible grouping outcomes, u, grows at an unacceptably rapid rate; it grows exponentially.For example, for n 2, u 945 when m 10, u 6.55E 8 when m 20, and u 6.19E 15 when m 30.The optimal solution cannot be found in polynomial time.In addition, the relationship structure becomes a complex network as m increases.To deal with the optimization problems concerning a complex network, it is impractical to use an approach like the exhaustive method since it takes a huge amount of computation time.Instead, some useful algorithms such as a genetic algorithm 22-28 for finding approximate or nearoptimal solutions or strategies for improving efficiency, solution quality, exactitude, and more have been presented 32-35 .In this paper, a genetic algorithm 22-28 , which is proven to be very effective in dealing with complex optimization problems, is employed to find feasible solutions.

The Sociometric Analysis
Sociometry was developed by Moreno in 1934.Moreno defined sociometry as "the inquiry into the evolution and organization of groups and the position of individuals within them" 17 .It is a quantified method aiming at measuring and determining social relationships in groups 18 .Sociometric explorations disclose the hidden structures that give a group its form: the alliances, the subgroups, the hidden beliefs, the ideological agreements, the stars of the show, and so on.One of Moreno's innovations in sociometry was the development of the sociogram, a systematic method for graphically representing individuals as points/nodes and the relationships between them as lines/arcs 19 .Given choices or preferences within a group, sociograms were developed and the structure and patterns of group interactions can be drawn on the basis of many different criteria 19 : social relations, channels of influence, lines of communication, and so on.Some terms in sociometry are the following: 1 Stars: those who have many choices; 2 Isolates: those with few or no choices; 3 Mutual Choice MC : individuals who choose each other; 4 One-Way Choice: referring to individuals who choose someone but the choice is not reciprocated; 5 Cliques: groups of three or more people within a larger group who all choose each other.
There are many indices using values instead of categories to indicate the social status of individuals.Among the most popular indices, three indices are introduced in the following paragraphs.

Status Score Index (SSI)
One of the most critical events in the history of sociometric methods is the use of both positive and negative nominations 13 .The difference between the positive and negative nominations is called status score 20 .To easily figure out the social status of a member, a relative or a normalized score called SSI was developed and defined as where N TC denotes the total choice number by other members, N TR denotes the total rejection number by other members, and m is the total number of members in a group.Note that a member cannot choose himself or herself.Thus, m − 1 rather than m is used in 3.1 .The value of SSI is in the range −1, 1 .A higher value of SSI means that a member is more popular within a group.However, this index considers only one-way choice, and thus one cannot understand the mutual interaction between members from this index.

Index of Sociometric Status Score (ISSS)
To consider the mutual choices between members, an index called ISSS 21 is defined as ISSS 1 2 On the right part of the equation, N MC and N MR represent the number of mutual choice and the number of mutual rejection, respectively.N p is the allowed maximum number of choices that a member can make.The value of ISSS is also in the range −1, 1 .A higher value of ISSS indicates that a member is more popular within a group.

Modified Index of Sociometric Status Score (MISSS)
ISSS uses a linear function to express relationship.However, it fails to consider the relative importance of different choices.For instance, satisfaction about being grouped with a first-, a second-, or a third-choice partner should be different, and the satisfaction difference between the first to the second and the second to the third is generally quite unlike.Consequently, a modified model for measuring the status score should be presented.In consideration with this relative importance of different choices, a new index called MISSS is developed and defined as

3.3
MISSS is composed of four parts: the contributions from the total choices, the total rejections, the mutual choices, and the mutual rejections, respectively.z ∈ {0, 1} and z pi , z qi , z ri , and z si 1 if a member i is chosen by other members, rejected by other members, choose mutually with other members, and reject mutually with other members, respectively.w p , w q , w r , and w s are their priority weights, respectively.The value of MISSS is in the range −1, 1 .A higher value of MISSS means that a member is more popular within a group.MISSS gives a more real description of relative importance of relationships.An important issue about MISSS is how to express the relations in the equation.A scoring function f • , which is a function of choice preference , is used to measure the relations.A simple method to find out the function is the use of a questionnaire asking members to provide disappointment scores of being grouped with a second-, third-, fourth-, or fifth-choice partner relative to receiving their first-choice one.To illustrate the nature of the scoring function, choices were collected from two classrooms in a college and the values were calculated.The results are shown in Figure 2.
The scoring functions appear nonlinear rather than linear.As mentioned in Section 1, a linear scoring function is unreasonable.To measure the social relationships, a nonlinear model should be used.

The Procedure
The proposed CSGA approach is illustrated in Figure 3.
To begin with, the choices of members are collected and aggregated.A questionnaire is developed to collect the choices of the members and the reasons for choosing their partners.The questions are "whom in the group do you want to have a lunch or dinner with?" as a selection of lunch/dinner partner and "whom in the group do you want to sit beside and discuss with?" as a selection of learning partner.Table 1 illustrates a part of the questionnaire.
To be fair with all members, each member should fill in the same number of choices.Chen et al. 29 showed that a fewer number of choices filled in will have more chances to be assigned to their top choices.After aggregating the choices, a sociometric tool is employed to draw the sociogram and some social indices are measured.
If the remainder of m/n q / 0, n − q dummy members will be added and then grouping is performed using GA.Otherwise, GA is directly employed to group members.Adding dummy members does not influence the optimization since doing this only adds a constant value in the objective function.Note that the optimization can be done by different priority weights or the same priority weights.A higher value of the priority weight means that a member has a higher priority to be assigned his/her first choice.In the case of the same priority weight, all members have the same chance to be assigned to their top choices.

The GA Structure
The GA flowchart is illustrated in Figure 4.The details are depicted in the following paragraphs.

Encoding
The encoding of a chromosome is illustrated in Figure 5. Since there are m members, the number of genes is thus equal to m.Each gene is assigned a number which stands for a member.As illustrated in Figure 5, the values of the genes are in the order of 5, 2, 6, 10, . .., and 4. If n 2, that is, a subgroup has two members, then member 5 and member 2 are at the same subgroup, member 6 and member 10 are at the same subgroup, and the like.

Initialization of Population
The random method was employed to generate the initial solutions.Before producing the initial solutions, a mutual choice table was established.Table 2 illustrates the mutual choice.
A member say the 1st member is first randomly selected, and the program checks with whom they choose mutually.Then from the mutual choice table, possible partners are 2, 6, and 7. Of these three possible partners, one is randomly selected to be a partner of the 1st member.

Evaluation of Fitness Function
To evaluate the chromosomes, the fitness value for each chromosome in the population was computed.The function used to measure the fitness is A lower fitness value represents a better grouping, and the optimal fitness value is kept.After producing new chromosomes, we can evaluate the chromosomes based on the members' preferences.The fitness function contains w i , the scoring function f • , c ik and c ki , where w i represents the priority weight of student i, c ik represents the preference indicated by member i to partner k, and the like.If a member is grouped with a partner which is not in his/her preference list, the GA program will give c ik a sufficiently large value b.As for the scoring function f • , a common used function is the squared function 29 .For example, if a squared scoring function is employed, w i 1, and a subgroup is composed of three members whose partners are their 2nd and 3rd, 1st, and 3rd, and 2nd and 2nd choices, respectively, then the total score is 2 2 3 2 1 2 3 2 2 2 2 2 4 9 1 9 4 4 31.After computing the fitness of a subgroup j, the fitness of the chromosome is calculated as

Selection
The Roulette Wheel Method 25 was employed to select fitter individuals.The fitness values of all the chromosomes in the population were first calculated and were then sorted.A fitter chromosome with a lower value of fitness has a higher probability to be selected.For example, if there are 50 chromosomes in the population.The chromosome with the lowest fitness value has a probability of 50/ 1 2 3 • • • 50 to be selected, the second lowest has a probability of 49/ 1 2 3 • • • 50 to be selected, and the like.

Crossover
Two chromosomes were randomly selected first.The fitness values of these two chromosomes were compared and from the better one some genes were randomly selected and placed at the beginning positions of the offspring chromosomes, as illustrated in Figure 6.The number of the selected genes is the same as the number of members per subgroup, n.The rest of the genes in the offspring chromosome are filled in the order of fitter genes by the greedy method.

Mutation
In this study, the swap method was employed to mutate.The method is illustrated in Figure 7. Two genes were randomly selected and then their values were interchanged.The swap method has the advantage of avoiding value duplication.Since a member is exactly assigned one subgroup, the values in the genes should be different.

Elitism Strategy
In order to preserve the best chromosome in every generation, a simple elitism strategy 23 was employed.The best chromosome of each generation was duplicated to the next generation to ensure that it was preserved in the present population if it was the best when compared with other chromosomes of the population.This strategy assures that the best chromosome of each generation will be, even if not better, at least equal to the best chromosome of the previous generation.In addition, the elitism strategy did not lead the GA to converge prematurely since it was applied only to one chromosome of each generation.

Termination Condition
The termination condition we use in this study is the generation number defined by the user.The calculation will be repeated until the number of generation reaches the preassigned value.Once the termination criterion is satisfied, the solution is displayed with a chart.We can clearly know the result from the chart.

Results and Discussion
To numerically investigate the influences of grouping parameters such as the number of member per subgroup, the choice number, and the number of member in a group on the grouping results, a GA program was developed by Microsoft.NET 4.0.The program was run on an ASUS K52Jc Series notebook with Intel R Core TM i5 CPU M460 @2.53 GHz and 1.86 GB RAM.The operating system is Windows 7.
To evaluate the effectiveness of the proposed approach, three real datasets including three classes of undergraduate students were tested.The numbers of students are 54, 55, and 32, respectively, as shown in Table 3.For the dataset C, the method of grouping at the first time was by random and the second is by CSGA.A comparison can thus be made between these two grouping methods.For the dataset A, a maximum allowed number of choices, N p , was set to be 9 so that the complexity of relationship network can be observed.Students were asked to show their preferences to the dining partners in dataset A and learning partners in datasets B and C.
For easy description, the following variables are defined.The population size is represented as N size , the generation number as N g , the crossover rate as R c , and the mutation rate as R m .In addition, we use F min as the minimal fitness value in each generation, F best as the best solution at each trial.In each case, the GA program performs 30 trials.We designate the best result of 30 trials as OPT, their average value as AVG, and the coefficient of variation as C v .

Validation of GA
To ensure the validation of the GA program, some tests were made.The results from GA were compared with those from the Branch and Bound BB method that can obtain optimal solutions.The results are shown in Table 4.When the number of members m is small, the GA program can obtain optimal solutions that are the same as those from BB, indicating that the GA program is valid.As m increases, the BB method requires a huge number of computation time.For example, for m 12 and n 2, the required time is about 1100 minutes or 18.3 hours.However, the average runtime for GA is about 1.64 minutes, much smaller than that for BB shown in Table 4.

The Influences of Genetic Parameters
To evaluate the influences of genetic parameters on the results, a two-stage approach was used.The generation number and population size were decided at the first stage, while at the second stage crossover rate and mutation rate were decided.The variation of minimal fitness value at each generation with different population sizes, N size , is shown in Figure 8.As we can see from the figure N size 20, N g 30000 can obtain good results.At the second stage, the mutation rate as well as the crossover rate were changed and tested.Experimental results show that the best parameters are R c 0.9 and R m 0.05.In the following experiments, therefore, R c 0.9 and R m 0.05 were used.

The Influences of Choice Number
To easily observe the solutions, the sociograms relationship structure are drawn and shown in  member is represented with a circle, while a male member is represented with a square.From this figure we can see that the relationship network is quite complicated, especially for those with a large value of N p .An optimal grouping is hard to be achieved by intuition or exact search methods.

The Influences of the Number of Members in a Subgroup
As the number of members in a subgroup n increases, the computation time will also increase.
To observe the influences of n on the results, n was changed from two to five.Note that in Table 5 the number of choice N p was fixed at 5, whereas in Table 6 the number of choice N p was set to be equal to n.

The Influences of the Number of Members m in an Organization
As the number of members in a group m increases, the complexity of grouping also increases.An apparent influence is the increase of the computation time, as illustrated in Table 7.Note that the computation time increases linearly with m, showing the time efficiency of the GA program.Originally, the computation time grows exponentially.Consequently, GA is an efficient approach to solve grouping optimization problems considering social relationships.

Homogeneous versus Heterogeneous Grouping
A good grouping approach should be able to meet the requirements of a group, such as a homogeneous or heterogeneous grouping.To investigate the effectiveness of the proposed

The Influences of Scoring Function and Penalty Value
The penalty value b was changed to investigate its influences on the grouping results.The results are shown in Table 8.From the table we see that the effect of penalty value b is not very apparent.As for the effect of the scoring function, a squared function converges sooner than a linear function, as illustrated in Figure 11.

Overall Satisfaction and Repeat Intention
A survey was conducted to understand the overall satisfaction and repeat intention of using the proposed approach after the learning partner groups were decided.A typical five-level Likert item was employed.The question is "In overall, are you satisfied with the method?" and the format of the item is "very satisfied, satisfied, neither satisfied nor dissatisfied, dissatisfied and very dissatisfied."Most of the students are satisfied or very satisfied with the new method.The average points are 4.39 and 4.28, respectively, in dataset B and dataset C, indicating that the students have high satisfaction with the proposed method see Table 9 .
As for repeat intention, the question is "Do you agree to use the new method to select learning partners next time?"and anchored by "strongly agree," "agree," "neither agree nor disagree," "disagree," and "strongly disagree."The average points of this question are 4.52 and 4.09, respectively.The high values indicate that the repeat intention of using the proposed method is also very high.The high overall satisfaction and the high repeat intention, in part, reflect the effectiveness of the present approach.A paired sample t-test was performed on dataset C to see if there is no difference between these two methods.Hypothesis 1.There is no difference in satisfaction between using the CSGA approach and using the random method.
The result shows that the overall satisfaction of using the proposed CSGA approach is significantly higher than using the random method, as shown in Tables 10-12.

Conclusions
In this paper, we have employed a novel approach to group members based on their social relationships.Members are first asked to show their preferences choices to partners with whom they would like to be together in an index system, where a one stands for the first choice, two for the second choice, and so on up to a preassigned number.Subsequently, the choices of members are aggregated and a combined sociometry and genetic algorithm CSGA scheme is employed to optimize the grouping.To investigate the effectiveness of the Mathematical Problems in Engineering proposed approach, three real datasets collected from a university were used.In addition, a nonlinear model is developed to calculate the social scores of individuals.Experimental results show that CSGA can optimize the grouping effectively and efficiently and students are very satisfied with the grouping results and show high repeat intention of using it.Moreover, a paired sample t-test shows that the overall satisfaction on using the proposed CSGA approach is significantly higher than using the random method.
For further studies, it is recommended to apply the combined sociometry and genetic algorithm approach to solve other kinds of grouping optimization problems.Methods of rating or paired comparisons as well as peer nominations can be used to measure social relationships.

Figure 1 :
Figure 1: A social network structure.

Figure 2 :
Figure 2: Characteristics of preferences among surveyed students.

Figure 4 :
Figure 4: The flowchart of the GA program.

Figure 5 :
Figure 5: Representation of a chromosome.

Figure 7 :
Figure 7: Mutation operation was performed by the swap method.

Figure 9 .N size = 20 N size = 30 NFigure 8 :
Figure 8: The variation of minimal fitness value in each generation with different population size, N size .

Figure 10 :
Figure 10: The grouping results are illustrated in a sociogram.

Figure 11 :
Figure 11: The variation of F min with generation at different scoring functions.

Table 1 :
An illustration of the questionnaire for collecting members' choices.

Table 2 :
A mutual choice table.

Table 3 :
The datasets tested.

Table 4 :
Comparison of results between Branch and Bound and GA.
* a huge amount of computation time is required if compared with GA.

Table 5 :
The influences of the number of members in a subgroup N p 5, N g 10, 000 .

Table 6 :
The influences of the number of members in a subgroup N p n, N g 10, 000 .

Table 7 :
The influences of the number of members n 2 .

Table 8 :
The Influences of scoring function.

Table 9 :
Overall satisfaction and repeat intention of the CSGA approach.wereperformedwith different priority weights, which were based on MISSS.The result is illustrated in Figure10.Members grouped into the same subgroup are circled.A solid line between two members means that members choose mutually.A dashed line with an arrow, on the other hand, represents a one-way choice, where the arrow-headed direction is the choice direction.The grouping can be done with different priority weights based on MISSS.For homogeneous grouping, MISSS can be used as a priority weight.For heterogeneous grouping, MISSS can be sorted and then divided into several parts, with the first part composed of the highest MISSS values, the second part composed of the second highest MISSS values, and the like.Using the proposed CSGA approach, homogeneous or heterogeneous grouping can be easily performed in fulfilling the requirements of a group.

Table 12 :
Significance of paired sample t-test.