Recognition of Design Fixation via Body Language Using Computer Vision

The main objective of this study is to recognize design fixation accurately and effectively. First, we conducted an experiment in which the design process and design sketches of 12 designers were video-recorded for 15 minutes each. Then, we analyzed the designers' body language in the videos, correlating it with the presence of design fixation as judged by a panel of six experts. We found that three types of body language were significantly correlated with fixation. We propose a two-step hybrid model for recognizing design fixation from body language. In the first step, a body language recognition model was constructed using transfer learning from a pretrained VGG-16 convolutional neural network; it achieved an average recognition rate of 92.03%. In the second step, the frames of recognized body language were used as input vectors to a fixation classification model based on a support vector machine (SVM), which achieved an average recognition rate of 79.11% for the fixation state. The impact of this work could be that fixation can be detected as it is happening, not only from the sketch outcomes but also by monitoring the movements, expressions, and gestures of designers.


Introduction
Administrators, professionals, teaching staff, and statesmen have indicated that innovation is the key to our future [1]. Innovative products can make headlines and generate significant economic returns [2]. The focus of innovation is always on designers generating new ideas for products and/or services [3]. More and more scholars have adopted an innovative method called "design thinking" [4]. However, design fixation is an inevitable phenomenon that negatively affects design results, especially in the conceptual stage of the design process [5]. When fixation occurs, designers tend to focus on the features of existing designs and produce new designs similar to previous examples, which reduces the novelty and diversity of ideas [6,7]. In recent years, design fixation has attracted increasing research attention, especially in the field of design. Researchers around the world are studying its causes, influencing factors, effects, and indicators [5,8]. Some methods have been developed to combat design fixation [5]. Some studies suggest that the representation of triggers can effectively reduce design fixation and improve creativity [5,7-10]. Tseng et al. [11] found that the effectiveness of a trigger in design depends on when the inspiring information is given. Design fixation is hard to combat because it occurs unconsciously [7]. Designers, even those who study and teach design regularly, do not know when they are being influenced or fixated by inadequate or misleading information [8].
Existing literature provides very limited insight into how to recognize fixation effectively. To address this gap, this study focuses on recognizing design fixation from body language during the ideation stage of the design process.
This paper proposes a body-language-based method for recognizing design fixation and presents a controlled experiment with designers in which the design process and the design sketches were recorded separately for 15 min. From the recorded videos, we selected eight common types of body language observed during the design process, and we then identified three types that differed significantly between fixation and nonfixation. We converted the videos to images and used them as input vectors to a body language recognition model based on the VGG-16 convolutional neural network. In addition, the frames of the three types of body language during each sketch were extracted and used as input vectors to a design fixation classification model based on a support vector machine (SVM).

Design Fixation.
The term "fixated" first appeared in the experimental psychology literature, referring to a person's subconscious focus on a particular aspect of an object or task at the expense of other aspects [12]. Definitions of fixation differ with the context of human activity, field of knowledge, or design objectives [5]. In 1991, Jansson and Smith first proposed the concept of "design fixation" [6,13], extending fixation research from psychology to design. Design fixation was described as "the blind, sometimes counterproductive, adherence to a limited set of ideas in the design process." Adherence to a limited set of solutions reduces the novelty and variety of ideas. Previous research has shown that introducing graphical examples of existing solutions limits a designer's ability to generate new solutions [14]. Jansson and Smith [13] showed designers an example solution to a problem and found that this reduced the designers' propensity to move effectively between the conceptual space (of abstract ideas) and the configuration space (of potential solutions). Interdisciplinary interest in design fixation from designers, cognitive scientists, engineers, computational modelers, architects, educators, and others in the emerging field of design science has resulted in increasingly broad definitions of design fixation [15,16]. In a narrow sense, design fixation can be considered as designers' overreliance on certain functions and features [16], or their tendency to adjust a new design scheme to cohere with a familiar design paradigm [17]. In a broad sense, design fixation can refer to any cognitive intervention that influences design results [18] or any method that can influence design activities [19]. The broader definition holds that design fixation is a manifestation of a low level of creativity [20].
However, what is entirely consistent is that fixation is framed as an unfavorable phenomenon, with most of the studies presenting ways to avoid, mitigate, or overcome it [12].
Some methods have been developed to combat design fixation. Youmans and Arciszewski [10,16] argue that increasingly broad definitions of the phenomenon might be undermining empirical research efforts, educational efforts to minimize fixation, and the acquisition and dissemination of transdisciplinary knowledge about fixation effects. To address these issues, they sorted fixation phenomena into six classifications. They then proposed a system of orders of design fixation and recommended targeted methods for reducing fixation in inventive design. Moreno et al. [5] reviewed defixation approaches and the metrics employed to understand and account for design fixation, and then explored the related ideation approach of design-by-analogy (DbA) to overcome design fixation. Sio et al. [21] conducted a meta-analytical review of design studies examining whether, and under what conditions, the presence of examples induces fixation or inspiration. Crilly [3] pointed out that recognizing fixation episodes and reflecting on them has been described as the means by which designers could guard against such episodes in the future and thus be more creative. In 2017, Crilly and Cardoso [22] held an international workshop that outlined nine questions to stimulate renewed thinking about fixation and promote debate about where research should head next. In 2019, Crilly [23] advocated methodological diversity and theoretical integration in design fixation research.

Body Language in Design.
Behaviorism is primarily concerned with observable behavior, as opposed to internal events such as thinking and emotion. Design fixation, by contrast, is an internal mental state that is very difficult to identify directly. We hypothesize that this internal fixation state can be mapped through external body language.
Body language, a form of nonverbal cue used almost entirely subconsciously in human-human interaction, accounts for more than half of human communication [24-26]. Body language has many different information channels, for example, facial expressions, gestures, body movements, eye gaze, head movement, posture, and distance [27-29]. Probably the two most relevant aspects of body language for design research are its ability to enact physical concepts and ideas and its ability to communicate emotion [29,30]. Some scholars have studied design thinking through body language. Bezawada et al. [31] proposed a machine learning model with automatic facial feature extraction for predicting designers' comfort with engineering equipment during prototype creation.
Through a brief overview of existing work on the role of body language in engineering design, and of common tools for body language analysis beyond engineering design, Wulvik et al. [29] called for further study of body language in engineering design contexts using automatic data-gathering tools. Cash and Anja [30] explored the many roles of gesture in communicating design concepts by observing and video-coding four teams of engineering graduates during an ideation session. Sun et al. [32] reported an experiment that recorded participants' eye movements to analyze their perception and examine whether designers' perception during sketching accords with creative segment theory.
Although previous research has used expensive and intrusive devices such as physiological sensors, automatic image-based gesture recognition has become the most common approach because cameras are cost-efficient and noninvasive. Zhao [24] combined image information from head body language to recognize emotional and cognitive states using soft computing techniques. Behoora et al. [33] proposed a machine learning method for recognizing the emotional states of individual design team members from captured skeletal joint images of body language. The methodology uses the link between body language and emotions to detect emotional states with accuracies above 98%. Recently, deep learning has greatly improved the accuracy of image-based gesture recognition. Yun et al. [34] proposed a data-driven engagement recognition method based on convolutional neural networks (CNNs) that uses only facial images from input videos.
Crilly and Cardoso [22] called for more objective ways to capture design activities during fixation experiments, leading us to the hypothesis that design fixation might be correlated with gesture.
This manuscript focuses on detecting design fixation states from body language using computer vision and transfer learning techniques. Python and Keras were used to implement the proposed methods and evaluate the results.

Summary and Breaking New Ground.
The CNN was first proposed by LeCun et al. [35] and applied to handwriting recognition. It is a variant of the multilayer perceptron (MLP) and has been widely used in image recognition [36,37]. A CNN can extract features directly from input images, avoiding the information loss caused by traditional manual feature extraction methods [38].
It is clear from this literature review that experiments have been deployed to study fixation effects, their underlying causes, and strategies for alleviating fixation in designers. We build on these studies and their results in this paper. Although much progress has been made, much fertile ground remains to be explored, especially in the automatic recognition of design fixation. Our body-language-based method addresses these limitations: building on previous research and on CNNs, we propose a two-step hybrid model for design fixation classification.
The main objective of this study is to use multiple subtle behavioral cues as indicators of the design fixation state, establishing the relationship between typical body language during the design process and the fixation states judged from the sketch results. We conducted an experiment in which we recorded videos of participants generating ideas for a very simple design problem. Using expert judgment, we identified the ideas that resulted from design fixation. We found that when participants fixate, they use certain body gestures, and these gestures show a significant relationship with design fixation. Further, we develop a CNN-based computational model to recognize design fixation in future design tasks.

Materials and Methods
In this study, the two-step hybrid model was used to recognize design fixation. Before building the recognition model, we needed to identify the types of body language that differ significantly between fixation and nonfixation. The model was applied to the shape design of a mug. The conceptual framework of the study is shown in Figure 1.
There were four male participants and eight female participants. All were undergraduate students recruited from senior design classes, majoring in industrial design, with basic design skills and experience in product design. The participants were paid as compensation at the end of the experiment. They were instructed not to discuss any aspect of the experiment with their classmates to avoid bias.

Equipment and Materials.
This experiment was carried out in the industrial design laboratory; the experimental environment is shown in Figure 2. To help subjects concentrate on the design task, the room was kept quiet and no one was allowed to enter or leave during the experiment. In Figure 2, "a" represents the camera recording the video of the sketches, "b" represents the camera recording the video of body language during each design process, and "c" represents the green background.

Design Task.
The purpose of this experiment is to extract the body language of the subjects during the design process, rather than to test design ability, so the design task should not be too difficult. To ensure that design fixation would emerge, we chose a common product; to allow a variety of shapes, we chose a simple product. Thus, we chose the mug as the design task.
All subjects completed the same design task individually. The design task given in this study is displayed in Figure 3.

High-Definition Camera (HD Camera).
Two cameras (Logitech C270, Suzhou Logitech Electronics Ltd., China) were used to record the videos of the design process and of the sketches separately. Camera a (Figure 2), which recorded the sketches, was fixed directly above the desktop; its coverage is shown between the green dotted lines in Figure 2. Camera b (Figure 2), which recorded body language during the design process, was placed in front of the subject; its coverage is shown between the purple dotted lines.

Green Background.
To make the body language easy to recognize, the video background was set to green (c in Figure 2). The green screen background was made from green fabric (100% polyester). During video acquisition, the subject always remained within the range of the green screen.

Open Broadcaster Software (OBS).
The videos were recorded with Open Broadcaster Software (OBS; version 21.0.1), shown in Figure 4. Figure 4(a) shows the picture taken by camera a, and Figure 4(b) shows the picture taken by camera b. The focusing distance, exposure, white balance, color, and other camera settings can be adjusted manually to ensure good video quality. Moreover, H.264 high-definition video encoding was used for storage to ensure the clarity of the video.

Experimental Protocol.
The experiments were carried out over 12 days in February 2018, from 10:00 to 11:00 each day, with subjects scheduled in random order. Each experiment was divided into four stages: welcome, preparation, task, and final.
During the welcome stage, the design task, procedures, and equipment used in the experiment were introduced to the participants. The design task instructions were read to the participants, and they were given a brief tutorial on how to complete the task. All participants signed a consent form and completed a questionnaire asking about their age, height, and weight.
During the preparation stage, the subject was asked to take a seat as shown in Figure 2. A 3 min rest allowed him/her to relax and become familiar with the design task and environment. The positions of the desk, chair, pen, and paper were adjusted to suit the subject, and enough paper and pens were prepared for the task stage. Then, the position and angle of the two cameras were adjusted to ensure full coverage of the subject (Figure 5).
During the task stage, after the subject fully understood the experimental task, he/she was required to complete the design task within 15 min. Meanwhile, the experimenter started OBS to record the videos from the two cameras synchronously. The subject was reminded when 5 min remained. During the final stage, after 15 min, the experimenter saved the collected video data. Then, the sketches drawn by the subject were numbered and archived in order.
Mathematical Problems in Engineering

Manual Video Coding.
Studies of the role of body language in engineering design rely heavily on manual video coding, and most of them focus on the roles gesture plays in design activities [29,30,39]. In this study, after video data collection, we sorted out the body language in the design activities through manual video coding. Body language can be broadly divided into four categories: gesture, facial expression, eye contact, and posture [33,40,41].
Thus, during the behavioral analysis of the design process, we paid attention to the movements and gestures of the hands, eyebrows, mouth, and head.
During manual coding, to avoid observer bias, reliability should be assessed by comparing the observations of different coders [42]. The video recordings were segmented according to the participant's behaviors: a behavior starts when the coder detects its appearance, and it ends when a new behavior is detected. All videos were coded by one independent coder, a postgraduate student majoring in industrial design. A second independent coder analyzed 25% of the data for coding validation. Coding was time-consuming, taking each coder approximately 1-2 h per recording. When the agreement between the two independent coders exceeds 85%, credibility across the data set can be ensured [42].
Note that when two behaviors overlapped, we ended the recording of the earlier behavior and started recording the later one. Two consecutive drawing behaviors were separated using the simultaneous sketch video recorded by camera a: a drawing behavior starts when the participant starts drawing a sketch, and it ends when the sketch is completed.
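The segmentation rule above (each behavior runs until the next one is detected, so an overlapping behavior closes the previous one) can be expressed as a short function. This is a minimal sketch for illustration only: the coding in this study was done manually, and the event format (onset time in seconds plus a behavior code), the name `segment_behaviors`, and the example timestamps are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, str]           # hypothetical format: (onset time in seconds, behavior code)
Segment = Tuple[float, float, str]  # (start, end, behavior code)


def segment_behaviors(events: List[Event], video_end: float) -> List[Segment]:
    """Convert coded behavior onsets into consecutive, non-overlapping segments.

    Each behavior ends where the next detected behavior starts, so a behavior
    that overlaps the previous one simply closes it; the last behavior runs
    until the end of the video.
    """
    ordered = sorted(events)  # order events by onset time
    ends = [onset for onset, _ in ordered[1:]] + [video_end]
    return [(onset, end, code) for (onset, code), end in zip(ordered, ends)]


# Two drawing behaviors (told apart using the camera-a sketch video)
# separated by a brief head touch; timestamps are illustrative.
events = [(0.0, "drawing"), (45.0, "drawing"), (42.5, "touching head")]
print(segment_behaviors(events, video_end=60.0))
# [(0.0, 42.5, 'drawing'), (42.5, 45.0, 'touching head'), (45.0, 60.0, 'drawing')]
```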

Fixation Evaluation of Design Sketches
(1) Evaluation Methods. Subjective evaluation is a common method in psychological experiments, with the advantages of low cost, a simple procedure, and noninvasiveness [43]. In this study, six professional staff members from industrial design companies formed an expert panel. They used their professional judgment to evaluate whether each sketch was fixated.
Before the experiments, the experts were trained to look for the attributes or metrics that identify fixation: sketches with lower variety, novelty, and originality indicate fixated designs [5,44,45]. We used a picture of the most common mug (Figure 3) as the stimulus to fixate upon, and the experts were trained to find the sketches fixated on this stimulus.
The evaluation sessions were carried out over 12 days in March 2018, from 09:00 to 12:00 each day, with the subjects' sketches evaluated in random order. Each day, the six experts took turns voting on which of one subject's sketches had been produced under design fixation, so that the evaluations were conducted independently and without influence from one another. Each evaluation session was divided into four stages: welcome, preparation, task, and final.
During the welcome stage, the experimenter provided the expert with enough sticky notes. The expert was asked to determine which design sketches had been produced under design fixation and to put a note on each fixated sketch.
During the preparation stage, the experimenter attached one subject's sketches to the wall in chronological order. Each day, the sketches of one randomly selected subject were posted.
During the task stage, after the expert fully understood the evaluation process, he/she was required to complete the evaluation within 30 min.
During the final stage, the experimenter recorded and organized the voting results for each sketch in chronological order. After the 12-day evaluation, the experts were thanked and paid as compensation.

Selecting the Body Language Significantly Affected by Fixation.
After the fixation evaluation of the design sketches, the sketches marked as fixated by more than 50% of the experts (4-6 votes) were classified into the fixation set, and the remaining sketches were classified into the nonfixation set. Then, the videos of the design process were reviewed again to record the frequency of each common type of body language during each sketch. To evaluate the significance of the differences between the two fixation states (fixation and nonfixation) in the frequency of each common type of body language, one-way ANOVA was conducted in SPSS 20, with statistical significance accepted at P < 0.05.
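The two decision rules in this paragraph, the majority threshold for the fixation set and the per-behavior one-way ANOVA, can be sketched in Python. The analysis itself was run in SPSS 20; `scipy.stats.f_oneway` is swapped in here as an open-source one-way ANOVA, and the helper names and the example frequencies are hypothetical.

```python
from typing import Sequence, Tuple

from scipy.stats import f_oneway

N_EXPERTS = 6
ALPHA = 0.05  # significance level used in the text


def is_fixated(votes: int, n_experts: int = N_EXPERTS) -> bool:
    """Majority rule: more than half of the experts (4-6 of 6) marked the sketch."""
    return votes > n_experts / 2


def compare_fixation_states(freq_fix: Sequence[float],
                            freq_nonfix: Sequence[float]) -> Tuple[float, float, bool]:
    """One-way ANOVA on one body-language type's per-sketch frequencies."""
    result = f_oneway(freq_fix, freq_nonfix)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < ALPHA)


# Illustrative (made-up) per-sketch frequencies of one behavior type.
F, p, significant = compare_fixation_states([5, 6, 7, 6, 5], [1, 2, 1, 2, 1])
print(f"F = {F:.1f}, P = {p:.2g}, significant: {significant}")
```

With only two groups, this F test is equivalent to a pooled-variance two-sample t test (F = t²), so either would give the same P value here.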

Inter-Rater Agreement Statistic.
Multiple experts rated the participants' sketches for design fixation. An inter-rater agreement statistic was computed to judge the extent of agreement between the experts. We calculated the percentage agreement for design fixation, which indicates how often raters who rated the fixation item on the same sketch chose the same response category. For each sketch, the largest number of identical ratings was counted as agreement and the remaining ratings as nonagreement. The percentage agreement was calculated by dividing the number of agreeing ratings on all sketches by the total number of ratings on all sketches for which the fixation state was assessed. A percentage agreement >80% was considered appropriate (an arbitrarily chosen threshold).
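The percentage-agreement definition above can be checked with a few lines of Python. This is a minimal sketch; the function name and the three example rating patterns are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percentage_agreement(ratings: Sequence[Sequence[int]]) -> float:
    """Percentage agreement as defined in the text.

    ratings[i] lists every expert's rating of sketch i (1 = fixation,
    0 = nonfixation). The largest group of identical ratings on a sketch
    counts as agreement; the remaining ratings count as nonagreement.
    """
    agreeing = sum(max(Counter(sketch).values()) for sketch in ratings)
    total = sum(len(sketch) for sketch in ratings)
    return 100.0 * agreeing / total


# Three illustrative sketches rated by six experts: 4-2, 6-0 and 3-3 splits.
ratings = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1]]
print(round(percentage_agreement(ratings), 1))  # 72.2, i.e. below the 80% threshold
```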
In addition, we used kappa coefficients to assess the reliability of the items. This measure indicates how well sketches can be distinguished from each other based on the given item score. Dichotomous items were analyzed using intraclass kappa coefficients [46]; fixation was scored as 1 and nonfixation as 0.

Two-Step Hybrid Model.
The schematic procedure of the two-step hybrid model is illustrated in Figure 6. A deep convolutional neural network (CNN) was used to construct the body language recognition model from images converted from the videos of camera b. A support vector machine (SVM) was used to construct the design fixation classification model from behavior statistics, that is, the frequency of each recognized body language type during each sketch. The tuning parameters included the learning rate, activation function, normalization, and data split ratio. Each variable was first varied on its own to find the range in which it performed best. Then, a balanced optimal solution was obtained by coordinating these variables with one another.
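The tuning strategy described above (vary one variable at a time, then balance the variables against each other) resembles a coordinate-wise search. The sketch below is one possible reading of it, not the authors' code: `evaluate` stands for a hypothetical function returning validation accuracy for a configuration, and the parameter names, grids, and toy objective are placeholders.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def one_at_a_time_search(start: Config, grid: Dict[str, List[Any]],
                         evaluate: Callable[[Config], float],
                         rounds: int = 2) -> Config:
    """Tune one parameter at a time, holding the others at their current best.

    Repeating the sweep for several rounds lets each parameter be re-balanced
    against the values already chosen for the others.
    """
    best = dict(start)
    for _ in range(rounds):
        for name, values in grid.items():
            # Try every value of this parameter with all others held fixed.
            best[name] = max(values, key=lambda v: evaluate({**best, name: v}))
    return best


def toy_evaluate(cfg: Config) -> float:
    # Hypothetical stand-in for validation accuracy; peaks at lr = 1e-3 with ReLU.
    penalty = 0.0 if cfg["activation"] == "relu" else 0.5
    return -abs(cfg["learning_rate"] - 1e-3) - penalty


grid = {"learning_rate": [1e-1, 1e-2, 1e-3, 1e-4], "activation": ["relu", "tanh"]}
result = one_at_a_time_search({"learning_rate": 1e-1, "activation": "tanh"}, grid, toy_evaluate)
print(result)  # {'learning_rate': 0.001, 'activation': 'relu'}
```

In practice each call to `evaluate` would mean training and validating a model, so the number of rounds trades search quality against compute.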

Step 1: Body Language Recognition Model
(1) Video Preprocessing. The data for constructing the recognition model were the design behavior videos collected from camera b. To improve the robustness of the recognition model, the original data were preprocessed.
First, the data were cleansed: some videos were unsuitable as training data because of a poor shooting angle or because the subject did not follow the experimental instructions, and these videos were excluded to improve recognition accuracy. Then, the three body language types that were significantly correlated with fixation were set as the target classes, with an extra category, "others," containing all remaining body language.
After data cleansing, Adobe Premiere CC 2017 (Figure 7) was used to cut out the video segments containing the three selected types of body language and rename them manually for labeling. The remaining video segments were marked as "others." Finally, the marked video segments were sorted and classified according to their labels. The original video was collected at 30 frames per second. However, design behaviors are generally slow, so adjacent frames largely repeat one another and the video contains many redundant frames. To reduce redundant data and accelerate training, the video was resampled at a lower frame rate: the Python image processing libraries PIL and Imageio were used to resample the video segments at 8 frames/second, and the frames were saved as JPG images. Each 1-minute video was thus converted into 480 time-ordered images (60 s × 8 frames/s).
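The resampling step can be sketched with Imageio and PIL, the libraries named above. This is a minimal sketch under stated assumptions: the frame-selection rule (keep the first source frame of each 1/8 s output slot), the helper names `keep_frame` and `resample_to_jpgs`, and the output file naming are illustrative choices, and reading video files with Imageio requires the `imageio-ffmpeg` plugin.

```python
import math
import os


def keep_frame(i: int, src_fps: float, dst_fps: float) -> bool:
    """True if source frame i is the first frame falling in a new output time slot."""
    if i == 0:
        return True
    return math.floor(i * dst_fps / src_fps) > math.floor((i - 1) * dst_fps / src_fps)


def resample_to_jpgs(video_path: str, out_dir: str, dst_fps: float = 8.0) -> int:
    """Save a dst_fps-rate subset of a video's frames as JPG files; return the count."""
    # Imported lazily so keep_frame can be used without these packages installed.
    import imageio.v2 as imageio
    from PIL import Image

    os.makedirs(out_dir, exist_ok=True)
    reader = imageio.get_reader(video_path)
    src_fps = reader.get_meta_data()["fps"]  # 30 fps for the recordings described here
    saved = 0
    for i, frame in enumerate(reader):
        if keep_frame(i, src_fps, dst_fps):
            Image.fromarray(frame).save(os.path.join(out_dir, f"frame_{saved:05d}.jpg"))
            saved += 1
    reader.close()
    return saved


# One minute at 30 fps is 1800 source frames; at 8 fps this keeps 480 of them.
print(sum(keep_frame(i, 30, 8) for i in range(1800)))  # 480
```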
(2) Software and Hardware. This model was constructed in Python with Keras. Keras [47] is a high-level neural network application programming interface (API) developed by François Chollet, an artificial intelligence researcher at Google. Keras supports convolutional neural networks (CNNs), and GPUs can be used to accelerate model training. Keras is a good choice, especially for beginners, since it can use either TensorFlow or Theano as a backend and provides a simpler interface for development [48].
Our body language recognition model was based on a CNN, whose training requires a high-performance computer.
(3) Transfer Learning Based on VGG-16. First, following holdout cross-validation [49], the dataset was divided into three parts used for training, validation, and testing. Data augmentation was used to expand the training set. Then, a deep neural network was constructed for transfer learning on the basis of the pretrained VGG-16 model. The model was trained on the training set, and the training effect was checked on the validation set. Finally, the test set was used to simulate the real environment and evaluate the performance of the model. For the holdout split, the resampled image data were shuffled and randomly divided into three subsets in 60%, 20%, and 20% proportions, used for training, validation, and testing, respectively, to ensure an effective performance evaluation of the recognition model.
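The 60/20/20 holdout split can be reproduced with two calls to scikit-learn's `train_test_split`: first hold out 40%, then halve the held-out part. This is a minimal sketch; the helper name, the seed, and the illustrative file names and labels are assumptions.

```python
from typing import Sequence

from sklearn.model_selection import train_test_split


def split_60_20_20(images: Sequence, labels: Sequence, seed: int = 0):
    """Shuffle and split into 60% training, 20% validation, and 20% test data."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.4, shuffle=True, random_state=seed)
    # Split the held-out 40% in half: 20% validation, 20% test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, shuffle=True, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


# Illustrative file names standing in for the resampled JPG frames.
files = [f"frame_{i:05d}.jpg" for i in range(100)]
labels = [i % 4 for i in range(100)]  # four classes: three behaviors + "others"
train, val, test_set = split_60_20_20(files, labels)
print(len(train[0]), len(val[0]), len(test_set[0]))  # 60 20 20
```

Because neighboring frames at 8 fps are nearly identical, a split grouped by video segment (for example with scikit-learn's `GroupShuffleSplit`) would give a stricter test estimate; the sketch follows the random frame-level split described in the text.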
Data augmentation [50] is a conventional data processing technique used in machine learning for image recognition. New images were generated by random scaling, rotation, flipping, and shearing of the original images to expand the training set. The ImageDataGenerator class in Keras was used for this augmentation; the parameter settings are shown in Table 1.
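A minimal sketch of the augmentation setup with Keras' `ImageDataGenerator` follows. The numeric values are hypothetical placeholders (Table 1 lists the settings actually used), the one-sub-folder-per-class layout for `flow_from_directory` is an assumption that matches the labeled folders described earlier, and `ImageDataGenerator` is a legacy API in recent TensorFlow releases. Only the training generator should augment; validation and test images would use a generator with `preprocess_input` alone.

```python
# Hypothetical values for illustration only; Table 1 gives the settings actually used.
AUGMENTATION = {
    "zoom_range": 0.2,        # random scaling
    "rotation_range": 15,     # random rotation, in degrees
    "horizontal_flip": True,  # random flipping
    "shear_range": 0.2,       # random shearing
}


def make_train_generator(train_dir: str, target_size=(224, 224), batch_size: int = 32):
    """Stream augmented training images from a directory with one sub-folder per class."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(preprocessing_function=preprocess_input, **AUGMENTATION)
    return datagen.flow_from_directory(train_dir, target_size=target_size,
                                       batch_size=batch_size, class_mode="categorical")
```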
Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task [51]. It can speed up training and improve the performance of deep learning models. In this study, we used transfer learning with the VGG-16 convolutional neural network model. The VGG-16 model was proposed by Simonyan et al. [52]. It is a CNN with 13 convolutional layers and three fully connected layers, pretrained on a subset of the ImageNet database used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [53].
This manuscript uses transfer learning from VGG-16 pretrained on the ImageNet dataset: the 13 convolutional layers at the bottom and their weights are retained, and the three fully connected layers at the top are removed. The output features of VGG-16 [...] [54]. The output of hidden layer 1 is then input into hidden layer 2, which contains 128 neurons and also uses the ReLU activation function. Finally, the output of hidden layer 2 is mapped to the output layer by the softmax function, so the final outputs are probabilities that sum to 1. Figure 8 shows the overall structure of the neural network used in this model. Adam [55] was selected as the optimizer for model training, recognition accuracy was used as the performance evaluation index, and the number of epochs was set to 300. The learning rate of the optimizer was set to 0.001. Default settings for the other hyperparameters (alpha = 0.0001, beta1 = 0.9, and beta2 = 0.999) were used in the experiments. After each iteration, the recognition accuracy on the training set and on the validation set was recorded. After training, the model was used to predict the test set, with recognition accuracy as the performance evaluation metric.
To investigate the degree to which the results depend on the specific image-based CNN model, we also constructed recognition models using VGG-19 [52] and DenseNet-169 [56]. These CNN models differ in depth: the depth of VGG-16 is 23, whereas that of DenseNet-169 is 169.
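The architecture described in the two paragraphs above can be sketched with `keras.applications`. This is a hedged sketch, not the authors' code: the size of hidden layer 1 is not recoverable from the text, so `HIDDEN_1_UNITS = 256` is a hypothetical placeholder; freezing the pretrained weights and using `Flatten` before the hidden layers are assumptions; and the `backbone` argument is an illustrative way to swap in VGG-19 or DenseNet-169 (each of these models has its own `preprocess_input`).

```python
NUM_CLASSES = 4       # three fixation-related body-language types + "others"
HIDDEN_1_UNITS = 256  # hypothetical: the size of hidden layer 1 is not stated here
HIDDEN_2_UNITS = 128  # hidden layer 2, as stated in the text


def build_body_language_model(backbone: str = "vgg16", input_shape=(224, 224, 3)):
    """Pretrained convolutional base + two ReLU hidden layers + softmax output."""
    from tensorflow import keras  # lazy import: TensorFlow is only needed here

    bases = {
        "vgg16": keras.applications.VGG16,
        "vgg19": keras.applications.VGG19,
        "densenet169": keras.applications.DenseNet169,
    }
    # include_top=False drops the fully connected layers and keeps the
    # convolutional layers with their ImageNet weights.
    base = bases[backbone](weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # assumption: pretrained weights are frozen

    model = keras.Sequential([
        base,
        keras.layers.Flatten(),
        keras.layers.Dense(HIDDEN_1_UNITS, activation="relu"),
        keras.layers.Dense(HIDDEN_2_UNITS, activation="relu"),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # probabilities sum to 1
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


# Usage (requires TensorFlow and the ImageNet weights download):
# model = build_body_language_model()
# model.fit(train_generator, validation_data=val_generator, epochs=300)
```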

Step 2: Design Fixation Classification
(1) Software and Hardware. This model was constructed and trained in Python with Scikit-learn, an open-source machine learning library that began as a Google Summer of Code project. It provides most common algorithms for classification, regression, and clustering. The training data come from the fixation evaluation and behavior statistics obtained in Section 3.
(2) Behavior Statistics. For the behavior statistics, the frequency of each selected type of body language significantly affected by fixation was calculated for each sketch. The remaining body language was grouped as "others," and its frequency during each sketch was also calculated.
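The behavior statistics turn the per-frame predictions of the step-1 model into one frequency vector per sketch. The sketch below assumes, as the text does not specify, that one occurrence is a maximal run of consecutive frames with the same predicted label; the labels "A", "B", and "C" are hypothetical stand-ins for the three selected types.

```python
from itertools import groupby
from typing import List, Sequence


def behavior_frequencies(frame_labels: Sequence[str],
                         categories: Sequence[str]) -> List[int]:
    """Per-sketch frequency of each body-language category.

    Assumption: one occurrence = a maximal run of consecutive frames with the
    same predicted label, so a gesture lasting several frames counts once.
    """
    runs = [label for label, _ in groupby(frame_labels)]
    return [runs.count(category) for category in categories]


# Hypothetical labels: "A", "B", "C" stand for the three selected types.
frames = ["A", "A", "others", "B", "B", "A", "others", "others", "A"]
print(behavior_frequencies(frames, ["A", "B", "C", "others"]))  # [3, 1, 0, 2]
```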
(3) The Support Vector Machine Classifier. First, principal component analysis (PCA) [57] was applied to the feature variables (the occurrence frequency of each selected type of body language), and the principal components were extracted as new feature variables to reduce the data dimension and improve accuracy with a small training sample. A support vector machine (SVM) was then used as the classifier for design fixation. Finally, k-fold cross-validation was used to verify the recognition accuracy of the model. The input features are the frequencies of the fixation-related body language types during each sketch; the remaining body language was combined into one category, "others," whose frequency is also an input feature. The output labels are fixation and nonfixation, coded as 1 and 0, respectively.
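The PCA-then-SVM classifier can be expressed as a scikit-learn pipeline, so the same PCA fitted on the training data is applied to new sketches. This is a minimal sketch: `n_components=2`, the RBF kernel, `C=1.0`, and the toy frequency table are assumptions, not the settings in Table 2.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def make_fixation_classifier(n_components: int = 2, kernel: str = "rbf", C: float = 1.0):
    """PCA on the behavior frequencies, followed by an SVM classifier."""
    return make_pipeline(PCA(n_components=n_components), SVC(kernel=kernel, C=C))


# Illustrative data: one row per sketch, columns = frequencies of the three
# selected body-language types and "others"; label 1 = fixation, 0 = nonfixation.
X = np.array([[8, 1, 0, 3], [7, 2, 1, 4], [9, 1, 0, 2], [8, 0, 1, 3],
              [1, 5, 3, 6], [0, 6, 4, 5], [2, 5, 3, 7], [1, 4, 4, 6]], dtype=float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

clf = make_fixation_classifier().fit(X, y)
print(clf.predict([[9, 0, 0, 3]]))
```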
In this paper, the recognition of design fixation is formulated as a binary classification problem, so support vector classification (SVC) or LinearSVC was selected as the classifier, with the final parameter settings determined by training and validating different configurations. First, four models with different parameter settings were trained. Then, the one with the best validation results was selected as the final model. The specific parameters are shown in Table 2.
We used k-fold cross-validation for the SVM classifier. The data were randomly divided into ten subsets; in each round, one subset was used as the validation set and the other nine as the training set. Each model was thus trained ten times, with a different subset as the validation set each time. Finally, the mean of the 10 validation results was used to evaluate the recognition performance of the model.
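Model selection by 10-fold cross-validation over several SVM settings can be sketched as follows. The four candidates are hypothetical stand-ins for the settings in Table 2, which are not reproduced here, and the Poisson-distributed data are synthetic; only the procedure (random 10-fold split, mean validation accuracy, pick the best) follows the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

# Hypothetical candidates standing in for the four settings in Table 2.
CANDIDATES = {
    "SVC(linear)": SVC(kernel="linear", C=1.0),
    "SVC(rbf)": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "SVC(poly)": SVC(kernel="poly", degree=3, C=1.0),
    "LinearSVC": LinearSVC(C=1.0, max_iter=10000),
}


def select_by_10_fold(X, y, n_components: int = 2, seed: int = 0):
    """Return the candidate with the best mean 10-fold accuracy, plus all means."""
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)  # random division into ten subsets
    means = {}
    for name, clf in CANDIDATES.items():
        pipe = make_pipeline(PCA(n_components=n_components), clf)
        means[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    best = max(means, key=means.get)
    return best, means


# Synthetic frequency data: 20 "fixation" and 20 "nonfixation" sketches.
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson(6, (20, 4)), rng.poisson(2, (20, 4))]).astype(float)
y = np.array([1] * 20 + [0] * 20)
best, means = select_by_10_fold(X, y)
print(best, {name: round(m, 3) for name, m in means.items()})
```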

Results of Data Acquisition.
In this experiment, we collected videos with a total duration of 6 hours and 48 minutes, containing more than 320,000 valid body language images, from which a body language image database of the design process was established. In addition, 156 sketches (concepts), 13 per participant, were collected for fixation evaluation.

Results of Video Analysis.
After preliminary observation, eight common types of body language were selected. They are coded in Table 3, and the design behaviors are illustrated in Figure 9. The eight behaviors are defined as follows: (a) drawing: drawing a sketch on paper with a pen, including writing words to describe a scheme; (b) small movements: hand movements with no practical purpose, such as twirling a pen or tapping the table with a finger; (c) touching head: scratching the head, touching the hair, or touching the face with the hand; (d) eye moving: rapid eye rotation caused by large shifts of visual focus, often accompanied by frequent blinking; (e) eyebrow moving: changes in eyebrow shape due to changes in facial expression, usually a frown; (f) mouth moving: the corners of the mouth sink, and the lips stretch or move back and forth; (g) turning head: the head moves, accompanied by a shift of gaze; and (h) holding up head with hand: supporting the head with the palm, arm, or other parts of the hand and holding it still for a long time.
The eight behaviors occurred 1,158 times in total across all subjects, with per-behavior totals ranging from 42 to 333 (M = 144.75; SD = 103.61). The total frequency of each behavior is shown in Figure 10, and the descriptive statistics are shown in Table 4. Eye movement accounted for a high proportion, possibly because eye movements are less consciously controlled and therefore reflect subtle changes in mental state.
Holding up the head with a hand occurred less often, possibly because it is a large movement; participants may have been reluctant to display such a conspicuous posture in a test setting. Figure 11 shows examples of sketches that were rated as fixated and as nonfixated.

Results of Fixation Evaluation.
The fixated forms have more repeated features and lower variety, novelty, and originality than the nonfixated ones.
Six experts evaluated the design fixation state of the 156 design sketches produced by the 12 designers. The voting results are shown in Table 5. One-way ANOVA was performed to test the effect of fixation state on the frequency of each behavior across the 156 sketches, and the results are shown in Table 6. The ANOVA showed significant effects of fixation state on B1 (F = 5.406; P = 0.005), B4 (F = 3.884; P = 0.023), and B7 (F = 6.624; P = 0.002). No significant differences were found for the other behaviors. Thus, B1, B4, and B7 were selected as the body language types that differ significantly between fixation and nonfixation.
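The F values above come from the standard one-way ANOVA decomposition: the between-group mean square divided by the within-group mean square. A minimal pure-Python sketch follows (the function name is ours); the P value would then be read from the F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf`.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: list of lists of observations, e.g. the frequency of one
    behavior in the sketches of each fixation state.
    Returns (F, df_between, df_within).
    """
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```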
Inter-rater agreement and reliability were assessed for the judgment of whether fixation occurred in each generated concept. The fixation-state rating had a high percentage agreement (86%) and a kappa coefficient of 0.74, which indicates substantial agreement on the Landis and Koch scale.
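The text does not say which kappa statistic was used. With six raters per sketch, Fleiss' kappa is one common choice, so the sketch below assumes it. It also returns the mean pairwise agreement, which is one possible definition of percentage agreement (again an assumption). The function name and the example vote layout are hypothetical.

```python
def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for items that are each rated by the same number of raters.

    ratings: per-item lists of labels, one per rater
    (e.g. six expert votes per sketch: 1 = fixation, 0 = nonfixation).
    Returns (mean pairwise agreement, kappa).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of raters who put item i in category j.
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Proportion of agreeing rater pairs for each item, then averaged.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    # Undefined (division by zero) if every vote falls in one category.
    return p_bar, (p_bar - p_e) / (1 - p_e)
```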
The frequencies of B1, B4, and B7 under fixation and nonfixation are shown separately in Table 7. Figure 12 shows the ratio of behavior frequency to fixation-state frequency (B/F), that is, the number of occurrences of a behavior divided by the number of sketches in that state. B4 had the highest ratio under both fixation and nonfixation, far exceeding B1 and B7, so B4 reflects design fixation most prominently. Although B1, B4, and B7 occur more often in total during nonfixation than during fixation, the ratios for fixation are always higher than those for nonfixation, as shown in Tables 5 and 7 and Figure 12. In other words, B1, B4, and B7 appear more frequently per fixated sketch than per nonfixated sketch. In addition, the average duration of a fixated sketch (76.86 s) is longer than that of a nonfixated sketch (62.41 s).
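How a behavior can occur more often in total during nonfixation yet more often per sketch during fixation is easiest to see with numbers. The sketch below computes the B/F ratio; the counts in the usage are hypothetical and are NOT the values in Table 7.

```python
def bf_ratios(counts_by_state, sketches_by_state):
    """B/F ratio: total occurrences of each behavior in a fixation state
    divided by the number of sketches rated as being in that state."""
    return {
        state: {b: n / sketches_by_state[state] for b, n in counts.items()}
        for state, counts in counts_by_state.items()
    }


# Hypothetical numbers: nonfixation has twice as many B4 occurrences in
# total, but far more sketches, so its per-sketch rate is lower.
counts = {"fixation": {"B4": 150}, "nonfixation": {"B4": 300}}
sketches = {"fixation": 30, "nonfixation": 100}
print(bf_ratios(counts, sketches))  # {'fixation': {'B4': 5.0}, 'nonfixation': {'B4': 3.0}}
```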

Results of Body Language Recognition.
As the number of iterations increases, the training and validation accuracy of the model gradually increase and converge. Figure 13 shows both indicators after each iteration. The training and validation accuracy rates do not differ significantly, nor are they both low. Therefore, there is no sign of underfitting or overfitting, and the training results are robust. The recognition accuracy of the model on the test set is 92.03%, and it accurately recognizes the four types of body language (B1, B4, B7, and others).
For comparison, we applied two other image-based CNN recognition models, VGG-19 and DenseNet-169, to the same datasets; their training and validation results are shown in Figure 13. On the test set, the accuracy of VGG-19 is 91.09%, and that of DenseNet-169 is 73.06%. The training, validation, and test accuracies of the VGG-19 model are all slightly lower than those of the VGG-16 model. The DenseNet-169 model performs the poorest and shows overfitting. Overall, VGG-16 was the most stable of the three and achieved the highest overall accuracy.
Table 8 shows that, with the same computing resources, DenseNet-169 had the shortest training time on all three criteria and VGG-19 the longest; the three criteria ranked the time costs of the three algorithms consistently. Although VGG-16 took longer to train than DenseNet-169, it performed better, and it was both more accurate and less computationally expensive than VGG-19.
The comparison between models A, B, and C is shown in Figure 14. Model B had the highest mean accuracy and the lowest fluctuation, and it differed significantly from model C. Therefore, model B was selected for the second step.
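The caption of Figure 14 states that accuracy differences among models A, B, and C were tested with the paired-sample Wilcoxon signed-rank test. For readers who want to reproduce such a comparison on per-fold accuracies, a dependency-free sketch with an exact two-sided P value is given below (`scipy.stats.wilcoxon` offers a library implementation). The function name and the handling choices (dropping zero differences, average ranks for ties) are assumptions about conventions, not details from the paper.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test with an exact two-sided P value.

    a, b: paired scores (e.g. per-fold accuracies of two models).
    Zero differences are dropped; tied |differences| get average ranks.
    The null distribution is enumerated over all 2**n sign patterns,
    which is practical only for small n such as ten folds.
    Returns (W+, P).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    # Under H0 each rank is positive or negative with probability 1/2;
    # count sign patterns at least as far from total/2 as the observed W+.
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed + 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```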

Results of PCA.
PCA of the original feature variables yielded four components. The variance percentage and cumulative contribution rate of each component are shown in Table 9, and Figure 15 shows the scree plot. The slopes for components 1 and 2 are the steepest. According to Table 9 and Figure 15, components 1 and 2 were selected as the main components; their cumulative contribution rate reached 90.92%. Mapping the original feature variables onto these two components reduces the feature dimension while retaining most of the information in the original data. The composition of the feature variables after dimensionality reduction is shown in Table 10. This matrix shows that B4 had the highest loading on component 1, consistent with Figure 12, where B4 had the highest B/F ratio. The "others" category, which groups more types of body language, ranked close behind B7.
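The component selection described above can be sketched without external libraries: compute the covariance matrix, extract eigenpairs by power iteration with deflation, keep components until the cumulative contribution rate reaches a threshold, and project the centered data onto them. This is a minimal sketch under stated assumptions: it uses the covariance of the raw frequencies (the paper does not say whether variables were standardized first), the 0.9 default threshold is hypothetical, and power iteration is only adequate for a few well-separated components.

```python
import random


def covariance_matrix(X):
    """Sample covariance matrix (n - 1 denominator) of row-vector data X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpairs(C, n_components, iters=500, seed=0):
    """Leading eigenpairs of a symmetric PSD matrix via power iteration
    with deflation."""
    d = len(C)
    rng = random.Random(seed)
    A = [row[:] for row in C]
    pairs = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along v (the eigenvalue).
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate so the next pass finds the next component.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def select_components(C, threshold=0.9):
    """Fewest components whose cumulative contribution rate (share of the
    total variance, i.e. of the trace of C) reaches `threshold`."""
    total = sum(C[i][i] for i in range(len(C)))
    cumulative, vectors = 0.0, []
    for lam, v in top_eigenpairs(C, len(C)):
        cumulative += lam / total
        vectors.append(v)
        if cumulative >= threshold - 1e-12:
            break
    return vectors, cumulative


def project(X, vectors):
    """Component scores: centered data projected onto the kept eigenvectors."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((row[j] - means[j]) * v[j] for j in range(d)) for v in vectors]
            for row in X]
```

The returned scores play the role of the new feature variables (x1, x2) that are fed to the SVM.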

Results of Cross-Validation.
Table 11 shows the cross-validation results of the four models, listing the accuracy of each of the ten folds and, in the last row, the mean accuracy (MA). Model B has the highest mean accuracy, so it was selected as the final model. The ten validation results of model B are shown in Figure 16. They vary little across folds (standard deviation 3.77%), indicating that the model is robust. The mean of the ten validation results was 72.9%, indicating promising recognition performance.
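The summary step above (mean and standard deviation of the fold accuracies, then keeping the model with the highest mean) can be sketched as follows. The paper does not state whether the reported standard deviation is the sample or population version; this sketch assumes the sample version (`statistics.stdev`). The function name and any accuracies passed to it are hypothetical.

```python
from statistics import mean, stdev


def summarize_cv(results):
    """results: {model_name: [fold accuracies]}.

    Returns ({model_name: (mean accuracy, sample standard deviation)},
    name of the model with the highest mean accuracy).
    """
    summary = {name: (mean(accs), stdev(accs)) for name, accs in results.items()}
    best = max(summary, key=lambda name: summary[name][0])
    return summary, best
```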

Results of Testing Set.
The recognition accuracy of the model on the test set is 79.11%. Figure 17 visualizes the results of the recognition model. The horizontal and vertical axes represent components 1 (x1) and 2 (x2), respectively. Yellow points represent nonfixation samples, and blue points represent fixation samples. The boundary between the light blue and green areas is the decision boundary of the model; samples within the green area are classified as fixation.

Discussion
Within the literature, when and how to combat design fixation have been a focus of researchers and designers. A number of approaches to overcoming design fixation have been identified, such as defixation instructions [14] and analogical inspiration [58,59]. However, to our knowledge, when to present defixation methods remains unknown, because fixation occurs unconsciously and designers are usually unaware that they are being influenced by example solutions or previously generated solutions [60,61]. Our postexperiment interviews likewise showed that participants could not recognize, in retrospect, that they had been fixated. This study is the first to use computer vision and deep learning to recognize design fixation from body language during the design process. The results suggest that our two-step hybrid model performs well in fixation classification. The potential impact of this work is that fixation could be detected not only from the design sketch outcomes but also as it happens, by monitoring the movements, expressions, and gestures of designers during sketching, before the sketch outcomes are available.
A key characteristic of the design fixation recognition task is that recognizing individual actions is essential, whereas fixation itself does not correspond directly to visual features. Recognizing fixation directly from images would therefore confuse visual feature extraction. For this reason, the task was divided into two steps: first, visual features were extracted to recognize body language; second, the recognized actions were used to classify the design state into two categories (fixation and nonfixation).
In the first step, a feature extractor with good performance and stability was used as the backbone network of the pretrained models, which increased the reliability of action recognition. Provided that the first-step results are accurate, the second step only needs to turn them into accurate final results without large fluctuations; therefore, the more reliable SVM was chosen.
In this study, we found that the body language types B1, B4, and B7 were significantly related to design fixation. We then proposed a two-step hybrid model for design fixation classification, using a VGG-16 convolutional neural network for body language recognition and an SVM for design fixation classification. The accuracy is 92.03% for body language recognition and 79.11% for design fixation classification. The recognition performance attained with our approach is promising. In most cases, the four types of body language (drawing, eye movement, turning the head, and others) are recognized accurately, and our SVM-based method distinguishes fixation from nonfixation states reasonably well. Compared with neural networks, SVMs have the advantage of high accuracy when training on small samples [62,63]. The present study also found that the VGG-16-based and VGG-19-based models outperformed the DenseNet-169-based model, possibly because the complexity of DenseNet-169 does not match the complexity of our experimental data. This finding suggests that the VGG models are more appropriate than DenseNet-169 for recognizing design fixation via body language in this study. The VGG-16-based model performed slightly better than the VGG-19-based model; however, the difference was so small that, for practical purposes, the two models can be regarded as equivalent.
The method proposed in this study could be used in reminder applications that help designers overcome design fixation and improve their outputs and efficiency. In addition, the two-step algorithm offers an approach for other studies facing recognition failures caused by mixed visual features within broad categories.
Figure 14: Comparison between models A, B, and C. Statistical differences in accuracy were measured by the paired-sample Wilcoxon signed-rank test, where "*" marks a significant difference in accuracy between two models at the 0.05 level. Error bars denote variance of the means.
Nevertheless, our findings and general approach have several limitations: (1) During the experiment, two cameras were employed. One was installed in front of the subjects to record their body language during the design process, and the other was installed overhead to record the sketches. The overhead camera recorded the sketches clearly. However, the video quality of the front camera was affected by the sitting posture of some subjects, so facial movements and expressions were often not recorded. In future work, a third camera will be added to capture facial movements.
(2) The behavior image database is large enough to support training the body language recognition model based on computer vision and VGG-16. However, the sketch database still needs to be expanded, although this problem has been alleviated through dimensionality reduction and cross-validation. To improve the accuracy of the design fixation classification model, we plan to test more subjects to enlarge the dataset. (3) Through manual coding, we identified eight common behaviors during the design process. However, manual video coding is time- and resource-consuming [29], and some errors and uncertainties in manual observation statistics are inevitable. In future work, we plan to automate certain parts of data collection and analysis. (4) The body language database was obtained through objective evaluation with small error. However, the fixation evaluation database comes from subjective evaluation, which may be influenced by errors and biases in human judgment [43]. Morgan et al. [64] pointed out that quantitative objective evaluation should take priority over subjective opinions, comments, and ratings. In future work, we plan to conduct an objective evaluation of the fixation state. (5) In our study, we only classified two fixation states (fixation and nonfixation). In future work, we plan to test whether our method can also detect fixation states of different intensities, such as low-, medium-, and high-intensity fixation, and distinguish, for example, low-intensity from medium-intensity fixation. (6) In the design task, participants designed only the form of a coffee mug, whereas much design fixation research has focused on the functionality of products [65]. The focus on form design alone may have affected the results. In future work, we will explore functional fixedness in problem-solving situations. (7) The design problem employed here is simple.
The relatively low frequency of behaviors and the short sketch durations may have affected the experimental results. In future work, we will consider problems that are significantly more complex and have multiple functions associated with them. (8) The choice of base algorithms lags behind the state of the art: VGG, DenseNet, and SVM were all proposed several years ago. We should continue to try newer networks, such as YOLO (You Only Look Once) for object detection or Capsule Networks with their vectorized representations.

Conclusion
We proposed a two-step hybrid recognition model of design fixation based on body language. While the results are encouraging, further research is needed to develop the method. Our future work will concentrate on developing an artificial-intelligence-aided design system that automatically recognizes the designer's fixation state from observable body language during sketching, before the sketch outcomes are available, and provides appropriate inspiration in time to help combat fixation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.