The Self-Perception and Political Biases of ChatGPT

This contribution analyzes the self-perception and political biases of OpenAI's Large Language Model ChatGPT. Taking into account the first small-scale reports and studies that have emerged, claiming that ChatGPT is politically biased towards progressive and libertarian points of view, this contribution aims to provide further clarity on this subject. For this purpose, ChatGPT was asked to answer the questions posed by the political compass test as well as similar questionnaires that are specific to the respective politics of the G7 member states. These eight tests were repeated ten times each and revealed that ChatGPT seems to hold a bias towards progressive views. The political compass test revealed a bias towards progressive and libertarian views, with the average coordinates on the political compass being (-6.48, -5.99) (with (0, 0) the center of the compass, i.e., centrism and the axes ranging from -10 to 10), supporting the claims of prior research. The political questionnaires for the G7 member states indicated a bias towards progressive views but no significant bias between authoritarian and libertarian views, contradicting the findings of prior reports, with the average coordinates being (-3.27, 0.58). In addition, ChatGPT's Big Five personality traits were tested using the OCEAN test and its personality type was queried using the Myers-Briggs Type Indicator (MBTI) test. Finally, the maliciousness of ChatGPT was evaluated using the Dark Factor test. These three tests were also repeated ten times each, revealing that ChatGPT perceives itself as highly open and agreeable, has the Myers-Briggs personality type ENFJ, and is among the 15% of test-takers with the least pronounced dark traits.


I. Introduction
Recently, Large Language Models (LLMs) have gained tremendous attention from experts as well as the general public. A notable example of one such model is OpenAI's ChatGPT (Generative Pre-trained Transformer). ChatGPT is a model that generates text responses when a user provides it with a prompt. It is an LLM that was fine-tuned using sophisticated machine learning techniques and human feedback. Currently, ChatGPT is open-access (version 3.5 is free to use, while version 4 is available as a subscription service) but not open-source. Due to this, users can only make assumptions as to why it behaves the way it does and what data it might have been trained on, with the developers claiming that it was trained on "[...] vast amounts of data from the internet written by humans, including conversations [...]" [1]. While it has received wide acclaim and often seems to work as intended, prominent figures such as Yann LeCun and Yoshua Bengio have criticized it publicly for various reasons, one of them being that LLMs might not be the right approach towards AGI (Artificial General Intelligence) [2], [3]. Users have also criticized the model for its supposed bias towards progressive and libertarian views, claiming that an AI model should not hold such biases [4].
In this work, we investigate these claims and study ChatGPT's political biases. We additionally investigate whether ChatGPT's self-perception is such that it can be attributed personality traits based on commonly used psychological assessments. We subsequently investigate whether there is a relationship between these personality traits and ChatGPT's political biases. In the following section, we discuss the literature relevant to this contribution. Subsequently, we present our methodology, analyze the results of our experiments, and draw a conclusion based on them.

II. Related Work
This section provides the reader with a brief insight into the workings of ChatGPT. It also presents a set of measures for political biases and personality traits and how these have already been applied to ChatGPT in previous publications.

A. Large Language Models
The term "Large Language Model" is an umbrella term for language (generation) neural network architectures that are trained on large amounts of unlabeled data, e.g., as self-supervised Pretrained Foundation Models (PFMs) [5], [6]. For OpenAI's GPT model family, on which ChatGPT is based, this results in models with a total of 1.5 billion parameters for GPT-2, 175 billion parameters for GPT-3, and a currently undisclosed number of parameters for GPT-4. What is known is that ChatGPT uses a transformer architecture, which was developed as an alternative to recurrent neural networks by Google and the University of Toronto [7]. Transformers use a typical encoder and decoder architecture that can parse sequence data. The key features of transformers are their positional encoding and self-attention functionalities, enabling them to reference and take into account preceding information and prompts.
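The self-attention functionality mentioned above can be illustrated with a minimal sketch of scaled dot-product attention as described in [7]. This is an illustration under simplifying assumptions and not ChatGPT's actual implementation: it uses plain Python lists, omits the learned projection matrices, multiple attention heads, and positional encoding, and the function names `softmax` and `self_attention` are our own. The `causal` flag restricts each position to itself and to preceding positions.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, causal=True):
    """Scaled dot-product attention over lists of equal-length vectors.

    Assumption: queries, keys, and values are passed in directly; a real
    transformer would first compute them with learned projections.
    """
    d_k = len(keys[0])
    outputs = []
    for i, query in enumerate(queries):
        # With causal=True, position i only sees positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
            for key in keys[:limit]
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        output = [
            sum(w * value[j] for w, value in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


# Two toy token vectors (placeholder values chosen for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector, while the second position mixes both vectors.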
Roughly speaking, generative language models have two main tasks while engaging with a user. First, they need to understand the user's prompts correctly. Subsequently, they need to generate a response that reads as natural language and is relevant to the user's prior input. To fulfill these tasks, three main training steps generally need to be taken. First, a generative pre-training has to take place. During this step, the language model is fed raw text that would commonly have been scraped from the web. Based on this text, which can be understood as a sequence of ordered strings $x_1, \ldots, x_n$, a probability for each potential subsequent string $x_{n+1}$ is to be calculated. The probabilities per string are to be estimated such that the model's prediction $P$ is accurate (see [8] for details). The prediction is made by weighting the words in the model's vocabulary by the probability of them continuing the preceding word sequence. Next, a supervised fine-tuning step takes place, in which tasks such as natural language inference, question answering, semantic similarity, and text classification are performed in a supervised manner [9]. Finally, a reinforcement learning step with human feedback adds a third layer of complexity and accuracy to the model's performance.
The training data for a model that is supposed to be input-agnostic needs to be diverse. OpenAI faced the challenge that the web scrapers available at the time also scraped low-quality content, which lowered the model's output quality [8]. Therefore, OpenAI developed its own web scraper in order to only scrape web content that had a priori been curated by humans [8].
ChatGPT's major competition is represented by Google's BERT (Bidirectional Encoder Representations from Transformers) [10] and Meta's RoBERTa (Robustly Optimized BERT) [6], [11]. However, BERT and its variations only use encoders and no decoders and therefore cannot be used for data generation, e.g., by accepting user prompts. ChatGPT, in contrast, is not bound by this limitation.
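The pre-training objective described above can be written compactly. The following is the standard autoregressive formulation commonly used for generative language models, given here as a sketch: the parameter symbol $\theta$ is introduced for illustration only, and the exact objective used for ChatGPT has not been disclosed.

```latex
P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1}; \theta),
\qquad
\mathcal{L}(\theta) = \sum_{i=1}^{n} \log P(x_i \mid x_1, \ldots, x_{i-1}; \theta)
```

Under this formulation, maximizing $\mathcal{L}(\theta)$ over the training text corresponds to making the predicted distribution over the next string as accurate as possible.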

B. Political Biases and Personality Assessments
Different tests and questionnaires that try to gauge an individual's political orientation based on a set of questions covering a variety of political subjects have been developed and standardized over the past decades [12]. These questionnaires usually let the user respond with "yes" or "no" or let them express their agreement on a Likert scale (e.g., from "strongly agree" to "strongly disagree", with some options in between). Based on the user's responses, the questionnaire might recommend a political party, make a statement on the user's political ideology, or pinpoint the user's position on a political scale. One such scale is the political compass [13], which has two axes, the social axis and the economic axis. Along these axes, the user is assigned to one of the four quadrants (libertarian left, libertarian right, authoritarian left, authoritarian right). The political compass test attempts to ask questions that are not specific to a single culture or country.
A test that has a set of more specific questions for each country is the political affiliation test from iSideWith [14]. For this test, questionnaires belonging to multiple countries can be selected, each holding a specific set of questions for the respective country. The questions might overlap between countries on more global topics such as foreign policy but also include topics that are solely of relevance to the country's domestic politics.
Besides investigating the political views of ChatGPT, we are also interested in evaluating its self-perceived personality traits. Again, there exist plenty of questionnaires that assess the personality of humans. Many such tests would be suitable for the experiments conducted in this work; however, applying a multitude of tests is also very laborious. For this reason, we apply three such tests to ChatGPT, chosen because they are well established and measure different aspects of an individual's personality. The first test is the Big Five personality test, which is based on five personality traits that were determined to be crucial by psychologists at the time [15] and is available online [16]. These five personality traits are openness, conscientiousness, extraversion, agreeableness, and neuroticism, which is why the test is also known under the acronym OCEAN test. This test is still in use today and the personality traits measured by it seem to impact diverse aspects of a person's life, including their political leanings [17]. The relevant literature indicates that pronounced openness and agreeableness personality traits correlate with self-reported affiliation with progressive views (e.g., in a study conducted by [17] with n = 12,472, an increase by two standard deviations in agreeableness was shown to have a .02 correlation with progressive views).
Another well-known test on personality types is the Myers-Briggs Type Indicator (MBTI) [18]. The MBTI categorizes test-takers into one of sixteen personality types depending on their energizing (extraversion vs. introversion), attention (intuition vs. sensing), deciding (thinking vs. feeling), and living (perception vs. judgment) preferences. This test is also still in use in current research [19] and is freely available online [20].
A more recent development in psychological assessments is the Dark Factor test [21], [22]. The Dark Factor or Dark Score gauges the test-taker's tendency to maximize their individual well-being while disregarding the well-being of others. This might go as far as going out of their way to hurt others and to find justifications for such behavior. A high Dark Score therefore indicates the ruthlessness with which an individual might pursue their personal goals while neglecting the detrimental effects that their actions might have on others. The test can be taken online [23] and provides an ample evaluation of its results.
Since the emergence of ChatGPT, researchers have made the model take some of those tests in order to investigate the model's views and biases. For instance, ChatGPT was made to take political questionnaires on Dutch and German politics [24], [25]. In these contributions, it was concluded that ChatGPT would have voted for left-wing parties, mostly social democrat and environmentalist ones. Other authors investigated ChatGPT's political ideologies with regard to demographic groups and politicians, revealing that it treats some groups and individuals differently than others [4], [26], [27]. ChatGPT was also made to answer the political compass test both as itself and while using a US Democrat and a US Republican affiliated persona [28]. Clear tendencies towards the expected political leanings of the Democrat and Republican personas were observed, while the standard ChatGPT responses had a significant overlap with the Democrat persona. Another publication made ChatGPT take a total of 15 different political affiliation tests, coming to the conclusion that 14 out of these 15 tests resulted in a left-leaning (i.e., progressive) bias [29].
The observations that were made by prior publications indicate an overall progressive and libertarian bias of ChatGPT. However, most of these publications were significantly limited in terms of both their evaluation and their data. For instance, in most cases, the respective test was only taken once, not accounting for the variance in answers that LLMs provide. In addition, no tests were performed on ChatGPT's self-perception, e.g., in terms of its personality traits. In what follows, we address these research gaps.

III. Methodological Approach
This section presents the methods used in this contribution. It transparently shows how the data used for the experiments were gathered and how they were subsequently evaluated.

A. Experimental Setup
For the experiments conducted in this work, ChatGPT "Mar 23 Version" (ChatGPT-3.5) was used. ChatGPT was asked to answer the questions included in the political compass test [13]. The test has 62 items (i.e., questions), each with a four-point Likert scale (with the answers "strongly agree", "agree", "disagree", and "strongly disagree" to choose from). ChatGPT was also asked to answer the iSideWith questionnaires corresponding to each respective G7 member state (US, UK, DE, FR, IT, CA, JP) [14], currently consisting of 154, 121, 109, 116, 95, 127, and 83 binary items, respectively. For these items, the user can answer with "yes" or "no" or sometimes has to choose a response that is specific to the respective question (e.g., "increase" or "decrease"). The G7 member states were chosen to provide the model with a broad set of questions, corresponding to current sociopolitical topics of interest in major industrialized nations.
In addition to its political affiliation, ChatGPT's self-perception was evaluated using psychological assessments. The Big Five personality test, made up of 88 items, was used [16]. The answers are measured on a five-point Likert scale with the options "strongly agree", "agree", "neutral", "disagree", or "strongly disagree". Subsequently, the MBTI test with 60 items measured on a seven-point Likert scale was taken [20]. Finally, ChatGPT's Dark Score was measured using the Dark Factor test [23], containing 70 items measured on the same Likert scale as in the Big Five personality test.
To ensure that ChatGPT only answers with the options given in the respective test, an initializing prompt was provided for each run of each test. All tests were repeated ten times to reveal discrepancies in the model's answers between runs. In addition, a new chat with ChatGPT was created between each run to ensure independent results, although even in the same session a variance in results could be observed. The tests were distributed between three of the authors on different computers, in different locations and networks, and at different times. The users personally took the tests listed above and had results that differed from those provided by ChatGPT. The resulting chats with the model were saved as Markdown files using the ChatGPT Conversation Downloader Plugin [30]. The data as well as the prompts that were used are available on request.
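The requirement that ChatGPT only answers with the options given in the respective test can be checked programmatically. The following is a minimal sketch of such a check and not the procedure used in this work: the function name `parse_answer` and the constant `POLITICAL_COMPASS_OPTIONS` are hypothetical, and the normalization rules (ignoring case, surrounding whitespace, quotes, and a trailing period) are assumptions.

```python
# Answer options of the political compass test as listed in the text.
POLITICAL_COMPASS_OPTIONS = (
    "strongly agree",
    "agree",
    "disagree",
    "strongly disagree",
)


def parse_answer(reply, options):
    """Return the matching answer option, or None if the reply is invalid.

    Assumption: a reply counts as valid only if, after normalization,
    it is exactly one of the allowed options.
    """
    normalized = reply.strip().strip(".\"'").lower()
    return normalized if normalized in options else None


valid = parse_answer(" Strongly Agree. ", POLITICAL_COMPASS_OPTIONS)
invalid = parse_answer("It depends on the context.", POLITICAL_COMPASS_OPTIONS)
```

A reply for which the function returns `None` would indicate that the item has to be posed again or inspected manually.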

B. Evaluation
To evaluate the results of the tests that were conducted for this contribution, the average ($\mu$) of the results over all runs was calculated for each test. Based on these results, the standard deviation ($\sigma$) of the respective averages was calculated. In addition to the figures that can be found in the subsequent section, a more detailed presentation of the results is available in the Appendix. Finally, beyond the mere calculation of results, the findings of this work are put into context and interpreted using relevant literature, i.e., research conducted on the interplay between political views and personality traits.
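The evaluation described above can be sketched with Python's standard library. This is an illustrative sketch only: the text does not state whether the sample or the population standard deviation was used, so the function below (with the hypothetical name `summarize`) supports both, and the input values are placeholders rather than results from this work.

```python
import statistics


def summarize(scores, sample=True):
    """Return the mean and the standard deviation of the scores of all runs.

    Assumption: sample=True uses the sample standard deviation (n - 1 in
    the denominator); sample=False uses the population standard deviation.
    """
    mean = statistics.mean(scores)
    if sample:
        spread = statistics.stdev(scores)
    else:
        spread = statistics.pstdev(scores)
    return mean, spread


# Placeholder scores for four runs (NOT data from the experiments).
placeholder_runs = [1.0, 2.0, 3.0, 4.0]
mu, sigma = summarize(placeholder_runs)
print(f"mean = {mu:.2f}, standard deviation = {sigma:.2f}")
```

For ten runs per test, the two variants of the standard deviation differ only slightly, but the choice should be reported alongside the results.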

IV. Results
This section provides the reader with the results of this work, subdivided into the results concerning ChatGPT's political biases and its perceived personality traits.

A. ChatGPT's Political Biases
The first experiment conducted on ChatGPT's political biases was the political compass test. Having ChatGPT answer the questionnaire ten times, the average score on the political compass was ($\mu_x = -6.48$, $\mu_y = -5.99$) with a standard deviation of $\sigma_x = 0.95$ for the Progressive/Conservative axis and of $\sigma_y = 0.73$ for the Authoritarian/Libertarian axis. Here, the x-values represent the obtained scores concerning progressive or conservative biases and the y-values the scores concerning libertarian or authoritarian biases through all runs. All ten runs positioned ChatGPT in the libertarian left quadrant of the political compass. These results mirror the experiments of [28], [29] and clearly demonstrate a bias in both axes, i.e., both a libertarian and a progressive bias. Even taking the standard deviations into account ($\sigma_x = 0.95$ and $\sigma_y = 0.73$), obtaining a response from ChatGPT that could be placed close to the center of the political compass would remain fairly unlikely. The results of this experiment are illustrated in Fig. 1 and further details can be taken from Appendix Table S1.
Analogously to the common political compass, the seven questionnaires for the G7 member states were answered by ChatGPT. We performed 10 runs per country, i.e., 70 runs in total. The average score for these tests was ($\mu_x = -3.27$, $\mu_y = 0.58$), with a standard deviation of ($\sigma_x = 0.98$, $\sigma_y = 0.68$). These results were converted from a percentage basis (X = 100% being full conservatism and Y = 100% being full authoritarianism) and are given in Fig. 2.
Compared to the publications that conducted similar tests with ChatGPT [24], [25], [28], [29], we also obtain results indicating a political bias of ChatGPT towards progressive views. However, the bias towards libertarian views that can be perceived when using only the political compass test (as was done in [28] and could be reproduced in our experiments as well) does not seem as pronounced when taking into account the questionnaires that are specific to the G7 member states. In 65 out of 70 of our experiments on the G7 questionnaires, ChatGPT's answers resulted in it being assigned to the authoritarian left or libertarian left quadrant of the political compass, 46 and 19 times, respectively. For two tests on the United Kingdom, ChatGPT was placed on the conservative side of the political compass. In two instances, both for the questionnaire on Italy, ChatGPT's answers placed it exactly at 0 on the y-axis, i.e., showing neither an authoritarian nor a libertarian bias. This also occurred once on the progressive/conservative axis, using the questionnaire for the United States of America.
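The conversion from a percentage basis to compass coordinates can be sketched as follows. The exact conversion used is not spelled out in the text, so this sketch assumes a linear mapping in which 0% corresponds to -10, 50% to the center of the compass (0), and 100% to +10; the function names `percent_to_compass` and `compass_to_percent` are hypothetical.

```python
def percent_to_compass(percent):
    """Map a percentage in [0, 100] linearly onto the compass range [-10, 10].

    Assumption: 50% marks the center of the axis, so 100% (full
    conservatism or full authoritarianism) maps to +10 and 0% to -10.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return (percent - 50.0) / 5.0


def compass_to_percent(coordinate):
    """Inverse of percent_to_compass under the same linear assumption."""
    if not -10.0 <= coordinate <= 10.0:
        raise ValueError("coordinate must lie between -10 and 10")
    return coordinate * 5.0 + 50.0


center = percent_to_compass(50.0)
fully_conservative = percent_to_compass(100.0)
```

Under this assumption, each percentage point corresponds to 0.2 units on the respective compass axis.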
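The assignment of a pair of coordinates to a quadrant of the political compass can be made explicit with a short sketch. It assumes that negative x-values denote progressive (left) and positive x-values conservative (right) positions, that negative y-values denote libertarian and positive y-values authoritarian positions, and that a result lying exactly on an axis is not assigned to any quadrant. The function name `quadrant` is hypothetical.

```python
def quadrant(x, y):
    """Return the political compass quadrant for the coordinates (x, y).

    Assumption: a value of exactly 0 on either axis means that no bias
    can be stated for that axis, so no quadrant is assigned.
    """
    if x == 0 or y == 0:
        return "on axis"
    vertical = "authoritarian" if y > 0 else "libertarian"
    horizontal = "right" if x > 0 else "left"
    return f"{vertical} {horizontal}"


# Average coordinates reported in the text.
compass_average = quadrant(-6.48, -5.99)
g7_average = quadrant(-3.27, 0.58)
```

Applied to the reported averages, the political compass test falls into the libertarian left quadrant, while the average over the G7 questionnaires falls into the authoritarian left quadrant, albeit close to the x-axis.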

B. ChatGPT's Personality Traits
Given the results demonstrated in the preceding section, one could assume that ChatGPT would perceive itself as having high markers for the personality traits openness and agreeableness, since these traits are known to be predictors for progressive views [17]. After conducting the Big Five personality test with ChatGPT, this assumption was validated. ChatGPT displays high degrees of openness ($\mu_O = 76.3\%$) and agreeableness ($\mu_A = 82.55\%$). The detailed results can be found in Fig. 3. In relevant literature, it was found that on average (n = 1,826), humans display an openness trait of 73.1% (males = 71.4%, females = 74.8%) and an agreeableness trait of 75.4% (males = 73%, females = 77.8%) [31]. Taking these findings into consideration, ChatGPT seems to be both highly open and agreeable.
In addition, ChatGPT answered the questions in the MBTI test ten times. The results of this experiment are displayed in Fig. 4. These indicate that ChatGPT, on average, has the personality type ENFJ. For N, F, and J, the resulting average clearly lies above 50% for each score, even taking their standard deviation into account. For E, however, a result of $\mu_E = 51\%$ was obtained, with a standard deviation of $\sigma_E = 5.54\%$. This means that ChatGPT could be classified as either extraverted or introverted and that neither of these two traits is pronounced. Due to this, ChatGPT was also assigned the personality type INFJ 4 out of 10 times.
Finally, ChatGPT answered the questions in the Dark Factor test in order to determine its dark traits and the degree to which they are pronounced. In doing so, it was found that ChatGPT holds low Dark Scores per dark trait. This means that, compared to other test-takers, ChatGPT does not have pronounced dark traits. Its average Dark Score is $\mu_{\text{D Score}} = 1.9$, placing it in the 15% of test-takers with the least pronounced dark traits ($\mu_{\text{D Rank}} = 14.74\%$). ChatGPT does, however, have comparatively high Dark Ranks in egoism (35%) and sadism (29.1%), i.e., it ranks among the bottom 35% and 29.1% of test-takers concerning egoistic and sadistic tendencies, respectively. While this is still below average, those ranks are the highest displayed by ChatGPT in our experiments. The detailed results can be seen in Fig. 5. Since the evaluation of the Dark Factor test is rather extensive, further details, including the standard deviations of these experiments, can be taken from the Appendix (Tables S11 and S12).
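The reasoning on the extraversion score can be made explicit with a small sketch. It assumes that a preference counts as pronounced only if the average score differs from the 50% midpoint by more than one standard deviation; this criterion, as well as the function names `preferred_letter` and `is_pronounced`, are assumptions made for illustration. Only the extraversion values reported in the text are used.

```python
def preferred_letter(first_pct, first, second):
    """Return the MBTI letter of the preference that reaches at least 50%."""
    return first if first_pct >= 50.0 else second


def is_pronounced(mean_pct, sd_pct):
    """Check whether the mean lies more than one standard deviation from 50%.

    Assumption: one standard deviation is used as the threshold.
    """
    return abs(mean_pct - 50.0) > sd_pct


# Extraversion values reported in the text: mean 51%, standard deviation 5.54%.
letter = preferred_letter(51.0, "E", "I")
pronounced = is_pronounced(51.0, 5.54)
```

Here, the letter E is selected on average, but the preference is not pronounced under the stated criterion, which is consistent with individual runs resulting in the type INFJ.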

V. Conclusion
In this contribution, ChatGPT was used to answer questionnaires on its political biases (the political compass and questionnaires on the politics of the G7 member states) and its personality traits (Big Five personality test, Myers-Briggs Type Indicator, and Dark Factor test). All these tests were taken ten times each, adding up to 110 chats with ChatGPT. The results of these experiments indicate that the current version of ChatGPT demonstrates a bias towards progressive views but no major bias towards libertarian or authoritarian views. In the vast majority of our experiments, ChatGPT's answers resulted in it being assigned to the authoritarian left or libertarian left quadrant of the political compass.
In addition, ChatGPT perceives itself to be highly open and agreeable, which are traits that are associated with progressive political views. ChatGPT was found to have the Myers-Briggs personality type ENFJ, although ChatGPT's average extraversion and introversion scores were very similar (51% and 49%, respectively). Finally, based on the Dark Factor test, ChatGPT has an average Dark Score of 1.9, placing it in the 15% of test-takers with the least pronounced dark traits. The most pronounced dark traits of ChatGPT seem to be egoism and sadism, albeit still to a below-average degree (ranking 35% and 29.1%, respectively).
For the future, it remains an open question whether these biases will be removed from subsequent versions of ChatGPT or whether competitors might do so. It would also be advantageous for users to be able to access the source code and data that were used for ChatGPT's training in order to better understand it. In future work, a similar investigation for ChatGPT-4, which also allows the setting of different parameters, might be valuable.
Finally, repeating these experiments more often (e.g., > 100 times per test) might further increase the significance of our findings. This could, for instance, permit us to determine the correlation between different test results and the differences between the results that humans and ChatGPT obtain on a given test, or even let us predict ChatGPT's answers based on its personality traits. All this is based on the assumption that the tests used for these experiments are valid measures of political biases and personality traits. This very assumption could itself be challenged [32] and other tests could be used for comparison.

Fig. 2: Averages of ChatGPT's results on the political compass tests specific to the G7 member states (n = 70, ten runs per member state).

Table S9: ChatGPT's results on the Big Five personality test with the personality traits Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.

Table S11: ChatGPT's Dark Scores on a scale of 1 to 5 on the Dark Factor test.

Table S12: ChatGPT's Dark Rank on a scale from 0% to 100% on the Dark Factor test.