The Forer Workstyle Inventory personality test has been designed to assess the relationship between construct validity and face validity. More concretely, the experiment has been developed to investigate whether a personality test with a very low construct validity is able to score high on face validity. Or in simple terms, will respondents to a personality test that makes no sense nevertheless believe in its accuracy?
Personality is considered a psychological construct, a phenomenon that can only be observed indirectly. Personality itself can best be described as the software in our mind that makes us behave the way we do. Because personality as such cannot be measured directly, only indirect methods are available to us. The most common method of measuring personality is to ask subjects to respond to a series of statements that describe their day-to-day behaviour. Developing a valid personality test is difficult because there is no independent calibration mechanism to check the survey outcomes. There is no fixed standard against which the results of a psychometric test can be compared with the actual software in the mind of the subject.
Psychologists have developed mathematical methods to ascertain the construct validity of a test. This is the degree to which a test measures what it claims to be measuring, i.e. the personality of a subject. A test is also assessed by the level of face validity, which is the extent to which a test is perceived by participants as valid.
The purpose of the Forer Workstyle Inventory is to ascertain whether a test can be developed that has a very low level of construct validity, but a high level of face validity: a personality test that is scientifically invalid, but looks and behaves like a real personality test and is assessed by users as accurate. Can a test be developed which is scientifically totally inaccurate, but nevertheless is rated highly by respondents? The objective for this project was to reproduce a variant of the experiments conducted by Bertram Forer in 1949. ((Forer, B. R. (1949). The fallacy of personal validation: a classroom demonstration of gullibility. The Journal of Abnormal and Social Psychology, 44(1), 118–123.)) Forer asked respondents to complete a survey and then proceeded to give all students exactly the same answer. The difference with our experiment is that each participant receives their own fake personality profile based on the input they provided. The algorithm of the Forer Workstyle Inventory is fixed, so respondents providing the same answers will receive the same results.
The results of this test do not actually reveal any information beyond what has been entered by the subject. The results are only a linguistic rearrangement of the answers. This is confirmed by recent research that showed that most people are able to guess the outcome of personality tests without actually undertaking them. ((A. Furnham, & G. Dissou (2007). The relationship between self-estimated and test derived scores of personality and intelligence. Journal of Individual Differences, 28 (1), 37–44.))
The Forer Workstyle Inventory is a fully open personality test. Refer to the methodology page to review details of the test algorithm. To enable reproducibility of the research, the raw data and computational code are available to interested parties.
Statistical analysis has been undertaken using R, a free software environment for statistical computing and graphics. The R package psych has been used for psychometric analysis. ((R Core Team. (2012). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from www.R-project.org; Revelle, W. (2013). psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. Retrieved from CRAN.R-project.org/package=psych.))
A total of 401 completed responses have been collected to date with 1.6% of data points missing. Besides completing the personality inventory items, respondents were also asked to provide their year of birth, gender, the highest level of education and profession.
A total of 178 female (44%) and 223 male (56%) respondents completed the survey, so males were more likely than females to participate in the test. A chi-square goodness-of-fit test was used to assess how likely it is that the observed gender asymmetry is a coincidence. That likelihood is low, indicating a statistically significant difference between the participation rates of the two genders (χ²(1, n=401)=5.05, p<.05).
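The goodness-of-fit statistic reported above can be reproduced from the two counts alone. The following is a minimal Python sketch, not the R script used for the analysis; the function names are illustrative, and it assumes the expected split is 50/50, which the reported value of 5.05 is consistent with.

```python
import math

def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic.

    When no expected counts are given, equal counts per category are assumed.
    """
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With one degree of freedom, chi2 is the square of a standard normal
    variable, so the tail probability follows from the error function.
    """
    return math.erfc(math.sqrt(chi2 / 2))

counts = [178, 223]  # female, male respondents as reported in the text
chi2 = chi_square_gof(counts)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

The closed-form p-value only holds for two categories (one degree of freedom); a test with more categories would need the general chi-square distribution.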
The reason for the observed gender imbalance in test participation is not known. Respondents were randomly recruited using Google advertising and the LinkedIn, Twitter and Facebook social media platforms. Social media are reported to show a gender imbalance, with a higher level of female participation—a reverse trend to what has been observed in this personality test.
The mean age of the respondents is 32.7 years (n=377), as displayed in the age pyramid below. Of the respondents who provided a valid birth year, 53 are part of the Baby Boomer cohort (1946–1964), 105 of the Generation X cohort (1965–1979) and 216 of the Generation Y cohort (1980–1999).
More than 70% of respondents reported having completed a bachelor's degree or higher.
Personality Test Analysis
The test has been designed to minimise its internal and external validity, whilst maximising face validity and thus not creating suspicion that the test is in fact invalid. The test is based on trait theory and contains five traits, which are outlined in the methodology of the test.
The internal reliability of the test items was assessed using Cronbach's alpha. This determines the proportion of a scale's total variance that is attributable to a common source. The higher the value for alpha, the more likely the responses to the individual items relate to an underlying personality trait. Analysis showed that all but one of the hypothesised personality traits have an unacceptably low internal consistency. Alpha values lower than 0.50 are generally considered unacceptable, with values larger than 0.70 considered to be reasonable.
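To make the statistic concrete, here is a minimal Python sketch of the standard formula for Cronbach's alpha: k/(k−1) × (1 − sum of item variances / variance of the total scores). It is not the psych package routine used for the analysis, and the two small data sets are invented for illustration only. The second set shows how items that pull in different directions can yield a negative alpha.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of per-item score lists, each holding one score per
    respondent, all in the same respondent order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical data: three items that rise and fall together.
consistent = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
# Hypothetical data: the second item runs opposite to the first.
inconsistent = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [2, 4, 1, 5, 3],
]

alpha_high = cronbach_alpha(consistent)
alpha_low = cronbach_alpha(inconsistent)
print(f"consistent items:   alpha = {alpha_high:.2f}")
print(f"inconsistent items: alpha = {alpha_low:.2f}")
```

A negative value arises when the variance of the total scores is smaller than the sum of the item variances, which happens when items covary negatively on average.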
The only trait with an alpha value greater than 0.70 is amity, which is considered to be a reflection of a person’s sense of humour. This could be considered a reflection of the fact that the whole test should be taken with a sense of humour.
|Trait|Items|Cronbach's alpha|
|Energy|2, 17, 18 and 19|.22|
|Intellect|7, 8, 10, 16 and 21|-.10|
|Perspective|3, 4, 5 and 11|-.35|
|Activity|1, 6, 12 and 20|-.11|
|Amity|9, 13, 14, 15 and 22|.76|
The low levels of internal consistency were confirmed with factor analysis, where only the factors of the amity trait explained a reasonable amount of variance. Given the extremely low values for alpha and problems with factorisation for four of the five traits, the test can be considered to have a low level of construct validity.
After completion of the test, respondents were asked to provide feedback on how accurately the test had described them. Appreciation was rated on a 1–5 Likert scale: “Strongly disagree, Disagree, Neither Agree nor Disagree, Agree and Strongly Agree”.
In the original Forer experiment, the appreciation score was 4.3 (n=39). A slightly lower appreciation score has also been measured for the commercially available Work Personality Index, which is a popular psychometric test for recruitment purposes. The null hypothesis for the Forer Workstyle Inventory is that the average response would be “Neither Agree nor Disagree”. This is the expected value for appreciation if responses were randomly distributed. Expressed mathematically:
H0: appreciation = 3
The appreciation question was completed by 65% of respondents. The mean appreciation score for the Forer Workstyle Inventory was 3.8 (n=261). To determine whether the measured score differs significantly from the null hypothesis, a Wilcoxon signed-rank test was used. Of the 261 observations, 199 participants' responses (76%) were higher than 3 (V=23211, n=261, p<2.2×10⁻¹⁶). Given the extremely low likelihood that the difference between the null hypothesis and the measured value is based on coincidence, it can be concluded that respondents were convinced of the validity of the personality test. The test thus has a high face validity.
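The mechanics of the signed-rank test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the R analysis itself: the function name is hypothetical, the ratings are invented (the text does not report the distribution of scores), and the p-value uses a normal approximation with a tie correction but no continuity correction, so it may differ slightly from what R's wilcox.test reports.

```python
import math

def wilcoxon_signed_rank(values, mu0):
    """One-sample, two-sided Wilcoxon signed-rank test against mu0.

    Returns (V, z, p), where V is the sum of the ranks of the positive
    differences. Tied absolute differences receive their average rank.
    """
    diffs = [v - mu0 for v in values if v != mu0]  # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        run_length = j - i + 1
        tie_term += run_length ** 3 - run_length
        i = j + 1
    v = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (v - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return v, z, p

# Hypothetical Likert ratings (1-5); not the survey data.
ratings = [1] * 2 + [2] * 5 + [3] * 8 + [4] * 20 + [5] * 10
v, z, p = wilcoxon_signed_rank(ratings, mu0=3)
print(f"V = {v}, z = {z:.2f}, p = {p:.5f}")
```

Because Likert responses take only five values, ties are unavoidable, which is why the sketch assigns average ranks and corrects the variance rather than relying on exact tables.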
The appreciation scores were tested for confounding influences. Only the age of respondents had a small but significant correlation with the appreciation score. The older the respondent, the lower the appreciation score (r(374) = -.12, p <.05).
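The significance of a correlation this small follows from the sample size. As a rough check, the Python sketch below converts the reported r and degrees of freedom into a t statistic, t = r·√(df / (1 − r²)), and approximates the two-sided p-value with the normal distribution. The normal approximation is an assumption that is reasonable only because the degrees of freedom are large; the function names are illustrative.

```python
import math

def correlation_t(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1 - r ** 2))

def approx_two_sided_p(t):
    """Two-sided p-value from the standard normal distribution.

    Approximates the t distribution, which is acceptable for large df.
    """
    return math.erfc(abs(t) / math.sqrt(2))

t = correlation_t(-0.12, 374)  # values as reported in the text
p = approx_two_sided_p(t)
print(f"t = {t:.2f}, approximate p = {p:.3f}")
```

With a few hundred respondents, even a correlation that explains under 2% of the variance (r² ≈ .014) clears the .05 threshold, which is why the text describes the effect as small but significant.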
The original objective, to develop a test with a low level of construct validity but a high level of face validity, has been achieved. The face validity scores for the Forer Workstyle Inventory replicate the original experiment by Forer and are comparable with the results obtained from commercially available tests commonly used in recruitment.
Comprehensive self-knowledge can thus not be obtained by completing surveys because they can only reveal the perceived self and are not capable of unearthing the inner (subconscious) self. Psychometric tests, such as the Forer Workstyle Inventory, are only suitable as a vehicle for introspection, providing an entry point for reflecting on one’s self. This introspection cannot, however, occur without life experience to reflect on.
Obtaining self-knowledge, considered essential for leadership development, requires something deeper and more substantial. As Friedrich Nietzsche once proclaimed:
One’s own is well hidden from one’s own; and of all treasure troves, one’s own is the last to be excavated …
As our behaviour is predominantly controlled by situational variables, the only way to obtain self-knowledge is life experience. Only by being exposed to a multitude of situations and challenges can we know what our personality actually is. As we gain life experience, our inner and perceived selves slowly converge. Maturity is the situation where the inner self and the perceived self are almost identical and self-knowledge becomes apparent. Even the most carefully designed personality test cannot leapfrog the knowledge obtained through life experience. Carl Gustav Jung, who inspired the development of the MBTI, recognised this when he wrote:
Anyone who wants to know the human psyche … would be better advised to abandon exact science … and wander with human heart through the world.
This foray into personality testing leads me to conclude that no psychometric test can ever replace the fullness of life experience as a way to obtain true self-knowledge. Experiences such as exposing oneself to challenging situations, occasionally exploring the boundaries of morality, experiencing different cultures or going through emotional turmoil are the only meaningful ways to gain self-knowledge.
This experiment shows that caution is required when using self-administered personality tests. The face validity observed in respondents to the test can be easily transferred to managers interpreting the tests.