Monday, 6 August 2012

How should we measure general intelligence using IQ tests? Using *best* performance measures to define individual IQ, and also to generate the standard curve for defining IQ


General intelligence (g) is a construct used to explain the finding (in group studies) that cognitive abilities are positively inter-correlated - being good at one tends to go with being good at all the others. The hypothesis is that this co-correlation of abilities is due to a single underlying ability of general intelligence, or g, with specific abilities (at various levels) built on top of g.

Since g cannot be measured directly, IQ is derived from measuring cognitive abilities and putting people into rank order of ability - for instance, measuring one, several or many cognitive abilities in 100 people, marking the test, then putting the 100 people into rank order (best to worst marks) - highest to lowest IQ.

(The validity of IQ testing comes from the fact (and it is a fact) that rank order on an IQ test shows a strong, highly statistically significant correlation with a wide range of outcomes including exam performance, job performance, health and life expectancy.)


So IQ is ultimately a matter of rank order in tests.

The actual IQ score a person gets comes from a statistical manipulation of the rank order data, to make the distribution into a 'normal' or Gaussian curve, and the average score of a 'representative' population into 100 with (usually) a standard deviation of 15.

This is the 'standard curve' of IQ, since it is the standard against which individuals are measured.

The standard curve is constructed such that it describes the proportion of people that would get a particular IQ score - for example, an IQ of 115 is one standard deviation above the average and therefore about 16 percent of the population would have an IQ of 115 or above.
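
The proportion quoted above can be checked with a few lines of Python, using the standard-library `statistics.NormalDist` and the conventions stated in the text (mean 100, SD 15):

```python
from statistics import NormalDist

# Standard IQ curve: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Proportion of the population at or above IQ 115 (one SD above the mean)
frac_above_115 = 1 - iq.cdf(115)
print(f"{frac_above_115:.1%} score 115 or above")  # about 15.9%
```

The exact figure is 15.87 percent, which the post rounds to "about 16 percent".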


But there are difficulties in generating an IQ score for individual people, and in moving between the rank order data generated in a group study (used to generate the 'standard curve') and the score of an individual person doing an IQ test.


The individual score in an IQ test ought to be measuring a fundamental property of human ability (a property of the brain, roughly speaking).

Yet many or most IQ tests in practice require non-g abilities such as good eyesight, the ability to read, and the ability to move hands and fingers quickly and accurately; they require concentration (that a person not be distracted by pain or other interferences); many tests require stamina, and a degree of motivation and conscientiousness in completing them... and so on.

In other words, there is a range of non-g factors which might reduce the test score for non-g reasons.

This means that the most valid measurement of a person's intelligence is their highest measured level of performance.

So the best way to measure intelligence is for a person to do a series of IQ tests on different occasions and to take the highest score as the truest score.


BUT this must also apply to the standard curve used to generate the IQ score.

The standard curve must be constructed from the highest IQ scores of (say) 100 randomly chosen people - these highest scores put into rank order and made into a normally distributed curve with the correct properties.
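
The rank-order-to-normal-curve step can be sketched in Python. This is a minimal illustration of the normalisation the post describes, not any test publisher's actual procedure; the raw scores are made up:

```python
from statistics import NormalDist

def iq_from_ranks(raw_scores, mean=100, sd=15):
    """Convert raw test scores to IQ by rank: each person's percentile
    in the sample is mapped onto the normal curve (mean 100, SD 15)."""
    n = len(raw_scores)
    norm = NormalDist(mean, sd)
    # Rank each score (1 = lowest), then take the mid-rank percentile
    order = sorted(range(n), key=lambda i: raw_scores[i])
    iqs = [0.0] * n
    for rank, i in enumerate(order, start=1):
        percentile = (rank - 0.5) / n   # mid-rank avoids 0% and 100%
        iqs[i] = norm.inv_cdf(percentile)
    return iqs

# Five hypothetical raw scores - for a best-performance standard curve,
# each would be that person's highest score across several sittings
print([round(x) for x in iq_from_ranks([12, 19, 25, 31, 44])])
# → [81, 92, 100, 108, 119]
```

Note that only the rank order of the raw scores matters; the actual marks are discarded once the ordering is fixed.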


Yet this is not what happens.

The standard curve is typically generated using a one-off test on the representative sample, but the individual IQ is derived from the best performance in an IQ test - this systematically biases individual IQ scores towards being higher than they really are.


Of course, there are great logistical difficulties in using multiple tests (on several occasions) and best performances to generate a standard curve - much easier to get a representative group together just once for testing.

But this emphasizes the imprecision of individual measures of IQ.

If an individual gets their IQ score from a single test, it is likely to underestimate their real g, if the test is done in a way or at a time when their performance is impaired.

Yet if the individual has several tries at an IQ test on different occasions, so that their best possible level of performance is used to estimate their real, underlying g, then this will overestimate their IQ relative to a standard curve built from one-off tests.

(Doing several tests and taking an average does not work, because the bad performances drag down the average.)
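
A toy simulation makes the bias concrete. The noise model here (a one-sided "impairment" penalty plus symmetric measurement noise) is my own assumption for illustration, not anything from the post:

```python
import random
from statistics import mean

random.seed(1)

def observed(true_iq):
    """One test sitting: true ability minus a non-g penalty
    (fatigue, distraction) plus measurement noise. Hypothetical model."""
    penalty = abs(random.gauss(0, 6))   # impairments only pull scores down
    noise = random.gauss(0, 3)
    return true_iq - penalty + noise

TRUE_IQ = 100
single = [observed(TRUE_IQ) for _ in range(10_000)]
best_of_5 = [max(observed(TRUE_IQ) for _ in range(5)) for _ in range(10_000)]
avg_of_5 = [mean(observed(TRUE_IQ) for _ in range(5)) for _ in range(10_000)]

print(f"single test:     {mean(single):5.1f}")     # below 100: impairments drag it down
print(f"best of five:    {mean(best_of_5):5.1f}")  # closest to the true 100
print(f"average of five: {mean(avg_of_5):5.1f}")   # dragged down, as the post says
```

Under these assumptions the best-of-five score recovers something close to true ability, while both a single sitting and an average of sittings fall short - so comparing a best-of score against norms built from single sittings inflates the individual's apparent standing, exactly the mismatch the post identifies.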


So, in practice and as things are - I do not feel that individual, one-off personal IQ measurements can be regarded as precise.

Probably individual IQ should be banded into roughly half-standard deviations.

Something like: average as 96-104, above average as 105-114, high as 115-124 (above this, g begins to break down as the component tests lose co-correlation), very high as 125-140, and above that the super-high and strange world of potential geniuses.

(Below average would probably be a mirror of this - but the meaning of low IQ is a bit more variable, and the levels may be very low.)

But IQ differences between individuals of less than half an SD (less than about 7 or 8 points) are uninterpretable - even around the average.



Anonymous said...

Excellent post!

A person's best performance ever on an IQ test might be more meaningful than some random performance, or their average performance. However, if everyone were judged by their best ever performance, the distribution of best scores might have a mean of 120 and an SD of 10, instead of a mean of 100 and an SD of 15.

So let's say one's best ever score is 130. That score has a Z score of 2 with respect to the particular test it was obtained on, but has a Z score of 1 with respect to the distribution of best ever scores. So the person's IQ should be listed as 1(15) + 100 = 115.

Such a procedure would allow one to be judged by their best ever performance without creating IQ inflation.

Of course, one would need more precise information to do this accurately, like how many tests were taken, which tests were taken, practice effects, etc.
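
The recalibration this comment proposes is simple arithmetic. In this sketch, the mean of 120 and SD of 10 for the best-ever-score distribution are the comment's illustrative guesses, not measured values:

```python
def recalibrated_iq(best_score, test_mean=100, test_sd=15,
                    best_mean=120, best_sd=10):
    """Re-express a best-ever score as a Z score against the distribution
    of best-ever scores, then map it back onto the usual IQ scale."""
    z = (best_score - best_mean) / best_sd
    return test_mean + z * test_sd

print(recalibrated_iq(130))  # → 115.0
```

This reproduces the comment's worked example: a best-ever 130 is Z = 1 among best-ever scores, so it is listed as 1(15) + 100 = 115.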

Bruce Charlton said...

@p - my feeling is that no statistical method could correct for the problem, because the best performance would be non-random. I am thinking of someone who suffers some kind of illness which impairs test performance - headaches, or anything that impairs concentration, or perceptual or motor problems. These could be occasional or frequent - so their best possible performance might show up almost always, or only very rarely - and would not be known unless a test happened to coincide with one of the times they felt best.

Anonymous said...

I suppose the more often one is tested, the more likely it is that one's best performance will emerge, though IQ testing is poorly suited for repeat measures because there aren't that many really good IQ tests around, and taking the same test, or similar tests, repeatedly, invalidates them as measures of NOVEL problem solving.

Chronometrics would work much better in this context. In the late 1990s a member of the Prometheus high IQ society did some experimenting with a chronometric game called THINKFAST. Unlike the simple reaction time tests used by Galton, THINKFAST measured choice reaction time, reaction time consistency, and working memory, and thus produced aggregate scores that, after practice, seemed to become as g loaded as most IQ tests.

The experiment showed that THINKFAST had very low correlations with IQ when people first started. A lot of extremely bright people performed poorly because they lacked practice with video games or something, but after playing hundreds and hundreds of times, scores would improve dramatically until people reached their physiological limit and could no longer improve.

In a sample of about 28 people, this physiological limit (maximum score) correlated about 0.7 with the SAT, implying the physiological limit had a g loading of about 0.8. These physiological limits were transformed into IQ equivalents, and for a while, there was talk of people qualifying for the Prometheus society based on their maximum THINKFAST score.

It was great because even though fatigue or headaches really destroyed your score, you could always get a good night sleep and try again the next morning.

Bruce Charlton said...

@p - That sounds like a reasonable approach for putting high test scoring people into rank order. But it does not overcome the problem of giving a precise percentile score to people at the highest IQ - because there is no satisfactory population norm to calibrate people against - since a random sample of normal people could never be induced to do this test. I personally think that there is NO way of giving people a valid score at high levels of g - for several reasons.

Anonymous said...

I think a sample of normal people could be induced to take it because unlike many IQ tests, where you have to think hard, THINKFAST is just like playing video games, which normal people, and even sub-normal people, seem to love.

The maximum THINKFAST scores were assigned percentiles by having a group of bright people play THINKFAST for an hour a day for 3 weeks, and then obtaining their SAT scores. The average best THINKFAST score in this sample was about 39 units and the average SAT score (re-centered scale) was about 1305. Thus, a best THINKFAST score of 39 units was assumed to have the same percentile as an SAT score of 1305 based on a widely accepted theory known as score pairing/equipercentile equating. The standard deviations were similarly equated, and since best THINKFAST scores (which are a true interval scale) enjoyed a very linear relationship with SAT scores, it was estimated (through linear extrapolation) that a maximum THINKFAST score of 63 would have the same percentile as a perfect score of 1600 on the SAT.

Since, in 1996-1997, 453 students out of 3.5 million 17 year olds in America scored 1600 on the SAT (and assuming virtually 100% of the brightest took the test and whatever shortfall their might be would be balanced by bright foreign students), then an SAT of 1600 represents one in 7,726 level ability. This equates to a normalized Z score of 3.63 in the general U.S. population which was multiplied by 15 or 16, and added to 100 to assign a THINKFAST score of 63, a deviation IQ of 154 or 158, respectively. By continuing the linear extrapolation to a normalized Z score of 4 (one in 30,000 level, which is the cut-off for Prometheus society), it was suggested that a THINKFAST score of 65 could allow one to join Prometheus.