
The Psychometrics Centre

Cambridge Judge Business School
 

Beyond the Flynn Effect, a lecture by Professor James Flynn

James Flynn

Beyond the Flynn Effect: Solution to all outstanding problems - except enhancing wisdom

A wise man has the ability to reach sound conclusions about... what conduces to the good life as a whole.

(Aristotle, Ethics, vi, 5, 1094b, 25-28)

Introduction

To forestall a diagnosis of megalomania - two preliminaries.  First, the label "Flynn Effect" was coined by the authors of The Bell Curve and not by myself.  I have never done any studies of IQ trends over time in the sense of actually administering tests.  About 1981, it struck me that if IQ gains over time had occurred anywhere, they might have occurred everywhere and that a phenomenon of great significance was being overlooked.  Of those who had measured gains here or there, Reed Tuddenham was the first to present convincing evidence using a nation-wide sample.  Had I thought of attaching a name to the phenomenon, I would have offered his.  Second, while the title of this paper suggests that I can make all of the problems posed by IQ gains go away, I do not really think that I can say the final word.   I mean only that I can at last propose an interpretation that eliminates paradoxes.   These paradoxes have been so intimidating as to freeze our thinking about the significance of IQ gains ever since we began to take them seriously (Flynn, 2006). I also wish to underline that if we want to write the cognitive history of the 20th century, rising IQ is at most half the story.  There are other intellectual qualities, namely, critical acumen and wisdom, that IQ tests were not designed to measure and do not measure and these are equally worthy of attention.  Our obsession with IQ is one indication that rising wisdom has not characterized our time.

Naming the paradoxes

(1) The Factor Analysis Paradox:  Factor analysis shows a first principal component called "g" or general intelligence that seems to bind performance on the various WISC subtests together.  However, IQ gains over time show score gains on the WISC subtests occurring independently of one another.  How can intelligence be both one and many?

(2) The Intelligence Paradox:  If huge IQ gains are intelligence gains, why are we not struck by the extraordinary subtlety of our children's conversation?  Why do we not have to make allowances for the limitations of our parents?  A difference of some 18 points in the average IQ over two generations ought to be highly visible.

(3) The Mental Retardation Paradox:  In 1900, the average IQ scored against current norms was somewhere between 50 and 70.  If IQ gains are in any sense real, we are driven to the absurd conclusion that a majority of our ancestors were mentally retarded.

(4) The Identical Twins Paradox: Twin studies show that genes dominate individual differences in IQ and that environmental effects are feeble.  IQ gains are so great as to signal the existence of environmental factors of enormous potency.  How can environment be both so feeble and so potent?

 

The solution in shorthand

(1) Solution to the Factor Analysis Paradox: The WISC subtests measure a variety of cognitive skills that are functionally independent and responsive to changes in social priorities over time.  The inter-correlations that engender "g" are binding only when comparing individuals within a static social context.

(2) Solution to the Intelligence Paradox: Asking whether IQ gains are intelligence gains is the wrong question because it implies all or nothing cognitive progress.  The 20th century has seen some cognitive skills make great gains, while others have been in the doldrums.  To assess cognitive trends, we must dissect "intelligence" into solving mathematical problems, interpreting the great works of literature, finding on-the-spot solutions, assimilating the scientific world view, critical acumen, and wisdom.

(3) Solution to the Mental Retardation Paradox: Our ancestors in 1900 were not mentally retarded.  Their intelligence was anchored in everyday reality.  We differ from them in that we can use abstractions and logic and the hypothetical to attack the formal problems that arise when science liberates thought from concrete referents.  Since 1950, we have become more ingenious in going beyond previously learned rules to solve problems on the spot.

(4) Solution to the Identical Twins Paradox: At a given time, genetic differences between individuals (within a cohort) are dominant but only because they have hitched powerful environmental factors to their star.  Trends over time (between cohorts) liberate environmental factors from the sway of genes and once unleashed, they can have a powerful cumulative effect.

 

Swimming freely of 'g'

If we factor analyzed performances on the 10 events of the decathlon, an athletics "g" would emerge and no doubt, subordinate factors representing speed (the sprints), spring (jumping events), and strength (throwing events).  We would get a g(A) because at a given time and place, performance on the 10 events would be inter-correlated, that is, someone who tended to be superior on any one would tend to be above average on all.   We would also get various g-loadings for the ten events, that is, superior performers would tend to rise further above average on some of them than on the others.  The 100 meters would have a much higher g loading than the 1500 meters, which involves an endurance factor not very necessary in the other events.

 Athletics g might well have much utility in predicting performance differences between athletes of the same cohort.  However, if we used it to predict progress over time and forecast that trends on the 10 events would move in tandem, we would go astray.  That is because g(A) cannot discriminate between pairs of events in terms of the extent to which they are functionally interrelated.

Let us assume that the 100 meters, the hurdles, and the high jump all had large and similar g loadings as they almost certainly would.  A sprinter needs upper body strength as well as speed, a hurdler needs speed and spring, a high jumper needs spring and timing.  I have no doubt that a good athlete would best the average athlete handily on all three at a given place and time.  However, over time, social priorities change.   People become obsessed with the 100 meters as the most spectacular spectator event (the world's fastest human).  Young people find success in this event a secondary sex characteristic of great allure.  Over 30 years, performance escalates by a full SD in the 100 meters, by half an SD in the hurdles, and not at all in the high jump. 

In sum, the trends do not mimic the relative g loadings of the "subtests".  One pair of events highly correlated (sprint and hurdles) shows a modest trend for both to move in the same direction; another pair equally highly correlated (sprint and high jump) shows trends greatly at variance.  At the end of the 30 years, we do another factor analysis of performance on the 10 events of the decathlon and lo and behold, g(A) is still there.  Although average performance has risen "eccentrically" on various events, the following is still true: superior performers still do better than average on all 10 events and are about the same degree above average on various events as they were 30 years before.
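The decathlon analogy can be made concrete with a small simulation (an illustrative sketch only: the loadings and the size of the mean shifts are invented for the purpose, not estimated from data).  Within a cohort, a shared latent factor produces the inter-correlations that factor analysis summarizes as g(A); shifting the means between cohorts leaves those correlations, and hence the extracted factor, untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Within one cohort, a latent "athletic g" drives three events, plus
# event-specific noise.  The loadings are invented for illustration.
g = rng.normal(size=n)
sprint    = 0.8 * g + 0.6 * rng.normal(size=n)
hurdles   = 0.8 * g + 0.6 * rng.normal(size=n)
high_jump = 0.7 * g + 0.7 * rng.normal(size=n)

def first_component_share(events):
    """Share of variance captured by the first principal component of
    the correlation matrix -- a stand-in for extracting g(A)."""
    r = np.corrcoef(np.column_stack(events), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(r)  # ascending order
    return eigenvalues[-1] / eigenvalues.sum()

cohort_1 = first_component_share([sprint, hurdles, high_jump])

# Thirty years later the means have shifted "eccentrically" (a full SD
# on the sprint, half an SD on the hurdles, none on the high jump), but
# the within-cohort correlations are untouched.
cohort_2 = first_component_share([sprint + 1.0, hurdles + 0.5, high_jump])

print(cohort_1, cohort_2)
```

Because adding a constant to an event's scores cannot change any correlation, the first-component share is identical in both cohorts, however eccentric the trends in the means.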

Factor loadings have proved deceptive about whether various athletic skills are functionally independent.  We can react to this in two ways: either confront the surprising autonomy of various skills and seek a solution by depth analysis of how they function in the real world; or deny that anything real has happened and classify the trends over time as artifacts.  The second option is respectable if you can actually present evidence.  Perhaps the sprinters of 30 years ago lacked "event sophistication": they may have been so tense at the starting line that they all got slow starts when the gun went off.  Perhaps the content of the event used to disadvantage sprinters by way of  "cultural bias": The starters may have been Etonians (my word processor wants me to say Estonians) who insisted on issuing their commands in Greek.  Such things would mean that better 100 meters times do not signal any real increase in speed.  Therefore, the problems of why there has been only a moderate carry over to the hurdles and why there has been no carry over to the high jump are pseudo-problems.

But if there is no such evidence, the second option is sterile.  It becomes a matter of saying that since the trends are not factor invariant, they must be artifacts.   This assumes that the hypotheses about functional skills in the real world that factor analysis poses need not be tested against evidence.  Or that evidence cannot be real evidence if it is falsifying.  I assume that this is an option no one will choose.

It is better to talk to some athletics coaches.  They tell us that over the years, everyone has become focused on the 100 meters and it is hard to get people to take other events seriously.  They point out that sprint speed may be highly correlated with high jump performance but past a certain point, it is actually counterproductive.  If you hurl yourself at the bar at maximum speed, your forward momentum cannot be converted into upward lift and you are likely to time your jump badly.  They are not surprised that increased sprint speed has made some contribution to the hurdles because speed between the hurdles is important.  But it is only half the story: you have to control your speed so that you take the same number of steps between hurdles and always jump off the same foot.  If you told these coaches that you found it surprising that real-world shifts in priorities, and the real-world functional relationships between events, ignored the factor-loadings of the events, I think they would find your mind set surprising.

Back to the WISC subtests.  Arithmetic, Information, Vocabulary, and Similarities all load heavily on g(IQ) and on a shared verbal factor.   Despite this, Americans gained 24 points on Similarities between 1947 and 2002 (1.6 SDs), 4 points on Vocabulary, and only 2 points on Arithmetic and Information.  Which is to say that the pattern of gains bears little relation to factor-loadings and cannot qualify as factor invariant.  As usual, factor analysis was done in a static setting where individuals were compared with social change held constant.  It has no necessary applicability to the dynamic scenario of social priorities altering over time. Therefore, the factor loadings adduced can at best pose hypotheses to be tested against the evidence of actual score trends over time.  And g(IQ) turns out to be a bad guide as to which real-world cognitive skills are merely correlated and which are functionally related.

The artifact option cannot be supported by evidence.  Test sophistication has to do with feeling comfortable with the format of IQ tests and with whoever administers them, using your time better, or trying harder in the test room.  The 20th century has seen us go from subjects who had never taken a standardized test to people bombarded by them, and undoubtedly a small portion of gains in the first half of the century was due to growing test sophistication.  Since 1947, its role has been relatively modest.  U.S. gains have been steady at least since 1932 (Flynn, 1984).  Which is to say that they antedate the period when testing was common, were robust while testing was at its maximum, and have persisted into an era when IQ testing waned, due to its growing unpopularity.

If gains are due to test sophistication, they should show a certain pattern.  When naive subjects are first exposed to IQ tests, they gain a few points but after that, repeated exposures show sharply diminished returns.  America has been waiting for at least 70 years for its rate of gain to diminish.  Other nations show accelerating gains over an extended period.  For example, in The Netherlands, a huge rate of gain escalated decade after decade from 1952 to 1982 (Flynn, 1987). 

Are IQ gains due to "cultural bias"?  We must distinguish between cultural trends that render neutral content more familiar and cultural trends that really raise the level of cognitive skills.  If the spread of the scientific ethos has made people capable of using logic to attack a wider range of problems, that is a real gain in cognitive skills.  If no one has taken the trouble to update the words on a vocabulary test to eliminate those that have gone out of everyday usage, then an apparent score loss is ersatz.  I can discern no cultural bias that favors the present generation.  Note that obsolete items would actually lead to an underestimate of IQ gains.  We measure IQ gains in terms of the extent to which people do better on an old test unchanged from 25 years before their time (say the WISC) than they do on a more current test whose content has been updated (say the WISC-R).

Let us supply a tentative functional analysis of various cognitive skill trends over time that explains their pattern without downgrading their reality.  Assume for the moment (evidence below) that science has engendered a sea change.  We no longer use our minds to solve problems on a concrete level only, rather we also use them to solve problems on a formal level.  Once we used logic primarily with concrete referents: all toadstools are poisonous; that is a toadstool; therefore it is poisonous.  Now we have become used to using logic with the taxonic referents provided by science: only mammals bear their young alive; rabbits and dogs both bear their young alive; therefore, they are both mammals.  I will show that this would bring huge gains over time on Similarities.  But so long as other subtests sampled the core vocabulary and information needed in everyday life, this causal factor would not trigger large gains on those subtests.  Indeed, changing social priorities might include both emphasis on a more scientific outlook and less time for reading, in which case huge Similarities gains could be accompanied by Vocabulary and Information losses.  All of these real world functional skills would assert their autonomy from one another and from the strait jacket of factor loadings.

Arithmetic deserves special mention.  There have been huge gains on Raven's as well as on Similarities.  Both Raven's and Arithmetic deal with solving "abstract" problems and are correlated in terms of factor loadings.  Therefore, it seemed sensible to teach young children Raven's type problems in the hope that they will become better mathematics problem solvers.  Indeed, U.S. schools have been doing that since 1991 (Blair, Gamson, Thorne, & Baker, 2005, pp. 100-101).  But here IQ gains over time not only trump factor analysis but also validate their credentials as diagnosticians of functional relationships between cognitive skills.  The large gains on Raven's and virtually nil gains on Arithmetic show that the relationship between the two is no more functional than the relationship between sprinting and the high jump.  Sadly, our understanding of the functional process for learning Arithmetic is far behind our understanding of the high jump.  Some speculation: except for mathematicians who link the formulae with proofs, mathematics is less a logical enterprise than a separate reality with its own laws that are at variance with those of the natural world.  Therefore just as infants explore the natural world, children must explore the world of mathematics themselves and become familiar with its "objects" by self-discovery.

Michael Shayer is breaking new ground using teaching techniques based on self-discovery within small groups.  In addition, he may have found cognitive skills that have genuine functional links to arithmetical reasoning.  In Britain from 1975 to 2003, performance among schoolchildren on the Piagetian tasks of conceptualizing Volume and Heaviness declined by 0.8 SDs.  Flynn (under review b) has analyzed British WISC data covering the latter half of that period.  From 1990 to 2003, British children lost 0.4 SDs on the WISC Arithmetic subtest.  The rates of loss are virtually identical (Shayer & Adhami, 2003; Shayer & Adhami, in press; Shayer, Ginsberg, & Coe, in press).
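A quick check of the convergence of those two rates, using only the figures and dates quoted in the text:

```python
# Annual rates of decline implied by the two British data sets quoted
# above, using only the years and SD figures given in the text.
piagetian_rate = 0.8 / (2003 - 1975)   # Volume & Heaviness tasks, 1975-2003
arithmetic_rate = 0.4 / (2003 - 1990)  # WISC Arithmetic subtest, 1990-2003

print(round(piagetian_rate, 4), round(arithmetic_rate, 4))
```

Both work out to roughly 0.03 SDs per year (0.0286 and 0.0308): close enough to support the claimed convergence, though not literally identical.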

To sum up.  Factor analysis and g(IQ) describe a static situation where individual differences are compared and social change is frozen.  The degree to which superior people are above average on the various subtests sets their respective g loadings.  IQ gains over time describe a dynamic situation in which social priorities shift in a multitude of ways: no better maths teaching, more leisure but with the extra leisure devoted to visual rather than verbal pursuits, the spread of the scientific ethos, and a host of other things all occurring together.  The average on Similarities rises but the average on Arithmetic and Vocabulary does not.  How odd it would be if social trends mimicked factor loadings in determining which real-world cognitive skills progress and which mark time!  If they did so, IQ gains would appear factor invariant, but that would be purely accidental (Wicherts et al., 2004).  Although radically different trends alter average performances on various WISC subtests between Time 1 and Time 2, note that this leaves a certain stability untouched.  Superior performers are much the same degree above average on each and every subtest at both Time 1 and Time 2.  Therefore, much the same g will emerge.

Our first paradox is resolved.   At any particular time, factor analysis will extract g(IQ) -- and intelligence appears unitary.  Over time, real-world cognitive skills assert their functional autonomy and swim freely of g  -- and intelligence appears multiple.  If you want to see g, stop the film and extract a snap shot; you will not see it while the film is running.   Society does not do factor analysis.  It is a juggernaut that flattens factor loadings and imposes its own priorities.

 

Where has all the intelligence gone?

As Tables 1 and 1B show, Full Scale IQ gains in America are impressive.  I am a grandparent and a member of the WISC generation, who were aged 5 to 15 when they were tested in 1947-1948.  Let us put our IQ at 100.  Our children are essentially the WISC-R generation, who were 6 to 16 when tested in 1972; against the WISC norms, their mean IQ was 107.63.  Our grandchildren are the WISC-IV generation, who were 6 to 16 in 2002; against the WISC norms, their IQ was 117.63.  We can of course work backward rather than forward.  If the present generation is put at 100, their grandparents had a mean IQ of 82.36.  Either today's children are so bright that they should run circles around us, or their grandparents were so dull that it is surprising that they could keep a modern society ticking over.

TABLE 1

WISC subtest             WISC to      WISC-R to    WISC-III to   WISC to      WISC to
                         WISC-R       WISC-III     WISC-IV       WISC-IV      WISC-IV
                         1947-1972    1972-1989    1989-2002     1947-2002    1947-2002

Interval in years        24.5         17           12.75         54.25        54.25
Standard deviation used  3            3            3             3            15

Information              0.43         -0.3         0.3           0.43         2.15
Arithmetic               0.36         0.3          -0.2          0.46         2.30
Vocabulary               0.38         0.4          0.1           0.88         4.40
Comprehension            1.20         0.6          0.4           2.20         11.00
Picture Completion       0.74         0.9          0.7           2.34         11.70
Block Design             1.28         0.9          1.0           3.18         15.90
Object Assembly          1.34         1.2          [0.93]        [3.47]       [17.35]
Coding                   2.20         0.7          0.7           3.60         18.00
Picture Arrangement      0.93         1.9          [1.47]        [4.30]       [21.50]
Similarities             2.27         1.3          0.7           4.77         23.85

Table 1B

Version      Subtest Sum    Full Scale IQ    Gain     Rate per Year

WISC         100.00         100.00           ----     ----
WISC-R       111.63         107.63           7.63     0.311
WISC-III     119.53         113.00           5.47     0.322
WISC-IV      125.63         117.63           4.63     0.363

Adapted from Flynn & Weiss, under review.  Sources:  Flynn, 2000b, Table 1; Psychological Corporation, 2003, Table 5.8; Wechsler, 1992, Table 6.8. 

Notes:

  • It is customary to score subtests on a scale in which the SD is 3, as opposed to IQ scores which are scaled with SD set at 15.  To convert to IQ, just multiply subtest gains by 5, as was done to get the IQ gains in the last column.
  • Values in brackets for Object Assembly and Picture Arrangement are estimates that assume their gains from WISC-III to WISC-IV were the same relative to other subtests as in the WISC-R to WISC-III era.
  • As to how the full scale IQs in Table 1B were derived:
  • The average member of the WISC sample (1947-48) was set at 100.
  • The subtest gains by the WISC-R sample (1972) were summed and added to 100:  100 + 11.63 = 111.63.
  • The appropriate conversion table was used to convert this sum into a Full Scale IQ score.  The WISC-III table was chosen so that all samples would be scored against a common measure.  That table equates 111.63 with an IQ of 107.63.
  • Thus the IQ gain from WISC to WISC-R was 7.63 IQ points
  • Since the period between those two samples was 24.5 years, the rate of gain was 0.311 points per year (7.63 divided by 24.5 = 0.311)
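The arithmetic in those steps can be rechecked directly (a sketch using only the rounded figures from Table 1B; note that recomputing from the rounded Full Scale IQs gives 5.37 for the middle-interval gain where the table prints 5.47, a small rounding discrepancy):

```python
# Rechecking Table 1B's rate-per-year arithmetic from the Full Scale IQs
# given there (intervals in years between successive norming samples).
full_scale_iq = [("WISC", 100.00), ("WISC-R", 107.63),
                 ("WISC-III", 113.00), ("WISC-IV", 117.63)]
intervals = [24.5, 17.0, 12.75]

rates = []
for (_, prev), (name, cur), years in zip(full_scale_iq, full_scale_iq[1:], intervals):
    gain = cur - prev                      # gain over this interval
    rates.append(round(gain / years, 3))   # points per year

total_gain = full_scale_iq[-1][1] - full_scale_iq[0][1]  # 17.63 points
overall_rate = round(total_gain / sum(intervals), 3)     # points per year
print(rates, overall_rate)
```

The overall rate comes out at 0.325 points per year over 54.25 years, matching the figure quoted in the text below.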

The subsequent gains are also calculated against the WISC sample, which is to say they are cumulative.  By the time of the WISC-IV, closer to 2002 than 2001, you get a total IQ gain of 17.63 IQ points over the whole period of 54.25 years.  That would average at 0.325 points per year, with some minor variation (as the table shows) from one era to another.

In either event, the cognitive gulf between the generations should be huge.  Taking the second scenario, almost 20 per cent of my generation would have had an IQ of 70 or below and be eligible to be classed as mentally retarded.  Over 60 percent of American blacks would have been MR.   Anyone born before 1940 knows that all of this is absurd.
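The "almost 20 per cent" figure follows from the normal curve, assuming normality with SD 15 and the backward-scored generational mean of 82.36 derived earlier:

```python
from statistics import NormalDist

# Share of a population with mean IQ 82.36 (SD 15, normality assumed)
# falling at or below the conventional cutoff of IQ 70.
share_at_or_below_70 = NormalDist(mu=82.36, sigma=15).cdf(70)
print(round(share_at_or_below_70 * 100, 1))
```

This gives about 20.5 per cent, in line with the "almost 20 per cent" in the text.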

The solution to the paradox is to be found by focusing on the WISC subtest trends rather than Full Scale IQ trends.  As Table 1 shows, between 1947 (WISC) and 2002 (WISC-IV):  Similarities shows a huge gain of 24 points (SD = 15), the five Performance subtests show gains ranging from 12 to 21 points, Comprehension shows 11 points, and the remaining Verbal subtests (Information, Arithmetic, and Vocabulary) show very limited gains of 2 to 4 points.  Let us continue our analysis of the cognitive skills needed to do well on the various IQ subtests and compare their trends with trends on tests of educational achievement.

Similarities requires you to solve problems on the spot without a previously learned method.  When asked "how dawn and dusk are alike", children have to imagine alternatives and select the one that best catches an intrinsic similarity.  Something like:  "You get up in the morning and go to bed at night but that makes no sense because I often sleep past dawn and go to bed after dark.  They are alike in that the sky is half-lit and often very pretty but of course that is not always true.  What they really have in common is that they are the beginning and end of both the day and the night.  The right answer must be that they separate day and night."  The Performance subtests measure this kind of problem-solving skill to a lesser degree. They require arranging blocks so that the view from above duplicates a presented pattern, building an object out of its disassembled parts, arranging pictures to tell a story.  Most children have some experience at jigsaw puzzles or reading books in which pictures are the main vehicle of the story.

Although there is scant U.S. data, many nations show that the only test that can match the huge gains on Similarities is Raven's Progressive Matrices (Flynn, 1998).  This is not surprising when one takes into account that both measure on-the-spot problem solving without the help of a previously learned method.  Case, Demetriou, Platsidou, and Kazi (2001, pp. 322-327) analyzed 23 tests including both traditional psychometric items (Matrices, seven WISC subtests, etc.) and Piagetian tasks (tilted boxes task, weights task, class inclusion, etc.).  They found that Matrices and Similarities had fluid g loadings that were virtually identical.  And they led all other tests as measures of fluid intelligence by a wide margin.

We turn to the subtests that show minimal gains.  Having an adequate fund of general information, being able to do arithmetic, and having a decent vocabulary are very close to school-taught skills.  It is less a matter of solving problems on the spot than exhibiting what you know: you either know that Rome is the capital of Italy or you know only of Rome, Georgia; you know what "delectable" means or you do not.  Arithmetic is more complex: you must know the mechanics of calculation but the questions are put verbally, which means the child cannot give a purely mechanical (times-table-type) answer.

It is illuminating to use these trends to analyze trends on the National Assessment of Educational Progress (NAEP) tests, often called the nation's report card.  The NAEP tests are administered to large representative samples of 4th, 8th, and 12th graders.  From 1971 to 2002, 4th and 8th graders (average age 11 years old) made a reading gain equivalent to 3.90 IQ points (SD = 15) (U.S. Department of Education, 2000, pp. 104 & 110; 2003, p. 21).  However, by the 12th grade, the reading gain drops off to almost nothing (U.S. Department of Education, 2000, pp. 104 & 110; 2003, p. 21).  The IQ data suggest an interesting possibility.

For the sake of comparability, we will focus on WISC trends from 1972 to 2002, rather than on the full period beginning in 1947.  Between 1972 and 2002, U.S. schoolchildren made no gain in their store of general information and only minimal vocabulary gains (Table 1).  Therefore, while today's children may learn to master pre-adult literature at a younger age, they are no better prepared for reading more demanding adult literature.  You cannot enjoy War and Peace if you have to run to the dictionary or encyclopedia every other paragraph. 

Take Kipling's poem:

    Over the Kremlin's serpentine pavement white
    Strode five generals 
    Each simultaneously taking snuff 
    Which softness itself was yet the stuff
    To leave the grand white neck no gash
    Where a chain might snap

If you do not know what the Kremlin is, or what "serpentine" means, or that taking snuff involves using a snuff rag, you will hardly realize that these generals caught the Czar unaware and strangled him.

In other words, today's schoolchildren opened up an early lead on their grandparents by learning the mechanics of reading at an earlier age.  But by age 17, their grandparents had caught up.  And since current students are no better than their grandparents in terms of vocabulary and general information, the two generations at 17 are dead equal in their ability to read the adult literature expected of a senior in high school.

From 1973 to 2000, the Nation's Report Card shows 4th and 8th graders making mathematics gains equivalent to almost 7 IQ points.  But once again, the gain falls off at the 12th grade, this time to literally nothing (U.S. Department of Education, 2000, pp. 54 & 60-61; 2001, p. 24). 

And once again, a WISC subtest suggests why.  The Arithmetic subtest and the NAEP mathematics tests present a composite picture.  An increasing percentage of young children have been mastering the computational skills the Nation's Report Card emphasizes at those ages.  However, during that very same period, children made no progress in acquiring the reasoning skills measured by WISC Arithmetic.  Reasoning skills are essential for higher mathematics.  Therefore, by the 12th grade, the failure to develop enhanced mathematical problem-solving strategies begins to bite.  American schoolchildren cannot do Algebra and Geometry any better than their grandparents.  Although the older generation was slower to master computational skills, they were no worse off at graduation.

There is one area in which the cognitive skills of secondary students have undergone a dramatic change.  The huge gains on the Similarities subtest show that today's youth are much better at on-the-spot problem solving without a previously learned method.  It is likely that this advantage is sustained and perhaps enhanced by university study.  There are a number of likely dividends.  Every year America has an increased number of managerial, professional, and technical jobs to fill -- jobs that often require decisions without the guidance of set rules.

Although we have focused on post-1972 subtest trends, these are virtually identical with post-1947 trends.  So now we know why recent IQ gains do not imply that today's young people would put their grandparents to shame.  Assume we hear a recent high school graduate chatting with his grandfather (who also finished high school) about a novel they both read the week before.  There is no reason to believe either would have to make any allowance for the obtuseness of the other.  Assume we discover essays on current affairs they both wrote shortly after graduation.  There is no reason to believe that either would strike us as inferior to the other in terms of vocabulary or fund of general information.

We would be likely to notice some differences. The grandson would be much better in terms of on-the-spot problem solving in certain contexts.  He would be no more innovative in solving mechanical problems such as fixing a car or repairing things around the house.  But he would be more adept at dealing with novel problems posed verbally or visually or abstractly.  Sometimes, the grandfather's "handicap" would affect social conversation, particularly because he would not think that such problems were very important.  The grandfather might be more rule-governed and would probably count that as a virtue.

 

Distant ancestors: Similarities

The grandparents of today's children need to be assigned a median birth date of 1937 to get them in school in time for the WISC.  But what of their parents and grandparents, what of the cohort that was born in 1907 and the even more distant cohort born in 1877?  British Raven's data show massive gains beginning with those born in 1877 -- they were actually tested at maturity of course (Raven, Raven, & Court, 1993, Graph G2).  World War I military data show that U.S. gains were under way as far back as we can measure (Tuddenham, 1948).  The Wechsler-Binet rate of gain (0.3 points per year) entails that the school children of 1900 would have had a mean IQ just under 70.  The Raven-Similarities rate (0.5 points per year) yields a mean IQ of 50 (against current norms).  Even if the latter accounts for most of the former, it will hardly do to simply say that our ancestors were bad at on-the-spot problem solving.
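The back-projection is simple arithmetic (a sketch; taking the span as 1900 to the 2002 WISC-IV norming, roughly a century, is my reading of the text):

```python
# Back-projecting the 1900 mean IQ, scored against current norms, from
# the two rates of gain quoted above.
years = 2002 - 1900                          # 102 years

wechsler_binet_mean = 100 - 0.3 * years      # "just under 70"
raven_similarities_mean = 100 - 0.5 * years  # roughly the 50 quoted

print(wechsler_binet_mean, raven_similarities_mean)
```

The Wechsler-Binet rate gives 69.4, and the Raven-Similarities rate gives 49, close to the round figure of 50 in the text.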

After all, lateral thinking is an important real-world skill.  Only the worst child of the 2200 school children used to norm the WISC-IV would have performed as low as the 1900 average.  To make our ancestors that lacking in innovation or problem-solving initiative is to turn them into virtual automatons.  Moreover, there is some connection between mental acuity and the ability to learn.  Jensen (1981, p. 65) relates an interview with a young man with a Wechsler IQ of 75.  Despite the fact that he attended baseball games frequently, he was vague about the rules, did not know how many players were on a team, could not name the teams his home team played, and could not name any of the most famous players.

When Americans attended baseball games a century ago, were almost half of them too dull to follow the game or use a scorecard?  My father who was born in 1885 taught me to keep score and spoke as if this was something virtually everyone did when he was a boy.  How did Englishmen play cricket in 1900?   Taking their mean IQ at face value, most of them would need a minder to position them in the field, tell them when to bat, and tell them when the innings was over.

The solution to this paradox rests on two distinctions that explain in turn the huge and therefore embarrassing gains made on Similarities and Raven's. The first distinction is that between pre-scientific and post-scientific operational thinking.  A person who views the world through pre-scientific spectacles thinks in terms of the categories that order perceived objects and functional relationships. When presented with a Similarities-type item such as "what do dogs and rabbits have in common", Americans in 1900 would be likely to say, "You use dogs to hunt rabbits."  The correct answer, that they are both mammals, assumes that the important thing about the world is to classify it in terms of the taxonic categories of science.  Even if the subject were aware of those categories, the correct answer would seem absurdly trivial.  Who cares that they are both mammals?  That is the least important thing about them from his point of view.  What is important is orientation in space and time, what things are useful, and what things are under one's control, that is, what does one possess.

The hypothesis is that our ancestors found pre-scientific spectacles more comfortable than post-scientific spectacles, that is, pre-scientific spectacles showed them what they considered to be most important about the world.  If the everyday world is your cognitive home, it is not natural to detach abstractions and logic and the hypothetical from their concrete referents.  It is not that pre-scientific people did not use abstractions: the concept of hunting as distinct from fishing is an abstraction.  They would use syllogistic logic all of the time:  Basset hounds are good for hunting; that is a Basset hound; that dog would be good at hunting.  They would of course use the hypothetical: if I had two dogs rather than only one, I could catch more rabbits.  They are not mentally retarded in any sense, but in terms of current norms they will appear to be so on Similarities.  Today we are so familiar with the categories of science and so imbued with the scientific world-view that it seems obvious that the most important attribute things have in common is that they are both animate, or mammals, or chemical compounds.

Today we have no difficulty freeing logic from concrete referents and reasoning about purely hypothetical situations.  People were not always thus.  From interviews Luria conducted with peasants in remote areas of Russia, Hallpike (1979) culls some wonderful examples.   The dialogues paraphrased run as follows:

White bears and Novaya Zemlya

Q:  All bears are white where there is always snow; in Novaya Zemlya there is always snow; what color are the bears there?
A:  I have seen only black bears and I do not talk of what I have not seen.

Q:  But what do my words imply?
A:  If a person has not been there he cannot say anything on the basis of words.  If a man was 60 or 80 and had seen a white bear there and told me about it, he could be believed.

Camels and Germany

Q:  There are no camels in Germany; B is a city in Germany; are there camels there?
A:  I don't know, I have never seen German villages. If B is a large city, there should be camels there.

Q:  But what if there are none in all of Germany?
A:  Perhaps this is a small village within a large city and there is no room for camels.

The peasants, of course, are entirely correct.  They understand the difference between analytic and synthetic propositions: pure logic cannot tell us anything about facts; only experience can.  But this will do them no good on Similarities.  From its inception, what counts as a correct answer favors the formal categories over the concrete, and by the time of the WISC-R, this is made explicit (Wechsler, 1974, p. 155).  I have altered the following to avoid reference to any item still in use.  Italics are mine:

"Pertinent general categorizations are given 2 points, while the naming of one or more common properties or functions of a member of a pair (a more concrete problem-solving approach) merits only 1 point.  Thus, stating that a pound and a yard are "Both measures" (their general category) earns a higher score than saying "You can measure things with them" (a main function of each).  Similarly, calling something a "feeling" is less concrete (and worth a higher score) than "the way you feel."  Of course, even a relatively concrete approach to solving the items ... requires the child to abstract something similar about the members of the pair.  Some children are unable to do this, and may respond to each member separately rather than to the pair as a whole ... although such a response is a true statement, it is scored 0 because it does not give a similarity."

The preference for taxonic answers (categories that classify the world and extra credit for the vocabulary of science) is extraordinary and reaches an even higher level in the WISC-IV, where the "one point" for concrete answers is reduced to "merits no or only a partial credit" (Psychological Corporation, 2003, p. 71).  This preference dominates the specific scoring directions given item by item.  I have used a fictitious item (dogs and rabbits) to illustrate the point, but an item abandoned after the WISC-R will show that I am not exaggerating. "What do liberty and justice have in common?"  Two points for either "both are ideals" or "both are moral rights", one point for "both are freedoms", nothing for "both are what we have in America".  The examiner is told that "freedoms" gets 1 point while "free things" gets 0 because the latter is a more concrete response (Wechsler, 1974, p. 159).  You are just not supposed to be preoccupied with how we use something or how much good it does you to possess it.

If children use pre-scientific spectacles, they can get no more than half credit on most Similarities items.  If the children of 1900 were given a prehistoric version of the WISC-IV, they would have a raw score ceiling of 22.   This is at the 25th percentile of contemporary children aged 14.  The average child of 1900 would have a raw score of about 11 and be two SDs below the current mean, which translates into an IQ score of 70 against today's norms (Psychological Corporation, 2003, p. 229).   This was the "target" score that Full Scale IQ gains implied when projected back to 1900.  But recall that Similarities set the more demanding target of a mean IQ of 50.  It looks as if the permeation of our minds by the scientific world-view has been supplemented by additional factors and that these have enhanced our ability to solve on-the-spot problems.  The latter kind of gain may account for much of the 24 points the post-1947 data signal.
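The conversions in this paragraph follow from the normal curve; a quick check, with the figures taken from the text (`NormalDist` is Python's standard-normal helper, and the percentile-to-IQ step is my inference from the stated SDs):

```python
from statistics import NormalDist

# An average child of 1900 two SDs below today's mean (IQ SD = 15):
print(100 - 2 * 15)  # 70 against today's norms

# A raw-score ceiling at the 25th percentile of today's 14-year-olds
# corresponds to roughly two-thirds of an SD below the mean:
z = NormalDist().inv_cdf(0.25)
print(round(100 + z * 15))  # about 90
```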

Note how the WISC manuals use the word "pertinent" to justify rewarding taxonic answers.  This is just a synonym for claiming that classification is what is important about a pair of things.  Imagine a rural child in 1900 being told that the most important thing about dogs and rabbits is a name that applies to both, rather than what you use them for.  These comments are not a criticism of the architects of the WISC-IV.  Today, when all children are being schooled in a scientific era, the brighter child probably will be the one who uses the categories and vocabulary of science.  But what we need not infer is this:  that the huge gains on Similarities from one generation to another signal a general lack of intelligence on the part of our ancestors.  Their minds were simply not permeated by the scientific world-view.

This solution to our paradox does not imply that massive IQ gains over time are trivial.  Aside from the escalation in lateral thinking, they represent nothing less than a liberation of the human mind.  The scientific world-view, with its vocabulary, taxonomies, and detachment of logic and the hypothetical from concrete referents, has begun to permeate the minds of post-industrial people.  This has paved the way for mass education on the university level and the emergence of an intellectual cadre without whom our present civilization would be inconceivable.

 

Distant ancestors: Raven's

The above distinction is relevant to Raven's in that the entire test demands detaching logic from a concrete referent.  However, when challenged by examination conditions, even subjects unused to this can adapt to varying degrees and I want more precision about the extent to which our distant ancestors were handicapped.  Therefore, I will introduce Piaget's distinction between concrete and formal thinking.  This distinction is logically and operationally discrete from pre-scientific vs. post-scientific.  It is perfectly possible to be competent on the formal level in terms of assessing heaviness and volume and yet find pre-scientific rather than post-scientific categories more significant. However, the two are undoubtedly linked in terms of historical context.  People lacking a scientific perspective are more likely to have their intelligence grounded on the concrete level and vice versa.

Today, there is general agreement that Piaget worked with an elite sample of children, put the ages at which children attain the formal mode far too low, and did not allow for the historical context as a determinant of whether children would reach the formal level at all.  Shayer, Küchemann, and Wylam (1976) found that in the mid-1970s, only 20 percent of British children aged 14 had attained the formal level; and Shayer & Adhami (2003) argue that they have lost ground since.  Using much smaller samples and a broader Piagetian test, Flieller (1999) presents trends for French 14-year olds.  He puts 35 percent of them at the formal level in 1967 and 55 percent in 1996.

Clearly, national honor is at stake. But both sets of data support the hypothesis that in 1900 the overwhelming majority of people remained primarily on the concrete level even as adults.  Rising to the formal level is highly correlated with years of schooling.  The 14-year olds of today have had at least 8 years of schooling with more to come.  In the America of 1900, adults had an average of about 7 years of schooling, a median of 6.5 years, and 25 percent had completed 4 years or less (Folger & Nam, 1967).

But what is the relevance of this to Raven's?  Andrich and Styles (1994) did a five-year study of the intellectual development of children initially 10, 12, and 14 years of age.  From the parent sample of 201 children, Styles (in press) selected 60 children who were representative of the larger group on the basis of age and initial testing.  They took both a Piagetian test and items of Raven's Progressive Matrices (RPM) ranked in order of difficulty.  Over a period of four years, they were tested yearly on the former and twice yearly on the latter.

The RPM presents the subject with 60 patterns each of which has a piece missing.  Six (or eight) alternatives picture a candidate for the missing piece and the subject must select the one that fits the logic of the matrix design.  Five Raven's items were used to illustrate the sections of the test and therefore were automatic correct answers.  Two items were so easy for this group of children that everyone got them correct.  The remaining 53 items, in ascending order of difficulty, mapped onto ascending Piagetian competence.  Of these, 20 required the subject to be either on the threshold of the formal level or operating on that level.  As Styles says, these items require using either a number of rules or a very complex rule to interpret the matrix pattern; and the subject needs to consider the logical relations between relations, rather than the factual relationship between a proposition and concrete reality.

In other words, if people in 1900 were primarily on the concrete level, we would expect their raw scores to have a ceiling of about 40.  John Raven (2000, p. RS3 18) established norms for the US circa 1982 and these show a raw score of 40 at the 38th percentile of 14-year olds.  The age curve corresponding to a ceiling of 40 is that of 7.5 year olds.  Their median is a score of 20, which is off the bottom of the curve for 14 year olds.   Raven's gains between 1900 and 2000 can be as large as you wish without any presumption that most of our ancestors suffered from Mental Retardation. They were quite capable of on-the-spot problem solving in the concrete situations that dominated their lives.  The ingenuity of soldiers trying to stay alive in the trenches of World War I and the improvisations of mechanics trying to keep the first motorcars running are part of the historical record.


The heritability of basketball

There have been many TV documentaries about identical twins who despite being separated at birth, have had amazingly similar life experiences and grow up to have similar IQs.  These studies are interpreted as showing that genetic influences on IQ are potent and environmental influences feeble.  Studies of identical twins raised apart are only one component of a wide variety of kinship studies.  There have been comparisons of identical and fraternal twins each brought up by their own parents, comparisons of adopted children with natural children, and so forth.  Most psychologists agree in the interpretation of these studies.  For example, Jensen (1998) concludes that while environment may have some potency at earlier ages, IQ differences between adults are overwhelmingly determined by genetic differences.

And yet, how is this possible?  As we have seen, there are massive IQ differences between one generation and another.  No one has been selectively breeding human beings for high IQ, so it looks as if genetic differences between the generations would be trivial (we will present evidence for that assumption a few pages hence).  If that is so, environmental factors must cause IQ gains over time and given the size of those gains, those environmental factors must have enormous potency.  How can solid evidence show both that environment is feeble (kinship studies) and potent (IQ gains) at the same time?

Jensen (1973a, 1973b) made the paradox all the more acute by using a mathematical model.  He plugged in two pieces of data: a 15-point IQ difference between two groups; and a low estimate of the influence of environment on IQ (a correlation between environment and IQ of about 0.33).  These implied that for environment to explain the IQ gap between those groups, the environmental gap between them would have to be immense.  One group would have to have an average environment so bad as to be worse than 99% of the environments among the other group.  Dutch males of 1982 were 20 IQ points above the previous generation.  According to Jensen's mathematics, the average environment of the previous generation would have to be worse than 99.99% of the 1982 environments.  Jensen assumed that no one could make a case for something apparently so implausible.
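Jensen's argument can be reproduced with elementary statistics.  A sketch under the stated assumptions (an environment-IQ correlation of 0.33, an IQ SD of 15; the function name is mine, not Jensen's notation):

```python
from statistics import NormalDist

def env_gap_needed(iq_gap, r_env=0.33, sd=15):
    """SDs of environmental difference needed for environment alone to
    explain an IQ gap, given a weak environment-IQ correlation."""
    z = (iq_gap / sd) / r_env          # gap in SD units, scaled up by 1/r
    return z, NormalDist().cdf(z)      # percentile among the better group's environments

z, p = env_gap_needed(15)   # 15-point gap: ~3 SDs, worse than ~99% of environments
z, p = env_gap_needed(20)   # Dutch 20-point gap: ~4 SDs, worse than 99.99%
```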

Lewontin (1976a, 1976b) tried to solve the paradox.  He distinguished the role of genes within groups from the role of genes between groups.  He imagined a sack of seed corn with plenty of genetic variation randomly divided into two batches, each of which would therefore be equal for overall genetic quality.  Batch A is grown in a uniform and optimal environment, so within that group all height differences at maturity are due to genetic variation; batch B is grown in a uniform environment which lacks enough nitrates, so within that group all height differences are also genetic.  However, the difference in average height between the two groups will, of course, be due entirely to the unequal quality of their two environments.

So now we seemed to have a solution. The present generation has some potent environmental advantage absent from the last generation that explains its higher average IQ.  Let us call it Factor X.  Factor X will simply not register in twin studies.  After all, the two members of a twin pair are by definition of the same generation.  Since Factor X was completely missing within the last generation, no one benefited from it at all and therefore, it can hardly explain any IQ differences within the last generation.  It will not dilute the dominance of genes.  Since Factor X is completely uniform within the present generation, everyone benefits from it to the same degree and it cannot explain IQ differences within the present generation.  Once again, the dominance of genes will be unchallenged.  Therefore, twin studies could show that genes explain 100% of IQ differences within generations and yet, environment might explain 100% of the average IQ difference between generations.

However, Lewontin offers us a poisoned apple.  History has not experimented with the last two generations as we might experiment with plants in a laboratory.  Consider the kind of factors that might explain massive IQ gains, such as better nutrition, more education, more liberal parenting, the slow spread of the scientific ethos.   It is quite unreal to imagine any of these affecting two generations with uniformity.  Certainly, everyone was not badly nourished in the last generation, everyone well nourished at present; everyone without secondary school in the last generation, everyone a graduate at present; everyone raised traditionally in the last generation, everyone raised liberally at present; everyone bereft of the scientific ethos in the last generation, everyone permeated with it at present.  If the only solution to our paradox is to posit a Factor X or a collection of such, it seems even more baffling than before.  We should shut this particular door as follows:  A solution is plausible only if it does not posit a Factor X.

Seven years ago, William Dickens of the Brookings Institution decided to do some modeling of his own and asked my help in applying it to real-world situations (Dickens & Flynn, 2001a; 2001b).  We believe that it solves the identical twins paradox without positing a Factor X.  It makes an assumption that may seem commonplace but which has profound implications, namely:  that those who have an advantage for a particular trait will become matched with superior environments for that trait.

Recall studies of identical twins separated at birth and reared by different families.  When they grow up, they are very similar and this is supposed to be due solely to the fact that they have identical genes.  But for that to be true, they must not be atypically similar in environment, indeed, the assumption is that they have no more environment in common than randomly selected individuals.  To show how unlikely this is, let us look at the life history of a pair of identical twins.

John and Joe are separated at birth.  Both live in an area (a place like the state of Indiana) that is basketball-mad.  Their identical genes make them both taller and quicker than average to the same degree.  John goes to school in one city, plays basketball a bit better on the playground, enjoys it more, practices more than most, catches the eye of the grade-school coach, plays on a team, goes on to play in high school where he gets really professional coaching.  Joe goes to school in a city a hundred miles away.  However, precisely because his genes are identical to John's, precisely because he is taller and quicker than average to exactly the same degree, he is likely to have a very similar life history.  After all, this is an area in which no talent for basketball is likely to go unnoticed.

On the other hand, Mark and Allen have identical genes that make them both a bit shorter and stodgier than average.  They too are separated and go to different schools.  However, they too have similar basketball life histories except in their case, both play very little, develop few skills, and become mainly spectators.

In other words, genetic advantages that may have been quite modest at birth have a huge effect on eventual basketball skills by getting matched with better environments -- and genes thereby get credit for the potency of powerful environmental factors, such as more practice, team play, professional coaching.  It is not difficult to apply the analogy to IQ.  One child is born with a slightly better brain than another.  Which of them will tend to like school, be encouraged, start haunting the library, get into top stream classes, attend university?  And if that child has a separated identical twin that has much the same academic history, what will account for their similar adult IQs?  Not identical genes alone -- the ability of those identical genes to co-opt environments of similar quality will be the missing piece of the puzzle.

Note that genes have profited from seizing control of a powerful instrument that multiplies causal potency, namely, feedback loops that operate between performance and its environment.  A gene-caused performance advantage causes a more-homework-done environment, the latter magnifies the academic performance advantage, which upgrades the environment further by entry into a top stream, which magnifies the performance advantage once again, which gets access to a good-university environment.  Since these feedback loops so much influence the fate of individuals throughout their life-histories, the Dickens/Flynn model calls them "individual multipliers".
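The feedback loop can be illustrated with a toy iteration.  This is an illustrative sketch only, not the actual Dickens/Flynn model; the matching rate and all quantities are assumptions:

```python
def individual_multiplier(gene_advantage, match=0.5, rounds=20):
    """Toy loop: each round, performance co-opts a proportionally
    better environment, which feeds back into performance."""
    env = 0.0
    for _ in range(rounds):
        performance = gene_advantage + env
        env = match * performance   # better performance -> better environment
    return performance

# A modest genetic advantage ends up roughly doubled (the loop converges
# to gene_advantage / (1 - match)).  In a twin study, the environmental
# share of that final advantage would be credited to genes.
print(individual_multiplier(1.0))
```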

Understanding how genes gain dominance over environment in kinship studies provides the key to how environment emerges with huge potency between generations.  There must be persistent environmental factors that bridge the generations; and those factors must seize control of a powerful instrument that multiplies their causal potency.

The industrial revolution has persisted for 200 years and it affects every aspect of our lives.  For example, look at what the industrial revolution did to basketball by the invention of TV.  It gave basketball a mass audience, it increased the pay a professional player could expect.  Basketball also had the advantage that ghetto blacks without access to playing fields could play it on a small concrete court.  Wider and keener participation raised the general skill level, you had to shoot more and more accurately to excel.  That higher average performance fed back into play:  Those who learned to shoot with either hand became the best -- and then they became the norm -- which meant you had to be able to pass with either hand to excel -- and then that became the norm -- and so forth.  Every escalation of the average population performance raised individual performance, which escalated the average performance further, and you get a huge escalation of basketball skills in a single generation.

The advent of TV set into motion a new set of feedback loops that revolutionized the game.  To distinguish these society-driven feedback loops from those gene-driven feedback loops that favor one individual over another, Dickens and Flynn call them "the social multiplier".  Its essence is that rising average performance becomes a potent causal factor in its own right.  The concept applies equally well to IQ gains over time.

The industrial revolution is both the child of the scientific revolution and the parent of the spread of the scientific world-view.  It has changed every aspect of our lives.  It demands and rewards additional years of education.  When a grade-school education became the norm, everyone with middle-class aspirations wanted a high-school diploma.  When their efforts made a high-school diploma the norm, everyone began to want a B.A.  Economic progress creates new expectations about parents stimulating children, highly paid professional jobs in which we are expected to think for ourselves, more cognitively demanding leisure activities.  No one wants to seem deficient as a parent, unsuited for promotion, boring as a companion.  Everyone responds to the new milieu by enhancing their performance, which pushes the average higher, so they respond to that new average, which pushes the average higher still.  You get a huge escalation of cognitive skills in a single generation.

So now, everything is clear.  Within a generation, genetic differences drive feedback processes -- genes use individual multipliers to determine and magnify IQ differences between individuals.  Between generations, environmental trends drive feedback processes -- environment uses social multipliers to raise the average IQ over time.  Twin studies, despite their evidence for feeble environmental factors, and IQ trends over time, despite their revelation of potent environmental factors, present no paradox.  What dominates depends on what seizes control of powerful multipliers.  Without the concept of multipliers, all is confusion.  There is nothing more certain than this.  If twin studies of basketball were done, they would show the separated twins growing up with very similar skills.  And Jensen's mathematics would "show" that environment was far too weak to cause massive gains in basketball performance over time.  Which is to say we would demonstrate the impossibility of what we know to be true.

Best of all, our solution posits no Factor X.  Nothing said assumes that social changes from one time to another were uniform in their impact on individuals.  Better education, better parent-child relationships, better work, better leisure, all may raise the quality of the range of environments available from one generation to another.  But the magnitude of the differences between quality of environments from best to worse can remain the same.  Genetic differences between individuals can continue to match people with better or worse environments to the same degree they always did.  Even though slam dunks and passing behind the back become common, being tall and quick will still co-opt a better basketball environment.  Even though people in general get better at solving intellectually demanding problems, being born with a bit better brain will still co-opt a better than average school environment.  In a word, the operation of social multipliers over time does not abolish the operation of individual multipliers in the life-histories of individuals.

 

A digression on enhanced genes

There is a piece of unfinished business.  The Dickens-Flynn model treats IQ gains over time as if they were overwhelmingly the product of environmental progress.  What if enhanced genes over the last century played a dominant role?  Then we might have an exaggerated notion of the potency of environment.  I do not think anyone would propose that genes have been enhanced by eugenic reproduction.  In America, those with more education have had fewer offspring than those with less education throughout either most or all of the 20th century.  The current data suggest that reproductive patterns, perhaps reinforced by immigration, may have cost America about one IQ point per generation (Herrnstein & Murray, 1994, chap. 15; Lynn & Van Court, 2004).  Lynn (1996) argues that most other nations are similar.

That leaves hybrid vigor.  A group's genes can benefit from outbreeding as an antidote to the deleterious effects of inbreeding. The latter is called inbreeding depression  (IBD).  The classic study of IBD and IQ is that of Schull and Neel (1965, Table 12.19).  Jensen (1983, Table 2) and Rushton (1995, Table 9.1) cite their results as indicative of which WISC subtests are most sensitive to IBD.  For example, the IQ deficit that inbred children suffer on the Vocabulary subtest is almost three times as great as the deficit they suffer on Coding.

Schull and Neel administered the WISC to 1854 children in Hiroshima: 989 were outbred (their parents had no significant percentage of genes in common); and 865 were inbred, varying from being the issue of first cousin marriages to second cousin marriages.  The measure of IBD is expressed in terms of the size of the IQ deficit per 10% of inbreeding.  The percentage of inbreeding refers to f (the coefficient of inbreeding), which is half of the percentage of genes the parents share.  For example: brothers and sisters share half their genes, so their children have f = 25 percent; first cousins share an eighth of their genes, so their children have f = one-sixteenth or 6.25 percent; second cousins have only 1/32 of their genes in common, so their children have f = 1/64 or 1.5625 percent.
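The coefficient of inbreeding is half the fraction of genes the parents share, so the worked examples above can be checked in a few lines (the helper name is mine; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

def inbreeding_f(genes_shared_by_parents):
    """f = half the proportion of genes the parents have in common."""
    return genes_shared_by_parents / 2

print(inbreeding_f(Fraction(1, 2)))   # siblings' children: 1/4 = 25%
print(inbreeding_f(Fraction(1, 8)))   # first cousins' children: 1/16 = 6.25%
print(inbreeding_f(Fraction(1, 32)))  # second cousins' children: 1/64 = 1.5625%
```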

Schull and Neel's results show that the impact of inbreeding on Full Scale IQ is very small indeed.  To illustrate this, I have used the outbred children to norm the WISC, which means that their performance is put at a standard score of 10 on each subtest with SD = 3.  The effect of various percentages of inbreeding can then be expressed as the SS the typical child would get on each subtest.

Table 2:

Effects on WISC subtests and Full Scale IQ from inbreeding. Outbred children (those with no inbreeding) have been used to norm.

                            Nil    Schull & Neel   First     Second    WISC
                                   estimates       cousins   cousins   sample
                                   (10%)           (6.25%)   (1.56%)   (35.00%)
S.D.                          3    3               3         3
Coding                       10    9.555           9.722     9.930      6.40
Arithmetic                   10    9.495           9.684     9.921      9.54
Block Design                 10    9.465           9.666     9.916      6.82
Picture Completion           10    9.410           9.631     9.908      7.66
Comprehension                10    9.395           9.622     9.906      7.80
Object Assembly              10    9.395           9.622     9.906      6.53
Information                  10    9.170           9.481     9.870      9.57
Picture Arrangement          10    9.060           9.413     9.853      5.70
Similarities                 10    9.005           9.378     9.845      5.23
Vocabulary                   10    8.855           9.284     9.821      9.12
Subtest sum                 100    92.805          95.503    98.876    74.37
IQ (SD=15)                  100    94.963          96.851    99.211    82.57
IBD effect on Subtest Sum          -7.195          -4.497    -1.124   -25.37a
IBD effect on IQ                   -5.037          -3.149    -0.789   -17.63a

  • Schull and Neel's estimates are of the standard score deficit for each 10 percent of inbreeding.  These allow us to calculate the magnitude of Inbreeding Deficit for the offspring of first cousins and the offspring of second cousins.  The last column gives scores on each subtest for the children who were members of the U.S. WISC standardization sample of 1947-1948. These are derived by scoring them against the 2002 norms.
  • Conversion of sum of standard scores (SS) into IQs: (1) SS 90-100 = IQ 93-100, so within that range, 1.429 SS = 1 IQ; (2) SS 70-85 = IQ 78-89, so within that range, 1.364 SS = 1 IQ.
  • These values are hypothetical in the sense that they are those dictated by the hypothesis that IBD on the part of Americans in 1947 was responsible for their IQ deficits compared to Americans in 2002.  See text for discussion.
  • Mazes has been omitted as an 11th subtest as it is not normally used to calculate IQs. 
    Sources:  Schull & Neel (1965), Table 12.19; Rushton (1995), Table 9.1
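Within the SS 90-100 band, the footnote's conversion rate (1.429 sum-of-standard-score points per IQ point) reproduces the table's IQ row; a quick check (the function name is mine):

```python
def ss_sum_to_iq(ss_sum, rate=1.429):
    """Convert a sum of standard scores to IQ within the SS 90-100 band,
    where 1.429 SS points equal 1 IQ point (per the table footnote)."""
    return 100 - (100 - ss_sum) / rate

for ss in (92.805, 95.503, 98.876):
    print(round(ss_sum_to_iq(ss), 2))  # matches the table's IQ row to ~0.01
```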

For hybrid vigor to explain IQ gains from 1947 to 2002, the WISC children of 1947 would have to have an f (inbreeding coefficient) equal to 35 percent.  An f of 35 percent means that the parents of those children would share 70 percent of their genes in common.  Recall that brothers and sisters share only 50 percent (actually a bit more due to assortative mating for IQ).  So about three-fifths of the children would have to be the fruit of incest and the other two-fifths the offspring of identical twins, who are by definition of the same sex.

An f of 35 percent means that for hybrid vigor to explain even 4 or 5 percent of IQ gains from 1947 to 2002, American children in 1947 had to be inbred to the point that their parents were all analogous to second cousins (1.5625 divided by 35.00 = 4.46%).  That is hardly plausible.  In addition, Table 2 suggests that hybrid vigor played no role at all.  Note the lack of correlation between which subtests are most affected by IBD and which show the largest IQ gains: on the hierarchy of IBD, Coding, Block Design, Object Assembly, Picture Arrangement, and Similarities come in at second, third, sixth, eighth and ninth places respectively.
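Both figures in this reductio are easy to verify (the mixture check on the required f of 35 percent is my own arithmetic, using the f values for siblings' children and identical twins' offspring):

```python
# An f of 35% as a mixture of incest (children of siblings, f = 25%)
# and offspring of identical twins (f = 50%): three-fifths and two-fifths.
f_required = 0.6 * 0.25 + 0.4 * 0.50
print(round(f_required, 2))  # 0.35

# Share of the required inbreeding that universal second-cousin
# parentage (f = 1.5625%) would actually supply:
print(round(1.5625 / 35.00 * 100, 2))  # 4.46 percent
```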

These results should come as no surprise.  The notion that America was a collection of isolated communities that discovered geographical mobility only in the 20th century is an odd reading of history.  Americans never did live in small inbred groups.  Right from the start, there was a huge influx of migrants who settled in both urban and rural areas.  There were huge population shifts during settlement of the West, after the Civil War, and during the World Wars.  The growth of mobility has been modest: in 1870, 23 per cent of Americans were living in a state other than the one of their birth; in 1970, the figure was 32 percent (Mosler & Catley, 1998).

 

Shorthand Abstractions (SHAs) and their enemies

There is no reason to believe IQ gains will go on forever.  There may remain few who have not absorbed the scientific worldview to whatever degree they can.  The trend toward a higher ratio of adults to children in the home may reverse.  Any further drop in the birth rate is likely to be outweighed by more solo-parent homes.  There must be some saturation point in our willingness to be challenged by more conceptually demanding leisure activities.  The number of professional and managerial jobs may continue to increase, but that may be only enough to compensate for worse childhood environments.  Although IQ gains are still robust in America, they have stopped in Scandinavia (Flynn & Weiss, under review; Schneider, 2006).  Perhaps their societies are more advanced than ours and their trends will become our trends.

The end of IQ gains over time would not necessarily mean the end of cognitive progress.  People have assimilated some of the basic language of science, tend to organize the world using its taxonic categories, and are willing to take the hypothetical seriously.  However, those achievements will be of limited value unless people take the next step.  Can we capitalize on science to enhance our ability to debate moral and social questions intelligently?  No one has written a history assessing whether there has been a rise in critical acumen over the last century or so.  There is one pioneering study of which I am aware.  Rosenau and Fagan (1997) compare the 1918 debate on women's suffrage with recent debates on women's rights and make an excellent case that the latter show less contempt for logic and relevance.  Note the setting, namely, debate that goes into the Congressional Record.  That Congressmen have become less willing to give their colleagues a mindless harangue to read does not necessarily mean that Presidential speeches to a mass audience have improved.

Over the last century and a half, science and philosophy have expanded the language of educated people, particularly those with a university education, by giving them words and phrases that greatly increase their critical acumen.  How often these are used I cannot tell.  Each of these terms stands for a cluster of interrelated ideas that virtually spell out a method of critical analysis applicable to social and moral issues.  I will call them shorthand abstractions (or SHAs), it being understood that they are abstractions with peculiar analytic significance.

I will name ten SHAs followed by the date they entered educated usage (dates all from the Oxford English Dictionary on line), the discipline that invented them, and a case for their virtues.  None of them appear in the verbal subtests of the various editions of the WISC or WAIS, that is, Similarities, Information, Comprehension, and Vocabulary.  So if we want a test to measure the enhancement of critical acumen over time, we will have to invent a new one.

(1) Market (1776: economics).  With Adam Smith, this term altered from the merely concrete (a place where you bought something) to an abstraction (the law of supply and demand).  It provokes a deeper analysis of innumerable issues.  If the government makes university education free, it will have to budget for more takers.  If you pass a minimum wage, employers will replace unskilled workers with machines, which will favor the skilled.  If you fix urban rentals below the market price, you will have a shortage of landlords providing rental properties.  Just in case you think I have revealed my politics, I think the last a strong argument for state housing.

(2) Percentage (1860: mathematics).  It seems incredible that this important SHA made its debut into educated usage less than 150 years ago.  Its range is almost infinite.  Recently in New Zealand, there was a debate over the introduction of a contraceptive drug that kills some women.  It was pointed out that the extra fatalities from the drug amounted to 50 in one million (or 0.005 percent), while without it an extra 1,000 women (or 0.1 percent) would have fatal abortions or die in childbirth.
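The New Zealand comparison reduces to a one-line conversion of counts into percentages.  A minimal check (Python), using only the numbers given in the text:

```python
# Convert the quoted fatality counts into percentages of one million women.

drug_deaths = 50        # extra fatalities attributable to the drug, per million
no_drug_deaths = 1000   # extra deaths from fatal abortions/childbirth without it, per million
population = 1_000_000

drug_rate = drug_deaths / population * 100
no_drug_rate = no_drug_deaths / population * 100

print(f"{drug_rate:.3f} percent")     # 0.005 percent
print(f"{no_drug_rate:.3f} percent")  # 0.100 percent
```

The percentages make the twenty-fold difference in risk immediately visible, which is precisely the analytic work the SHA does.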

(3) Natural selection (1864: biology).  This SHA has revolutionized our understanding of the world and our place in it.  It has taken the debate about the relative influences of nature and nurture on human behavior out of the realm of speculation and turned it into a science.  Whether it can do anything but mischief if transplanted into the social sciences is debatable.  It certainly did harm in the 19th century when it was used to develop foolish analogies between biology and society. Rockefeller was acclaimed as the highest form of human being that evolution had produced, a use denounced even by William Graham Sumner, the great "Social Darwinist".  I feel it made me more aware that social groups superficially the same were really quite different because of their origins.  Black unwed mothers who are forced into that status by the dearth of promising male partners are very different from unwed mothers who choose that status because they genuinely prefer it (Flynn, under review a).

(4) Control group (1875: social science).  Recognition that before-and-after comparisons of how interventions affect people are usually flawed.  Suppose we introduce an enrichment program in which pre-school children go to a "play center" each day.  It is designed to raise the IQ of children at risk of being diagnosed as mentally retarded.  Throughout the program we test their IQs to monitor progress.  The question arises: what has raised their IQs?  The enrichment program, getting out of a dysfunctional home for six hours each day, the lunch they had at the play center, or the continual exposure to IQ tests?  Only a control group selected from the same population and subjected to everything but the enrichment program can suggest an answer.

(5) Random sample (1877: social science).  Today, the educated public is much more likely to spot biased sampling than it was a few generations ago.  In 1936, the Literary Digest telephone poll showed Landon beating Roosevelt for President and was widely believed, even though few besides the more affluent had telephones.
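The Literary Digest failure can be illustrated with a toy simulation.  A minimal sketch (Python); the electorate size, the 30 percent affluence rate, and the link between affluence and candidate preference are invented purely for illustration:

```python
import random

random.seed(0)

# Toy electorate: each voter is a pair (is_affluent, prefers_landon).
# Assume the electorate as a whole favors Roosevelt, but affluent voters lean Landon.
population = []
for _ in range(100_000):
    affluent = random.random() < 0.30        # 30% of voters are affluent
    p_landon = 0.70 if affluent else 0.30    # affluent voters lean Landon
    population.append((affluent, random.random() < p_landon))

def landon_share(voters):
    """Fraction of the given voters who prefer Landon."""
    return sum(prefers for _, prefers in voters) / len(voters)

# True population figure versus a "telephone poll" that only reaches the affluent.
true_share = landon_share(population)
phone_poll = landon_share([v for v in population if v[0]])

print(round(true_share, 2))   # close to 0.42 -- Roosevelt actually ahead
print(round(phone_poll, 2))   # close to 0.70 -- the biased sample calls it for Landon
```

No amount of sample size fixes the problem: the phone poll is precise about the wrong population, which is exactly the error the random-sample SHA teaches one to spot.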

(6) Naturalistic fallacy (1903: moral philosophy).  That one should be wary of arguments from facts to values, for example, an argument that because something is a trend in evolution it provides a worthy goal for human endeavor.

(7) Charisma effect (1922: social science).  Recognition that when a technique is applied by a charismatic innovator or disciples fired by zeal, it may be successful for precisely that reason.  For example, a new method of teaching mathematics often works until it is used by the mass of teachers for whom it is merely a new thing to try.

(8) Placebo (1938: medicine).  The recognition that merely being given something apparently endorsed by authority will often have a salutary effect for obvious psychological reasons.  Without this notion, a rational drugs policy would be overwhelmed by the desperate desire for a cure among those stricken with illness.

(9) Falsifiability/tautology (1959: philosophy of science).  The stipulation that a factual claim is bankrupt (a mere tautology or closed circle of definitions) unless it is testable against evidence.  It can be used to explode, for example, a theory of motivation that asserts all human acts are selfish and yet rules out every possible counter-example; or the claim that "real" workers by definition have a revolutionary psychology; or that "real" Christians are always charitable; and so forth.

(10) Tolerance school fallacy (2000: moral philosophy).  Somehow my coining this term has not made it into common currency, but no doubt that is merely a matter of time.  It underlines the fallacy of concluding that we should respect the good of all because nothing can be shown to be good.  This fallacy puts a spurious value on ethical skepticism by assuming that it entails tolerance, while the attempt to justify your ideals is labeled suspect as a supposed source of intolerance.  It surfaced in William James, was embraced by anthropologists such as Ruth Benedict, and is now propagated by postmodernists who think they have invented it (Flynn, 2000a, ch. 9).

Not all critical progress has been due to SHAs.  Detaching logic from the concrete has made it a powerful instrument for dealing with the hypothetical, and without this much moral argument could not get off the ground.  An apt example arises from arguing against classical racism.  Note the role of the hypothetical "if"s.  If I gave a white person a pill that darkened his or her skin, would they deserve to be deprived of the vote, etc., etc.?  The racist will reply that it is those born black who deserve these things.  If a white woman took a pill while she was pregnant and her child was born black, would he or she deserve these things?  The point, of course, is to force an admission that it is not blackness per se that justifies special treatment but rather certain traits (permanent immaturity, tainted genes) that are supposedly associated with blackness.  When that admission is made, falsifying evidence can be introduced.

There is another set of concepts that superficially resemble SHAs but are actually wolves in SHA clothing.  They may pretend to offer a method of analysis, but the method is either mere words or bankrupt in some other way.  Often, either by accident or design, they devour SHAs by denigrating them in favor of an ideology of anti-science.  I will give a short list to illustrate the point, but sadly it could be much longer.

(1) Contrary to nature.  Although this is a special case of the naturalistic fallacy, it deserves mention because of its persistence.  By calling something "unnatural", the speaker labels it intrinsically wrong in a way that is supposed to bar investigation of its consequences including beneficial ones.   As Russell points out, the New England divines condemned lightning rods as unnatural because they interfere with the method God uses to punish the wicked (bolts of lightning).  As Mill points out, nature has no purposes save those we read into it.  It does not condemn gays, we do.  When Haldane was asked what his study of nature had revealed to him about God's purposes, he replied "an inordinate fondness for beetles".

(2) Intelligent design.  This implies a method in the sense that one investigates nature to find signs of order imposed by a rational agent.  On one level it is not objectionable.  It is a respectable enterprise to update this ancient argument for God's existence by appealing to the theories of modern science (arguing that the conditions for the development of the universe are too delicately balanced to be taken simply as a given).   But as an alternative to evolutionary biology, it is entirely counterproductive.  Rather than adding to our knowledge of nature, it delights in any present failure of science to explain a phenomenon so it can insert its monotonous refrain, "it was designed that way".

(3) Race or gender science.  There is no implication that those who speak of gender science share the viciousness of those who spoke of "Jewish physics", but they are just as muddled.  In essence, there is only one method of understanding both the universe and human behavior, one based on theory-formation, prediction, and attempts at falsification by evidence.  Not one of its critics has an alternative.  The practice of science is flawed in all the ways in which any human endeavor is flawed, that is, the interests and prejudices of scientists color the problems they investigate, how they go about it, the theories they propose, and the evidence they collect.  The antidote is better science, not endless and empty assertions that some epistemological issue is at stake.

(4) Reality is a text.  This phrase sums up the anti-science of our time.  No one is willing to say plainly what it means because its plain meaning is ridiculous: that the world is a blank slate on which we can impose whatever subjective interpretation we like.  The assertion that all theories are equally explanatory/non-explanatory is refuted every time we turn on a light switch.  As for the social sciences, how arbitrary is the choice between two theories of why most prostitutes in Boston were Anglicans (circa 1890)?  The ministers who suspected that some subliminal text in their sermons was corrupting young women; or Sumner's observation that most prostitutes were graduates of orphanages and that the orphanages were run by the Anglican Church.

This concept is supposed to foster a method of investigation, but that method comes to no more than classifying the different kinds of texts we impose on the world.  At its best, it merely copies the distinctions made by orthodox philosophy of science, which is careful to emphasize that some of these "texts" contain truths attested by evidence (physics) while others do not (aesthetic categories).  Usually, it blurs these distinctions and asserts that they are all merely subjective, as if the text of an up-to-date timetable were not more valuable than the text of an out-of-date timetable because it tells the truth about something, namely, when buses actually depart.  If all of this sounds absurd, that is not my fault.

Note that the ersatz SHAs are evenly divided between the contributions of obscurantist churches and contemporary academics.  The battle over the SHAs is being fought out within the walls of the universities.  It is a contest pitting those who attempt to help students understand science and how to use reason to debate moral and social issues against those of whom it may be said that every student who comes within range of their voice is that little bit worse for the experience.  There is no case for barring the latter from the university.  But much depends on demonstrating the error of their ways.

 

Practical wisdom

The most important form of cognitive progress is enhanced wisdom because wisdom is knowledge of how to live a good life and, if one is fortunate enough to understand other peoples and their histories as well, it is knowledge of how to make a better world.  Someone can have great critical acumen and lack wisdom.  The former is an intellectual virtue, while the latter exists only when human beings integrate the intellectual and moral virtues into a functional whole.  Wisdom focuses perfected intellect and perfected character on the same object.  As Plato shows so wonderfully in the image of the chariot, one cannot know the good without loving the good.  It would be like saying one knew what made a great painting beautiful without having any appreciation of its beauty.  Or to follow Aristotle, you cannot claim knowledge of the art of good living unless you practice the art.   I may know what a good backhand in tennis looks like but if I cannot hit one, I cannot savor the body's wonderful coordination when it is done and do not experience the life of meticulous practice that is a functional part of the performance.

Aristotle spells out the traits of people of practical wisdom.  They must value others rather than just themselves or they cannot fully participate in the kind of polis or human society that makes good living possible.   In Book III of the Politics, he tells us that society is not merely a market because we can do business with foreigners; it is not a mutual security pact because we can have military alliances with foreigners; it is not intermarriage because one can marry a foreigner; it is not occupying the same territory because the occupants can treat one another as if they were enemies.   There must be a cherished way of life woven out of friendships, civic cooperation, and social pursuits, but even this is not enough unless it is crowned by mutual moral concern among fellow citizens.  All must count as worthy of justice: none must be denied full and proper participation in the cherished way of life (Aristotle, Politics, iii, 9, 1280a, 26-40 & 1280b, 1-40).

Certain loves make the good life impossible.  The Spartans' love of power or victory in war turned the concept of the good person into the caricature of the good soldier (Aristotle, Politics, ii, 9, 1271b, 1-9).  The love of money confused the good person with the successful oligarch and corrupted the Carthaginians, even though they sought to level differences of wealth by exporting their poor to other cities (Aristotle, Politics, ii, 11, 1273a, 21-40 & 1273b, 1-24).  However, love of the good is not enough.  The person of practical wisdom must also have certain moral and intellectual virtues: self-discipline and temperance, so they can resist temptations to deviate from the good life; courage, so their judgment will not be blinded by fear; prudence, or the knowledge of means to ends; understanding of the fact that every parent or teacher creates a social dynamic peculiar to themselves and that no one method will serve all.  And above all, sympathetic empathy, or the ability to look at the world through the eyes of others and resonate with how they feel (Aristotle, Ethics, iii, 6-12 & vi, 5-11).

I am skeptical that the level of wisdom rose in the 20th century, but my case grows stronger as one goes from personal to political to international behavior.  On the personal level, the interrelated virtues of temperance and self-discipline have had to cope with new challenges.  Avner Offer (2006) provides a brilliant analysis of how technological progress and affluence have contributed to the obesity epidemic.  Before pre-prepared foods existed, meals required effort to cook.  Before even middle-class children had much money, food intake was restricted to meal times at which adults were present.  I never knew a child who left home with money in his or her pocket except to go on an errand or to an occasional film.  The notion that I was an independent consumer never occurred to me.  Both rich and poor children now spend much time in shopping malls.  I would have thought this as bizarre as spending a day in a butcher's shop watching someone chop meat.

Obesity is less common among upper than lower income earners.  It is easier for the former to exercise virtue.  Even if the mother works, husbands often help.  Together they are more likely to overcome fatigue and show the self-discipline necessary to plan and prepare a healthy diet.  They are more likely to curb their own appetites and thereby set a better example. They are more likely to supervise what their children eat at school and try to forbid what has immediate appeal.  But there are plenty of cases in which affluence and obesity go together.  Even if only upper-income earners are considered, we are still struggling to develop new social restraints to do the job of the old.   Ministers used to give sermons castigating gluttony and sloth and one had to confess them as sins.  No one would give such a sermon today.

Current political behavior shows an unwillingness to accept the restrictions on growth necessary to preserve our habitat.  This too is a new challenge and we may have to transcend the wisdom of previous times.  The central question is whether or not we have developed an appetite for the endless acquisition of goods that, particularly as it sweeps through China and India, makes self-restraint impossible. That question turns on the evidence for two propositions.

First, whether more and more goods bring more and more happiness.  If they do, we are in trouble because it is hard to ask people to settle for less happiness.  Here the news is good.  Setting aside the poor, reported happiness (are you 'very happy', 'pretty happy', or 'not too happy') did not increase in America over 20 years of growing affluence (Blanchflower, Oswald, & Warr, 1993; Easterlin, 1995).  Reported happiness in Japan did not increase between 1958 and 1987 despite a five-fold per capita income increase (Veenhoven, 1993).  There is no evidence that the members of very affluent societies are happier than those of somewhat less affluent societies (Oswald, 1997, p. 1819).

Second, do we value having more goods than others, which is to say have we gone from seeking possessions to seeking economic status?  If so, an open-ended pursuit of more and more possessions will be difficult to avoid.  Everyone cannot have more than the average person and even those above average will know a neighbor who has more.  In terms of logic, it is possible to tell people there will still be a hierarchy of wealth even if the average wealth is less, and that they will have just as much chance of attaining a privileged place on it as they do now.  But in terms of psychology, the command "no more" appears to freeze your present position on the hierarchy rather than allow you to aspire to a higher place.

Here the news is bad.  Some of the evidence is anecdotal.  The emergence of a huge Russian industry that manufactures the appearance of affluence:  you can impress others with forged documents proving that you were on an expensive holiday even though you took a modest one.  The premium paid for designer labels on goods no better than other goods so that one can flaunt one's affluence.  The really disturbing evidence comes from the happiness literature.  Evidence that possessions affect reported happiness in relative terms (I have a better house than most other Americans) rather than in absolute terms (Easterlin, 1974; Frey & Stutzer, 1999).

Competition for possessions without a rationally imposed limit engenders pessimism about acceptance of the restraints necessary to avoid ecological disaster.  It also creates a downward spiral destructive of civic virtue.  Those who wish to maximize their economic status are reluctant to pay taxes, and this diminishes state provision of health, education, and security against misfortune.  As the quality of state provision declines, it becomes imperative to maximize private wealth for reasons of security even if status seeking is set aside.  Even principled socialists will pay fees to jump the queue for medical care and get education for their children in schools that are not a test of physical survival.  The more that is true, the more you resent any dollar leaving your pocket in tax, so public provision drops further, so willingness to be taxed drops further, and so forth.  Indeed, since only a few can amass the fortune needed to provide self-security, no amount of money you can realistically hope to acquire is enough.  Forcing atomized actors to provide for their own security is always destructive of concern for others.  The quest for absolute security by America and Russia during the arms race diverted resources away from too many goods to enumerate.  I do not believe people really want lack of temperance to destroy the quality of their own life or the humanity of the body politic.  On one level, everyone prefers Aristotle's polity to a mutual security pact.  But wisdom requires that we love the good and love it enough to temper our desires.

On the international level, there is little evidence in favor of enhanced wisdom.  Love of war is no longer respectable, but the inordinate love of country that goes beyond patriotism to nationalism still cheats us of empathy.  America today is no more aware of how to use its preponderant power without alarming other nations than the Kaiser was a century ago (Flynn, in press).  Blair is far less aware of how little he influences American policy than Churchill was at Yalta.  Enoch Powell and Michael Foot once agreed that no army can do the job of a police force without being de-humanized, at least not in a foreign country whose sociology it does not comprehend.  America discovered this in Vietnam and yet cannot remember it today and is surprised to find its troops reacting with atrocities to the frustrations they encounter.  Statesmen are no more aware of the limitations of force.  They think they can impose political unity where there is no social unity.  There are always some statesmen of broader vision, of course.  George Bush senior was superior to George Bush junior and Rabin was superior to Sharon.  But such have always existed: witness Congressman Reed versus President McKinley over the annexation of the Philippines.

 

Last words

The 20th century has been the century of rising IQ, the spread of the language and categories of science, the liberation of reason from the concrete, and the enhancement of on-the-spot problem solving.  The 21st century will be the battleground of armies for and against the SHAs. It just might culminate in the triumph of critical thinking if universities hold fast to what they are supposed to be all about.  Whether we can hope for anything more, I cannot predict.  It would be ironic if the industrial revolution, the factory that made IQ gains inevitable, has manufactured an appetite that wisdom cannot tame. But elderly men should be aware of their limitations.  Perhaps those more attuned than I am to the present see the future more clearly.

 

References

Andrich, D., & Styles, I. (1994).  Psychometric evidence of intellectual growth spurts in early adolescence.  Journal of Early Adolescence, 14, 328-344.

Aristotle.  The citations in the text will guide the reader to the source.  All editions of Aristotle, no matter what the publisher and date, have the same chapter and page numbers in the margins and those are what are cited herein.

Blair, C., Gamson, D., Thorne, S., & Baker, D. (2005).  Rising mean IQ: Cognitive demand of mathematics education for young children, population exposure to formal schooling, and the neurology of the prefrontal cortex. Intelligence, 33, 93-106.

Blanchflower, D. G., Oswald, A. J., & Warr, P. B. (1993).  Well-being over time in Britain and the USA.  Paper presented at the Economics of Happiness Conference, London School of Economics

Case, R., Demetriou, A., Platsidou, M., & Katz, S. (2001).  Integrating concepts and tests of intelligence from the differential and developmental traditions.  Intelligence, 29, 307-336.

Dickens, W. T., & Flynn, J. R. (2001a).  Great leap forward:  A new theory of intelligence.  New Scientist, 21 April, 2001, 44-47.

Dickens, W. T., & Flynn, J. R. (2001b). Heritability estimates versus large environmental effects:  The IQ paradox resolved.  Psychological Review, 108, 346-369.

Easterlin, R. (1974).  Does economic growth improve the human lot?  Some empirical evidence.  In P. A. David & M. W. Reder (eds.), Nations and households in economic growth: essays in honor of Moses Abramovitz.  New York & London: Academic Press.

Easterlin, R. (1995).  Will raising the incomes of all increase the happiness of all? Journal of Economic Behaviour and Organization, 27, 35-48.

Flieller, A. (1999).  Comparison of the development of formal thought in adolescent cohorts aged 10 to 15 years (1967-1996 and 1972-1993).  Developmental Psychology, 35, 1048-1058.

Flynn, J. R. (1984).  The mean IQ of Americans:  Massive gains 1932 to 1978. Psychological Bulletin, 95, 29-51.

Flynn, J. R. (1987).  Massive IQ gains in 14 nations:  What IQ tests really measure.  Psychological Bulletin, 101, 171-191.

Flynn, J. R. (1998).  IQ gains over time:  Toward finding the causes.  In U. Neisser (Ed.), The rising curve:  Long-term gains in IQ and related measures (pp. 25 - 66). Washington, DC:  American Psychological Association.

Flynn, J. R. (2000a).  How to defend humane ideals: Substitutes for objectivity.  Lincoln, NB: University of Nebraska Press.

Flynn, J. R. (2000b).  IQ gains, WISC subtests, and fluid g:  g theory and the relevance of Spearman's hypothesis to race (followed by Discussion).  In G. R. Bock, J. A. Goode, & K. Webb (eds.), The nature of intelligence  (pp. 222-223).  Novartis Foundation Symposium 233.  New York:  Wiley.

Flynn, J. R. (2006).  Efeito Flynn: Repensando a inteligência e seus efeitos [The Flynn Effect: Rethinking intelligence and what affects it].  In C. Flores-Mendoza & R. Colom (Eds.), Introdução à Psicologia das Diferenças Individuais [Introduction to the psychology of individual differences] (pp. 387-411).  Porto Alegre, Brazil: ArtMed.

Flynn, J. R. (2007).  What is intelligence? Beyond the Flynn Effect.  Cambridge: Cambridge University Press.

Flynn, J. R.,  & Weiss, L. G. (under review).  American IQ gains from 1932 to 2002: The significance of the WISC subtests.

Folger, J. K., & Nam, C. B. (1967).  Education of the American population (A 1960  Census Monograph).  Washington, DC:  U. S. Department of Commerce. 

Frey, B., & Stutzer, A. (1999).  Happiness, economics, and institutions.  Unpublished paper, University of Zurich.

Hallpike, C. R. (1979).  The foundations of primitive thought.  Oxford: Clarendon Press.

Herrnstein, R. J., & Murray, C. (1994).  The bell curve: Intelligence and class in American life.  New York:  Free Press.

Jensen, A. R. (1973a).  Educability and group differences.  New York:  Harper and  Row.

Jensen, A. R. (1973b).  Educational differences.  London:  Methuen.

Jensen, A. R. (1981).  Straight talk about mental tests.  New York:  The Free  Press.

Jensen, A. R. (1983).  Effects of inbreeding on mental-ability factors.  Personality  and Individual Differences, 4, 71-87.

Jensen, A. R. (1998).  The g factor:  The science of mental ability.  Westport, CT:  Praeger.

Lewontin, R. C. (1976a).  Further remarks on race and the genetics of intelligence. In N. J. Block and G. Dworkin (eds.), The IQ controversy (pp. 107-112).  New York:  Pantheon Books.

Lewontin, R. C. (1976b).  Race and intelligence.  In N. J. Block and G. Dworkin (eds.), The IQ controversy (pp. 78-92).  New York: Pantheon Books.

Lynn, R. (1996).  Dysgenics:  Genetic deterioration in modern populations.  Westport, CT:  Praeger.

Lynn, R., & Van Court, M. (2004).  New evidence of dysgenic fertility for intelligence in the United States.  Intelligence, 32, 193-201.

Mosler, D., & Catley, B. (1998).  America and Americans in Australia. Westport, CT:  Praeger.

Offer, A. (2006).  The challenge of affluence: Self-control and well-being in the United States and Britain since 1950.  New York: Oxford University Press.

Oswald, A. J. (1997).  Happiness and economic performance.  The Economic Journal, 107, 1815-1831.

Psychological Corporation (2003).  The WISC-IV Technical Manual.  San Antonio, TX:  The Psychological Corporation

Raven, J. (2000).  Raven manual research supplement 3: American norms; neuropsychological applications.  Oxford: Oxford Psychologists Press.

Raven, J., Raven, J. C., & Court, J. H. (1993).  Manual for Raven's Progressive Matrices and Vocabulary Scales (section 1).  Oxford:  Oxford Psychologists Press.

Rosenau, J. N., & Fagan, W. M. (1997).  A new dynamism in world politics:  Increasingly skilled individuals?  International Studies Quarterly, 41, 655-686.

Rushton, J. P. (1995).  Race, evolution, and behavior: A life perspective.  New Brunswick, NJ: Transaction Publishers.

Schneider, D. (2006). Smart as we can get?  American Scientist, 94, 311-312.

Schull, W. J., & Neel, J. V. (1965).  The effects of inbreeding on Japanese children. New York: Harper & Row.

Shayer, M., Küchemann, D. E., & Wylam, H. (1976).  The distribution of Piagetian stages of thinking in British middle and secondary school children.  British Journal of Educational Psychology, 46, 164-173.

Shayer, M., & Adhami, M. (2003).  Realising the cognitive potential of children 5 -7 with a mathematical focus. International Journal of Educational Research, 39, 743-775.

Shayer, M., & Adhami, M. (in press). Fostering cognitive development through the context of mathematics:  Results of the CAME Project. Educational Studies in Mathematics.

Shayer, M., Ginsburg, D., & Coe, R. (in press).  30 years on - an anti-'Flynn effect'?  The Piagetian test Volume & Heaviness norms 1975-2003.  British Journal of Educational Psychology.

Styles, I. (in press).   Linking psychometric and cognitive-developmental frameworks for thinking about intellectual functioning.  In J. Raven (Ed.), Contributions to psychological and psychometric theory arising from studies with Raven's Progressive Matrices and Vocabulary Scales.

Tuddenham, R. D. (1948).  Soldier intelligence in World Wars I and II.  American Psychologist, 3, 54-56.

U.S. Department of Education.  Office of Educational Research and Improvement. National Center for Educational Statistics (2000).  NAEP 1996 Trends in Academic Progress, NCES 97-985r, by J.R.  Campbell, K.E. Voelkl, and P.L. Donahue.  Washington, DC.

U.S. Department of Education.  Office of Educational Research and Improvement. National Center for Educational Statistics (2001).  The Nation's Report Card:  Mathematics 2000, NCES 2001-517, by J.S. Braswell, A.D. Lutkus, W.S. Grigg, S.L. Santapau, B. Tay-Lim, and M. Johnson.  Washington, DC.

U.S. Department of Education.  Institute of Education Sciences.  National Center for Educational Statistics (2003).  The Nation's Report Card: Reading 2002, NCES 2003-521, by W.S. Grigg, M.C. Daane, Y. Jin, and J. R. Campbell.  Washington, DC

Veenhoven, R. (1993).  Happiness in nations: Subjective appreciation of life in 56 nations.  Rotterdam: Erasmus University Risbo.

Wechsler, D. (1974).  Wechsler Intelligence Scale for Children - Revised.  New York: The Psychological Corporation.

Wechsler, D. (1992).  Wechsler Intelligence Scale for Children - Third Edition: Manual (Australian Adaptation).  San Antonio, TX: The Psychological Corporation.

Wicherts, J. M., Dolan, C. V., Hessen, D. J., Oosterveld, P., van Baal, G. C. M., Boomsma, D. I., & Span, M. M. (2004).  Are intelligence tests measurement invariant over time?  Investigating the Flynn effect.  Intelligence, 32, 509-538.

© James R Flynn, 15th December 2006. All rights reserved.