
Does learning X make you better at Y?

Summary

Training X makes you better at X. If training X teaches any generalizable skills, they can most likely be gained from other kinds of training too.

When measured, the effects of pedagogical interventions vanish within a few years. Most studies I have read do not account for this fact.

The difficulty of a problem is not determined only by its underlying combinatorial structure. Context, superficial similarities, and notation/interface have to be taken into account.

People are fluent in making analogies and generalizations only of particular things in specific contexts.

Why math?

A neighbour once asked me to prepare her son for a math exam. I was excited! I am constantly collecting teaching materials (Math starter pack, for example) and I enjoy math. I agreed. Shortly after, doubts popped into my head: Why learn math, if you don’t like it!? Can pupils transfer the skills they gain to other areas?

Typing “why learn math” into a search engine turns up a myriad of praises for the generous influence of the Queen of Science. Most of them are based on anecdotes, intuition, and preferences. What’s even worse is that doing mathematics is compared to doing nothing.

But some reasons to learn math seem to be backed by evidence. Take this one: “[Evans et al. 2015] indicates that children who know math can recruit certain brain regions more reliably, and have higher gray matter volume in those regions, than those who perform more poorly in math. The brain regions involved in higher math skills in high-performing children were associated with various cognitive tasks involving visual attention and decision-making. While correlation may not imply causation, this study indicates that the same brain regions that help you do math are recruited in decision-making and attentional processes.” (10 Reasons Why Math Is Important In Life) This is purely correlational. Ironically, this study also finds that connectivity and grey matter volume in a few areas at age 8 are better predictors of future math scores than math scores at age 8… Pedagogy is creepy. Benezet 1935 showed that students who hadn't had math lessons in the first years of school performed the same as the control group. As far as I know, nobody has tried to replicate that.

I am not saying that learning even basic mathematics is not beneficial, only that I couldn’t find any convincing evidence that learning mathematics gives more benefits outside mathematics than learning other subjects.

From now on I won’t focus on the alleged benefits of learning mathematics. Instead I generalize the matter to transfer of learning, which happens whenever learning X influences performance on Y. The main goal of this essay is to examine to what extent transfer of learning happens, and to wander off to nearby topics (mostly the influence of representation).

Close transfer

Before we look at what happens with transfer between two completely different fields (far transfer), let’s examine what happens when X and Y are similar. This sounds like an easier task.

Wait, but what does similar mean exactly? Well… nobody knows. I will use this term like others - intuitively. But bear with me, we are not completely lost. Take a look at these layers on which two areas/problems might be similar:

Gwern Branwen speculates that knowing a problem is solvable (context) might make the problem easier to solve.

Subjective context
How does the subject feel (for example, they may be more confident after training)? What is the subject's intelligence? What about the subject's metacognitive skills? Where and when is the subject tested?

Superficial similarities
How neatly do objects and processes map to each other? For example, mapping the horizontal movement of a ball to the vertical movement of a triangle is way easier than mapping the movement of a ball to the changing color of a triangle. Are moves intuitive?

Interface and notation
Is a problem presented orally, diagrammatically, with a physical model, an app, etc? How many symbols have to be used to express helpful things? Does notation have unnecessary features or distractions?

Underlying combinatorial structure (a.k.a. schema, abstraction)

It's easier to learn how to drive a truck when you are already comfortable behind the wheel of a car. In trivial cases like this all layers are similar. It’s hard to argue that such close transfers do not exist.

But what happens when we change a few layers, but leave the underlying combinatorial structure untouched? Will the subjects realize that they are being asked to solve the very same problem (at the abstraction layer)? And, more importantly, will they apply the insights they already have?

To check this, tricky types of puzzles are used. As you probably expect, while their appearances differ, the underlying combinatorial structure remains the same. You can check one of the hardest isomorphic puzzles here in an interactive game. Another, less formal, example is Duncker’s radiation problem.

The analogy is not spotted that often. For Duncker’s radiation problem, the highest rates I found are in Kubricht 2016 (N total = 126, N per condition = 42): 50% (verbal presentation), 55% (verbal + diagram), and 83% (animation). Gick and Holyoak 1980 and 1983 report lower rates (33%) with the same puzzle.

This might be a bit surprising. After all, we use analogies on a daily basis. Studies from the 2000s confirm this intuition (see Trench and Minervino 2015 for a short review). In natural settings far analogies are used almost as often as close ones. It is a matter of debate how this squares with the fact that analogous puzzles are harder to solve the more they differ, and what it implies in general. (Christensen and Schunn 2007 counted that professional design engineers used 11.3 analogies per hour during weekly meetings.)

Interface and notation

Although analogous puzzles share the same problem space, they do not automatically share the same difficulty level.

For example: on average it takes 16 times longer to solve the hardest variation of the Tower of Hanoi than the easiest one (Kotovsky et al. 1985). That is the difference between roughly 1.8 minutes and 30 minutes. Why?

In this particular example the huge difference does not come from solving the actual problem. The difficulty lies in figuring out the rules. Why is it so hard to get a feeling for which moves are legal?

In both versions we have three rules with a similar word count. But in the classical version of Hanoi with physical pegs (the easiest version) illegal moves are evident: you simply can’t put a bigger disk on a smaller one. In the alternative versions, however, deciding whether a move is legal requires mentally checking two pegs, not one, and there is no graphical interpretation of the rules. It’s frustrating! (Easy-to-spot mistakes are one of the features of good notation, as Terence Tao points out.) On top of that, moves in the alternative versions are not consistent with real-world knowledge: the analogs of disks change size in discrete jumps, unlike any other object we encounter daily.

It’s peculiar how children sometimes react to situations that are not consistent with their intuitions. In Clement and Richard 1997 we read: “A substantial proportion of the children aged 6-7 years refused to move a disk from left to right (or right to left) if there was a smaller disk in the middle (the disks did not have to be put on a peg but only laid down in an empty place or on another disk): a number of children thought that going from left to right implied going through the middle. In this case, the three places, lined up in a row, are interpreted as a road: a move is conceived of as a move in the plane and not as a move in the third dimension, as it should be. This is consistent with the fact that at this age, "moving" has the prototypical meaning of moving on the ground (see Bernicot 1981).” I found Bernicot 1985 only in French, so I couldn’t verify it.
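The idea that two cover stories can hide one problem space can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the stimuli used in Kotovsky et al. 1985: the state encoding, the function names, and the "agents with items of three sizes" cover story are all hypothetical. The sketch builds the move graph for the classic three-disk puzzle and for a relabeled "change" version, then checks that relabeling maps one graph exactly onto the other.

```python
from collections import deque

PEGS = (0, 1, 2)
SIZES = ("small", "medium", "large")  # hypothetical cover story, not taken from the paper


def hanoi_moves(state):
    """Classic puzzle. state[i] is the peg holding disk i; disk 0 is the smallest."""
    for disk, src in enumerate(state):
        smaller = state[:disk]              # pegs occupied by all smaller disks
        if src in smaller:                  # a smaller disk lies on top, so this disk is stuck
            continue
        for dst in PEGS:
            if dst != src and dst not in smaller:   # never land on a smaller disk
                yield state[:disk] + (dst,) + state[disk + 1:]


def change_moves(state):
    """Relabeled puzzle. state[i] is the size of the item owned by agent i; agent 0 has priority."""
    for agent, current in enumerate(state):
        higher = state[:agent]              # sizes owned by higher-priority agents
        if current in higher:               # shares a size with a higher-priority agent
            continue
        for new in SIZES:
            if new != current and new not in higher:
                yield state[:agent] + (new,) + state[agent + 1:]


def to_change(state):
    """Translate a classic state into the relabeled one (peg index -> size name)."""
    return tuple(SIZES[peg] for peg in state)


def explore(start, moves):
    """Breadth-first search; returns {state: minimum number of moves from start}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in moves(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def same_problem_space(n_disks=3):
    """True if relabeling maps every classic move onto a move of the relabeled puzzle and back."""
    start = (0,) * n_disks
    for state in explore(start, hanoi_moves):
        translated = {to_change(nxt) for nxt in hanoi_moves(state)}
        if translated != set(change_moves(to_change(state))):
            return False
    return True


if __name__ == "__main__":
    start, goal = (0, 0, 0), (2, 2, 2)
    classic = explore(start, hanoi_moves)
    relabeled = explore(to_change(start), change_moves)
    print(len(classic), classic[goal])
    print(len(relabeled), relabeled[to_change(goal)])
    print(same_problem_space())
```

Both versions run on the same logic here, yet the second rule reads as a check over two attribute values rather than a glance at a physical peg. To a program the two are equally easy. The difference in human solving times therefore has to come from the other layers.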

Bret Victor points to analogous problems in current coding practice. Programmers constantly have to simulate in their heads how code will be executed. Bret finds this unnecessary. When we code, he suggests, we should instantly see what our changes look like. This is slowly being implemented by a few, including Khan Academy (Redefining the Introduction to Computer Science). See immediate connection in action.

If representations are so important, who is working on them? Here are at least a few: Bret Victor (Kill Math and other projects), Henry Segerman (Visualizing Mathematics with 3D Printing), Andy Matuschak and Michael Nielsen, who embed spaced repetition in essays (they describe their views on tools in How can we develop transformative tools for thought?), 3Blue1Brown, who works with animations, Vi Hart, who works with VR, and the various authors in Explorable Explanations, a long list of interactive apps explaining a myriad of topics, curated by Nicky Case. Have I missed somebody? Feel free to comment on this document.

Teaching new tools might be helpful too (such as the software for visualizing logic described in Cullen 2018).

I want to end with a paradox. Or rather with another sign that things are complicated. Although notation and interface are important, some great figures in mathematics and physics claimed not to think in symbols. The most popular testimony is perhaps a fragment of Einstein’s letter published in Hadamard 1954: “The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought. The psychical entities [I emboldened] which seem to serve as elements in thought are certain signs and more or less clear images which can be "voluntarily" reproduced and combined. There is, of course, a certain connection between those elements and relevant logical concepts. It is also clear that the desire to arrive finally at logically connected concepts is the emotional basis of this rather vague play with the above-mentioned elements. But taken from a psychological viewpoint, this combinatory play seems to be the essential feature in productive thought - before there is any connection with logical construction in words or other kinds of signs which can be communicated to others.”

There is a group of researchers who want to figure this out. In the abstract of Amalric and Dehaene 2016 we read: “High-level mathematical reasoning rests on a set of brain areas that do not overlap with the classical left-hemisphere regions involved in language processing or verbal semantics.” Maybe good notation allows us to learn quickly, but at some stage it becomes secondary. Or maybe experts’ intuitions are still influenced, on some level, by notation and by the way a problem is presented.

Far transfer

Linking Latin to academic achievement, or even to general performance, probably seems archaic and absurd nowadays. Well, not to everybody (Benefits of Latin). Please note that I am not arguing that learning Latin is certainly worthless; that is beyond the scope of this essay. I just doubt that learning Latin is particularly better than learning other topics. Thus the benefits X gives outside X are not a strong argument for learning X. Just pick what you enjoy.

Nevertheless, analogous statements concerning programming, math, music, and chess are catchier to some.

At the far transfer level only general skills like cognitive strategies, intuitions, learning to learn, the dynamics of being in a group, etc. can be transferred. Otherwise one would have to argue that, let’s say, there is a structure in chess waiting to reveal insights about the real world.

In a recent meta-analysis focused on far transfer in music, chess, and working memory training (Sala and Gobet 2017) we read: “results show small to moderate effects. However, the effect sizes are inversely related to the quality of the experimental design (e.g., presence of active control groups).” This general observation, that far transfer effects are much less pronounced (if they exist at all) when the comparison is with an active control group rather than an untreated one, is consistent with what I found in Scherer et al. 2019. The effect size against treated (active) control groups is g = 0.16 (small), while against untreated control groups it is g = 0.65 (large).

Another issue is the persistence of successful pedagogical interventions in general. Bailey et al. 2017 gives a short overview of this problem and lists three features of successful interventions: 1) teaching a new skill that wouldn’t be taught otherwise, 2) doing it at the right moment to avoid imminent risks (e.g. teen drinking), and 3) sustaining the environment. The last point reminds me of the word "timeful". This term was coined by Andy Matuschak to describe things that are designed to last. For example, books aren't timeful, since we forget them quickly. Spaced repetition software, like Anki, is timeful: gains from using these apps are meant to accumulate year by year.
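For readers unfamiliar with the g values quoted from Scherer et al. 2019, here is a minimal sketch of how Hedges' g is commonly computed from group summaries: the difference between group means divided by the pooled standard deviation, times a small-sample correction. The function name and every number in the example are invented for illustration; none of them come from the meta-analyses cited in this essay.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference (treatment minus control) with Hedges' correction."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)   # Cohen's d
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)      # usual approximation of the correction factor
    return correction * d


if __name__ == "__main__":
    # Invented test scores (mean, standard deviation, group size); not real data.
    trained = (105.0, 15.0, 40)
    untreated_control = (98.0, 15.0, 40)   # did nothing in the meantime
    active_control = (103.0, 15.0, 40)     # received some other training
    print(round(hedges_g(*trained, *untreated_control), 2))
    print(round(hedges_g(*trained, *active_control), 2))
```

With these made-up numbers the trained group is the same in both comparisons. Only the choice of control group changes, and that alone is enough to move the effect size considerably.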

Endnote

It’s worth noting that while training X perhaps won’t influence our general performance, it may help to prevent decline. As far as I know, more research is needed in this domain (Sala and Gobet 2017 points to Karbach and Verhaeghen 2014 and Melby-Lervåg et al. 2016).

Transfer of learning has huge implications for education. In the draft Appendix I collect how teachers deal with this concept.

References

Amalric, Marie, and Stanislas Dehaene. ‘Origins of the Brain Networks for Advanced Mathematics in Expert Mathematicians’. Proceedings of the National Academy of Sciences 113, no. 18 (3 May 2016): 4909–17. https://doi.org/10.1073/pnas.1603205113.

Bailey, Drew, Greg J. Duncan, Candice L. Odgers, and Winnie Yu. ‘Persistence and Fadeout in the Impacts of Child and Adolescent Interventions’. Journal of Research on Educational Effectiveness 10, no. 1 (2017): 7–39. https://doi.org/10.1080/19345747.2016.1232459.

Benezet, L. P. ‘The Teaching of Arithmetic I, II, III: The Story of an Experiment’. Journal of the National Education Association, 1935–1936.

Christensen, Bo T., and Christian D. Schunn. ‘The Relationship of Analogical Distance to Analogical Function and Preinventive Structure: The Case of Engineering Design’. Memory & Cognition 35, no. 1 (January 2007): 29–38. https://doi.org/10.3758/BF03195939.

Clement, Evelyne, and Jean-François Richard. ‘Knowledge of Domain Effects in Problem Representation: The Case of Tower of Hanoi Isomorphs’. Thinking & Reasoning 3, no. 2 (April 1997): 133–57. https://doi.org/10.1080/135467897394392.

Cullen, Simon, Judith Fan, Eva van der Brugge, and Adam Elga. ‘Improving Analytical Reasoning and Argument Understanding: A Quasi-Experimental Field Study of Argument Visualization’. Npj Science of Learning 3, no. 1 (December 2018): 21. https://doi.org/10.1038/s41539-018-0038-5.

Evans, Tanya M., John Kochalka, Tricia J. Ngoon, Sarah S. Wu, Shaozheng Qin, Christian Battista, and Vinod Menon. ‘Brain Structural Integrity and Intrinsic Functional Connectivity Forecast 6 Year Longitudinal Growth in Children’s Numerical Abilities’. The Journal of Neuroscience 35, no. 33 (19 August 2015): 11743–50. https://doi.org/10.1523/JNEUROSCI.0216-15.2015.

Gick, Mary L., and Keith J. Holyoak. ‘Analogical Problem Solving’. Cognitive Psychology 12, no. 3 (1980): 306–55.

Gick, Mary L., and Keith J. Holyoak. ‘Schema Induction and Analogical Transfer’. Cognitive Psychology 15, no. 1 (January 1983): 1–38. https://doi.org/10.1016/0010-0285(83)90002-6.

Karbach, Julia, and Paul Verhaeghen. ‘Making Working Memory Work: A Meta-Analysis of Executive Control and Working Memory Training in Younger and Older Adults’. Psychological Science 25, no. 11 (November 2014): 2027–37. https://doi.org/10.1177/0956797614548725.

Kotovsky, K., J. R. Hayes, and H. A. Simon. ‘Why Are Some Problems Hard? Evidence from Tower of Hanoi’. Cognitive Psychology 17, no. 2 (April 1985): 248–94. https://doi.org/10.1016/0010-0285(85)90009-X.

Melby-Lervåg, Monica, Thomas S. Redick, and Charles Hulme. ‘Working Memory Training Does Not Improve Performance on Measures of Intelligence or Other Measures of “Far Transfer”: Evidence From a Meta-Analytic Review’. Perspectives on Psychological Science 11, no. 4 (1 July 2016): 512–34. https://doi.org/10.1177/1745691616635612.

Sala, Giovanni, and Fernand Gobet. ‘Does Far Transfer Exist? Negative Evidence From Chess, Music, and Working Memory Training’. Current Directions in Psychological Science 26, no. 6 (December 2017): 515–20. https://doi.org/10.1177/0963721417712760.

Scherer, Ronny, Fazilat Siddiq, and Bárbara Sánchez Viveros. ‘The Cognitive Benefits of Learning Computer Programming: A Meta-Analysis of Transfer Effects.’ Journal of Educational Psychology 111, no. 5 (July 2019): 764–92. https://doi.org/10.1037/edu0000314.

Trench, Máximo, and Ricardo A. Minervino. ‘The Role of Surface Similarity in Analogical Retrieval: Bridging the Gap Between the Naturalistic and the Experimental Traditions’. Cognitive Science 39, no. 6 (2015): 1292–1319. https://doi.org/10.1111/cogs.12201.