Title of paper under discussion
Sight over sound in the judgment of music performance
Proceedings of the National Academy of Sciences, vol 110, no 36, pp 14580-14585
Link to paper (free access)
A series of 6-second clips of international music competition contestants in action – some audio-only, some video-only and some audio plus video – were played to a group of participants who were then asked to predict which of the contestants had won their competition. Although the participants were confident that listening would be the best way to judge the competition winners, it turned out that presenting them with silent ‘video-only’ clips elicited the highest predictive accuracy. Tsay, the study’s author, claims that the results “highlight our natural, and non-conscious dependence on visual cues.”
The study was divided into seven parts:
Expt 1 – to determine, before any music competition footage was presented, which type of recording – audio-only clips, video-only clips or audio+video clips – each participant thought would prove most useful in helping them to predict the winner from three finalists.
Expt 2 and 3 – to determine which type of recording – audio-only clips, video-only clips or audio+video clips – best helped non-musician participants correctly predict the competition winners.
Expt 4 and 5 – to determine which type of recording – audio-only clips, video-only clips or audio+video clips – best helped musician participants correctly predict the competition winners.
Expt 6 – to determine whether ‘outlines of motion’ of the competition contestants provided enough visual information to predict the winners.
Expt 7 – to determine which ‘playing qualities’ – eg passion, confidence, creativity – were being discerned from audio-only or video-only clips, and how these perceived qualities correlated with a contestant’s success.
Method, Results and Discussion
Experiment 1
106 participants, musicians and non-musicians alike, were told they were about to take part in a study on predicting music competition winners from recorded material. They were asked which kind of footage they would most prefer for the task (audio-only, video-only or audio+video), on the understanding that 1) a correct prediction would earn an $8 bonus, but 2) if they chose the audio+video clip, $2 would be deducted from that bonus.
It turned out that 58.5% would choose an audio-only clip, 14.2% a video-only clip and 27.4% an audio+video clip.
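The paper reports these preferences as percentages rather than head-counts. This short sketch assumes each percentage is a share of the 106 Expt 1 participants, rounded to one decimal place. It recovers the whole-number counts consistent with those figures. The counts are inferred here, not reported by Tsay, and the names `TOTAL`, `reported` and `counts_matching` are purely illustrative.

```python
# Hypothetical check: which whole-number counts out of the 106 Expt 1
# participants round to the reported percentages? (Counts are inferred, not reported.)
TOTAL = 106
reported = {"audio-only": 58.5, "video-only": 14.2, "audio+video": 27.4}


def counts_matching(pct: float, total: int = TOTAL) -> list[int]:
    """Return every count k (0..total) whose share k/total rounds to pct at 1 d.p."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == pct]


inferred = {label: counts_matching(pct) for label, pct in reported.items()}
print(inferred)  # one matching count per option
print(sum(ks[0] for ks in inferred.values()))  # should account for all participants
```

Each percentage matches exactly one count (62, 15 and 29). These counts add up to 106, so the three reported figures are at least internally consistent with the stated sample size.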
So, as expected, most participants had the “intuition that sound is a more revealing channel of information in the domain of music”. Moreover, the participants willing to forgo $2 from a potential bonus were demonstrating their belief that “recordings with both visual and auditory output offer additional and more relevant information that better approximates the conditions under which the original expert decisions were made.”
Experiments 2 and 3 (non-musician participants only)
Non-musician (‘novice’) participants were presented with short clips (6-second recordings) of the top three finalists in each of 10 prestigious international music competitions (including the Van Cliburn International Piano Competition, the International Tchaikovsky Competition, the Queen Elisabeth International Music Competition of Belgium, the International Franz Liszt Piano Competition, the Cleveland International Piano Competition, the Hanover International Violin Competition and the San Marino International Piano Competition).
Some participants (106 of them, expt 2) were shown audio-only or video-only clips of the performers. Others (185 of them, expt 3) were shown audio-only, video-only or audio+video clips. Using the clips, each participant was asked to guess which finalist each competition's panel of judges had chosen as the winner.
With 3 finalists to choose from in each competition, participants had a 33.3% (one-in-three) probability of guessing correctly by chance. In fact, pooling the results of both experiments, the average participant chose the correct winner less than 30% of the time with an audio-only clip and 35% of the time with audio+video, but over 45% of the time with video-only.
In the words of the author, these findings “suggest that novices [non-musicians] are able to approximate expert judgments, originally made after hours of live performances, with brief, silent video recordings. However, when novices were also given the sound of the performance through the video-plus-sound recordings, they did no better than picking a winner at random.”
To rule out the possibility that this was simply due to the participants' lack of musical training, experiments 2 and 3 were repeated (as experiments 4 and 5) with musicians instead:
Experiments 4 and 5 (musician participants only)
On repeating experiments 2 and 3, this time with musicians as participants (35 in expt 4 and 106 in expt 5), the findings were just as stark. The audio-only clips elicited correct judgments in under 25% of cases, and the audio+video clips fared not much better at 30%. Once again, only the video-only clips produced above-chance predictive accuracy, at 47%.
Tsay comments: “[t]hese results demonstrate how visual information, the information generally deemed as peripheral in the domain of music, can be overweighted when such inclination is neither valued nor recognized. Ironically, this tendency results in our neglect of the most relevant information: the sound of music.”
Repeats of the experiment using longer clips – up to 60 seconds – produced similar results.
In order to investigate participants’ use of visual information further, Tsay devised two more experiments:
Experiment 6
Judging that “movement and gesture are elements of performance that are primarily visual”, Tsay “distilled” the 6-second video-only clips “to their most basic representation as outlines of motion.” Even these clips, shown to 89 participants, produced a winner-prediction success rate of 49% (well above chance).
In contrast to this dynamic appraisal, participants were not able to reliably guess the winners from still photographs of the contestants. Nor, it turned out, did a contestant's perceived physical attractiveness correlate with their chance of winning the competition.
Experiment 7
262 participants (musicians and non-musicians alike), presented with either audio-only or video-only recordings of three competition finalists, were asked to identify in turn the most confident, most creative, most involved, most motivated, most passionate and most unique performer (with each contestant allowed to attract repeat votes).
The contestant judged ‘most passionate’ from a video-only recording turned out to be the competition winner 60% of the time, a rate significantly higher than chance. When that judgment was made from an audio-only recording, the figure fell to 39%. Other visually judged qualities also picked out the eventual winner at above-chance rates: ‘most creative’ (45%), ‘most involved’ (53%), ‘most motivated’ (53%) and ‘most unique’ (44%).
Perhaps, reasons Tsay, “facets of performance” such as creativity and passion “are visually accessible and readily so” and are therefore understood by musicians and non-musicians alike. “Thus”, she writes, “even novices are able to quickly identify the actual winners among world-class performers, without being encumbered by the sound of music that professional musicians unintentionally and nonconsciously discard.”
Tsay writes that “[e]xperts and novices alike privilege visuals above sound, the very information that is explicitly valued and reported as core to decision making in the domain of music. Moreover, when sound is made available along with the video, it led people away from the actual (visually based) competition outcomes.”
She goes on to note that “[p]rofessional musicians and competition judges consciously value sound as central to this domain of performance, yet they arrive at different winners depending on whether visual information is available or not. This finding suggests that visual cues are indeed persuasive and sway judges away from recognizing the best performance that they themselves have, by consensus, defined as dependent on sound. Professional judgment appears to be made with little conscious awareness that visual cues factor so heavily into preferences and decisions.”
Especially interesting is the finding that “both experts and novices appear to be surprised by their own data, and experts in particular reported a severe lack of confidence in their judgment when they were assigned to the video-only recordings, not knowing that their approximations of the actual outcomes would be superior under such constrained conditions.”
Tsay concludes that “[p]rofessional training may hone musicians’ technical prowess and cultivate their expressive range, but in this last bastion of the realm of sound, it does little to shift our natural and automatic overweighting of visual cues. After all, sound can be neglected while trained ‘ears’ focus on the more salient visual cues. It is unsettling to find—and for musicians not to know—that they themselves relegate the sound of music to the role of noise.”