Music competitions are judged on sight more than sound

Trophies for the International Joseph Joachim Violin Competition Hannover (image © Helge Krückeberg)

Title of paper under discussion

Sight over sound in the judgment of music performance


Chia-Jung Tsay


Proceedings of the National Academy of Sciences, vol 110, no 36, pp 14580-14585

Link to paper (free access)

author – Chia-Jung Tsay


A series of 6-second clips of international music competition contestants in action – some audio-only, some video-only and some audio plus video – were played to a group of participants who were then asked to predict which of the contestants had won their competition. Although the participants were confident that listening would be the best way to judge the competition winners, it turned out that presenting them with silent ‘video-only’ clips elicited the highest predictive accuracy. Tsay, the study’s author, claims that the results “highlight our natural, and non-conscious dependence on visual cues.”

Experimental aims

The study was divided into seven parts:

Expt 1 – to determine, before any music competition footage was presented, which type of recording – audio-only clips, video-only clips or audio+video clips – each participant thought would prove most useful in helping them to predict the winner from three finalists.

Expt 2 and 3 – to determine which type of recording – audio-only clips, video-only clips or audio+video clips – best helped non-musician participants correctly predict the competition winners.

Expt 4 and 5 – to determine which type of recording – audio-only clips, video-only clips or audio+video clips – best helped musician participants correctly predict the competition winners.

Expt 6 – to determine whether ‘outlines of motion’ of the competition contestants were enough visual information by which to predict the winners

Expt 7 – to determine which ‘playing qualities’ – eg passion, confidence, creativity – were being discerned from audio-only or video-only clips, and how these perceived qualities correlated with a contestant’s success.

Vladimir Putin at the gala concert of winners of the XV Tchaikovsky International Competition (image from here)

Method, Results and Discussion

Experiment 1

106 participants, musicians and non-musicians alike, were told they were about to take part in a study to predict music competition winners from recorded material. They were asked which kind of recording they would most prefer for the task – audio-only, video-only or audio+video – on the understanding that 1) a correct prediction would attract an $8 bonus, but 2) if they chose the audio+video clip, $2 would be deducted from that bonus.

It turned out that 58.5% would choose an audio-only clip, 14.2% a video-only clip and 27.4% an audio+video clip.
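As a quick sanity check on those percentages, here is a minimal Python sketch that converts them back into head counts. The counts are a reconstruction, not figures given in this post: the sketch assumes that each of the 106 participants picked exactly one option and that the percentages were rounded to one decimal place.

```python
# Reported in the post: 106 participants and three percentages.
TOTAL = 106
reported = {"audio-only": 58.5, "video-only": 14.2, "audio+video": 27.4}

# Assumption: each participant chose exactly one option, so every
# percentage corresponds to a whole number of people.
counts = {option: round(TOTAL * pct / 100) for option, pct in reported.items()}

# Convert the reconstructed counts back to percentages (one decimal place).
recomputed = {option: round(100 * n / TOTAL, 1) for option, n in counts.items()}

for option in reported:
    print(f"{option}: {counts[option]} people -> {recomputed[option]}%")
print("total people:", sum(counts.values()))
```

Under those assumptions the three groups come to 62, 15 and 29 people, which add up to 106 and round back to the reported percentages.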

So, as expected, most participants had the “intuition that sound is a more revealing channel of information in the domain of music”. Moreover, the participants willing to forgo $2 from a potential bonus were demonstrating their belief that “recordings with both visual and auditory output offer additional and more relevant information that better approximates the conditions under which the original expert decisions were made.”

Experiments 2 and 3 (non-musician participants only)

Non-musician (‘novice’) participants were presented with short clips (6-second recordings) of the top three finalists in each of 10 prestigious international music competitions (including the Van Cliburn International Piano Competition, the International Tchaikovsky Competition, the Queen Elisabeth International Music Competition of Belgium, the International Franz Liszt Piano Competition, the Cleveland International Piano Competition, the Hanover International Violin Competition and the San Marino International Piano Competition).

Some participants (106 of them, expt 2) received audio-only and video-only clips of the performers. Others (185 of them, expt 3) received either audio-only or video-only or audio+video clips of them. Using the clips, each participant was asked to guess the winner of each competition as decided by that competition’s panel of judges.

With 3 finalists to choose from in each competition, the participants had a 33.3% probability of guessing correctly by chance. In fact, pooling the results of both experiments, an average participant chose the correct winner less than 30% of the time with an audio-only clip and 35% of the time with audio+video, but over 45% of the time with video-only.
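To put the 33.3% baseline in context, the sketch below works out what pure guessing would look like for a single participant. It assumes that each participant judged all 10 competitions and that the guesses were independent, with a 1-in-3 chance each time. The function name `prob_at_least` is made up for illustration and does not come from the paper.

```python
from math import comb

N_COMPETITIONS = 10   # from the post: 10 competitions, 3 finalists each
P_CHANCE = 1 / 3      # chance of picking the winner at random

def prob_at_least(k, n=N_COMPETITIONS, p=P_CHANCE):
    """Probability of k or more correct guesses out of n by chance (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: one participant guesses independently in every competition.
expected_correct = N_COMPETITIONS * P_CHANCE
print(f"expected correct by chance: {expected_correct:.2f} of {N_COMPETITIONS}")
print(f"P(at least 5 of 10 correct by guessing) = {prob_at_least(5):.3f}")
```

On these assumptions a lone guesser would get half or more right roughly one time in five, so a single participant's score says little. The figures quoted above are averages over many participants, which is why they are more informative than any one score.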

Experiment 2 results –

left: over 85% of participants reckoned before the experiment that audio-only (‘Sound’) would prove to be more useful in determining the competition winner, compared with less than 15% reckoning on video-only (‘Video’). But…

right: it turned out that video-only clips elicited a correct prediction over 50% of the time, whereas audio-only clips resulted in less than 30% accuracy.

In the words of the author, these findings “suggest that novices [non-musicians] are able to approximate expert judgments, originally made after hours of live performances, with brief, silent video recordings. However, when novices were also given the sound of the performance through the video-plus-sound recordings, they did no better than picking a winner at random.”

In case this was somehow due to the lack of musical training in those participants, experiments 2 and 3 were repeated (as experiments 4 and 5) with musicians instead:

Experiments 4 and 5 (musician participants only)

On repeating experiments 2 and 3 – but this time with musicians as participants (35 in expt 4 and 106 in expt 5) – the findings were just as stark: the audio-only clips elicited correct judgments in under 25% of cases, with the audio+video clips faring not much better at 30%. Once again, only video-only clips brought forth above-chance predictive accuracy: 47%.

Tsay comments: “[t]hese results demonstrate how visual information, the information generally deemed as peripheral in the domain of music, can be overweighted when such inclination is neither valued nor recognized. Ironically, this tendency results in our neglect of the most relevant information: the sound of music.”

Experiment 5 results –
Video-only clips elicited a higher % of correct guesses than audio-only or audio+video

Repeats of the experiment using longer clips – up to 60 seconds – produced similar results.

In order to investigate participants’ use of visual information further, Tsay devised two more experiments:

Experiment 6

Judging that “movement and gesture are elements of performance that are primarily visual”, Tsay “distilled” the 6-second video-only clips “to their most basic representation as outlines of motion.” Even these clips, shown to 89 participants, produced a winner-prediction success rate of 49% (well above chance).

In contrast to this dynamic appraisal, participants were not able to reliably guess winners by looking at their photographs. Nor, it turned out, did a contestant’s perceived physical attractiveness correlate with their chance of competition victory.

Sample outline figure used in experiment 6, isolating visual information to basic motion alone. The outlines are the detected regions/silhouettes of movement. After receiving silent performance excerpts of the musicians as rendered in the above example, participants were asked to identify the winners of each competition.

Experiment 7

262 participants (musicians and non-musicians alike), presented with either audio-only or video-only recordings of three competition finalists, were asked to identify in turn the most confident, most creative, most involved, most motivated, most passionate and most unique performer (with each contestant allowed to attract repeat votes).

‘The most passionate’, when identified through a video-only recording, correlated with ‘competition winner’ at a rate significantly higher than chance (60%). But when identified through an audio-only recording this fell to 39%. Other visually-judged qualities in a contestant showed further above-chance correlations with their being the competition winner: ‘most creative’ (45%), ‘most involved’ (53%), ‘most motivated’ (53%), ‘most unique’ (44%).

Perhaps, reasons Tsay, “facets of performance” such as creativity and passion “are visually accessible and readily so” and are therefore understood by musicians and non-musicians alike. “Thus”, she writes, “even novices are able to quickly identify the actual winners among world-class performers, without being encumbered by the sound of music that professional musicians unintentionally and nonconsciously discard.”

University College London (affiliated university of Chia-Jung Tsay)


Tsay writes that “[e]xperts and novices alike privilege visuals above sound, the very information that is explicitly valued and reported as core to decision making in the domain of music. Moreover, when sound is made available along with the video, it led people away from the actual (visually based) competition outcomes.”

She goes on to note that “[p]rofessional musicians and competition judges consciously value sound as central to this domain of performance, yet they arrive at different winners depending on whether visual information is available or not. This finding suggests that visual cues are indeed persuasive and sway judges away from recognizing the best performance that they themselves have, by consensus, defined as dependent on sound. Professional judgment appears to be made with little conscious awareness that visual cues factor so heavily into preferences and decisions.”

Especially interesting is the finding that “both experts and novices appear to be surprised by their own data, and experts in particular reported a severe lack of confidence in their judgment when they were assigned to the video-only recordings, not knowing that their approximations of the actual outcomes would be superior under such constrained conditions.”

Tsay concludes that “[p]rofessional training may hone musicians’ technical prowess and cultivate their expressive range, but in this last bastion of the realm of sound, it does little to shift our natural and automatic overweighting of visual cues. After all, sound can be neglected while trained ‘ears’ focus on the more salient visual cues. It is unsettling to find—and for musicians not to know—that they themselves relegate the sound of music to the role of noise.”

Van Cliburn (1934 – 2013), after whom the Piano Competition is named


Tchaikovsky Piano Competition Laureates, 1958-1990


4 thoughts on “Music competitions are judged on sight more than sound”

  1. Adrian;

    I read about this study in the Harvard Gazette in 2013. It confirmed for me that audiences listen as much with their eyes as they do with their ears. I have long disagreed with the staid-looking approach to stage presence of American orchestra players for this very reason. Video of the Berlin Philharmonic performances displays a vastly different approach. The string players there show real involvement, with much movement. Somehow that show of involvement has actively been discouraged among orchestra players here. I experienced it myself as a member of an orchestra here in the States. It is not good for an art form that is in its death throes to stick to this deadpan stage demeanor. It is we, the musicians, who make the music, and yet the conductor is seen as the only one allowed to be physical. That is entirely backward. Video and reports from the past showed that some of the greatest interpreters were very spare in their gesturing, so as not to get in the way of the players. I read a review of two concerts in Europe attended by the writer, where he compared the performances of two disparate conductors. The performance under the flamboyant conductor paled, in his opinion, beside the one under the conductor with minimal gesturing.
    Audiences love to watch the conductor. Ozawa was like a dancer on the podium, beautiful to watch. It did not matter if the actual reading was rather generic and routine. I read a story about an individual from a Native American region who was brought to a concert for the first time. He was asked how he liked it. His response: “I liked the dance”.

    A conducting student coming to concerts under Levine, a minimalist with his gestures, said that he couldn’t see anything to learn from. Very telling that he could not learn the important lesson of staying out of the way with your gesturing, of which there is way too much, and to a distracting degree.

    I shared that study with musicians and a few open-minded conductors because a change is needed in the field. I have said that the first American orchestra to Berlin-ize itself will see ticket sales skyrocket. Audiences do listen with their eyes. Orchestra musicians should understand that and divest themselves of the grim-faced stoicism.

    A young student auditioning for the music school where I taught said he wanted to study with me as he had been inspired by my involvement in the music making. A few years before I retired I was stopped by the concert’s guest conductor (on the short list of candidates to replace the MD) outside of the green room as I was leaving the stage (I had stayed to review a few passages after rehearsal had ended). He broke off a conversation with the principal cellist to catch me before I passed to tell me that he wanted to let me know how much he appreciated my involvement, and that he often looked to me for some feedback. This was the first time that anyone in my work had ever remarked on how much I threw of myself into the music making. That conductor said he just couldn’t let it go without saying something to me.

    Conductors want this from their players, and yet we withhold it out of some distorted idea that it is distracting to others. Well, it wouldn’t be if more did it. The show of playing and moving together to make the music can be reinforcing.

    But nothing will change and classical music in America will continue its march to extinction.

    I retired a few years later. That conductor did not get the gig.


    1. Dear James,

      Great to hear from you. I’m afraid it’s been months since I posted, but all for good reason: work is (presently at least) flooding back.
      Your email is as wise as ever, on the ‘spectator vs audience’ paper. UK orchestral musicians are much the same as US ones – they look down on moving musicians. I remember my dad (who was principal clarinet BBC Symphony for over 30 years) questioning an oboist’s ‘off-putting histrionics’ on stage. And there’s that famous anecdote of Richard Strauss feeling the armpits of an overactive young conductor and, on feeling a bit of sweat, declaring “amateur!”

      I think the cynicism comes from the idea that we should be economical of movement, channelling every action into our instruments, which of course makes complete sense in and of itself. But communication, musician-musician and musician-audience, is visual as well as auditory, so I’m with you – to move is to interact.

      I remember my first rehearsal with the European Union Youth Orch, loving the way the German bass players looked like they were swaying in a breeze in time to the music. And a bass colleague, Stacey Watton, is forever getting great audience feedback for his actionful stage presence (and his smile!). It’s a sad end to your email that the young conductor who celebrated your movement wasn’t chosen!

      Best wishes
