Clarinettists heighten emotion and extend phrases with their gestures

Title of paper under discussion

Performance Gestures of Musicians: What Structural and Emotional Information Do They Convey?


Bradley W. Vines, Marcelo M. Wanderley, Carol L. Krumhansl, Regina L. Nuzzo and Daniel J. Levitin


5th International Gesture Workshop, GW 2003, Genova, Italy, April 15-17, 2003

Gesture-Based Communication in Human-Computer Interaction, pp 468-478

Link to paper (free access)


How do the expressive gestures of clarinettists contribute to the ability of audiences to perceive musical structure and emotion? Thirty trained musicians either heard, saw, or both heard and saw a performance of Stravinsky’s Three Pieces for Solo Clarinet, movement two. They were asked to judge, in real time, the shape of phrasing (structure) and the tension (emotion) in the music. Analysis of the results suggested that 1) the music’s structure is conveyed as much by vision as it is by sound; 2) “gestures elongate the sense of phrasing during a pause in the sound and certain gestures cue the beginning of a new phrase”; and 3) vision becomes more important to experiencing musical tension when the music itself is quieter, lower-pitched and less busy.

Above – Igor Stravinsky (1882 – 1971)


How important is the visual experience of a musical performance? Canadian scientists set out to explore the relative importance to an audience of seeing and listening. How critical, they wondered, was each of these senses to relaying the structural and emotional content of the performance?

Previous research, for example into the piano playing of Glenn Gould or into a ballet performance choreographed by Balanchine, had indicated a close relationship between performance gestures and musical architecture. It was also observed that “Gould’s gestural behaviour at the piano changed upon moving into the studio” suggesting that musicians, whether consciously or unconsciously, use gestures especially to communicate musical information to their audiences.

Synthesised music libraries are even getting in on the act by modelling the spectral fluctuations that result from a moving clarinet, so adding realism to their electronically produced sound world.

Although the ‘continuous tension scale’ used in this experiment was a tried and tested way of judging the amount of emotion felt by a participant, the ‘continuous phrasing scale’ was a completely novel way of judging perception of musical structure. Previous studies had only ever asked participants to judge the beginnings and ends of phrases, not to effectively ‘draw the phrase mark’ as the music progresses – a clever innovation by this group of researchers, who presented their paper at an international gesture workshop in Genoa in 2003.


A clarinettist (either Oskar Ramspek or Lars Wouter, it’s not clear which from the paper’s credits) made an audiovisual recording of the second of Stravinsky’s ‘Three Pieces for Solo Clarinet’.

A lack of barlines and time signature, and the fact that it is unaccompanied, made this particular Stravinsky movement ideal for this study – the performer (Oskar or Lars) has a certain freedom in his expressive and structural interpretation and needs to communicate these choices expertly.

Of the thirty musicians who volunteered as ‘audience’ participants, 10 experienced the complete audiovisual performance, 10 heard the audio only and 10 saw the video only.

Each participant was asked to make “a continuous judgement of tension and a continuous judgement of phrasing” by moving a slider (on a MIDI controller) “up and down along a track as the performance was presented”.

These are the exact instructions given to each participant:


– Use the full range of the slider to express the TENSION you experience in the performance. Move the slider upward as the tension increases and downward as the tension decreases.


– Use the full range of the slider to express the PHRASING you experience in the performance. Move the slider upward as a phrase is entered and downward as a phrase is exited. The slider should be near the top in the middle of a phrase and near the bottom between phrases.

Results and discussion

1) Tension data

This graph shows how each of the three groups of participants (audio-only, visual-only, audiovisual) judged their tension levels as the piece progressed:

Of real interest here is the passage from about 35 seconds to about 65 seconds (the ‘middle section’ of the piece). During this passage the group who could only see the performer judged the ‘music’ as more tense than did those who could both see and hear him. Correspondingly, those who could only hear him gave the lowest tension scores of all.

As the authors point out, during “this section the dynamics [had] decreased dramatically, from mezzo forte to pianissimo, the note density [had] decreased from 16th and 32nd notes to 8th notes, and the pitch height [had] decreased as well.” Bradley Vines and his colleagues deduce that the weight given to visual information when judging musical tension depends on the music’s “loudness, note density, and pitch height.”

2) Phrasing data

Here are the (busy looking) phrasing traces from each of the groups:

Remarkable here is the similarity between all three groups, even audio-only and visual-only – the authors comment that “The magnitude of judgments varied from group to group, but the troughs and peaks which mark the temporal [timing] boundaries of each phrase align consistently.” In summary, phrasing is clearly communicated visually as well as auditorily.
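To make the idea of aligned troughs concrete, here is a minimal Python sketch. It is not the authors' analysis method: it simply averages a few slider traces per group and looks for local minima, which would correspond to the gaps between phrases. The slider values (on an assumed 0 to 127 MIDI range, one sample per second), the group sizes and the function names `group_mean` and `find_troughs` are all invented for illustration.

```python
from statistics import mean


def group_mean(traces):
    """Average several equal-length slider traces sample by sample."""
    return [mean(samples) for samples in zip(*traces)]


def find_troughs(trace):
    """Return indices of strict local minima (candidate phrase boundaries)."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] < trace[i - 1] and trace[i] < trace[i + 1]
    ]


# Invented illustrative data, NOT from the paper: slider positions (0-127),
# one sample per second, two hypothetical participants per group.
audio_only = [
    [10, 60, 110, 70, 15, 65, 115, 60, 12],
    [20, 70, 120, 80, 25, 75, 120, 70, 20],
]
visual_only = [
    [5, 40, 80, 50, 10, 45, 85, 40, 8],
    [15, 50, 90, 60, 20, 55, 95, 50, 10],
]

audio_troughs = find_troughs(group_mean(audio_only))
visual_troughs = find_troughs(group_mean(visual_only))

print("Audio-only troughs at samples:", audio_troughs)
print("Visual-only troughs at samples:", visual_troughs)
```

In this toy data the visual-only group uses a smaller range of the slider than the audio-only group, yet both group averages dip at the same sample, which is the pattern the authors describe: different magnitudes, matching phrase boundaries.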

Vines and his colleagues were particularly interested in how phrasing was communicated between sections, so they zoomed in on the data at the moment the first section ends and the second section begins:

During this transition, the authors remind us, “a fermata ends the first section and there is a pause in sound. The clarinettist takes a breath before entering the new section.” The graph demonstrates that “the Visual-only group was slow to recognize the end of the preceding phrase and the Audio-only group was slow to recognize the beginning of the new phrase.”

So why was the visual-only group slow to recognise the end of the first section? We are told that “The performer ended the fermata with a strong movement of his body. His left hand raised from the clarinet and then slowly descended during the silent rest. We hypothesize that gestural movement during the pause extended the sensation of the phrase for the Visual-only group. These subjects did not have the benefit of hearing the note come to a clear and concise end. The sound was important for recognizing the conclusion of the phrase and the visual information served to elongate the experience.” In other words, the left hand gesture kept the phrase going. Intriguingly, the group who saw and heard the performance also judged the first phrase as longer than did the audio-only group, so their judgment was indeed coloured by this gesture.

And why was the audio-only group slow to recognise the beginning of the new phrase? Again, the authors offer an explanation: “Before the new phrase began, the performer made certain movements in anticipation of the sound. He took a deep breath, raised the clarinet and brought it back down in a swooping motion before initiating the sound at the bottom of his arc. Without these visual cues, the Audio-only group could not anticipate the onset of sound. Subjects in the Audio-only group had a sense of phrasing that lagged behind the other two groups who were privy to the movement cues that anticipated the coming sound. The visual information was important for engaging the experience of a new phrase.”
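One simple way to put a number on such a lag, sketched here in Python, is to find when each group's averaged phrasing trace first rises past some level after the pause and then compare the times. This is an illustration only, not the procedure used in the paper: the traces, the half-second sample period, the threshold of 64 and the helper `onset_index` are all assumptions.

```python
def onset_index(trace, threshold, start=0):
    """Index of the first sample at or after `start` that reaches `threshold`.

    Returns None if the trace never reaches the threshold.
    """
    for i in range(start, len(trace)):
        if trace[i] >= threshold:
            return i
    return None


# Invented illustrative data, NOT from the paper: group-average slider
# positions (0-127) around the start of a new phrase.
SAMPLE_PERIOD_S = 0.5  # assumed time between samples, in seconds
THRESHOLD = 64  # assumed midpoint of the slider range

audiovisual_trace = [10, 12, 30, 70, 100, 110]
audio_only_trace = [10, 10, 12, 28, 72, 105]

av_onset = onset_index(audiovisual_trace, THRESHOLD)
audio_onset = onset_index(audio_only_trace, THRESHOLD)

# A positive lag means the audio-only group registered the phrase later.
lag_seconds = (audio_onset - av_onset) * SAMPLE_PERIOD_S
print("Audio-only lag in seconds:", lag_seconds)
```

With these made-up numbers the audio-only trace crosses the midpoint one sample after the audiovisual trace, the kind of delay the authors attribute to missing the breath and the swooping clarinet movement.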

Bradley Vines, the lead author, reminds us that everyday speech contains countless examples of corresponding phenomena: articulation can often be seen before it’s heard, helping the listener to anticipate that sound and even guess what it will be from the shape of the mouth. And in speech, just as in music, “breathing cues are used by people engaged in conversation to help in timing their exchange”.

In conclusion, “This research augments our understanding of multi-modal [sight, sound etc] relations in a musical performance and sheds light upon the important involvement of performance gestures in the perception of music.”


Three Pieces for Solo Clarinet – Stravinsky

(2nd movt starts 2:36)

Anton Maiseyenka – clarinet
