A companion article was published in Percussive Notes (November 2009).

This session was held at 2 p.m. on Thursday, November 6, at the Austin Convention Center.


It is a seductive fallacy to think of music as a purely acoustic phenomenon. To be sure, sound plays an important role, but in the final analysis it is not acoustic information but the perception of that information which defines the musical experience. Though Yankee Doodle is easy to identify, it would be unrecognizable if transposed several octaves higher, beyond the upper limit of human hearing (roughly 20 kHz): while it might attract interest from dogs, it would be completely inaudible to humans. Scientists could analyze this acoustic information and confirm that it contains the exact melodic contour of Yankee Doodle, yet it would be ridiculous to call this inaudible sound a “musical performance.” Though extreme, this example shows that we implicitly recognize the role of the perceptual system in shaping the musical experience. This article will demonstrate that sound becomes music only within the mind of the listener, an insight that is as much practical as philosophical, with implications for performers, educators, and audiences alike.

Does gesture length matter?

There has been great debate among percussionists as to whether it is possible to create long and short notes on the marimba. Well-trained, well-respected musicians routinely disagree on what initially appears to be a simple question: does the length of the physical gesture (e.g. the up-down motion used to strike a note) have any effect on the note's duration? Longtime New York Philharmonic percussionist Elden “Buster” Bailey observed that, “[when] sharp wrist motions are used the only possible results can be sounds of a staccato nature. . . [When] smoother, relaxed wrist motions are used, the player will then be able to feel and project a smoother, more legato-like style” (1963). Others, such as recent PAS Hall of Fame inductee Leigh Howard Stevens, are adamant that gesture length in and of itself is irrelevant, arguing it has “no more to do with [the] duration of bar ring than the sound of a car crashing is dependent on how long a road trip was taken before the accident” (2004).[1]

Both views initially appear quite reasonable. Much as a longer swing of the bat generally sends the ball farther, it is plausible that longer gestures produce longer notes. On the other hand, suppose we hold constant a myriad of variables such as the angle of attack, tension on the mallet, placement of the mallet on the bar, and mallet speed at impact. If energy is transferred from mallet to bar according to the kinetic-energy equation $E = \tfrac{1}{2}mv^2$ (energy equals one-half mass times velocity squared), then differences in gesture length are irrelevant, because the velocity and mass of the mallet (and attached limb) at impact fully dictate the physics of the collision. This view (shared by many, including Mr. Stevens and myself) is supported by evidence suggesting that differences in gesture do not reliably produce differences in acoustic duration (Saoud, 2003). The following research stems from my interest in understanding the role of physical gesture length, and differs from previous work by distinguishing between its effect on sound (i.e. acoustic information) and its effect on the way that sound is perceived.
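As a rough illustration of the physics argument above, the sketch below evaluates $E = \tfrac{1}{2}mv^2$ for two hypothetical strokes. The effective mass (mallet plus limb) and impact speed are made-up numbers chosen only for illustration, not measurements from this study. The point is that gesture length never enters the calculation: two strokes arriving with the same effective mass and the same impact speed deliver the same energy, however far the mallet traveled. (A longer gesture may make it easier to reach a higher impact speed, but it is the speed, not the distance, that counts.)

```python
def impact_energy(effective_mass_kg: float, impact_speed_m_s: float) -> float:
    """Kinetic energy in joules, E = 1/2 * m * v**2, at the instant of impact."""
    return 0.5 * effective_mass_kg * impact_speed_m_s ** 2


# Hypothetical strokes (illustrative numbers, not measurements from this study).
# They differ only in how far the mallet travels; travel distance is stored
# but deliberately never passed to impact_energy().
strokes = {
    "long gesture": {"travel_m": 0.40, "mass_kg": 0.2, "speed_m_s": 3.0},
    "short gesture": {"travel_m": 0.10, "mass_kg": 0.2, "speed_m_s": 3.0},
}

for name, s in strokes.items():
    energy = impact_energy(s["mass_kg"], s["speed_m_s"])
    print(f"{name}: travel {s['travel_m']:.2f} m -> {energy:.2f} J at impact")
```

Both lines report 0.90 J: with mass and impact speed held constant, the fourfold difference in travel distance has no effect on the energy delivered.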

The term perception is frequently misunderstood. In the popular press, it often carries a connotation of being out of touch with reality (e.g. “although flying is perceived as dangerous, it is actually relatively safe”). Within scientific psychological research, however, the term has a different, specific meaning: “perception” refers to our internal experience of the external physical world. It is this second meaning that will be used throughout this article.

Efforts to distinguish between the external world and our internal representation of it illustrate that our everyday experience comes not only from physical properties, but also from the way in which we experience those properties. That is not to say our internal representation is “untrue,” merely that it reflects the structure of our perceptual system as well as information from the external world. For example, strictly speaking, objects do not possess color; they merely reflect light of varying wavelengths, which we happen to experience as green, red, blue, etc. In a sense, color is an “illusion” in that it exists only in the mind rather than in the physical world. In everyday life it is common and generally harmless to brush aside the seemingly abstract distinction between properties of physical objects and the way those properties are experienced. However, in certain instances the difference is crucial, and one such instance can be found in the use of physical gestures in musical performance.

The first section of this article describes an experiment examining the effectiveness of gestures used to control note duration.  The second section discusses the nature of the perceptual system, examining the relationship between energy in our physical world (acoustics) and the way we detect and experience that energy (perception).  The third ties the first two together, demonstrating that learning to understand the process of perceiving sound is an invaluable part of a musical education, useful for performers and teachers alike.  While this study focuses on gestures used by marimbists, the conclusions drawn from this research are applicable to performances on other percussion instruments as well.


1) Experiment

Perceptual psychologists often conduct research by isolating specific components of a problem and constructing experiments to test each individually.  Accordingly, the following experiment was designed to independently analyze the auditory and visual consequences of gestures used by percussionists.  To ensure relevance to a wide audience of educators and performers, it was based upon the recordings of acclaimed marimba virtuoso Michael Burritt using gestures, instrument, mallets, technique and a recording environment[2] similar to those found in actual performances.  The first section of this paper contains a summary of the experimental design, methodology, and analysis before concluding with a discussion of its implications with respect to the role of gesture in music.

1.1) Design

Participants. Fifty-nine Northwestern University undergraduate music majors participated in return for extra credit in their music theory or aural skills classes. While participants were all trained musicians, none considered percussion their primary instrument[3].

Stimuli. Michael Burritt was video recorded performing single notes at three pitch levels: E1 (lowest E on a 5-octave marimba, sounding at ~82 Hz), D4 (~587 Hz), and G5 (highest G on a 5-octave marimba, sounding at ~1568 Hz), using both long and short gestures at each pitch level (for a total of six recordings). In order to isolate the individual contributions of gesture and acoustic information to perceived note duration, the videos were split into auditory [long-audio, short-audio] and visual [long-gesture, short-gesture] components (note: the terms long-audio and short-audio refer to the auditory components of strokes produced with long and short gestures). These components were then mixed and matched such that, in addition to the “natural” pairings of ‘long-gesture with long-audio’ and ‘short-gesture with short-audio,’ participants saw two hybrid combinations: ‘long-gesture with short-audio’ and ‘short-gesture with long-audio.’ A screenshot taken from one of the videos is shown in Figure 1.


[Figure 1 panels: 1a) Long Gesture; 1b) Short Gesture]

Figure 1: A world-renowned percussionist performed a series of notes using long (a) and short (b) gestures.
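The pitch frequencies quoted in the stimulus description follow from twelve-tone equal temperament, in which each semitone multiplies frequency by 2^(1/12). The sketch below recomputes them from MIDI note numbers, assuming standard A4 = 440 Hz tuning (marimbas are often tuned slightly higher, e.g. A = 442 Hz, which would shift these values by well under 1%). Note that the article's labels appear to number octaves from the bottom of the instrument: in scientific pitch notation, ~82 Hz, ~587 Hz and ~1568 Hz are E2, D5 and G6.

```python
def midi_to_hz(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = MIDI 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


# Scientific-pitch names and MIDI numbers for the three stimulus pitches
# (A4 = 440 Hz is an assumption of this sketch, not a detail from the study).
stimulus_pitches = {
    "E2 (article: E1)": 40,
    "D5 (article: D4)": 74,
    "G6 (article: G5)": 91,
}

for label, midi in stimulus_pitches.items():
    print(f"{label}: {midi_to_hz(midi):.1f} Hz")
```

The printed values (82.4, 587.3 and 1568.0 Hz) match the approximate frequencies given in the text.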

Procedure. The purpose of this study was not to examine whether gestures look different, but rather whether they cause notes to sound different.  Therefore, participants were informed at the outset that some stimuli contained auditory and visual components which had been intentionally mismatched (e.g. long-gesture with short-audio), and were asked to rate note duration based on the sound alone. As both types of notes were paired with both types of gestures, this design allows us to isolate the effect of visual information on perception of note duration by examining how the ratings for each note differed depending upon the gesture with which they were paired. 

The experiment took place in a computer lab at the Northwestern University Library.   The stimuli were presented in blocks organized into two conditions: (i) as audio-visual stimuli combining the visual gesture and auditory note, and (ii) as audio-alone. After each stimulus, participants were asked to make a duration rating using a slider with endpoints labeled “Short” and “Long.”  The position of this slider was translated into a numeric value ranging from 0 (short) to 100 (long). 
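The crossed design described above can be enumerated in a few lines. This is a minimal sketch assuming that audio and video were recombined only within the same pitch level (the text implies this but does not state it outright); the pitch labels are placeholders, and repetitions and presentation order are not described here, so they are not modeled.

```python
from itertools import product

PITCHES = ("E (low)", "D (middle)", "G (high)")  # placeholder labels for the three pitch levels
STROKES = ("long", "short")                       # gesture length that produced each component

# Audio-visual condition: at each pitch, each audio track is paired with each gesture video.
# Same-pitch pairing is an assumption of this sketch.
audio_visual = [
    {"pitch": p, "audio": a, "gesture": g, "natural": a == g}
    for p, a, g in product(PITCHES, STROKES, STROKES)
]

# Audio-alone condition: the six audio tracks with no video.
audio_alone = [{"pitch": p, "audio": a} for p, a in product(PITCHES, STROKES)]

n_natural = sum(s["natural"] for s in audio_visual)
print(f"{len(audio_visual)} audio-visual stimuli ({n_natural} natural, "
      f"{len(audio_visual) - n_natural} hybrid)")
print(f"{len(audio_alone)} audio-alone stimuli")
```

This prints "12 audio-visual stimuli (6 natural, 6 hybrid)" and "6 audio-alone stimuli": because every audio track appears with both gestures, any rating difference between the two pairings of the same audio can only come from the video.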

1.2) Results

The difference of opinion over the effect of gesture stems in part from overlooking the distinction between physical energy (sound) and the way that sound is perceived. Resolving it therefore requires examining the question from both the acoustic and perceptual perspectives. Results are summarized below; for full details, please see the technical version of this paper published in Perception (Schutz & Lipscomb, 2007).

Acoustical Analysis

As shown in Figure 2, the acoustic profiles of notes produced with long and short gestures were indistinguishable.  Therefore, gesture length had no effect[4] on acoustic duration.  These results are consistent with previous work suggesting it is not possible to produce reliable acoustic differences in duration through the manipulation of gesture length alone (Saoud, 2003).



Figure 2. Acoustic profiles, depicted as RMS (root-mean-square) amplitude (y-axis) over time (x-axis), show no meaningful differences between notes produced with long (solid blue) and short (dashed red) gestures.
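An RMS envelope like the one plotted in Figure 2 is obtained by cutting the signal into short frames and taking the root-mean-square of the samples in each. The sketch below is a minimal version assuming non-overlapping 50 ms frames and a synthetic, exponentially decaying sine as a crude stand-in for a struck bar; the study's actual frame length, hop size and recordings are not given here.

```python
import math


def rms_envelope(samples, frame_size, hop):
    """Root-mean-square amplitude of successive frames of a signal."""
    envelope = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return envelope


def decaying_tone(freq_hz, decay_per_s, duration_s, sample_rate):
    """Synthetic exponentially decaying sine: a crude stand-in for a struck bar."""
    n = int(duration_s * sample_rate)
    return [
        math.exp(-decay_per_s * i / sample_rate)
        * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


sr = 8000
tone = decaying_tone(587.33, decay_per_s=3.0, duration_s=1.0, sample_rate=sr)
env = rms_envelope(tone, frame_size=400, hop=400)  # 400 samples = 50 ms at 8 kHz
for k, value in enumerate(env[:5]):
    print(f"{k * 400 / sr:.2f} s  RMS = {value:.4f}")
```

Comparing two notes then amounts to overlaying their envelopes, as in Figure 2; the threshold-crossing analysis in the appendix works on the logarithm of such an envelope.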

Perceptual Analysis

As shown in Figure 3, which averages ratings across all three pitch levels, the auditory component of the videos made no meaningful difference[5] to perceived duration (y-axis) in either the audio-alone (left) or audio-visual (right) condition. However, large differences were observed when the same audio example was paired with long (dark red) and short (light blue) gestures in the audio-visual condition. That the gestures influenced ratings so strongly despite instructions to ignore visual information suggests the integration is obligatory: it is no more possible to ignore the gesture than to read the letters D-O-G without understanding that they refer to the four-legged animal commonly known as “man’s best friend.”

[Figure 3 panels: 3a) Audio-alone condition; 3b) Audio-visual condition]

Figure 3. Perceptual ratings. Ratings did not differ meaningfully based on the auditory component of the videos (left panel); however, they were strongly influenced by the visual component (right panel). The plot was generated by averaging ratings across all three pitch levels; error bars represent a 95% confidence interval (margin of error) about the mean.
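A 95% confidence interval about a mean is usually computed as mean ± t × SD / √n, where t is the two-tailed critical value of Student's t for n − 1 degrees of freedom. The sketch below shows that common between-participants version; the article does not say exactly how its error bars were computed (repeated-measures designs sometimes use within-participant intervals instead), and the ratings in the usage lines are made up.

```python
import math
from statistics import mean, stdev

# Two-tailed 95% critical value of Student's t for df = 58 (n = 59 participants),
# taken from standard t tables; supply the right value for other sample sizes.
T_CRIT_DF58 = 2.0017


def mean_with_ci(ratings, t_crit=T_CRIT_DF58):
    """Return (mean, half-width) of a t-based confidence interval for the mean."""
    n = len(ratings)
    half_width = t_crit * stdev(ratings) / math.sqrt(n)
    return mean(ratings), half_width


# Usage with hypothetical slider ratings (0 = short, 100 = long) from 8 listeners.
hypothetical_ratings = [62, 55, 71, 48, 66, 59, 73, 51]
m, half = mean_with_ci(hypothetical_ratings, t_crit=2.3646)  # t for df = 7
print(f"mean {m:.1f}, 95% CI +/- {half:.2f}")
```

The half-width is the "+/-" figure of the kind reported in Table 1; non-overlapping intervals, as between the long- and short-gesture bars in Figure 3b, signal a difference well beyond sampling noise.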


1.3) Discussion – who was right?

In the end, the naysayers were vindicated: long and short gestures produced notes with acoustically indistinguishable profiles. Consequently, there was no meaningful perceptual difference when the notes were presented as audio alone, validating Mr. Stevens’ assertion that gesture length is acoustically inconsequential. That much is straightforward. The twist comes in reconciling this finding with results supporting the opposite opinion: the same notes were heard as clearly longer or shorter when participants were watching as well as listening. Such results corroborate Mr. Bailey’s assertion that changes in gesture do play a role in musical performances. Coming to terms with these differences requires recognizing that the conflict stems not from the answers, but from the question (or, more specifically, the way in which it was asked).

While seemingly simple, the question “does gesture length matter?” is really two questions rolled into one, questions requiring different approaches which in turn yield different answers. As shown by the results, those who dismiss the role of gesture are clearly correct within the realm of acoustics (Figure 2), whereas those who acknowledge gesture’s role are correct within the realm of perception (Figure 3b), at least as long as audiences are watching as well as listening. Ultimately, resolution comes not from the results of the experiment itself but from their interpretation: which domain (acoustic or perceptual) is more representative of the musical experience? Before making such a determination, it is useful to clarify the relationship between energy in our physical world (i.e. acoustic information) and the way that energy is perceived within the mind of the listener.


2) The Nature of Perception

When an object such as a marimba bar is struck, energy from the mallet sets the bar vibrating, which in turn causes the surrounding air molecules to vibrate, a phenomenon we call “sound.” These air vibrations can be detected by a variety of receivers, including microphones, other physical objects (e.g. the sympathetic vibration of a timpani head), and the human ear. This entire process can be described rather neatly through physics, the study of physical properties such as mass, energy, distance, and time. However, understanding the way this sound is experienced inside the mind is more complex and beyond the reach of physics alone. Such a question falls under the domain of psychophysics: the study of the relationship between energy in the physical world and the way that energy is perceived and experienced. As a subfield within the study of perception, psychophysics offers a tool for understanding the relationship between acoustics (e.g. sound produced by musical instruments) and the way that acoustic information is perceived and experienced by listeners.

2.1) Perception and “Truth”

It is tempting but erroneous to regard music as a purely acoustic phenomenon. While acoustics play an important role, our internal experience (a.k.a. perception) of the external world reflects the design of our perceptual system in addition to the energy it is detecting. Consequently, perception is not in 1-to-1 correspondence with the physical world. For visual perception, this is clearly illustrated by the Müller-Lyer (Figure 4a) and Ebbinghaus (Figure 4b) illusions. In the Müller-Lyer illusion, lines of equal length look unequal depending on whether their ends carry inward- or outward-pointing arrowheads; in the Ebbinghaus illusion, identical circles look different in size depending on the size of the circles surrounding them. These examples demonstrate that our experience of properties such as length (4a) and size (4b) is affected by factors other than the physical length or size of the object in question.



Figure 4. Visual illusions such as the Müller-Lyer lines (4a, top) and Ebbinghaus circles (4b, bottom) demonstrate that our internal experience is not always aligned with the structure of the physical world.

While these visual illusions are purely unimodal, multimodal illusions caused by interactions between the auditory and visual systems demonstrate similar principles. One common example is the well-known “ventriloquist illusion,” in which speech appears to emanate from the lips of a mute puppet. In addition to amusing audiences, it offers insight into another aspect of perception crucial to our understanding of music: the multimodal nature of the perceptual system.


2.2) Sensory Integration

Cross-modal illusions, in which information from one sensory modality influences perception of information in another, are similarly fascinating and informative. One of the most compelling, known as the “McGurk effect” (McGurk & MacDonald, 1976), demonstrates that visual lip movements are capable of altering our perception of spoken syllables. In this illusion, watching a speaker’s lips while listening to his speech results in a categorically different experience than listening to the speech alone.[6] The explanation for this phenomenon is almost as fascinating as the illusion itself.

The McGurk effect works by exploiting the perceptual interpolation of conflicting auditory and visual information.  On a continuum of speech syllables, the one consciously experienced falls between those presented through the visual (lip movements) and auditory (spoken) modalities – the event which could most plausibly have produced the discrepant sounds and images. It is important to remember that the perceptual system evolved in response to the natural world prior to the ability of technology to present such artificial pairings.  Therefore, ‘averaging’ conflicting sensory information is actually a useful and robust way of resolving such discrepancies, a property of the mind that movie directors have been exploiting successfully for the better part of a century.

As unsuspecting moviegoers, we are generally unaware of the large discrepancy in the spatial location of an actor’s face and voice. While facial images are free to move about onscreen, vocal sounds can originate only from immobile speakers in fixed positions.  However, due to the pre-conscious integration of auditory and visual information (similar to the McGurk effect), voices “sound” as if they are coming from the actor’s lips. That we do not even notice the discrepancy is a testament to the efficiency of our perceptual system.  It is so graceful and elegant that we are generally blissfully unaware of its role in everyday life, including the ways in which it shapes the musical experience.  Yet similar principles of pre-conscious audio-visual binding are what allow skilled marimbists to control audience perception of note duration.


2.3) The Mind of the Listener

Armed with a clear understanding of the distinction between events in the world and our perception of those events, we are now ready to tackle the philosophical question raised by the experiment: where does music exist? In other words, given that gestures selectively affect perception without altering a sound’s acoustic properties, deciding whether the gesture “changes the music” requires determining which domain (i.e. the acoustic sound or the perception of that sound) ultimately defines the musical experience. Purists may argue that music is defined by sound alone, reasoning that while gestures may alter perception, this is merely a gimmick similar to the McGurk effect. However, as illustrated by the Fletcher-Munson equal-loudness curves, the coloring of sound introduced by the perceptual system is actually a fundamental part of the musical experience itself.

Much as our perception of an object’s length or size is not in 1:1 correspondence with its physical properties (Figure 4), our perception of acoustic information is not in 1:1 correspondence with that information in the physical world. All else being equal, a higher-pitched tone will sound louder than a low-pitched tone presented at the same decibel level. This is because our hearing is not “flat” but favors higher frequencies (sensitivity peaks at roughly 2-5 kHz), which is fortunate, since these frequencies are crucial for processing both speech and musical timbre. As a special case of human-designed sound, music is a mold cast to the irregular contours of our perceptual system. Consequently, modern symphony orchestras employ about ten each of low-frequency instruments such as cellos and basses, but rarely more than one piccolo.[7] This bias toward low-frequency instruments reflects (and is in fact required by) the needs of an audience that must hear greater low-frequency energy in order to experience a “balanced” performance.
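To put rough numbers on this frequency-dependent sensitivity, the sketch below evaluates the standard A-weighting curve (the analytic form given in IEC 61672), which engineers use as an approximation of the inverse of the 40-phon equal-loudness contour. It was not part of this study and is not the Fletcher-Munson data itself; it is swapped in only to show the scale of the effect at the three stimulus frequencies. Values are in dB relative to 1 kHz, and a more negative value means the ear is less sensitive at that frequency.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB (IEC 61672 analytic form), about 0 dB at 1 kHz."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.0  # +2.0 dB normalizes the curve to 0 dB at 1 kHz


# The three stimulus pitches from the experiment, plus the 1 kHz reference.
for f in (82.4, 587.3, 1568.0, 1000.0):
    print(f"{f:7.1f} Hz: {a_weighting_db(f):+6.1f} dB")
```

On this curve the ~82 Hz note is weighted roughly 20 dB below the ~587 Hz note, which is one way of seeing why low-frequency instruments need so much more acoustic energy to sound "balanced."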

A purely acoustic view of music that ignored the role of the perceptual system would erroneously conclude that this balance is “wrong.” However, as with the earlier Yankee Doodle example, it is not the acoustic information but the way that information is perceived which defines the musical experience (Figure 5). Such transformations are entirely independent of visual information: our greater sensitivity to high frequencies is the same whether we listen to live or recorded music, and with our eyes open or closed.


Figure 5. The acoustic energy of a performance is generally “imbalanced,” with thousands of times more energy at low frequencies. Because our hearing is far more sensitive to higher frequencies, the result is the perception of a “balanced” performance.

Accordingly, sound truly becomes music only within the mind of the listener. Therefore, factors affecting the processing of sound prior to our conscious experience of it are as much a part of music as the sound itself. If acoustic information becomes music only when perceived, and gestures alter that perception, then by definition gestures shape musical reality by controlling what matters: the experience within the mind of the listener.

3) Conclusions and Applications

Michael Burritt (the performer in the videos) was not coached on his gestures in any way; he was merely asked to perform his best “long” and “short” notes on the marimba. It is worth noting that Professor Burritt is an internationally acclaimed marimba virtuoso: if he was unable to use gesture to manipulate note length acoustically, it is doubtful that it can be done. However, while the gestures were acoustically ineffective, they were (inadvertently) perceptually successful. In essence, while gestures cannot change the sound of the note, they can change the way the note sounds. That this is accomplished through sensory integration rather than acoustic manipulation is irrelevant to concert audiences, who care only that a performance “sounds right.” However, understanding this distinction is imperative for performers, who are ultimately evaluated in part on their ability to communicate effectively with their audiences.

It is possible (though not desirable) to perform a piece without analyzing its structural properties or exploring its historical significance. Yet most would agree that a basic understanding of music theory and history is essential for well-rounded musicians. Similarly, a basic understanding of the perceptual system is an equally important part of any musical education (Figure 6). While some may argue they have always “known” gestures to be important, it is doubtful that many truly understood the nature of their role. Furthermore, it is important to remember that others have argued against the role of gesture with equal fervor. Now, after distinguishing between its acoustic and perceptual effects (section 1) and recognizing that it is the latter that defines the musical experience (section 2), we can conclude definitively that gestures are an effective technique for controlling musical note duration.

Figure 6. Practical application: understanding the role of the perceptual system is an invaluable part of any musical education.

The conclusion that visual information plays an important role in music perception is supported by a number of other studies demonstrating visual influence on ratings of musical expressiveness (Davidson, 1993), emotional intent (Dahl & Friberg, 2007), performance quality (McClaren, 1988), and audience interest (Broughton et al., 2006). Consequently, contexts that ignore visual information (e.g. radio broadcasts, CDs, blind auditions) rob both the performer and the audience of a significant dimension of musical communication. Given the observed disconnect between sound and its perception, it is important to remember that virtuosos are masters at shaping the musical experience. Ultimately, this means sidestepping the acoustically impossible to control what is musically desirable: the experience within the mind of the listener.


I am grateful to Professor Michael Burritt, both for graciously volunteering to record the videos used in this experiment and for being an exceptional teacher and mentor. Additionally, Dr. Scott Lipscomb was instrumental to this project, serving as advisor for the Master of Music thesis on which this paper was based.




Bailey, Elden (1963). Mental and manual calisthenics for the mallet player. New York: Adler.

Broughton, Mary, Stevens, Kate, and Malloch, Stephen (2006). Music, movement and marimba: An investigation of the role of movement and gesture in communicating musical expression to an audience. Proceedings of the 9th International Conference on Music Perception and Cognition.

Dahl, Sofia, and Friberg, Anders (2007). Visual perception of expressiveness in musicians’ body movements. Music Perception, 24(5), 433-454.

Davidson, Jane (1993). Visual perception of performance manner in the movements of solo musicians. Psychology of Music, 21, 103-113.

McClaren, Cort (1988). Focus on research: The visual aspect of solo marimba performance. Percussive Notes, Fall, 54-59.

McGurk, Harry, and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Saoud, Erik (2003). The effect of stroke type on the tone production of the marimba. Percussive Notes, 41(3), 40-46.

Schutz, Michael, and Lipscomb, Scott (2007). Hearing gestures, seeing music: Vision influences perceived tone duration. Perception, 36(6), 888-897.

Stevens, Leigh Howard (1990). Method of movement (2nd ed.). Asbury Park, NJ: Keyboard Percussion Publications.

Stevens, Leigh Howard (2004). Personal communication (email).

Walker, J. T., and Scott, K. J. (1981). Auditory-visual conflicts in the perceived duration of lights, tones, and gaps. Journal of Experimental Psychology: Human Perception and Performance, 7, 1327-1339.



Further reading on Music Perception and Cognition

This Is Your Brain On Music: The Science of a Human Obsession.  Dan Levitin

Musicophilia: Tales of Music and the Brain.  Oliver Sacks

Sweet Anticipation: Music and the Psychology of Expectation.  David Huron

The Brain, Music, and Ecstasy: How Music Captures our Imagination.  Robert Jourdain

Statistical Analyses

Acoustic analysis: Acoustic duration (Figure 2) was assessed by selecting “cutoff points” in the log(RMS) amplitude range from -3 to -5. A t-test examining the time at which each stroke type’s acoustic profile first dropped below a given threshold found no statistically significant difference between notes produced with different gestures [t(122.18) = 0.0604, p = .952].
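The two steps of the acoustic analysis can be sketched as follows: find the first frame at which a note's log(RMS) envelope drops below a cutoff, then compare those crossing times between stroke types. The fractional degrees of freedom reported above (122.18) suggest an unequal-variance (Welch) t-test, though the article does not name the variant; the envelope values and crossing times below are hypothetical, and the log base used in the study is not stated, so the function simply takes an already log-transformed envelope.

```python
import math
from statistics import mean, variance


def first_crossing_time(log_rms, frame_times, threshold):
    """Time of the first frame whose log-RMS value falls below `threshold` (None if never)."""
    for t, value in zip(frame_times, log_rms):
        if value < threshold:
            return t
    return None


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a (sample variance / n)
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log-RMS envelope for one note, one value every 50 ms.
times = [0.05 * k for k in range(8)]
log_env = [-0.5, -1.2, -1.9, -2.6, -3.3, -4.0, -4.7, -5.4]
for cutoff in (-3.0, -4.0, -5.0):
    print(cutoff, first_crossing_time(log_env, times, cutoff))

# Hypothetical crossing times (seconds) for notes made with long and short gestures.
long_times = [0.42, 0.45, 0.40, 0.44]
short_times = [0.43, 0.41, 0.44, 0.42]
t, df = welch_t(long_times, short_times)
print(f"t({df:.2f}) = {t:.3f}")
```

A t value near zero, as in the study's t(122.18) = 0.0604, means the average crossing times of the two stroke types are essentially the same relative to their variability.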


Audio alone: Duration ratings in the audio-alone condition were assessed with a 3 (pitch) x 2 (auditory stroke type) repeated-measures ANOVA (analysis of variance), with pitch and auditory stroke type as within-participants variables. There was a main effect of auditory stroke type [F(1, 58) = 4.811, p = .032]. However, as shown in Table 1, differences between stroke types were small (2 points), did not occur in the audio-visual condition, were never replicated in subsequent experiments, and were similar in size to differences among stroke types intended to be identical (Saoud, 2003).[8] Therefore, this difference reflects natural variability in acoustic duration rather than a “true” difference produced intentionally by the performer (Figure 3a).

Audio-visual: Duration ratings in the audio-visual condition were assessed with a 3 (pitch) x 2 (auditory stroke type) x 2 (visual stroke type) repeated-measures ANOVA, with pitch, auditory stroke type, and visual stroke type as within-participants variables. The most important finding was a significant main effect of visual stroke type [F(1, 58) = 148.424, p < .0001], indicating that visual information affected duration ratings (Figure 3b). There was no main effect of auditory stroke type [F(1, 58) = 0.218, p = .643], indicating no perceptual difference between the auditory information produced by long and short gestures (Figures 3a and 3b).
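Each main effect reported above involves a factor with only two levels (long vs. short), which is why its numerator degrees of freedom is 1. For such a factor in a fully within-participants design, the repeated-measures F equals the square of a paired t computed on each participant's two marginal means (averaged over the other factors), with n - 1 = 58 error degrees of freedom for 59 participants. The sketch below shows only that shortcut, not the full 3 x 2 x 2 ANOVA, and the input lists are hypothetical.

```python
import math
from statistics import mean, stdev


def within_factor_f(level_a, level_b):
    """F(1, n-1) for a two-level within-participants factor.

    `level_a` and `level_b` hold each participant's mean rating at the two
    levels (averaged over the design's other factors); F is the paired t squared.
    """
    diffs = [a - b for a, b in zip(level_a, level_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical per-participant mean ratings for long- vs. short-gesture videos.
long_gesture = [68, 72, 61, 75, 66, 70]
short_gesture = [41, 50, 38, 47, 45, 44]
f_value, dfs = within_factor_f(long_gesture, short_gesture)
print(f"F{dfs} = {f_value:.2f}")
```

A large, consistent difference across participants (as with the visual stroke type) yields a large F, while small differences that vary in sign from listener to listener (as with the auditory stroke type in the audio-visual condition) yield an F near zero.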






                      Audio alone     Audio-visual
                      (95% CI)        (95% CI)
Auditory “Long”       +/- 2.80        +/- 2.70
Auditory “Short”      +/- 2.37        +/- 2.62
Visual Long                           +/- 2.80
Visual Short                          +/- 3.29
Table 1. Means and confidence intervals for key comparisons, showing average ratings and 95% confidence intervals for the auditory and visual components of the stimuli. Differences based on the auditory component were negligible compared to differences based on the visual component.



[1] On the surface, the quotations address different issues: the first discusses articulation (legato-staccato) and the second duration (long-short). However, they are useful in illustrating the general confusion regarding the role of gesture, and ultimately both share a common answer: gesture has different perceptual and acoustic consequences.

[2] The recital hall within Regenstein Hall, Northwestern University’s primary venue for solo recitals

[3] As a later study replicated this experiment using participants without musical training, these results are not specific to musicians.

[4] Details of statistical tests used for the acoustical analysis are summarized in Appendix 2

[5] Details of statistical tests used for the perceptual analysis are summarized in Appendix 2

[6] This is best demonstrated by viewing the video online at

[7] A quick search of major American orchestras that post their complete instrumentation online indicates an average of 11 cello, 9 bass, and 1 piccolo positions.

[8] After adjusting the acoustic note length data presented by Saoud (2003) to a scale equivalent to that used in this experiment, the standard deviation of note lengths intended to be identical was 1.93.