6.1 Perception analysis

Segment analysis is based on score reading and is therefore theoretical in nature. A segment analysis might identify uses of orchestration methods that are not audible. This should not, however, devalue an orchestration: orchestration might be a more existential, conceptual, or formal property of music than an audible one. Nevertheless, orchestration is often judged by its effectiveness, through perceptual questions such as whether a given listener can follow an element realized in split orchestration, perceive all the segments in a soundscape, experience a specific doubling, etc. The answers to these questions depend as much on the person being asked as on the musicians’ interpretation of the music. Yet some of them can, to some extent, be answered at a more general level through scientific knowledge about how sounds are perceived and processed by our ears and brains. Chapter 6 provides a brief review of some of the most important principles of auditory perception. Subsequently, it exemplifies how a perception analysis, supported by listening to a performance, can provide a deeper understanding of segment analysis.

6.2 Gestalt principles

Auditory perception concerns, among other things, how we transform acoustic information into perceptual units and organize them into a limited number of auditory streams. An example is when we perceive a melody as such rather than merely experiencing the individual notes of which it is made. This is not unlike how the complex harmonic spectrum of a note played on a musical instrument is perceived as giving rise to a specific pitch and timbre. In these terms, a melody can be regarded as a perceptual stream, just as an accompaniment can be perceived as another stream. A stream can also be a more disparate collection of information that only has a few characteristics in common – for example, the environmental sounds that do not belong to the music, or all the notes that are not played by the soloist.

How a sound picture is divided into streams depends partly on our attention. If we are listening closely to the soloist, anything else might blur into “the rest”, i.e., that which is not the soloist. We can to some extent consciously impose a particular division of the music into streams, but our ability to perceive specific streams is dominated and limited by certain perceptual constraints. The Gestalt school of psychologists formulated these constraints as a number of Gestalt principles governing our perception of visual stimuli. Music cognition scholars have subsequently demonstrated that many of the same principles apply in the auditory domain.1

6.3 Proximity and similarity

The Gestalt principles of proximity and similarity state that objects that are similar and close to each other will generally be perceived as a group.

For example, [illustration] tends to group into [illustration] rather than into [illustration].

Transferred to music, these principles state that notes that are close together in time and pitch and that are similar to each other (e.g., have the same timbre, articulation, and/or dynamics) will be perceived as a single stream. At the same time, the principles state that notes that do not resemble each other will tend to be perceived as belonging to separate streams. The opening of the 2nd movement of Ravel’s Rapsodie Espagnole can easily be heard as two streams, as indicated in the segment analysis in Figure 56. This is because the similarities between the double bass and bass clarinet parts are greater than the differences (in terms of timbre, range, and rhythmical alignment), while the differences between the double bass/bass clarinet and the cello are greater than the similarities.

Figure 56. Ravel’s Rapsodie Espagnole, 2nd movement. Following the Gestalt principle of similarity, the double bass and bass clarinet parts group perceptually better together with each other than with the cello part. This leads to the formation of two separate segments. 

Based on the principles of proximity and similarity, one can improve the likelihood of a specific perceptual result through design of texture and orchestration. If the perceptual clarity of a four-part, polyphonic texture is to be increased, each of the four parts should (1) consist of relatively small intervals (the closer, the more likely the notes are to be perceived as belonging to the same stream), (2) occupy a distinct register (so that notes in one part do not interfere with notes in another part), and (3) employ their own, distinctive articulation and timbre (since notes that have the same articulation and timbre tend to merge).2

Conversely, the same four-part texture can collapse into a single stream if, for example, each part (1) employs large skips, (2) crosses the other parts, (3) is realized in split orchestration (so that none of the streams is characterized by a specific timbre), and (4) uses the same articulation as the other three parts.
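The role of proximity and similarity in these conditions can be illustrated with a small toy model. The sketch below is not a validated model of auditory streaming: the greedy strategy, the `Note` class, and the parameters `max_gap`, `max_cost`, and `timbre_penalty` are all assumptions made for illustration, and the pitches are invented rather than taken from any score. Each note is attached to the stream whose latest note is nearest in pitch and shares its timbre; otherwise it starts a new stream.

```python
from dataclasses import dataclass

@dataclass
class Note:
    onset: float   # position in beats
    pitch: int     # MIDI note number
    timbre: str    # instrument label
    part: str      # written part, used only to inspect the result

def assign_streams(notes, max_gap=2.0, max_cost=4, timbre_penalty=5):
    """Greedy grouping by proximity (time, pitch) and similarity (timbre).
    All thresholds are hypothetical and chosen only for this example."""
    streams = []
    for note in sorted(notes, key=lambda n: n.onset):
        best, best_cost = None, None
        for stream in streams:
            last = stream[-1]
            gap = note.onset - last.onset
            if not 0 < gap <= max_gap:
                continue  # simultaneous, or too far apart in time
            cost = abs(note.pitch - last.pitch)   # pitch proximity
            if note.timbre != last.timbre:
                cost += timbre_penalty            # timbral dissimilarity
            if cost <= max_cost and (best_cost is None or cost < best_cost):
                best, best_cost = stream, cost
        if best is None:
            streams.append([note])
        else:
            best.append(note)
    return streams

def crossing_parts(timbre_a, timbre_b):
    """Two invented parts that cross each other in register."""
    a = [Note(i, p, timbre_a, "A") for i, p in enumerate([60, 64, 68])]
    b = [Note(i, p, timbre_b, "B") for i, p in enumerate([66, 63, 59])]
    return a + b

def parts_per_stream(streams):
    """Which written parts contribute notes to each stream."""
    return [sorted({n.part for n in s}) for s in streams]

distinct = parts_per_stream(assign_streams(crossing_parts("oboe", "horn")))
blended = parts_per_stream(assign_streams(crossing_parts("oboe", "oboe")))
print(distinct)  # each stream follows one written part
print(blended)   # each stream mixes notes from both parts
```

With distinct timbres, the two crossing parts are kept apart; with identical timbres, the model follows pitch proximity across the crossing and the written parts are no longer recoverable. The model does not reproduce a full collapse into one stream; it only shows how a shared timbre removes one of the cues that keep parts separate.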

6.4 Common fate

The principle of common fate states that objects that move together are perceived as belonging to the same group. Common fate generalizes the principle of similarity beyond similarity between objects to similarities in the contexts where these objects occur and the way they behave.

Transferred to music, common fate entails that notes that start at the same time and employ the same rhythm and/or follow the same contour can be perceived as belonging together. This principle thereby supports reading the doublings by similar motion as sub-segments, and reading the sub-segments as one segment, in Figure 57.

Figure 57. Ravel’s Rapsodie Espagnole, 2nd movement, mm. 93-94. The principle of common fate supports reading segment 1 as the sum of 1.1 and 1.2, since they play the same chord in overlapping octave positions with the same beginning and end times. The doubling by contour in 1.1 is also supported by the principle of common fate.
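The two criteria named for common fate, shared onsets and shared contour, can be written down as a simple check. This is a minimal sketch under stated assumptions: the function names `contour` and `shares_fate` are hypothetical, each part is assumed to be a non-empty list of (onset, pitch) pairs, and the example pitches are invented rather than transcribed from the Ravel excerpt.

```python
def contour(pitches):
    """Direction of each melodic step: +1 up, -1 down, 0 repeated."""
    return [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]

def shares_fate(part_a, part_b):
    """True if two parts start their notes together and move in the
    same direction at every step (a doubling by contour)."""
    onsets_a, pitches_a = zip(*part_a)
    onsets_b, pitches_b = zip(*part_b)
    return onsets_a == onsets_b and contour(pitches_a) == contour(pitches_b)

# Invented example data: (onset in beats, MIDI pitch)
melody = [(0, 60), (1, 62), (2, 65), (3, 64)]
doubling = [(0, 72), (1, 76), (2, 77), (3, 74)]   # other intervals, same contour
counter = [(0, 55), (1.5, 53), (2, 52), (3, 55)]  # different rhythm

print(shares_fate(melody, doubling))  # True
print(shares_fate(melody, counter))   # False
```

The check is deliberately strict. A perceptually more plausible version would presumably tolerate small onset deviations and occasional contour differences, but the text gives no thresholds for these.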

6.5 Figure and ground

The Gestalt principle of figure and ground states that some objects are experienced as more dominant or attention-demanding than certain others. If fully visible, figures will be perceived as located in front of a surface, thus adding a depth effect to the image as a whole (Figure 58). 

Figure 58. Due to the Gestalt principle of figure and ground, the black figure is perceived to be located in front of (and not behind) the white background. 

In music, there is a similar tendency for different elements to adopt different positions in the background-to-foreground hierarchy, depending on how prominent they appear to the listener. A stream that has a clear delineation, is rich in information, and avoids redundancy (like many melodies) will appear to be located in front of competing streams that are less sharply defined, static, and/or characterized by repetition, as is often the case for accompaniment elements. Such placement in the musical foreground or background can be supported or contradicted by the orchestration. A melodic element can end up behind less active elements due to volume, timbre, or register. This happens in some recordings of the following excerpt from Tchaikovsky’s Symphony No. 6 (Figure 59).

Figure 59. Tchaikovsky’s Symphony No. 6, 2nd movement

Despite their melodic appearance, the horns (segment 2) are often difficult to hear since they are placed in the same register as segments 1 and 3 and timbrally resemble the other brass instruments in segment 3. The upper strings (segment 1), on the other hand, tend to stand out clearly due to their distinct timbre, rhythmic prominence, and registral placement above the other segments.

6.6 Prägnanz

Finally, the Gestalt principle of Prägnanz states that simpler and more stable interpretations are preferred. Translated into sound, the Prägnanz principle posits that we generally tend to perceive soundscapes as consisting of relatively few, stable streams. The strength of the principle, however, depends on how many elements a soundscape contains and how complicated it is. In a simple soundscape, small differences within the musical material can cause a splitting into more streams. If, on the other hand, there is a potential for many streams to form, our perceptual system will try to reduce the overall number of streams by allowing for greater variety within each stream.

Figure 60 contains 10 different instrumental parts. Yet it is exceptionally difficult to discern each of them merely by listening (not least because of the high number of voice crossings).

Figure 60. Stravinsky’s Symphony of Psalms, 1st movement. A complex texture of 10 parts where a perceptually simple interpretation with merely two streams tends to dominate.

So instead of 10 individual parts, a simpler perceptual interpretation with two overall streams tends to dominate: one group that plays slow and legato, and another that plays fast and staccato (Figure 61).

Figure 61. A simple perceptual interpretation with two overall streams. Excerpt from Stravinsky’s Symphony of Psalms, 1st movement.

The principle of Prägnanz works as a meta-principle which modifies the relative balance of the other principles and changes the criteria for what counts as similar and different.
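One way to picture Prägnanz as a meta-principle is as a loop that loosens the criterion for “similar” until the number of streams is small enough. The sketch below is an illustration under stated assumptions, not an established perceptual model: the functions `group_parts` and `pragnanz_grouping`, the limit `max_streams`, and the two features (slowness and legato-ness on a 0 to 10 scale) are hypothetical, and the feature values are invented rather than measured from the Stravinsky score.

```python
def distance(a, b):
    """Largest difference between two feature vectors in any dimension."""
    return max(abs(x - y) for x, y in zip(a, b))

def group_parts(features, tolerance):
    """Single-linkage grouping: a part joins every group that contains
    at least one part within `tolerance` of it."""
    groups = []
    for name, vec in features.items():
        linked, rest = [], []
        for group in groups:
            close = any(distance(vec, features[m]) <= tolerance for m in group)
            (linked if close else rest).append(group)
        groups = rest + [[name] + [m for g in linked for m in g]]
    return groups

def pragnanz_grouping(features, max_streams, step=1):
    """Relax the similarity tolerance until few enough streams remain."""
    tolerance = 0
    while True:
        groups = group_parts(features, tolerance)
        if len(groups) <= max_streams:
            return groups, tolerance
        tolerance += step

# Invented features per part: (slowness, legato-ness), both from 0 to 10
ten_parts = {
    "p1": (8, 9), "p2": (9, 10), "p3": (7, 8), "p4": (8, 10), "p5": (9, 9),
    "p6": (2, 1), "p7": (1, 2), "p8": (3, 1), "p9": (2, 2), "p10": (1, 1),
}
duo = {"a": (8, 9), "b": (9, 10)}

many_groups, many_tol = pragnanz_grouping(ten_parts, max_streams=2)
few_groups, few_tol = pragnanz_grouping(duo, max_streams=2)
print(len(many_groups), many_tol)  # ten parts reduce to two streams
print(len(few_groups), few_tol)    # the two-part texture stays split
```

In this toy setting, the same small difference that keeps two parts apart in the two-part texture is absorbed within a stream once ten parts compete, which mirrors the point that Prägnanz changes what counts as similar and different.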

6.7 Listening and perception

As they often point towards different organizations of the music, the Gestalt principles are in constant competition with each other. The outcome of the similarity principle in particular can be hard to predict, since it works concurrently on a wide range of musical parameters. When applied to timbre, pitch, loudness, rhythm, etc., it raises questions about the relative strength of each of these parameters. Are similarities in timbre or pitch more important for our perception than similarities in loudness or articulation? Unfortunately, this question cannot be answered outside a specific context, if at all. The different parameters interfere with each other, and there is no generally applicable ranking of their relative strength. Furthermore, perceptual affordances also change according to how we listen. When we direct our attention towards something (like the horns in Figure 63), other details become less accessible to us (like the text in the tenor voice). We cannot focus equally on all elements at the same time; one or the other will dominate our experience. Instead, we can direct our attention to investigate specific elements, or we can adopt a more holistic listening strategy that sacrifices certain details. The principles promote many different listening strategies, but they also exclude some. Understanding these principles is an important step towards being able to anticipate the auditory outcome of performing a specific score. But without experience to guide the anticipation, they are very hard to apply.

On the other hand, while actually experiencing the realization of a score, it quite often seems rather easy to explain the outcome through the principles.

6.8 Perception and segment analysis

Experiences with auditory perception are deeply integrated into most theories about music, and they can be used to explain why music can be understood in terms of textural elements, why doublings might be perceived as one voice, and why notes near each other in pitch and time tend to be experienced as a melody. However, this does not mean that our knowledge of auditory perception is fully integrated into our theories about music. We can still talk about a melodic line even though it might be too slow or leap too much to be perceived as such, and we can organize music using time signatures that cannot be heard. Segment analysis, like texture analysis, is based on reading the score and not on hearing the music; as a consequence, it may sometimes point to segments that are not audible. Perception analysis, as the term is used here, instead centers on how music is perceived when listening to it. We can therefore use perception analysis to examine the more theoretical segment analysis. A perception analysis that does not agree with the segment analysis is an invitation to investigate the difference between the orchestration as it appears in the score and as it might be heard in performance.

What follows are three examples in which perception analysis points towards a different understanding of the orchestration than the one suggested by the segment analysis.

6.8.1 Stravinsky: Symphony of Psalms, 1st movement, Rehearsal number 7

Figure 62. Segment analysis of Stravinsky’s Symphony of Psalms, 1st movement. 

The segment analysis in Figure 62 looks quite simple at first glance. There are five different parts that make up four segments. Segment 1 is at the front of the soundscape (mf) while the other three segments are aligned behind it (p), with sub-segments 2A.a and 2A.b (both orchestrated with double-reed instruments) constantly crossing each other and therefore taken as one segment. Yet perception analysis suggests something else, since the outer notes of the English horn (F3 and F4) coincide with the simultaneous notes in segments 1 and 2B (Figure 63).

Figure 63. Stravinsky’s Symphony of Psalms, 1st movement. Perception analysis indicates how the outer notes in the English horn coincide with notes in Segments 1 and 2B, even though they belong to different segments in the segment analysis.

Therefore, a perception analysis might suggest the alternative set of streams included in Figure 64.

Figure 64. Stravinsky’s Symphony of Psalms, 1st movement. This segment analysis reflects the reorganization suggested by the perception analysis.

The example is also interesting because of the 1st oboe playing two octaves above the others in segment 1. The lack of registral proximity and timbral similarity points towards separating segment 1 into two streams in a perception analysis (Figure 65).

Figure 65. Stravinsky’s Symphony of Psalms, 1st movement. Yet another reorganization, splitting segment 1A from segment 1B because the Gestalt principle of proximity posits that the registral displacement would result in dissociable segments.

6.8.2 Sciarrino: 4 adagi, 1st movement, mm. 4-6

A segment analysis of mm. 4-6 from the first movement of Sciarrino’s 4 adagi is included in Figure 66.

Figure 66. Segment analysis of Sciarrino’s 4 Adagi, 1st movement, mm. 4-6.

Each of the five coordinate segments is unique in that

  • A is the opening motive, which combines horn and trombone in a split orchestration;
  • B is a pulsating roll in the bass drum that is completed by the double basses;
  • C is a descending gesture dominated by flageolets, glissando, and an extremely high register;
  • D is a sequence of overlapping sonorities in the middle register, transforming each other; 
  • E is a small, orchestrated crescendo dominated by breathy (white noise) sounds.

The last three segments (C, D, E) can, however, be understood together as a single stream in a perception analysis. This is due to some overall similarities that are not well articulated in the overview above. The three segments all focus on continuously changing sound rather than pitch. The flutes produce breathy sounds, the strings a glissando on a cluster, and the winds multiphonics. Furthermore, the instruments emerge from nothing (i.e., dal niente), which makes it very hard to identify the specific instruments in use. A perception analysis of this passage could therefore organize it into three streams (Figure 67).

Figure 67. Perception analysis of Sciarrino’s 4 Adagi, 1st movement, as it might be perceived as three streams.

6.8.3 Messiaen: Turangalîla Symphonie, 1st movement, Rehearsal number 12

Figure 68. Segment analysis of Messiaen’s Turangalîla Symphony, 1st movement, Rehearsal mark 12.

In the Messiaen excerpt, a segment analysis organizes the many different parts into 7 segments, with segment 1 comprising the sub-segments 1.1 and 1.2 that are in the same register and complement each other rhythmically (Figure 68). Sub-segment 1.2 is further divided into the two sub-sub-segments, 1.2.b and 1.2.a, with the former being a partial doubling of the latter. This interpretation is open to discussion – not least because of the substantial differences in dynamics. It is, however, supported by the fact that 1.2.a and 1.2.b are both designed as a loop of five eighth notes tied together via the principles of common fate and similarity. The same logic could be used to tie segments 1 and 2 together, but these two segments inhabit distinct registers with several octaves between them. They are thus considered separate segments in this analysis. 

Notice also that segments 2, 3A, and 3B each consist of several instruments that are unified by a single rhythm into three different chord progressions, whereas the remaining segments, 3C, 4, and 5, are upheld by one instrument each.

A perception analysis, however, might say something completely different. Contrary to the segment analysis (Figure 68) where the score can be read without regard to the unfolding of the music over time, the perception analysis is at the mercy of the “bandwidth” of our attention. When the music spreads over many octaves, with many segments in many different orchestral colors, and with very different dynamics, the listener tends to lose track of the music in its entirety. A holistic listening strategy may allow the most dominant elements to come through, embedded in the “noise” of the rest. An analytical listening, on the other hand, may explore the soundscape, switching from element to element, realizing that it takes time to detect the hidden elements. This explorative process may blur the listener’s sense of how many segments are present at any given point in time. 

6.9 Summary

Auditory perception organizes music into a limited number of perceptual units, called streams. Among other things, this process is governed by the Gestalt principles of similarity, proximity, common fate, and figure and ground, with an overall preference for organizing soundscapes into relatively few streams (according to the principle of Prägnanz). The number and nature of auditory streams are ambiguous and often vary with our attention level and current listening strategy. Knowledge of perception can be used to anticipate the outcome of a musical score or to explain a given auditory experience. Since segment analysis is based on score reading, it might not line up with a perception analysis, which invites an investigation of the difference between the segment analysis and the actual outcome of the score.

However, it is important to remember that a perception analysis depends not only on the listener and the listening strategy. When supported by listening to a recording, it also reflects the musician’s interpretation, the acoustics of the concert hall, and the way the performance was recorded.


1) The groundwork for the Gestalt principles was laid at the beginning of the 20th century by Wertheimer, Köhler, and Koffka. For an extended introduction to auditory perception, see Bregman’s Auditory Scene Analysis (1990).

2) Also see Huron’s (2016) Voice Leading: The Science Behind a Musical Art (MIT Press).
