Yet Another Super-Stimulus Theory of Music
Hypothesis: Music is a super-stimulus for the perception of something. But what?
A super-stimulus is, to quote Wikipedia: “an exaggerated version of a stimulus to which there is an existing response tendency”.
Certain characteristics of music can potentially be explained as super-stimuli.
For each such characteristic, we start with the characteristic itself, and then ask the question: what perceived stimulus could it be the super-stimulus of?
Every perception can be regarded as a calculation of some value as a function of some set of raw perceptual inputs.
For a particular characteristic, we want to ask, what value could we calculate as a function of the raw inputs, where something with that characteristic results in a maximum of that value?
Consider musical scales.
A scale can be described as a finite set of pitch values from which a melody is constructed.
When a person perceives a melody which is constructed from a scale, then the pitch values of the scale will be perceived to occur, and other pitch values, ie those pitch values in the gaps between the notes of the scale, will not occur.
We could hypothesize that some sub-system in the listener’s brain is keeping track of the degree of occurrence of different pitch values in a melody as observed and summed (or integrated) over a certain period of time (ie a period of time similar to the length of a musical item), and that this sub-system additionally calculates contrasts between the degrees of occurrence of the pitch values that occur and pitch values that don’t occur, for pairs of pitch values that are close together. That is, for two pitch values close together, it calculates the difference between the degrees of occurrence of those two pitch values, each summed over the period.
For a musical melody, constructed from notes in a scale, there will be a high level of contrast between the perceived degree of occurrence of the pitch values in the scale and the perceived degree of occurrence of the pitch values not in the scale but close to those values that are in the scale.
And, if we wanted to maximize the values of these contrasts for a melody, we would have to construct the melody from a scale.
In other words, melodies constructed from notes in a scale would be a super-stimulus for the perception of this type of contrast.
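As a rough illustration of the calculation just described, here is a minimal Python sketch. It is not a claim about what the brain actually computes. The half-semitone bin width, the function names occurrence_histogram and neighbour_contrast, and the choice of summing absolute differences between adjacent bins are all assumptions made for the sake of the example. The "speech-like" input is also an idealisation: it simply spreads the same total duration evenly over every pitch bin.

```python
def occurrence_histogram(notes, bin_width=0.5, low=0.0, high=12.0):
    """Sum the duration of each note into pitch bins.

    notes is a list of (pitch_in_semitones, duration) pairs.
    bin_width, low and high are assumed values chosen for illustration.
    """
    n_bins = int(round((high - low) / bin_width))
    hist = [0.0] * n_bins
    for pitch, duration in notes:
        index = int((pitch - low) / bin_width)
        if 0 <= index < n_bins:
            hist[index] += duration
    return hist


def neighbour_contrast(hist):
    """Total absolute difference in occurrence between adjacent pitch bins."""
    return sum(abs(a - b) for a, b in zip(hist, hist[1:]))


# A melody that uses each note of a major scale for one unit of time.
scale_melody = [(p, 1.0) for p in (0, 2, 4, 5, 7, 9, 11)]

# An idealised "speech-like" contour: the same total duration (7 units)
# spread evenly over every half-semitone in the octave.
speech_like = [(i * 0.5, 7.0 / 24) for i in range(24)]

print(neighbour_contrast(occurrence_histogram(scale_melody)))  # 13.0
print(neighbour_contrast(occurrence_histogram(speech_like)))   # 0.0
```

With these assumptions the scale-based melody gives a large contrast and the evenly spread contour gives none, which is the qualitative difference the hypothesis relies on.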
(A secondary question is what determines how close the notes of the scale are to each other. A simple sum or integration would be optimised by a scale with many notes very close together. But contrasts derived from a convolution determined by a distribution function with a certain width would be optimised by scales of notes separated by distances similar to that width. Additionally we can ask why musical scales typically have steps between notes of different sizes - a convolution as just described can account for the finite size of the scale steps, but it can’t account for the un-evenness of scale steps. Further adjustments to the integral function would be required to find something where the final perceptual value is optimised by an uneven scale. But for the purposes of the present discussion I will not go into these details, which start to require heavier mathematical machinery.)
We can compare musical melodies to speech “melodies” - normal speech has discernible melody, because vowel sounds have pitch values, but speech melodies are not constructed from any scale, and the perceived level of contrast between the degree of occurrence of pitch values close to other pitch values is not going to be so high.
A similar argument can be made for the perception of musical rhythm, and in particular the perception of regular beat. Musical rhythm is normally based on regular beat periods, which are nested in the sense that they form a sequence of beat periods each of which is an integer multiple of the next period in the sequence.
For example, if we consider a musical item with 6/8 time and with 1/16th notes, then the regular beat periods, starting from a single bar length, are:
1 bar = 6 quavers, 1/2 bar = 3 quavers, 1 quaver, 1/2 quaver
The corresponding multiples are:
2, 3, 2.
That is:
1 bar = 2 x 1/2 bar
1/2 bar = 3 x 1 quaver
1 quaver = 2 x 1/2 quaver
In this case the contrast occurs between the occurrence of these regular beat periods, ie 6, 3, 1 and 1/2 quaver, and the non-occurrence of other regular beat periods with values in between those values.
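The nesting of beat periods in the 6/8 example can be checked with a short Python sketch. The function name beat_multiples is hypothetical, and the periods are expressed in quavers, longest first, as in the list above.

```python
from fractions import Fraction


def beat_multiples(periods):
    """Return the integer multiple relating each beat period to the next.

    periods must be listed longest first. A ValueError is raised if any
    period is not an integer multiple of the one after it, ie if the
    periods are not nested.
    """
    multiples = []
    for longer, shorter in zip(periods, periods[1:]):
        ratio = Fraction(longer) / Fraction(shorter)
        if ratio.denominator != 1:
            raise ValueError(f"{longer} is not an integer multiple of {shorter}")
        multiples.append(int(ratio))
    return multiples


# 6/8 time with 1/16th notes, measured in quavers:
# 1 bar, 1/2 bar, 1 quaver, 1/2 quaver.
six_eight = [Fraction(6), Fraction(3), Fraction(1), Fraction(1, 2)]

print(beat_multiples(six_eight))  # [2, 3, 2]
```

A set of periods that is not nested, such as 6 quavers followed by 4 quavers, is rejected by the same function.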
Most modern popular music isn’t in 6/8 time, and indeed for most modern popular music all the multiples are 2, typically with a time signature of 4/4, with 16th notes, ie as follows:
1 bar = 2 x 1/2 bar
1/2 bar = 2 x 1 crotchet
1 crotchet = 2 x 1 quaver
1 quaver = 2 x 1/2 quaver
1/2 quaver = 1/2 quaver
So the sequence of multiples in this case is 2, 2, 2, 2, and the perceived contrast will be between the occurrence of regular beat periods of 8, 4, 2, 1 and 1/2 quavers, and the non-occurrence of other regular beat periods with values in between those values.
(Here, in contrast to the situation with scales, the steps between each value and the next can all be the same size, if we are prepared to regard the logarithm of the multiple as being the “size” of the step. As it happens, the size of a “step” in a scale is actually the logarithm of the ratio of the pitch values, so actually we are dealing with logarithms in both situations.)
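The remark about logarithms can be made concrete with a small Python sketch. The function name log2_steps is hypothetical. For the pitch example I assume twelve-tone equal temperament and a starting frequency of 440 Hz, neither of which is specified in the discussion above.

```python
import math


def log2_steps(values):
    """Size of each step between consecutive values, as a base-2 logarithm."""
    return [abs(math.log2(a / b)) for a, b in zip(values, values[1:])]


# Beat periods for 4/4 time with 16th notes, in quavers, longest first.
four_four = [8, 4, 2, 1, 0.5]
print(log2_steps(four_four))  # [1.0, 1.0, 1.0, 1.0]

# Frequencies of one octave of a major scale, assuming equal temperament
# and a starting note of 440 Hz (both assumptions made for this example).
major_degrees = (0, 2, 4, 5, 7, 9, 11, 12)
frequencies = [440.0 * 2 ** (k / 12) for k in major_degrees]

# Multiplying the base-2 logarithm by 12 expresses each step in semitones.
semitone_steps = [round(12 * s) for s in log2_steps(frequencies)]
print(semitone_steps)  # [2, 2, 1, 2, 2, 2, 1]
```

On this logarithmic measure the beat-period steps in 4/4 time are all equal, while the steps of the major scale are not.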
As in the case of pitch values, we can compare music and speech, and we can observe that speech has “rhythm”, but speech rhythm is not based on precise regular beat periods, and for speech there will not exist the same strong contrast between beat periods that occur and beat periods that don’t occur.
Based on these observations about musical scales and musical rhythm, I developed a theory that music could be a super-stimulus for the perception of something, where this unknown something was represented by the perception of these contrasts between occurrence and non-occurrence of certain types of values, such as pitch value and regular beat periods.
There is of course more to music than just being constructed from notes in a scale and having a regular time signature.
But I extrapolated my hypothesis, and I conjectured that similar contrasts are perceived in the occurrence and non-occurrence of various other values that could be perceived by the brain (somehow), and that determining the full set of such values would be equivalent to specifying the full criterion for what constitutes the musicality of music.
To fully develop my hypothesis, I would have to determine what all these other values are.
But, ignoring for the moment the problem of determining what these other values are, I had another problem: why would the brain have any interest in perceiving these contrasts?
What would they represent?
In the first instance, I asked myself: what other thing exists that has both melody and rhythm?
As it happens, there does exist one other thing which is not music, but which is like music. It is like music because it has properties of “melody” and “rhythm”, and that thing is speech.
So I developed the hypothesis that music is a super-stimulus for an aspect of the perception of speech.
And next I asked myself, what was this aspect of speech, the perception of which music was a super-stimulus for?
What was the something that was being perceived when a listener perceives these contrasts that - according to my hypothesis - define the “musicality” of speech?
If my hypothesis stated that music is a super-stimulus for the perception of particular aspects of melody and rhythm of speech, where those aspects define the musicality of music, it followed that the perceived musicality of ordinary speech should have some particular relevance to the listener, ie that the perception of the musicality of speech is telling the listener something important about the speech they are listening to.
For example, the perception of musicality might be giving the listener important information about the state of mind of the speaker. More specifically, maybe it is providing information about the honesty of the speaker, or the sincerity of the speaker, or the emotionality of the speaker, or something else…
Up to a point this hypothesis of mine seemed plausible, even if I could not immediately identify what that information actually was, ie what it represented about the state of mind of the speaker.
But even ignoring the question of determining precisely what it is that the musicality of speech tells the listener about the speaker, or the speaker’s state of mind, there were at least two serious difficulties with my hypothesis.
The first was that, whatever the meaning of the perceived musicality of ordinary speech is, we are not consciously aware of what that meaning might be. Indeed we do not have any conscious awareness that “musicality” is a significant or meaningful property of speech. So whatever that meaning might be, the effect that it has on our perception of speech must be something very subtle.
The second problem was that if musicality implies something about the state of mind of the speaker, then surely there would be at least some situations where a speaker would deliberately “musicalize” their speech in order to generate this perception in the mind of the listener.
For example, if musicality represents sincerity, then whenever a speaker wanted to appear to be more sincere, it would make sense for them to speak in a musical manner. Or similarly if musicality represents the emotionality of the speaker.
Whereas, in practice, a fundamental feature of ordinary speech is that it is never musicalized.
There is a clear separation, in the minds of listeners, between normal speech, which is perceived as information to be communicated by the speaker, and music, which is perceived as a performance.
If you start singing in the middle of a normal conversation, anyone listening to you is going to have difficulty processing your singing as if it was normal speech. The very moment you start singing, your listeners will cease to take you seriously - even though they will continue to understand the meaning of the words and sentences that you are singing.
This separation between speech as communication and music as performance applies even when music contains words embedded within it, ie songs.
In other words, music and speech are different things, and music is not speech, even when speech is contained inside the music.
To put it another way, you can “speechify” music, for example writing lyrics to turn a melody into a song, which is still music, but you cannot “musicalize” speech - adding music to normal conversational speech turns it into something that is no longer conversational speech.
It might seem, with these difficulties, that the super-stimulus hypothesis has to be abandoned.
But there is another possibility.
Speech is the only thing existing now that is not music but at the same time is something like music.
But music could be a super-stimulus for some other thing, where that other thing used to exist but no longer does, except in the super-stimulus form of music.
So, let us assume that such a thing existed.
What could we call it?
We could call it “proto-music”.
“Proto-music” is a general term used, in the field of the scientific and academic study of the evolution and development of music, for any hypothetical ancestor of music which was sufficiently different from music as we know it that we feel the need to give it a different name.
For the purposes of the present analysis, I propose that proto-music existed as a thing, satisfying some function, and then music as we know it developed as a super-stimulus for some aspect of the perception of this proto-music, and then proto-music in its normal non-super-stimulus form disappeared.
So, what was proto-music? What function did it provide? What did the “musicality” of proto-music represent? Why did proto-music in its normal form disappear?
I will attempt to answer these questions (and more) in my next article.
To be continued …