Written by Travis M. Moore
Last edited 25-Jun-2020
While we now have a complete system in place to record, quantify and interpret evoked potentials (EPs), we need to make a few tweaks based on the system we're measuring (i.e., the brain). We need to consider what we know about neurophysiology and decide how that information can be used to elicit the best responses in the least amount of time. In other words, we need to determine the optimal stimulus parameters to use based on our knowledge of neurophysiology.
Recall the most important criterion to meet in order to record anything at the scalp is that a large group of neurons must fire in synchrony (i.e., at the same time). You already know how neural firing begins in the auditory system: with sound! Sound pressure is transferred to the inner ear, which causes the basilar membrane (BM) to move at a place matching the frequency of the sound, and that movement opens the mechanical gates in the stereocilia of the inner hair cells (IHCs) at that place. So if one frequency, a sine wave, moves the basilar membrane at one location and makes a small number of IHCs fire, it stands to reason that many frequencies will cause a lot of BM movement and therefore many IHCs to fire. Those IHCs tell the auditory nerve fibers to depolarize, and just like that we have activated a large population of neurons!
The clinic is a fast-paced environment, so we want a stimulus that contains many frequencies, and is brief in duration. (Remember, we need to collect potentially thousands of sweeps for signal averaging.) We know from the lesson on the uncertainty principle that the shorter a sound in time, the more frequencies it contains. That really works to our advantage here! A good choice would be a very brief click. You'll typically see a 100 μs click used in clinic. That's quite brief.
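To see why a brief click is so rich in frequencies, here is a minimal sketch that treats the click as an ideal rectangular electrical pulse (an assumption; the acoustic click that actually reaches the ear is also shaped by the transducer). The function names are hypothetical and chosen only for illustration. The magnitude spectrum of a rectangular pulse of duration T follows |sin(πfT)/(πfT)|, so its first spectral null falls at 1/T.

```python
import math

def first_null_hz(duration_s):
    """Frequency of the first spectral null of an ideal rectangular pulse."""
    return 1.0 / duration_s

def relative_level_db(freq_hz, duration_s):
    """Spectrum level at freq_hz relative to the level at 0 Hz, in dB."""
    x = math.pi * freq_hz * duration_s
    if x == 0.0:
        return 0.0
    ratio = abs(math.sin(x) / x)
    if ratio == 0.0:
        return float("-inf")  # exactly on a spectral null
    return 20.0 * math.log10(ratio)

CLICK_S = 100e-6   # the 100 microsecond clinical click from the text
LONGER_S = 10e-3   # a 10 ms pulse, an arbitrary comparison value

print("First null, 100 us click:", round(first_null_hz(CLICK_S)), "Hz")
print("First null, 10 ms pulse: ", round(first_null_hz(LONGER_S)), "Hz")

# How much energy the ideal click keeps across the audiometric range
for f in (500, 1000, 2000, 4000, 8000):
    print(f, "Hz:", round(relative_level_db(f, CLICK_S), 1), "dB re 0 Hz")
```

Under these assumptions, the click's spectrum stays within a few decibels of its maximum through 4000 Hz, while the longer pulse has already hit its first null at 100 Hz. That is the uncertainty principle at work.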
OK, so the click contains many frequencies, which will cause a large population of neurons to fire. It's also brief, so we can collect a large number of sweeps quickly. But the brevity of the click, specifically its rapid onset, buys us something else as well. Because the click begins almost instantaneously, all those frequencies are presented at the same time. That helps meet the criterion of neural synchrony. The closer together in time all those frequencies are presented, the closer together in time the responding neurons will fire.
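The value of synchrony can be illustrated with a toy simulation. This is only a sketch under stated assumptions: each "unit" contributes an identical Gaussian-shaped blip, the onset times are drawn from a normal distribution, and every number (500 units, 0.2 ms blip width, the jitter values) is made up for illustration rather than taken from physiological data. The function name summed_peak is hypothetical.

```python
import math
import random

def summed_peak(n_units, jitter_sd_ms, unit_sd_ms=0.2, seed=0):
    """Peak of the summed response when unit onsets are jittered in time."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    onsets = [rng.gauss(0.0, jitter_sd_ms) for _ in range(n_units)]
    times = [i * 0.05 - 5.0 for i in range(201)]  # -5 ms to +5 ms
    peak = 0.0
    for t in times:
        # Add up every unit's Gaussian blip at this moment in time
        total = sum(math.exp(-0.5 * ((t - t0) / unit_sd_ms) ** 2)
                    for t0 in onsets)
        peak = max(peak, total)
    return peak

tight = summed_peak(500, jitter_sd_ms=0.05)  # units fire nearly together
loose = summed_peak(500, jitter_sd_ms=1.0)   # units fire spread out in time

print("Peak with tight synchrony:", round(tight, 1))
print("Peak with loose synchrony:", round(loose, 1))
```

The same number of units fire in both cases, but the summed peak is several times larger when they fire close together. In this toy model, that is the difference between a response you can see at the scalp and one that is smeared into the noise.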
Now let's consider the level of the stimulus. What would be best for eliciting a synchronous response from a large group of neurons? Well, we know from psychophysics that a small area of the BM moves in response to soft sounds, and a larger area moves in response to more intense sounds. It seems like presenting the stimulus at a high level will help to activate the most IHCs.
So far so good. We have identified that we need to present high-intensity, brief-duration clicks. But how many clicks should we present each second? One hundred microseconds fits into 1 second 10,000 times. Should we present 10,000 clicks per second? If something sounds too good to be true, it usually is. We cannot present back-to-back clicks and collect thousands of sweeps in a fraction of a second. There are two reasons for this. First, we need to make sure our stimuli do not overlap in time, so that we are truly presenting one stimulus at a time. That requires some space between clicks. If we need a prestimulus baseline of a specific duration, we need to leave at least that much time between stimuli.
The second reason we cannot present 10,000 clicks per second is the limit on how fast neurons can fire. Recall that a neuron cannot depolarize again until it has returned to resting potential. While the process of repolarization is quick, it still takes time. Evidence from the squirrel monkey shows that single auditory nerve fibers fire at most around 250 times per second [1]. But nerve fibers work together, and can achieve phase locking (firing once per cycle) up to approximately 4000 Hz [2]. Even 4000 is well short of 10,000, so we're going to have to slow way down. It also turns out that the slower we present stimuli (up to a point), the better the waveform. So there is a tradeoff between efficiency and accuracy. Typically you will see presentation rates of around 5 to 30 stimuli per second.
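The tradeoff between efficiency and accuracy is easy to put into numbers. The sketch below is simple arithmetic, not a clinical protocol: the sweep count of 2000 is an assumed value for illustration, and the function names are hypothetical.

```python
def inter_stimulus_interval_ms(rate_per_s):
    """Time from the start of one click to the start of the next, in ms."""
    return 1000.0 / rate_per_s

def recording_time_s(n_sweeps, rate_per_s):
    """Time needed to collect n_sweeps at a given presentation rate."""
    return n_sweeps / rate_per_s

N_SWEEPS = 2000  # assumed number of sweeps for one averaged waveform

for rate in (5, 10, 20, 30):  # typical clinical rates from the text
    isi = inter_stimulus_interval_ms(rate)
    total = recording_time_s(N_SWEEPS, rate)
    print(f"{rate:>2} clicks/s: {isi:6.1f} ms between clicks, "
          f"{total:6.1f} s to collect {N_SWEEPS} sweeps")
```

At 5 clicks per second there are 200 ms between clicks, which leaves plenty of room for a prestimulus baseline and for neurons to recover, but 2000 sweeps take 400 seconds. At 30 clicks per second the same number of sweeps takes about 67 seconds, with only about 33 ms between clicks.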
Imagine you have recorded a perfect wave V on your ABR with a clear peak that ends in a single point. You label that perfect peak in the software. What have you really identified? What does that peak mean? Furthermore, why is that peak a peak and not a trough? It turns out that there are significant limitations to what the waveforms we obtain with EP tests can tell us.
What does it mean when a wave is positive or negative? Is one better than the other? More stable? For the most part, the polarity of a wave is based on the orientation of the dipole being recorded and does not tell us much. There are exceptions, of course (such as the polarity of vestibular evoked myogenic potentials), but for now let's just assume the polarity is arbitrary. Essentially, if the dipole is oriented with the negative end toward the electrode, we see a negativity. If the positive end is pointed toward the electrode, we see a positivity. So don't read too much into which direction the wave is pointing.
Why do we find and label the apex (tip) of a wave? Is there something special there? Does it mark the exact millisecond that something happened in the brain? It would be nice if that were the case, but unfortunately, it's not. The fact is, there's nothing inherently informative about the apex of a wave; it's just a handy place for consistent labelling. But how can that be? Doesn't the change in voltage show that a large group of neurons fired at the same time? Yes, it's true that neural activity occurred where the voltage changed, but things are not as precise as being able to point to the apex of a wave and state that it represents anything specific. Let's think about why.
Recall that the neural firing in the brain occurs as oscillations. In other words, groups of neurons fire in different rhythms rather than in isolation. You can imagine this like several brain networks producing electrical rhythms at different speeds. A more precise way of saying that is the oscillations of neural networks occur at various frequencies. The electrical activity of the brain is like a sea of sine waves. If you're wondering what networks are firing if your patient is sitting still and you haven't presented a stimulus, take a quick look at your patient. Is he still alive? Breathing? Hearing? Seeing? Daydreaming? All of those processes (and more) are constantly at work in the central nervous system. The brain is never silent. You already know this because you know there is electrical noise in the recording that you had to account for (e.g., using signal averaging). That noise is random groups of neurons firing.
Now consider what would happen if one of those processes (sine waves of neural firing) in the brain lined up in a way that interferes with the brain process we are trying to measure. This could happen by chance, because of anatomy, or for other reasons. What happens when two sine waves interact? They add together, producing constructive and destructive interference. So if the peaks and troughs of the neural firing we want to measure mix with other sine waves, the actual peaks could be shifted by the time they reach the scalp. To reiterate, the peaks and troughs don't inherently tell us much, but they do indicate that neural activity occurred around that time. We label the apex simply because it's easy to find and consistent. So all the talk about latency and how precise electrophysiology is in the time domain is true, but we're not 100% sure when in the wave the specific neural activity we are measuring happened. This is kind of a weird concept, but it's based on principles you know. Take a look at Figure 1 so you can visualize the situation.
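If you'd like to try this yourself, here is a minimal sketch of how interference can move an apex. Everything in it is an illustrative assumption rather than real data: the "response" is a Gaussian bump centered at 6 ms, the "background" is a single 200 Hz sine wave at half the response's amplitude, and the function name apex_latency_ms is hypothetical.

```python
import math

def apex_latency_ms(background_amp=0.0, background_phase=0.0,
                    true_latency_ms=6.0, background_hz=200.0):
    """Latency of the largest value of response plus background, 4 to 8 ms."""
    best_t, best_v = None, float("-inf")
    for i in range(401):
        t = 4.0 + i * 0.01  # time in ms, 0.01 ms steps
        # Response of interest: a bump centered on the true latency
        response = math.exp(-0.5 * ((t - true_latency_ms) / 0.5) ** 2)
        # Unrelated ongoing oscillation that mixes with the response
        background = background_amp * math.sin(
            2.0 * math.pi * background_hz * (t - true_latency_ms) / 1000.0
            + background_phase)
        v = response + background
        if v > best_v:
            best_t, best_v = t, v
    return best_t

print("Apex, no background:      ", round(apex_latency_ms(), 2), "ms")
print("Apex, background phase 0: ", round(apex_latency_ms(0.5, 0.0), 2), "ms")
print("Apex, background phase pi:", round(apex_latency_ms(0.5, math.pi), 2), "ms")
```

The underlying response never moves from 6 ms, yet the apex you would label lands later or earlier depending on the phase of the background oscillation. Signal averaging reduces this effect when the background is random with respect to the stimulus, but the sketch shows why the apex itself should not be read as the exact moment of the neural event.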