Chapter 2 Part 2: Speech Planning & Errors II

Posted on May 4, 2025

Chapter 2: Speech Planning & Errors (Part 2)

Overview

This chapter examines the complex process of speech perception, exploring how listeners decode acoustic signals into meaningful language. We analyze the challenges posed by coarticulation, compare competing theories of speech perception (Motor Theory vs. General Auditory Approach), and investigate key phenomena like categorical perception and the McGurk effect that reveal how our brain processes speech.


Learning Goals

After studying this chapter, you should be able to:

  • Explain the acoustic properties of speech (frequency, amplitude, formants) and how they’re visualized in spectrograms
  • Describe how coarticulation creates challenges for speech perception and the evidence supporting its role
  • Analyze the core claims and evidence for Motor Theory of speech perception
  • Evaluate the major critiques and limitations of Motor Theory
  • Compare Motor Theory with the General Auditory Approach and their respective strengths/weaknesses
  • Explain key speech perception phenomena including categorical perception, McGurk effect, and duplex perception
  • Understand how top-down and bottom-up processing interact in speech perception (FLMP model)
  • Apply knowledge of speech perception principles to real-world listening challenges

📖 Required Reading

  • Traxler (1st ed.), Chapter 2, pp. 37–51 — Speech Production & Comprehension (continue production focus from Week 2).

Chapter 2 Lecture Notes: Speech Planning & Errors (Part 2)

Key Focus: Speech perception challenges, acoustic properties, and competing theories

I. Introduction: The Challenge of Speech Perception

  • Core Task: Listeners decode complex sound waves (from speakers’ articulators) to recover intended meaning.
  • Myth vs. Reality:
    • ❌ Myth: “Hear words → understand” (simple).
    • ✅ Reality: Sound waves are “wickedly complex”; mental work is required to link sound to meaning.
  • Chapter Goal: Explain why speech perception is tricky and review major theories of how listeners succeed.

II. Foundational: Acoustic Properties of Speech

1. Speech as an Acoustic Signal

Speech is vibrations in air, analyzed by two properties:

  • Frequency: Cycles per second (Hz) → pitch (high Hz = high pitch, e.g., whistle; low Hz = low pitch, e.g., foghorn).
  • Amplitude: Pressure difference in sound waves (dB) → loudness (high dB = loud; low dB = quiet).
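The decibel scale for amplitude is logarithmic, not linear. A minimal sketch of the standard sound-pressure-level conversion (the formula and 20 µPa reference are standard acoustics; the example pressures are hypothetical, chosen only to illustrate the scale):

```python
import math

# Reference pressure p0 = 20 micropascals, the conventional
# threshold of human hearing for dB SPL.
P0 = 20e-6  # pascals

def spl_db(pressure_pa):
    """Convert a sound pressure (in pascals) to dB SPL."""
    return 20 * math.log10(pressure_pa / P0)

# Hypothetical pressures: each 10x increase in pressure adds 20 dB.
quiet = spl_db(2e-4)  # -> 20 dB (roughly whisper-level)
loud = spl_db(2e-1)   # -> 80 dB (roughly shout-level)
print(round(quiet), round(loud))
```

The key point for perception: equal steps in loudness correspond to multiplicative, not additive, changes in physical pressure.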

2. Sound Spectrograms: Visualizing Speech

A graph that shows speech’s acoustic structure:

  • Y-axis: Frequency (Hz) → high = top, low = bottom.
  • X-axis: Time (seconds) → speech progression.
  • Dark patches: High sound energy; light patches: low energy.

Example: Phrase to catch pink salmon

  • Real speech: Irregular dark/light patterns.
  • Simplified artificial speech (Liberman et al., 1952): Stripped-down patterns.
  • Key Finding: Simplified spectrograms still let listeners recognize the phrase—core phonological content lies in basic frequency-time patterns.
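A spectrogram is built by slicing the signal into short overlapping frames and measuring the energy at each frequency in each frame. A toy stdlib-only sketch (frame sizes and the 440 Hz test tone are illustrative choices, not from the text):

```python
import cmath
import math

def spectrogram(signal, frame_len=128, hop=64):
    """Toy spectrogram: DFT magnitudes over successive frames.
    Rows = time frames (x-axis); columns = frequency bins (y-axis)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2):  # non-negative frequencies only
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A steady 440 Hz tone sampled at 8000 Hz: energy concentrates in one
# frequency bin, the "dark patch" of a spectrogram.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # bin width is sr/frame_len = 62.5 Hz, so 440 Hz -> bin 7
```

A steady tone produces a horizontal band (like a formant); a tone whose frequency changes over time would shift bins across frames (like a formant transition).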

3. Formants and Formant Transitions

Two critical acoustic features (Liberman et al.):

  • Formants: Steady bands of acoustic energy at stable frequencies (e.g., the /a/ in catch—characteristic of vowels).
  • Formant Transitions: Short, rapid changes in frequency (e.g., the /t/ in to—characteristic of consonants).
  • Fricatives (e.g., /s/ in salmon): Aperiodic noise (like radio static) spread across many frequencies.

III. Key Challenge: Coarticulation Effects

1. What is Coarticulation?

From production: Overlap of gestures for one phoneme with others (efficiency). For perception, this creates two hurdles:

  1. No Clear Phoneme Boundaries: Speech has no “spaces” between phonemes—acoustic info overlaps.
  2. Variable Acoustic Signals: Same phoneme (e.g., /d/) sounds different with adjacent vowels.

2. Example: /d/ in /di/ vs. /du/

  • /di/ (“de”): /d/’s formant transitions rise in frequency.
  • /du/ (“doo”): /d/’s formant transitions fall in frequency.
  • Perceptual Constancy: Listeners hear both as /d/ (context of the vowel helps). Isolated transitions sound like different “chirps.”

3. Evidence for Coarticulation’s Role

A. Silent Center Vowels

  • Edit speech to remove a vowel’s middle (e.g., erase /æ/ center in bag).
  • Result: Listeners still identify the vowel (e.g., hear bag, not big).
  • Why? Vowel info is embedded in preceding (/b/) and following (/g/) consonants (coarticulation fills gaps).

B. Cross-Spliced Stimuli

  • Split words into onsets (e.g., /jo/ in jog) and codas (e.g., /b/ in job), then splice onset of one word onto coda of another.
  • Result: Listeners misperceive coda to match onset. Example: /jo/ (from jog) + /b/ (from job) → hear jog.
  • Key: Early onset cues guide perception more than later coda cues.

IV. Major Theory 1: Motor Theory of Speech Perception

1. Core Claim (Liberman et al.)

Gestures (not sounds) are the unit of perception:

  1. Listeners use acoustic signals to infer the articulatory gestures that made them (e.g., “tongue tip tapped alveolar ridge”).
  2. Link gestures to the speaker’s phonological plan (words/syllables).

2. Why Gestures?

  • Acoustic signals for a phoneme vary (e.g., /d/ in /di/ vs. /du/), but gestures are consistent (e.g., “tap tongue tip”).
  • Perceived similarities align with articulatory (not acoustic) similarities.

3. Phenomena Explained by Motor Theory

A. Duplex Perception

  • Setup: Split /da/ or /ga/ into transition (formant change) and base (vowel). Play transition to one ear, base to the other.
  • Result: Listeners hear two things:
    1. A “chirp” (general auditory processing of transition).
    2. The full syllable (/da/ or /ga/) (speech module integrates transition + base).
  • Interpretation: A dedicated speech module integrates the transition with the base even while general hearing registers the “chirp”—taken as evidence for a “special” speech processor.

B. Categorical Perception

  • Definition: Continuous acoustic stimuli are perceived as fixed categories.
  • Example: /b/ vs. /p/ (Voicing Contrast):
    • /b/ (voiced): Vocal folds vibrate immediately after lip closure.
    • /p/ (unvoiced): Vocal folds vibrate after a delay (Voice Onset Time, VOT).
    • Listeners perceive:
      • VOT < 20 ms → /b/; VOT > 20 ms → /p/.
    • Key: Can’t distinguish VOT within a category (5 ms vs. 15 ms = both /b/) but can across categories (15 ms vs. 25 ms = /b/ vs. /p/).
  • Interpretation: Listeners map acoustic signals to discrete gestures (not continuous sounds).
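The within- vs. across-category pattern above can be sketched as a toy perceiver with a hard 20 ms VOT boundary (the boundary value and labels follow the text; the classifier itself is an illustrative simplification of real identification curves, which are steep but not perfectly sharp):

```python
BOUNDARY_MS = 20  # approximate /b/-/p/ VOT boundary from the text

def perceive(vot_ms):
    """Toy categorical perceiver: map a continuous VOT to a phoneme label."""
    return "/b/" if vot_ms < BOUNDARY_MS else "/p/"

def discriminable(vot_a, vot_b):
    """Listeners distinguish two VOTs only when the category labels differ."""
    return perceive(vot_a) != perceive(vot_b)

# Same 10 ms physical difference, very different perceptual outcomes:
print(discriminable(5, 15))   # within-category  -> False
print(discriminable(15, 25))  # across-category  -> True
```

This is the signature of categorical perception: discrimination tracks category membership, not raw acoustic distance.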

C. The McGurk Effect

  • Definition: Visual (lip movements) + auditory (sound) cues clash → perception compromises.
  • Classic Experiment:
    • Visual: Video of someone saying /ga/.
    • Auditory: Audio of someone saying /ba/.
    • Perception: Listeners hear /da/.
  • Properties: Robust (persists even when listeners know about the illusion); extends to touch (feeling lips form /ba/ while hearing /ga/ → /da/).
  • Interpretation: Speech system uses all cues (visual/touch/auditory) to infer gestures.

4. Neural Basis: Mirror Neurons

  • Mirror Neurons (Monkeys): Fire when:
    1. Monkey performs a gesture (e.g., grasp food).
    2. Monkey watches others do the gesture.
    3. Monkey hears gesture-related sounds.
  • Human Link: Monkey area F5 is thought to be the homolog of human Broca’s area (a core language region).
    • Hypothesis: Human mirror neurons fire when producing or hearing speech—bridges production and perception.
  • Evidence:
    • fMRI: Listening to /ba/ activates lip motor regions; /ta/ activates tongue-tip regions.
    • TMS: Disrupting motor cortex impairs phoneme discrimination (/ba/ vs. /da/).

V. Critiques of Motor Theory

1. Anomalous TMS Results

  • TMS shows leg muscle activity when listening to speech—unrelated to articulation. Undermines “gesture-specific” claim.

2. Nonhuman Speech Perception

  • Japanese quail/chinchillas (no human articulators) show:
    • Categorical perception (/b/ vs. /p/).
    • Compensation for coarticulation.
  • Problem: Motor theory requires human gestures—these animals have none.

3. Perception-Production Dissociations

  • Aphasic Patients: Some understand speech but can’t produce it (Broca’s aphasia); others produce fluent speech but can’t understand (Wernicke’s aphasia).
  • Extreme Case: Patients with bilateral motor cortex damage (no articulation) still understand speech.
  • Problem: Motor theory predicts that damage to the motor system should impair perception—but it does not.

4. Variable Gestures for Same Phoneme

  • Same phoneme (/t/) can be produced with different tongue positions.
  • Example: Bite-Block Vowels: Speakers with objects between teeth (altered gestures) still produce perceivable vowels.
  • Problem: Motor theory struggles to map variable gestures to one phoneme.

VI. Alternative Theory: General Auditory (GA) Approach

1. Core Claim

Speech perception is not special—uses the same auditory mechanisms as non-speech sounds (bird calls, door slams). Listeners analyze acoustic properties directly (no gestures).

2. Key Explanations

A. Voicing Contrast (Revisited)

  • 20 ms VOT boundary (/b/ vs. /p/) aligns with a general auditory limit: Humans can’t perceive two sounds as simultaneous if they start >20 ms apart.
  • GA Interpretation: Categorical perception = general auditory constraint, not speech-specific.

B. Nonhuman Perception

  • Quail/chinchillas’ speech skills are explained by shared general auditory mechanisms—GA accounts for cross-species similarities.

3. Fuzzy Logical Model of Speech Perception (FLMP)

A GA framework with two processes:

  1. Bottom-Up: Analyze acoustic properties to activate matching phonemes (e.g., rising transition → /d/).
  2. Top-Down: Use memory (words, syntax) to pick the best candidate.
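FLMP combines these two processes by giving each candidate phoneme a support value in [0, 1] from every information source, multiplying the supports, and normalizing into response probabilities (the relative goodness rule). A minimal sketch, with hypothetical support values chosen to mimic a lean/leam-style case:

```python
def flmp_choice(supports_by_alternative):
    """FLMP integration: per-cue supports multiply, then are normalized
    into response probabilities across the candidate alternatives."""
    totals = {}
    for alt, cue_supports in supports_by_alternative.items():
        total = 1.0
        for s in cue_supports:
            total *= s
        totals[alt] = total
    norm = sum(totals.values())
    return {alt: t / norm for alt, t in totals.items()}

# Hypothetical supports for an ambiguous /n/-/m/ sound before "bacon":
# bottom-up acoustics are ambiguous (equal support), while top-down
# lexical knowledge favors /n/ ("lean" is a word; "leam" is not).
probs = flmp_choice({
    "/n/": [0.5, 0.9],  # [acoustic support, lexical support]
    "/m/": [0.5, 0.1],
})
print(max(probs, key=probs.get))  # /n/ wins the integration
```

Because the sources multiply, a cue that strongly disfavors an alternative (support near 0) can veto it even when other cues are neutral.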

A. Evidence: The Ganong Effect

  • Setup: Ambiguous sound (halfway between /n/ and /m/).
  • Result: Perceived as a real word:
    • lean bacon → /n/ (→ lean, real word) vs. /m/ (→ leam, not real).
    • pleam bacon → /m/ (no competing real word).
  • Interpretation: Top-down word knowledge biases bottom-up acoustic analysis.

B. Evidence: Phonemic Restoration

  • Setup: Delete a phoneme (e.g., the /s/ in legislatures) and replace it with noise (a cough).
  • Result: Listeners “restore” the missing phoneme (hear legislatures) and perceive the noise as a separate sound.
  • Key: Restoration depends on context:
    • The wagon lost its (cough)eel → /w/ (→ wheel).
    • The circus has a trained (cough)eel → /s/ (→ seal).
  • Interpretation: Bottom-up acoustic cues + top-down context fill gaps.

VII. Summary: Motor Theory vs. GA Approach

Aspect-by-aspect comparison:

  • Fundamental Unit: Motor Theory = articulatory gestures; GA Approach = acoustic properties (frequency, amplitude).
  • Processing Mechanism: Motor Theory = specialized speech module; GA Approach = general auditory mechanisms (shared with non-speech sounds).
  • Explains: Motor Theory = duplex perception, categorical perception, McGurk effect; GA Approach = cross-species perception, Ganong effect, phonemic restoration.
  • Weaknesses: Motor Theory = anomalous TMS results, nonhuman perception, perception-production dissociations; GA Approach = doesn’t explain all human speech perception abilities.

Quick Review Questions

  1. What are formants and formant transitions?
  2. How does coarticulation challenge speech perception?
  3. What is the core claim of motor theory?
  4. What is the McGurk effect, and how does it support motor theory?
  5. How does the GA approach explain categorical perception?

🧩 Self-Check Questions

Q1. Why do many exchanges preserve syllable position (onset→onset) and stress?

Q2. What evidence suggests distinct lemma vs. phonology stages?

Q3. How can planned pauses reduce errors in L2 speech?

Q4. Give one morpheme error and the implicated stage.

Q5. What pattern would support lexical bias in slips?


🧰 Key Terms

Prosodic frame, Onset/Coda, Stress preservation, Lexical bias, Semantic substitution, Phonological slip, Morpheme error, Anticipation/Perseveration, Restart, Substitution repair, Editing term, Coarticulation, Monitoring (inner/outer loops).