Participant Profile
Masahiro Sato
Permanent Conductor of the Wagner Society Male Choir. Graduated from the Department of Vocal Music, Faculty of Music, Tokyo University of the Arts. Completed a Master's degree in Piano Accompanying at the Juilliard School. Active both domestically and internationally as an opera conductor. Lecturer at Aichi Prefectural University of the Arts.
Tsuyoshi Moriyama
Associate Professor, Faculty of Engineering, Tokyo Institute of Polytechnics. Graduate of the Faculty of Science and Technology and of the Graduate School of Science and Technology. Completed the Major in Electrical Engineering at the Keio University Graduate School of Science and Technology in 1999. Ph.D. (Engineering). While a student, he was a member of the Keio University Wagner Society Male Choir. He practices "voice research" as one of his specialties. He supervised the "Mote-goe Diagnosis Tool VQ Checker," which became a hot topic.
Rie Uozumi
Former Nippon TV announcer, freelance announcer, and speech and voice designer. Graduate of the Faculty of Letters. Graduated from the Keio University Faculty of Letters, Major in French Literature, in 1995. Drawing on her many years of announcing technique, she practices the "Uozumi Method of Speech."
One's Way of Life Creates the Tone
Mr. Moriyama, have you been specializing in "voice research" for a long time?
Yes. I have been researching the emotions in human voices since my time at Keio's Faculty of Science and Technology. However, while research into speech recognition has a long tradition, research into emotions was not yet recognized as a field of study when I started in the early 90s.
I listened to the singing of tenor Fritz Wunderlich and wondered, "Why is his voice so beautiful?" His emotional expression was also rich. But at the time, research into beautiful voices was dismissed as something for artists to do.
How do differences in voices manifest?
Humans create the source of sound with the vocal cords located in the throat. If you take just this sound source, it is a flat sound with no tone, only pitch. But when that sound source resonates in the upper part of the skull, the noise components cancel each other out, and only the beautiful overtones emerge like a refined layer. That is where the tone is created.
Therefore, as everyone works hard every day and accumulates wrinkles, the condition from the throat up changes, and all of that is reflected in the creation of this tone. In other words, one's way of life is entirely related to the tone of their voice.
Does that mean you can change the tone yourself by how you open your mouth? Do people who produce high voices tend to get wrinkles in certain parts of their faces?
Yes. High voices are difficult to produce without using the nasal cavity well. There are four large cavities in the head: the maxillary sinus, frontal sinus, ethmoid sinus, and sphenoid sinus. When the sound created by the vocal cords resonates there, it creates something like a pathway for the breath.
An important thing in vocalization methods is that the air pathway must be well-opened. Specifically, with the "i" mouth shape, the tongue moves forward, but with "a," while the front opening of the mouth is large, the tongue pulls back, making the pathway surprisingly narrow.
Therefore, the "i" shape makes nasal resonance easier. It's like the feeling of a smile. When you make the "i" mouth shape, the tone suddenly becomes bright and easier to hear. This is actually because the way it resonates has changed.
So it means making it resonate in the upper part of the head?
Exactly. In the past, when elementary school music teachers said, "Produce your voice from the top of your head," everyone would say, "That's impossible," but there actually is a resonance point at the top of the head, so they weren't wrong.
For example, do men who sing high-pitched parts tend to have the corners of their mouths turned up?
I am a bass, but when I have to produce notes that are a bit higher than my natural voice, as a tenor would, I do lift the corners of my mouth or raise the point where the nasal cavity resonates.
However, while nasal resonance is important in opera and choral singing, I believe the most important thing is the breath.
Starting first with taking air into the lungs.
Yes. The breath goes in, and then how you push it out. What makes us a bit different from Ms. Uozumi is that we have to produce a voice that can be heard in every corner of a 2,000-seat hall without using a microphone for amplification, so a certain amount of volume is necessary.
In that case, it's not just about making the vocal cords ring strongly, but about how much resonance you create to make the voice carry, so breath pressure is vital. Without strong physical support and pressure, you can't produce a voice that resonates throughout the space.
Is the way you produce a loud voice different when singing versus when speaking normally? Quite a few people worry that they can't keep speaking loudly when they talk.
Breath is important even when speaking. If you simply tell yourself, "Inhale first, then exhale a lot," a loud voice will come out naturally.
That is the foundation. Vocalization for things like opera starts with the breath, but when people go to singing lessons, the conversation often turns to how to open the mouth, and they forget about the breath. But how you control your breath is the most important part.
The word "aria" in opera means "air" in Italian. I believe Italian opera is the art of breath control. Resonance, vocalization, and beauty of voice are all necessary, but in the end, I think it's about how you connect the breath.
Connecting Breath to Voice
It's difficult, isn't it? You mean singing as long as possible in one breath after inhaling once.
To sing a long phrase, you need a long breath.
Was it Messa di voce? There is a way of singing that rides on the natural flow of the breath. Listening to that is very pleasant. You feel a natural circulation.
It is said that people such as business executives gain persuasiveness and charisma when they speak in one breath without pausing. Is there a trick to inhaling and then speaking all at once? I often teach people to "use their abdominal muscles."
How to connect breath to voice is actually very difficult. But fortunately, in Japanese, important sounds are at the beginning of words, so if you say the first and second characters clearly, it communicates quite well.
As long as the breath carries the start of the word, the rest follows by inertia, so if you just say the first character clearly, it somehow connects with the breath.
Is that specifically regarding Japanese?
In Italian, with words like "Buongiorno" or "Lontano," it might be easier to put the breath on the long and short accents. In German, with "Ich liebe dich," it might be good to ride the breath skillfully on the accents.
Actually, the other day while practicing, I was constantly complaining that I couldn't understand the first word. In Japanese, it's true that if you understand the first word, you can predict what comes next.
Enlisting the help of the listener as well.
That's part of it too. So, if the first sound isn't clear, the listener thinks "Huh?" and has to think about it. But if the first sound is perfectly clear, it enters the ear naturally. I'm always saying that in practice, but it's hard to do.
What is a "Good Voice"?
I think pronunciation is very important, but on the other hand, everyone says they "want to have a good voice." But what exactly is a good voice? I think there are many types of attractive voices.
Perhaps it's a voice that matches the person's character and reveals their humanity.
For example, if a very rugged-looking person had a very high, cute voice, it wouldn't match, so it probably wouldn't be called a "good voice."
I see, so when it matches the appearance and character, people think, "Oh, what a lovely voice."
In 2011, I supervised a web app called the "Mote-goe (Attractive Voice) Diagnostic Tool VQ Checker," and it easily exceeded 12 million hits. Since then, I've often been interviewed by radio and TV asking, "What is a good voice?"
First, it's about the time, place, and occasion (TPO). For example, a good voice for a clerk at Ameyoko is a spirited, husky voice shouting "Welcome, welcome!" On the other hand, people in small spaces like elevator operators speak softly, almost whispering. In this way, the yardstick for a good voice changes completely depending on the situation.
The situation matters too. When discussing a private matter, some people talk so loudly that even the person behind them can hear (laughs).
It's also about whether you can control your way of speaking like that.
Also, the preferences of the listeners are infinitely varied.
Preferences are difficult.
So, there isn't a single answer for what a "good voice" is.
Even among singers, there are those with voices that make your heart tremble.
Like Professor Sato's low notes.
Yes. For example, if someone with a low voice that perfectly matches their character spoke to me on the phone, I'd think, "Oh, I love this" (laughs).
So there are two types: those that depend on the TPO, and those that are unconditionally "What a great voice!"
Everyone is wondering how they can produce that "unconditionally good voice."
Correct Voice and Way of Speaking
Everyone has high ideals. A good voice is, after all, something influenced by talent.
But a "correct voice" is something anyone can do. Since the structure of the human body doesn't vary that much between individuals, if you are conscious of points like the importance of the breath pathway and connecting the breath and voice by saying the first character clearly, everyone can produce a "correct voice."
Furthermore, in situations where you are trying to persuade someone, the climax—as in music—is important. A way of speaking without a climax won't be listened to.
You mean bringing a peak to the way you speak?
Yes. Good music has a flow. It is often said that a beautiful melody repeats three times. Once, twice, and then it changes on the third time. Like "Sakura, sakura, yayoi no sora wa." This apparently feels rhythmic.
In speeches, too, people in high positions often mention "three points" when giving greetings.
I also think that as long as the vocal cords are healthy, the voice itself doesn't differ that much. It's about resonance and how much pressure you use to push out the breath. Persuasive politicians whose appeals stay in the heart seem to have that strong "pressure."
Additionally, from a vocalist's perspective, nasal resonance varies from person to person just like vocal cords, but a larger resonator provides better resonance for carrying in a large hall and results in a mellower sound.
So it's better to have a larger body.
Yes. The larger the resonance part, the better. Japanese women tend to be petite, so from a global perspective, the roles they can sing in operas are very limited. For example, Japanese people are often cast in roles like the soubrette, a cheerful young girl, or roles that only sing leggiero (light and graceful).
There is a wonderful young baritone, but I felt the tone and color were different from the role I had in mind. He can perform the role properly and has the character to sing it through, but when I talked to him, he said, "Actually, I used to be a tenor." He was told by a doctor that his resonator is smaller than a typical baritone's.
So, even when he produces the low notes of a baritone, he can only use the resonator originally meant for the high notes of a tenor, so I feel the richness of the low notes is a bit lacking.
Vocal cords are sometimes compared to string instruments, but there are individual differences in the body part, or rather, strengths and weaknesses.
The Secret of a "Carrying Voice"
There's a sense that for opera singers, a loud voice is everything, but music has pianissimo, and in fact, even if they aren't always singing loudly on stage, they can be heard clearly even in the very last row of the audience. That's not about volume.
We call it a "carrying voice." What is considered not good is a "near-ringing" voice, which is so loud nearby that you want to cover your ears, but when heard from a distance in a large hall, it's like, "Oh, it's not that much."
Conversely, there are voices that don't sound that loud nearby, but even in a large hall with an orchestra, every single word can be heard clearly. I always wonder what the difference in those voices is.
The secret lies in the characteristics of the human ear. Sound is a mixture of various frequencies that form a single sound, but frequency components around 3,000 Hz are amplified while passing through the ear canal before reaching the eardrum, making them resonate most sensitively in the human ear. Vocalists produce their voices so that they resonate well in the paranasal sinuses, which results in a voice with a very high concentration of frequencies around 3,000 Hz.
A Swedish researcher conducted an experiment on singers performing with an orchestra and found a large peak around 3,000 Hz. This peak, called the 'singer's formant,' appears only when they are singing.
I also conducted experiments with Kabuki actors, and when they switched to Kabuki vocalization, a peak appeared in that same range.
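(Editorial illustration.) The idea of a spectral peak around 3,000 Hz can be made concrete with a rough numerical sketch. The code below is not the experiment described above and uses no recorded voices. It synthesizes a harmonic tone, optionally boosts the harmonics near 3,000 Hz, and measures what share of the spectral energy falls between 2,500 and 3,500 Hz. The sample rate, the 200 Hz fundamental, the 1/k roll-off, the width of the boost, and all function names are assumptions chosen only for illustration.

```python
import math

SAMPLE_RATE = 16000  # Hz (assumed)
N_SAMPLES = 400      # 25 ms; gives 40 Hz bins, so 200 Hz harmonics land on exact bins
F0 = 200.0           # fundamental frequency in Hz (assumed)


def synth_voice(boost_db):
    """Sum of harmonics with a 1/k roll-off, plus a Gaussian boost centred on 3000 Hz."""
    harmonics = []
    k = 1
    while k * F0 < SAMPLE_RATE / 2:
        freq = k * F0
        gain_db = boost_db * math.exp(-((freq - 3000.0) / 400.0) ** 2)
        harmonics.append((freq, (1.0 / k) * 10 ** (gain_db / 20)))
        k += 1
    return [
        sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in harmonics)
        for n in range(N_SAMPLES)
    ]


def power_spectrum(signal):
    """Naive DFT power for bins 0..N/2 (slow, but needs no third-party library)."""
    n_total = len(signal)
    spectrum = []
    for b in range(n_total // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * b * n / n_total) for n, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * b * n / n_total) for n, s in enumerate(signal))
        spectrum.append(re * re + im * im)
    return spectrum


def band_share(signal, lo_hz=2500.0, hi_hz=3500.0):
    """Fraction of total spectral energy that lies between lo_hz and hi_hz."""
    spectrum = power_spectrum(signal)
    bin_hz = SAMPLE_RATE / len(signal)
    in_band = sum(p for b, p in enumerate(spectrum) if lo_hz <= b * bin_hz <= hi_hz)
    return in_band / sum(spectrum)


plain = band_share(synth_voice(boost_db=0.0))     # no boost near 3 kHz
boosted = band_share(synth_voice(boost_db=20.0))  # 20 dB boost near 3 kHz
print(f"share of energy near 3 kHz: plain={plain:.3f}, boosted={boosted:.3f}")
```

With these assumed settings, the boosted tone should show a clearly larger share of its energy near 3,000 Hz than the plain tone, which is the kind of difference the singer's formant refers to. A real measurement would analyze recorded singing rather than a synthetic tone.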
I see, that makes sense. For example, when I go to a concert, at first it feels like the sound is coming from very far away, but after about ten minutes, it reaches a volume that transmits properly to my body, and I no longer feel the sound is too quiet. I've had that experience many times; is that related as well?
That is known as the 'cocktail party effect.' Humans have the mysterious ability to hear sounds clearly from wherever they direct their attention.
Does that mean the ears gradually 'open up'?
Exactly. At first, the sound of the person next to you coughing and the sound from the stage seem to be at the same level. Gradually, your hearing becomes specialized toward the sound on the stage.
On a negative note, in a coffee shop or similar place, once you start noticing the sound of the person at the next table tapping their computer keys, it starts to sound incredibly loud. Is that also the cocktail party effect?
Yes, it is. A mistake musicians make is moving their hands frequently or making various irrelevant movements that distract the audience; this lowers the cocktail party effect. It's called selective hearing, and they stop being 'selected.'
Therefore, if you want to be heard, the staging used to make the other person concentrate is very important. When a salesperson wants their opinion to be heard, they stand side-by-side with the client. Once they have the client's attention, speaking in a quieter voice actually makes the person listen more closely.
Mimicking Behavior
Many people also struggle with the inflection of their voice. If there is no inflection in the way you speak, the message doesn't get across. I have them practice by reading books. I've been doing recitations since high school, and I have them add variation—raising or lowering the pitch, taking pauses, reading quietly, loudly, or quickly. I think it's very similar to music.
It's exactly the same.
After having them read intently like that, when they move on to free talk, their emotions start to come through naturally, and they become able to speak while maintaining control. I do a lot of narration work, where I stay in a booth for about three hours reading text written by others with full inflection. After that, I can talk endlessly. I wonder if it creates some kind of circuit in the brain.
In the world of psychology, it was believed until the 19th century that the inner self and behavior were the same, and that the inner self came first, leading to behavior.
However, individuals named James and Lange proposed the James-Lange theory, stating that 'humans do not cry because they are sad, but are sad because they cry.' One observes their own behavior of crying and recognizes, 'Ah, I am sad.' In other words, they argued that behavior comes first.
Today, we know that both processes occur.
So it's okay for behavior to come first.
Yes, it's the idea of 'starting with the form.' If you want to acquire a voice full of confidence, but you feel you must first be full of confidence internally, it might take a lifetime. But if you can just mimic someone who is full of confidence, it's much easier, and eventually, the inner self may follow.
There is something called 'behavioral therapy,' where patients with depression are treated by mimicking energetic people, and this is very popular in places like the United States.
Regarding 'mimicry,' people who become good at singing are good at mimicking. They likely naturally sense how the body is being used or how the breath is being managed. Being able to do that is one of the secrets to becoming a better singer.
It's exactly 'manebu' (to mimic). The etymology of 'manabu' (to learn) comes from 'manebu,' meaning to mimic.
Even for announcers, there is a process of mimicking while listening to the live broadcasts and reports of their seniors. First, you learn the form, and then confidence emerges.
Most people enter the field because they have an announcer they admire, so they start with mimicry and then improve. And they are good at singing and love karaoke. I suppose they naturally like holding a microphone (laughs).
In our practice as well, I sing it myself to show them 'it's like this' and have them mimic me.
Therefore, I think a big reason why the Wagner Society has maintained a consistent level for so long is that the instructors, like Professor Tamotsu Kinoshita and Professor Ryosuke Hatanaka, were originally vocalists.
The reason I originally started the 'Attractive Voice Diagnosis' was precisely because of behavioral therapy. I wondered if information technology could help those who had lost confidence in their voice due to aging, or those who lacked confidence in their voice to begin with.
In the Attractive Voice Diagnosis, if you speak into a computer, it evaluates your voice based on five points and gives an 'attractiveness score' out of 100. When people are told their score, they work hard to make it higher the next time.
At the same time, advice appears such as 'Try to smile a bit more to increase clarity' or 'Improve your articulation.' Following that, they think, 'I'll try to smile more than before,' and when the score goes up, they feel happy.
Once they gain confidence, they want to let others hear their voice, so before they know it, their inner self has changed. In that sense, I believe the voice is one's alter ego. It's invisible, but being praised for it makes one supremely happy.
That's true. It's like being told 'You're handsome' or 'You're beautiful.'
If someone says, 'I want to hear more,' I feel like I could sleep soundly for several nights without a worry (laughs). It gives you that much energy.
Male Voices, Female Voices
When people get angry, their voices get higher and louder. Anger requires showing 'I am strong' against an approaching threat, so one must assert muscular or physical strength. It's said that the male hormone testosterone is also involved.
I think it's the same for humans and other animals: high sounds are symbols of being small, cute, and weak, while low voices represent things that are large or strong.
A masculine feeling.
So, to express anger and drive someone away, you need the resonance of the body, and the vocal cords must lengthen to produce low sounds.
On the other hand, children have cute, high voices that contain a lot of frequencies around 3,000 Hz so that adults can hear them well. They are intentionally designed to be piercing. They have to be heard by adults.
It certainly grabs your attention.
In the case of men, when their voice changes as they become adults, a kind of gruffness or thickness emerges instead.
Women's voices also get lower as they get older, don't they? Like Hikaru Utada—her key was incredibly high in her debut song 'Automatic,' but it has gradually become lower.
Mariah Carey is the same.
What is the reason for women's voices getting lower?
Vocalists also find it harder to produce high notes. Muscle decline is a major factor.
Also, the surface of the vocal cords has a mucous membrane that allows them to come together very delicately when moisturized, but if you speak using your throat in a way that sounds like 'Yadaaa' (No way!), it damages the vocal cords. How is this way of speaking perceived among women? (laughs)
If women use high voices with each other and it comes across as condescending, they start trying to one-up each other ('mounting'). So, women must be unconsciously using lower voices to keep things casual. They might be using their throats that way to avoid being disliked (laughs).
That's not very good for the throat.
Does that also contribute to aging?
Also, it's about the healing after damage. When you're young, it heals quickly, but like wounds on the skin, vocal cords become harder to heal as you age.
Starting with "Exhaling"
When I teach vocal music, as one form of expression, I tell students to use the vocal cords like a calligraphy brush, letting it down naturally from the tip. That is the proper way to bring the vocal cords together.
I believe that kind of breath and vocal cord usage—entering smoothly—is good. However, as you get older, you can no longer do that. Even for vocalists, the vocal cords eventually wear out after long use.
What should one do to avoid tiring the vocal cords or to use them for a long time?
I mentioned earlier that breath is important, and this is truly consistent. Throat muscles decline, but respiratory muscles also decline. The ability to inhale and exhale gradually weakens, and one becomes unable to do abdominal breathing, relying only on the chest up. Then, because you have to make yourself heard with little breath, you're forced to strain and produce a 'thin, forced' voice.
If you inhale without using your body, it leads to a poor way of speaking.
So, if you continue to use your stomach to produce your voice, the vocal cords are more likely to be protected.
I believe so. That's why it's better not to speak too fast. You end up only being able to take shallow breaths.
It certainly seems like fast talkers have rougher voices.
Since the era is all about 'good, cheap, and fast,' there's a pressure to speak rapidly in quick succession, but you should hold back and speak after taking a slow breath. Wouldn't that increase your persuasiveness?
It makes you feel like a living human being with a body, rather than a speaking machine.
That's true. But when I say 'please inhale,' everyone moves their chest area. Instead, it should be the image of expanding the stomach.
When you tell beginners to 'inhale,' their stomachs get stiff. That's why I always say, 'Let's start by exhaling.' First, exhale all your breath with a 'haaa' sound.
It's the same in singing. In any case, exhale completely. Then, the breath will enter naturally.
That natural entry is very good. When you say 'let's inhale,' people try to inhale even though there's already air, so it has no choice but to go into the chest, and they have to use force to expand it.
That's wonderful. You know everything a vocalist needs to do.
Announcing and Narration
Announcers first learn how to use their breath by extending a long 'ah' sound for about 30 seconds. This is to give the voice persistence when speaking long sentences in one breath, but we also do training like producing short, sharp 'ah' 'ah' sounds.
Then there's articulation. We practice the 'Uiro Uri' (The Medicine Seller) monologue as a tongue twister. Reading a script—turning what you see with your eyes into a voice—is also a special skill, so we practice reading things at first sight without making mistakes.
Just like music, we train in dynamics, tempo, and pausing. However, there aren't many truly skilled announcers.
Is that so? (laughs)
There are very few people who can transition from being an announcer to a narrator. I have been doing recitations since high school and participated in competitions, so I trained extensively.
I had been playing the piano since I was a child, so the flow of sound and melody was already in my body. I translated that into "reading," but there aren't many people who go from being an announcer to a narrator.
That's interesting. They also say that voice actors can't really become announcers either, right?
That's true. After all, narration is difficult if the body isn't involved. When you put the movement of your body into the sound, the listeners can be deeply moved.
So narration is like singing.
Voice actors deal with dialogue. In narration, you also read the stage directions; in fact, there is more of that kind of situational description.
They use the same voice and look like similar jobs, but they are completely different.
The Exquisite Charm of Choral Singing
Is there something different about the way you produce your voice when harmonizing compared to singing solo?
When you produce a long note, a vibrato naturally attaches to the voice, but when creating harmony, you reduce that vibrato. However, if there is none at all, it's hard to harmonize well, so it's easier to create harmony at a point where there is some leeway and you can feel each other. That balance is very difficult.
Also, it depends on how much solfège ability one has. Solfège means being able to read a musical score correctly and hit the right pitches.
As is the case with the Wagner Society, I think it's a very high hurdle for people who are seeing a score for the first time to sing with the correct pitch. It's very important to be able to read the score correctly and train your body to know if your own sound is at the right pitch; if you can do that, you can maintain the note.
Solfège ability also relates to "imitation." Even among people who have never played the piano, there are those with excellent pitch. I suppose they have a good ear.
Also, what's interesting about choral singing is that while many people sing the same part, there is an exquisite charm where the sound source produced by one person's vocal cords resonates in the body of the person next to them.
Therefore, if you use your bodies—the resonators—in the same way, the singing resonates with each other and rings out powerfully. It's not just one person's body that is sounding.
That's why, in a choir where the members use their bodies differently, they can't help each other even if they gather to sing.
With choirs that aren't very good, it feels like the pitch is right, but the quality of the sound doesn't match, doesn't it?
Conversely, you start to hear each individual voice. In a good choir, you lose track of who is singing, and it blends in a way that a single personality seems to ring out from one part.
Everything is resonating together as one.
That's why choral singing is mysterious; even if individual abilities aren't that high, if the bodies resonate with each other, a fairly good sound comes out. I think that's the accessibility—or rather, the charm—of choral singing.
That's true. So, even if ten professional singers gather to sing, it doesn't necessarily mean it will result in a charming resonance. They would have the volume and hit the pitches correctly, so it would be a proper sound, but it might not necessarily have "flavor." Sometimes it's better when people with various voices and diverse backgrounds gather and sing using a method where everyone is looking in the same direction.
This is the same for orchestras as well.
Perhaps it's at the level of "intent." A long time ago, a conductor named Yoichiro Fukunaga reportedly only said "gently" during a certain rehearsal. "Gently," then he'd stop the sound and say, "more gently" (laughs). When he did that, they aligned perfectly, and the gentle sounds became one. I think that at the moment the image of "gently" was shared, everyone's resonators aligned and rang out together.
I feel like I've gained so much from our conversation today (laughs). Thank you very much.
*Affiliations and titles are those at the time of publication.