Sound source separating apparatus, sound source separating program, sound pickup apparatus, and sound pickup program

 

There is provided a sound source separating apparatus including a bidirectionality forming unit configured to form a bidirectionality by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of an isosceles right triangle, a unidirectionality forming unit configured to form a unidirectionality by use of a sound signal picked up by two microphones which are located in the same direction as the target direction, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.

 

 

CROSS REFERENCE TO RELATED APPLICATION(S)
This application is based upon and claims benefit of priority from Japanese Patent Application No. 2013-179886, filed on Aug. 30, 2013, the entire contents of which are incorporated herein by reference.
BACKGROUND
The present invention relates to a sound source separating apparatus, a sound source separating program, a sound pickup apparatus, and a sound pickup program, and can be applied to a sound source separating apparatus, a sound source separating program, a sound pickup apparatus, and a sound pickup program that separate and pick up a sound source only in a specific direction in an environment in which a plurality of sound sources are present, for example.
As a technique to separate and pick up a sound (hereinafter, voices and other sounds are collectively referred to as sounds) only in a specific direction in an environment in which a plurality of sound sources are present, there is a beamformer (hereinafter also referred to as a BF) employing a microphone array. The beamformer is a technique to form directionality by use of a temporal difference between signals which reach the respective microphones (see Futoshi Asano, "Acoustical Technology Series 16: Array signal processing for acoustics: localization, tracking and separation of sound sources", edited by the Acoustical Society of Japan, Corona Publishing Co., Ltd., Feb. 25, 2011). Beamformers are broadly classified into two kinds: an addition type and a subtraction type. In particular, the subtraction type BF has an advantage in that it can form directionality with a smaller number of microphones than the addition type BF.
FIG. 2 is a block diagram showing a configuration of the subtraction type BF in which the number of microphones is two. In the subtraction type BF, first, a sound present in a target direction (hereinafter referred to as a target sound) reaches each of microphones 1 and 2, and a delayer 91 calculates a temporal difference between signals that have reached the microphones 1 and 2. Then, by adding a delay to a signal from any one of the microphones, a phase of the target sound is adjusted.
The temporal difference is calculated using the following formula (1). Here, d represents the distance between the microphones, c represents the speed of sound, and τL represents the delay. Further, θL represents the angle between the target direction and the perpendicular direction with respect to a straight line connecting the microphones 1 and 2.
τL=(d sin θL)/c  (1)
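As an illustrative sketch (not part of the original disclosure), the delay of formula (1) can be computed as follows; the function name and the default sound speed of 340 m/s are assumptions.

```python
import math

def arrival_delay(d, theta_l, c=340.0):
    """Formula (1): tau_L = (d * sin(theta_L)) / c.
    d: microphone spacing in meters, theta_l: angle in radians,
    c: speed of sound in m/s (340 m/s assumed here)."""
    return d * math.sin(theta_l) / c

# Example: 3 cm spacing, source at theta_L = pi/2 (endfire direction)
tau = arrival_delay(0.03, math.pi / 2)
```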
Here, in a case where a dead angle direction is present in the direction of the microphone 1 with respect to the intermediate point between the microphones 1 and 2, a delay process is performed on an input signal x1(t) of the microphone 1. Then, a subtracter 92 performs a process in accordance with a formula (2).
α(t)=x2(t)−x1(t−τL)  (2)
The subtraction process can be performed similarly in the frequency domain, in which case the formula (2) is changed as follows.
A(ω)=X2(ω)−e^(−jωτL)X1(ω)  (3)
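As a hedged sketch (the function name is an assumption, not from the disclosure), the frequency-domain subtraction of formula (3), A(ω) = X2(ω) − e^(−jωτL)X1(ω), can be written as:

```python
import numpy as np

def subtract_bf(x1_spec, x2_spec, omega, tau_l):
    """Formula (3): A(w) = X2(w) - exp(-1j * w * tau_L) * X1(w).
    x1_spec, x2_spec: complex spectra X1(omega), X2(omega);
    omega: angular frequencies in rad/s; tau_l: delay in seconds."""
    return x2_spec - np.exp(-1j * omega * tau_l) * x1_spec
```

With tau_l = 0 this reduces to a plain subtraction X2(ω) − X1(ω), which corresponds to the eight-shaped bidirectionality of FIG. 3B.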
Here, in a case where θL=±π/2, the formed directionality becomes a cardioid unidirectionality as shown in FIG. 3A, and in a case where θL=0 or π, the formed directionality becomes an eight-shaped bidirectionality as shown in FIG. 3B. Here, a filter that forms the unidirectionality from the input signal is referred to as a unidirectional filter and a filter that forms the bidirectionality is referred to as a bidirectional filter.
Further, by use of a spectral subtraction (hereinafter also referred to as an SS), a strong directionality can be formed in the dead angle direction of the bidirectionality. The directionality is formed by use of the SS in accordance with the following formula (4).
|Y(ω)|=|X1(ω)|−β|A(ω)|  (4)
Although the input signal X1 of the microphone 1 is used in the formula (4), the same effects can be obtained by using the input signal X2 of the microphone 2. Here, β is a coefficient for adjusting the intensity of the SS. When the value becomes negative in the subtraction, a flooring process is performed to replace the value with 0 or with a value smaller than the original value. This technique makes it possible to emphasize the target sound by extracting a sound that is present in directions other than the target direction (hereinafter referred to as a non-target sound) through the bidirectional filter and by subtracting the amplitude spectrum of the extracted non-target sound from the amplitude spectrum of the input signal.
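The SS of formula (4) together with the flooring process described above can be sketched as follows (illustrative Python; using 0 as the floor value is one of the two options mentioned in the text):

```python
import numpy as np

def spectral_subtract(x_mag, a_mag, beta=1.0):
    """Formula (4): |Y(w)| = |X1(w)| - beta * |A(w)|.
    x_mag: amplitude spectrum of the input signal; a_mag: amplitude
    spectrum of the non-target sound. Negative values are floored to 0."""
    y_mag = x_mag - beta * a_mag
    return np.maximum(y_mag, 0.0)
```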
SUMMARY
In order to actually use a sound source separating apparatus for a telephone call, voice recognition, and the like, however, it is necessary to form a strong directionality in only one direction. Although a unidirectional filter can make a dead angle in the direction opposite to the target direction as shown in FIG. 3A, unfortunately, the directionality in the target direction might become weak. Further, although a beamformer using the spectral subtraction (SS) can obtain a strong directionality in the target direction, unfortunately, directionality is also formed in the same manner in the direction opposite to the target direction as shown in FIG. 3B. Accordingly, JP 2006-197552A proposes a technique to form unidirectionalities and bidirectionalities in various directions by increasing the number of microphones, and to form a strong directionality only in the target direction by use of outputs from the plurality of directional filters.
The technique disclosed in JP 2006-197552A, however, compares the outputs from the respective directional filters including the target sound for each frequency and determines whether or not a target sound component is present, thereby separating a sound; thus, in a case where the determination of the target sound component fails, the sound quality of the target sound after the separation might degrade. Further, since masking is performed in which a component that is determined to be a non-target sound is set to 0 in the separation, an increase in the non-target sound rapidly degrades the separation performance.
Further, in a case of picking up only a sound that is present within a specific area (hereinafter referred to as a target area sound), the use of the subtraction type BF alone might also pick up a sound source that is present in the periphery of the area (hereinafter referred to as a non-target area sound). Accordingly, the inventor of the present application proposes, in a reference document (Japanese Application Number 2012-217315), a technique to pick up the target area sound by forming directionalities toward a target area from different directions by use of a plurality of microphone arrays and by crossing the directionalities in the target area.
However, in an environment in which reverberation is strong, in particular, in a case where a primary reflection is large, the sound pickup performance might degrade. The technique disclosed in the reference document assumes that a component that is commonly included in the directionalities of the respective microphone arrays is only the target area sound, and that the non-target area sound components are different. Thus, in a case where a sound in an area that is located at a corner of a room or beside a wall is picked up and some of the non-target area sounds are reflected by the wall and are mixed in the directionalities of the respective microphone arrays at the same time, the non-target area sound components are regarded as the target area sound component and are extracted without being suppressed.
Accordingly, a sound source separating apparatus and program are required that can form a sharp directionality only in a target direction and can extract a target sound with little degradation in sound quality. Further, a sound pickup apparatus and program are required that can form directionality only in a forward direction of a target area and can suppress an influence of reverberation and can increase an SN ratio by picking up a sound in an area.
In order to solve one or more of the above problems, according to a first aspect of the present invention, there is provided a sound source separating apparatus including a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of an isosceles right triangle, a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a sound signal picked up by two microphones which are located in the same direction as the target direction, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a second aspect of the present invention, there is provided a sound source separating apparatus including a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle, a unidirectionality forming unit configured to form two unidirectionalities having dead angles of +60° and −60° with respect to the target direction by use of sound signals picked up by combinations of two microphones which are located at angles of +60° and −60° with respect to the target direction, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a third aspect of the present invention, there is provided a sound source separating apparatus including a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle, a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a signal obtained by averaging sound signals picked up by the two microphones which are located to be horizontal with respect to the target direction and a sound signal picked up by the other microphone, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a fourth aspect of the present invention, there is provided a sound source separating program for causing a computer to function as a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of an isosceles right triangle, a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a sound signal picked up by two microphones which are located in the same direction as the target direction, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a fifth aspect of the present invention, there is provided a sound source separating program for causing a computer to function as a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle, a unidirectionality forming unit configured to form two unidirectionalities having dead angles of +60° and −60° with respect to the target direction by use of sound signals picked up by combinations of two microphones which are located at angles of +60° and −60° with respect to the target direction, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a sixth aspect of the present invention, there is provided a sound source separating program for causing a computer to function as a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle, a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a signal obtained by averaging sound signals picked up by the two microphones which are located to be horizontal with respect to the target direction and a sound signal picked up by the other microphone, among the three microphones, and a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of all outputs from the bidirectionality forming unit and the unidirectionality forming unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones.
According to a seventh aspect of the present invention, there is provided a sound pickup apparatus including a plurality of microphone arrays each including three microphones disposed at vertexes of an isosceles right triangle or a regular triangle, a directionality forming unit, which corresponds to the sound source separating apparatus according to the first aspect, configured to form directionality, for each of the microphone arrays, only in a forward direction of each of the microphone arrays with respect to a target area by use of beamformers for each output from each of the microphone arrays, a power correction coefficient calculating unit configured to calculate, with respect to each frequency, a ratio of amplitude spectra of beamformer outputs between the outputs for the respective microphone arrays from the directionality forming unit and set a mode or a median of the calculated ratios of amplitude spectra as a correction coefficient which corrects power of the beamformer outputs for each of the microphone arrays, and a target area sound extracting unit configured to extract a target area sound by performing the following processes in sequence: correcting a beamformer output from each of the microphone arrays from the directionality forming unit by use of the correction coefficient calculated by the power correction coefficient calculating unit, performing a spectral subtraction between the corrected beamformer outputs from the microphone arrays to extract a non-target area sound which is present in the target area direction when seen from each of the microphone arrays, and performing a spectral subtraction of the extracted non-target area sound from the beamformer output from each of the microphone arrays from the directionality forming unit.
According to an eighth aspect of the present invention, there is provided a sound pickup program for causing a computer including a plurality of microphone arrays each including three microphones disposed at vertexes of an isosceles right triangle or a regular triangle to function as a directionality forming unit, which corresponds to the function of the sound source separating program according to the fifth aspect, configured to form directionality only in a forward direction of each of the microphone arrays with respect to a target area by use of beamformers for each output from each of the microphone arrays, a power correction coefficient calculating unit configured to calculate, with respect to each frequency, a ratio of amplitude spectra of beamformer outputs between the outputs for the respective microphone arrays from the directionality forming unit and set a mode or a median of the calculated ratios of amplitude spectra as a correction coefficient which corrects power of the beamformer outputs for each of the microphone arrays, and a target area sound extracting unit configured to extract a target area sound by performing the following processes in sequence: correcting a beamformer output from each of the microphone arrays from the directionality forming unit by use of the correction coefficient calculated by the power correction coefficient calculating unit, performing a spectral subtraction between the corrected beamformer outputs from the microphone arrays to extract a non-target area sound which is present in the target area direction when seen from each of the microphone arrays, and performing a spectral subtraction of the extracted non-target area sound from the beamformer output from each of the microphone arrays from the directionality forming unit.
According to one or more of the embodiments of the present invention, it is possible to form a sharp directionality only in a target direction and extract a target sound with little degradation in sound quality. Further, it is possible to form directionality only in a forward direction of a target area, and suppress an influence of reverberation and increase an SN ratio by picking up a sound in an area.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a configuration of a sound source separating apparatus according to a first embodiment;
FIG. 2 is a block diagram showing a configuration of a subtraction type beamformer in which the number of microphones is two;
FIGS. 3A and 3B show directional characteristics formed by a subtraction type beamformer by use of two microphones;
FIG. 4 shows an example of directional characteristics formed by respective directional filters according to embodiments of the present invention;
FIG. 5 is a block diagram showing a configuration of a sound source separating apparatus according to a second embodiment;
FIG. 6 shows directional characteristics formed by directional filters according to a second embodiment;
FIG. 7 is a block diagram showing a configuration of a sound source separating apparatus according to a third embodiment;
FIG. 8 is a block diagram showing a configuration of a sound pickup apparatus according to a fourth embodiment;
FIG. 9 is a block diagram showing a configuration of a directionality forming unit of a sound pickup apparatus according to a fourth embodiment;
FIG. 10 shows an image of sound pickup in an area performed by a sound pickup apparatus according to a fourth embodiment;
FIG. 11 shows another image of sound pickup in an area performed by a sound pickup apparatus according to a fourth embodiment;
FIG. 12 is a block diagram showing a configuration of a sound pickup apparatus according to a fifth embodiment; and
FIG. 13 shows an example of an image of a situation in which, by use of two microphone arrays each including three microphones according to a fifth embodiment, two areas are switched to pick up a sound.
DETAILED DESCRIPTION OF THE EMBODIMENT(S)
Hereinafter, referring to the appended drawings, preferred embodiments of the present invention will be described in detail. It should be noted that, in this specification and the appended drawings, structural elements that have substantially the same function and structure are denoted with the same reference numerals, and repeated explanation thereof is omitted.
(A) Description of Technical Idea of Embodiments of the Present Invention
First, a technical idea of a sound source separating apparatus and program according to embodiments of the present invention will be described below.
In embodiments of the present invention, a bidirectionality and a unidirectionality are formed by use of three omnidirectional microphones, and a spectral subtraction (SS) of the outputs from the respective directional filters from an input signal is performed, thereby forming a sharp directionality only in the target direction.
FIG. 4 shows an example of directional characteristics formed by the respective directional filters according to embodiments of the present invention.
Here, for example, two microphones are disposed to be horizontal with respect to the target direction, and are called a first microphone M1 and a second microphone M2. Further, a third microphone M3 is disposed on a straight line that orthogonally intersects with a straight line connecting the first microphone M1 and the second microphone M2 and that passes through one of the first microphone M1 and the second microphone M2 (here, the second microphone M2). In this case, the distance between the third microphone M3 and the second microphone M2 is equal to the distance between the first microphone M1 and the second microphone M2. That is, the three microphones M1, M2, and M3 are located at the vertexes of an isosceles right triangle.
First, signals from the first microphone M1 and the second microphone M2 are input to the bidirectional filter. Further, signals from the second microphone M2 and the third microphone M3 are input to the unidirectional filter having a dead angle toward the target direction.
As shown in FIG. 4, the two directionalities each have a dead angle in the target direction. The output from the bidirectional filter becomes a non-target sound that is present in the left and right directions of the target direction, and the output from the unidirectional filter becomes a non-target sound that is present in the backward direction of the target direction. The use of these two directional filters enables extraction of all the non-target sounds that are present in directions other than the target direction. Finally, an SS of all the outputs from the respective directional filters from an input signal is performed to extract the target sound. Here, the target input signal is an input signal to the first microphone M1 or the second microphone M2, or a signal that is obtained by averaging the input signals to the first microphone M1 and the second microphone M2.
In the above technique, the SS is performed by use of two output signals: an output signal from the bidirectional filter and an output signal from the unidirectional filter. As shown in a shaded area in FIG. 4, part of the bidirectionality overlaps with part of the unidirectionality, so that in a simple SS, the overlapped area is subtracted twice. The SS is a technique to extract the target sound by use of a nature called sparsity, with which individual sound components are unlikely to overlap in a frequency domain.
However, whether or not a certain sound component is present alone at a specific frequency depends on the number of sound sources and the frequency resolution. Thus, a situation can be considered where a plurality of sound components are present at the same frequency. Performing the SS a plurality of times in such a situation might degrade the sound quality, because the target sound component would be reduced every time the subtraction is performed.
Accordingly, in embodiments of the present invention, the area where the bidirectionality overlaps with the unidirectionality is canceled prior to the SS. When the amplitude spectrum of the non-target sound extracted by the unidirectional filter is subtracted from the amplitude spectrum of the non-target sound extracted by the bidirectional filter, among the non-target sound components extracted by the bidirectional filter, a component that is commonly included in the non-target sound components extracted by the unidirectional filter is canceled. After that, an SS from the input signal is performed on the non-target sound component extracted by the unidirectional filter and on the non-target sound extracted by the bidirectional filter from which the overlapped component has been canceled. Thus, the target sound component is not subtracted too many times, and the sound quality of the target sound can be prevented from degrading.
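The processing order described above (overlap cancellation before the final SS) can be sketched as follows; the function and variable names are illustrative assumptions, not taken from the embodiments:

```python
import numpy as np

def extract_target(x_mag, bi_mag, uni_mag, beta=1.0):
    """x_mag: amplitude spectrum of the input signal; bi_mag, uni_mag:
    amplitude spectra of the non-target sounds extracted by the
    bidirectional and unidirectional filters, respectively."""
    # Cancel the component common to both directionalities (the
    # shaded overlap in FIG. 4) so it is not subtracted twice.
    bi_only = np.maximum(bi_mag - uni_mag, 0.0)
    # SS of both remaining non-target spectra from the input signal,
    # with flooring of negative values to 0.
    y_mag = x_mag - beta * (bi_only + uni_mag)
    return np.maximum(y_mag, 0.0)
```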
(B) First Embodiment
A first embodiment of a sound source separating apparatus and program according to an embodiment of the present invention will be described below in detail with reference to appended drawings.
(B-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing a configuration of a sound source separating apparatus 10A according to the first embodiment. The portions shown in FIG. 1 other than the microphones may be configured by connecting various circuits in a hardware manner, or may be configured to execute the corresponding functions by causing a general device or unit including a CPU, ROM, RAM, and the like to execute a predetermined program. In either case, the functions thereof can be expressed as shown in FIG. 1.
In FIG. 1, the sound source separating apparatus 10A according to the first embodiment includes a first microphone M1, a second microphone M2, a third microphone M3, signal input units 1-1, 1-2, and 1-3, a signal adding unit 2, a bidirectionality forming unit 3, a unidirectionality forming unit 4, an overlapped directionality canceling unit 5, and a target signal extracting unit 6.
The first microphone M1, the second microphone M2, and the third microphone M3 are each an omnidirectional microphone.
The first microphone M1 and the second microphone M2 are disposed to be horizontal with respect to the target direction. The third microphone M3 is disposed on the same plane as the first microphone M1 and the second microphone M2, on a straight line that orthogonally intersects with a straight line connecting the first microphone M1 and the second microphone M2 and that passes through the second microphone M2.
In this case, the distance between the third microphone M3 and the second microphone M2 is set to be equal to the distance between the first microphone M1 and the second microphone M2. Thus, the first microphone M1, the second microphone M2, and the third microphone M3 are located at the vertexes of an isosceles right triangle.
Note that the first microphone M1, the second microphone M2, and the third microphone M3 are disposed at the vertexes of an isosceles right triangle on the same plane in a space.
The signal input unit 1-1 is connected to the signal adding unit 2 and the bidirectionality forming unit 3, inputs a sound signal (things including a voice signal and a sound signal) picked up by the first microphone M1 by converting the sound signal from an analog signal into a digital signal, and outputs the sound signal to the signal adding unit 2 and the bidirectionality forming unit 3.
The signal input unit 1-2 is connected to the signal adding unit 2, the bidirectionality forming unit 3, and the unidirectionality forming unit 4, inputs a sound signal picked up by the second microphone M2 by converting the sound signal from an analog signal into a digital signal, and outputs the sound signal to the signal adding unit 2, the bidirectionality forming unit 3, and the unidirectionality forming unit 4.
The signal input unit 1-3 is connected to the unidirectionality forming unit 4, inputs a sound signal (voice signal, sound signal) picked up by the third microphone M3 by converting the sound signal from an analog signal into a digital signal, and outputs the sound signal to the unidirectionality forming unit 4.
In FIG. 1, in order to convert the input signal from a time domain into a frequency domain, the signal input units 1-1, 1-2, and 1-3 each perform, for example, fast Fourier transform.
The signal adding unit 2 adds the signals output from the signal input unit 1-1 and the signal input unit 1-2, multiplies the added signal by ½, and outputs the multiplied signal to the target signal extracting unit 6. The output signal from the signal adding unit 2 becomes the input signal when the spectral subtraction (SS) is performed in the target signal extracting unit 6. In the first embodiment, a case is shown in which a signal obtained by the signal adding unit 2 averaging the sound signals from the first microphone M1 and the second microphone M2 is output to the target signal extracting unit 6; however, either one of the signals from the first microphone M1 and the second microphone M2 may be output to the target signal extracting unit 6 instead.
The bidirectionality forming unit 3 is a bidirectional filter that forms a bidirectionality having a dead angle in the target direction by use of a beamformer (BF) with respect to the outputs (digital signals) from the signal input unit 1-1 and the signal input unit 1-2, and outputs the formed bidirectionality to the overlapped directionality canceling unit 5.
The unidirectionality forming unit 4 is a unidirectional filter that forms a unidirectionality having a dead angle in the target direction by use of the beamformers with respect to the outputs (digital signals) from the signal input unit 1-2 and the signal input unit 1-3, and outputs the formed unidirectionality to the overlapped directionality canceling unit 5.
The overlapped directionality canceling unit 5 cancels, in order to cancel the overlapped directionality area of the bidirectionality and the unidirectionality prior to the spectral subtraction (SS) performed in the target signal extracting unit 6, a signal component that is commonly included in the output signal from the bidirectionality forming unit 3 and the output signal from the unidirectionality forming unit 4.
The target signal extracting unit 6 is connected to the signal adding unit 2 and the overlapped directionality canceling unit 5, and extracts the target sound by performing the spectral subtraction of the output signal from the overlapped directionality canceling unit 5 from an input signal which is a signal from the signal adding unit 2.
In a process for extracting the target sound, all the outputs are expected to be expressed in a frequency domain. Therefore, as described above, the signal input units 1-1, 1-2, and 1-3 each include a conversion unit that converts a signal in a time domain into a signal in a frequency domain.
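The per-microphone processing chain above (time-to-frequency conversion in the signal input units, followed by the averaging in the signal adding unit 2) can be sketched as follows. This is a minimal sketch assuming numpy, a 512-point FFT, and a Hann analysis window (the embodiment only specifies "fast Fourier transform"); the function names are illustrative, and "multiplying the power of the added signal by ½" is interpreted here as averaging the two spectra, consistent with the averaging described above.

```python
import numpy as np

def to_frequency_domain(frame, n_fft=512):
    """Convert one time-domain frame to a frequency-domain spectrum,
    as each of the signal input units 1-1, 1-2, and 1-3 does."""
    window = np.hanning(len(frame))          # analysis window (assumed)
    return np.fft.rfft(frame * window, n=n_fft)

def signal_adding_unit(x1_spec, x2_spec):
    """Signal adding unit 2: add the spectra of M1 and M2 and halve
    the result, i.e. average the two microphone spectra."""
    return 0.5 * (x1_spec + x2_spec)
```

For identical inputs the averaging is transparent, so a target-direction sound (which reaches M1 and M2 simultaneously) passes through unattenuated.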
(B-2) Operation in the First Embodiment
Next, an operation in the sound source separating apparatus 10A according to the first embodiment will be described.
The first microphone M1, the second microphone M2, and the third microphone M3 are disposed at the vertexes of an isosceles right triangle. Let us assume that the interval between the first microphone M1 and the second microphone M2 and the interval between the second microphone M2 and the third microphone M3 are each 3 cm, for example.
A sound (voice and sound) emitted from a target sound source is picked up (captured) by the first microphone M1, the second microphone M2, and the third microphone M3.
A sound signal (analog signal) captured by the first microphone M1 is converted into a digital signal by the signal input unit 1-1, further converted by the signal input unit 1-1 by use of fast Fourier transformation, for example, from a time domain into a frequency domain, and given to the signal adding unit 2 and the bidirectionality forming unit 3.
Further, a sound signal (analog signal) captured by the second microphone M2 is converted into a digital signal by the signal input unit 1-2, further converted by the signal input unit 1-2 by use of fast Fourier transformation, for example, from a time domain into a frequency domain, and given to the signal adding unit 2, the bidirectionality forming unit 3, and the unidirectionality forming unit 4.
Further, a sound signal (analog signal) captured by the third microphone M3 is converted into a digital signal by the signal input unit 1-3, further converted by the signal input unit 1-3 by use of fast Fourier transformation, for example, from a time domain into a frequency domain, and given to the unidirectionality forming unit 4.
In the signal adding unit 2, the output signal from the signal input unit 1-1 and the output signal from the signal input unit 1-2, which have the same time axis, are added, and the power of the added signal is multiplied by ½, so that the target sound component is emphasized.
In the bidirectionality forming unit 3, in accordance with the formula (1) in which θL=0, on the basis of a distance d (e.g., 3 cm) between the first microphone M1 and the second microphone M2, a temporal difference between a signal that has reached the first microphone M1 and a signal that has reached the second microphone M2 is calculated. Further, in the bidirectionality forming unit 3, in accordance with the formula (3), on the basis of the output signal in the frequency domain from the signal input unit 1-1 and the output signal in the frequency domain from the signal input unit 1-2, the bidirectionality having a dead angle in the target direction is formed.
That is, as shown in FIG. 4, the output of the bidirectionality formed by the bidirectionality forming unit 3 becomes the non-target sound that is present in the straight line direction (the left and right direction in FIG. 4) connecting the first microphone M1 and the second microphone M2 with respect to the target direction.
In the unidirectionality forming unit 4, in accordance with the formula (1) in which θL=−π/2, on the basis of a distance d (e.g., 3 cm) between the second microphone M2 and the third microphone M3, a temporal difference between a signal that has reached the second microphone M2 and a signal that has reached the third microphone M3 is calculated. Further, in the unidirectionality forming unit 4, in accordance with the formula (3), on the basis of the output signal in the frequency domain from the signal input unit 1-2 and the output signal in the frequency domain from the signal input unit 1-3, the unidirectionality having a dead angle in the target direction is formed.
That is, as shown in FIG. 4, the output of the unidirectionality formed by the unidirectionality forming unit 4 becomes the non-target sound that is present in the backward direction of the target direction (that is, the direction opposite to the target direction).
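The two directional filters described above can be sketched with a frequency-domain subtraction-type beamformer. Formulas (1) and (3) are not reproduced in this section, so the sketch assumes their standard forms: a temporal difference τ = d·sin(θL)/c for a dead angle at θL (formula (1)), applied as a linear phase and subtracted (formula (3)). The speed of sound value and the function name are assumptions.

```python
import numpy as np

C = 340.0  # speed of sound in m/s (assumed value)

def null_beamformer(x_a, x_b, freqs, d, theta_l):
    """Subtraction-type beamformer: delay the spectrum x_b of the
    second microphone by tau = d*sin(theta_l)/c (cf. formula (1))
    and subtract it from x_a (cf. formula (3)), which places a dead
    angle in the direction theta_l.  freqs is the FFT bin frequency
    axis in Hz; d is the microphone interval in meters."""
    tau = d * np.sin(theta_l) / C
    return x_a - x_b * np.exp(-2j * np.pi * freqs * tau)
```

With θL = 0 the delay vanishes and the output is simply X1 − X2, the bidirectionality of the bidirectionality forming unit 3; with θL = −π/2 applied to the M2/M3 pair, the unidirectionality of the unidirectionality forming unit 4 is obtained.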
In the overlapped directionality canceling unit 5, a signal component that is commonly included in an amplitude spectrum NBD of an output from the bidirectionality forming unit 3 and an amplitude spectrum NUD of an output from the unidirectionality forming unit 4 is canceled.
Here, the overlapped directionality canceling unit 5 cancels the overlapped signal component in accordance with a formula (5).
NUD1 = NUD − NBD (where NUD1 = 0 if NUD1 < 0)  (5)
Here, NUD1 is an amplitude spectrum of an output signal from which the overlapped component of NUD and NBD is canceled.
In a case where NUD1 becomes negative as a result of the subtraction of the overlapped signal component, performed by the overlapped directionality canceling unit 5, the overlapped directionality canceling unit 5 performs a flooring process. Although in this example, the overlapped directionality canceling unit 5 performs subtraction of NBD from NUD, the subtraction of NUD from NBD may be performed so that an amplitude spectrum NBD1 of an output signal from which the overlapped component is canceled can be obtained.
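The cancellation of the formula (5), including the flooring process, reduces to an element-wise subtraction of the amplitude spectra with a floor at 0. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def cancel_overlap(n_ud, n_bd):
    """Formula (5): subtract the bidirectional amplitude spectrum
    N_BD from the unidirectional amplitude spectrum N_UD, flooring
    negative bins to 0 (overlapped directionality canceling unit 5)."""
    return np.maximum(n_ud - n_bd, 0.0)
```

Swapping the arguments yields NBD1 = NBD − NUD, the alternative cancellation order mentioned above.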
Since the gain of the directionality at each frequency formed by the beamformers (BFs) differs according to the intervals between microphones, it is assumed that gain correction is performed on the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 and the amplitude spectrum NUD of the output from the unidirectionality forming unit 4. For example, the overlapped directionality canceling unit 5 may obtain the ratio of the amplitude spectra at each frequency on the basis of the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 and the amplitude spectrum NUD of the output from the unidirectionality forming unit 4, which have the same time axis, and may perform the gain correction by use of a correction coefficient for making the output powers equal.
To the target signal extracting unit 6, an amplitude spectrum XDS of an output is given as the target sound from the signal adding unit 2, and the amplitude spectrum NBD of the output and the amplitude spectrum NUD1 of the output obtained after the subtraction of the overlapped area are given as the non-target sound from the overlapped directionality canceling unit 5.
Then, in the target signal extracting unit 6, by subtracting, from the amplitude spectrum XDS of the output from the signal adding unit 2, the amplitude spectrum NBD of the output from the overlapped directionality canceling unit 5 and the amplitude spectrum NUD1 of the output obtained after the subtraction of the overlapped area, an emphasized target sound is extracted.
The target signal extracting unit 6 extracts the target sound in accordance with a formula (6).
Y=XDS−β1NBD−β2NUD1  (6)
Here, β1 and β2 are coefficients for adjusting the intensity of the spectral subtraction.
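The extraction of the formula (6) can be sketched as follows. The flooring of negative result bins to 0 is an added safeguard common in spectral subtraction, not something the formula itself states, and the function name is illustrative.

```python
import numpy as np

def extract_target(x_ds, n_bd, n_ud1, beta1=1.0, beta2=1.0):
    """Formula (6): Y = X_DS - beta1*N_BD - beta2*N_UD1.
    x_ds is the amplitude spectrum from the signal adding unit 2;
    n_bd and n_ud1 are the non-target amplitude spectra from the
    overlapped directionality canceling unit 5.  Negative bins are
    floored to 0 (an assumed safeguard)."""
    y = x_ds - beta1 * n_bd - beta2 * n_ud1
    return np.maximum(y, 0.0)
```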
(B-3) Effects of the First Embodiment
As described above, according to the first embodiment, by performing the SS of the non-target sound from the input signal, the non-target sound being extracted by use of sound signals picked up by the three omnidirectional microphones through the unidirectional filter and the bidirectional filter, it is possible to form a sharp directionality only in the target direction.
Further, according to the first embodiment, since only the SS is used for formation of the directionality in the target direction, the sound source separating performance does not degrade rapidly even when noise increases. Furthermore, according to the first embodiment, performing the SS after canceling the directionality overlapped area in which the bidirectionality overlaps with the unidirectionality prevents degradation of the sound quality of the target sound due to multiple subtractions of the overlapped area.
(C) Second Embodiment
Next, a second embodiment of a sound source separating apparatus and program according to an embodiment of the present invention will be described in detail with reference to appended drawings.
The first embodiment shows the case where three microphones are disposed at the vertexes of an isosceles right triangle, and the second embodiment will show a case where three microphones are disposed at the vertexes of a regular triangle.
(C-1) Configuration of the Second Embodiment
FIG. 5 is a block diagram showing a configuration of a sound source separating apparatus 10B according to the second embodiment. The same or corresponding parts as FIG. 1 according to the first embodiment are denoted by the same reference numerals.
In FIG. 5, the sound source separating apparatus 10B according to the second embodiment includes a first microphone M1, a second microphone M2, a third microphone M3, signal input units 1-1, 1-2, and 1-3, a signal adding unit 2, a bidirectionality forming unit 3, unidirectionality forming units 4-1 and 4-2, an overlapped directionality canceling unit 5, and a target signal extracting unit 6.
The first microphone M1 and the second microphone M2 are disposed to be horizontal with respect to the target direction. The third microphone M3 is located to be present on the same plane as the first microphone M1 and the second microphone M2, and to be opposite to the target direction. Thus, the first microphone M1, the second microphone M2, and the third microphone M3 are disposed at the vertexes of a regular triangle.
The signal input unit 1-1 is connected to the signal adding unit 2, the bidirectionality forming unit 3, and the unidirectionality forming unit 4-1, and gives an output signal to the signal adding unit 2, the bidirectionality forming unit 3, and the unidirectionality forming unit 4-1.
The signal input unit 1-2 is connected to the signal adding unit 2 and the unidirectionality forming unit 4-2, and gives an output signal to the signal adding unit 2 and the unidirectionality forming unit 4-2.
The signal input unit 1-3 is connected to the unidirectionality forming units 4-1 and 4-2, and gives an output signal to the unidirectionality forming units 4-1 and 4-2.
The unidirectionality forming unit 4-1 is a unidirectional filter that forms a unidirectionality having a dead angle of +60° to the target direction by use of beamformers with respect to the outputs (digital signals) from the signal input unit 1-1 and the signal input unit 1-3, and outputs the formed unidirectionality to the overlapped directionality canceling unit 5.
The unidirectionality forming unit 4-2 is a unidirectional filter that forms a unidirectionality having a dead angle of −60° to the target direction by use of beamformers with respect to the outputs (digital signals) from the signal input unit 1-2 and the signal input unit 1-3, and outputs the formed unidirectionality to the overlapped directionality canceling unit 5.
The overlapped directionality canceling unit 5 cancels a signal component that is commonly included in the outputs from the bidirectionality forming unit 3 and the unidirectionality forming units 4-1 and 4-2.
(C-2) Operation in the Second Embodiment
Operations of the unidirectionality forming units 4-1 and 4-2, the overlapped directionality canceling unit 5, and the target signal extracting unit 6 in the sound source separating apparatus 10B according to the second embodiment are different from those in the first embodiment; therefore, the operations of these structural elements will be described below.
As described above, the first microphone M1, the second microphone M2, and the third microphone M3 are disposed at the vertexes of a regular triangle.
In the second embodiment, a unidirectionality is formed on the basis of a sound signal of the first microphone M1 and the third microphone M3, and a unidirectionality is formed on the basis of a sound signal of the second microphone M2 and the third microphone M3.
In the unidirectionality forming unit 4-1, in accordance with the formula (1) in which θL=−π/2, on the basis of a distance d (e.g., 3 cm) between the first microphone M1 and the third microphone M3, a temporal difference between a signal that has reached the first microphone M1 and a signal that has reached the third microphone M3 is calculated. Further, in the unidirectionality forming unit 4-1, in accordance with the formula (3), on the basis of the output signal in the frequency domain from the signal input unit 1-1 and the output signal in the frequency domain from the signal input unit 1-3, the unidirectionality having a dead angle of +60° to the target direction is formed.
In the unidirectionality forming unit 4-2, in accordance with the formula (1) in which θL=−π/2, on the basis of a distance d (e.g., 3 cm) between the second microphone M2 and the third microphone M3, a temporal difference between a signal that has reached the second microphone M2 and a signal that has reached the third microphone M3 is calculated. Further, in the unidirectionality forming unit 4-2, in accordance with the formula (3), on the basis of the output signal in the frequency domain from the signal input unit 1-2 and the output signal in the frequency domain from the signal input unit 1-3, the unidirectionality having a dead angle of −60° to the target direction is formed.
In the overlapped directionality canceling unit 5, a component that is commonly included in the output from the bidirectionality forming unit 3 and the output from the unidirectionality forming units 4-1 and 4-2 is canceled.
FIG. 6 shows directional characteristics formed by the directional filters according to the second embodiment.
As shown in FIG. 6, overlapped directionality areas exist between the bidirectionality from the bidirectionality forming unit 3 and the unidirectionality from the unidirectionality forming unit 4-1, between the bidirectionality from the bidirectionality forming unit 3 and the unidirectionality from the unidirectionality forming unit 4-2, and between the unidirectionalities from the unidirectionality forming units 4-1 and 4-2.
The overlapped directionality canceling unit 5 cancels the overlapped areas in accordance with formulas (7) to (9) which are extended formulas of the formula (5).
NUDL1 = NUDL − NBD (where NUDL1 = 0 if NUDL1 < 0)  (7)
NUDR1 = NUDR − NBD (where NUDR1 = 0 if NUDR1 < 0)  (8)
NUDR2 = NUDR1 − NUDL1 (where NUDR2 = 0 if NUDR2 < 0)  (9)
Here, NBD is an amplitude spectrum of an output from the bidirectionality forming unit 3, NUDL is an amplitude spectrum of an output from the unidirectionality forming unit 4-1, and NUDR is an amplitude spectrum of an output from the unidirectionality forming unit 4-2.
In the overlapped directionality canceling unit 5, a signal component that is commonly included in an amplitude spectrum NBD of an output from the bidirectionality forming unit 3 and the amplitude spectrum NUDL of an output from the unidirectionality forming unit 4-1 is canceled. That is, in the overlapped directionality canceling unit 5, in accordance with the formula (7), by subtracting the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 from the amplitude spectrum NUDL of the output from the unidirectionality forming unit 4-1, an amplitude spectrum NUDL1 of an output obtained after the subtraction of the overlapped area is obtained.
In the overlapped directionality canceling unit 5, a signal component that is commonly included in the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 and the amplitude spectrum NUDR of the output from the unidirectionality forming unit 4-2 is canceled. That is, in the overlapped directionality canceling unit 5, in accordance with the formula (8), by subtracting the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 from the amplitude spectrum NUDR of the output from the unidirectionality forming unit 4-2, an amplitude spectrum NUDR1 of an output obtained after the subtraction of the overlapped area is obtained.
Further, in the overlapped directionality canceling unit 5, a signal component that is commonly included in the amplitude spectrum NUDL1 and the amplitude spectrum NUDR1, each being of an output from which the component overlapped with NBD is canceled, is canceled. That is, in the overlapped directionality canceling unit 5, in accordance with the formula (9), by subtracting the amplitude spectrum NUDL1 of the output from which the component overlapped with NBD is canceled from the amplitude spectrum NUDR1 of the output from which the component overlapped with NBD is canceled, an amplitude spectrum NUDR2 of an output obtained after the subtraction of the overlapped areas is obtained.
Further, in the formulas (7) to (9), the order of cancellation of the overlapped components may be changed. That is, the amplitude spectra may be interchanged to execute the process as follows: NUDL2 = NUDL1 − NUDR1 or NBD1 = NBD − NUDL.
Note that in the formulas (7) to (9), in a case where the values of the amplitude spectra NUDL1, NUDR1, and NUDR2 of the outputs obtained after the subtraction of the overlapped areas are negative, a flooring process is performed in which the values of the amplitude spectra NUDL1, NUDR1, and NUDR2 of the outputs obtained after the subtraction of the overlapped areas are each replaced by 0. Note that in the flooring process, the values may be replaced by the values smaller than the original values (values immediately before) of the amplitude spectra of the outputs obtained after the subtraction of the overlapped areas.
As in the first embodiment, the gain of the directionality according to frequencies due to BFs differs according to the intervals between microphones; therefore, the gain correction may be performed on each frequency for the amplitude spectra of the outputs.
To the target signal extracting unit 6, an amplitude spectrum XDS of the output is given as the target sound from the signal adding unit 2, and the amplitude spectrum NBD of the output as well as the amplitude spectra NUDL1 and NUDR2 of the outputs obtained after the subtraction of the overlapped areas are given as the non-target sound from the overlapped directionality canceling unit 5.
Then, in the target signal extracting unit 6, in accordance with the formula (10), by subtracting the amplitude spectrum NBD of the output and the amplitude spectra NUDL1 and NUDR2 of the outputs obtained after the subtraction of the overlapped areas from the amplitude spectrum XDS of the output from the signal adding unit 2, an emphasized target sound is extracted. Here, β1, β2, and β3 are coefficients for adjusting the intensity of the SS.
Y=XDS−β1NBD−β2NUDL1−β3NUDR2  (10)
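The chain of the formulas (7) to (10) can be sketched in one function. The final flooring of Y at 0 is an assumed safeguard not stated by the formula, and the function names are illustrative.

```python
import numpy as np

def floor0(v):
    # flooring process: replace negative amplitude bins by 0
    return np.maximum(v, 0.0)

def extract_target_2nd(x_ds, n_bd, n_udl, n_udr,
                       beta1=1.0, beta2=1.0, beta3=1.0):
    """Second embodiment: cancel the overlapped directionality areas
    per formulas (7)-(9), then spectral-subtract per formula (10)."""
    n_udl1 = floor0(n_udl - n_bd)      # formula (7)
    n_udr1 = floor0(n_udr - n_bd)      # formula (8)
    n_udr2 = floor0(n_udr1 - n_udl1)   # formula (9)
    y = x_ds - beta1 * n_bd - beta2 * n_udl1 - beta3 * n_udr2  # formula (10)
    return floor0(y)
```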
(C-3) Effects of the Second Embodiment
As described above, according to the second embodiment, in a case where three omnidirectional microphones are disposed at the vertexes of a regular triangle, effects as in the first embodiment are obtained.
(D) Third Embodiment
Next, a third embodiment of a sound source separating apparatus and program according to an embodiment of the present invention will be described in detail with reference to appended drawings.
In the second embodiment described above, the combination of the first microphone M1 and the third microphone M3 and the combination of the second microphone M2 and the third microphone M3 each form the unidirectionality.
Here, since the sound emitted from the sound source that is present in the target direction reaches the first microphone M1 and the second microphone M2 at the same time, the output from the signal adding unit 2 can be regarded as a sound signal picked up by a pseudo microphone located at the intermediate point between the first microphone M1 and the second microphone M2.
Accordingly, the third embodiment will show a case where the unidirectionality having a dead angle in the target direction is formed by use of the output from the signal adding unit 2 and the output from the signal input unit 1-3.
(D-1) Configuration of the Third Embodiment
FIG. 7 is a block diagram showing a configuration of a sound source separating apparatus 10C according to the third embodiment. The same or corresponding parts as in FIG. 1 and FIG. 5 according to the first and second embodiments are denoted by the same reference numerals.
In FIG. 7, the sound source separating apparatus 10C according to the third embodiment includes a first microphone M1, a second microphone M2, a third microphone M3, signal input units 1-1, 1-2, and 1-3, a signal adding unit 2, a bidirectionality forming unit 3, a unidirectionality forming unit 4, an overlapped directionality canceling unit 5, and a target signal extracting unit 6.
The signal input unit 1-1 is connected to the signal adding unit 2 and the bidirectionality forming unit 3, and gives an output signal to the signal adding unit 2 and the bidirectionality forming unit 3, as in the first embodiment.
The signal input unit 1-2 is connected to the signal adding unit 2 and the bidirectionality forming unit 3, and gives an output signal to the signal adding unit 2 and the bidirectionality forming unit 3.
The signal input unit 1-3 is connected to the unidirectionality forming unit 4, and gives an output signal to the unidirectionality forming unit 4.
The signal adding unit 2 adds signals output from the signal input unit 1-1 and the signal input unit 1-2, multiplies the power of the added signal by ½, and outputs the multiplied signal to the target signal extracting unit 6 and the unidirectionality forming unit 4, as in the first embodiment.
The unidirectionality forming unit 4 is a unidirectional filter that forms the unidirectionality having a dead angle in the target direction by use of beamformers with respect to the outputs from the signal input unit 1-3 and the signal adding unit 2, and outputs the formed unidirectionality to the overlapped directionality canceling unit 5.
The bidirectionality forming unit 3, the overlapped directionality canceling unit 5, and the target signal extracting unit 6 have the same configurations as those in the first embodiment.
(D-2) Operation in the Third Embodiment
The operation of the unidirectionality forming unit 4 in the sound source separating apparatus 10C according to the third embodiment is different from that in the first and second embodiments; therefore, the operation of the unidirectionality forming unit 4 will be described below.
In the signal adding unit 2, signals output from the signal input unit 1-1 and the signal input unit 1-2 are added, and a signal obtained by multiplying the power of the added signal by ½ is output to the unidirectionality forming unit 4.
Since the outputs from the signal input units 1-1 and 1-2 which are disposed to be horizontal with respect to the target direction are averaged, the output from the signal adding unit 2 can be regarded as a sound signal that is picked up by a microphone (a pseudo microphone) located in the intermediate point between the first microphone M1 and the second microphone M2.
In the unidirectionality forming unit 4, in accordance with the formula (1) in which θL=−π/2, a temporal difference between the output from the third microphone M3 and the output from the signal adding unit 2 is calculated. Further, in the unidirectionality forming unit 4, in accordance with the formula (3), on the basis of the output signal in the frequency domain from the signal input unit 1-3 and the output signal in the frequency domain from the signal adding unit 2, the unidirectionality having a dead angle in the target direction is formed.
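The third embodiment's use of the pseudo microphone can be sketched as follows, assuming the same standard forms of the formulas (1) and (3) as in the earlier sketch (τ = d·sin(θL)/c applied as a linear phase and subtracted). Here d denotes the distance between the pseudo microphone (the midpoint of M1 and M2) and the third microphone M3; the speed of sound and function name are assumptions.

```python
import numpy as np

C = 340.0  # speed of sound in m/s (assumed value)

def pseudo_mic_unidirectionality(x1, x2, x3, freqs, d):
    """Average the M1 and M2 spectra into a pseudo microphone at
    their midpoint (signal adding unit 2), then form a
    unidirectionality with a dead angle in the target direction
    against M3 (theta_L = -pi/2 in formula (1), subtraction per
    formula (3))."""
    x_pseudo = 0.5 * (x1 + x2)              # signal adding unit 2
    tau = d * np.sin(-np.pi / 2) / C        # formula (1)
    return x_pseudo - x3 * np.exp(-2j * np.pi * freqs * tau)  # formula (3)
```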
Operations of the bidirectionality forming unit 3, the overlapped directionality canceling unit 5, and the target signal extracting unit 6 are the same as those in the first embodiment, so that an emphasized target sound is extracted by the target signal extracting unit 6.
(D-3) Effects of the Third Embodiment
As described above, according to the third embodiment, even in a case where the three omnidirectional microphones are disposed at the vertexes of a regular triangle, the same effects as in the first and second embodiments are obtained by regarding the output from the signal adding unit 2 as a sound signal picked up by a pseudo microphone located at the intermediate point between the first microphone M1 and the second microphone M2, because the sound from the target direction reaches the first microphone M1 and the second microphone M2 at the same time.
(E) Fourth Embodiment
Next, a fourth embodiment of a sound source separating apparatus, sound source separating program, sound pickup apparatus, and sound pickup program according to an embodiment of the present invention will be described in detail with reference to appended drawings.
The fourth embodiment will show a case in which the present invention is applied to a sound pickup apparatus that picks up a target area sound that is present within a specific area by use of the microphone array including three omnidirectional microphones described in the first embodiment.
(E-1) Configuration of the Fourth Embodiment
FIG. 8 is a block diagram showing a configuration of a sound pickup apparatus 20A according to the fourth embodiment. In FIG. 8, the same or corresponding parts as in FIG. 1 according to the first embodiment are denoted by the same reference numerals.
Portions shown in FIG. 8 other than the microphones may be configured by connecting various circuits in a hardware manner, or may be configured to execute the corresponding functions by causing a general device or unit including a CPU, ROM, RAM, and the like to execute a predetermined program. In either case, the functions can be expressed as shown in FIG. 8.
In FIG. 8, the sound pickup apparatus 20A according to the fourth embodiment includes a first microphone array MA1, a second microphone array MA2, a data input unit 1, a directionality forming unit 21, a delay correcting unit 22, a spatial coordinate data holding unit 23, a target area sound power correction coefficient calculating unit 24, and a target area sound extracting unit 25.
The first microphone array MA1 is disposed in a space where the target area (hereinafter also referred to as TAR, see FIG. 10) is present and in a position where the target area TAR can be directed.
As shown in FIG. 8, the first microphone array MA1 includes three microphones M1, M2, and M3. The three microphones M1, M2, and M3 are disposed at the vertexes of an isosceles right triangle. A sound signal picked up (captured) by each of the microphones M1, M2, and M3 is input to a main body of the sound pickup apparatus 20A.
In the same manner as that of the first microphone array MA1, the second microphone array MA2 has a configuration in which three microphones M1, M2, and M3 are disposed at the vertexes of an isosceles right triangle. A sound signal picked up (captured) by each of the microphones M1, M2, and M3 is input to the main body of the sound pickup apparatus 20A.
Further, the second microphone array MA2 is disposed at a position where the target area TAR can be directed, which is different from the position of the first microphone array MA1. That is, the positions of the first and second microphone arrays MA1 and MA2 may be disposed differently with respect to the target area TAR, for example, such that the first and second microphone arrays MA1 and MA2 face each other with the target area TAR interposed therebetween, as long as the directionalities of the microphone arrays MA1 and MA2 overlap with each other at least in the target area TAR.
Note that the number of microphone arrays is not limited to two. In a case where a plurality of the target areas TAR are present, the number of microphone arrays may be large enough to cover all the target areas TAR.
Further, the microphones M1, M2, and M3 included in each of the first and second microphone arrays MA1 and MA2 may be disposed at the vertexes of an isosceles right triangle or may be disposed at the vertexes of a regular triangle.
The data input unit 1 converts the sound signal picked up by the first and second microphone arrays MA1 and MA2 from an analog signal to a digital signal. The data input unit 1 converts a signal from a time domain into a frequency domain, for example, by use of fast Fourier transformation or the like, and outputs the converted signal to the directionality forming unit 21.
The directionality forming unit 21 forms a directional beam which sets the directionality toward a forward direction of each of the microphone arrays MA1 and MA2 with respect to the target area direction by use of a beamformer with respect to an output (digital signal) from each of the microphone arrays MA1 and MA2, and obtains beamformer outputs of the microphone arrays MA1 and MA2. As the beamformer technique, any one of various methods can be used, such as an addition type delay-and-sum method or a subtraction type spectral subtraction method. Further, the intensity of directionality may be changed in accordance with the range of the target area TAR.
The spatial coordinate data holding unit 23 holds position information of (the center of) the target area TAR and position information of each of the microphone arrays MA1 and MA2.
The delay correcting unit 22 calculates a difference in delay (propagation delay time) caused by the difference between the distance from the target area TAR to the microphone array MA1 and the distance from the target area TAR to the microphone array MA2, and corrects at least one of the beamformer outputs of the microphone arrays MA1 and MA2 so as to absorb the difference. Specifically, first, the position of the target area TAR and the position of each microphone array are acquired from the spatial coordinate data holding unit 23, and the difference in the time at which the target area sound reaches each microphone array (propagation delay time) is calculated. Then, by using, as a reference, the timing at which the target area sound reaches the microphone array disposed farthest from the target area TAR, delays are added to the beamformer outputs of all the microphone arrays other than the reference microphone array so that the target area sound components in all the beamformer outputs are temporally aligned.
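The delay calculation described above can be sketched as follows. The coordinate representation, speed of sound, and function names are assumptions, and the correction is applied as a linear phase (a pure delay) in the frequency domain.

```python
import numpy as np

C = 340.0  # speed of sound in m/s (assumed value)

def delay_corrections(target_pos, array_positions):
    """For each microphone array, compute the delay (in seconds) to
    add to its beamformer output so that all outputs are aligned to
    the array farthest from the target area (delay correcting unit 22).
    Positions are 2-D coordinates in meters (illustrative)."""
    target = np.asarray(target_pos, float)
    dists = [np.linalg.norm(np.asarray(p, float) - target)
             for p in array_positions]
    t_max = max(dists) / C                 # reference: farthest array
    return [t_max - dist / C for dist in dists]

def apply_delay(spec, freqs, delay):
    """Apply a pure delay to a frequency-domain beamformer output."""
    return spec * np.exp(-2j * np.pi * freqs * delay)
```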
Note that in a case where the target area TAR is not changed and the distances between the target area TAR and each of the microphone arrays MA1 and MA2 are equal, the delay correcting unit 22 and the spatial coordinate data holding unit 23 can be omitted.
The target area sound power correction coefficient calculating unit 24 calculates a correction coefficient for making the power of the target area sounds at all of the beamformer outputs equal.
Here, as an example of the calculation of the correction coefficient performed by the target area sound power correction coefficient calculating unit 24, the ratio of the power of the target area sound included in the BF output from each of the microphone arrays may be estimated and used as the correction coefficient.
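One hypothetical realization of this estimate is to use the square root of the ratio of the total beamformer output powers as an amplitude correction coefficient. The embodiment leaves the estimation method open (the true ratio should be estimated from the target area sound component only), so this is an illustrative stand-in with an assumed function name.

```python
import numpy as np

def power_correction_coefficient(bf_out_ref, bf_out_other):
    """Return a coefficient that scales the second beamformer output's
    amplitude spectrum so that its total power matches the first
    (target area sound power correction coefficient calculating
    unit 24; simplified stand-in using total output power)."""
    p_ref = np.sum(np.abs(bf_out_ref) ** 2)
    p_other = np.sum(np.abs(bf_out_other) ** 2)
    return np.sqrt(p_ref / p_other)
```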
The target area sound extracting unit 25 extracts the target area sound on the basis of each beamformer output which is output from the delay correcting unit 22 and the correction coefficient which is output from the target area sound power correction coefficient calculating unit 24.
FIG. 9 is a block diagram showing an internal configuration of the directionality forming unit 21 according to the fourth embodiment.
The directionality forming unit 21 has, for each of the microphone arrays MA1 and MA2, the same or corresponding configuration as in the sound source separating apparatus 10A described in the first embodiment, and the corresponding structural elements are denoted by the same reference numerals as in FIG. 1 in the first embodiment.
That is, since the directionality forming unit 21 forms directionality that has a directional direction in a forward direction of the microphone array with respect to the target direction for each of the microphone arrays MA1 and MA2, the directionality forming unit 21 has the internal configuration shown in FIG. 9 for each of the microphone arrays MA1 and MA2.
In FIG. 9, the directionality forming unit 21 according to the fourth embodiment includes a signal adding unit 2, a bidirectionality forming unit 3, a unidirectionality forming unit 4, an overlapped directionality canceling unit 5, and a target signal extracting unit 6.
(E-2) Operation in the Fourth Embodiment
Next, the operation of the sound pickup apparatus 20A according to the fourth embodiment will be described.
A sound emitted from all the sound sources located in the target area TAR is captured by all the microphones M1, M2, and M3 of the microphone arrays MA1 and MA2, which set the target area TAR as a processing target. Note that the microphones M1, M2, and M3 of the microphone arrays MA1 and MA2 also capture a sound from a sound source that is present in an area other than the target area TAR.
The sound signals (analog signals) picked up (captured) by all the microphones M1, M2, and M3 of the first microphone array MA1 are converted into digital signals by the data input unit 1 and are given to the directionality forming unit 21. Similarly, the sound signals (analog signals) picked up (captured) by all the microphones M1, M2, and M3 of the second microphone array MA2 are converted into digital signals by the data input unit 1 and are given to the directionality forming unit 21.
All the sound signals from the first microphone array MA1, which have been converted into digital signals, are subjected to a beamformer process performed by the directionality forming unit 21 such that the directional direction is set to a forward direction of the microphone array MA1 with respect to the direction of the target area TAR, and the beamformer output is given to the delay correcting unit 22. Similarly, all the sound signals from the second microphone array MA2, which have been converted into digital signals, are subjected to a beamformer process performed by the directionality forming unit 21 such that the directional direction is set to a forward direction of the microphone array MA2 with respect to the direction of the target area TAR, and the beamformer output is given to the delay correcting unit 22.
Here, a detailed operation in the directionality forming unit 21 will be described with reference to FIG. 9.
An input signal X11 and an input signal X12, which are output from the microphone M1 and the microphone M2, respectively, located to be horizontal with respect to the target direction, of the first microphone array MA1 are given to the signal adding unit 2. The signal adding unit 2 adds the input signal X11 and the input signal X12 and multiplies the sum by ½ (that is, averages the two signals), so that the target sound component is emphasized.
Further, the input signals X11 and X12 from the microphones M1 and M2 of the first microphone array MA1 are given to the bidirectionality forming unit 3. In the bidirectionality forming unit 3, by use of the input signals X11 and X12, a bidirectional filter having a dead angle in the target direction is formed. As in the first embodiment, the bidirectionality is formed in accordance with the formulas (1) and (3) in which θL=0.
Further, the input signal X12 and an input signal X13 from the microphones M2 and M3 of the first microphone array MA1, the microphones being located in the same direction as the target direction, are given to the unidirectionality forming unit 4. In the unidirectionality forming unit 4, by use of the input signals X12 and X13 which are inputs from the microphones M2 and M3 located in the same direction as the target direction, a unidirectional filter having a dead angle in the target direction is formed. As in the first embodiment, the unidirectionality is formed in accordance with the formulas (1) and (3) in which θL=−π/2.
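Formulas (1) and (3) are not reproduced in this excerpt, so the following is only a generic sketch of a frequency-domain subtraction-type beamformer that places a dead angle at an angle θL, covering the two cases θL = 0 (bidirectionality) and θL = −π/2 (unidirectionality) used above; the function name and the sin-based steering delay are assumptions:

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def null_beamformer(X1, X2, freqs, mic_dist, theta_l):
    """Subtraction-type beamformer: delay one microphone spectrum so a
    source arriving from angle theta_l cancels, forming a dead angle
    there. theta_l = 0 corresponds to the bidirectionality forming
    unit 3 and theta_l = -pi/2 to the unidirectionality forming unit 4
    in the description above. X1, X2 are complex spectra of the two
    microphone signals; freqs gives the frequency of each bin in Hz."""
    tau = mic_dist * np.sin(theta_l) / C  # steering delay toward the null
    return X1 - X2 * np.exp(-2j * np.pi * freqs * tau)
```

With θL = 0 the delay vanishes and the two spectra are subtracted directly, so a source arriving at both microphones simultaneously (the target direction for the horizontally placed pair) is canceled.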
In the overlapped directionality canceling unit 5, a signal component that is commonly included in an amplitude spectrum NBD of an output from the bidirectionality forming unit 3 and an amplitude spectrum NUD of an output from the unidirectionality forming unit 4 is canceled. That is, in the overlapped directionality canceling unit 5, in accordance with the formula (5), an amplitude spectrum NUD1 of an output obtained after subtraction of an overlapped area is obtained by subtracting the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 from the amplitude spectrum NUD of an output from the unidirectionality forming unit 4.
In a case where the amplitude spectrum NUD1 of the output obtained after the subtraction of the overlapped area becomes negative, a flooring process is performed in which the value of the amplitude spectrum NUD1 is replaced by 0 or by a value smaller than the original (immediately preceding) value.
Since the frequency-dependent gain of the directionality formed by the beamformers (BFs) differs according to the interval between the microphones, it is assumed that gain correction is performed on the amplitude spectrum NBD of the output from the bidirectionality forming unit 3 and the amplitude spectrum NUD of the output from the unidirectionality forming unit 4. For example, the overlapped directionality canceling unit 5 may obtain the ratio of the amplitude spectra for each frequency on the basis of the amplitude spectra NBD and NUD, which have the same time axis, and may perform the gain correction by use of a correction coefficient that makes the output powers equal.
To the target signal extracting unit 6, an amplitude spectrum XDS of an output is given as the target sound from the signal adding unit 2, and the amplitude spectrum NBD of the output and the amplitude spectrum NUD1 of the output obtained after the subtraction of the overlapped area are given as the non-target sound from the overlapped directionality canceling unit 5. Then, in the target signal extracting unit 6, in accordance with the formula (6), by subtracting, from the amplitude spectrum XDS of the output from the signal adding unit 2, the amplitude spectrum NBD of the output from the overlapped directionality canceling unit 5 and the amplitude spectrum NUD1 of the output obtained after the subtraction of the overlapped area, an emphasized target sound is extracted.
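The cancellation, flooring, and extraction steps above can be sketched as follows; since formulas (5) and (6) are not reproduced in this excerpt, unit subtraction coefficients are assumed:

```python
import numpy as np


def extract_target(xds, nbd, nud, floor=0.0):
    """Sketch of the amplitude-spectrum pipeline described above.
    xds: output of the signal adding unit 2 (target emphasized),
    nbd: bidirectionality forming unit output, nud: unidirectionality
    forming unit output. Subtraction coefficients of 1 are an
    assumption, since formulas (5) and (6) are not shown here."""
    nud1 = nud - nbd                # cancel the overlapped component, cf. (5)
    nud1 = np.maximum(nud1, floor)  # flooring: negative values -> 0 (or small)
    y = xds - nbd - nud1            # subtract both non-target spectra, cf. (6)
    return np.maximum(y, floor)     # final flooring is a common-practice assumption
```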
As for the second microphone array MA2, input signals X21, X22, and X23 from the microphones M1, M2, and M3 are given to the directionality forming unit 21, and in the same manner as in the case of the first microphone array MA1, an emphasized target sound is extracted only in a forward direction of the second microphone array MA2 with respect to the target direction.
In the delay correcting unit 22, on the basis of the data held in the spatial coordinate data holding unit 23, the difference between the propagation delay time from the target area TAR to the first microphone array MA1 and the propagation delay time from the target area TAR to the second microphone array MA2, which arises from the difference in the distances between the target area TAR and the two microphone arrays, is calculated. Then, at least one of the time axes of the beamformer outputs Xma1(t) and Xma2(t−τ) of the microphone arrays MA1 and MA2 is corrected so as to absorb this temporal difference.
In the above manner, the beamformer outputs Xma1(t) and Xma2(t−τ) having the same time axis are given to the target area sound extracting unit 25 and the target area sound power correction coefficient calculating unit 24.
Further, in the target area sound power correction coefficient calculating unit 24, on the basis of the beamformer outputs Xma1(t) and Xma2(t−τ) having the same time axis, a correction coefficient for making the power of the target area sounds equal in the beamformer outputs Xma1(t) and Xma2(t−τ) is calculated.
In a case of using two microphone arrays MA1 and MA2, for example, the correction coefficient of the target area sound power is calculated using formulas (11) and (12) or formulas (13) and (14).
α1(n) = mode(X2k(n)/X1k(n)), k = 1, 2, …, N  (11)
α2(n) = mode(X1k(n)/X2k(n)), k = 1, 2, …, N  (12)
α1(n) = median(X2k(n)/X1k(n)), k = 1, 2, …, N  (13)
α2(n) = median(X1k(n)/X2k(n)), k = 1, 2, …, N  (14)
Here, X1k(n) and X2k(n) represent the amplitude spectra of the beamformer outputs from the microphone arrays MA1 and MA2, N represents the total number of frequency bins, k represents a frequency bin index, and α1(n) and α2(n) represent the power correction coefficients with respect to each of the beamformer outputs.
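The median variant of the correction coefficients, formulas (13) and (14), can be sketched as follows (the mode variant of formulas (11) and (12) would require binning the continuous-valued ratios, so it is omitted; the function name is illustrative):

```python
import numpy as np


def power_correction_coeffs(X1, X2):
    """Formulas (13) and (14): the median, over the N frequency bins k,
    of the ratio of the two beamformer amplitude spectra X1k(n), X2k(n).
    alpha2 scales X2 so that its target area sound power matches X1's,
    and vice versa for alpha1."""
    ratio = X2 / X1                  # elementwise ratio over frequency bins
    alpha1 = np.median(ratio)        # (13)
    alpha2 = np.median(1.0 / ratio)  # (14)
    return alpha1, alpha2
```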
The target area sound extracting unit 25 corrects each beamformer output by one of the correction coefficients α1(n) and α2(n) supplied from the target area sound power correction coefficient calculating unit 24 and performs a spectral subtraction in accordance with the formulas (15) and (16), thereby extracting the non-target area sound (noise) that is present in the target area direction.
N1(n)=X1(n)−α2(n)X2(n)  (15)
N2(n)=X2(n)−α1(n)X1(n)  (16)
In order to extract a non-target area sound N1(n) that is present in the target area direction when seen from the microphone array MA1, as shown in the formula (15), a spectral subtraction, from the beamformer output X1(n) of the microphone array MA1, of a value obtained by multiplying the beamformer output X2(n) from the microphone array MA2 by the power correction coefficient α2 is performed. Similarly, a non-target area sound N2(n) that is present in the target area direction when seen from the microphone array MA2 is extracted in accordance with the formula (16).
Further, the target area sound extracting unit 25 performs a spectral subtraction of the extracted noise from each beamformer output in accordance with formulas (17) and (18), thereby extracting the target area sound. Here, γ1(n) and γ2(n) are coefficients for changing the intensity at the time of the spectral subtraction.
Y1(n)=X1(n)−γ1(n)N1(n)  (17)
Y2(n)=X2(n)−γ2(n)N2(n)  (18)
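Formulas (15) through (18) can be combined into a short sketch; the flooring of negative spectral values is an assumption carried over from the earlier flooring stage:

```python
import numpy as np


def extract_area_sound(X1, X2, a1, a2, g1=1.0, g2=1.0, floor=0.0):
    """Two-array target area sound extraction, formulas (15)-(18).
    X1, X2: delay-aligned beamformer amplitude spectra of MA1 and MA2;
    a1, a2: power correction coefficients alpha1(n), alpha2(n);
    g1, g2: subtraction intensities gamma1(n), gamma2(n)."""
    N1 = np.maximum(X1 - a2 * X2, floor)  # (15) non-target sound seen from MA1
    N2 = np.maximum(X2 - a1 * X1, floor)  # (16) non-target sound seen from MA2
    Y1 = np.maximum(X1 - g1 * N1, floor)  # (17) target area sound at MA1
    Y2 = np.maximum(X2 - g2 * N2, floor)  # (18) target area sound at MA2
    return Y1, Y2
```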
FIG. 10 shows an image of sound pickup in an area performed by the sound pickup apparatus 20A according to the fourth embodiment. A dotted line in FIG. 10 represents the directionality of a conventional subtraction-type BF using bidirectionality, the BF being proposed in Japanese Application Number 2012-217315, and a shaded portion represents the directionality obtained by the technique according to the fourth embodiment.
As shown in FIG. 10, in each of the microphone arrays MA1 and MA2, the microphones M1 and M2 are disposed to be horizontal with respect to the target direction, and the microphone M3 is disposed on a straight line that intersects with the straight line connecting the microphones M1 and M2 and passes through one of those microphones (here, the microphone M2).
Since the directionality of each of the microphone arrays MA1 and MA2 is formed only in the forward direction, an effect of reverberation from the backward direction can be suppressed. Further, by suppressing non-target area sounds 1 and 2 located in the backward direction of each of the microphone arrays MA1 and MA2 beforehand, the non-target area sounds being denoted by the dotted line in FIG. 10, the SN ratio of picking up a sound in an area can be improved.
A conventional area-sound pickup technique requires the directionalities of the microphone arrays MA1 and MA2 to overlap with each other only in the target area. However, as shown in FIG. 10, the conventional subtraction-type BF using bidirectionality can indeed form a sharp directionality in the target direction, but a straight directionality is formed not only in the forward direction but also in the backward direction of the microphone arrays MA1 and MA2 with respect to the target direction. Accordingly, even when a sound is to be picked up in an area between the two microphone arrays MA1 and MA2, the directionalities of the microphone arrays MA1 and MA2 overlap with each other along the entire line, resulting in a sound pickup of all the areas that are present on the straight line connecting the two microphone arrays MA1 and MA2.
However, in a case of the fourth embodiment, the directionalities of the microphone arrays MA1 and MA2 are formed only in the forward direction of the target area TAR; thus, it is possible to pick up a sound in an area between the two microphone arrays MA1 and MA2.
FIG. 11 shows another image of sound pickup in an area performed by the sound pickup apparatus 20A according to the fourth embodiment. In FIG. 11, the two microphone arrays MA1 and MA2 are disposed to face each other with the target area TAR interposed therebetween.
In this case, when the directionalities of the two microphone arrays MA1 and MA2 are formed, the directionality of the microphone array MA1 includes the target area sound and a non-target area sound 2.
Further, the directionality of the microphone array MA2 includes the target area sound and a non-target area sound 1.
Since the non-target area sound components included in the directionalities are different, only the target area sound that is commonly included therein can be extracted. An area-sound pickup with the microphone arrays MA1 and MA2 disposed in this manner can further suppress the effects of reverberation.
That is, in a case where the area-sound pickup is performed by use of the two microphone arrays MA1 and MA2, the angle made by the directionalities of the microphone arrays MA1 and MA2 is 90° in the conventional area-sound pickup technique proposed in Japanese Application Number 2012-217315, whereas it is 180° according to the fourth embodiment. Accordingly, the reflected non-target area sound is less likely to be mixed into the directionalities of the microphone arrays MA1 and MA2 at the same time, and the area-sound pickup performance is less likely to degrade.
(E-3) Effects of the Fourth Embodiment
As described above, according to the fourth embodiment, by use of a microphone array including three omnidirectional microphones, the directionality is formed only in the forward direction of the target area, and the area-sound pickup can suppress the effects of reverberation and improve the SN ratio.
(F) Fifth Embodiment
Next, a fifth embodiment of a sound source separating apparatus, a sound source separating program, a sound pickup apparatus, and a sound pickup program according to an embodiment of the present invention will be described in detail with reference to the appended drawings.
In a case of using microphone arrays each including three microphones, a change in combination of the microphones that form the bidirectionality or the unidirectionality can change the direction in which the directionality is formed.
Accordingly, in the fifth embodiment, an embodiment will be shown in which a change in the directional direction of each microphone array enables sound pickup of another area without moving the microphone arrays.
(F-1) Configuration of the Fifth Embodiment
FIG. 12 is a block diagram showing a configuration of a sound pickup apparatus 20B according to the fifth embodiment. The same or corresponding parts as in FIG. 8 according to the fourth embodiment are denoted by the same reference numerals.
In FIG. 12, the sound pickup apparatus 20B according to the fifth embodiment includes a first microphone array MA1, a second microphone array MA2, a data input unit 1, a directionality forming unit 21, a delay correcting unit 22, a spatial coordinate data holding unit 23, a target area sound power correction coefficient calculating unit 24, and a target area sound extracting unit 25, and in addition, an area selecting unit 26 and an area switching unit 27.
The area selecting unit 26 receives information on the target area TAR that is selected by a user through a GUI, for example, and gives the information to the area switching unit 27. The number of the target areas TAR is not limited to one, and a plurality of target areas can be selected at the same time.
On the basis of the information on the target area TAR given from the area selecting unit 26, the area switching unit 27 acquires position information of the target area TAR, of each of the microphone arrays MA1 and MA2, and of the microphones M1, M2, and M3 included in each of the microphone arrays MA1 and MA2, from the spatial coordinate data holding unit 23, determines the combination of microphone arrays and microphones that is necessary for forming the directionality toward the target area TAR, and controls the signals to be input to the directionality forming unit 21.
(F-2) Operation in the Fifth Embodiment
Operations of the area selecting unit 26 and the area switching unit 27 in the operation of the sound pickup apparatus 20B according to the fifth embodiment are different from those in the sound pickup apparatus 20A according to the fourth embodiment; therefore, the operations of the area selecting unit 26 and the area switching unit 27 will be described in detail.
The area selecting unit 26 receives information on one or more target areas TAR that are selected by the user through a GUI, for example, and transmits the information to the area switching unit 27.
On the basis of the information on the target area transmitted from the area selecting unit 26, the area switching unit 27 acquires, from the spatial coordinate data holding unit 23, position information of the selected target area TAR, position information of each of the microphone arrays MA1 and MA2, and position information of the microphones M1, M2, and M3 included in each of the microphone arrays. Further, the area switching unit 27 determines the combination of microphone arrays and microphones that is necessary for forming the directionality toward the target area, and controls the signals to be input to the directionality forming unit 21.
FIG. 13 shows an example of a situation in which two sound pickup areas are switched by use of the two microphone arrays MA1 and MA2, each including three microphones, according to the fifth embodiment.
The microphone array MA1 includes microphones M11, M12, and M13, and the microphone array MA2 includes microphones M21, M22, and M23.
For example, when a target area A is selected by the user, selection information of the target area A is given from the area selecting unit 26 to the area switching unit 27. The area switching unit 27 acquires position information of the selected target area A from the spatial coordinate data holding unit 23.
In this case, the microphone arrays MA1 and MA2, which can form the directionality toward the target area A, are selected on the basis of the information from the area selecting unit 26, and position information of the microphone arrays MA1 and MA2 and position information of the microphones M11, M12, and M13 of the microphone array MA1 and of the microphones M21, M22, and M23 of the microphone array MA2 are acquired from the spatial coordinate data holding unit 23. As a selection method of the microphone arrays MA1 and MA2, for example, in a case where a plurality of microphone arrays are disposed, any two microphone arrays MA1 and MA2 may be selected, or the microphone arrays MA1 and MA2 that can form the directionality according to the target area may be determined beforehand.
Next, the area switching unit 27 controls input signals to the directionality forming unit 21 such that the bidirectionality is formed by combination of the microphones M12 and M13 of the microphone array MA1 and the microphones M22 and M23 of the microphone array MA2 and the unidirectionality is formed by combination of the microphones M11 and M12 of the microphone array MA1 and the microphones M21 and M22 of the microphone array MA2.
In accordance with an instruction from the area switching unit 27, the directionality forming unit 21 inputs the input signals from the data input unit 1 to the bidirectionality forming unit 3 and the unidirectionality forming unit 4, thereby forming the bidirectionality and the unidirectionality.
Meanwhile, in a case where a target area B is selected, the area switching unit 27 controls input signals to the directionality forming unit 21 such that the bidirectionality is formed by combination of the microphones M11 and M12 of the microphone array MA1 and the microphones M21 and M22 of the microphone array MA2 and the unidirectionality is formed by combination of the microphones M12 and M13 of the microphone array MA1 and the microphones M22 and M23 of the microphone array MA2, thereby switching the sound pickup area. Also in this case, the directionality forming unit 21 inputs the input signals from the data input unit 1 to the bidirectionality forming unit 3 and the unidirectionality forming unit 4 in accordance with an instruction from the area switching unit 27, thereby forming the bidirectionality and the unidirectionality.
Further, in a case where the target area A and the target area B are selected as target areas at the same time, the area switching unit 27 issues instructions by selecting combinations of microphone arrays and microphones in parallel for each of the selected target areas. Thus, the bidirectionality and the unidirectionality can be formed for each of the selected target areas.
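The switching rule for the target areas A and B described above can be sketched as a hypothetical routing table; the structure and names are illustrative, not from the patent:

```python
# Hypothetical routing table for the fifth embodiment's area switching:
# for each selectable area, which microphone pair of each array feeds the
# bidirectionality ("bi") and unidirectionality ("uni") forming units.
AREA_ROUTING = {
    "A": {"MA1": {"bi": ("M12", "M13"), "uni": ("M11", "M12")},
          "MA2": {"bi": ("M22", "M23"), "uni": ("M21", "M22")}},
    "B": {"MA1": {"bi": ("M11", "M12"), "uni": ("M12", "M13")},
          "MA2": {"bi": ("M21", "M22"), "uni": ("M22", "M23")}},
}


def route(selected_areas):
    """Return the microphone routing for each selected area; selecting
    several areas at once yields independent routings in parallel,
    mirroring the simultaneous selection of the target areas A and B."""
    return {area: AREA_ROUTING[area] for area in selected_areas}
```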
(F-3) Effects of the Fifth Embodiment
As described above, according to the fifth embodiment, in addition to the effects of the fourth embodiment, by changing the directional direction of each microphone array, it is possible to pick up a sound in another area without moving the microphone arrays.
(G) Other Embodiments
Although a variety of modified embodiments are described in the above embodiments, the following modified embodiments can be further given.
Each of the above-described embodiments includes the signal adding unit 2; however, the signal adding unit 2 may be omitted in a case where a signal captured by the microphone M1 or M2 is used as the input signal to be given to the target signal extracting unit 6.
Although the fourth and fifth embodiments show cases where the microphone array in which three microphones are disposed at the vertexes of an isosceles right triangle is used, a microphone array in which three microphones are disposed at the vertexes of a regular triangle may be used. In this case, the directionality forming unit 21 includes the signal adding unit 2, the bidirectionality forming unit 3, the unidirectionality forming unit 4 (4-1 and 4-2), the overlapped directionality canceling unit 5, and the target signal extracting unit 6, which are described in the second or third embodiment, and the target signal may be extracted through the operations described in the second or third embodiment.
Although the fourth and fifth embodiments show two microphone arrays, three or more microphone arrays may be used. For example, in a case where three microphone arrays are used, the target area sound may be determined from three target area sounds in total, which are the target area sound obtained from the first and second microphone arrays by the method shown in the fourth and fifth embodiments and the target area sounds obtained from the second microphone array and a third microphone array by the same method.
In each of the above embodiments, the sound signal captured by the microphone is processed in real time; however, the sound signal captured by the microphone may be stored in a storage medium and is then read out from the storage medium to be processed, thereby obtaining the emphasized signal of the target sound or the target area sound. In a case where a storage medium is used in this manner, the position where the microphone is set may be away from the position where the process of extracting the target sound or the target area sound is performed. Similarly, even in a case where the process is performed in real time, the position where the microphone is set may be away from the position where the process of extracting the target sound or the target area sound is performed, and a signal may be supplied to a remote area by communication.
The case where the above-described storage medium or communication is used is also included in the concept of the sound pickup apparatus according to an embodiment of the present invention.
Heretofore, preferred embodiments of the present invention have been described in detail with reference to the appended drawings, but the present invention is not limited thereto. It should be understood by those skilled in the art that various changes and alterations may be made without departing from the spirit and scope of the appended claims.


1. A sound source separating apparatus comprising:
a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of an isosceles right triangle;
a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a sound signal picked up by two microphones which are located in a same direction as the target direction, among the three microphones;
an overlapped directionality canceling unit configured to cancel a signal component overlap between an output from the bidirectionality forming unit and an output from the unidirectionality forming unit by performing a spectral subtraction of the output from the unidirectionality forming unit from the output from the bidirectionality forming unit or by performing a spectral subtraction of the output from the bidirectionality forming unit from the output from the unidirectionality forming unit; and
a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of the output from the overlapped directionality canceling unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones located to be horizontal with respect to the target direction.
2. A sound source separating apparatus comprising:
a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle;
a unidirectionality forming unit configured to form two unidirectionalities having dead angles of +60° and −60° with respect to the target direction by use of a sound signal picked up by a combination of two microphones which are located at angles of +60° and −60° with respect to the target direction, among the three microphones;
an overlapped directionality canceling unit configured to cancel a signal component overlap between an output from the bidirectionality forming unit and an output from the unidirectionality forming unit by performing a spectral subtraction of the output from the unidirectionality forming unit from the output from the bidirectionality forming unit or by performing a spectral subtraction of the output from the bidirectionality forming unit from the output from the unidirectionality forming unit; and
a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of the output from the overlapped directionality canceling unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones located to be horizontal with respect to the target direction.
3. A sound source separating apparatus comprising:
a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a regular triangle;
a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a signal obtained by averaging sound signals picked up by two microphones which are located to be horizontal with respect to the target direction and a sound signal picked up by the other microphone, among the three microphones;
an overlapped directionality canceling unit configured to cancel a signal component overlap between an output from the bidirectionality forming unit and an output from the unidirectionality forming unit by performing a spectral subtraction of the output from the unidirectionality forming unit from the output from the bidirectionality forming unit or by performing a spectral subtraction of the output from the bidirectionality forming unit from the output from the unidirectionality forming unit; and
a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of the output from the overlapped directionality canceling unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones located to be horizontal with respect to the target direction.
4. A sound source separating apparatus comprising:
a bidirectionality forming unit configured to form a bidirectionality having a dead angle in a target direction by use of a sound signal picked up by two microphones which are located to be horizontal with respect to the target direction, among three microphones disposed at vertexes of a triangle;
a unidirectionality forming unit configured to form a unidirectionality having a dead angle in the target direction by use of a sound signal picked up by two microphones among the three microphones;
an overlapped directionality canceling unit configured to cancel a signal component overlap between an output from the bidirectionality forming unit and an output from the unidirectionality forming unit by performing a spectral subtraction of the output from the unidirectionality forming unit from the output from the bidirectionality forming unit or by performing a spectral subtraction of the output from the bidirectionality forming unit from the output from the unidirectionality forming unit; and
a target sound extracting unit configured to extract a target sound by performing a spectral subtraction of the output from the overlapped directionality canceling unit from either one of the sound signals picked up by the two microphones located to be horizontal with respect to the target direction or a signal obtained by averaging the sound signals picked up by the two microphones located to be horizontal with respect to the target direction.

 

 

Condenser microphone // US9445188
a condenser microphone that provides a balanced output of audio signals from initial steps of a diaphragm and a fixed electrode is provided. the condenser microphone includes: a condenser microphone unit including a diaphragm being arranged opposite a fixed electrode; a first impedance converter being connected to the fixed electrode of the condenser microphone unit and outputting a first electric signal generated in the fixed electrode; and a second impedance converter being connected to the diaphragm of the condenser microphone unit and outputting a second electric signal generated in the diaphragm. by this structure, balanced outputs of the audio signals having phases reverse to each other are provided by the first and second impedance converters immediately after the condenser microphone unit.
a method of processing an audio signal, the method including receiving a downmix signal and a first information, the downmix signal including at least one object, the first information including object information indicating an attribute of the at least one object; receiving a second information, the second information including external preset information and applied object number information, the external preset information being an external input and including an external preset rendering parameter and external preset metadata, the applied object number information indicating a number of objects to which the external preset information is applied; generating downmix processing information controlling panning or gain of the downmix signal by using the object information and the external preset information based on the applied object number information; and modifying the downmix signal by using the downmix processing information.
a system for audio content delivery to an in-the-ear device from a local computing device. also, a system for audio content delivery to an in-the-ear device from a content delivery network. the in-the-ear device is sized and shaped such that it universally and ergonomically fits into the human ear without slipping out and provides the user with a comfortable fit. the in-the-ear device is secured in the user's ear taking advantage of the natural curvature of the human to provide support and shift the center of gravity from outside the ear to further inside the pinna to prevent the device from slipping out while retaining a high level of comfort.
according to embodiments described herein, a circuit includes an interface circuit configured to be coupled to a transducer and a detection circuit. the interface circuit is configured to provide a digital output signal to a signal input terminal of a processing circuit. the detection circuit is configured to receive the digital output signal and provide a low power enable signal to a low power enable terminal of the processing circuit. in the various embodiments, the digital output signal is based on a transduced signal from the transducer and the low power enable signal is determined by comparing the digital output signal with a first threshold.
an actuation system for generating a physical effect, the system comprising at least one array of translating elements each constrained to travel alternately back and forth along a respective axis, toward first and second extreme positions respectively, in response to activation of first and second forces respectively; and a controller operative to use the first and second forces to selectably latch at least one subset of said translating elements into the first and second extreme positions respectively.
To top