Method and apparatus for determining dominant sound source directions in a higher order Ambisonics representation of a sound field

 

In Higher Order Ambisonics, a problem is the tracking of time variant directions of dominant sound sources. The following processing is carried out: from a current time frame of HOA coefficients, estimating a directional power distribution of dominant sound sources, from said directional power distribution and from an a-priori probability function for dominant sound source directions, computing an a-posteriori probability function for said dominant sound source directions, depending on said a-posteriori probability function and on dominant sound source directions for the previous time frame, searching and assigning dominant sound source directions for said current time frame of said HOA coefficients.

 

 

This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2013/074039, filed Nov. 18, 2013, which was published in accordance with PCT Article 21(2) on Jun. 5, 2014 in English and which claims the benefit of European patent application No. 12306485.9, filed Nov. 29, 2012.
The invention relates to a method and to an apparatus for determining dominant sound source directions in a Higher Order Ambisonics representation of a sound field.
BACKGROUND
Higher Order Ambisonics (HOA) is a representation of the acoustic pressure of a sound field within the vicinity of the origin of a virtual coordinate system in the three dimensional space, which is called the sweet spot. Such HOA representation is independent of a specific loudspeaker set-up, in contrast to channel-based techniques like stereo or surround. But this flexibility is at the expense of a decoding process required for playback of the HOA representation on a particular loudspeaker set-up.
A sound field is generated in a room or outdoors by one or more sound sources, e.g. by a single voice or musical instrument, by an orchestra, or by noise producers such as traffic and/or trees in the wind. As soon as any sound waves are generated, a sound field is produced.
HOA is based on the description of the complex amplitudes of the air pressure for individual angular wave numbers for positions in the vicinity of a desired listener position, using a truncated Spherical Harmonics expansion. The spatial resolution of this representation improves with a growing maximum order N of the expansion.
A problem is the tracking of the time variant directions (with respect to the coordinate origin) of the dominant sound sources. Such a problem arises for example in the context of the compression of an HOA representation based on its decomposition into a directional and an ambient component, which processing has been described in patent application EP 12305537.8.
It is assumed that from the HOA representation a temporal sequence of spherical likelihood functions is computed that provides the likelihood for the occurrence of dominant sound sources at a high number of predefined directions. Such a likelihood function can be the directional power distribution of the dominant sources, cf. EP 12305537.8.
Then the problem to be solved is determining from the spherical likelihood functions a number of temporal sequences of direction estimates related to the dominant sound sources, which can be used to extract the directional component from the HOA sound field representation. The particular challenges of this problem are two-fold: to provide relatively smooth temporal trajectories of direction estimates, i.e. to avoid outliers in the direction trajectories, which might occur due to direction estimation errors, and to accurately capture abrupt direction changes or directions related to onsets of new directional signals.
In EP 12305537.8 an estimation of temporal sequences of direction estimates related to the dominant sound sources is described. Its principle is illustrated in FIG. 1. The processing starts in step or stage 11 with estimating from a time frame $C(l)$ of HOA coefficients a directional power distribution $\sigma^2(l)$ with respect to the dominant sound sources, where $l \in \mathbb{N}$ denotes the frame index. This directional power distribution is computed for a predefined number $Q$ of discrete test directions $\Omega_q$, $q=1,\ldots,Q$, which are nearly equally distributed on the unit sphere. Each test direction $\Omega_q$ is defined as a vector containing an inclination angle $\theta_q \in [0,\pi]$ and an azimuth angle $\phi_q \in [0,2\pi]$ according to
$$\Omega_q := (\theta_q, \phi_q)^T. \tag{1}$$
The directional power distribution is represented by the vector
$$\sigma^2(l) := \left(\sigma^2(l,\Omega_1), \ldots, \sigma^2(l,\Omega_Q)\right)^T, \tag{2}$$
whose components $\sigma^2(l,\Omega_q)$ denote the joint power of all dominant sound sources related to the direction $\Omega_q$ for the $l$-th time frame.
An example of a directional power distribution resulting from two sound sources obtained from an HOA representation of order 4 is illustrated in FIG. 2, where the unit sphere is unrolled so as to represent the inclination angle θ on the y-axis and the azimuth angle φ on the x-axis. The brightness indicates the power on a logarithmic scale (i.e. in dB). Note the spatial power dispersion (i.e. the limited spatial resolution) resulting from a limited order of 4 of the underlying HOA representation.
Depending on the estimated directional power distribution $\sigma^2(l)$ of the dominant sound sources, in FIG. 1 a predefined number $D$ of dominant sound source directions $\hat{\Omega}_{\mathrm{DOM},1}(l), \ldots, \hat{\Omega}_{\mathrm{DOM},D}(l)$ is computed in step/stage 12. These directions are arranged in the matrix $A_{\hat{\Omega}}(l)$ as
$$A_{\hat{\Omega}}(l) := \left[\hat{\Omega}_{\mathrm{DOM},1}(l) \;\ldots\; \hat{\Omega}_{\mathrm{DOM},D}(l)\right]. \tag{3}$$
Thereafter in step/stage 13 the estimated directions $\hat{\Omega}_{\mathrm{DOM},d}(l)$, $d=1,\ldots,D$, are assigned to the appropriate smoothed directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ from the previous frame, and are smoothed with them in order to obtain the smoothed directions $\hat{\Omega}_{\mathrm{DOM},d}(l)$. The smoothed directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ from the previous frame are determined from matrix $A_{\hat{\Omega}}(l-1)$ output from HOA coefficient frame delay 14, which receives $A_{\hat{\Omega}}(l)$ at its input. Such smoothing is accomplished by computing the exponentially-weighted moving average with a constant smoothing factor. The smoothed directions are arranged in the matrix $A_{\hat{\Omega}}(l)$ output from step/stage 13 as follows:
$$A_{\hat{\Omega}}(l) := \left[\hat{\Omega}_{\mathrm{DOM},1}(l) \;\ldots\; \hat{\Omega}_{\mathrm{DOM},D}(l)\right]. \tag{4}$$
EP 2469741 A1 describes a method for compression of HOA representations by using a transformation into signals of general plane waves which are coming from pre-defined directions.
INVENTION
The major problem with this processing is that, due to the constant smoothing factor, it is not possible to capture accurately abrupt direction changes or onsets of new dominant sounds. Although a possible option would be to employ an adaptive smoothing factor, a major remaining problem is how to adapt the factor exactly.
A problem to be solved by the invention is to determine from spherical likelihood functions temporal sequences of direction estimates related to dominant sound sources, which can be used for extracting the directional component from a HOA sound field representation. This problem is solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2.
The invention improves the robustness of the direction tracking of multiple dominant sound sources for a Higher Order Ambisonics representation of the sound field. In particular, it provides smooth trajectories of direction estimates and contributes to the accurate capture of abrupt direction changes or directions related to onsets of new directional signals.
“Dominant” means that (for a short period of time) the respective sound source contributes to the total sound field by creating a general acoustic plane wave with high power from the direction of arrival. That is why the directional power distribution of the total sound field is analysed for the direction tracking.
More generally, the invention can be used for tracking arbitrary objects (not necessarily sound sources) for which a directional likelihood function is available.
The invention overcomes the two above-mentioned problems: it provides relatively smooth temporal trajectories of direction estimates and it is able to capture abrupt direction changes or onsets of new directional signals. The invention uses a simple source movement prediction model and combines its information with the temporal sequence of spherical likelihood functions by applying the Bayesian learning principle.
In principle, the inventive method is suited for determining dominant sound source directions in a Higher Order Ambisonics representation denoted HOA of a sound field, said method including the steps:
    • from a current time frame of HOA coefficients, estimating a directional power distribution with respect to dominant sound sources;
    • from said directional power distribution and from an a-priori probability function for dominant sound source directions, computing an a-posteriori probability function for said dominant sound source directions;
    • depending on said a-posteriori probability function and on dominant sound source directions for the previous time frame of said HOA coefficients, searching and assigning dominant sound source directions for said current time frame of said HOA coefficients,
    • wherein said a-priori probability function is computed from a set of estimated sound source movement angles and from said dominant sound source directions for the previous time frame of said HOA coefficients,
    • and wherein said set of estimated sound source movement angles is computed from said dominant sound source directions for the previous time frame of said HOA coefficients and from dominant sound source directions for the penultimate time frame of said HOA coefficients.
In principle the inventive apparatus is suited for determining dominant sound source directions in a Higher Order Ambisonics representation denoted HOA of a sound field, said apparatus including:
    • means being adapted for estimating from a current time frame of HOA coefficients a directional power distribution with respect to dominant sound sources;
    • means being adapted for computing from said directional power distribution and from an a-priori probability function for dominant sound source directions an a-posteriori probability function for said dominant sound source directions;
    • means being adapted for searching and assigning, depending on said a-posteriori probability function and on dominant sound source directions for the previous time frame of said HOA coefficients, dominant sound source directions for said current time frame of said HOA coefficients;
    • means being adapted for computing said a-priori probability function from a set of estimated sound source movement angles and from said dominant sound source directions for the previous time frame of said HOA coefficients;
    • means being adapted for computing said set of estimated sound source movement angles from said dominant sound source directions for the previous time frame of said HOA coefficients and from dominant sound source directions for the penultimate time frame of said HOA coefficients.
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
DRAWINGS
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
FIG. 1 known estimation of dominant source directions for HOA signals;
FIG. 2 exemplary power distribution on the sphere resulting from two sound sources obtained from an HOA representation of order 4;
FIG. 3 basic block diagram of the inventive direction estimation processing;
FIG. 4 a relationship between a concentration parameter and a source movement angle in accordance with various embodiments;
FIG. 5 a shape of a von Mises-Fisher distribution around a mean direction.
EXEMPLARY EMBODIMENTS
In the block diagram of the inventive dominant sound source direction estimation processing depicted in FIG. 3, like in FIG. 1, the directional power distribution $\sigma^2(l)$ with respect to the dominant sound sources is computed from the time frame $C(l)$ of HOA coefficients in a step or stage 31 for estimation of directional power distribution. However, the directions of the dominant sound sources $\hat{\Omega}_{\mathrm{DOM},d}(l)$, $d=1,\ldots,D$, are not computed directly from the directional power distribution $\sigma^2(l)$ as in step/stage 12 in FIG. 1, but from an a-posteriori probability function $P_{\mathrm{POST}}(l,\Omega_q)$ calculated in step/stage 32, which provides the posterior probability that any of the dominant sound sources is located at any test direction $\Omega_q$ at time frame $l$. The values of the posterior probability function for all test directions at a specific time frame $l$ are summarized in the vector $P_{\mathrm{POST}}(l)$ as follows:
$$P_{\mathrm{POST}}(l) := \left[P_{\mathrm{POST}}(l,\Omega_1) \;\ldots\; P_{\mathrm{POST}}(l,\Omega_Q)\right]. \tag{5}$$
There is no explicit smoothing of the estimated directions $\hat{\Omega}_{\mathrm{DOM},d}(l)$, but rather an implicit smoothing which is performed in the computation of the posterior probability function. Advantageously, this implicit smoothing can be regarded as a smoothing with an adaptive smoothing constant, where the smoothing constant is chosen automatically and optimally depending on a sound source movement model.
The a-posteriori probability function $P_{\mathrm{POST}}(l)$ is computed in step/stage 32 according to the Bayesian rule from the directional power distribution $\sigma^2(l)$ and from an a-priori probability function $P_{\mathrm{PRIO}}(l,\Omega_q)$, which predicts, depending on the knowledge at frame $l-1$, the probability that any of the dominant sound sources is located at any test direction $\Omega_q$ at time frame $l$.
The term “a-priori probability” denotes knowledge about the prior distribution (see e.g. http://en.wikipedia.org/wiki/A_priori_probability) and is well established in the context of Bayesian data analysis, see e.g. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin, “Texts in Statistical Science, Bayesian Data Analysis”, Second Edition, Chapman&Hall/CRC, 29 Jul. 2003. In the context of this application it means the probability that any of the dominant sound sources is located at any test direction Ωq at time frame l temporally before the observation of the l-th frame.
In Bayesian inference, Bayes' rule is used for updating the probability estimate for a hypothesis as additional evidence is acquired, cf. http://en.wikipedia.org/wiki/Bayesian_inference.
The term “a-posteriori probability” denotes the conditional probability that is assigned after the relevant evidence is taken into account (see e.g. http://en.wikipedia.org/wiki/A_posteriori_probability) and is also well established in the context of Bayesian data analysis. In the context of the invention it means the posterior probability that any of the dominant sound sources is located at any test direction Ωq at time frame l temporally after the observation of the l-th frame.
The values of the a-priori probability function for all test directions at a specific time frame l are calculated in step/stage 37 and are summarized in the vector PPRIO(l) as follows:
$$P_{\mathrm{PRIO}}(l) := \left[P_{\mathrm{PRIO}}(l,\Omega_1) \;\ldots\; P_{\mathrm{PRIO}}(l,\Omega_Q)\right]. \tag{6}$$
Step/stage 37 receives as input signals the matrix $A_{\hat{\Omega}}(l-1)$ from a frame delay 34, which in turn receives matrix $A_{\hat{\Omega}}(l)$ from a step or stage 33 for search and assignment of dominant sound source directions, and the vector $a_{\hat{\Theta}}(l-1)$ from a source movement angle estimation step or stage 36.
The a-priori probability function $P_{\mathrm{PRIO}}(l,\Omega_q)$ computed in step/stage 37 is based on a simplified sound source movement prediction model calculated in step/stage 36, which requires estimates of the dominant sound source directions for the previous time frame $l-1$ of the HOA coefficients, i.e. $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, $d=1,\ldots,D$, represented by matrix $A_{\hat{\Omega}}(l-1)$, as well as estimates of the angles $\hat{\Theta}_d(l-1)$, $d=1,\ldots,D$, of sound source movements from penultimate frame $l-2$ to previous frame $l-1$ of the HOA coefficients. These sound source movement angles are defined by
$$\hat{\Theta}_d(l-1) := \angle\left(\hat{\Omega}_{\mathrm{DOM},d}(l-1),\, \hat{\Omega}_{\mathrm{DOM},d}(l-2)\right) \tag{7}$$
and are arranged in vector $a_{\hat{\Theta}}(l-1)$ as follows:
$$a_{\hat{\Theta}}(l-1) := \left[\hat{\Theta}_1(l-1) \;\ldots\; \hat{\Theta}_D(l-1)\right]^T. \tag{8}$$
The dominant sound source directions at time frame $l-2$, i.e. $\hat{\Omega}_{\mathrm{DOM},d}(l-2)$, $d=1,\ldots,D$, are represented by matrix $A_{\hat{\Omega}}(l-2)$, which is received via frame delay 35 from the output of frame delay 34.
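The movement angles of equation (7) can be illustrated by a minimal sketch. It assumes that each direction is stored as an (inclination, azimuth) pair in radians, as in equation (1); the function names `angle_between` and `movement_angles` are hypothetical and not taken from the application.

```python
import math

def angle_between(omega_a, omega_b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    theta_a, phi_a = omega_a
    theta_b, phi_b = omega_b
    # Cosine of the angle between the two corresponding unit vectors.
    cos_angle = (math.cos(theta_a) * math.cos(theta_b)
                 + math.sin(theta_a) * math.sin(theta_b) * math.cos(phi_a - phi_b))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def movement_angles(dirs_prev, dirs_penult):
    """Equation (7) for every source d; the result holds the entries of the vector in (8)."""
    return [angle_between(a, b) for a, b in zip(dirs_prev, dirs_penult)]

# Illustrative values only: source 1 moved by 90 degrees, source 2 did not move.
prev = [(math.pi / 2, 0.0), (math.pi / 4, 1.0)]
penult = [(math.pi / 2, math.pi / 2), (math.pi / 4, 1.0)]
angles = movement_angles(prev, penult)
```

The clamping step is a defensive choice for floating-point arithmetic and is not part of the described method.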
Source Movement Prediction Model
The source movement prediction model and the respective computation of the a-priori probability function calculated in step/stage 37 are determined as follows.
A statistical source movement prediction model is assumed. For simplifying the explanation of this model, the single source case is considered first, and the more relevant multi-source case is described afterwards.
Single-Source Case
It is assumed that only the $d$-th sound source, denoted by $s_d$, of a total of $D$ sound sources is tracked. It is further assumed that an estimate $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ of its direction at time frame $l-1$ is available, and additionally an estimate $\hat{\Theta}_d(l-1)$ of its movement angle covered between the time frames $l-2$ and $l-1$.
The predicted probability of the direction of $s_d$ at time frame $l$ is assumed to be given by the following discrete von Mises-Fisher distribution (see the description of that distribution further below for a detailed explanation):
$$P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q) := P_{\tilde{\Omega}_{\mathrm{DOM},d}(l)\,\mid\,\tilde{\Omega}_{\mathrm{DOM},d}(l-1),\,\hat{\Theta}_d(l-1)}(\Omega_q) \tag{9}$$
$$:= \begin{cases} \dfrac{\kappa_d(l-1)}{Q\cdot\sinh(\kappa_d(l-1))}\cdot\exp\{\kappa_d(l-1)\cdot\cos(\Theta_{q,d})\} & \text{if } \kappa_d(l-1) \neq 0 \\[2ex] \dfrac{1}{Q} & \text{if } \kappa_d(l-1) = 0. \end{cases} \tag{10}$$
In equations (9) and (10), $\tilde{\Omega}_{\mathrm{DOM},d}(l)$ denotes the discrete random variable indicating the direction of the $d$-th source at the $l$-th time frame, which can only have the values $\Omega_q$, $q=1,\ldots,Q$. Hence, formally the right hand side expression in (9) denotes the probability with which the random variable $\tilde{\Omega}_{\mathrm{DOM},d}(l)$ assumes the value $\Omega_q$, given that the values $\tilde{\Omega}_{\mathrm{DOM},d}(l-1)$ and $\hat{\Theta}_d(l-1)$ are known.
In equation (10), $\Theta_{q,d}$ denotes the angle distance between the estimated direction $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ and the test direction $\Omega_q$, which is expressed as follows:
$$\Theta_{q,d} := \angle\left(\Omega_q,\, \hat{\Omega}_{\mathrm{DOM},d}(l-1)\right). \tag{11}$$
The concentration of the distribution around the mean direction is determined by the concentration parameter $\kappa_d(l-1)$. The concentration parameter determines the shape of the von Mises-Fisher distribution. For $\kappa_d(l-1)=0$, the distribution is uniform on the sphere. The concentration increases with the value of $\kappa_d(l-1)$. For $\kappa_d(l-1)>0$, the distribution is uni-modal and circular symmetric, and centred about the mean direction $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$. The variable $\kappa_d(l-1)$ can be computed from the movement angle estimate $\hat{\Theta}_d(l-1)$. An example for such computation is presented below.
The a-priori probability function $P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)$ satisfies
$$\sum_{q=1}^{Q} P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q) = 1. \tag{12}$$
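The single-source a-priori probability of equation (10) can be sketched as follows. The sketch assumes a Fibonacci lattice as the set of nearly uniformly distributed test directions, which is an illustrative choice and not prescribed by the application; the names `fibonacci_grid`, `angle_between` and `single_source_prior` are hypothetical. On such a finite grid the sum in equation (12) is met only approximately.

```python
import math

def fibonacci_grid(num_dirs):
    """Nearly uniform test directions (inclination, azimuth); the lattice is an assumption."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    grid = []
    for q in range(num_dirs):
        z = 1.0 - (2.0 * q + 1.0) / num_dirs  # equal-area bands in z = cos(inclination)
        grid.append((math.acos(z), (q * golden_angle) % (2.0 * math.pi)))
    return grid

def angle_between(omega_a, omega_b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    cos_angle = (math.cos(omega_a[0]) * math.cos(omega_b[0])
                 + math.sin(omega_a[0]) * math.sin(omega_b[0])
                 * math.cos(omega_a[1] - omega_b[1]))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def single_source_prior(grid, omega_prev, kappa):
    """Equation (10): discrete von Mises-Fisher prior centred on omega_prev."""
    num_dirs = len(grid)
    if kappa == 0.0:
        return [1.0 / num_dirs] * num_dirs  # uniform prior
    scale = kappa / (num_dirs * math.sinh(kappa))
    return [scale * math.exp(kappa * math.cos(angle_between(omega_q, omega_prev)))
            for omega_q in grid]

grid = fibonacci_grid(900)
prior = single_source_prior(grid, grid[100], 8.0)
uniform = single_source_prior(grid, grid[100], 0.0)
```

Here the previous direction is placed on a grid point for simplicity, so the prior peaks at that grid index.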
Computation of Concentration Parameter
One way of computing the concentration parameter is to postulate that the ratio of the values of the a-priori probability evaluated at $\hat{\Omega}_{\mathrm{DOM},d}(l-2)$ and $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ equals a constant value $C_R$:
$$\frac{P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}\left(\hat{\Omega}_{\mathrm{DOM},d}(l-2)\right)}{P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}\left(\hat{\Omega}_{\mathrm{DOM},d}(l-1)\right)} \overset{!}{=} C_R, \tag{13}$$
where $0 < C_R < 1$ because the a-priori probability has its maximum at $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$. By using equations (10) and (7), equation (13) can be reformulated as
$$\exp\left\{\kappa_d(l-1)\left[\cos\left(\hat{\Theta}_d(l-1)\right) - 1\right]\right\} \overset{!}{=} C_R, \tag{14}$$
which provides the desired expression for the concentration parameter:
$$\kappa_d(l-1) = \frac{\ln(C_R)}{\cos\left(\hat{\Theta}_d(l-1)\right) - 1}. \tag{15}$$
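The step from equation (13) to equation (15) can be written out as a short derivation. It uses only equations (7), (10) and (11); the abbreviations $\kappa$ and $\hat{\Theta}$ for $\kappa_d(l-1)$ and $\hat{\Theta}_d(l-1)$ are introduced here for brevity. The angle distance of equation (11) is zero at $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ and equals $\hat{\Theta}$ at $\hat{\Omega}_{\mathrm{DOM},d}(l-2)$, and the common factor $\kappa/(Q\cdot\sinh(\kappa))$ cancels in the ratio.

```latex
\begin{align*}
\frac{\exp\{\kappa\cos(\hat{\Theta})\}}{\exp\{\kappa\cos(0)\}}
  &= \exp\{\kappa[\cos(\hat{\Theta}) - 1]\} \overset{!}{=} C_R \\
\Rightarrow\quad \kappa\,[\cos(\hat{\Theta}) - 1] &= \ln(C_R) \\
\Rightarrow\quad \kappa &= \frac{\ln(C_R)}{\cos(\hat{\Theta}) - 1}
\end{align*}
```

Because $\ln(C_R) < 0$ for $0 < C_R < 1$ and $\cos(\hat{\Theta}) - 1 \le 0$, the resulting value is non-negative, and it grows without bound as $\hat{\Theta}$ approaches zero.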
The principle behind this computation is to increase the concentration of the a-priori probability function the less the sound source has moved before. If the sound source has moved significantly before, the uncertainty about its successive direction is high and thus the concentration parameter shall get a small value.
In order to avoid the concentration becoming too high (especially becoming infinitely large for $\hat{\Theta}_d(l-1)=0$), it is reasonable to replace equation (15) by
$$\kappa_d(l-1) = \frac{\ln(C_R)}{\cos\left(\hat{\Theta}_d(l-1)\right) - 1 - C_D}, \tag{16}$$
where $C_D$ may be set to
$$C_D = \frac{\ln(C_R)}{-\kappa_{\mathrm{MAX}}} \tag{17}$$
in order to obtain a maximum value $\kappa_{\mathrm{MAX}}$ of the concentration parameter for a source movement angle of zero. The following values have been experimentally found to be reasonable:
$$\kappa_{\mathrm{MAX}} = 8, \quad C_R = 0.5. \tag{18}$$
In any case, $\kappa_{\mathrm{MAX}} > 0$ and $0 < C_R < 1$ as mentioned above. The resulting relationship between the concentration parameter $\kappa_d(l-1)$ and the source movement angle $\hat{\Theta}_d(l-1)$ is shown in FIG. 4.
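The bounded concentration parameter of equations (16) to (18) can be sketched as follows. The constant and function names are hypothetical; the numerical values are those given in equation (18).

```python
import math

KAPPA_MAX = 8.0                      # equation (18)
C_R = 0.5                            # equation (18)
C_D = math.log(C_R) / (-KAPPA_MAX)   # equation (17)

def concentration(theta_hat, c_r=C_R, c_d=C_D):
    """Equation (16): concentration parameter for a movement angle in radians."""
    return math.log(c_r) / (math.cos(theta_hat) - 1.0 - c_d)

# A source that did not move receives the maximum concentration; the value
# decreases as the previously observed movement angle grows.
kappa_still = concentration(0.0)
kappa_quarter = concentration(math.pi / 2)
kappa_opposite = concentration(math.pi)
```

With these constants the concentration stays strictly positive for every movement angle, so the uniform case of equation (10) arises only through the explicit initialisation.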
Multi-Source Case
Now it is assumed that the aim is tracking $D$ dominant sound sources $s_d$, $d=1,\ldots,D$, with directions independent of each other. If it is further assumed that, according to the considerations in the single-source case section, the probability of the $d$-th sound source being located at direction $\Omega_q$ in the $l$-th time frame is given by $P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)$, then it can be concluded that the probability of no sound source being located at direction $\Omega_q$ in the $l$-th time frame must be
$$\prod_{d=1}^{D}\left[1 - P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)\right]. \tag{19}$$
Hence, the probability $P_{\mathrm{PRIO}}(l,\Omega_q)$ of any one of the $D$ sound sources being located at direction $\Omega_q$ in the $l$-th time frame is given by
$$P_{\mathrm{PRIO}}(l,\Omega_q) = 1 - \prod_{d=1}^{D}\left[1 - P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)\right]. \tag{20}$$
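Equation (20) can be illustrated with a small sketch that combines per-source a-priori probabilities into the joint a-priori probability. The function name `multi_source_prior` and the toy values for Q = 4 test directions are hypothetical.

```python
def multi_source_prior(single_priors):
    """Equation (20): probability that any source is located at each test direction.

    single_priors holds D lists, each with the Q values of equation (10) for one source.
    """
    num_dirs = len(single_priors[0])
    prior = []
    for q in range(num_dirs):
        none_here = 1.0
        for per_source in single_priors:
            none_here *= 1.0 - per_source[q]  # equation (19): no source at direction q
        prior.append(1.0 - none_here)
    return prior

# Toy example with D = 2 sources and Q = 4 test directions (illustrative values only).
source_1 = [0.7, 0.1, 0.1, 0.1]
source_2 = [0.1, 0.1, 0.1, 0.7]
prior = multi_source_prior([source_1, source_2])
```

In this toy example the combined values do not sum to one, since each entry is the probability of an event per direction; the normalisation in the subsequent a-posteriori computation takes care of the scaling.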
Bayesian Learning
Regarding the processing in step/stage 32, Bayesian learning is a general method of inferring posterior information about a quantity from a-priori knowledge, in form of a probability function or distribution and a current observation that is related to the desired quantity and thus provides a likelihood function.
In this special case of tracking dominant sound source directions, the likelihood function is given by the directional power distribution σ2(l). The a-priori probability function PPRIO(l,Ωq) is obtained from the sound source movement model described in section SOURCE MOVEMENT PREDICTION MODEL and is given by equation (20).
According to the Bayesian rule, the a-posteriori probability of any of the D sound sources being located at direction Ωq in the l-th time frame is given by
$$P_{\mathrm{POST}}(l,\Omega_q) = \frac{P_{\mathrm{PRIO}}(l,\Omega_q)\cdot\sigma^2(l,\Omega_q)}{\sum_{q'=1}^{Q} P_{\mathrm{PRIO}}(l,\Omega_{q'})\cdot\sigma^2(l,\Omega_{q'})} \tag{21}$$
$$\propto P_{\mathrm{PRIO}}(l,\Omega_q)\cdot\sigma^2(l,\Omega_q), \tag{22}$$
where ∝ means ‘proportional to’.
In equation (22) the fact is exploited that the denominator of equation (21) does not depend on the test direction $\Omega_q$.
Instead of the bare directional power distribution $\sigma^2(l)$, now the posterior probability function $P_{\mathrm{POST}}(l,\Omega_q)$ can be used for the search of the directions of the dominant sound sources in step/stage 33, which in addition receives matrix $A_{\hat{\Omega}}(l-1)$ and which outputs matrix $A_{\hat{\Omega}}(l)$. That search is more stable because it applies an implicit smoothing onto the directional power distribution. Advantageously, such implicit smoothing can be regarded as a smoothing with an adaptive smoothing constant, which is optimal with respect to the assumed sound source model.
The following section provides a more detailed description of the individual processing blocks for the estimation of the dominant sound source directions.
Estimation of Directional Power Distribution
The directional power distribution σ2(l) for the l-th time frame and a predefined number Q of test directions Ωq, q=1, . . . , Q, which are nearly uniformly distributed on the unit sphere, is estimated in step/stage 31 from the time frame C(l) of HOA coefficients. For this purpose the method described in EP 12305537.8 can be used.
Computation of a-Posteriori Probability Function for Dominant Source Directions
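The values $P_{\mathrm{POST}}(l,\Omega_q)$, $q=1,\ldots,Q$, of the a-posteriori probability function $P_{\mathrm{POST}}(l)$ are computed in step/stage 32 according to equation (21), using the values $P_{\mathrm{PRIO}}(l,\Omega_q)$, $q=1,\ldots,Q$, of the a-priori probability function $P_{\mathrm{PRIO}}(l)$ and the values $\sigma^2(l,\Omega_q)$, $q=1,\ldots,Q$, of the directional power distribution $\sigma^2(l)$:
$$P_{\mathrm{POST}}(l,\Omega_q) = \frac{P_{\mathrm{PRIO}}(l,\Omega_q)\cdot\sigma^2(l,\Omega_q)}{\sum_{q'=1}^{Q} P_{\mathrm{PRIO}}(l,\Omega_{q'})\cdot\sigma^2(l,\Omega_{q'})}.$$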
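The a-posteriori computation of equation (21) amounts to an element-wise product followed by a normalisation, as the following sketch shows. The function name `posterior` and the toy values are hypothetical, and raising an error for an all-zero product is an assumption made here, since the application does not describe that case.

```python
def posterior(prior, power):
    """Equation (21): normalised product of a-priori probability and directional power."""
    weighted = [p * s for p, s in zip(prior, power)]
    total = sum(weighted)
    if total == 0.0:
        # Assumption: this degenerate case is not covered by the description.
        raise ValueError("prior and power have no common support")
    return [w / total for w in weighted]

# Toy example with Q = 4 test directions (illustrative values only).
prior = [0.1, 0.6, 0.2, 0.1]
power = [4.0, 1.0, 1.0, 4.0]
post = posterior(prior, power)
```

In this toy example the direction favoured by the prior (index 1) wins although two other directions carry four times its power, which illustrates the implicit smoothing effect of the a-priori probability.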
Computation of a-Priori Probability Function for Dominant Source Directions
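The values $P_{\mathrm{PRIO}}(l,\Omega_q)$, $q=1,\ldots,Q$, of the a-priori probability function $P_{\mathrm{PRIO}}(l)$ are computed in step/stage 37 from the dominant sound source directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, $d=1,\ldots,D$, in the $(l-1)$-th time frame, which are contained in the matrix $A_{\hat{\Omega}}(l-1)$, and from the dominant sound source movement angles $\hat{\Theta}_d(l-1)$, $d=1,\ldots,D$, which are contained in the vector $a_{\hat{\Theta}}(l-1)$, according to equation (20) as
$$P_{\mathrm{PRIO}}(l,\Omega_q) = 1 - \prod_{d=1}^{D}\left[1 - P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)\right],$$
where $P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)$ is computed according to equation (10) as
$$P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q) = \begin{cases} \dfrac{\kappa_d(l-1)}{Q\cdot\sinh(\kappa_d(l-1))}\cdot\exp\{\kappa_d(l-1)\cdot\cos(\Theta_{q,d})\} & \text{if } \kappa_d(l-1) \neq 0 \\[2ex] \dfrac{1}{Q} & \text{if } \kappa_d(l-1) = 0 \end{cases}$$
with $\Theta_{q,d} := \angle\left(\Omega_q,\, \hat{\Omega}_{\mathrm{DOM},d}(l-1)\right)$.
The concentration parameters $\kappa_d(l-1)$ of the individual probability functions $P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)$ are obtained as
$$\kappa_d(l-1) = \frac{\ln(C_R)}{\cos\left(\hat{\Theta}_d(l-1)\right) - 1 - C_D},$$
where $C_D$ is set to
$$C_D = \frac{\ln(C_R)}{-\kappa_{\mathrm{MAX}}}$$
with $\kappa_{\mathrm{MAX}} = 8$ and $C_R = 0.5$.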
Concerning the initialisation of the concentration parameter, it should be noted that for the first two frames, i.e. $l=1$ and $l=2$, the source movement angle estimates $\hat{\Theta}_d(0)$ and $\hat{\Theta}_d(1)$ are not yet available. For these first two frames, the concentration parameter is set to zero, i.e. $\kappa_d(0)=\kappa_d(1)=0$ for all $d=1,\ldots,D$, thereby assuming a uniform a-priori probability distribution for all dominant directions.
Source Movement Angle Estimation
The movement angles $\hat{\Theta}_d(l-1)$, $d=1,\ldots,D$, of the dominant sound sources, which are contained in the vector $a_{\hat{\Theta}}(l-1)$, are computed according to equation (7) by
$$\hat{\Theta}_d(l-1) := \angle\left(\hat{\Omega}_{\mathrm{DOM},d}(l-1),\, \hat{\Omega}_{\mathrm{DOM},d}(l-2)\right).$$
Search and Assignment of Dominant Sound Source Directions
In step/stage 33, the current dominant directions $\hat{\Omega}_{\mathrm{CURRDOM},d}(l)$, $d=1,\ldots,D$, are searched in a first step and are then assigned to the appropriate sources, i.e. to the directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, $d=1,\ldots,D$, found in the previous frame.
Search of Directions
In step/stage 33, the search for the dominant sound source directions depends on the a-posteriori probability function $P_{\mathrm{POST}}(l)$, not on the directional power distribution $\sigma^2(l)$. As an example, the direction search method described in EP 12305537.8 can be used. This processing assumes that the dominant sound source directions are pair-wise separated by at least an angle distance of $\Theta_{\mathrm{MIN}} := \pi/N$, where $N$ denotes the order of the HOA representation. This assumption originates from the spatial dispersion of directional signals resulting from a spatial band limitation due to a bounded HOA representation order. Following EP 12305537.8, the first dominant direction $\hat{\Omega}_{\mathrm{CURRDOM},1}(l)$ is set to that with the maximum value of the a-posteriori probability function $P_{\mathrm{POST}}(l)$, i.e.
$$\hat{\Omega}_{\mathrm{CURRDOM},1}(l) = \Omega_{q_1} \quad\text{with}\quad q_1 := \arg\max_{q\in\mathcal{M}_1} P_{\mathrm{POST}}(l,\Omega_q) \quad\text{and}\quad \mathcal{M}_1 := \{1,\ldots,Q\}. \tag{23}$$
For the search of the second dominant direction $\hat{\Omega}_{\mathrm{CURRDOM},2}(l)$ all test directions $\Omega_q$ in the neighbourhood of $\hat{\Omega}_{\mathrm{CURRDOM},1}(l)$ with $\angle\left(\Omega_q, \hat{\Omega}_{\mathrm{CURRDOM},1}(l)\right) \le \Theta_{\mathrm{MIN}}$ are excluded. Then, the second dominant direction $\hat{\Omega}_{\mathrm{CURRDOM},2}(l)$ is set to that with the maximum a-posteriori probability in the remaining direction set
$$\mathcal{M}_2 := \left\{q \in \mathcal{M}_1 \mid \angle\left(\Omega_q, \hat{\Omega}_{\mathrm{CURRDOM},1}(l)\right) > \Theta_{\mathrm{MIN}}\right\}. \tag{24}$$
The remaining dominant directions are determined in an analogous way.
The overall procedure for the computation of all dominant directions is summarised by the following program:
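Algorithm 1: Search of dominant directions based on the a-posteriori probability function
    $d = 1$
    $\mathcal{M}_1 = \{1, 2, \ldots, Q\}$
    repeat
        $q_d = \arg\max_{q\in\mathcal{M}_d} P_{\mathrm{POST}}(l,\Omega_q)$
        $\hat{\Omega}_{\mathrm{CURRDOM},d}(l) = \Omega_{q_d}$
        $\mathcal{M}_{d+1} = \left\{q \in \mathcal{M}_d \mid \angle\left(\Omega_q, \Omega_{q_d}\right) > \Theta_{\mathrm{MIN}}\right\}$
        $d = d + 1$
    until $d > D$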
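A minimal sketch of the greedy search of Algorithm 1 is given below. It assumes directions stored as (inclination, azimuth) pairs in radians and uses zero-based indices; the function names and the toy grid are hypothetical, and stopping early when no candidate direction remains is an assumption that is not part of the description.

```python
import math

def angle_between(omega_a, omega_b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    cos_angle = (math.cos(omega_a[0]) * math.cos(omega_b[0])
                 + math.sin(omega_a[0]) * math.sin(omega_b[0])
                 * math.cos(omega_a[1] - omega_b[1]))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def search_dominant_directions(grid, post, num_sources, theta_min):
    """Return grid indices of up to num_sources dominant directions."""
    candidates = list(range(len(grid)))  # the set M_1
    found = []
    for _ in range(num_sources):
        if not candidates:
            break  # assumption: stop if the exclusion emptied the candidate set
        q_d = max(candidates, key=lambda q: post[q])
        found.append(q_d)
        # Keep only directions farther than theta_min from the one just found.
        candidates = [q for q in candidates
                      if angle_between(grid[q], grid[q_d]) > theta_min]
    return found

# Toy grid on the equator, illustrative values only; theta_min = pi / N for N = 4.
grid = [(math.pi / 2, az) for az in (0.0, 0.2, 1.0, 2.0, 3.0)]
post = [0.30, 0.28, 0.10, 0.22, 0.10]
dominant = search_dominant_directions(grid, post, 2, math.pi / 4)
```

In this toy example the second-largest posterior value (index 1) lies within the exclusion angle of the first pick and is therefore skipped in favour of index 3.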

Assignment of Directions
After having found all current dominant sound source directions $\hat{\Omega}_{\mathrm{CURRDOM},d}(l)$, $d=1,\ldots,D$, these directions are assigned in step/stage 33 to the dominant sound source directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, $d=1,\ldots,D$, from the previous frame $(l-1)$ contained in matrix $A_{\hat{\Omega}}(l-1)$. The assignment function $f_{\mathcal{A},l}: \{1,\ldots,D\} \to \{1,\ldots,D\}$ is determined such that the sum of angles between assigned directions
$$\sum_{d=1}^{D} \angle\left(\hat{\Omega}_{\mathrm{CURRDOM},d}(l),\, \hat{\Omega}_{\mathrm{DOM},f_{\mathcal{A},l}(d)}(l-1)\right) \tag{25}$$
is minimised. Such an assignment problem can be solved using the Hungarian algorithm described in H. W. Kuhn, “The Hungarian method for the assignment problem”, Naval Research Logistics Quarterly, vol. 2, pp. 83-97, 1955.
Following computation of the assignment function, the directions $\hat{\Omega}_{\mathrm{DOM},d}(l)$, $d=1,\ldots,D$, and the corresponding output matrix $A_{\hat{\Omega}}(l)$ according to equation (4) are obtained by
$$\hat{\Omega}_{\mathrm{DOM},d}(l) := \hat{\Omega}_{\mathrm{CURRDOM},f^{-1}_{\mathcal{A},l}(d)}(l) \quad\text{for } d=1,\ldots,D, \tag{26}$$
where $f^{-1}_{\mathcal{A},l}(\cdot)$ denotes the inverse assignment function. It should be noted that for the first time frame, i.e. $l=1$, the estimates of the dominant sound source directions from the previous time frame are not yet available. For this frame the assignment cannot be based on direction estimates from previous frames and can instead be chosen arbitrarily. That is, in an initialisation phase the direction estimates of the dominant sound source directions for the non-available previous time frame of the HOA coefficients $C(l)$ are chosen arbitrarily.
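The assignment of equations (25) and (26) can be sketched as follows. Instead of the Hungarian algorithm named above, this sketch uses an exhaustive search over all permutations, which yields the same minimum but has factorial cost and is therefore only practical for a small number D of sources. The function names and toy values are hypothetical, and indices are zero-based.

```python
import itertools
import math

def angle_between(omega_a, omega_b):
    """Angle in radians between two directions given as (inclination, azimuth)."""
    cos_angle = (math.cos(omega_a[0]) * math.cos(omega_b[0])
                 + math.sin(omega_a[0]) * math.sin(omega_b[0])
                 * math.cos(omega_a[1] - omega_b[1]))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def assign_directions(current, previous):
    """Reorder current directions so that entry d continues previous source d."""
    num_sources = len(current)
    best_perm, best_cost = None, math.inf
    # perm[d] plays the role of the assignment function applied to d.
    for perm in itertools.permutations(range(num_sources)):
        cost = sum(angle_between(current[d], previous[perm[d]])
                   for d in range(num_sources))  # equation (25)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    assigned = [None] * num_sources
    for d, source in enumerate(best_perm):
        assigned[source] = current[d]  # equation (26) via the inverse assignment
    return assigned

# Toy example: the search returned the two sources in swapped order.
previous = [(math.pi / 2, 0.0), (math.pi / 2, 2.0)]
current = [(math.pi / 2, 2.1), (math.pi / 2, 0.1)]
assigned = assign_directions(current, previous)
```

For larger D, a polynomial-time solver for the linear assignment problem would be the natural replacement for the permutation loop.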
Regarding equations (9) and (10), the von Mises-Fisher distribution on the unit sphere $\mathcal{S}^2 := \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$ in the three-dimensional Euclidean space $\mathbb{R}^3$ is defined by
$$f_{\mathrm{MF},\kappa,x_0}(x) := \frac{\kappa}{4\pi\cdot\sinh(\kappa)}\exp\{\kappa\cdot x_0^T x\} \quad\text{for } x \in \mathcal{S}^2, \tag{27}$$
where $(\cdot)^T$ denotes transposition, $\kappa \ge 0$ is called the concentration parameter and $x_0 \in \mathbb{R}^3$ is called the mean direction (see e.g. Kwang-Il Seon, “Smoothing of an All-sky Survey Map with a Fisher-von Mises Function”, J. Korean Phys. Soc., 2007). For $\kappa=0$, the distribution is uniform on the sphere because
$$\lim_{\kappa\to 0} f_{\mathrm{MF},\kappa,x_0}(x) = \frac{1}{4\pi}. \tag{28}$$
For $\kappa>0$, the distribution is uni-modal and circular symmetric, centred around the mean direction $x_0$. The concentration of the distribution around the mean direction is determined by the concentration parameter $\kappa$. In particular, the concentration increases with the value of $\kappa$. Because each vector $x \in \mathcal{S}^2$ has unit modulus, it can be uniquely represented by the direction vector
$$\Omega := (\theta, \phi)^T \tag{29}$$
containing an inclination angle $\theta \in [0,\pi]$ and an azimuth angle $\phi \in [0,2\pi]$ of a spherical coordinate system. Hence, by considering the identity
$$x_0^T x = \cos(\angle(x_0, x)), \tag{30}$$
where $\angle(x_0,x)$ denotes the angle between $x_0$ and $x$, the von Mises-Fisher distribution can be formulated in an equivalent manner as
$$f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega) := \frac{\kappa}{4\pi\cdot\sinh(\kappa)}\exp\{\kappa\cdot\cos(\angle(\Omega,\Omega_0))\} \tag{31}$$
with $\Omega_0$ representing $x_0$. In the special case where the mean direction points into the direction of the z-axis, i.e. $\theta_0=0$, the von Mises-Fisher distribution is symmetrical with respect to the z-axis and depends on the inclination angle $\theta$ only:
$$f_{\mathrm{MF,Sphere},\kappa}(\theta) := f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega)\Big|_{\theta_0=0} = \frac{\kappa}{4\pi\cdot\sinh(\kappa)}\exp\{\kappa\cdot\cos(\theta)\}. \tag{32}$$
The shape of the von Mises-Fisher distribution $f_{\mathrm{MF,Sphere},\kappa}$ versus $\theta$ around the mean direction is illustrated in FIG. 5 for different values of the concentration parameter $\kappa$.
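The von Mises-Fisher distribution satisfies the normalisation condition
$$\int_{\mathbb{S}^2} f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega)\,d\Omega = 1. \tag{33}$$
For the special case θ0=0, this can be seen from
$$\begin{aligned}
\int_{\mathbb{S}^2} f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega)\Big|_{\theta_0=0}\,d\Omega &= 2\pi\int_{0}^{\pi} f_{\mathrm{MF,Sphere},\kappa}(\theta)\sin(\theta)\,d\theta && (34)\\
&= \frac{\kappa}{2\cdot\sinh(\kappa)}\int_{-1}^{1}\exp\{\kappa z\}\,dz && (35)\\
&= 1, && (36)
\end{aligned}$$
where z=cos(θ) has been substituted. The result holds for any mean direction because the integral of functions over the sphere is invariant with respect to rotations.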
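The normalisation in equations (33) to (36) can also be checked numerically. The following is a minimal sketch, assuming a mean direction on the z-axis (θ0=0) and a simple midpoint rule; the function names `vmf_density` and `sphere_integral` and the step count are illustrative choices, not part of the described processing.

```python
import math


def vmf_density(theta: float, kappa: float) -> float:
    """Equation (32): von Mises-Fisher density for a mean direction on the z-axis."""
    if kappa == 0.0:
        return 1.0 / (4.0 * math.pi)  # uniform limit, equation (28)
    return kappa / (4.0 * math.pi * math.sinh(kappa)) * math.exp(kappa * math.cos(theta))


def sphere_integral(kappa: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of 2*pi * int_0^pi f(theta) sin(theta) dtheta, eq. (34)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += vmf_density(theta, kappa) * math.sin(theta)
    return 2.0 * math.pi * total * h


if __name__ == "__main__":
    # Each printed integral should be close to 1, independent of kappa.
    for kappa in (0.0, 1.0, 10.0, 50.0):
        print(kappa, sphere_integral(kappa))
```

Note that `math.sinh` overflows for very large κ, so this direct form is only suitable for moderate concentration values.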
A discrete probability function $f_{\mathrm{MF,DISC,Sphere},\kappa,x_0}(\Omega_q)$ having the shape of the von Mises-Fisher distribution $f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega)$ can be obtained by spatially sampling the sphere using a number of Q discrete sampling positions (or, synonymously, sampling directions) Ωq, q=1, . . . , Q, which are approximately uniformly distributed on the unit sphere $\mathbb{S}^2$. For assuring the appropriate scaling of $f_{\mathrm{MF,DISC,Sphere},\kappa,x_0}(\Omega_q)$ in order to satisfy the property
$$\sum_{q=1}^{Q} f_{\mathrm{MF,DISC,Sphere},\kappa,x_0}(\Omega_q) = 1 \tag{37}$$
of a probability function, the numerical approximation of the integral of fMF,Sphere,κ,x0(Ω) over the sphere
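$$1 = \int_{\mathbb{S}^2} f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega)\,d\Omega \approx \sum_{q=1}^{Q} f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega_q)\,\Delta\Omega \tag{38}$$
is considered, where $\Delta\Omega = \frac{4\pi}{Q}$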
is the surface area assigned to each spatial sampling direction. Note that the surface area does not depend on the sampling direction Ωq because a nearly uniform sampling was assumed. By comparing equation (38) with equation (37), the desired solution is finally found to be
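$$\begin{aligned}
f_{\mathrm{MF,DISC,Sphere},\kappa,x_0}(\Omega_q) &= \Delta\Omega\cdot f_{\mathrm{MF,Sphere},\kappa,x_0}(\Omega_q) && (39)\\
&= \frac{\kappa}{Q\cdot\sinh(\kappa)}\exp\{\kappa\cdot\cos(\angle(\Omega_q,\Omega_0))\} && (40)
\end{aligned}$$
for q=1, . . . , Q,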
where in the last step equation (31) is substituted.
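The sampling step can be illustrated with a short sketch of equation (40). The text only requires the Q sampling directions to be approximately uniformly distributed; the Fibonacci lattice used here is one assumed choice, and the names `fibonacci_sphere`, `cos_angle` and `vmf_discrete` are hypothetical.

```python
import math


def fibonacci_sphere(Q):
    """Approximately uniform directions (inclination, azimuth); an assumed sampling grid."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for q in range(Q):
        z = 1.0 - (2.0 * q + 1.0) / Q  # equal-area slices along the z-axis
        theta = math.acos(z)  # inclination in [0, pi]
        phi = (q * golden_angle) % (2.0 * math.pi)  # azimuth in [0, 2*pi)
        directions.append((theta, phi))
    return directions


def cos_angle(a, b):
    """Cosine of the angle between two directions given as (inclination, azimuth)."""
    (t1, p1), (t2, p2) = a, b
    return math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)


def vmf_discrete(directions, mean_dir, kappa):
    """Equation (40): kappa / (Q sinh(kappa)) * exp(kappa * cos(angle)); 1/Q for kappa = 0."""
    Q = len(directions)
    if kappa == 0.0:
        return [1.0 / Q] * Q
    # Algebraically identical to (40), rearranged so that exp() cannot overflow.
    scale = 2.0 * kappa / (Q * (1.0 - math.exp(-2.0 * kappa)))
    return [scale * math.exp(kappa * (cos_angle(d, mean_dir) - 1.0)) for d in directions]


if __name__ == "__main__":
    grid = fibonacci_sphere(900)
    probs = vmf_discrete(grid, (math.pi / 3.0, math.pi / 4.0), 8.0)
    print(sum(probs))  # close to 1, up to the approximation error in equation (38)
```

Because equation (38) is only an approximation, the values sum to one approximately rather than exactly; the deviation shrinks as Q grows relative to κ.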
The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
The invention can be applied e.g. for the compression of three-dimensional sound fields represented by HOA, which can be rendered or played on a loudspeaker arrangement in a home environment or on a loudspeaker arrangement in a cinema.


1. A method for determining dominant sound source directions in a Higher Order Ambisonics representation denoted HOA of a sound field, said method comprising:
from a current time frame of HOA coefficients, estimating a directional power distribution with respect to dominant sound sources;
from said directional power distribution and from an a-priori probability function for dominant sound source directions, computing an a-posteriori probability function for said dominant sound source directions;
depending on said a-posteriori probability function and on dominant sound source directions for the previous time frame of said HOA coefficients, searching and assigning dominant sound source directions for said current time frame of said HOA coefficients,
wherein said a-priori probability function is computed from a set of estimated sound source movement angles and from said dominant sound source directions for the previous time frame of said HOA coefficients, and wherein said set of estimated sound source movement angles is computed from said dominant sound source directions for the previous time frame of said HOA coefficients and from dominant sound source directions for the penultimate time frame of said HOA coefficients.
2. The method according to claim 1, further comprising:
computing said a-posteriori probability function according to the Bayesian rule, wherein said a-priori probability function predicts, depending on the knowledge at said previous time frame of said HOA coefficients, the probability that any of the dominant sound sources is located at any test direction at said current time frame of HOA coefficients.
3. The method according to claim 1, further comprising:
calculating said a-priori probability function according to
$$P_{\mathrm{PRIO}}(l,\Omega_q) = 1 - \prod_{d=1}^{D}\left[1 - P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)\right]$$
and determining the probability of any one of D sound sources being located at direction Ωq in said current time frame l of HOA coefficients,
wherein
$$P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q) = \begin{cases} \dfrac{\kappa_d(l-1)}{Q\cdot\sinh(\kappa_d(l-1))}\cdot\exp\{\kappa_d(l-1)\cdot\cos(\theta_{q,d})\} & \text{if } \kappa_d(l-1) \neq 0\\[2ex] \dfrac{1}{Q} & \text{if } \kappa_d(l-1) = 0, \end{cases}$$
$\tilde{\Omega}_{\mathrm{DOM},d}(l)$ denotes a discrete random variable indicating the direction of the d-th source at the l-th time frame and has values Ωq, q=1, . . . , Q,
κd(l−1) is a concentration parameter determining the shape of a von Mises-Fisher distribution around the mean direction,
θq,d denotes the angle distance between an estimated direction $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ and a test direction.
4. The method according to claim 2, further comprising:
computing said a-posteriori probability function according to:
$$P_{\mathrm{POST}}(l,\Omega_q) = \frac{P_{\mathrm{PRIO}}(l,\Omega_q)\cdot\sigma^2(l,\Omega_q)}{\sum_{q'=1}^{Q} P_{\mathrm{PRIO}}(l,\Omega_{q'})\cdot\sigma^2(l,\Omega_{q'})},$$
wherein $\sigma^2(l)$ is said directional power distribution.
5. The method according to claim 1, further comprising
carrying out said assigning of dominant sound source directions for said current time frame l of said HOA coefficients by:
following determining of all current dominant sound source directions $\hat{\Omega}_{\mathrm{CURRDOM},d}(l)$, d=1, . . . , D, assigning these directions to the dominant sound source directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, d=1, . . . , D from the previous frame, wherein the assignment function $f_{\mathcal{A},l}$: {1, . . . , D}→{1, . . . , D} is determined such that the sum of angles
$$\sum_{d=1}^{D} \angle\left(\hat{\Omega}_{\mathrm{CURRDOM},d}(l),\ \hat{\Omega}_{\mathrm{DOM},f_{\mathcal{A},l}(d)}(l-1)\right)$$
between assigned directions is minimised;
obtaining said dominant sound source directions by
$$\hat{\Omega}_{\mathrm{DOM},d}(l) := \hat{\Omega}_{\mathrm{CURRDOM},f^{-1}_{\mathcal{A},l}(d)}(l) \quad \text{for } d=1,\ldots,D,$$
where $f^{-1}_{\mathcal{A},l}(\cdot)$ denotes the inverse assignment function.
6. The method according to claim 3, further comprising:
setting for an initialisation of said concentration parameter for the first two time frames (l=1, l=2) of said HOA coefficients said concentration parameter to zero by κd(0)=κd(1)=0 for all d=1, . . . , D.
7. The method according to claim 1, further comprising:
choosing arbitrarily for an initialisation, for a non-available previous time frame of said HOA coefficients, the direction estimates of said dominant sound source directions.
8. An apparatus for determining dominant sound source directions in a Higher Order Ambisonics representation denoted HOA of a sound field, said apparatus comprising a processor configured to:
estimating from a current time frame of HOA coefficients a directional power distribution with respect to dominant sound sources;
computing from said directional power distribution and from an a-priori probability function for dominant sound source directions an a-posteriori probability function for said dominant sound source directions;
searching and assigning, depending on said a-posteriori probability function and on dominant sound source directions for the previous time frame of said HOA coefficients, dominant sound source directions for said current time frame of said HOA coefficients;
computing said a-priori probability function from a set of estimated sound source movement angles and from said dominant sound source directions for the previous time frame of said HOA coefficients;
computing said set of estimated sound source movement angles from said dominant sound source directions for the previous time frame of said HOA coefficients and from dominant sound source directions for the penultimate time frame of said HOA coefficients.
9. The apparatus according to claim 8, wherein said a-posteriori probability function is computed according to the Bayesian rule, and wherein said a-priori probability function predicts, depending on the knowledge at said previous time frame of said HOA coefficients, the probability that any of the dominant sound sources is located at any test direction at said current time frame of HOA coefficients.
10. The apparatus according to claim 8, wherein said a-priori probability function is calculated according to
$$P_{\mathrm{PRIO}}(l,\Omega_q) = 1 - \prod_{d=1}^{D}\left[1 - P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q)\right]$$
and determines the probability of any one of D sound sources being located at direction Ωq in said current time frame l of HOA coefficients,
and wherein
$$P^{\mathrm{PRIO,SINGLE}}_{\tilde{\Omega}_{\mathrm{DOM},d}(l)}(\Omega_q) = \begin{cases} \dfrac{\kappa_d(l-1)}{Q\cdot\sinh(\kappa_d(l-1))}\cdot\exp\{\kappa_d(l-1)\cdot\cos(\theta_{q,d})\} & \text{if } \kappa_d(l-1) \neq 0\\[2ex] \dfrac{1}{Q} & \text{if } \kappa_d(l-1) = 0, \end{cases}$$
$\tilde{\Omega}_{\mathrm{DOM},d}(l)$ denotes a discrete random variable indicating the direction of the d-th source at the l-th time frame and has values Ωq, q=1, . . . , Q,
κd(l−1) is a concentration parameter determining the shape of a von Mises-Fisher distribution around the mean direction,
θq,d denotes the angle distance between an estimated direction $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$ and a test direction.
11. The apparatus according to claim 9, wherein said a-posteriori probability function is computed according to:
$$P_{\mathrm{POST}}(l,\Omega_q) = \frac{P_{\mathrm{PRIO}}(l,\Omega_q)\cdot\sigma^2(l,\Omega_q)}{\sum_{q'=1}^{Q} P_{\mathrm{PRIO}}(l,\Omega_{q'})\cdot\sigma^2(l,\Omega_{q'})},$$
wherein $\sigma^2(l)$ is said directional power distribution.
12. The apparatus according to claim 8, wherein said assigning of dominant sound source directions for said current time frame of said HOA coefficients is carried out by:
following determining of all current dominant sound source directions $\hat{\Omega}_{\mathrm{CURRDOM},d}(l)$, d=1, . . . , D, assigning these directions to the dominant sound source directions $\hat{\Omega}_{\mathrm{DOM},d}(l-1)$, d=1, . . . , D from the previous frame, wherein the assignment function $f_{\mathcal{A},l}$: {1, . . . , D}→{1, . . . , D} is determined such that the sum of angles
$$\sum_{d=1}^{D} \angle\left(\hat{\Omega}_{\mathrm{CURRDOM},d}(l),\ \hat{\Omega}_{\mathrm{DOM},f_{\mathcal{A},l}(d)}(l-1)\right)$$
between assigned directions is minimised;
obtaining said dominant sound source directions by
$$\hat{\Omega}_{\mathrm{DOM},d}(l) := \hat{\Omega}_{\mathrm{CURRDOM},f^{-1}_{\mathcal{A},l}(d)}(l) \quad \text{for } d=1,\ldots,D,$$
where $f^{-1}_{\mathcal{A},l}(\cdot)$ denotes the inverse assignment function.
13. The apparatus according to claim 10, wherein for an initialisation of said concentration parameter for the first two time frames of said HOA coefficients said concentration parameter is set to zero by κd(0)=κd(1)=0 for all d=1, . . . , D.
14. The apparatus according to claim 8 wherein, for an initialisation, for a non-available previous time frame of said HOA coefficients the direction estimates of said dominant sound source directions are chosen arbitrarily.
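
For illustration only, the probability functions recited in claims 3 and 4 and the direction assignment recited in claim 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the nested-list data layout and the toy values are hypothetical, and the exhaustive search over permutations is swapped in merely as the simplest way to minimise the sum of angles; the claims do not prescribe a particular search method.

```python
import math
from itertools import permutations


def single_prior(cos_theta_qd, kappa, Q):
    """Per-source a-priori probability for one test direction (claim 3)."""
    if kappa == 0.0:
        return 1.0 / Q
    return kappa / (Q * math.sinh(kappa)) * math.exp(kappa * cos_theta_qd)


def prior(cos_theta, kappas):
    """cos_theta[q][d] = cos(theta_{q,d}); returns P_PRIO(l, Omega_q) for all q."""
    Q = len(cos_theta)
    result = []
    for row in cos_theta:
        none_here = 1.0  # probability that no source lies at this test direction
        for c, kappa in zip(row, kappas):
            none_here *= 1.0 - single_prior(c, kappa, Q)
        result.append(1.0 - none_here)
    return result


def posterior(prior_probs, power):
    """Bayesian rule of claim 4: prior times directional power, normalised over q."""
    weighted = [p * s for p, s in zip(prior_probs, power)]
    norm = sum(weighted)
    return [w / norm for w in weighted]


def assign(angles):
    """angles[d][e]: angle between current direction d and previous direction e.

    Returns the assignment f (as a list) minimising the sum of angles, and its inverse.
    """
    D = len(angles)
    best = min(permutations(range(D)),
               key=lambda f: sum(angles[d][f[d]] for d in range(D)))
    inverse = [0] * D
    for d, e in enumerate(best):
        inverse[e] = d
    return list(best), inverse


if __name__ == "__main__":
    # Toy values: Q = 4 test directions, D = 2 sources.
    cos_theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    p_prio = prior(cos_theta, kappas=[1.0, 0.0])
    p_post = posterior(p_prio, power=[1.0, 4.0, 1.0, 1.0])
    print(p_prio, p_post)
    print(assign([[2.0, 0.1], [0.2, 1.9]]))
```

With the inverse assignment, the direction of source d in the current frame is the current direction with index `inverse[d]`. The exhaustive search grows factorially with D and is only practical for a small number of sources.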

 

 
