Page 30 - Use cases and requirements for the vehicular multimedia networks - Focus Group on Vehicular Multimedia (FG-VM)

It becomes apparent that the determination of the location (zone) of the various emitting sources
            (talkers) in the cabin and the acoustic treatment of each transmitted signal (voice command) from
            each location (zone) will facilitate the correct processing of voice commands by the voice recognition
            system.

            8.1.2   Use-cases

            8.1.2.1    Use case A – Initiating a voice recognition session
            A person in a vehicle containing multiple occupants wishes to initiate a voice recognition session by
            uttering a keyword, such as "Hey Siri", "Alexa" or "Okay Google". Each occupant is in a separate
            zone of the cabin. The cabin may contain one or more microphones, which may or may not be
            dedicated to each zone. Each microphone picks up the voice of the occupant, but also the
            voices of other occupants, or "interference speech". One or more microphone signals
            (or audio channels) may be available to a keyword spotter (KWS), which must decide not only
            whether/when the keyword was spoken, but also from which zone the keyword was spoken.
            The following problem scenarios may result in inadequate behavior of the KWS:
            •       A-1 If there is no dedicated microphone for each zone, or no means to identify the zone of
                    the target talker, the command may not be detected, may be rejected or wrongly executed.
            •       A-2 Otherwise:

                    –  A-2-A Interfering speech may cause a KWS to fail to detect (false reject) the keyword
                        spoken by the target talker in the target zone microphone.
                    –  A-2-B Concurrent sources (e.g., music, video) played into the vehicle, resulting in echo
                        on the microphones, may cause a KWS to fail to detect (false reject) the keyword spoken
                        by the target talker in the target zone microphone.
                    –  A-2-C Interference of the target talker onto microphones outside of the target zone may
                        cause the KWS to detect the keyword but from the wrong zone.
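
The zone decision described above can be sketched as follows. This is a minimal illustration only, not a method defined in this document: it assumes that a KWS has already produced one confidence score per zone microphone, and the function name `pick_zone`, the `threshold` and the `margin` are hypothetical. A low best score corresponds to the false-reject cases (A-2-A, A-2-B); two nearly equal scores correspond to the leakage case (A-2-C), where the keyword is heard on several microphones.

```python
from typing import Optional, Sequence


def pick_zone(scores: Sequence[float],
              threshold: float = 0.5,
              margin: float = 0.1) -> Optional[int]:
    """Return the index of the zone judged to have spoken the keyword.

    scores    -- hypothetical per-zone KWS confidence values in [0, 1]
    threshold -- minimum score for a detection (assumed value)
    margin    -- minimum lead over the runner-up zone (assumed value)

    Returns None when no keyword is detected or when the zone is ambiguous.
    """
    if not scores:
        return None
    # Rank zone indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda z: scores[z], reverse=True)
    best = ranked[0]
    if scores[best] < threshold:
        return None  # keyword not detected on any microphone
    if len(ranked) > 1 and scores[best] - scores[ranked[1]] < margin:
        return None  # target speech leaked into another zone: ambiguous
    return best
```

For example, `pick_zone([0.9, 0.3, 0.2])` selects zone index 0, whereas `pick_zone([0.85, 0.8, 0.1])` returns `None` because the two highest scores are too close to attribute the keyword to one zone.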

                  Figure 8 – Acoustic processing (AEC and ZIC) on each zone dedicated microphone


            Figure 8 illustrates use case A-2, involving KWS with N microphones/zones in a vehicle, and
            depicts the waveforms. Each microphone signal contains target speech, interfering speech and echo
            (black). The talker in zone 1 is shown in yellow, the talker in zone 2 in red and the talker in
            zone 3 in blue. Acoustic echo cancellation (AEC) is used to subtract the echo from each
            microphone signal, and zone interference cancellation (ZIC) is used to isolate the target speech
            from interfering speech in each microphone signal.
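
To make the AEC step more concrete, the sketch below subtracts an estimate of the echo from a single microphone signal. This document does not specify an AEC algorithm; a normalized least-mean-squares (NLMS) adaptive filter is assumed here purely for illustration, and the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are hypothetical. Double-talk handling and ZIC are deliberately left out.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic  -- microphone samples (target speech plus echo)
    ref  -- loudspeaker reference samples (e.g., the music being played)
    taps -- length of the assumed echo-path model
    mu   -- NLMS step size (0 < mu < 2)
    eps  -- small constant to avoid division by zero

    Returns the echo-reduced signal, one sample per input sample.
    """
    weights = [0.0] * taps  # current estimate of the echo path
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        # Predicted echo at the microphone.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # microphone signal with echo removed
        power = sum(h * h for h in history) + eps
        # Normalized LMS update of the echo-path estimate.
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        output.append(error)
    return output
```

When the microphone contains only echo, the output of this sketch decays towards zero as the filter converges; when target speech is also present, it remains in the output because it is uncorrelated with the reference signal. A practical system would normally pause or slow adaptation while the talker is speaking.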

            8.1.2.2    Use case B – Interference during a voice recognition session
            Once a voice recognition session has been initiated and the target zone has been identified
            (e.g., using KWS or push-to-talk), an occupant in the target zone will use voice commands to interact
            with the voice recognition system. The target speech in the target zone will potentially be mixed with





            20       FGVM-01R1 (2019)