Automatic Language Identification in Broadcast News


Submitted to the World Congress on Computational Intelligence, IJCNN 2002 (Honolulu, May 2002)

In January 2021, SAIL LABS Technology GmbH was acquired by the sensor specialist HENSOLDT and became HENSOLDT Analytics.

We present experiments on automatic language identification in the broadcast news domain. Because of the inherent diversity of news broadcasts, speech is first extracted from the raw audio data by phone-level decoding with broad classes of phonemes. Training and testing were performed on recordings of German, English, Spanish, and French news shows from a variety of European TV channels. Each language is characterized by a Gaussian mixture model created solely from the corresponding acoustic features. The overall average error rate on speech segments is 16.32%. The current system disregards almost all linguistic information; as a consequence, it is easily extensible to new languages.
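
As a rough illustration of the one-model-per-language idea in the abstract, the sketch below trains one Gaussian mixture model per language and assigns a speech segment to the language whose model gives the highest average per-frame log-likelihood. It is not the authors' implementation. The diagonal covariances, the number of mixture components, the EM settings, the synthetic two-dimensional "features", and the names `DiagGMM`, `identify`, `synthetic_frames`, `lang_a`, and `lang_b` are all assumptions made for the example.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Tiny diagonal-covariance GMM trained with EM (illustrative only)."""

    def __init__(self, n_components=2, n_iter=15, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor
        self.seed = seed

    def _frame_logps(self, x):
        # Per-component log(weight * density) for one frame.
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        rng = random.Random(self.seed)
        # Initialise means from random frames, variances from the global variance.
        self.weights = [1.0 / self.k] * self.k
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        for _ in range(self.n_iter):
            # E-step: posterior probability of each component for each frame.
            resp = []
            for x in frames:
                logps = self._frame_logps(x)
                norm = logsumexp(logps)
                resp.append([math.exp(lp - norm) for lp in logps])
            # M-step: re-estimate weights, means, and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [
                    max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                            for r, x in zip(resp, frames)) / nj, self.var_floor)
                    for d in range(dim)
                ]
        return self

    def score(self, frames):
        """Average log-likelihood per frame of a segment under this model."""
        return sum(logsumexp(self._frame_logps(x)) for x in frames) / len(frames)


def identify(models, segment):
    """Return the language whose GMM scores the segment highest."""
    return max(models, key=lambda lang: models[lang].score(segment))


def synthetic_frames(rng, center, count):
    """Stand-in for acoustic feature vectors: 2-D Gaussian noise around a center."""
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(count)]


rng = random.Random(42)
# Hypothetical languages with artificially well-separated feature distributions.
models = {
    "lang_a": DiagGMM().fit(synthetic_frames(rng, 0.0, 200)),
    "lang_b": DiagGMM().fit(synthetic_frames(rng, 3.0, 200)),
}
segment_a = synthetic_frames(rng, 0.0, 50)
segment_b = synthetic_frames(rng, 3.0, 50)
print(identify(models, segment_a), identify(models, segment_b))
```

Because each language is scored by its own independently trained model, supporting a further language in a setup like this would only mean fitting one more `DiagGMM` and adding it to `models`, which is consistent with the extensibility noted in the abstract. Real acoustic features would be far higher-dimensional and much less cleanly separated than the synthetic data used here.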


    HENSOLDT Analytics

    HENSOLDT Analytics is a global leading provider of Open Source Intelligence (OSINT) systems and Natural Language Processing technologies, such as Automatic Speech Recognition, which are key elements for media monitoring and analysis.