Shoghi VPA is a speech analysis system intended for use by law enforcement and intelligence agencies. VPA analyzes audio files for speech/non-speech detection, language identification and speaker identification. The core components that perform this analysis are called classification modules; they cover speech detection, language identification, speaker identification, gender detection, emotion detection, age detection and keyword spotting.
The speech detection module marks the areas of a signal that contain speech. It relies on threshold values on the one hand and, on the other, on the analysis of defined frequency bands that are characteristic of speech.
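The combination of a threshold and a frequency-band check can be sketched as follows. This is only an illustration, not VPA's actual implementation: the function names, the 300 to 3400 Hz band, the frame length and both threshold values are assumptions chosen for the example.

```python
import math

def band_energy_ratio(frame, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of the frame's spectral energy inside [low_hz, high_hz] (assumed speech band)."""
    n = len(frame)
    total = in_band = 0.0
    for k in range(1, n // 2):  # naive DFT over positive bins, skipping DC
        angle = 2.0 * math.pi * k / n
        re = sum(x * math.cos(angle * i) for i, x in enumerate(frame))
        im = sum(x * math.sin(angle * i) for i, x in enumerate(frame))
        power = re * re + im * im
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += power
    return in_band / total if total > 0.0 else 0.0

def detect_speech(samples, sample_rate, frame_len=160,
                  energy_threshold=0.01, band_threshold=0.6):
    """Return (start_sample, end_sample) regions that pass both checks."""
    regions, start = [], None
    for pos in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A frame counts as speech only if it is loud enough AND its energy
        # is concentrated in the assumed speech band.
        is_speech = (energy >= energy_threshold and
                     band_energy_ratio(frame, sample_rate) >= band_threshold)
        if is_speech and start is None:
            start = pos
        elif not is_speech and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, pos + frame_len))
    return regions
```

A loud tone outside the assumed band passes the energy threshold but fails the band check, which is why the two criteria are combined.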
Each language has acoustic characteristics that can be used to assign an unknown signal to a language. By means of such a pattern comparison, the language identifier compares an incoming signal with the languages known to it.
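As a rough sketch of such a pattern comparison, the signal can be reduced to a feature vector and ranked against one stored profile per known language. The feature vectors, the profile dictionary and the use of Euclidean distance are all hypothetical here; the source does not say which features or scoring VPA uses.

```python
import math

def rank_languages(features, profiles):
    """Return (language, distance) pairs, best match first.

    `features` is a hypothetical acoustic feature vector for the unknown
    signal; `profiles` maps each known language to a reference vector.
    """
    scored = []
    for language, profile in profiles.items():
        distance = math.sqrt(sum((f - p) ** 2 for f, p in zip(features, profile)))
        scored.append((language, distance))
    return sorted(scored, key=lambda item: item[1])

# Illustrative, made-up profiles and input.
profiles = {"english": [0.2, 0.8], "arabic": [0.9, 0.1]}
ranking = rank_languages([0.25, 0.7], profiles)
best_language = ranking[0][0]
```

Returning the full ranking rather than a single label lets an operator see how close the runner-up languages were.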
Speaker identification compares the current signal with the existing, already identified speakers. It can identify a speaker once the signal contains enough speech (e.g. more than 10 seconds).
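A minimal sketch of this comparison is shown below. It assumes that each speaker is represented by a numeric voice vector and that cosine similarity is used for matching; both are illustrative choices, as is the `min_score` rejection threshold. Only the 10-second figure comes from the text.

```python
import math

MIN_SPEECH_SECONDS = 10.0  # from the "more than 10 seconds" example in the text

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_speaker(voice_vector, speech_seconds, enrolled, min_score=0.8):
    """Return the best-matching enrolled speaker, or None if no decision is possible.

    `enrolled` maps speaker names to reference vectors (hypothetical format).
    """
    if speech_seconds <= MIN_SPEECH_SECONDS:
        return None  # not enough speech in the signal
    best_name, best_score = None, min_score
    for name, reference in enrolled.items():
        score = cosine(voice_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name  # None when nobody reaches min_score
```

For example, with `enrolled = {"speaker_a": [1, 0, 0], "speaker_b": [0, 1, 0]}`, a 12-second signal with vector `[0.9, 0.1, 0]` is attributed to `speaker_a`, while the same vector from a 5-second signal yields no identification.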
For gender detection, the fundamental frequency of the voice is used as the essential parameter. This gives a good estimate of whether the voice belongs to a man or a woman.
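One common way to estimate the fundamental frequency is to look for the strongest autocorrelation peak within a plausible pitch range, as sketched below. The search range of 60 to 400 Hz and the 165 Hz decision boundary are assumptions made for this example; the source does not state how VPA estimates pitch or where it draws the line.

```python
import math

def estimate_f0(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def classify_gender(f0_hz, boundary_hz=165.0):
    """Map an F0 estimate to a coarse label; boundary_hz is an assumed cut-off."""
    if f0_hz is None:
        return "unknown"
    return "male" if f0_hz < boundary_hz else "female"
```

A single frame is used here for brevity; in practice the estimate would more plausibly be averaged over many voiced frames before a decision is made.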
Emotion detection observes the whole signal, analyzing changes in the frequency spectrum and in the intensity. In this way, conversations in which participants react, for example, more and more heatedly can be filtered out well. The quality of this differentiation depends strongly, however, on the quality of the data used to train the classifier.
Speaker identification in the signal under analysis from a known database of speaker voice samples.
Age detection is based on typical changes in the voice as people grow older. The data under analysis can thus at least be attributed to a certain age group.
Multiple language identification from a known database of languages.
The keyword spotter searches for words or phrases in the signals. In addition to choosing the keywords, you can vary a background balance to obtain either many keyword hits (which also include more false detections) or only a few hits with few false detections.
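The trade-off between many hits and few false detections can be illustrated with a simple score threshold. This sketch treats the background balance as a single acceptance threshold on a per-hit confidence score, which is an assumption; the hypothesis records, their scores and the `correct` ground-truth flag are invented purely for the example.

```python
def spot_keywords(hypotheses, threshold):
    """Keep only keyword hypotheses whose score reaches the threshold."""
    return [h for h in hypotheses if h["score"] >= threshold]

def summarize(hits):
    """Return (number of hits, number of false detections) for labelled hits."""
    false_detections = sum(1 for h in hits if not h["correct"])
    return len(hits), false_detections

# Made-up hypotheses with a ground-truth flag, for illustration only.
hypotheses = [
    {"keyword": "transfer", "score": 0.92, "correct": True},
    {"keyword": "transfer", "score": 0.55, "correct": False},
    {"keyword": "meeting", "score": 0.81, "correct": True},
    {"keyword": "meeting", "score": 0.40, "correct": True},
    {"keyword": "border", "score": 0.35, "correct": False},
]

permissive = summarize(spot_keywords(hypotheses, threshold=0.3))
strict = summarize(spot_keywords(hypotheses, threshold=0.8))
```

With this data the permissive setting returns five hits, two of them false, while the strict setting returns two hits and no false detections but misses one genuine occurrence of "meeting".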