Abstract
Unlike traditional speech processing, the aim of this research is not to take a received signal and identify where individual word boundaries begin and end, or to use supervised techniques to detect the set of patterns that make up the signal's lexicon. Deciphering the content of an audio signal is a secondary task that assumes language content exists. The rationale that underpins this approach is therefore to identify what constitutes the physical structure of spoken language, as distinct from other structured phenomena. In essence, the goal is an automated (artificially intelligent) intuitive 'ear' that can detect the rhythm and structure of language as accurately as the human ear, or better. To achieve this, generic methods for classifying unknown phenomena, if encountered, are built on unsupervised learning techniques, which do not rely on prior knowledge of a specific system. Results show that amplitude frequency histograms, derived from vertical, horizontal and thresholded analysis, clearly distinguish speech, 'noise' and music, which exhibit distinctive leptokurtic, platykurtic, and either 'tooth-comb' or bimodal profiles respectively. Birds and apes produce similar but coarser-grained versions of the leptokurtic distribution, whereas dolphins and orcas produce profiles almost identical to those of humans, indicating a similar complexity of sound pattern construction. Individually, the two visualisation methods, Significant Activity Segment (SAS) time series and amplitude frequency histograms, are reasonably robust in their ability to differentiate language from other signals. In particular, time series analysis of SAS can identify language-like communication within a transmission that also contains other structured phenomena, whether natural or artificial.
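The abstract does not specify how the histograms are built or how their profiles are scored. As a minimal sketch, assuming the 'amplitude frequency histogram' is a count of how often each quantised sample amplitude occurs, and that the leptokurtic/platykurtic distinction can be read off the excess kurtosis of that histogram, the idea can be illustrated as follows. The function names, the 64-bin quantisation and the ±0.5 decision margin are hypothetical choices, not the paper's; the vertical, horizontal and thresholded analyses and the 'tooth-comb'/bimodal music profiles are not reproduced.

```python
import random
from collections import Counter


def amplitude_histogram(samples, bins=64):
    """Count how often each quantised amplitude occurs (hypothetical helper).

    Samples are assumed to be floats in [-1.0, 1.0]; out-of-range values are
    clipped, then mapped to `bins` equal-width bins.
    """
    counts = Counter()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        idx = min(bins - 1, int((clipped + 1.0) / 2.0 * bins))
        counts[idx] += 1
    return [counts[i] for i in range(bins)]


def histogram_excess_kurtosis(counts):
    """Excess kurtosis of the binned amplitude distribution (0 for a Gaussian)."""
    bins = len(counts)
    width = 2.0 / bins
    centres = [-1.0 + (i + 0.5) * width for i in range(bins)]
    n = sum(counts)
    mean = sum(c * x for c, x in zip(counts, centres)) / n
    m2 = sum(c * (x - mean) ** 2 for c, x in zip(counts, centres)) / n
    m4 = sum(c * (x - mean) ** 4 for c, x in zip(counts, centres)) / n
    if m2 == 0:
        raise ValueError("degenerate histogram: all samples fall in one bin")
    return m4 / m2 ** 2 - 3.0


def classify_profile(counts, margin=0.5):
    """Label the histogram shape; `margin` is an illustrative tolerance."""
    k = histogram_excess_kurtosis(counts)
    if k > margin:
        return "leptokurtic"  # sharp central peak, heavy tails
    if k < -margin:
        return "platykurtic"  # flatter than a Gaussian
    return "mesokurtic"  # close to Gaussian


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins only: Laplacian samples (scale 0.1) for speech,
    # uniform samples for noise. Difference of two Exp(rate 10) is Laplace(0, 0.1).
    speech_like = [rng.expovariate(10.0) - rng.expovariate(10.0) for _ in range(20000)]
    noise_like = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
    for name, signal in (("speech-like", speech_like), ("noise-like", noise_like)):
        counts = amplitude_histogram(signal)
        print(name, round(histogram_excess_kurtosis(counts), 2), classify_profile(counts))
```

A Laplacian distribution is often used as a simple model of speech amplitudes and has an excess kurtosis of 3, while uniform noise has about -1.2, so the two synthetic stand-ins should fall on either side of the margin. Real recordings would need the paper's own preprocessing before this kind of test means anything.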
However, combining the SAS time series and amplitude frequency histogram methods produces a significantly more robust system, which is believed to be a highly useful automated first-pass filter for identifying and distinguishing intelligent, language-like audio communication without recourse to supervised techniques.
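The abstract neither defines a Significant Activity Segment nor states how the two methods are combined, so the following is a hypothetical illustration only. It assumes an SAS is a run of consecutive frames whose RMS energy exceeds a fraction of the signal's peak frame energy, and that the combination is a simple AND of a leptokurtic histogram with a test on how varied the segment lengths are. The function names, the 10% relative threshold, the three-segment minimum and the 0.3 coefficient-of-variation cut-off are illustrative assumptions, not values from the paper.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def activity_segments(energies, rel_threshold=0.1):
    """Group consecutive frames whose energy exceeds rel_threshold * peak energy.

    Returns (start_frame, length_in_frames) pairs. This is a hypothetical
    stand-in for the paper's Significant Activity Segments.
    """
    if not energies:
        return []
    cutoff = rel_threshold * max(energies)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > cutoff and start is None:
            start = i  # an active run begins
        elif e <= cutoff and start is not None:
            segments.append((start, i - start))  # the active run ends
            start = None
    if start is not None:
        segments.append((start, len(energies) - start))  # run reaches the end
    return segments


def first_pass_filter(histogram_is_leptokurtic, segments, min_segments=3, min_cv=0.3):
    """Hypothetical combined rule: language-like only if the amplitude histogram
    is leptokurtic AND the signal breaks into several activity segments whose
    lengths vary (coefficient of variation >= min_cv)."""
    if not histogram_is_leptokurtic or len(segments) < min_segments:
        return False
    lengths = [n for _, n in segments]
    mean = sum(lengths) / len(lengths)
    sd = math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))
    return sd / mean >= min_cv
```

Because the histogram result is passed in as a boolean, the rule can be paired with any histogram-profile classifier; for example, `first_pass_filter(True, activity_segments(frame_rms(samples, 160)))` would examine a 16 kHz recording in 10 ms frames (again, an assumed framing). Under this toy rule a steady tone forms a single long segment and is rejected, while bursts of varied length separated by pauses pass.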
Original language | English |
---|---|
Title of host publication | Proceedings of the second IASTED International Conference on Circuits, Signals, and Systems (CSS 2004) |
Subtitle of host publication | Clearwater Beach, Florida, USA, November 28, 2004 - December 1, 2004 |
Editors | M.H. Rashid |
Place of Publication | Calgary, AB |
Publisher | IASTED/ACTA Press |
Pages | 237-242 |
Number of pages | 6 |
ISBN (Print) | 9780889864559 |
Publication status | Published - 2004 |
Event | Proceedings of the IASTED International Conference on Circuits, Signals, and Systems, Clearwater Beach, FL, United States, 28 Nov 2004 → 1 Dec 2004 |
Conference
Conference | Proceedings of the IASTED International Conference on Circuits, Signals, and Systems |
---|---|
Country/Territory | United States |
City | Clearwater Beach, FL |
Period | 28/11/04 → 1/12/04 |
Keywords
- Audio language
- Detection
- Significant Activity Segments (SAS)
- Unsupervised learning