Detection of Non-Speech Human Sounds for Surveillance

IJCSEC Front Page

The objective of this research is to develop feature extraction and classification techniques for Acoustic Event Detection (AED) in unstructured environments, that is, environments where adverse effects such as noise, distortion and multiple sources are likely to occur. The aim is to design a system that achieves human-like sound recognition performance across a range of hearing tasks and conditions. The research is important because the field is commonly overshadowed by the more popular area of Automatic Speech Recognition (ASR), and typical AED systems are often built on techniques taken directly from ASR. Applying these techniques directly is difficult, however, because the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary for acoustic events comparable to the phonemes of speech. It is therefore worthwhile to develop a system that performs well on this challenging task.

Keywords: Acoustic event detection, feature extraction, classification, deep belief networks.

