Deep learning, Audio Classification, and Beyond

franky
Jul 24, 2018


Source: Listening to the Roar of 1920s New York

If you are a beginner in deep learning looking for ideas on applying it to audio processing, you should probably start with 10 Audio Processing Tasks to get you started with Deep Learning Applications (with Case Studies), which describes a wide range of applications in this area, such as audio classification, audio fingerprinting, automatic music tagging, audio segmentation, audio source separation, beat tracking, music recommendation, music retrieval, music transcription, and onset detection.

Of these applications, one interesting but challenging task for beginners (and experts alike) is multi-class classification on the Urban Sound datasets created by New York University.

Among these datasets, UrbanSound8K contains 8732 labeled excerpts of urban sounds from 10 classes:

  • air_conditioner (1000 audio files)
  • car_horn (429 audio files)
  • children_playing (1000 audio files)
  • dog_bark (1000 audio files)
  • drilling (1000 audio files)
  • engine_idling (1000 audio files)
  • gun_shot (374 audio files)
  • jackhammer (1000 audio files)
  • siren (929 audio files)
  • street_music (1000 audio files)
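As a quick sanity check on the numbers above, a short Python sketch can tally the per-class counts and show how unevenly the classes are represented. The counts are copied from the list; the variable names are illustrative only and are not part of any dataset tooling.

```python
# Per-class clip counts for UrbanSound8K, copied from the list above.
counts = {
    "air_conditioner": 1000,
    "car_horn": 429,
    "children_playing": 1000,
    "dog_bark": 1000,
    "drilling": 1000,
    "engine_idling": 1000,
    "gun_shot": 374,
    "jackhammer": 1000,
    "siren": 929,
    "street_music": 1000,
}

total = sum(counts.values())
print(f"Total clips: {total}")

# Print each class from smallest to largest with its share of the dataset.
for label, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{label:<17} {n:>5} {n / total:6.1%}")

# Ratio between the largest and smallest class, a simple imbalance measure.
imbalance = max(counts.values()) / min(counts.values())
print(f"Largest/smallest class ratio: {imbalance:.2f}")
```

The counts add up to 8732, matching the dataset description, and gun_shot and car_horn each make up less than 5% of the clips, which is worth keeping in mind when judging a classifier's accuracy on this data.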

Applying deep learning to this problem is challenging in its own right [ref1, ref2]. However, the story behind the scenes is even more compelling. First, check out the following video on SONYC (Sounds of New York City).

Source: Sounds of New York City on YouTube

Quoted from YouTube:

Noise pollution is one of the topmost quality of life issues for urban residents in the United States. It has been estimated that 9 out of 10 adults in New York City are exposed to excessive noise levels, i.e. beyond the limit of what the EPA considers to be harmful….

The objectives of SONYC are to create technological solutions for: (1) the systematic, constant monitoring of noise pollution at city scale; (2) the accurate description of acoustic environments in terms of its composing sources; (3) broadening citizen participation in noise reporting and mitigation; and (4) enabling city agencies to take effective, information-driven action for noise mitigation.

Clearly, the project tackles serious everyday issues with a broader scope: for example, do car horns correlate with specific traffic patterns, or how does construction noise near schools affect how well children perform?

To take the idea of detecting urban sounds a step further, listen to the following podcast on ShotSpotter from Harvard Business School.

Source: ShotSpotter on Cold Call from Harvard Business School

Quoted from the blog:

ShotSpotter provides gunfire detection sensors to cities across the United States. CEO Ralph Clark is interested in taking the company beyond its business-to-government sales model and into new markets. Could his company sell to schools and colleges? Could the technology be adapted for indoor applications like shopping malls and movie theaters? Could cities use it as an early alert to a terrorist attack? Professor Mitch Weiss discusses the difficulties moving from one business model to another, and how successful companies make the transition.

From the SONYC video and the ShotSpotter podcast, we can certainly appreciate how technology contributes to our quality of life, and how much it has improved over the past century.
