Abstract
Underwater passive acoustics is used worldwide for multi-year monitoring of marine mammals. However, the large volume of audio recordings creates a need to automate the detection of acoustic events. For instance, the increasing number of Offshore Wind Farms (OWF) raises key environmental and societal issues relating to their impacts on wildlife. In this context, monitoring marine mammals along with information on their acoustic environment throughout the OWF life cycle is crucial. The objective of this study is to evaluate the ability of a single deep learning model to precisely detect and localize marine mammal sounds, in time and in frequency, over a wide frequency range and to classify them by species and sound type.
A broadband hydrophone, deployed at the Fécamp OWF (Normandy, France), recorded the underwater soundscape, including sounds from marine mammals occurring in the area. To visualize these sounds, 15-s spectrograms were computed. On these images, dolphin (D) and porpoise (P) sounds were manually annotated by sound type: Click-Trains (DCT, PCT), Buzzes (DB, PB) and Whistles (DW). The spectrograms were then split into five-fold cross-validation datasets, each composed of one half manually annotated spectrograms and one half spectrograms containing only background noise. A Faster R-CNN model was trained to precisely detect and classify the marine mammal sounds in the spectrograms.
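The balanced cross-validation split described above can be illustrated with a minimal sketch. The abstract does not state how spectrograms were assigned to folds, so the random shuffling, the round-robin assignment, and the names `make_balanced_folds` and `train_test_splits` are assumptions made for illustration only.

```python
import random


def make_balanced_folds(annotated_ids, background_ids, k=5, seed=0):
    """Split spectrogram IDs into k folds, each half annotated, half background.

    Hypothetical helper: the assignment strategy is an assumption,
    not the procedure used in the study.
    """
    rng = random.Random(seed)
    annotated = list(annotated_ids)
    background = list(background_ids)
    rng.shuffle(annotated)
    rng.shuffle(background)
    # Use the same number of items from each pool, rounded down to a
    # multiple of k, so that every fold is exactly balanced.
    n = min(len(annotated), len(background)) // k * k
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(("annotated", annotated[i]))
        folds[i % k].append(("background", background[i]))
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for held_out in range(len(folds)):
        train = [item for j, fold in enumerate(folds) if j != held_out
                 for item in fold]
        yield train, folds[held_out]
```

With 50 annotated and 80 background-only spectrogram IDs, this sketch produces five folds of 20 spectrograms each (10 annotated, 10 background), and the surplus background spectrograms are left out.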
Three model output configurations were used: (1) overall detection of marine mammals (presence vs. absence), (2) detection and classification of species (two classes: dolphin, porpoise) and (3) detection and classification of sound types (five classes: DCT, DB, DW, PCT, PB). For the simplest configuration (1), the model produced detections in 15.4 % of the spectrograms while missing only 6.6 % of the annotated spectrograms. For the more precise configurations (2) and (3), the mean Average Precision (mAP) reached 92.3 % (2) and 84.3 % (3), and the macro-averaged Area Under the Curve (AUC) reached 95.7 % (2) and 94.9 % (3).
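Average Precision for box detectors is conventionally built on matching predicted boxes to annotated boxes by Intersection over Union (IoU). The sketch below shows that matching step for time-frequency boxes. The IoU threshold of 0.5, the greedy matching rule, the box layout `(t_min, f_min, t_max, f_max)` and the function names are assumptions; the abstract does not specify the evaluation protocol.

```python
def box_iou(a, b):
    """IoU of two boxes given as (t_min, f_min, t_max, f_max)."""
    t0, f0 = max(a[0], b[0]), max(a[1], b[1])
    t1, f1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, t1 - t0) * max(0.0, f1 - f0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, annotations, iou_threshold=0.5):
    """Greedily match detections to annotations of the same class.

    detections: list of (score, label, box); annotations: list of (label, box).
    Returns (score, is_true_positive) pairs in descending score order.
    Each annotation can be matched at most once.
    """
    matched = set()
    results = []
    for score, label, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_label, gt_box) in enumerate(annotations):
            if idx in matched or gt_label != label:
                continue
            iou = box_iou(box, gt_box)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        is_tp = best_idx is not None and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        results.append((score, is_tp))
    return results
```

The resulting list of scored true and false positives is the usual input to a per-class precision-recall curve, whose area gives the Average Precision; averaging over classes gives the mAP.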
This model will help speed up the annotation process by reducing the number of spectrograms to be analyzed manually and by providing pre-drawn time-frequency boxes. Several model parameters can be adjusted to trade off missed detections against false positives; these need to be carefully considered and adapted to the problem. For instance, the appropriate settings depend on the human resources available to manually check the model detections and on the criticality of missing marine mammal sounds. These models are promising for applications ranging from the simple detection of marine mammal presence to precise ecological inferences over the long term.
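The trade-off between review effort and missed sounds can be sketched by sweeping a detection score threshold. The function below is a hypothetical illustration: it assumes that each spectrogram is summarized by its highest detection score and that a spectrogram is flagged for review when this score reaches the threshold. The study's exact definitions and tunable parameters may differ.

```python
def review_tradeoff(max_scores, has_annotation, thresholds):
    """Tabulate review effort against missed annotated spectrograms.

    max_scores: highest detection score per spectrogram (0.0 if none).
    has_annotation: True where the spectrogram holds a manual annotation.
    Returns (threshold, flagged_fraction, missed_fraction) rows, where
    flagged_fraction is the share of all spectrograms sent to review and
    missed_fraction is the share of annotated spectrograms not flagged.
    """
    n_total = len(max_scores)
    n_annotated = sum(has_annotation)
    rows = []
    for thr in thresholds:
        flagged = [s >= thr for s in max_scores]
        flagged_fraction = sum(flagged) / n_total
        missed = sum(1 for f, a in zip(flagged, has_annotation) if a and not f)
        missed_fraction = missed / n_annotated if n_annotated else 0.0
        rows.append((thr, flagged_fraction, missed_fraction))
    return rows
```

Raising the threshold shrinks the share of spectrograms a human must check but increases the share of annotated spectrograms that go unflagged. The two quantities correspond in spirit to the 15.4 % and 6.6 % figures reported for configuration (1).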