A data science approach to studying the whale language

Modern audio sensors enable the recording of many hours of communication between whales. But with thousands of hours recorded, it is very difficult to analyze the data, profile it, and turn it into discoveries. That is where data science comes to our aid.

One approach is to apply supervised machine learning to classify the sounds and identify an individual whale or its species by its voice. Such an experiment can demonstrate that whale sounds can be analyzed automatically, but it does not provide any new knowledge, because the labels it learns to predict are already known.
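To make the supervised setting concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is not the method used in the referenced study. The two-number feature vectors and the names `training`, `fit_centroids`, and `predict` are assumptions made up for illustration; a real system would first extract features from the audio.

```python
import math

# Hypothetical labelled training data: (feature vector, species label).
# The numbers are invented for illustration and are not real measurements.
training = [
    ([5.1, 0.9], "killer whale"),
    ([6.0, 1.1], "killer whale"),
    ([2.1, 0.4], "pilot whale"),
    ([3.0, 0.6], "pilot whale"),
]

def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean) to the sample."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = fit_centroids(training)
print(predict(centroids, [5.5, 1.0]))
print(predict(centroids, [2.4, 0.5]))
```

Note that the classifier can only ever answer with one of the labels it was trained on, which is why this kind of experiment confirms what is already known rather than revealing anything new.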

Another approach is to apply unsupervised machine learning and see what can be learned from the results. To do that, we can use sound samples recorded from whales of two different species in different parts of the ocean. The two species are killer whales and pilot whales, both of which are dolphins. The recordings were collected near the Caribbean islands, Iceland, and Norway.

The unsupervised analysis produced a graph of similarities between the audio samples of each pod of whales, recorded in different geographic locations. That is, each node is a collection of audio samples of a certain pod of whales, and the graph shows the similarities between these collections.
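The idea of a similarity graph over collections of samples can be sketched as follows. This is only an illustration under stated assumptions, not the analysis from the referenced study: the pod names, the two-number feature vectors, and the helpers `centroid`, `similarity_graph`, and `most_similar` are hypothetical, and the similarity measure (inverse Euclidean distance between centroids) is a simple stand-in chosen for brevity.

```python
import math

# Hypothetical feature vectors for each pod's audio samples.
# All values are synthetic and chosen only to illustrate the mechanics.
pods = {
    "killer_iceland": [[5.1, 0.9], [5.3, 1.0], [5.0, 1.1]],
    "killer_norway": [[5.9, 1.0], [6.1, 1.2], [6.0, 0.9]],
    "pilot_norway": [[2.1, 0.4], [2.0, 0.5], [2.2, 0.6]],
    "pilot_caribbean": [[2.9, 0.5], [3.1, 0.4], [3.0, 0.6]],
}

def centroid(samples):
    """Summarize a collection of samples by its mean feature vector."""
    return [sum(column) / len(samples) for column in zip(*samples)]

def similarity_graph(collections):
    """Build weighted edges between every pair of collections.

    No labels are used: the weight depends only on how close the
    two centroids are, mapped into the range (0, 1].
    """
    centroids = {name: centroid(samples) for name, samples in collections.items()}
    names = sorted(centroids)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            edges[(a, b)] = 1.0 / (1.0 + math.dist(centroids[a], centroids[b]))
    return edges

def most_similar(edges, name):
    """Return the node connected to `name` by the heaviest edge."""
    weight, (a, b) = max((w, pair) for pair, w in edges.items() if name in pair)
    return b if a == name else a

edges = similarity_graph(pods)
for (a, b), weight in sorted(edges.items(), key=lambda item: -item[1]):
    print(f"{a} -- {b}: {weight:.2f}")
```

With these synthetic values, each pod's strongest edge leads to the other pod of the same species, while the two pods of a species still remain distinct nodes. The species and location appear in the code only as node names; the edge weights are computed from the feature vectors alone.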
 
 
The graph shows that the computer was able to identify that pods of whales of the same species make similar sounds, which is to be expected. Pods of killer whales are placed in the top part of the graph, while pods of pilot whales are in the bottom part.

But the graph also shows something that was not expected. The unsupervised analysis separated the Icelandic killer whales from the Norwegian killer whales, and the Norwegian pilot whales from the Caribbean pilot whales. That reveals something we did not know about whales: whales of the same species, just like people, have different dialects depending on the geographic location in which they live.

That discovery could not have been made with supervised machine learning or simple predictive analytics, which only predict labels that are defined in advance. Unsupervised machine learning, in contrast, provided a completely new way to profile the data and make discoveries in it.

Reference:
Shamir, L., Yerby, C., Simpson, R., von Benda-Beckmann, A., Tyack, P., Samarra, F., Miller, P., Wallin, J., Classification of large acoustic datasets using machine learning and crowdsourcing: application to whale calls, Journal of the Acoustical Society of America 135(2), 953-962, 2014.
