Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model is capable of predicting the probability of occurrence of each f...
Main Authors: | Dimitrios Effrosynidis, Athanassios Tsikliras, Avi Arampatzis, Georgios Sylaios |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-12-01 |
Series: | Applied Sciences |
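Subjects: | species distribution models; fish species; feature extraction; feature selection; XGBoost; habitability maps |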
Online Access: | https://www.mdpi.com/2076-3417/10/24/8900 |
id |
doaj-6147e9a0e5434e6d82d17d523bc8487c |
---|---|
record_format |
Article |
spelling |
doaj-6147e9a0e5434e6d82d17d523bc8487c; 2020-12-14T00:01:13Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-12-01; 10; 8900; 8900; 10.3390/app10248900; Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea; Dimitrios Effrosynidis [0]; Athanassios Tsikliras [1]; Avi Arampatzis [2]; Georgios Sylaios [3]; Database & Information Retrieval research Unit, Democritus University of Thrace, 671 00 Xanthi, Greece; Laboratory of Ichthyology, School of Biology, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece; Database & Information Retrieval research Unit, Democritus University of Thrace, 671 00 Xanthi, Greece; Laboratory of Ecological Engineering & Technology, Department of Environmental Engineering, Democritus University of Thrace, 671 00 Xanthi, Greece; In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model is capable of predicting the probability of occurrence of each fish species at any location in the Mediterranean Sea. Eight pelagic, commercial fish species were selected for this study, namely <i>Engraulis encrasicolus</i>, <i>Sardina pilchardus</i>, <i>Sardinella aurita</i>, <i>Scomber colias</i>, <i>Scomber scombrus</i>, <i>Spicara smaris</i>, <i>Thunnus thynnus</i> and <i>Xiphias gladius</i>. The SDM environmental predictors were obtained from the databases of the Copernicus Marine Environmental Service (CMEMS) and the European Marine Observation and Data Network (EMODnet). Fish occurrence probabilities, at low resolution and with several gaps, were obtained from Aquamaps (FAO Fishbase). Data pre-processing involved feature engineering to construct 6830 features, representing the distribution of several mean-monthly environmental variables over a time span of 10 years. Feature selection with the ensemble Reciprocal Ranking method was used to rank the features according to their relative importance. This technique increased the model’s performance by 34%. Ten machine learning algorithms were then applied and tested based on their overall performance per species. The XGBoost algorithm performed best and was used as the final model. Feature categories were explored, with neighbor-based, extreme-value, monthly and surface features contributing most to the model. Environmental variables such as salinity, temperature, distance to coast, dissolved oxygen and nitrate were found to be the strongest predictors of the probability of occurrence for the above eight species.; https://www.mdpi.com/2076-3417/10/24/8900; species distribution models; fish species; feature extraction; feature selection; XGBoost; habitability maps |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Dimitrios Effrosynidis, Athanassios Tsikliras, Avi Arampatzis, Georgios Sylaios |
spellingShingle |
Dimitrios Effrosynidis; Athanassios Tsikliras; Avi Arampatzis; Georgios Sylaios; Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea; Applied Sciences; species distribution models; fish species; feature extraction; feature selection; XGBoost; habitability maps |
author_sort |
Dimitrios Effrosynidis |
title |
Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea |
title_sort |
species distribution modelling via feature engineering and machine learning for pelagic fishes in the mediterranean sea |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2020-12-01 |
description |
In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model is capable of predicting the probability of occurrence of each fish species at any location in the Mediterranean Sea. Eight pelagic, commercial fish species were selected for this study, namely <i>Engraulis encrasicolus</i>, <i>Sardina pilchardus</i>, <i>Sardinella aurita</i>, <i>Scomber colias</i>, <i>Scomber scombrus</i>, <i>Spicara smaris</i>, <i>Thunnus thynnus</i> and <i>Xiphias gladius</i>. The SDM environmental predictors were obtained from the databases of the Copernicus Marine Environmental Service (CMEMS) and the European Marine Observation and Data Network (EMODnet). Fish occurrence probabilities, at low resolution and with several gaps, were obtained from Aquamaps (FAO Fishbase). Data pre-processing involved feature engineering to construct 6830 features, representing the distribution of several mean-monthly environmental variables over a time span of 10 years. Feature selection with the ensemble Reciprocal Ranking method was used to rank the features according to their relative importance. This technique increased the model’s performance by 34%. Ten machine learning algorithms were then applied and tested based on their overall performance per species. The XGBoost algorithm performed best and was used as the final model. Feature categories were explored, with neighbor-based, extreme-value, monthly and surface features contributing most to the model. Environmental variables such as salinity, temperature, distance to coast, dissolved oxygen and nitrate were found to be the strongest predictors of the probability of occurrence for the above eight species. |
topic |
species distribution models; fish species; feature extraction; feature selection; XGBoost; habitability maps |
url |
https://www.mdpi.com/2076-3417/10/24/8900 |
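
The abstract mentions ranking features with an ensemble Reciprocal Ranking method but gives no formula. The sketch below illustrates one common formulation of reciprocal rank fusion, in which each feature's score is the sum of 1/rank over several individual rankings. It is an assumption for illustration, not necessarily the exact variant used in the article; the function name, the feature names and the three input rankings are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings):
    """Fuse several best-first feature rankings into one ordering.

    Each ranking is a list of feature names, most important first.
    A feature's fused score is the sum of 1/rank over all rankings
    in which it appears (assumed formulation, see lead-in).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            scores[feature] += 1.0 / position
    # Higher fused score means more important; ties are broken by name.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings produced by three different feature selectors.
rankings = [
    ["salinity", "temperature", "nitrate", "oxygen"],
    ["temperature", "salinity", "oxygen", "nitrate"],
    ["salinity", "oxygen", "temperature", "nitrate"],
]

fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Summing reciprocal ranks rewards features that appear near the top of any individual ranking, so a single selector that places a feature low does not dominate the fused order.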