Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea

In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model is capable of predicting the probability of occurrence of each f...


Bibliographic Details
Main Authors: Dimitrios Effrosynidis, Athanassios Tsikliras, Avi Arampatzis, Georgios Sylaios
Format: Article
Language: English
Published: MDPI AG 2020-12-01
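Series: Applied Sciences
Subjects: species distribution models; fish species; feature extraction; feature selection; XGBoost; habitability maps
Online Access: https://www.mdpi.com/2076-3417/10/24/8900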
id doaj-6147e9a0e5434e6d82d17d523bc8487c
record_format Article
spelling doaj-6147e9a0e5434e6d82d17d523bc8487c
2020-12-14T00:01:13Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-12-01
10
8900
8900
10.3390/app10248900
Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
Dimitrios Effrosynidis (0)
Athanassios Tsikliras (1)
Avi Arampatzis (2)
Georgios Sylaios (3)
(0) Database & Information Retrieval research Unit, Democritus University of Thrace, 671 00 Xanthi, Greece
(1) Laboratory of Ichthyology, School of Biology, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
(2) Database & Information Retrieval research Unit, Democritus University of Thrace, 671 00 Xanthi, Greece
(3) Laboratory of Ecological Engineering & Technology, Department of Environmental Engineering, Democritus University of Thrace, 671 00 Xanthi, Greece
In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model can predict the probability of occurrence of each fish species at any location in the Mediterranean Sea. Eight pelagic, commercial fish species were selected for this study, namely Engraulis encrasicolus, Sardina pilchardus, Sardinella aurita, Scomber colias, Scomber scombrus, Spicara smaris, Thunnus thynnus and Xiphias gladius. The SDM environmental predictors were obtained from the databases of the Copernicus Marine Environmental Service (CMEMS) and the European Marine Observation and Data Network (EMODnet). Fish occurrence probabilities, at low resolution and with several gaps, were obtained from Aquamaps (FAO Fishbase). Data pre-processing involved feature engineering to construct 6830 features representing the distribution of several monthly-mean environmental variables over a 10-year time span. Feature selection with the ensemble Reciprocal Ranking method was used to rank the features according to their relative importance. This technique increased the model’s performance by 34%. Ten machine learning algorithms were then applied and tested based on their overall performance per species. The XGBoost algorithm performed best and was used as the final model. Feature categories were explored, with neighbor-based, extreme-value, monthly and surface features contributing most to the model. Environmental variables such as salinity, temperature, distance to coast, dissolved oxygen and nitrate were found to be the strongest predictors of the probability of occurrence for the above eight species.
https://www.mdpi.com/2076-3417/10/24/8900
species distribution models
fish species
feature extraction
feature selection
XGBoost
habitability maps
collection DOAJ
language English
format Article
sources DOAJ
author Dimitrios Effrosynidis
Athanassios Tsikliras
Avi Arampatzis
Georgios Sylaios
spellingShingle Dimitrios Effrosynidis
Athanassios Tsikliras
Avi Arampatzis
Georgios Sylaios
Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
Applied Sciences
species distribution models
fish species
feature extraction
feature selection
XGBoost
habitability maps
author_facet Dimitrios Effrosynidis
Athanassios Tsikliras
Avi Arampatzis
Georgios Sylaios
author_sort Dimitrios Effrosynidis
title Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
title_short Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
title_full Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
title_fullStr Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
title_full_unstemmed Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea
title_sort species distribution modelling via feature engineering and machine learning for pelagic fishes in the mediterranean sea
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-12-01
description In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model can predict the probability of occurrence of each fish species at any location in the Mediterranean Sea. Eight pelagic, commercial fish species were selected for this study, namely Engraulis encrasicolus, Sardina pilchardus, Sardinella aurita, Scomber colias, Scomber scombrus, Spicara smaris, Thunnus thynnus and Xiphias gladius. The SDM environmental predictors were obtained from the databases of the Copernicus Marine Environmental Service (CMEMS) and the European Marine Observation and Data Network (EMODnet). Fish occurrence probabilities, at low resolution and with several gaps, were obtained from Aquamaps (FAO Fishbase). Data pre-processing involved feature engineering to construct 6830 features representing the distribution of several monthly-mean environmental variables over a 10-year time span. Feature selection with the ensemble Reciprocal Ranking method was used to rank the features according to their relative importance. This technique increased the model’s performance by 34%. Ten machine learning algorithms were then applied and tested based on their overall performance per species. The XGBoost algorithm performed best and was used as the final model. Feature categories were explored, with neighbor-based, extreme-value, monthly and surface features contributing most to the model. Environmental variables such as salinity, temperature, distance to coast, dissolved oxygen and nitrate were found to be the strongest predictors of the probability of occurrence for the above eight species.
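Illustrative note on the ensemble ranking step mentioned in the description: the abstract does not give the formula the authors used, so the following is only a minimal sketch of one common reciprocal-rank fusion scheme, in which each feature scores the sum of 1/rank over several individual rankings. The function name, the three input rankings and the feature labels are hypothetical and chosen for illustration; they are not taken from the article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings):
    """Fuse several feature rankings (best feature first) into one list.

    Assumption: each feature scores sum(1 / rank) over all rankings that
    contain it. The article may use a different variant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            scores[feature] += 1.0 / position
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings produced by three separate feature-ranking methods.
rankings = [
    ["salinity", "temperature", "nitrate", "oxygen"],
    ["temperature", "salinity", "oxygen", "nitrate"],
    ["salinity", "oxygen", "temperature", "nitrate"],
]

fused = reciprocal_rank_fusion(rankings)
print(fused)
```

A feature that ranks near the top in several methods accumulates a high fused score, so a single method's outlier ranking has limited influence on the final order.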
topic species distribution models
fish species
feature extraction
feature selection
XGBoost
habitability maps
url https://www.mdpi.com/2076-3417/10/24/8900