SUDIR: An Approach of Sensing Urban Text Data From Internet Resources Based on Deep Learning

Urban data is an imperative resource for urban computing, which can promote the establishment of urban knowledge, the collection of urban information and the construction of smart cities. Given that the urban data collected through hardware sensors and crowdsourcing has the limitations of uneven infor...

Bibliographic Details
Main Authors: Chaoran Zhou, Jianping Zhao, Chenghao Ren
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
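Subjects: Urban data sensing, urban computing, Internet resources, deep learning, Chinese text, web page features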
Online Access: https://ieeexplore.ieee.org/document/9270026/
id doaj-03654b81ad72473bb61ab60716e9ec9e
record_format Article
spelling doaj-03654b81ad72473bb61ab60716e9ec9e
2021-03-30T03:52:31Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
volume 8
pages 214454-214468
doi 10.1109/ACCESS.2020.3040408
9270026
SUDIR: An Approach of Sensing Urban Text Data From Internet Resources Based on Deep Learning
Chaoran Zhou (https://orcid.org/0000-0003-0971-7422), School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China
Jianping Zhao, School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China
Chenghao Ren, School of Computer Science and Technology, Jilin University, Changchun, China
Urban data is an imperative resource for urban computing, which can promote the establishment of urban knowledge, the collection of urban information and the construction of smart cities. Given that the urban data collected through hardware sensors and crowdsourcing suffers from uneven information distribution, poor data comprehensiveness and high resource costs, we turn to Internet resources, which offer real-time updates and extensive information coverage. We therefore propose an approach to Sensing Urban text Data from Internet Resources (SUDIR). We put forward innovative work on two key issues: urban data recognition for the Chinese context and urban data sensing for multi-source web resources. On the one hand, we design a Chinese urban data recognition model based on a Whole Word Masking for Bidirectional Encoder Representations from Transformers (BERT-WWM) embedding model and a Bidirectional Long Short-Term Memory with a Conditional Random Field (BLSTM-CRF) sequence labeling model. We introduce the Chinese Word Segmentation (CWS) concept into the BERT embedding model so that the text embedding better represents semantic information in the Chinese context. The BLSTM-CRF model, based on deep learning, is used to achieve high-quality coding and prediction. On the other hand, we propose a method of Extracting Urban text data based on Web page features and Clustering operation (EUWC). EUWC is used to correct the false negative samples labeled by the BERT-WWM+BLSTM-CRF recognition model and enables SUDIR to sense more accurate and comprehensive city data from multi-source web resources. The experimental results show that our work outperforms the other baseline methods, and they also indicate that SUDIR, using Internet resources and deep learning technology, has the advantages of low-cost, high-quality urban data sensing.
https://ieeexplore.ieee.org/document/9270026/
Urban data sensing
urban computing
Internet resources
deep learning
Chinese text
web page features
collection DOAJ
language English
format Article
sources DOAJ
author Chaoran Zhou
Jianping Zhao
Chenghao Ren
author_sort Chaoran Zhou
title SUDIR: An Approach of Sensing Urban Text Data From Internet Resources Based on Deep Learning
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Urban data is an imperative resource for urban computing, which can promote the establishment of urban knowledge, the collection of urban information and the construction of smart cities. Given that the urban data collected through hardware sensors and crowdsourcing suffers from uneven information distribution, poor data comprehensiveness and high resource costs, we turn to Internet resources, which offer real-time updates and extensive information coverage. We therefore propose an approach to Sensing Urban text Data from Internet Resources (SUDIR). We put forward innovative work on two key issues: urban data recognition for the Chinese context and urban data sensing for multi-source web resources. On the one hand, we design a Chinese urban data recognition model based on a Whole Word Masking for Bidirectional Encoder Representations from Transformers (BERT-WWM) embedding model and a Bidirectional Long Short-Term Memory with a Conditional Random Field (BLSTM-CRF) sequence labeling model. We introduce the Chinese Word Segmentation (CWS) concept into the BERT embedding model so that the text embedding better represents semantic information in the Chinese context. The BLSTM-CRF model, based on deep learning, is used to achieve high-quality coding and prediction. On the other hand, we propose a method of Extracting Urban text data based on Web page features and Clustering operation (EUWC). EUWC is used to correct the false negative samples labeled by the BERT-WWM+BLSTM-CRF recognition model and enables SUDIR to sense more accurate and comprehensive city data from multi-source web resources. The experimental results show that our work outperforms the other baseline methods, and they also indicate that SUDIR, using Internet resources and deep learning technology, has the advantages of low-cost, high-quality urban data sensing.
topic Urban data sensing
urban computing
Internet resources
deep learning
Chinese text
web page features
url https://ieeexplore.ieee.org/document/9270026/