Summary: The text in medical photocopies is of great value for building medical digital platforms. Text region detection, the first step in extracting information from medical photocopies, locates text areas or individual text instances in an image. Researchers have done extensive work on text detection in natural scenes, but few have addressed medical photocopies, a scenario that urgently needs attention. In this paper, we create a text line detection dataset based on Chinese medical photocopies (CMPTD) and propose a fine-grained text line region detection model based on multi-scale feature extraction and fusion. The detection model consists of three parts. The first is the feature extraction module: CSPDarknet53 from You Only Look Once version 4 (YOLOv4) serves as the backbone network, and spatial pyramid pooling (SPP) extracts multi-scale features to improve the robustness of the model. The second is the feature fusion module: following the PANet structure, the three effective feature layers produced by the feature extraction module are fused repeatedly. The last is the prediction module: following the CTPN (Connectionist Text Proposal Network) structure, the network outputs a series of fine-grained text proposals, which a text line construction algorithm then connects into text lines. Experiments on the CMPTD dataset show that the model is effective, achieving a precision of 92.46% and a recall of 91.74% on the text detection task.
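The abstract names the text line construction step but does not spell it out. The following is a rough sketch of CTPN-style construction. Each fine-grained proposal is paired with its nearest right-hand neighbour that overlaps it vertically. A pair is kept only when the match is mutual, and chains of pairs become text lines. Several parts are assumptions and do not come from the paper: the thresholds (50 px horizontal gap, 0.7 vertical overlap) are the values reported for the original CTPN, and the names `Proposal`, `best_neighbour` and `build_text_lines` are hypothetical. The sketch also replaces CTPN's fitted top and bottom line boundaries with a simple union box. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed pairing thresholds, taken from the original CTPN paper;
# the abstract does not state the values used for CMPTD.
MAX_GAP = 50.0        # max horizontal distance (px) between paired proposals
MIN_V_OVERLAP = 0.7   # min vertical overlap ratio between paired proposals


@dataclass
class Proposal:
    """A fine-grained text proposal: a narrow box plus its text/non-text score."""
    x0: float
    y0: float
    x1: float
    y1: float
    score: float


def v_overlap(a: Proposal, b: Proposal) -> float:
    """Vertical intersection divided by the smaller of the two box heights."""
    inter = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    return inter / max(1e-6, min(a.y1 - a.y0, b.y1 - b.y0))


def best_neighbour(props: List[Proposal], i: int, forward: bool) -> Optional[int]:
    """Nearest proposal within MAX_GAP to the right (forward) or left of props[i]
    that overlaps it vertically; distance ties go to the higher score."""
    p, best, best_key = props[i], None, None
    for j, q in enumerate(props):
        dx = q.x0 - p.x0 if forward else p.x0 - q.x0
        if 0 < dx <= MAX_GAP and v_overlap(p, q) >= MIN_V_OVERLAP:
            key = (dx, -q.score)
            if best_key is None or key < best_key:
                best, best_key = j, key
    return best


def build_text_lines(props: List[Proposal]) -> List[Tuple[float, float, float, float]]:
    """Chain proposals into text lines and return one (x0, y0, x1, y1) box per line."""
    succ: Dict[int, int] = {}
    pred: Dict[int, int] = {}
    for i in range(len(props)):
        j = best_neighbour(props, i, forward=True)
        if j is None or j in pred:
            continue  # no right neighbour, or it is already linked
        # Mutual check: i must score at least as high as j's best left neighbour.
        k = best_neighbour(props, j, forward=False)
        if k is not None and props[i].score >= props[k].score:
            succ[i], pred[j] = j, i
    lines = []
    for head in range(len(props)):
        if head in pred:
            continue  # not the first proposal of a chain
        chain = [head]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        boxes = [props[c] for c in chain]
        # Simplification: the line box is the union of its proposals.
        lines.append((min(b.x0 for b in boxes), min(b.y0 for b in boxes),
                      max(b.x1 for b in boxes), max(b.y1 for b in boxes)))
    return lines


# Two rows of 16-px-wide proposals: three on the upper row, two on the lower row.
props = [Proposal(x, 10, x + 16, 30, 0.9) for x in (0, 16, 32)]
props += [Proposal(x, 60, x + 16, 80, 0.8) for x in (0, 16)]
print(build_text_lines(props))  # → [(0, 10, 48, 30), (0, 60, 32, 80)]
```

The proposals on different rows never pair because their vertical overlap is zero. Requiring the match to be mutual keeps a single proposal from being claimed by two competing neighbours.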