Automated image based CAPTCHA solver

CAPTCHA stands for "Completely Automated Public Turing Test to tell Computers and Humans Apart". Text-based CAPTCHA is the most common technique used across the internet to stop bots from attacking online systems. An image of a distorted word is generated, as a computer program will have di...

Bibliographic Details
Main Author: Choong, Kai Bin (Author)
Format: Thesis
Published: 2018-01.
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/78552/1/ChoongKaiBinMFKE2018.pdf
LEADER 02314 am a22001573u 4500
001 78552
042 |a dc 
100 1 0 |a Choong, Kai Bin  |e author 
245 0 0 |a Automated image based CAPTCHA solver 
260 |c 2018-01. 
520 |a CAPTCHA stands for "Completely Automated Public Turing Test to tell Computers and Humans Apart". Text-based CAPTCHA is the most common technique used across the internet to stop bots from attacking online systems. An image of a distorted word is generated because a computer program has difficulty reading it, whereas a human can read the text easily. This helps to protect websites from attack by automated scripts. CAPTCHA can therefore be regarded as a win-win strategy: it secures websites against bot attacks without inconveniencing the user. However, with advances in pattern recognition technology, current text-based CAPTCHAs may no longer be robust enough to withstand intelligent bots. In this project, a CAPTCHA-solving algorithm is developed to investigate the strength of CAPTCHA against bots. The project also aims to identify the weaknesses of text-based CAPTCHA, which in turn helps in developing a more robust CAPTCHA. The methodology is divided into pre-processing, segmentation and character recognition. In the pre-processing stage, the CAPTCHA image is converted to a greyscale image, and lines and dots are then removed to recover the original word. Segmentation crops out the individual characters in the CAPTCHA image for character recognition. The extracted characters are then recognized by matching them against a database. If all the characters are recognized, the text-based CAPTCHA is broken. The CAPTCHA-solving algorithm was developed in MATLAB so that it can be trained on a custom dataset. It breaks ASP.NET text-based CAPTCHA with an accuracy of 96% for word recognition and 98.86% for character recognition. 
546 |a en 
650 0 4 |a TK Electrical engineering. Electronics Nuclear engineering 
655 7 |a Thesis 
787 0 |n http://eprints.utm.my/id/eprint/78552/ 
856 |z Get fulltext  |u http://eprints.utm.my/id/eprint/78552/1/ChoongKaiBinMFKE2018.pdf
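
Illustrative sketch (not part of the catalogue record): the abstract describes a three-stage pipeline of pre-processing, segmentation and character recognition by matching against a database. The thesis implemented this in MATLAB and the record gives no implementation details, so the Python below is only a minimal stand-in under stated assumptions. All function names, the fixed threshold of 128, the isolated-pixel rule for dot removal, the blank-column rule for segmentation and the pixel-mismatch matching are hypothetical choices. Line removal, which the abstract also mentions, is not covered.

```python
# Hypothetical sketch of the pipeline named in the abstract; NOT the thesis code.
# Images are plain nested lists so the example needs no third-party libraries.

def to_grey(rgb_image):
    """Convert rows of (r, g, b) pixels to grey levels (ITU-R BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]

def binarize(grey, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0.
    The threshold value is an assumption."""
    return [[1 if value < threshold else 0 for value in row] for row in grey]

def remove_dots(binary):
    """Clear ink pixels with no ink among their 8 neighbours (isolated noise dots)."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            # Sum the 3x3 window, then subtract the centre pixel itself.
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned

def segment(binary):
    """Split the image into glyphs at columns that contain no ink."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    glyphs, start = [], None
    # The trailing False closes a glyph that touches the right edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs

def recognise(glyph, templates):
    """Return the label of the template with the fewest mismatching pixels.
    Templates of a different size are treated as infinitely far away."""
    def distance(a, b):
        if len(a) != len(b) or len(a[0]) != len(b[0]):
            return float("inf")
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(templates, key=lambda label: distance(glyph, templates[label]))

def solve(rgb_image, templates):
    """Run pre-processing, segmentation and recognition in sequence."""
    binary = remove_dots(binarize(to_grey(rgb_image)))
    return "".join(recognise(glyph, templates) for glyph in segment(binary))
```

Here `templates` is a dictionary mapping each character label to a small binary glyph, standing in for the character database mentioned in the abstract. Blank-column segmentation only works when characters do not touch or overlap, so a real CAPTCHA would likely need a more robust splitting rule.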