CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". Text-based CAPTCHAs are the most common technique used across the internet to stop bots from attacking online systems. A text-based CAPTCHA presents an image of a distorted word. Computer programs have difficulty reading the word, but humans can read it easily, so the CAPTCHA helps prevent websites from being attacked by automated scripts. A CAPTCHA is therefore intended as a win-win strategy: it protects websites from bot attacks without disturbing legitimate users. However, advances in pattern recognition technology mean that current text-based CAPTCHAs may not be robust enough to resist increasingly capable bots. In this project, a CAPTCHA-solving algorithm is therefore developed to evaluate how well text-based CAPTCHAs withstand bots. The project also aims to identify weaknesses in text-based CAPTCHAs, which in turn can help in developing more robust CAPTCHAs. The methodology consists of three stages: pre-processing, segmentation and character recognition. In the pre-processing stage, the CAPTCHA image is converted to greyscale, and the noise lines and dots are then removed to recover the original word. Segmentation then crops out the individual characters in the CAPTCHA image for recognition. Each extracted character is recognized by matching it against a character database. If all the characters are recognized, the text-based CAPTCHA is considered broken. The CAPTCHA-solving algorithm was developed in MATLAB so that it can be trained on a custom dataset. It breaks ASP.NET text-based CAPTCHAs with a word recognition accuracy of 96% and a character recognition accuracy of 98.86%.
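The three stages described above (pre-processing, segmentation and recognition) can be illustrated with a short Python sketch. This is not the author's MATLAB implementation. It is a toy example under stated assumptions: dark text on a light background, a fixed binarization threshold of 128, removal of isolated noise dots only (line removal is not shown), segmentation by splitting on empty columns (a vertical projection), and recognition by pixel-agreement template matching against a small hypothetical template set standing in for the character database. All function names (`to_grey`, `binarize`, `remove_dots`, `segment`, `recognize`, `solve`) and the `TEMPLATES` dictionary are illustrative.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Grid = List[List[int]]

# Hypothetical 5x3 toy templates standing in for the character database (1 = ink).
TEMPLATES: Dict[str, Grid] = {
    "I": [[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
}

def to_grey(rgb: List[List[Pixel]]) -> Grid:
    """Pre-processing: convert RGB to 0-255 greyscale using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]

def binarize(grey: Grid, threshold: int = 128) -> Grid:
    """Mark dark (ink) pixels as 1 and background as 0; the threshold is an assumption."""
    return [[1 if v < threshold else 0 for v in row] for row in grey]

def remove_dots(img: Grid, min_neighbours: int = 1) -> Grid:
    """Drop ink pixels with fewer than min_neighbours ink pixels in their 8-neighbourhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                n = sum(img[j][i]
                        for j in range(max(0, y - 1), min(h, y + 2))
                        for i in range(max(0, x - 1), min(w, x + 2))
                        if (j, i) != (y, x))
                if n < min_neighbours:
                    out[y][x] = 0
    return out

def segment(img: Grid) -> List[Grid]:
    """Segmentation: split on empty columns, then trim empty rows above and below."""
    cols = [any(row[x] for row in img) for x in range(len(img[0]))]
    glyphs: List[Grid] = []
    start = None
    for x, has_ink in enumerate(cols + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            crop = [row[start:x] for row in img]
            ink_rows = [y for y, r in enumerate(crop) if any(r)]
            glyphs.append(crop[ink_rows[0]:ink_rows[-1] + 1])
            start = None
    return glyphs

def resize(glyph: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize so a glyph can be compared with a template."""
    gh, gw = len(glyph), len(glyph[0])
    return [[glyph[y * gh // h][x * gw // w] for x in range(w)] for y in range(h)]

def recognize(glyph: Grid, templates: Dict[str, Grid]) -> str:
    """Recognition: return the label whose template agrees with the most pixels."""
    def score(t: Grid) -> float:
        g = resize(glyph, len(t), len(t[0]))
        matches = sum(a == b for gr, tr in zip(g, t) for a, b in zip(gr, tr))
        return matches / (len(t) * len(t[0]))
    return max(templates, key=lambda label: score(templates[label]))

def solve(rgb: List[List[Pixel]], templates: Dict[str, Grid]) -> str:
    """Run pre-processing, segmentation and recognition on one CAPTCHA image."""
    img = remove_dots(binarize(to_grey(rgb)))
    return "".join(recognize(g, templates) for g in segment(img))

def from_art(lines: List[str]) -> List[List[Pixel]]:
    """Build a toy RGB image from ASCII art: '#' is black ink, anything else white."""
    return [[(0, 0, 0) if c == "#" else (255, 255, 255) for c in line] for line in lines]

if __name__ == "__main__":
    captcha = from_art([
        "..........",
        ".#...###..",
        ".#....#...",
        ".#....#..#",  # stray noise dot in the last column
        ".#....#...",
        ".###..#...",
        "..........",
    ])
    print(solve(captcha, TEMPLATES))  # prints LT
```

Splitting on empty columns only works when characters do not touch. Distorted CAPTCHAs often contain touching or overlapping characters, so a practical solver would likely need a more robust segmentation step and a larger, trained character database.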