Summary: | Identity authentication is a main line of defense for network security, and passwords have long been the mainstream of identity authentication. In the field of password security research, large-scale password datasets have played an important role in the efficiency evaluation of password attack algorithms, the feasibility detection of password strength meters, and the correction of password probability models. However, due to user privacy, timeliness, effectiveness and other factors, it is still very difficult for researchers to obtain real large-scale user plaintext passwords. Based on this, this paper proposes a fast simulative password set generation algorithm based on structure partitioning and string recombination, denoted as SPSR-FSPG. The algorithm uses the probability context-free grammar to model the structure of the password, and constructs a string generation model based on the recurrent neural network to generate different types of strings, so as to learn the character composition of the password in the original dataset. In addition, the model fully considers the user's password reuse and modification behavior. Finally, the method is verified by experiment on six real Chinese and English password sets. The results show that the generation rate of SPSR-FSPG is faster than other algorithms. In terms of true password coverage, the SPSR-FPSG simulative password set is increased by 11.36% and 17.5, respectively, relative to SPPG and PCFG, and is increased by about 122.73% and 130.3%, respectively, compared to OMEN and 4-Markov. And the fit of the Zipf distribution is maintained at a level above 0.95, it is better than 0.9 of SPPG. At the same time, the SPPR-FPSG simulative password set is closer to the real password set in terms of length and character composition.
|