Focus: Forensic Genetics
LIU Bing
To date, the China national DNA database has gathered tens of millions of records. These include not only DNA profiles but also a large amount of information on the time, location, method and type of each crime, and on the residence, nationality and individual behavior of each suspect. As public-security needs grow, the data continue to accumulate rapidly. From 2011 to 2013, the database collected relevant data from over 79.25% of filed murder cases and 40.53% of filed rape cases. At present, the database is used mainly for personal identification, and the value of its data is not fully exploited. Data mining can help with forming concepts, improving accuracy, discovering regularities and patterns, building models and extracting other useful knowledge. Through classification, estimation, prediction, affinity grouping, association rules and cluster analysis, data mining can support in-depth analysis of the complex data in the DNA database, such as DNA profiles, case information, and the background and behavior of individual suspects. Using cluster analysis, this paper attempts a preliminary multidimensional analysis by time, space and type of crime. The analyzed data covered over 0.45 million criminal cases, 20 million individuals and 1 million match reports collected and produced over the past four years. The analysis consists of three parts: the distribution of four kinds of crime (murder, robbery, theft and rape); the residence distribution of offenders involved in these four kinds of crime; and the situation of offenders sampled repeatedly in the national DNA database. This study also carried out a SWOT (strengths, weaknesses, opportunities, threats) analysis of the application of data mining to the national DNA database. Data mining is an emerging technology with broad prospects.
Its use in the management and application of the national DNA database fits the openness of the information society and favors the improvement and development of the database itself. However, the above analysis is limited by the underlying conditions currently available. By combining established data-mining methods with online analytical processing (OLAP), these attempts, together with other analyses, can be continually extended under more dynamic and in-depth conditions. As a result, the temporal and spatial distribution of crime can be defined more clearly; the evolution of typical crimes can be predicted more promptly from personal and crime background; and high-risk criminal groups can be tracked more closely and detected earlier through DNA searching and identity checking. Ideally, the DNA database can provide real-time, reliable and highly accurate personal identification intelligence, demonstrating its particular potential and value in the study of criminal patterns and dynamics, in public-security management decisions and in other related areas.
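The abstract does not state which clustering algorithm was used. Purely as an illustration of cluster analysis over time and space dimensions, the sketch below applies k-means (Lloyd's algorithm with a farthest-point initialization, a simplification of k-means++) to hypothetical case features. The feature layout `(hour_of_day, x, y)`, the sample values and the functions `init_centroids` and `kmeans` are assumptions for this example, not the study's data or method.

```python
import math

# Hypothetical case features: (hour_of_day, x, y), where x and y are
# illustrative map coordinates. In practice, features on different scales
# should be standardized before clustering; this sketch skips that step.
CASES = [
    (2.0, 1.0, 1.0), (3.0, 1.2, 0.9), (2.5, 0.8, 1.1),
    (22.0, 9.0, 9.0), (23.0, 9.2, 8.8), (21.5, 8.9, 9.1),
]


def init_centroids(points, k):
    """Farthest-point initialization: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    _, labels = kmeans(CASES, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The deterministic initialization keeps the toy example reproducible; on real data, k would have to be chosen (for example by comparing several values) and the features scaled first.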
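OLAP is proposed above as a complement to data mining. A minimal sketch of two basic OLAP operations follows: roll-up (counting records grouped by a chosen subset of dimensions) and slice (fixing one dimension to a single value). The record layout `(year, province, crime_type)`, the sample rows and the helpers `roll_up` and `slice_rows` are hypothetical, chosen only to show the idea on data shaped like the time, space and crime-type dimensions discussed here.

```python
from collections import Counter

# Hypothetical fact rows: (year, province, crime_type). Values are illustrative.
YEAR, PROVINCE, CRIME_TYPE = 0, 1, 2
CASES = [
    (2011, "P1", "theft"), (2011, "P1", "murder"),
    (2012, "P1", "theft"), (2012, "P2", "robbery"),
    (2013, "P2", "theft"), (2013, "P2", "rape"),
]


def roll_up(rows, dims):
    """Roll-up: count rows grouped by the given dimension indices only."""
    return Counter(tuple(row[d] for d in dims) for row in rows)


def slice_rows(rows, dim, value):
    """Slice: keep only rows whose dimension `dim` equals `value`."""
    return [row for row in rows if row[dim] == value]


if __name__ == "__main__":
    print(roll_up(CASES, (CRIME_TYPE,)))       # counts per crime type
    print(roll_up(CASES, (YEAR, CRIME_TYPE)))  # finer grain: year and crime type
    print(roll_up(slice_rows(CASES, PROVINCE, "P2"), (CRIME_TYPE,)))
```

Grouping by more dimensions corresponds to drilling down, and grouping by fewer to rolling up; a production OLAP system would precompute such aggregates rather than scan rows on each query.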