Research Articles
LI Chenyang, DING Dongsheng, ZOU Guangfa, WANG Kewen, FENG Lei, GUO Xiangqian, JI Anquan
This paper aims to validate a multiplex amplification system of 9 CpG sites reported in the literature and to explore its applicability in the Chinese population. The SNaPshot multiplex amplification system was validated in terms of accuracy, the input amount of bisulfite-converted DNA template, and the detection of mixed samples. A total of 236 samples of five types of body fluids (saliva, semen, blood, vaginal secretion, and menstrual blood) were selected, and the SNaPshot multiplex amplification system was used to measure the methylation values of the 9 CpG sites. A CpG site was considered detected when its methylation value was greater than 0.1. The input amount of template was analyzed after converting DNA with sodium bisulfite (template amounts ranging from 0.5 ng to 10 ng). DNA extracted from four body fluids (saliva, semen, blood, and vaginal secretion) was mixed in ratios of 1∶1, 1∶5, 1∶10, and 1∶20. Finally, the detection data set of 232 samples of the five types of body fluids was used to optimize the existing method for determining the body fluid source. The training set (n=162) was used to construct a random forest model, and the test set (n=70) was used to predict the body fluid type and evaluate the predictive performance of the model. Furthermore, an external data set (n=40) was added to validate the prediction model. When the body fluid type was determined directly from the body-fluid-specific sites, the accuracy rates for saliva, semen, blood, and vaginal secretion samples were 100%, 98%, 98%, and 94%, respectively. Because some sites were missing owing to the influence of the menstrual cycle, the average accuracy of menstrual blood identification was 21%. The system could effectively detect converted DNA at input amounts from 1 ng to 10 ng. Among the mixed samples, both body fluid sources were correctly identified in all 1∶1 mixtures. The major component could be detected in the other mixed samples (ratios 1∶5, 1∶10, and 1∶20), whereas detection of the minor component differed significantly. With the random forest model built from the 232-sample data set, the accuracy of identifying the five body fluid sources was 100% in both the test and validation sets. These results show that the multiplex amplification system has high accuracy for the identification of saliva, semen, blood, and vaginal secretion, and is suitable for identifying trace samples, 1∶1 mixed samples, and the major component of mixtures at other ratios. Compared with direct interpretation based on body-fluid-specific peaks, the new random forest model identifies menstrual blood better. In summary, the DNA methylation-based multiplex amplification system for identifying five types of forensic body fluids has good potential for forensic application.
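The direct interpretation rule described above (a site counts as detected when its methylation value exceeds 0.1, and the body fluid is read from its specific sites) can be illustrated with a minimal sketch. The site names and the one-site-per-fluid mapping below are hypothetical placeholders, since the abstract does not list the 9 CpG sites or their assignments; only the 0.1 threshold comes from the text.

```python
from typing import Dict, List

# Detection threshold stated in the abstract: a CpG site is treated as
# detected when its methylation value is greater than 0.1.
DETECTION_THRESHOLD = 0.1

# Hypothetical mapping from marker site to body fluid. The real panel has
# 9 CpG sites; these names and assignments are placeholders only.
SITE_TO_FLUID: Dict[str, str] = {
    "site_saliva": "saliva",
    "site_semen": "semen",
    "site_blood": "blood",
    "site_vaginal": "vaginal secretion",
    "site_menstrual": "menstrual blood",
}


def detected_sites(methylation: Dict[str, float],
                   threshold: float = DETECTION_THRESHOLD) -> List[str]:
    """Return the sites whose methylation value is strictly above the threshold."""
    return [site for site, value in methylation.items() if value > threshold]


def call_fluids(methylation: Dict[str, float]) -> List[str]:
    """Return the sorted body fluids indicated by the detected marker sites.

    Sites absent from the input (for example, dropped-out peaks) are simply
    not detected, so an empty list means no fluid could be called.
    """
    fluids = {
        SITE_TO_FLUID[site]
        for site in detected_sites(methylation)
        if site in SITE_TO_FLUID
    }
    return sorted(fluids)


# Illustrative values only, not data from the study.
single_source = {"site_saliva": 0.02, "site_semen": 0.85, "site_blood": 0.10}
mixture = {"site_saliva": 0.40, "site_blood": 0.35, "site_semen": 0.03}

print(call_fluids(single_source))
print(call_fluids(mixture))
```

Under this rule a sample with missing marker sites yields no call at all, which is consistent with the low direct-interpretation accuracy reported for menstrual blood and with the motivation for a model that uses all sites jointly.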