A Review of Research on Forensic Phonetics and Acoustics in 2017
KANG Jintao, WANG Li, WANG Xiaodi, SHENG Hui, LI Jingyang, HUANG Wenlin
Institute of Forensic Science, Ministry of Public Security; 2011 Collaborative Innovation Center of Judicial Civilization, Beijing 100038, China

About the first author: KANG Jintao (b. 1985), male, from Luyi, Henan province; M.A., assistant researcher; research field: voiceprint examination. E-mail: kangjintao@cifs.gov.cn
CLC number: DF793.2    Document code: A    Article ID: 1008-3650(2018)03-0179-08
Abstract

Forensic phonetics and acoustics developed rapidly in 2017; this paper surveys the representative work. In forensic speaker identification (FSI), many scholars worked on standardizing auditory-analysis procedures, which are now in use in several forensic speaker comparison laboratories around the world. Vocal profile analysis (VPA) remained a focus: several scholars devised forensic versions of the VPA scheme, and its utility extends even to the optimization of automatic speaker comparison systems. Phonetic-acoustic analysis broadened in several directions, including filled pauses, diphthongs and voice profiling. The evidential value of diverse vocal characteristics was measured in different ways, with concepts such as the relevant population and homogeneity measures introduced to quantify it more accurately. Some institutions divided their conclusion scales into more grades, for reasons such as avoiding cliff-edge effects and keeping in accordance with other forensic disciplines. Automatic speaker recognition moved forward, although controversy remains over its relationship with expert examination and its role in FSI; whether conclusions from such systems should be admitted in court as evidence is argued around the world, with differing attitudes. Some institutions and professionals are working toward a more transparent environment for FSI. In speaker profiling, beyond traditional characteristics such as age, sex, body size and region, both emotion and deception detection are drawing growing research effort in human and automatic approaches alike; several i-vector based systems have been designed to infer speaker attributes from voices automatically, with good results. In audio authentication, besides further development of traditional examination procedures and methods, the electric network frequency (ENF) and its forensic application were elaborated in many sub-directions, such as the construction of ENF databases and automatic comparison methods. In noise reduction and speech enhancement, new techniques such as deep neural networks are being integrated, and different algorithms and systems have been devised for cancelling reverberation and enhancing speech while reducing noise.

Key words: forensic phonetics and acoustics; forensic speaker identification; speaker profiling; audio authentication; noise reduction and speech enhancement

In China, forensic phonetics and acoustics correspond to voiceprint examination in the broad sense, covering forensic speaker identification, speaker profiling and speech content recognition within forensic phonetics, and audio authenticity examination, noise reduction and speech enhancement, noise analysis, sound-source identification and recording-equipment identification within forensic acoustics [1]. The scope of research abroad is largely the same [2]. In 2017, forensic speaker identification remained the core of the field, producing new results in auditory analysis, phonetic-acoustic analysis, automatic recognition and quality control; in speaker profiling, alongside traditional attributes such as sex and age, speech emotion analysis became an important topic and advanced rapidly on the automatic side; scholars in various countries also broke new ground in audio authenticity examination and in noise reduction and speech enhancement. This paper reviews representative 2017 work in these active areas: forensic speaker identification, speaker profiling, audio authenticity examination, and noise reduction and speech enhancement.

1 Forensic speaker identification

Forensic speaker identification is voiceprint examination in the narrow sense in China [3] and is the main branch of forensic phonetic and acoustic practice [4]. In current international practice, the great majority of institutions and practitioners use expert examination combining auditory analysis with acoustic analysis [5], but some institutions have begun to introduce automatic recognition, conducting casework with semi-automatic (expert-supervised) or fully automatic methods [6, 7]. In 2017, the professional literature on forensic speaker identification concentrated on auditory-analysis methods, phonetic and acoustic feature analysis, the evidential value of speech features, the expression of conclusions, automatic recognition techniques, and quality control and standardization.

1.1 Auditory analysis

Auditory analysis is an essential component of current forensic speaker identification methodology [1, 8, 9, 10] and has long been codified in many standards at home and abroad [11, 12, 13, 14, 15, 16]. In 2017, Sundqvist et al. [17] designed an auditory-analysis procedure and applied it in casework at the Swedish National Forensic Centre (NFC). To advance the systematization and standardization of auditory analysis, Lindh et al. [18] examined its reliability, comparing Finnish speakers with auditory analysis and with automatic recognition respectively, and used the results to improve the speaker identification workflow of the Finnish National Bureau of Investigation (NBI). Leinonen et al. [19] proposed building language-specific sets of perceptual features and made first attempts for Swedish and Finnish. Land and Gold [20] explored the value of laughter in auditory analysis. On disguised voices, Skarnitzl and Růžičková [21, 22] studied the disguise strategies common among Czech speakers and made a first analysis of the auditory and acoustic features under different disguises, while Delvaux et al. [23] examined how auditory and acoustic features differ between disguise and impersonation.

The application of Vocal Profile Analysis (VPA) to forensic speaker identification has been a hot topic in auditory-analysis research in recent years [24, 25, 26, 27, 28, 29], and in 2017 many researchers continued to explore this direction. For ease of analysis, Segundo et al. [30] designed a simplified VPA protocol and applied it to the auditory analysis of identical twins; Segundo et al. [31] then validated the VPA protocol in Spanish, German and English contexts. Klug [32] discussed refinements of the VPA scheme, proposing that its categories be improved on the basis of strengthened training. Hughes et al. [33, 34] examined VPA scores in combination with automatic methods: fusing an automatic system using mel-frequency cepstral coefficient (MFCC) parameters with one using long-term formant distribution (LTFD) features yielded only limited performance gains, whereas adding the VPA scores markedly increased recognition accuracy.
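
As an illustration of this kind of score fusion, the sketch below combines an automatic system's scores with perceptual VPA-based scores through logistic-regression calibration; the arrays are hypothetical placeholders, not data from Hughes et al. [33, 34].

```python
# Minimal sketch: fusing automatic-system scores with VPA-based scores
# via logistic regression, a common calibration/fusion approach.
# The arrays below are hypothetical placeholders for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per comparison trial: [automatic system score, VPA similarity score]
scores = np.array([[2.1, 0.8], [-1.3, 0.2], [0.5, 0.6], [-2.2, 0.1]])
labels = np.array([1, 0, 1, 0])  # 1 = same speaker, 0 = different speakers

fusion = LogisticRegression().fit(scores, labels)

# Fused log-odds for a new trial; larger values favour the same-speaker hypothesis
new_trial = np.array([[1.0, 0.7]])
print(fusion.decision_function(new_trial))
```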

1.2 Phonetic-acoustic analysis

Auditory analysis and phonetic-acoustic analysis are symbiotic and complementary [35, 36]: phonetic-acoustic methods not only provide quantitative support for auditory analysis but also supply new features [3]. In this area, van Heuven and Gold [37, 38] continued to analyze the acoustics of filled pauses and hesitation markers to further mine their value for speaker identification. He et al. [39] studied how between-speaker variation in intensity is affected by noise and by frequency band, finding that between-speaker intensity features are well preserved across the whole frequency range. How bilinguals' acoustic features behave across their two languages is a long-standing research question; Dorreen et al. [40] studied long-term fundamental-frequency distributions under this heading. Arantes et al. [41] examined how language, speaking style and speaker affect the time long-term F0 needs to reach a stable estimate, finding speaking style the most influential. Dimos et al. and Lopez et al. [42, 43] studied the rhythmic, prosodic and spectral characteristics of shouted speech. He et al. [44] investigated the speaker-discriminant value of intensity contours. Vowel spaces differ across languages: Varošanec-Škarić et al. [45] compared the vowel spaces of male speakers of Croatian, Serbian and Slovenian, laying groundwork for speaker comparison across languages. McDougall et al. [46] compared syllable-based and time-based approaches to describing fluency. Wang et al. [47] studied the dynamic features of diphthongs in Standard Chinese, showing that diphthongs also carry high value for voiceprint examination. Heeren [48] explored the acoustic behaviour of [s] in different contexts in telephone recordings. On the construction of voice profiles, Franchini [49] took the acoustics of [l] as a case study, and Fingerling [50] explored reconstructing the L1 vowel set of a second-language speaker.
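
Long-term F0 distributions of the kind examined by Dorreen et al. [40] and Arantes et al. [41] can be approximated with standard tools; the sketch below, with an assumed file name and pitch range, extracts an F0 track and summarizes its long-term distribution.

```python
# Sketch: long-term fundamental-frequency (LTF0) distribution for one recording.
# The file name and pitch-range bounds are assumptions for illustration.
import numpy as np
import librosa

y, sr = librosa.load("speaker.wav", sr=16000)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
f0 = f0[~np.isnan(f0)]  # keep voiced frames only

# Summary statistics of the long-term F0 distribution
print(f"mean={np.mean(f0):.1f} Hz, median={np.median(f0):.1f} Hz, "
      f"sd={np.std(f0):.1f} Hz")
```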

1.3 The value of speech features

In forensic speaker identification, the evidential value of a speech feature is a central consideration. By the dynamic nature of speech features, every feature shows variability (differences within the same speaker) and distinctiveness (differences between speakers); features with small within-speaker variation and large between-speaker variation have higher evidential value. In 2017, attention focused mainly on the distribution of speech features across populations. Rhodes et al. [51] argued that population-distribution research at this stage should be tied to actual casework. Hughes and Wormald [52] proposed a wiki dialect resource that would collect high-value dialect features into a database. Hughes et al. [53] set out four issues to consider when studying population feature distributions (control factors, specificity, error, and degree of certainty) and used the formant trajectories of the English diphthong [ai] to illustrate how feature distributions under different conditions may affect identification results. On whether features behave stably within questioned and known materials, building on earlier work, Ajili et al. [54, 55, 56] proposed measuring the stability of acoustic parameters with a homogeneity measure from information theory [57].
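
One common way to formalize "small within-speaker variation, large between-speaker variation" (a standard device in forensic phonetics, not the specific homogeneity measure of Ajili et al.) is the F-ratio of between- to within-speaker variance. For measurements \(x_{ki}\) of a feature, where \(k\) indexes the \(K\) speakers, \(\bar{x}_k\) is speaker \(k\)'s mean over \(N_k\) tokens and \(\bar{x}\) the grand mean:

```latex
F = \frac{\hat{\sigma}^2_{\mathrm{between}}}{\hat{\sigma}^2_{\mathrm{within}}},\qquad
\hat{\sigma}^2_{\mathrm{between}} = \frac{1}{K}\sum_{k=1}^{K}\left(\bar{x}_k - \bar{x}\right)^2,\qquad
\hat{\sigma}^2_{\mathrm{within}} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{N_k}\sum_{i=1}^{N_k}\left(x_{ki} - \bar{x}_k\right)^2
```

Other things being equal, features with larger \(F\) are better candidates for speaker comparison.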

1.4 Expression of voiceprint examination conclusions

How voiceprint conclusions should be expressed has long been a point of debate. Internationally, Rose and Morrison have consistently advocated a quantitative likelihood-ratio framework; in the UK, Nolan and the great majority of practitioners use the UK Position Statement format, while most practitioners in continental Europe use verbal probability scales. In China, a five-grade probability scale is most common [11].
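
For reference, the quantitative framework advocated by Rose and Morrison evaluates the evidence \(E\) under two competing hypotheses, and verbal scales can be read as ordinal bands over this ratio:

```latex
\mathrm{LR} = \frac{p(E \mid H_{\mathrm{ss}})}{p(E \mid H_{\mathrm{ds}})}
```

where \(H_{\mathrm{ss}}\) is the same-speaker hypothesis and \(H_{\mathrm{ds}}\) the different-speaker hypothesis; \(\mathrm{LR} > 1\) supports \(H_{\mathrm{ss}}\) and \(\mathrm{LR} < 1\) supports \(H_{\mathrm{ds}}\).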

In 2017, French [58] in the UK adjusted his framework for expressing conclusions, moving gradually from consistency and distinctiveness under the UK Position Statement [59] to a verbal probability scale with 13 grades, consistent with the standard recommended by the Association of Forensic Science Providers [60]. Vermeulen [61] of the Netherlands Forensic Institute (NFI) described the basis on which the NFI reaches a "strong support" conclusion: in actual cases it is given only when the features of the questioned and known materials are almost identical, or when the speaker shows highly distinctive features such as a speech disorder.

1.5 Speech databases and automatic recognition

Dedicated speech databases for forensic phonetics and acoustics internationally include DyViS, built by Nolan in the UK [62], FVCD by Morrison in Australia [63], AHUMADA by Ramos in Spain [64], NFI-FRITS by van der Vloed in the Netherlands [65], and FABIOLE by Ajili in France [66]. Domestically, China's National Public Security Voiceprint Database remains the forensic speech database with the most enrolled speakers in the world, while VoxCeleb [67], built in 2017, is a notable new arrival. Mainstream automatic speaker recognition currently follows two frameworks: Gaussian mixture models with a universal background model (GMM-UBM), and probabilistic linear discriminant analysis (PLDA) over an i-vector space, with deep neural networks (DNN) now also used to extract speech features. The latter framework is newer and was therefore a research focus in 2017. DNN feature extraction performs well but demands large amounts of training data; China's National Public Security Voiceprint Database has adopted DNN-extracted features. Park et al. [68] introduced voice-quality acoustic features into a system of this architecture and, in combination with MFCC features, markedly improved recognition of short utterances. Solewicz et al. [69] proposed a new performance metric, the null-hypothesis log-likelihood ratio (Null-Hypothesis LLR), to address the inadequacy of the existing LLR in handling within-speaker variation. Tschäpe [70] examined false-accept trials of an i-vector system and found that adding regional information would greatly lower the error rate. Alexander et al. [71] designed an i-vector based system for identifying speakers in multi-speaker recordings. Milošević and Glavitsch [72] combined segmental features (SF) such as fundamental frequency, formant frequencies and formant bandwidths with an existing GMM-MFCC system, improving its recognition accuracy.
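
A minimal sketch of the GMM-UBM idea mentioned above is given below, using synthetic stand-ins for MFCC feature matrices; a real system would add MAP adaptation, channel compensation and score calibration.

```python
# Minimal GMM-UBM scoring sketch: score a questioned recording against a
# speaker model and a universal background model, and take the difference.
# Feature arrays are random placeholders standing in for MFCC matrices
# of shape (frames, dimensions).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
background = rng.normal(size=(5000, 13))  # pooled features from many speakers
enrol = rng.normal(size=(800, 13))        # features from the known speaker
test = rng.normal(size=(600, 13))         # features from the questioned recording

ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(background)
spk = GaussianMixture(n_components=64, covariance_type="diag").fit(enrol)

# Average per-frame log-likelihood ratio: speaker model vs. background model
llr = spk.score(test) - ubm.score(test)
print(f"log-likelihood ratio = {llr:.3f}")
```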

The role of automatic speaker recognition in forensic speaker identification remains controversial. Although courts in Germany, Spain, Sweden and elsewhere have accepted conclusions from expert-supervised automatic methods, given the current performance of automatic systems this acceptance is limited in degree and difficult to generalize. Take the UK: French and Harrison of the JP French laboratory, acting as defence expert witnesses in the appeal R v Slade & Ors, submitted two sets of speaker identification evidence, one from expert examination and one from an automatic system, and the Court of Appeal rejected the conclusions of the automatic system. French [58] observed that although this ruling does not directly extinguish hopes of using automatic-system conclusions in the UK in future, given the precedent tradition of common-law jurisdictions, unless automatic speaker recognition achieves a major technical breakthrough, not only the UK but also Commonwealth countries such as Canada, New Zealand and Australia (52 countries in all) are likely to reject conclusions from automatic speaker recognition systems.

2 Quality control and standardization

On quality control, French et al. [73] put forward a transparency initiative for voiceprint laboratories, which they call "opening the blinds", and described the examination workflow of the JP French laboratory in detail. Wagner [74] of the German BKA presented its standard operating procedure for forensic speaker identification, with demonstrations on actual cases. This trend toward transparency and standardization is the main direction of quality control in forensic phonetics and acoustics.

On standardization, China's Ministry of Public Security issued four public-security industry standards for forensic phonetics and acoustics, covering forensic speaker identification [11], audio authenticity examination [12], noise reduction and speech enhancement [13], and speaker profiling [14].

3 Speaker profiling

Speaker profiling refers to characterizing a speaker's social-group and individual attributes when the voice is heard but the person is unseen, or to judging the social-group attributes of a person who is seen but whose identity is unknown through the same comprehensive analysis [4]. In voiceprint casework it also covers the analysis of a speaker's temporary and momentary states, such as inferring from speech whether the speaker smokes or uses drugs, and speech emotion analysis, which infers the speaker's psychological state from speech [75]; we include these in speaker profiling as well.

Cochlear implants have frequency-response characteristics of their own; Kovačić [76] studied how they process acoustic signals and explored their potential for identifying speaker sex, body size and identity. Georg [77] studied the effect of different German dialects on age estimation and the factors through which dialect influences age inference. Tomić [78] studied inferring a speaker's region of origin from negative accent transfer. Jong-Lendle et al. [79] studied identifying a speaker's native language from a foreign accent in German. Schwab et al. [80] studied the effect of smoking on the voice, and Rodmonga et al. [81] studied the auditory speech features of drug-intoxicated speakers; both results can serve the analysis of a speaker's physical state. On automatic profiling, Kelly et al. [82] designed an i-vector based system that automatically infers a speaker's sex, age and language. Watt et al. [83] compared automatic accent recognition with accent recognition by human listeners.

In speech emotion analysis, Kathiresan et al. [84] studied the emotional information carried in MFCCs. Hippey and Gold [85] explored methods for detecting remorse in the voice. Bizozzero et al. [86] studied fear information in female speakers' voices, chiefly the effects of fundamental frequency, speaking rate and pitch range. Satt et al. [87] designed a method that recognizes emotion directly from spectrograms using two kinds of neural networks, convolutional and recurrent. Zhang et al. [88] designed an emotion interaction and transition (EIT) model for dialogue speech to mine the emotional information in conversational exchanges and transitions; their algorithm improved on traditional methods by 18.8% in accuracy and 22.6% in precision. Parthasarathy and Le et al. [89, 90] explored applying multi-task learning, a deep-learning technique, to speech emotion recognition. Beyond general emotion recognition, speech-based deception detection is another research focus. Schroder et al. [91] used an analysis-by-synthesis method to combine different phonation types, speaking rates, tremolo and fundamental frequencies with neutral utterances and had the credibility of each utterance judged; the results show that adding tremolo and breathiness greatly increases the perceived credibility of the content, whereas more pauses and higher F0 lower it. Mendels et al. [92] used the CXD corpus to compare how well spectral, acoustic-prosodic and lexical feature sets characterize deception, testing the sets with a hybrid deep model.
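
The spectrogram-based approach of Satt et al. [87] can be illustrated schematically: a convolutional network maps a (frequency, time) input to emotion classes. The architecture and class count below are illustrative assumptions, not the published model.

```python
# Schematic CNN for emotion recognition from spectrograms.
# Layer sizes and the four emotion classes are illustrative assumptions.
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool to one value per channel
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, n_freq_bins, n_frames)
        h = self.features(spectrogram).flatten(1)
        return self.classifier(h)  # unnormalized class scores

logits = EmotionCNN()(torch.randn(2, 1, 128, 300))
print(logits.shape)  # torch.Size([2, 4])
```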

4 Audio authenticity examination

Audio authenticity examination determines whether a recording has been edited, through phonetic, acoustic, electromagnetic and signal-processing analysis of the recorded material [4].

In 2017, Ali et al. [93] developed an automatic system based on psychoacoustic principles, reaching an accuracy of 99.2%. To address the frequent unavailability of the original recording equipment, Grigoras et al. [94] comprehensively documented the file structures and formats of 125 recording devices and 40 commercial recording programs spanning 18 years. Smith et al. [95] studied audio files in iOS and built a decision-tree based examination workflow for such files. Patole et al. [96] explored the value of reverberation and other noise information in authenticity examination.

Electric network frequency (ENF) analysis is a hot topic in audio authenticity examination; for its principles and details, see earlier literature [97, 98, 99]. Hua et al. [100] discussed some common practical issues in ENF examination. James et al. [101] developed a cloud-based portable ENF system, removing geographical limits on examination. Hua et al. [102] proposed linking the ENF information of questioned audio with an ENF database through an absolute error map, and built two algorithms on this basis. Reis et al. [103] developed an ESPRIT-Hilbert based method for detecting ENF whose results substantially outperform other methods.
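
The absolute-error idea behind Hua et al. [102] can be sketched as follows: estimate a per-frame ENF track near the nominal mains frequency, then find the offset in a reference ENF log that minimizes the mean absolute error. The 50 Hz mains value and the parameters below are assumptions for illustration.

```python
# Sketch: ENF track extraction and absolute-error matching against a reference log.
import numpy as np
from scipy.signal import stft

def enf_track(signal, fs, mains=50.0, band=1.0, nperseg=8192):
    """Per-frame peak frequency inside [mains-band, mains+band] Hz."""
    f, t, Z = stft(signal, fs=fs, nperseg=nperseg)
    sel = (f >= mains - band) & (f <= mains + band)
    return f[sel][np.argmax(np.abs(Z[sel, :]), axis=0)]

def best_alignment(evidence_enf, reference_enf):
    """Offset into the reference log minimizing mean absolute ENF error."""
    n = len(evidence_enf)
    errors = [np.mean(np.abs(reference_enf[i:i + n] - evidence_enf))
              for i in range(len(reference_enf) - n + 1)]
    return int(np.argmin(errors)), float(np.min(errors))
```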

Domestically, Cao Wencheng [104] proposed two new algorithms for detecting speech forgery, both with miss rates below 10%. Sun Mengmeng [105] proposed a co-occurrence vector feature suited to audio detection; methods based on it reach 95% accuracy. Shen Xiaohu et al. [106], after systematically analyzing the basic principles of digital audio tampering, used a range of spectral-analysis methods to locate tampering signatures in audio files and established an effective spectral examination method.

5 Noise reduction and speech enhancement

Noise reduction and speech enhancement apply computing and acoustic techniques to recorded material to attenuate noise and strengthen the speech signal; the main algorithms currently include adaptive noise cancellation, statistical-model methods, spectral subtraction, auditory masking, short-time spectral estimation, subspace methods and wavelet transforms [4]. In 2017, DNN-based noise reduction and speech enhancement became a hot topic.
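
Of the classical algorithms just listed, spectral subtraction is the simplest to show. The sketch below estimates the noise magnitude spectrum from an assumed noise-only lead-in and subtracts it frame by frame; the parameters are illustrative.

```python
# Minimal spectral-subtraction sketch: subtract an estimated noise magnitude
# spectrum from each frame, keeping a spectral floor to limit musical noise.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, fs, noise_seconds=0.5, floor=0.02, nperseg=512):
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Average noise spectrum over the assumed noise-only leading frames
    hop = nperseg // 2
    n_frames = int(noise_seconds * fs / hop)
    noise_mag = mag[:, :n_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced
```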

On dereverberation and echo cancellation, Guzewich et al. [107] studied a new DNN dereverberation method; earlier work [108, 109, 110, 111] had already made progress with DNN dereverberation, and audio processed by the new method lowered the equal error rate of a speaker comparison system from 9.2% to 6.8%. Bulling et al. [112] proposed a new method for cancelling echo in recordings that can raise the maximum stable gain (MSG) of the signal by 30 dB. On speech enhancement, Wu et al. [113] proposed a post-filtering method with difference compensation based on locally linear embedding (LLE). Ogawa et al. [114] extracted bottleneck features from a DNN acoustic model (DNN-AM) and used noise example search to remove highly non-stationary noise from single-channel audio. Gelderblom et al. [115] proposed a subjective evaluation method for DNN-based speech enhancement algorithms. Among non-DNN methods, Qian et al. [116] processed raw audio directly with a Bayesian WaveNet, also achieving good enhancement. On noise reduction, Pascual et al. [117] used a generative adversarial network, a deep-network technique, for denoising and demonstrated its effectiveness with both subjective and objective evaluations. Maiti et al. [118] used two networks simultaneously for concatenative resynthesis, greatly increasing processing speed. Notably, in forensic practice background noise carries useful information and must be preserved, or even enhanced, during noise reduction; practice therefore combines multiple methods to attenuate target noise while retaining useful information, and the flexibility of some of the deep-learning methods above gives them a particular advantage here.

The authors have declared that no competing interests exist.


References
[1] LI J. Audio-visual evidence technology, chapter 2: voice evidence technology[M]// LI X. New physical evidence technology. Beijing: Beijing Jiaotong University Press, 2015: 339-360.
[2] HOLLIEN H. The acoustics of crime: the new science of forensic phonetics[M]. New York: Plenum Press, 1990.
[3] CAO H, LI J, WANG Y, et al. On the expression of conclusions in voiceprint identification[J]. Evidence Science, 2013, 21(5): 605-624.
[4] WANG Y, LI J, CAO H. A survey of voiceprint identification techniques[J]. Police Technology, 2012(4): 54-56.
[5] ERIKSSON A. Aural/acoustic vs. automatic methods in forensic phonetic casework[M]// NEUSTEIN A, PATIL H A. Forensic speaker recognition: law enforcement and counter-terrorism. New York: Springer, 2011: 41-69.
[6] GOLD E, FRENCH P. International practices in forensic speaker comparison[J]. International Journal of Speech Language and the Law, 2011, 18(2): 293-307.
[7] MORRISON G S, SAHITO F H, JARDINE G, et al. Interpol survey of the use of speaker identification by law enforcement agencies[J]. Forensic Science International, 2016, 263(3): 92-100.
[8] NOLAN F. The phonetic bases of speaker recognition[M]. Cambridge, UK: Cambridge University Press, 1983.
[9] HOLLIEN H, DIDLA G, HARNSBERGER J D, et al. The case for aural perceptual speaker identification[J]. Forensic Science International, 2016, 269(3): 8-20.
[10] ROSE P. Forensic speaker identification[M]. London: Taylor and Francis, 2002.
[11] Ministry of Public Security of the People's Republic of China. Technical specification for forensic speaker identification: GA/T 1433-2017[S]. Beijing: Standards Press of China, 2017.
[12] Ministry of Public Security of the People's Republic of China. Technical specification for forensic audio authenticity examination: GA/T 1432-2017[S]. Beijing: Standards Press of China, 2017.
[13] Ministry of Public Security of the People's Republic of China. Technical specification for forensic noise reduction and speech enhancement: GA/T 1431-2017[S]. Beijing: Standards Press of China, 2017.
[14] Ministry of Public Security of the People's Republic of China. Technical specification for forensic speaker profiling: GA/T 1430-2017[S]. Beijing: China Quality Inspection Press, 2017.
[15] Judicial Expertise Administration Bureau, Ministry of Justice of the People's Republic of China. Specification for audio recording examination: SF/Z JD0301001-2010[S]. Beijing: Standards Press of China, 2010.
[16] CAIN S. American Board of Recorded Evidence: voice comparison standards[EB/OL]. (1998)[2017-10-15]. http://www.forensictapeanalysisinc.com/Articles/voice_comp.htm
[17] SUNDQVIST M, LEINONEN T, LINDH J, et al. Blind test procedure to avoid bias in perceptual analysis for forensic speaker comparison casework[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 45-47.
[18] LINDH J, NAUTSCH A, LEINONEN T, et al. Comparison between perceptual and automatic systems on Finnish phone speech data (FinEval1): a pilot test using score simulations[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 86-87.
[19] LEINONEN T, LINDH J, AKESSON J. Creating linguistic feature set templates for perceptual forensic speaker comparison in Finnish and Swedish[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 126-128.
[20] LAND E, GOLD E. Speaker identification using laughter in a close social network[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 99-101.
[21] SKARNITZL R, RŮŽIČKOVÁ A. The malleability of speech production: an examination of sophisticated voice disguise[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 59-60.
[22] RŮŽIČKOVÁ A, SKARNITZL R. Voice disguise strategies in Czech male speakers[J]. AUC Philologica, Phonetica Pragensia, 2017.
[23] DELVAUX V, CAUCHETEUX L, HUET K, et al. Voice disguise vs. impersonation: acoustic and perceptual measurements of vocal flexibility in non-experts[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3777-3781.
[24] JESSEN M. Speaker-specific information in voice quality parameters[J]. Forensic Linguistics, 1997, 4(1): 84-103.
[25] KÖSTER O, KÖSTER J P. The auditory-perceptual evaluation of voice quality in forensic speaker recognition[J]. The Phonetician, 2004, 89: 9-37.
[26] NOLAN F. Voice quality and forensic speaker identification[J]. Govor, 2007, 24(2): 111-128.
[27] KÖSTER O, JESSEN M, KHAIRI F, et al. Auditory-perceptual identification of voice quality by expert and non-expert listeners[C]// Proceedings of ICPhS XVI, 2007: 1845-1848.
[28] SEGUNDO E, ALVES H, TRINIDAD M F. CIVIL corpus: voice quality for speaker forensic comparison[J]. Procedia, Social and Behavioral Sciences, 2013, 95(4): 587-593.
[29] FRENCH P. Developing the vocal profile analysis scheme for forensic voice comparison[C]. York, UK: IAFPA, 2016.
[30] SEGUNDO E. A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity[J]. Journal of Voice, 2017, 31(5): 11-27.
[31] SEGUNDO E, BRAUN A, HUGHES V, et al. Speaker-similarity perception of Spanish twins and non-twins by native speakers of Spanish, German and English[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 159-162.
[32] KLUG K. Refining the Vocal Profile Analysis (VPA) scheme for forensic purposes[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 190-191.
[33] HUGHES V, HARRISON P, FOULKES P, et al. Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3892-3896.
[34] HUGHES V, HARRISON P, FOULKES P, et al. The complementarity of automatic, semi-automatic, and phonetic measures of vocal tract output in forensic voice comparison[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 83-85.
[35] NOLAN F. Speaker identification evidence: its forms, limitations, and roles[C]// Proceedings of the conference 'Law and Language: Prospect and Retrospect'. University of Lapland, 2001.
[36] NOLAN F. Voice[M]// BOGAN P S, ROBERTS A. Identification: investigation, trial and scientific evidence. Jordan Publishing, 2011: 381-390.
[37] HEUVEN V, CORTES P. Speaker specificity of filled pauses compared with vowels and consonants in Dutch[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 48-49.
[38] GOLD E, ROSS S, EARNSHAW K. Delimiting the West Yorkshire population: examining the regional-specificity of hesitation markers[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 50-52.
[39] HE L, DELLWO V. Between-speaker intensity variability is maintained in different frequency bands of amplitude demodulated signal[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 55-58.
[40] DORREEN K, PAPP V. Bilingual speakers' long-term fundamental frequency distributions as cross-linguistic speaker discriminants[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 61-64.
[41] ARANTES P, ERIKSSON A, GUTZEIT. Effect of language, speaking style and speaker on long-term f0 estimation[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3897-3901.
[42] DIMOS K, DELLWO V, HE L. Rhythm and speaker-specific variability in shouted speech[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 102-104.
[43] LOPEZ A, SAEIDI R, JUVELA L, et al. Normal-to-shouted speech spectral mapping for speaker recognition under vocal effort mismatch[C]// IEEE. Proceedings of ICASSP 2017. IEEE, 2017: 4940-4944.
[44] HE L, DELLWO V. Speaker-specific temporal organizations of intensity contours[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 163-166.
[45] VAROŠANEC-ŠKARIĆ G, BAŠIĆ I, KIŠIČEK G. Comparison of vowel space of male speakers of Croatian, Serbian and Slovenian language[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 142-146.
[46] MCDOUGALL K, DUCKWORTH M. Fluency profiling for forensic speaker comparison: a comparison of syllable- and time-based approaches[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 129-131.
[47] WANG L, KANG J, LI J, et al. Speaker-specific dynamic features of diphthongs in Standard Chinese[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 91-95.
[48] HEEREN W. Speaker-dependency of /s/ in spontaneous telephone conversation[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 68-71.
[49] FRANCHINI S. Construction of a voice profile: an acoustic study of /l/[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 183-186.
[50] FINGERLING B. Constructing a voice profile: reconstruction of the L1 vowel set for an L2 speaker[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 197-199.
[51] RHODES R, FRENCH P, HARRISON P, et al. Which questions, propositions and 'relevant populations' should a speaker comparison expert assess?[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 40-44.
[52] HUGHES V, WORMALD J. WikiDialects: a resource for assessing typicality in forensic voice comparison[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 154-155.
[53] HUGHES V, FOULKES P. What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3772-3776.
[54] AJILI M, BONASTRE J, KHEDER W, et al. Phonetic content impact on forensic voice comparison[C]// 2016 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2016: 210-217.
[55] AJILI M, BONASTRE J, ROSSATTO S, et al. Inter-speaker variability in forensic voice comparison: a preliminary evaluation[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016: 2114-2118.
[56] DODDINGTON G, LIGGETT W, MARTIN A, et al. Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation[R]. DTIC Document, 1998.
[57] AJILI M, BONASTRE J, KHEDER W, et al. Homogeneity measure impact on target and non-target trials in forensic voice comparison[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2844-2848.
[58] FRENCH P. A developmental history of forensic speaker comparison in the UK[J]. English Phonetics, 2017: 271-286.
[59] FRENCH P, HARRISON P. Position statement concerning use of impressionistic likelihood terms in forensic speaker comparison cases[J]. International Journal of Speech Language and the Law, 2007, 14(1): 137-144.
[60] Association of Forensic Science Providers. Standards for the formulation of evaluative forensic science expert opinion[J]. Science and Justice, 2009(49): 161-164.
[61] VERMEULEN J, CAMBIER-LANGEVELD T. Outstanding cases: about case reports with a "strong" conclusion[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 31-33.
[62] NOLAN F, MCDOUGALL K, JONG G D, et al. A forensic phonetic study of 'dynamic' sources of variability in speech: the DyViS project[C]// Proceedings of the 11th Australian International Conference on Speech Science & Technology. University of Auckland, 2006: 13-18.
[63] MORRISON G S, ZHANG C, ENZINGER E, et al. Forensic voice comparison databases[DB/OL]. 2015. http://www.forensic-voice-comparison.net/
[64] RAMOS D, GONZALEZ-RODRIGUEZ J, LUCENA-MOLINA J J. Addressing database mismatch in forensic speaker recognition with Ahumada III: a public real-casework database in Spanish[C]// International Speech Communication Association, 2008.
[65] VLOED V D, BOUTEN J, LEEUWEN D. NFI-FRITS: a forensic speaker recognition database and some first experiments[C]// Proceedings of Odyssey: The Speaker and Language Recognition Workshop, 2014: 6-13.
[66] AJILI M, BONASTRE J, ROSSATO S. FABIOLE, a speech database for forensic speaker comparison[C]// Proceedings of the LREC Conference, Slovenia, 2016: 726-733.
[67] NAGRANI A, CHUNG J, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017.
[68] PARK S J, YEUNG G, KREIMAN J, et al. Using voice quality features to improve short-utterance, text-independent speaker verification systems[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1522-1526.
[69] SOLEWICZ Y, JESSEN M, VAN DER VLOED. Null-hypothesis LLR: a proposal for forensic automatic speaker recognition[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2849-2853.
[70] TSCHÄPE N. Analysis of i-vector-based false-accept trials in a dialect labelled telephone corpus[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 65-67.
[71] ALEXANDER A. Not a lone voice: automatically identifying speakers in multi-speaker recordings[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 80-82.
[72] MILOŠEVIĆ M, GLAVITSCH U. Combining Gaussian mixture models and segmental feature models for speaker recognition[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2042-2043.
[73] FRENCH J, HARRISON P, KIRCHHÜBEL C, et al. From receipt of recordings to dispatch of report: opening the blinds on lab practices[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 29-30.
[74] WAGNER I. The BKA standard operating procedure of forensic speaker comparison and examples of case work[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 34-36.
[75] HAN W, LI H, RUAN H, et al. A review on speech emotion recognition research[J]. Journal of Software, 2014, 25(1): 37-50.
[76] KOVAČIĆ D. Voice gender identification in cochlear implant users[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 23-25.
[77] GEORG A. The effect of dialect on age estimation[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 118-121.
[78] TOMIĆ K. Cross-language accent analysis for determination of origin[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 171-173.
[79] JONG-LENDLE G, KEHREIN R, URKE F, et al. Language identification from a foreign accent in German[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 135-138.
[80] SCHWAB S, AMATO M, DELLWO V, et al. Can we hear nicotine craving?[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 115-117.
[81] RODMONGA P, TATIANA A, NIKOLAY B, et al. Perceptual auditory speech features of drug-intoxicated female speakers (preliminary results)[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 118-121.
[82] KELLY F, FORTH O, ATREYA A, et al. What your voice says about you: automatic speaker profiling using i-vectors[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 72-75.
[83] WATT D, JENKINS M, BROWN G. Performance of human listeners vs. the Y-ACCDIST automatic accent classifier in an accent authentication task[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 139-141.
[84] KATHIRESAN T, DELLWO V. Cepstral dynamics in MFCCs using conventional deltas for emotion and speaker recognition[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 105-108.
[85] HIPPEY F, GOLD E. Detecting remorse in the voice: a preliminary investigation into the perception of remorse using a voice line-up methodology[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 179-182.
[86] BIZOZZERO S, NETZSCHWITZ N, LEEMANN A. The effect of fundamental frequency f0, syllable rate and pitch range on listeners' perception of fear in a female speaker's voice[C]// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 174-178.
[87] SATT A, ROZENBERG S, HOORY R. Efficient emotion recognition from speech using deep learning on spectrograms[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1089-1093.
[88] ZHANG R, ATSUSHI A, KOBASHIKAWA S, et al. Interaction and transition model for speech emotion recognition in dialogue[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1094-1097.
[89] PARTHASARATHY S, BUSSO C. Jointly predicting arousal, valence and dominance with multi-task learning[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1103-1107.
[90] LE D, ALDENEH Z, PROVOST E. Discretized continuous speech emotion recognition with multi-task deep recurrent neural network[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1108-1112.
[91] SCHRODER A, STONE S, BIRKHOLZ P. The sound of deception: what makes a speaker credible?[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1467-1471.
[92] MENDELS G, LEVITAN S, LEE K. Hybrid acoustic-lexical deep learning approach for deception detection[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1472-1476.
[93] ALI Z, IMRAN M, ALSULAIMAN M. An automatic digital audio authentication/forensics system[J]. IEEE Access, 2017(5): 2994-3007.
[94] GRIGORAS C, SMITH J. Large scale test of digital audio file structure and format for forensic analysis[C]// 2017 AES International Conference on Audio Forensics, 2017.
[95] SMITH J, LACEY D, KOENIG B, et al. Triage approach for the forensic analysis of Apple iOS audio files recorded using the "Voice Memos" app[C]// 2017 AES International Conference on Audio Forensics, 2017.
[96] PATOLE R, KORE G, REGE P. Reverberation based tampering detection in audio recordings[C]// 2017 AES International Conference on Audio Forensics, 2017.
[97] Advisory Panel on White House Tapes. The EOB tape of June 20, 1972: report on a technical investigation conducted for the U.S. District Court for the District of Columbia[R]. 1974.
[98] GRIGORAS C. Application of ENF analysis method in forensic authentication of digital audio and video recordings[J]. Journal of the Audio Engineering Society, 2007, 57(9): 643-661.
[99] GRIGORAS C. Statistical tools for multimedia forensics[C]// 39th International Conference: Audio Forensics: Practices and Challenges, 2010.
[100] HUA G, THING V. On practical issues of electric network frequency based audio forensics[J]. IEEE Access, 2017(5): 20640-20651.
[101] JAMES Z, GRIGORAS C, SMITH J. A low cost, cloud based, portable, remote ENF system[C]// 2017 AES International Conference on Audio Forensics, 2017.
[102] HUA G, ZHANG Y, GOH J. Audio authentication by exploring the absolute-error-map of ENF signals[J]. IEEE Transactions on Information Forensics & Security, 2016(5): 1003-1016.
[103] REIS P M G, MIRANDA R, GALDO G. ESPRIT-Hilbert based audio tampering detection with SVM classifier for forensic analysis via electrical network frequency[J]. IEEE Transactions on Information Forensics & Security, 2017(4): 853-864.
[104] CAO W. Research on blind detection techniques for speech forgery[D]. Chengdu: Southwest Jiaotong University, 2017.
[105] SUN M. Audio authenticity identification and re-recording detection[D]. Shenzhen: Shenzhen University, 2017.
[106] SHEN X, JIN T, ZHANG C, et al. Spectral examination techniques for authenticating audio recordings[J]. Forensic Science and Technology, 2017, 42(3): 173-177.
[107] GUZEWICH P, ZAHORIAN S. Improving speaker verification for reverberant conditions with deep neural network dereverberation processing[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 171-175.
[108] HAN K, WANG Y, WANG D. Learning spectral mapping for speech dereverberation[C]// 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014: 4661-4665.
[109] HAN K, WANG Y, WANG D, et al. Learning spectral mapping for speech dereverberation and denoising[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(6): 982-992.
[110] WU B, LI K, YANG M, et al. A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems[C]// 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016.
[111] WU B, LI K, YANG M, et al. A reverberation-time-aware approach to speech dereverberation based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(1): 102-111.
[112] BULLING P, LINHARD K, WOLF A, et al. Stepsize control for acoustic feedback cancellation based on the detection of reverberant signal periods and the estimated system distance[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 176-180.
[113] WU Y C, HWANG H, WANG S, et al. A post-filtering approach based on locally linear embedding difference compensation for speech enhancement[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1953-1957.
[114] OGAWA A, KINOSHITA K, DELCROIX M, et al. Improved example-based speech enhancement by using deep neural network acoustic model for noise robust example search[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1963-1967.
[115] GELDERBLOM F B, GRONSTAD T, VIGGEN E. Subjective intelligibility of deep neural network-based speech enhancement[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1968-1972.
[116] QIAN K, ZHANG Y, CHANG S, et al. Speech enhancement using Bayesian WaveNet[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2013-2017.
[117] PASCUAL S, BONAFONTE A, SERRA J. SEGAN: speech enhancement generative adversarial network[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3642-3646.
[118] MAITI S, MANDEL M. Concatenative resynthesis using twin networks[C]// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3647-3651.