Experiences of Natural Language Processing under the Small Amount of Data
Ji Daqi, Chen Yunwen
(DataGrand Tech Inc., Shanghai 201203, China)
Abstract: A common challenge for natural language processing in real-world applications is that large annotated corpora are rarely available, and achieving good results with little labeled data is a problem that must be solved before such applications can be deployed. This paper uses the named entity recognition (NER) task to illustrate experiences and methods for working with a small amount of annotated data. The methods covered include machine learning, deep learning, transfer learning, the use of word embeddings, and the introduction of business expert rules. Several of these methods are compared on a lease contract extraction task.
Keywords: small sample, word embedding, deep learning, expert rules
About the authors: Ji Daqi, male, born in 1983, master's degree, CTO, focuses on the research, development, and application of artificial intelligence technologies such as natural language processing, search engines, and recommender systems. Email: jidaqi@datagrand.com. Chen Yunwen, male, PhD, CEO.