(1) Project name (Chinese and English) [This is actually the name of the sub-project; the same applies to the next item, "Project summary"]
汉语语法信息词典(高频词)
The Grammatical Knowledge-base of Contemporary Chinese (High Frequency Words)

(2) Project summary (Chinese and English)
The Grammatical Knowledge-base of Contemporary Chinese (High Frequency Words) was built by selecting about 28,000 high-frequency words from the 1998 People's Daily Basically Tagged Corpus (the full year, about 26 million Chinese characters). The selection also includes a small number of lower-frequency multi-POS words, idioms, and words belonging to closed word classes. The result has three parts:
① The complete grammatical information for these words, copied from The Grammatical Knowledge-base of Contemporary Chinese (《现代汉语语法信息词典》, abbrev. GKB-CC). In effect, this yields a high-frequency subset of GKB-CC.
② Frequency information for these words (two sets of data).
③ Example sentences for these words.

The research builds on two basic resources already developed by the Institute of Computational Linguistics, Peking University (ICL/PKU):
(1) The Grammatical Knowledge-base of Contemporary Chinese (about 73,000 words in total)
(2) The 1998 People's Daily Basically Tagged Corpus (full year)

Principles for selecting example sentences:
(1) Correctness
(2) Typicality
(3) Comprehensibility
(4) Long-term validity

(3) Institution name (Chinese and English)
北京大学计算语言学研究所
Institute of Computational Linguistics, Peking University (abbrev. ICL/PKU)

(4) Development period
January to September 2003

(5) Scale
More than 28,000 high-frequency words from The Grammatical Knowledge-base of Contemporary Chinese; more than 160,000 example sentences from the 1998 People's Daily Basically Tagged Corpus.

(6) Price
[To be determined]

(7) Related technical documents
973词典技术报告20030929.doc
973词典研制报告20030927.doc
语法词典(高频词)子课题结题报告20030926.doc
现代汉语语法信息词典规格说明书20030727.doc
北京大学现代汉语语料库基本加工规范20030727.doc

(8) Corpus samples
A. Sample set (638 words with their 3,837 corresponding example sentences):
频度与例句20030618.zip [includes "样例文件说明20030618.doc"]
B. Single sample:
Word/POS    Sentence ID    Example sentence (with segmentation and POS tags; the tag set is described in the technical documents)
搭乘/v    19980204-04-011-003=005/m    市/n 领导/n 下乡/v 、/w 外出/v 经常/d 搭乘/v 便车/n 或/c 公共/b 汽车/n 。/w
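
As an illustration of how the single sample above might be read programmatically, here is a minimal Python sketch. It is not part of the project's tooling. It assumes that a record is whitespace-separated, that the first field is the headword with its POS tag, that the second field is the sentence ID, and that every remaining field is a `word/tag` token. The names `SampleRecord`, `split_token`, and `parse_sample` are hypothetical and chosen only for this example; the actual layout of the files in the sample set may differ.

```python
from typing import List, NamedTuple, Tuple


class SampleRecord(NamedTuple):
    """One example-sentence record (hypothetical structure for illustration)."""
    headword: str
    head_pos: str
    sentence_id: str
    tokens: List[Tuple[str, str]]  # (word, POS tag) pairs


def split_token(token: str) -> Tuple[str, str]:
    """Split 'word/tag' at the last slash, so a word containing '/' stays intact."""
    word, sep, tag = token.rpartition("/")
    if not sep or not word or not tag:
        raise ValueError(f"not a word/tag token: {token!r}")
    return word, tag


def parse_sample(line: str) -> SampleRecord:
    """Parse one record, assuming the layout: headword/POS, sentence ID, tagged tokens."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected headword/POS, sentence ID, and at least one token")
    headword, head_pos = split_token(fields[0])
    sentence_id = fields[1]  # kept verbatim, including any trailing tag
    tokens = [split_token(field) for field in fields[2:]]
    return SampleRecord(headword, head_pos, sentence_id, tokens)


# The single sample from the form, copied as-is.
SAMPLE = (
    "搭乘/v 19980204-04-011-003=005/m 市/n 领导/n 下乡/v 、/w 外出/v "
    "经常/d 搭乘/v 便车/n 或/c 公共/b 汽车/n 。/w"
)

record = parse_sample(SAMPLE)
plain_text = "".join(word for word, _ in record.tokens)
print(record.headword, record.head_pos, record.sentence_id)
print(plain_text)
```

Splitting at the last slash rather than the first is a defensive choice, since the tag is always the final component of a token. The sentence ID is stored unchanged because the form does not explain its internal structure.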