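　　For a computer to understand human language, it must resolve ambiguity. In computational linguistics, word sense disambiguation (WSD) has long been a subject of research. Word Sense Disambiguation: Algorithms and Applications (English reprint edition), published in the Computational Linguistics and Language Technology original-text series, brings together the results of recent international research on word sense disambiguation. It covers nearly every topic in WSD research and has significant scholarly value.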

　　Ironically, the very "statistical semantics" that Weaver proposed might have applied in cases such as this: Yarowsky (2000) notes that the trigram "in the pen" is very strongly indicative of the enclosure sense, since one almost never refers to what is in a writing pen, except for ink.
　　WSD was resurrected in the 1970s within artificial intelligence (AI) research on full natural language understanding. In this spirit, Wilks (1975) developed "preference semantics", one of the first systems to explicitly account for WSD. The system used selectional restrictions and a frame-based lexical semantics to find a consistent set of word senses for the words in a sentence. The idea of individual "word experts" evolved over this time (Rieger and Small 1979). For example, in Hirst's (1987) system, a word was gradually disambiguated as information was passed between the various modules (including a lexicon, parser, and semantic interpreter) in a process he called "Polaroid Words". "Proper" knowledge representation was important in the AI paradigm. Knowledge sources had to be handcrafted, so the ensuing knowledge acquisition bottleneck inevitably led to limited lexical coverage of narrow domains and would not scale.
　　The 1980s were a turning point for WSD. Large-scale lexical resources and corpora became available, so handcrafting could be replaced with knowledge extracted automatically from the resources (Wilks et al. 1990). Lesk's (1986) short but extremely seminal paper used the overlap of word sense definitions in the Oxford Advanced Learner's Dictionary of Current English (OALD) to resolve word senses. Given two (or more) target words in a sentence, the pair of senses whose definitions have the greatest lexical overlap is chosen (see Chap. 5 (Sect. 5.2)). Dictionary-based WSD had begun and the relationship of WSD to lexicography became explicit. For example, Guthrie et al. (1991) used the subject codes (e.g., Economics, Engineering, etc.) in the Longman Dictionary of Contemporary English (LDOCE) (Procter 1978) on top of Lesk's method. Yarowsky (1992) combined the information in Roget's International Thesaurus with co-occurrence data from large corpora in order to learn disambiguation rules for Roget's classes, which could then be applied to words in a manner reminiscent of Masterman (1957) (see Chap. 10 (Sect. 10.2.1)). Although dictionary methods are useful for some cases of word sense ambiguity (such as homographs), they are not robust since dictionaries lack complete coverage of information on sense distinctions.
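　　The definition-overlap idea described above can be illustrated with a minimal sketch. It assumes a toy sense inventory: the glosses in GLOSSES, the sense labels, the stopword list, and the function names are all hypothetical and are not taken from OALD or from Lesk's implementation. For two target words, it scores every pair of senses by the number of content words their glosses share and returns the highest-scoring pair.

```python
import re
from itertools import product

# Assumed stopword list for this sketch; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "or", "and", "to", "with", "is"}

# Hypothetical toy glosses (illustrative only, not actual dictionary entries).
GLOSSES = {
    "pine": {
        "pine#tree": "kind of evergreen tree with needle-shaped leaves",
        "pine#yearn": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#shape": "solid body which narrows to a point",
        "cone#fruit": "fruit of certain evergreen trees",
    },
}


def content_words(gloss):
    """Lowercase the gloss, split it into alphabetic tokens, drop stopwords."""
    return set(re.findall(r"[a-z]+", gloss.lower())) - STOPWORDS


def lesk_pair(word_a, word_b, glosses=GLOSSES):
    """Return the sense pair whose glosses share the most content words,
    together with the overlap count. Ties keep the first pair encountered."""
    best_pair, best_score = None, -1
    for (sense_a, gloss_a), (sense_b, gloss_b) in product(
        glosses[word_a].items(), glosses[word_b].items()
    ):
        score = len(content_words(gloss_a) & content_words(gloss_b))
        if score > best_score:
            best_pair, best_score = (sense_a, sense_b), score
    return best_pair, best_score


if __name__ == "__main__":
    print(lesk_pair("pine", "cone"))
```

　　With these toy glosses the only shared content word is "evergreen", so the tree sense of "pine" is paired with the fruit sense of "cone". Because the sketch does no stemming, "tree" and "trees" do not match, which hints at why exact gloss overlap alone is brittle.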
　　The 1990s saw three major developments: WordNet became available, the statistical revolution in NLP swept through, and Senseval began.
　　WordNet (Miller 1990) pushed research forward because it was both computationally accessible and hierarchically organized into word senses called synsets. Today, English WordNet (together with wordnets for other languages) is the most-used general sense inventory in WSD research.
　　Statistical and machine learning methods have been successfully applied to the sense classification problem. Today, methods that train on manually sense-tagged corpora (i.e., supervised learning methods) have become the mainstream approach to WSD, with the best results in all tasks of the Senseval competitions. Weaver had recognized the statistical nature of the problem as early as 1949, and early corpus-based work by Weiss (1973), Kelley and Stone (1975), and Black (1988) presaged the statistical revolution by demonstrating the potential of empirical methods to extract disambiguation clues from manually-tagged corpora. Brown et al. (1991) were the first to use corpus-based WSD in statistical MT.
　　Before Senseval, it was extremely difficult to compare and evaluate different systems because of disparities in test words, annotators, sense inventories, and corpora. For instance, Gale et al. (1992:252) noted that "the literature on word sense disambiguation fails to offer a clear model that we might follow in order to quantify the performance of our disambiguation algorithms," and so they introduced lower bounds (choosing the most frequent sense) and upper bounds (the performance of human annotators).
　　However, these could not be used effectively until sufficiently large test corpora were generated. Senseval was first discussed in 1997 (Resnik and Yarowsky 1999; Kilgarriff and Palmer 2000) and now, after hosting three evaluation exercises, has grown into the primary forum for researchers to discuss and advance the field. Its main contribution was to establish a framework for WSD evaluation that includes standardized task descriptions and an evaluation methodology. It has also focused research, enabled scientific rigor, produced benchmarks, and generated substantial resources in many languages (e.g., sense-annotated corpora), thus enabling research in languages other than English.
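　　The lower bound mentioned above (always choosing the most frequent sense) can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the tiny sense-tagged samples are hypothetical and do not come from Gale et al. or from any Senseval data set.

```python
from collections import Counter, defaultdict


def train_most_frequent_sense(tagged):
    """Map each word to the sense it carries most often in the tagged data."""
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def baseline_accuracy(model, test):
    """Fraction of test instances whose gold sense equals the predicted one.
    Words unseen in training count as errors."""
    correct = sum(1 for word, sense in test if model.get(word) == sense)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical sense-tagged instances: (word, gold sense).
    train = [("bank", "finance"), ("bank", "finance"),
             ("bank", "finance"), ("bank", "river")]
    test = [("bank", "finance"), ("bank", "river")]
    model = train_most_frequent_sense(train)
    print(model, baseline_accuracy(model, test))
```

　　A system that cannot beat this figure on the same test data has learned nothing beyond the sense distribution, which is what makes it useful as a floor for comparison.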
　　Recently, at the Senseval-3 workshop (Mihalcea and Edmonds 2004) there was a general consensus (and a sense of unease) that the traditional explicit WSD task, so effective at driving research, had reached a plateau and was not likely to lead to fundamentally new research. This could indicate the need to look for new research directions in the field, some of which may already be emerging, for instance the use of parallel bilingual corpora. Section 1.7 explores the emerging research, but let's first review the issue at the center of it all: word senses.
　　……
