Word Sense Disambiguation: Algorithms and Applications (English reprint edition)
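For a computer to understand human language, it must resolve ambiguity. In computational linguistics, word sense disambiguation (WSD) has long been a subject of research. This volume, *Word Sense Disambiguation: Algorithms and Applications (English reprint edition)*, part of the Computational Linguistics and Language Technology Original-Text Series, brings together recent international research on WSD. It covers nearly every topic in the field and is of considerable scholarly value.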
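*Word Sense Disambiguation: Algorithms and Applications (English reprint edition)* is the first book to treat WSD comprehensively. It covers the major algorithms, methods, evaluation metrics, results, philosophical issues, and applications, and it includes fairly comprehensive surveys of the field's history and development by leading authorities. Researchers can use it to learn about the field's results and trends; developers can use it to learn specific techniques and methods.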
Reading Guide
Contributors
Foreword
Preface
1 Introduction (Eneko Agirre and Philip Edmonds)
  1.1 Word Sense Disambiguation
  1.2 A Brief History of WSD Research
  1.3 What is a Word Sense?
  1.4 Applications of WSD
  1.5 Basic Approaches to WSD
  1.6 State-of-the-Art Performance
  1.7 Promising Directions
  1.8 Overview of This Book
  1.9 Further Reading
  References
2 Word Senses (Adam Kilgarriff)
  2.1 Introduction
  2.2 Lexicographers
  2.3 Philosophy
    2.3.1 Meaning is Something You Do
    2.3.2 The Fregean Tradition and Reification
    2.3.3 Two Incompatible Semantics?
    2.3.4 Implications for Word Senses
  2.4 Lexicalization
  2.5 Corpus Evidence
    2.5.1 Lexicon Size
    2.5.2 Quotations
  2.6 Conclusion
  2.7 Further Reading
  Acknowledgments
  References
3 Making Sense About Sense (Nancy Ide and Yorick Wilks)
  3.1 Introduction
  3.2 WSD and the Lexicographers
  3.3 WSD and Sense Inventories
  3.4 NLP Applications and WSD
  3.5 What Level of Sense Distinctions Do We Need for NLP, If Any?
  3.6 What Now for WSD?
  3.7 Conclusion
  References
4 Evaluation of WSD Systems (Martha Palmer, Hwee Tou Ng and Hoa Trang Dang)
  4.1 Introduction
    4.1.1 Terminology
    4.1.2 Overview
  4.2 Background
    4.2.1 WordNet and Semcor
    4.2.2 The Line and Interest Corpora
    4.2.3 The DSO Corpus
    4.2.4 Open Mind Word Expert
  4.3 Evaluation Using Pseudo-Words
  4.4 Senseval Evaluation Exercises
    4.4.1 Senseval-1 Evaluation and Scoring
    4.4.2 Senseval-2
      English All-Words Task
      English Lexical Sample Task
    4.4.3 Comparison of Tagging Exercises
  4.5 Sources of Inter-Annotator Disagreement
  4.6 Granularity of Sense: Groupings for WordNet
    4.6.1 Criteria for WordNet Sense Grouping
    4.6.2 Analysis of Sense Grouping
  4.7 Senseval-3
  4.8 Discussion
  References
5 Knowledge-Based Methods for WSD (Rada Mihalcea)
  5.1 Introduction
  5.2 Lesk Algorithm
    5.2.1 Variations of the Lesk Algorithm
      Simulated Annealing
      Simplified Lesk Algorithm
      Augmented Semantic Spaces
      Summary
  5.3 Semantic Similarity
    5.3.1 Measures of Semantic Similarity
    5.3.2 Using Semantic Similarity Within a Local Context
    5.3.3 Using Semantic Similarity Within a Global Context
  5.4 Selectional Preferences
    5.4.1 Preliminaries: Learning Word-to-Word Relations
    5.4.2 Learning Selectional Preferences
    5.4.3 Using Selectional Preferences
  5.5 Heuristics for Word Sense Disambiguation
    5.5.1 Most Frequent Sense
    5.5.2 One Sense Per Discourse
    5.5.3 One Sense Per Collocation
  5.6 Knowledge-Based Methods at Senseval-2
  5.7 Conclusions
  References
6 Unsupervised Corpus-Based Methods for WSD (Ted Pedersen)
  6.1 Introduction
    6.1.1 Scope
    6.1.2 Motivation
      Distributional Methods
      Translational Equivalence
    6.1.3 Approaches
  6.2 Type-Based Discrimination
    6.2.1 Representation of Context
    6.2.2 Algorithms
      Latent Semantic Analysis (LSA)
      Hyperspace Analogue to Language (HAL)
      Clustering By Committee (CBC)
    6.2.3 Discussion
  6.3 Token-Based Discrimination
    6.3.1 Representation of Context
    6.3.2 Algorithms
      Context Group Discrimination
      McQuitty's Similarity Analysis
    6.3.3 Discussion
  6.4 Translational Equivalence
    6.4.1 Representation of Context
    6.4.2 Algorithms
    6.4.3 Discussion
  6.5 Conclusions and the Way Forward
  Acknowledgments
  References
7 Supervised Corpus-Based Methods for WSD
8 Knowledge Sources for WSD
9 Automatic Acquisition of Lexical Information and Examples
10 Domain-Specific WSD
11 WSD in NLP Applications
Ironically, the very "statistical semantics" that Weaver proposed might have applied in cases such as this: Yarowsky (2000) notes that the trigram in the pen is very strongly indicative of the enclosure sense, since one almost never refers to what is in a writing pen, except for ink.
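The claim that a single trigram can nearly settle a sense can be illustrated with plain relative frequencies. The sketch below is a minimal illustration under stated assumptions. The sense-tagged contexts of "pen" are invented, not Yarowsky's data. `sense_distribution` and `best_sense` are hypothetical helper names. The code only estimates P(sense | trigram) by counting.

```python
from collections import Counter

# Hypothetical sense-tagged occurrences of "pen": (trigram context, sense).
# These counts are invented for illustration only.
TAGGED = [
    ("in the pen", "enclosure"),
    ("in the pen", "enclosure"),
    ("in the pen", "enclosure"),
    ("in the pen", "writing"),
    ("with a pen", "writing"),
    ("with a pen", "writing"),
    ("the pig pen", "enclosure"),
]


def sense_distribution(tagged, cue):
    """Estimate P(sense | cue) as relative frequency over matching examples."""
    counts = Counter(sense for context, sense in tagged if context == cue)
    total = sum(counts.values())
    if total == 0:
        return {}  # the cue was never observed, so there is no evidence
    return {sense: n / total for sense, n in counts.items()}


def best_sense(tagged, cue):
    """Return the sense with the highest estimated probability, or None."""
    dist = sense_distribution(tagged, cue)
    return max(dist, key=dist.get) if dist else None


if __name__ == "__main__":
    print(sense_distribution(TAGGED, "in the pen"))  # {'enclosure': 0.75, 'writing': 0.25}
    print(best_sense(TAGGED, "with a pen"))          # writing
```

With these toy counts, "in the pen" selects the enclosure sense three times out of four. A real collocation-based method would estimate such cues from a large corpus and would need smoothing for sparse counts.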
WSD was resurrected in the 1970s within artificial intelligence (AI) research on full natural language understanding. In this spirit, Wilks (1975) developed "preference semantics", one of the first systems to explicitly account for WSD. The system used selectional restrictions and a frame-based lexical semantics to find a consistent set of word senses for the words in a sentence. The idea of individual "word experts" evolved over this time (Rieger and Small 1979). For example, in Hirst's (1987) system, a word was gradually disambiguated as information was passed between the various modules (including a lexicon, parser, and semantic interpreter) in a process he called "Polaroid Words". "Proper" knowledge representation was important in the AI paradigm. Knowledge sources had to be handcrafted, so the ensuing knowledge acquisition bottleneck inevitably led to limited lexical coverage of narrow domains and would not scale.

The 1980s were a turning point for WSD. Large-scale lexical resources and corpora became available, so handcrafting could be replaced with knowledge extracted automatically from the resources (Wilks et al. 1990). Lesk's (1986) short but extremely seminal paper used the overlap of word sense definitions in the Oxford Advanced Learner's Dictionary of Current English (OALD) to resolve word senses. Given two (or more) target words in a sentence, the pair of senses whose definitions have the greatest lexical overlap is chosen (see Chap. 5 (Sect. 5.2)). Dictionary-based WSD had begun, and the relationship of WSD to lexicography became explicit. For example, Guthrie et al. (1991) used the subject codes (e.g., Economics, Engineering, etc.) in the Longman Dictionary of Contemporary English (LDOCE) (Procter 1978) on top of Lesk's method.
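Lesk's pairwise overlap idea can be sketched in a few lines. This is a minimal illustration, not Lesk's original implementation. The glosses are short illustrative definitions for "pine" and "cone", not OALD text. The stopword list, the sense labels such as `pine#tree`, and the helper names are all hypothetical. Overlap is counted as shared lowercase content words with no stemming.

```python
import re
from itertools import product

# Hypothetical mini sense inventory; glosses are illustrative, not OALD text.
GLOSSES = {
    "pine": {
        "pine#tree": "kinds of evergreen tree with needle-shaped leaves",
        "pine#pine_away": "waste away through sorrow or illness",
    },
    "cone": {
        "cone#solid": "solid body which narrows to a point",
        "cone#shape": "something of this shape whether solid or hollow",
        "cone#fruit": "fruit of certain evergreen trees",
    },
}

# Hypothetical stopword list, just large enough for these glosses.
STOPWORDS = {"a", "of", "to", "or", "the", "with", "which", "this", "whether", "through"}


def content_words(text):
    """Lowercase, split on non-letters, drop stopwords (no stemming)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap(gloss_a, gloss_b):
    """Number of distinct content words the two glosses share."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def lesk_pair(word_a, word_b, glosses=GLOSSES):
    """Score every sense pair and keep the one with the largest gloss overlap.

    Ties go to the first pair encountered in dictionary order.
    """
    best, best_score = None, -1
    for (sense_a, gloss_a), (sense_b, gloss_b) in product(
        glosses[word_a].items(), glosses[word_b].items()
    ):
        score = overlap(gloss_a, gloss_b)
        if score > best_score:
            best, best_score = (sense_a, sense_b), score
    return best, best_score


if __name__ == "__main__":
    print(lesk_pair("pine", "cone"))  # (('pine#tree', 'cone#fruit'), 1)
```

Only "evergreen" is shared here, because without stemming "tree" and "trees" do not match. Exhaustive pairing works for two words. However, the number of sense combinations multiplies with every added word, which motivates variants such as the simulated annealing and simplified Lesk approaches listed under Sect. 5.2.1.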
Yarowsky (1992) combined the information in Roget's International Thesaurus with co-occurrence data from large corpora in order to learn disambiguation rules for Roget's classes, which could then be applied to words in a manner reminiscent of Masterman (1957) (see Chap. 10 (Sect. 10.2.1)). Although dictionary methods are useful for some cases of word sense ambiguity (such as homographs), they are not robust, since dictionaries lack complete coverage of information on sense distinctions.

The 1990s saw three major developments: WordNet became available, the statistical revolution in NLP swept through, and Senseval began. WordNet (Miller 1990) pushed research forward because it was both computationally accessible and hierarchically organized into word senses called synsets. Today, English WordNet (together with wordnets for other languages) is the most-used general sense inventory in WSD research.

Statistical and machine learning methods have been successfully applied to the sense classification problem. Today, methods that train on manually sense-tagged corpora (i.e., supervised learning methods) have become the mainstream approach to WSD, with the best results in all tasks of the Senseval competitions. Weaver had recognized the statistical nature of the problem as early as 1949, and early corpus-based work by Weiss (1973), Kelley and Stone (1975), and Black (1988) presaged the statistical revolution by demonstrating the potential of empirical methods to extract disambiguation clues from manually tagged corpora. Brown et al. (1991) were the first to use corpus-based WSD in statistical MT.

Before Senseval, it was extremely difficult to compare and evaluate different systems because of disparities in test words, annotators, sense inventories, and corpora. For instance, Gale et al. (1992:252) noted that "the literature on word sense disambiguation fails to offer a clear model that we might follow in order to quantify the performance of our disambiguation algorithms," and so they introduced lower bounds (choosing the most frequent sense) and upper bounds (the performance of human annotators). However, these could not be used effectively until sufficiently large test corpora were generated. Senseval was first discussed in 1997 (Resnik and Yarowsky 1999; Kilgarriff and Palmer 2000) and now, after hosting three evaluation exercises, has grown into the primary forum for researchers to discuss and advance the field. Its main contribution was to establish a framework for WSD evaluation that includes standardized task descriptions and an evaluation methodology. It has also focused research, enabled scientific rigor, produced benchmarks, and generated substantial resources in many languages (e.g., sense-annotated corpora), thus enabling research in languages other than English.

Recently, at the Senseval-3 workshop (Mihalcea and Edmonds 2004), there was a general consensus (and a sense of unease) that the traditional explicit WSD task, so effective at driving research, had reached a plateau and was not likely to lead to fundamentally new research. This could indicate the need to look for new research directions in the field, some of which may already be emerging, for instance the use of parallel bilingual corpora. Section 1.7 explores the emerging research, but let's first review the issue at the center of it all: word senses. ……
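The lower and upper bounds attributed to Gale et al. can be made concrete with a small scoring sketch. This is a simplified illustration under stated assumptions. The sense labels and annotations are invented. Every test instance is assumed to be tagged, so plain accuracy is used. Raw observed agreement between two hypothetical annotators stands in for "the performance of human annotators".

```python
from collections import Counter

# Hypothetical sense labels for an ambiguous noun; all data is invented.
TRAIN = ["money", "money", "attention", "money", "share"]
TEST_GOLD = ["money", "attention", "money", "share", "money"]
ANNOTATOR_A = TEST_GOLD
ANNOTATOR_B = ["money", "attention", "money", "money", "money"]


def most_frequent_sense(train_senses):
    """Most frequent training sense (ties: the first one seen wins)."""
    return Counter(train_senses).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of instances whose predicted sense equals the gold sense."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def mfs_baseline(train_senses, test_gold):
    """Lower bound: tag every test instance with the training MFS."""
    mfs = most_frequent_sense(train_senses)
    return accuracy([mfs] * len(test_gold), test_gold)


def agreement(annotator_a, annotator_b):
    """Upper-bound proxy: observed agreement between two annotators."""
    return accuracy(annotator_a, annotator_b)


if __name__ == "__main__":
    print(f"MFS lower bound: {mfs_baseline(TRAIN, TEST_GOLD):.2f}")          # MFS lower bound: 0.60
    print(f"Annotator agreement: {agreement(ANNOTATOR_A, ANNOTATOR_B):.2f}")  # Annotator agreement: 0.80
```

A system scoring between these two numbers improves on the trivial baseline without exceeding what humans agree on. If a system may leave some instances untagged, accuracy splits into separate precision and recall figures. This sketch does not handle that case.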