s Legacy to Computational Linguistics and Beyond

Roger Evans, University of Brighton, R. [email protected]
Alexander Gelbukh, CIC, Instituto Politécnico Nacional, Mexico, [email protected]
Gregory Grefenstette, Inria Saclay, [email protected]
Patrick Hanks, University of Wolverhampton, [email protected]
Miloš Jakubíček, Lexical Computing and Masaryk University, [email protected]
Diana McCarthy, DTAL, University of Cambridge, [email protected]
Martha Palmer, University of Colorado, [email protected]
Ted Pedersen, University of Minnesota, [email protected]
Michael Rundell, Lexicography MasterClass, [email protected]
Pavel Rychlý, Lexical Computing and Masaryk University, pary@fi.muni.cz
Serge Sharoff, University of Leeds, [email protected]
David Tugwell, Independent Researcher, [email protected]

Abstract. This year, the CICLing conference is dedicated to the memory of Adam Kilgarriff, who died last year. Adam leaves behind a tremendous scientific legacy and those working in computational linguistics, other fields of linguistics and lexicography are indebted to him. This paper is a summary review of some of Adam's main scientific contributions. It is not and cannot be exhaustive. It is written by only a small selection of his large network of collaborators. Nevertheless we hope this will provide a useful summary for readers wanting to know more about the origins of work, events and software that are so widely relied upon by scientists today, and undoubtedly will continue to be so in the foreseeable future.
1 Introduction

Last year was marred by the loss of Adam Kilgarriff, who during the last 27 years contributed greatly to the field of computational linguistics¹, as well as to other fields of linguistics and to lexicography. This paper provides a review of some of the key scientific contributions he made. His legacy is impressive, not simply in terms of the numerous academic papers, which are widely cited in many fields, but also in the many scientific events and communities he founded and fostered and in the commercial Sketch Engine software. The Sketch Engine has provided computational linguistics tools and corpora to scientists in other fields, notably lexicography [61,50,17], as well as facilitating research in other areas of linguistics [56,12,11,54] and our own subfield of computational linguistics [60,74].

¹ In this paper, natural language processing (NLP) is used synonymously with computational linguistics.

Adam was hugely interested in lexicography from the very inception of his postgraduate career. His DPhil² on polysemy and his subsequent interest in word sense disambiguation (WSD) and its evaluation were firmly rooted in examining corpus data and dictionary senses with a keen eye on the lexicographic process [20]. After his DPhil, Adam spent several years as a computational linguist advising Longman Dictionaries on the use of language engineering for the development of lexical databases, and he continued this line of knowledge transfer in consultancies with other publishers until realizing the potential of computational linguistics with the development of his commercial software, the Sketch Engine. The origins of this software lay in his earlier ideas of using computational linguistics tools for providing word profiles from corpus data.

For Adam, data was key. He fully appreciated the need for empirical approaches to both computational linguistics and lexicography. In computational linguistics from the 1990s onwards there was a huge swing from symbolic to statistical approaches; however, the choice of input data, in composition and size, was often overlooked in favor of a focus on algorithms. Furthermore, early on in this statistical tsunami, issues of replicability were not always appreciated. A large portion of his work was devoted to these issues;