Dr Mark Stevenson
School of Computer Science
Senior Lecturer
Member of the Natural Language Processing research group
+44 114 222 1921
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Mark Stevenson is a Senior Lecturer in Computer Science. He is a member of the Natural Language Processing group, which he joined in 1995. His PhD, on Word Sense Disambiguation, was published as a monograph.
He has been Principal Investigator of projects funded by a range of sources including the EU, EPSRC and Google. He was an EPSRC Advanced Research Fellow (2006-2011) and co-ordinator of the EU-funded project PATHS.
He has also worked in a range of commercial and academic organisations including Reuters Ltd (where he was involved in the production and dissemination of the widely used Reuters Corpus), Adastral Park (British Telecom’s research lab) and the Center for the Study of Language and Information, Stanford University.
- Research interests
Mark Stevenson’s research focusses on Natural Language Processing and Information Retrieval. Topics he has worked on include word sense disambiguation, Information Extraction, plagiarism/reuse detection, lexicon adaptation, cross-lingual information retrieval and exploratory search.
His research includes applications of these technologies to a range of areas including biomedical journal articles (interpretation of documents, extraction of information from them and data mining information from corpora), cultural heritage (automatic organisation of corpora, exploratory search interfaces) and software testing (generation of realistic test suites).
- Publications
Books
- Words and Intelligence I: Selected Papers by Yorick Wilks. Springer.
- Words and Intelligence II: Essays in Honour of Yorick Wilks. Springer.
- Word Sense Disambiguation: The Case for Combinations of Knowledge Sources. Stanford, CA.: CSLI Publications.
Journal articles
- . JAMIA Open, 7(4).
- Understanding Linearity of Cross-Lingual Word Embedding Mappings. Transactions on Machine Learning Research.
- . ACM Transactions on Asian and Low-Resource Language Information Processing, 21(2), 1-16.
- . Health Psychology Review.
- . International Journal of Educational Technology in Higher Education, 18.
- . Genomics and Informatics, 18(2).
- . Asian and Low-Resource Language Information Processing, 18(4).
- . Language Resources and Evaluation.
- . Artificial Intelligence in Medicine, 87, 9-19.
- . BMC Bioinformatics, (Suppl 7):249, 59-67.
- . Journal of the Association for Information Science and Technology, 68(1), 154-167.
- . Journal of the Association for Information Science and Technology, 67(7), 1624-1638.
- . Journal of Biomedical Semantics, 7.
- . IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(4), 796-804.
- . Journal of the American Medical Informatics Association, 22(5), 987-992.
- . Science of Computer Programming, 97(4), 405-425.
- . Journal of Documentation, 70, 970-996.
- . Information Retrieval, 17(4), 351-379.
- . Journal of Biomedical Informatics.
- Comparing Medline citations using modified N-grams. Journal of the American Medical Informatics Association.
- . Journal of Computing and Cultural Heritage, 5(4).
- Towards semantic literature based discovery. AAAI Fall Symposium - Technical Report, FS-12-05, 86-87.
- . Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7224 LNCS, 207-218.
- . J Am Med Inform Assoc, 19(2), 235-240.
- . LANG RESOUR EVAL, 45(1), 5-24.
- Extracting relations within and across sentences. International Conference Recent Advances in Natural Language Processing, RANLP, 25-32.
- . International Conference on Information and Knowledge Management, Proceedings, 59-62.
- Resolving ambiguity in biomedical text to improve summarization. Information Processing and Management.
- . LANG RESOUR EVAL, 44(4), 295-313.
- . Bioinformatics, 26(22), 2889-2896.
- . J Biomed Inform, 43(6), 972-981.
- . J Biomed Inform, 43(5), 762-773.
- The effect of ambiguity on the automated acquisition of WSD examples. NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, 353-356.
- , 215-231.
- . Research on Language and Computation, 7(1), 13-39.
- Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation. Language Resources and Evaluation, 1-19.
- . BMC Bioinformatics, 9 Suppl 11, S7.
- . LANG RESOUR EVAL, 40(2), 183-201.
- . Journal of Natural Language Engineering, 11(1), 125-128.
- . COMPUT SPEECH LANG, 18(3), 201-207.
- Unsupervised induction of IE domain knowledge using an ontology. AAAI Workshop - Technical Report, WS-04-01, 80-85.
- Parallel Text Processing: Alignment and Use of Translation Corpora by Véronis, J. (Ed.), (2000), 393pp. Machine Translation Review, 12, 75-76.
- The interaction of knowledge sources in word sense disambiguation. COMPUT LINGUIST, 27(3), 321-349.
- . Natural Language Engineering, 4(3), 135-144.
- The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging?. CoRR, cmp-lg/9607028.
- Computer-assisted screening in systematic evidence synthesis requires robust and well-evaluated stopping criteria. Systematic Reviews.
- . ACM Transactions on Information Systems.
- C3-IoC: A career guidance system for assessing student skills using machine learning and network visualisation. International Journal of Artificial Intelligence in Education.
- . Journal of the American Medical Informatics Association.
- Cross-Lingual Word Embedding Refinement by $\ell_{1}$ Norm Optimisation.
Chapters
- Supporting Exploration and Use of Digital Cultural Heritage Materials: the PATHS Perspective In Ruthven I & Chowdhury GG (Ed.), Cultural Heritage Information Access and Management (pp. 197-220). Facet Publishing
- Word Sense Disambiguation In Mitkov R (Ed.), Oxford Handbook of Computational Linguistics Oxford University Press
- Natural Language Processing and Information Retrieval In Davies J, Göker A & Graham M (Ed.), Information Retrieval: Searching in the 21st Century (pp. 215-232). Wiley
- Sense Tagging In Ludeling A, Kyto M & McEnery T (Ed.), Handbook of Corpus Linguistics Mouton de Gruyter
- , Text, Speech and Language Technology (pp. 217-251). Springer Netherlands
- Words and Intelligence II Essays in Honor of Yorick Wilks Introduction, WORDS AND INTELLIGENCE II: ESSAYS IN HONOR OF YORICK WILKS (pp. XI-XIV).
- , WORD SENSE DISAMBIGUATION: ALGORITHMS AND APPLICATIONS (pp. 217-251).
- Knowledge Sources for Word Sense Disambiguation In Agirre E & Edmonds P (Ed.), Word Sense Disambiguation: Algorithms, Applications and Trends Kluwer
- Word Sense Disambiguation In Mitkov R (Ed.), Oxford Handbook of Computational Linguistics (pp. 249-265). Oxford University Press
- Combining Independent Knowledge Sources for Word Sense Disambiguation In Nicolov N & Mitkov R (Ed.), Recent Advances in Natural Language Processing (pp. 74-86). John Benjamins Publishers
- Large Vocabulary Word Sense Disambiguation In Ravin Y & Leacock C (Ed.), Polysemy: Theoretical and Computational Contributions (pp. 161-177). Oxford: Oxford University Press.
Conference proceedings papers
- Stopping Methods Based on Point Processes: Recent Developments. Proceedings of the 3rd Workshop on Augmented Intelligence for Technology-Assisted Reviews Systems (ALTARS 2024) co-located with the 46th European Conference on Information Retrieval (ECIR 2024)
- RLStop: A Reinforcement Learning Stopping Method for TAR. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation. Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics
- Identifying Automatically Generated Headlines using Transformers. Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
- . ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Toronto, ON, Canada, 6 June 2021 - 11 June 2021.
- Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis. NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp 2364-2375)
- Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
- . Proceedings of the 29th ACM International Conference on Information & Knowledge Management
- ParaPat: The multi-million sentences parallel corpus of patents abstracts. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp 3769-3774)
- Automatic Generation of Topic Labels. SIGIR (pp 1965-1968)
- Sheffield at CheckThat! 2020: Claim Identification and Verification on Twitter. CEUR Workshop Proceedings, Vol. 2696
- . Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
- Modelling stopping criteria for search results using poisson processes. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp 3484-3489)
- . CLEF 2019 Proceedings: Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp 141-148). Lugano, Switzerland, 9 September 2019 - 12 September 2019.
- Ranking studies for systematic reviews using query adaptation: University of Sheffield's approach to CLEF eHealth 2019 task 2 working notes for CLEF 2019. Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Vol. 2380. Lugano, Switzerland, 9 September 2019 - 12 September 2019.
- . The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp 1257-1260), 21 July 2019 - 25 July 2019.
- Graph-KD: Exploring relational information for knowledge discovery. CEUR Workshop Proceedings, Vol. 2456 (pp 257-260)
- Retrieving and ranking studies for systematic reviews: University of Sheffield's approach to CLEF eHealth 2018 Task 2. CEUR Workshop Proceedings, Vol. 2125
- Topic or Style? Exploring the Most Useful Features for Authorship Attribution. COLING (pp 343-353)
- Ranking abstracts to identify relevant evidence for systematic reviews: Sheffield's approach to CLEF eHealth 2017 Task 2: Working notes for CLEF 2017. CEUR Workshop Proceedings, Vol. 1866
- Using TF-IDF n-gram and word embedding cluster ensembles for author profiling: Notebook for PAN at CLEF 2017. CEUR Workshop Proceedings, Vol. 1866
- . HT 2017 - Proceedings of the 28th ACM Conference on Hypertext and Social Media (pp 45-54)
- . Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers (pp 267-273). Valencia, Spain, 3 April 2017 - 7 April 2017.
- (pp 669-675)
- . Value in Health, Vol. 19(7) (pp A367-A367)
- . Proceedings of the First Workshop on NLP and Computational Social Science, November 2016 - November 2016.
- Exploring Word embeddings and character N-Grams for author clustering. CEUR Workshop Proceedings, Vol. 1609 (pp 984-991)
- . Proceedings of the 24th ACM International on Conference on Information and Knowledge Management - CIKM '15, 18 October 2015 - 23 October 2015.
- . Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), July 2015 - July 2015.
- . Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, June 2015 - June 2015.
- A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification notebook for PAN at CLEF 2015. CEUR Workshop Proceedings, Vol. 1391
- Topic models and n-gram language models for author profiling. CEUR Workshop Proceedings, Vol. 1391
- The short stories corpus. CEUR Workshop Proceedings, Vol. 1391
- . 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 9 November 2015 - 12 November 2015.
- The Short Stories Corpus: Notebook for PAN at CLEF 2015. CLEF (Working Notes), Vol. 1391
- Topic Models and n-gram Language Models for Author Profiling - Notebook for PAN at CLEF 2015. CLEF (Working Notes), Vol. 1391
- Making the most of limited training data using distant supervision. ACL-IJCNLP 2015 - BioNLP 2015: Workshop on Biomedical Natural Language Processing, Proceedings of the Workshop (pp 12-20)
- Held-out versus Gold Standard: Comparison of Evaluation Strategies for Distantly Supervised Relation Extraction from Medline abstracts. EMNLP 2015 - 6th International Workshop on Health Text Mining and Information Analysis, LOUHI 2015 - Proceedings of the Workshop (pp 97-102)
- Automatic Detection of Answers to Research Questions from Medline Abstracts. ACL-IJCNLP 2015 - BioNLP 2015: Workshop on Biomedical Natural Language Processing, Proceedings of the Workshop (pp 141-146)
- . Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Vol. 2 (pp 631-636)
- Measuring the Similarity between Automatically Generated Topics. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (pp 22-27)
- PATHS in context: User characteristics and the construction of cultural heritage narratives. iConference Proceedings 2014
- Hashing and merging heuristics for text reuse detection: Notebook for PAN at CLEF-2014. CEUR Workshop Proceedings, Vol. 1180 (pp 939-946)
- Hashing and Merging Heuristics for Text Reuse Detection. CLEF (Working Notes), Vol. 1180 (pp 939-946)
- . IEEE/ACM Joint Conference on Digital Libraries, 8 September 2014 - 12 September 2014.
- (pp 143-154)
- (pp 116-127)
- Applying UMLS for Distantly Supervised Relation Detection. Proceedings of the Fifth International Workshop on Health Text Mining and Information Analysis (pp 80-84)
- PATHS: A System for Accessing Cultural Heritage Collections. ACL (Conference System Demonstrations) (pp 151-156)
- Evaluating topic coherence using distributional semantics. Proceedings of the 10th International Conference on Computational Semantics, IWCS 2013 - Long Papers
- Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 80-84)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Vol. 1 (pp 132-137)
- Unsupervised domain tuning to improve word sense disambiguation. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 680-684)
- Representing topics using images. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp 158-167)
- . SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp 1105-1106)
- . Proceedings - IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013 (pp 352-361)
- Unsupervised Domain Tuning to Improve Word Sense Disambiguation. Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 680-684)
- UBC UOS-TYPED: Regression for Typed-similarity. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 132-137)
- Representing Topics Using Images. Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 158-167)
- Distinguishing Common and Proper Nouns. *SEM 2013 - 2nd Joint Conference on Lexical and Computational Semantics, Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity (pp 80-84)
- DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples. 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Demonstration Session (pp 1-4)
- Identification of Genia Events using Multiple Classifiers. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Vol. 2013-October (pp 125-129)
- Generating Paths through Cultural Heritage Collections. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp 1-10)
- PAN@FIRE: Overview of the Cross-Language !ndian Text Re-Use Detection Competition. Forum for Information Retrieval Evaluation (FIRE) Working Notes. Bombay, India
- PATHS - Exploring Digital Cultural Heritage Spaces. Theory and Practice of Digital Libraries 2012. Cyprus
- Evaluating the use of clustering for automatically organising digital library collections. Theory and Practice of Digital Libraries 2012. Cyprus
- Automated Discovery of Valid Test Strings using Dynamic Regular Expressions Collation and Tailored Web Searches. Proceedings of the 12th International Conference on Quality Software (QSIC 2012). Xi’an, China
- Scaling up WSD with Automatically Generated Examples. BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (pp 231-239). Montréal, Canada
- Adapting Wikification to Cultural Heritage. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp 101-106). Avignon, France
- Computing Similarity between Cultural Heritage Items using Multimodal Features. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp 85-93). Avignon, France
- The Sheffield and Basque Country universities entry to CHiC: Using random walks and similarity to access cultural heritage. CEUR Workshop Proceedings, Vol. 1178
- User-centred design to support exploration and path creation in cultural heritage collections. CEUR Workshop Proceedings, Vol. 909 (pp 75-78)
- . Proceedings of the 2012 18th International Conference on Virtual Systems and Multimedia, VSMM 2012: Virtual Systems in the Information Society (pp 469-474)
- . Proceedings - International Conference on Quality Software (pp 79-88)
- . Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012 (pp 141-150)
- Personalising access to cultural heritage collections using pathways. PATCH 2011 : 3rd International Workshop on Personalized Access To Cultural Heritage (pp 12-19)
- External Plagiarism Detection using Information Retrieval and Sequence Alignment - Notebook for PAN at CLEF 2011. CLEF (Notebook Papers/Labs/Workshop), Vol. 1177
- The Effect of Ambiguity on the Automated Acquisition of WSD Examples. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp 353-356). Los Angeles, California
- Inter-sentential Relations in Information Extraction Corpora. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010). Valletta, Malta
- Improving Summarization of Biomedical Documents using Word Sense Disambiguation. Proceedings of the 2010 Workshop on Biomedical Natural Language Processing (pp 55-63). Uppsala, Sweden
- Aligning WordNet Synsets and Wikipedia Articles. Proceedings of the AAAI-2010 Workshop on Collaboratively-built Knowledge Sources and Artificial Intelligence. Atlanta, Georgia
- IIITH: Domain Specific Word Sense Disambiguation. Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden
- University of Sheffield: Lab Report for PAN at CLEF 2010. Proceedings of the 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse
- . 1st International Workshop on Software Test Output Validation, STOV 2010, in Conjunction with the 2010 International Conference on Software Testing and Analysis, ISSTA 2010 (pp 1-4)
- Disambiguation of Biomedical Abbreviations. Proceedings of the BioNLP 2009 Workshop (pp 71-79). Boulder, Colorado
- A Corpus of Biomedical Abbreviations. Proceedings of Corpus Linguistics 2009. Liverpool, UK
- Designing a Corpus of Plagiarised Academic Texts. Proceedings of Corpus Linguistics 2009. Liverpool, UK
- . Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing - BioNLP '08, 19 June 2008 - 19 June 2008.
- Knowledge Sources for Word Sense Disambiguation of Biomedical Text. Proceedings of the workshop “BioNLP 2008” held in conjunction with the 46th Annual Meeting of the Association for Computational Linguistics (pp 80-87). Columbus, OH.
- Acquiring Sense Tagged Examples using Relevance Feedback. Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08). Manchester, UK
- A Semantic Approach to Paraphrase Identification. Proceedings of the 11th Annual Research Colloquium of the UK Special-interest group for Computational Linguistics. Oxford, England
- . Proceedings of the 5th 2013 Forum on Information Retrieval Evaluation - FIRE '13, 4 December 2013 - 6 December 2013.
- A Semi-supervised Approach to Learning Relevant Protein-Protein Interaction Articles. Proceedings of BioCreative II workshop (pp 175-177). Madrid, Spain
- A Task-based Comparison of Information Extraction Pattern Models. Proceedings of the Workshop “Deep Linguistic Processing” held in conjunction with the 45th Annual Meeting of the Association for Computational Linguistics (pp 81-88)
- Learning Expressive Models for Word Sense Disambiguation. 45th Annual Meeting of the Association of Computational Linguistics (pp 41-48)
- Improving Semi-supervised Acquisition of Relation Extraction Patterns. Proceedings of the Workshop “Information Extraction Beyond The Document” held in conjunction with 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (pp 29-35). Sydney, Australia
- Comparing Information Extraction Pattern Models. Proceedings of the Workshop “Information Extraction Beyond The Document” held in conjunction with 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (pp 12-19). Sydney, Australia
- Multilingual versus Monolingual WSD. Proceedings of the workshop "Making Sense of Sense" held in conjunction with the Eleventh Conference of the European Chapter of the Association for Computational Linguistics (pp 33-40). Trento, Italy
- Translation Context Sensitive WSD. Proceedings of the European Association for Machine Translation 11th Annual Conference (EAMT-2006) (pp 227-232). Oslo, Norway
- The need for application-dependent WSD strategies: A case study in NIT. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS, Vol. 3960 (pp 233-237)
- Sheffield's TREC 2006 Q&A Experiments. TREC, Vol. 500-272
- Mining Rules for Word Sense Disambiguation. III TIL - Workshop em Tecnologia da Informacao e da Linguagem Humana, XXV Congresso da SBC. Sao Leopoldo, Brasil
- An Automatic Approach to Creating a Sense Tagged Corpus for Word Sense Disambiguation in Machine Translation. Second Workshop Organised by the MEANING project (MEANING-2005) (pp 31-36). Trento, Italy
- Automatically Acquiring a Linguistically Motivated Genic Interaction Extraction System. Proceedings of the workshop “Learning Language in Logic (LLL 05)” held in conjunction with the 22nd International Conference on Machine Learning (ICML 05). Bonn, Germany
- Exploiting Parallel Texts to Produce a Multilingual Sense Tagged Corpus for Word Sense Disambiguation. Recent Advances in Natural Language Processing (pp 525-531)
- A Semantic Approach to IE Pattern Induction. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (pp 379-386). Ann Arbor, MI
- Learning Information Extraction Patterns Using WordNet. GWC 2006: THIRD INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS (pp 95-102)
- An Unsupervised WordNet-based Algorithm for Relation Extraction. Proceedings of the “Beyond Named Entity” workshop at the Fourth International Conference on Language Resources and Evaluation (LREC-04) (pp 37-42). Lisbon, Portugal
- EuroWordNet as a Resource for Cross-language Information Retrieval. Proceedings of the Fourth International Conference on Language Resources and Evaluation (pp 777-780). Lisbon, Portugal
- Information Extraction from Single and Multiple Sentences. Proceedings of the Twentieth International Conference on Computational Linguistics (COLING-04) (pp 875-881). Geneva, Switzerland
- Cross-language information retrieval using EuroWordNet and word sense disambiguation. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, Vol. 2997 (pp 327-337)
- Requirements for Information Extraction for Knowledge Management. Knowledge Management and Semantic Annotation Workshop at Second International Semantic Web Conference (ISWC-2003) (pp 89-94). Sanibel, FL.
- Information Extraction as a Semantic Web Technology: Requirements and Promises. Proceedings of the 14th European Conference on Machine Learning (ECML 2003) workshop “Adaptive Text Extraction and Mining”. Cavtat-Dubrovnik, Croatia
- Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-language Information Retrieval. GWC 2004: SECOND INTERNATIONAL WORDNET CONFERENCE, PROCEEDINGS (pp 97-105)
- Combining Disambiguation Techniques to Enrich an Ontology. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI-02) workshop “Machine Learning and Natural Language Processing for Ontology Engineering” (pp 43-50). Lyon, France
- The Reuters Corpus – from Yesterday’s News to Tomorrow’s Language Resources. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-02) (pp 827-832). Las Palmas, Canary Islands
- Augmenting Noun Taxonomies by Combining Lexical Similarity Metrics. Proceedings of the 19th International Conference on Computational Linguistics (COLING-02) (pp 953-959). Taipei, Taiwan
- Adding Thesaural Information to Noun Taxonomies (poster). Proceedings of the Second International Conference on Recent Advances in Natural Language Processing (RANLP-01) (pp 297-299). Tzigov Chark, Bulgaria
- Improving Named Entity Recognition using Annotated Corpora. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000) workshop “Information Extraction meets Corpus Linguistics” (pp 26-32). Athens, Greece
- Using Corpus-derived Name Lists for Named Entity Recognition. ANLP (pp 290-295)
- Experiments on Sentence Boundary Detection. ANLP (pp 84-89)
- Baseline IE-NE Experiments using the SPRACH/LASIE System. Proceedings of the DARPA HUB-4 Workshop (pp 47-50). Herndon, Virginia
- Combining weak knowledge sources for sense disambiguation. IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2 (pp 884-889)
- A corpus-based approach to deriving lexical mappings. NINTH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS (pp 285-286)
- An Empirical Approach to Lexical Tuning. First International Conference on Language Resources and Evaluation (LREC-98) Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications (pp 27-33). Granada, Spain
- Implementing a Sense Tagger within a General Architecture for Text Engineering. Proceedings of the New Methods in Language Processing Conference (NeMLaP-3) (pp 59-72). Sydney, Australia
- Word Sense Disambiguation using Optimised Combinations of Knowledge Sources. Proceedings of the 17th International Conference on Computational Linguistics and the 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL-98) (pp 1398-1402). Montreal, Canada
- Extracting Syntactic Relations using Heuristics. Proceedings of the European Summer School in Logic, Language and Information (ESSLLI-98) (pp 248-256). Saarbrücken, Germany
- Sense tagging and language engineering. ECAI 1998: 13TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS (pp 185-189)
- Sense Tagging: Semantic Tagging with a Lexicon. Fifth Conference on Applied Natural Language Processing (ANLP-1997) Workshop “Tagging Text with Lexical Semantics: Why, What and How?” (pp 47-51). Washington, D.C.
- Combining Independent Knowledge Sources for Word Sense Disambiguation. Proceedings of Recent Advances in Natural Language Processing (RANLP-97) (pp 1-7). Tzigov Chark, Bulgaria
- Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
- Combining counting processes and classification improves a stopping rule for technology assisted review. Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore, 6 December 2023 - 6 December 2023.
- On the Vulnerabilities of Text-to-SQL Models. Proceedings of the 34th IEEE International Symposium on Software Reliability Engineering
- HiDE: A Tool for Unrestricted Literature Based Discovery. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
- Matching Cultural Heritage items to Wikipedia. Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, Turkey
- Mapping WordNet synsets to Wikipedia articles. Proceedings of the 8th International Conference on Language Resources and Evaluation. Istanbul, Turkey
- Detecting Text Reuse with Modified and Weighted N-grams. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp 54-58). Montréal, Canada
- University_Of_Sheffield: Two Approaches to Semantic Text Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp 655-661). Montréal, Canada
- . Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), 11 July 2021 - 15 July 2021.
Reports
- On the Expressiveness of Information Extraction Patterns
- Evaluating the Single Sentence Assumption in Information Extraction
- Shallow Parsing using Heuristics
- Sense Tagging: Semantic Tagging with a Lexicon
Preprints
- , arXiv.
- Automatic Generation of Topic Labels, arXiv.
- Re-Ranking Words to Improve Interpretability of Automatically Generated Topics, arXiv.
- The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging?, arXiv.
- , arXiv.
- Revisiting the linearity in cross-lingual embedding mappings: from a perspective of word analogies.
- Grants
Current Grants
- Distinguishing Common and Proper Nouns, Industrial, 03/2011 - 12/2022, £31,847, as PI
Previous Grants
- Automatically mapping and assessing inequalities in public health research, NIHR, 04/2021 - 12/2021, £48,764, as PI
- Institute of Coding, HEFCE, 11/2017 - 03/2021, £957,000, as Co-PI
- Digital Sensitivity Review, Industrial, 11/2018 - 03/2019, £39,880, as PI
- Data Analytics, Royal Academy of Engineering, 09/2017 - 09/2020, £30,000, as PI
- Recommendation Algorithm, Industrial, 04/2017 - 10/2017, £60,600, as PI
- HiDE: A Tool for Unrestricted Literature Based Discovery, Government, 01/2016 - 06/2016, £66,584, as PI
- InPuT: Individual Profiling using Text Analysis, Government, 09/2014 - 09/2015, £10,746, as PI
- Information Processing and Sensemaking: An Exploratory Search System for Document Collections, Government, 09/2014 - 08/2015, £77,840, as PI
- Connected Marketplace, Industrial, 01/2014 - 08/2014, £5,000, as PI
- PUMP: Developing a Data Set of Textual and Visual Topic Labels, EPSRC, 09/2013 - 10/2013, £1,540, as PI
- , EPSRC, 06/2012 - 05/2015, £293,127, as PI
- , EC FP7, 01/2011 - 12/2013, £709,407, as PI
- Professional activities and memberships
- Area chair for EACL 2017 track “Document analysis including text categorisation, topic models, and retrieval”
- Winner of best paper award at CLEF 2004 (with Roland Roller)
- Keynote speaker at RANLP 2013
- Area chair for EMNLP 2013 track “semantics”
- Assistant Director of Advanced Computing Research Centre
- Co-ordinator of EU-funded project (PATHS)
- Member of ACL SIGLEX board (2010-2013 and 2013-2016)
- EPSRC Advanced Research Fellow (2006-2011)
- Member of editorial board of Computational Linguistics (2008-2010)
- Member of the Natural Language Processing research group