Dr Carolina Scarton

BSc, MSc, PhD

School of Computer Science

Senior Lecturer in Natural Language Processing

Outreach, Open Days and Headstart Officer

Member of the Natural Language Processing research group

c.scarton@sheffield.ac.uk
+44 114 222 1892

Full contact details

Dr Carolina Scarton
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
Profile

Carolina Scarton is a Senior Lecturer in Natural Language Processing at the Department of Computer Science, University of Sheffield, UK. She is a member of the Natural Language Processing group and part of the .

Previously, she worked as an Academic Fellow (from September 2019 to November 2021) and as a Research Associate for the  (from March 2019 to August 2019) and SIMPATICO (from July 2016 to February 2019) European projects.

Qualifications

In 2017, she was awarded a PhD degree in Computer Science from the University of Sheffield, under the supervision of Professor Lucia Specia. Her PhD was funded by the  project (a Marie Curie ITN network).

She also has an MSc and a BSc degree from the University of São Paulo, Brazil (awarded in 2013).

Her MSc supervisor was Dr. Sandra Aluísio and she was a member of the . Since 2018, she has been the .

Research interests

Dr Scarton's research area is Natural Language Processing (NLP). She is particularly interested in text adaptation, machine translation, online misinformation detection and verification, evaluation of NLP task outputs, NLP applied to healthcare and robotics, and dialog systems.

Publications

Books

  • Pinheiro V, Gamallo P, Amaro R, Scarton C, Batista F, Silva D, Magro C & Pinto H (2022) Preface.
  • Specia L, Scarton C & Paetzold GH (2018) . Morgan & Claypool Publishers LLC.
  • Specia L, Scarton C & Paetzold GH (2018) .

Journal articles

  • Berto MVV, Freitas BL, Scarton C, Machado-Neto JA & Almeida TA (2024) . Expert Systems with Applications, 250, 123566-123566.
  • He W, Idiart M, Scarton C & Villavicencio A (2024) Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss. CoRR, abs/2406.15175.
  • Gow-Smith E, Phelps D, Madabushi HT, Scarton C & Villavicencio A (2024) Word Boundary Information Isn't Useful for Encoder Language Models. CoRR, abs/2401.07923.
  • Leite JA, Razuvayevskaya O, Bontcheva K & Scarton C (2024) EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles. CoRR, abs/2406.12614.
  • Zhang Z, Goldsack T, Scarton C & Lin C (2024) ATLAS: Improving Lay Summarisation with Attribute-based Control. CoRR, abs/2406.05625.
  • Vasilakes J, Zhao Z, Vykopal I, Gregor M, Hyben M & Scarton C (2024) ExU: AI Models for Examining Multilingual Disinformation Narratives and Understanding their Spread. CoRR, abs/2406.15443.
  • Vincent ST, Prescott C, Bayliss C, Oakley C & Scarton C (2024) A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling. CoRR, abs/2407.00108.
  • Srba I, Razuvayevskaya O, Leite JA, Móro R, Schlicht IB, Tonelli S, García FM, Lottmann SB, Teyssou D, Porcellini V, Scarton C et al (2024) A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models. CoRR, abs/2410.21360.
  • Li Y, Zhao Z & Scarton C (2024) Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models. CoRR, abs/2410.19195.
  • Mu Y, Jin M, Grimshaw C, Scarton C, Bontcheva K & Song X (2023) VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter. CoRR, abs/2301.06660.
  • Mu Y, Wu BP, Thorne W, Robinson A, Aletras N, Scarton C, Bontcheva K & Song X (2023) Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. CoRR, abs/2305.14310.
  • Singh I, Scarton C, Song X & Bontcheva K (2023) Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning. CoRR, abs/2308.05680.
  • Leite JA, Razuvayevskaya O, Bontcheva K & Scarton C (2023) Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision. CoRR, abs/2309.07601.
  • Goldsack T, Zhang Z, Tang C, Scarton C & Lin C (2023) . Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
  • Vincent ST, Flynn R & Scarton C (2023) MTCue: Learning Zero-Shot Control of Extra-Textual Attributes by Leveraging Unstructured Context in Neural Machine Translation. CoRR, abs/2305.15904.
  • Razuvayevskaya O, Wu B, Leite JA, Heppell F, Srba I, Scarton C, Bontcheva K & Song X (2023) Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification. CoRR, abs/2308.07282.
  • Li Y & Scarton C (2023) Evaluating the Role of Target Arguments in Rumour Stance Classification. CoRR, abs/2303.12665.
  • Vincent ST, Sumner R, Dowek A, Blundell C, Preston E, Bayliss C, Oakley C & Scarton C (2023) Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations. CoRR, abs/2303.16618.
  • Wu B, Li Y, Mu Y, Scarton C, Bontcheva K & Song X (2023) . Findings of the Association for Computational Linguistics: EMNLP 2023.
  • Alva-Manchego F, Scarton C & Specia L (2021) . Computational Linguistics, 1-29.
  • Mejova Y, Petrocchi M & Scarton C (2021) . Online Social Networks and Media, 23.
  • Alva-Manchego F, Scarton C & Specia L (2020) . Computational Linguistics, 135-187.
  • Scarton C (2020) Horacio Saggion, Automatic Text Simplification. Synthesis lectures on human language technologies, April 2017. Nat. Lang. Eng., 26, 489-492.
  • Toledo CM, Cunha A, Scarton C & Aluísio S (2014) . Dementia & Neuropsychologia, 8(3), 227-235.
  • He W, Vieira TK, Garcia M, Scarton C, Idiart M & Villavicencio A () . Computational Linguistics, 1-48.
  • He W, Vieira TK, Gonzalez MG, Scarton C, Idiart M & Villavicencio A () Finding Idiomaticity in Word Representations. Computational Linguistics.
  • Singh I, Scarton C & Bontcheva K () . EPJ Data Science, 12(1).
  • Leal SE, Duran MS, Scarton CE, Hartmann NS & Aluísio SM () . Language Resources and Evaluation.
  • Mu Y, Jin M, Grimshaw C, Scarton C, Bontcheva K & Song X () . Proceedings of the International AAAI Conference on Web and Social Media, 17, 1052-1062.
  • Leite JA, Scarton C & Silva DF () . Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Processings.
  • Li Y, Scarton C, Song X & Bontcheva K () . Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Processings.
  • Scarton C & Specia L () . Discours(16).
  • Singh I, Scarton C & Bontcheva K () Multistage BiCross Encoder: Team GATE Entry for MLIA Multilingual Semantic Search Task 2.
  • Leite JA, Silva DF, Bontcheva K & Scarton C () Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis.
  • Scarton C, Silva DF & Bontcheva K () Measuring What Counts: The case of Rumour Stance Classification.
  • Singh I, Bontcheva K & Scarton C () The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis.
  • Jiang Y, Song X, Scarton C, Aker A & Bontcheva K () Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic.
  • Leal SE, Duran MS, Scarton CE, Hartmann NS & Aluísio SM () NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese.

Book reviews

  • Scarton C (2019) . Natural Language Engineering.

Conference proceedings papers

  • Vincent S, Dowek A, Sumner R, Prescott C, Preston E, Bayliss C, Oakley C & Scarton C (2024) Reference-less Analysis of Context Specificity in Translation with Personalised Language Models. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp 13769-13784)
  • Mu Y, Wu BP, Thorne W, Robinson A, Aletras N, Scarton C, Bontcheva K & Song X (2024) Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. LREC/COLING (pp 12074-12086)
  • Li Y & Scarton C (2024) Can We Identify Stance Without Target Arguments? A Study for Rumour Stance Classification. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp 2844-2851)
  • He W, Idiart M, Scarton C & Villavicencio A (2024) Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss. ACL (Findings) (pp 12473-12485)
  • Goldsack T, Scarton C, Shardlow M & Lin C (2024) Overview of the BioLaySumm 2024 Shared Task on the Lay Summarization of Biomedical Research Articles. BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks (pp 122-131)
  • Zhang Z, Goldsack T, Scarton C & Lin C (2024) ATLAS: Improving Lay Summarisation with Attribute-based Control. ACL (Short Papers) (pp 337-345)
  • Goldsack T, Scarton C, Shardlow M & Lin C (2024) Overview of the BioLaySumm 2024 Shared Task on the Lay Summarization of Biomedical Research Articles. BioNLP@ACL (pp 122-131)
  • Vasilakes J, Zhao Z, Gregor M, Vykopal I, Hyben M & Scarton C (2024) ExU: AI Models for Examining Multilingual Disinformation Narratives and Understanding their Spread. EAMT (2) (pp 39-40)
  • (2024) Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2), EAMT 2024, Sheffield, UK, June 24-27, 2024. EAMT (2)
  • Spillane B, Scarton C, Móro R, Ivanov P, Tagarev A, Simko J, Farha IA, Munnelly G, Uhlárik F & Heppell F (2024) Multilinguality in the VIGILANT project. EAMT (2) (pp 41-42)
  • Vincent ST, Prescott C, Bayliss C, Oakley C & Scarton C (2024) A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling. EAMT (1) (pp 561-572)
  • (2024) Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), EAMT 2024, Sheffield, UK, June 24-27, 2024. EAMT (1)
  • Leite JA, Razuvayevskaya O, Bontcheva K & Scarton C (2024) EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles. CIKM (pp 5380-5384)
  • Goldsack T, Zhang Z, Lin C & Scarton C (2023) (pp 361-376)
  • Wu B, Razuvayevskaya O, Heppell F, Leite JA, Scarton C, Bontcheva K & Song X (2023) . Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), July 2023 - July 2023.
  • (2023) Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023, Tampere, Finland, 12-15 June 2023. EAMT
  • Vincent S, Flynn R & Scarton C (2023) . Findings of the Association for Computational Linguistics: ACL 2023, July 2023 - July 2023.
  • Goldsack T, Luo Z, Xie Q, Scarton C, Shardlow M, Ananiadou S & Lin C (2023) . The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, July 2023 - July 2023.
  • Goldsack T, Luo Z, Xie Q, Scarton C, Shardlow M, Ananiadou S & Lin C (2023) Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp 468-477)
  • Heppell F, Bontcheva K & Scarton C (2023) . Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 2023 - December 2023.
  • Mu Y, Jiang Y, Heppell F, Singh I, Scarton C, Bontcheva K & Song X (2023) A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation. CoRR, Vol. abs/2304.04811
  • Leite JA, Scarton C & Silva DF (2023) Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks. RANLP (pp 631-640)
  • Li Y, Scarton C, Song X & Bontcheva K (2023) Classifying COVID-19 Vaccine Narratives. RANLP (pp 648-657)
  • Goldsack T, Zhang Z, Tang C, Scarton C & Lin C (2023) Enhancing Biomedical Lay Summarisation with External Knowledge Graphs. EMNLP (pp 8016-8032)
  • Wu B, Li Y, Mu Y, Scarton C, Bontcheva K & Song X (2023) Don't waste a single annotation: improving single-label classifiers through soft labels. EMNLP (Findings) (pp 5347-5355)
  • Singh I, Bontcheva K, Song X & Scarton C (2022) (pp 128-143)
  • Phelps D, Fan X-R, Gow-Smith E, Madabushi HT, Scarton C & Villavicencio A (2022) Sample Efficient Approaches for Idiomaticity Detection
  • Madabushi HT, Gow-Smith E, Garcia M, Scarton C, Idiart M & Villavicencio A (2022) SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding
  • Vincent ST, Barrault L & Scarton C (2022) Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation. EAMT 2022 - Proceedings of the 23rd Annual Conference of the European Association for Machine Translation (pp 121-130)
  • Vincent S, Barrault L & Scarton C (2022) . Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022) (pp 341-350)
  • Tayyar Madabushi H, Gow-Smith E, Garcia M, Scarton C, Idiart M & Villavicencio A (2022) . Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), July 2022 - July 2022.
  • Singh I, Li Y, Thong M & Scarton C (2022) . Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), July 2022 - July 2022.
  • (2022) Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, EAMT 2022, Ghent, Belgium, June 1-3, 2022. EAMT
  • Madabushi HT, Gow-Smith E, García M, Scarton C, Idiart M & Villavicencio A (2022) SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. SemEval@NAACL (pp 107-121)
  • Gow-Smith E, Tayyar Madabushi H, Scarton C & Villavicencio A (2022) . Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, December 2022 - December 2022.
  • Goldsack T, Zhang Z, Lin C & Scarton C (2022) . Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, December 2022 - December 2022.
  • Singh I, Bontcheva K, Song X & Scarton C (2022) Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation. SocInfo, Vol. 13618 (pp 128-143)
  • (2022) Computational Processing of the Portuguese Language - 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings. PROPOR, Vol. 13208
  • Singh I, Li Y, Thong M & Scarton C (2022) GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity. SemEval@NAACL (pp 1121-1128)
  • Garcia M, Kramer Vieira T, Scarton C, Idiart M & Villavicencio A (2021) . Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), August 2021 - August 2021.
  • Tayyar Madabushi H, Gow-Smith E, Scarton C & Villavicencio A (2021) . Findings of the Association for Computational Linguistics: EMNLP 2021. Punta Cana, Dominican Republic, 7 November 2021 - 11 November 2021.
  • Madabushi HT, Gow-Smith E, Scarton C & Villavicencio A (2021) AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models. EMNLP (Findings) (pp 3464-3477)
  • Scarton C & Li Y (2021) Cross-lingual Rumour Stance Classification: a First Study with BERT and Machine Translation. TTO (pp 50-59)
  • Santos RLS, Wick-Pedro G, Leal S, Vale OA, Pardo TAS, Bontcheva K & Scarton C (2020) Measuring the Impact of Readability Features in Fake News Detection. LREC (pp 1404-1413)
  • Alva-Manchego FE, Martin L, Bordes A, Scarton C, Sagot B & Specia L (2020) ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. ACL (pp 4668-4679)
  • Wick-Pedro G, Santos RLS, Vale OA, Pardo TAS, Bontcheva K & Scarton C (2020) (pp 313-320)
  • Scarton C, Madhyastha P & Specia L (2020) Deciding When, How and for Whom to Simplify. ECAI, Vol. 325 (pp 2172-2179)
  • Leite JA, Silva DF, Bontcheva K & Scarton C (2020) Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis. AACL/IJCNLP (pp 914-924)
  • Scarton C, Silva DF & Bontcheva K (2020) Measuring What Counts: The Case of Rumour Stance Classification. AACL/IJCNLP (pp 925-932)
  • Alva-Manchego F, Martin L, Bordes A, Scarton C, Sagot B & Specia L (2020) . Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020 - July 2020.
  • Alva-Manchego F, Martin L, Scarton C & Specia L (2019) . Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations (pp 49-54). Hong Kong, China, 3 November 2019 - 7 November 2019.
  • Alva-Manchego FE, Scarton C & Specia L (2019) Cross-Sentence Transformations in Text Simplification. WNLP@ACL (pp 181-184)
  • Alva-Manchego F, Martin L, Scarton C & Specia L (2019) EASSE: Easier Automatic Sentence Simplification Evaluation. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019): Proceedings of System Demonstrations (pp 49-54)
  • Scarton C, Forcada ML, Esplà-Gomis M & Specia L (2019) Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality. IWSLT
  • Scarton C, Paetzold GH & Specia L (2019) Text simplification from professionally produced corpora. LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp 3504-3510)
  • Scarton C, Henrique Paetzold G & Specia L (2019) Simpa: A sentence-level simplification corpus for the public administration domain. LREC 2018 - 11th International Conference on Language Resources and Evaluation (pp 4333-4338)
  • Forcada ML, Scarton C, Specia L, Haddow B & Birch A (2018) . Proceedings of the Third Conference on Machine Translation, Vol. 1 (pp 192-203), 31 October 2018 - 1 November 2018.
  • Forcada ML, Scarton C, Specia L, Haddow B & Birch A (2018) . Proceedings of the Third Conference on Machine Translation: Research Papers, October 2018 - October 2018.
  • Lala C, Madhyastha PS, Scarton C & Specia L (2018) . Proceedings of the Third Conference on Machine Translation: Shared Task Papers, October 2018 - October 2018.
  • Ive J, Scarton C, Blain F & Specia L (2018) . Proceedings of the Third Conference on Machine Translation: Shared Task Papers, October 2018 - October 2018.
  • Scarton C & Specia L (2018) . Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2018 - July 2018.
  • Lala C, Madhyastha P, Scarton C & Specia L (2018) . WMT 2018 - 3rd Conference on Machine Translation, Proceedings of the Conference, Vol. 2 (pp 624-631)
  • Ive J, Scarton C, Blain F & Specia L (2018) . WMT 2018 - 3rd Conference on Machine Translation, Proceedings of the Conference, Vol. 2 (pp 794-800)
  • Blain F, Scarton C & Specia L (2017) . Proceedings of the Second Conference on Machine Translation, September 2017 - September 2017.
  • Alva-Manchego FE, Bingel J, Paetzold G, Scarton C & Specia L (2017) Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. IJCNLP(1) (pp 295-305)
  • Scarton C, Aprosio AP, Tonelli S, Martín-Wanton T & Specia L (2017) MUSST: A Multilingual Syntactic Simplification Tool. IJCNLP (System Demonstrations) (pp 25-28)
  • Graham Y, Ma Q, Baldwin T, Liu Q, Parra C & Scarton C (2017) . Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, April 2017 - April 2017.
  • Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M et al (2016) . Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016 - August 2016.
  • Scarton C, Paetzold GH & Specia L (2016) Quality estimation for language output applications. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Tutorial Abstracts (pp 14-17)
  • Scarton C & Specia L (2016) A reading comprehension corpus for machine translation evaluation. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp 3652-3658)
  • Scarton C, Beck D, Shah K, Sim Smith K & Specia L (2016) . Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, August 2016 - August 2016.
  • Tan L, Scarton C, Specia L & van Genabith J (2016) . Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), June 2016 - June 2016.
  • Aluísio S, Cunha A & Scarton C (2016) (pp 109-114)
  • Scarton C, Beck D, Shah K, Smith KS & Specia L (2016) Word embeddings and discourse information for Machine Translation Quality Estimation. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Vol. 2 (pp 831-837)
  • Specia L, Paetzold G & Scarton C (2015) . Proceedings of ACL-IJCNLP 2015 System Demonstrations, July 2015 - July 2015.
  • Scarton C, Zampieri M, Vela M, van Genabith J & Specia L (2015) Searching for context: A study on document-level labels for translation quality estimation. EAMT 2015 - Proceedings of the 18th Annual Conference of the European Association for Machine Translation (pp 121-128)
  • Scarton C, Tan L & Specia L (2015) . Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015 - September 2015.
  • Tan L, Scarton C, Specia L & van Genabith J (2015) . Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), June 2015 - June 2015.
  • Scarton C (2015) . Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, June 2015 - June 2015.
  • Scarton C, Tan L & Specia L (2015) USHEF and USAAR-USHEF participation in the WMT15 quality estimation shared task. 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings (pp 336-341)
  • Bojar O, Chatterjee R, Federmann C, Haddow B, Hokamp C, Huck M, Logacheva V, Pecina P, Koehn P, Monz C, Negri M et al (2015) Preface. 10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings (pp III)
  • Bojar O, Chatterjee R, Federmann C, Haddow B, Huck M, Hokamp C, Koehn P, Logacheva V, Monz C, Negri M, Post M et al (2015) . Proceedings of the Tenth Workshop on Statistical Machine Translation, September 2015 - September 2015.
  • Scarton C, Sanches Duran M & Aluísio SM (2014) (pp 149-160)
  • Scarton C & Specia L (2014) . Proceedings of the Ninth Workshop on Statistical Machine Translation, June 2014 - June 2014.
  • Scarton C, Sun L, Kipper-Schuler K, Duran MS, Palmer M & Korhonen A (2014) (pp 25-39)
  • Scarton C & Specia L (2014) Document-level translation quality estimation: Exploring discourse and pseudo-references. Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014 (pp 101-108)
  • Scarton C, Sun L, Kipper-Schuler K, Duran MS, Palmer M & Korhonen A (2014) . Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8403 LNCS(PART 1) (pp 25-39)
  • Duran MS, Scarton CE, Aluísio SM & Ramisch C (2013) Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic se in Portuguese. Proceedings of the 9th Workshop on Multiword Expressions, MWE 2013 - in conjunction with the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 (pp 93-100)
  • Scarton C, Gasperin C & Aluisio S (2010) . Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 6433 LNAI (pp 306-315)
  • Scarton C, Oliveira MD, Junior AC, Gasperin C & Aluísio SM (2010) SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments. NAACL (Demos) (pp 41-44)
  • Scarton CE, De Almeida DM & Aluísio SM (2009) . STIL 2009 - 2009 7th Brazilian Symposium in Information and Human Language Technology (pp 53-62)
  • Jiang Y, Song X, Scarton C, Singh I, Aker A & Bontcheva K () . Proceedings of the Conference Recent Advances in Natural Language Processing - Large Language Models for Natural Language Processings
  • Garcia M, Vieira TK, Scarton C, Idiart M & Villavicencio A () Probing for idiomaticity in vector space models. Proceedings of the 16th conference of the European Chapter of the Association for Computational Linguistics
  • Scarton C, Forcada ML, Esplà-Gomis M & Specia L () Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality
  • Villavicencio A, Garcia M, Idiart M, Kramer Vieira T & Scarton C () Assessing Idiomaticity Representations in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels. Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

Theses / Dissertations

  • Scarton CE .

Preprints

  • He W, Vieira TK, Garcia M, Scarton C, Idiart M & Villavicencio A (2024) Investigating Idiomaticity in Word Representations, arXiv. RIS download Bibtex download
  • Srba I, Razuvayevskaya O, Leite JA, Moro R, Schlicht IB, Tonelli S, García FM, Lottmann SB, Teyssou D, Porcellini V , Scarton C et al (2024) A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models, arXiv. RIS download Bibtex download
  • Li Y, Zhao Z & Scarton C (2024) Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models, arXiv.
  • Goldsack T, Scarton C, Shardlow M & Lin C (2024) , arXiv.
  • Vincent S, Prescott C, Bayliss C, Oakley C & Scarton C (2024) A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling, arXiv.
  • He W, Idiart M, Scarton C & Villavicencio A (2024) , arXiv.
  • Leite JA, Razuvayevskaya O, Bontcheva K & Scarton C (2024) , arXiv.
  • Zhang Z, Goldsack T, Scarton C & Lin C (2024) , arXiv.
  • Vasilakes J, Zhao Z, Vykopal I, Gregor M, Hyben M & Scarton C (2024) , arXiv.
  • Gow-Smith E, Phelps D, Madabushi HT, Scarton C & Villavicencio A (2024) Word Boundary Information Isn't Useful for Encoder Language Models, arXiv.
  • Wu B, Li Y, Mu Y, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Goldsack T, Zhang Z, Tang C, Scarton C & Lin C (2023) , arXiv.
  • Heppell F, Bontcheva K & Scarton C (2023) , arXiv.
  • Goldsack T, Luo Z, Xie Q, Scarton C, Shardlow M, Ananiadou S & Lin C (2023) , arXiv.
  • Leite JA, Razuvayevskaya O, Bontcheva K & Scarton C (2023) , arXiv.
  • Razuvayevskaya O, Wu B, Leite JA, Heppell F, Srba I, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Singh I, Scarton C, Song X & Bontcheva K (2023) , arXiv.
  • Leite JA, Scarton C & Silva DF (2023) , arXiv.
  • Vincent S, Flynn R & Scarton C (2023) , arXiv.
  • Mu Y, Wu BP, Thorne W, Robinson A, Aletras N, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Mu Y, Jiang Y, Heppell F, Singh I, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Vincent S, Sumner R, Dowek A, Blundell C, Preston E, Bayliss C, Oakley C & Scarton C (2023) , arXiv.
  • Li Y & Scarton C (2023) , arXiv.
  • Wu B, Razuvayevskaya O, Heppell F, Leite JA, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Mu Y, Jin M, Grimshaw C, Scarton C, Bontcheva K & Song X (2023) , arXiv.
  • Singh I, Bontcheva K, Song X & Scarton C (2022) , arXiv.
  • Goldsack T, Zhang Z, Lin C & Scarton C (2022) Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature.
  • Li Y, Scarton C, Song X & Bontcheva K (2022) Classifying COVID-19 vaccine narratives, arXiv.
  • Singh I, Li Y, Thong M & Scarton C (2022) GateNLP-UShef at SemEval-2022 Task 8: Entity-Enriched Siamese Transformer for Multilingual News Article Similarity, arXiv.
  • Vincent ST, Barrault L & Scarton C (2022) Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022, arXiv.
  • Vincent ST, Barrault L & Scarton C (2022) Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation, arXiv.
  • Gow-Smith E, Madabushi HT, Scarton C & Villavicencio A (2022) Improving Tokenisation by Alternative Treatment of Spaces.
  • Madabushi HT, Gow-Smith E, Scarton C & Villavicencio A (2021) AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models, arXiv.
  • Singh I, Bontcheva K & Scarton C (2021) The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis, arXiv.
  • Jiang Y, Song X, Scarton C, Aker A & Bontcheva K (2021) Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic, arXiv.
  • Singh I, Scarton C & Bontcheva K (2021) Multistage BiCross encoder for multilingual access to COVID-19 health information, arXiv.
  • Leite JA, Silva DF, Bontcheva K & Scarton C (2020) Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis, arXiv.
  • Scarton C, Silva DF & Bontcheva K (2020) Measuring What Counts: The case of Rumour Stance Classification, arXiv.
  • Alva-Manchego F, Martin L, Bordes A, Scarton C, Sagot B & Specia L (2020) ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations, arXiv.
  • Scarton C, Forcada ML, Esplà-Gomis M & Specia L (2019) Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality, arXiv.
  • Alva-Manchego F, Martin L, Scarton C & Specia L (2019) EASSE: Easier Automatic Sentence Simplification Evaluation, arXiv.
  • Forcada ML, Scarton C, Specia L, Haddow B & Birch A (2018) Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting, arXiv.
  • Jiang Y, Song X, Scarton C, Singh I, Aker A & Bontcheva K () , Research Square Platform LLC.

Grants

Current Grants

  • ExU: AI Models for Examining Multilingual Disinformation Narratives and Understanding their Spread, European Media and Information Fund, 11/2023 - 04/2025, €399,926, as PI

  • VIGILANT: , Horizon Europe, 11/2022 - 10/2025, £476,955, as PI

  • vera.ai: , Horizon Europe, 09/2022 - 08/2025, £776,703, as Co-PI

  • , EPSRC, 12/2020 - 11/2024, £446,163, as Co-PI

Previous Grants

  • Modelling the link between working memory and language deficits in schizophrenia, Royal Society, 12/2020 - 11/2022, £74,000, as Co-PI