publications
2024
GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
November 2024.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
Can we teach language models to gloss endangered languages?
November 2024.
Findings of the Association for Computational Linguistics: EMNLP 2024
PyFoma: a Python finite-state compiler module
August 2024.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
BELT: Building Endangered Language Technology
August 2024.
Proceedings of the Sixth Workshop on Teaching NLP @ ACL 2024
🏆 Best Paper
Resisting the Lure of the Skyline: Grounding Practices in Active Learning for Morphological Inflection
August 2024.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Decomposing Fusional Morphemes with Vector Embeddings
June 2024.
Proceedings of the 21st SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology @ NAACL 2024
On the Robustness of Neural Models for Full Sentence Transformation
June 2024.
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024) @ NAACL 2024
2023
Robust Generalization Strategies for Morpheme Glossing in an Endangered Language Documentation Context
December 2023.
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP
Findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing
July 2023.
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Ginn-Khamov at SemEval-2023 Task 6, Subtask B: Legal Named Entities Extraction for Heterogenous Documents
July 2023.
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)