projects
SLALLM
Large language models are highly effective for widely spoken languages, but they struggle with languages that are underrepresented in their training corpora. Many of these languages lack sufficiently large corpora, so alternative training approaches are necessary.
In the Second Language Acquisition for LLMs project (SLALLM), we aim to develop a flexible training framework for LLMs that draws on insights from human language acquisition. Specifically, we make use of language-learning materials such as textbooks and online courses. We are designing an iterative framework in which the model is trained on one lesson at a time, combining synthetic data generation, LLM interplay, and training techniques such as reinforcement learning from human feedback (RLHF).
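The lesson-by-lesson loop could look something like the sketch below. All names here (`Lesson`, `generate_synthetic_examples`, `fine_tune`) are illustrative placeholders, not SLALLM's actual implementation; a real system would call an LLM for synthetic data and perform gradient-based updates.

```python
# Illustrative sketch of an iterative, lesson-based training curriculum.
# Every function and data structure is a hypothetical stand-in.
from dataclasses import dataclass, field

@dataclass
class Lesson:
    title: str
    # (source, target) pairs drawn from the textbook lesson
    examples: list = field(default_factory=list)

def generate_synthetic_examples(lesson, n=2):
    # Placeholder: a real system would prompt an LLM to paraphrase or
    # expand the lesson's examples into additional training pairs.
    return [(f"{src} (variant {i})", tgt)
            for src, tgt in lesson.examples for i in range(n)]

def fine_tune(model_state, examples):
    # Placeholder for a gradient-based update step (e.g. supervised
    # fine-tuning followed by an RLHF-style preference step).
    model_state["seen"].extend(examples)
    return model_state

def train_curriculum(lessons):
    # Train on one lesson at a time, augmenting each with synthetic data.
    model_state = {"seen": []}
    for lesson in lessons:
        data = lesson.examples + generate_synthetic_examples(lesson)
        model_state = fine_tune(model_state, data)
    return model_state

lessons = [Lesson("Greetings", [("hello", "bonjou")]),
           Lesson("Numbers", [("one", "youn")])]
state = train_curriculum(lessons)
print(len(state["seen"]))  # 6: each lesson yields 1 original + 2 synthetic pairs
```

The key design point the sketch captures is the curriculum structure: each lesson is a self-contained training stage, so new materials can be appended without retraining from scratch.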
Finite-State Extraction
🚜 Coming soon...
Automated Language Documentation
🚜 Coming soon...
Building Endangered Language Technology
🚜 Coming soon...