List of Papers and Blog posts
Controlling conditional language models without catastrophic forgetting
Tomasz Korbak, Hady Elsahar, German Kruszewski, Marc Dymetman
International Conference on Machine Learning, ICML 2022 [paper] [slides] [code]
In this work we target the important question of how to adapt pre-trained generative models to meet human requirements without destroying their general capabilities ("catastrophic forgetting"). Recent work has proposed solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Despite its effectiveness, this approach is limited to unconditional distributions. In this paper, we extend DPG to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on four different control objectives across three tasks (translation, summarization and code generation) and two pretrained models (T5 and GPT-Neo). Our results show that fine-tuning using CDPG robustly moves these pretrained models closer towards meeting control objectives and, in contrast with baseline approaches, does not result in catastrophic forgetting.
Controlling Conditional Language Models with Distributional Policy Gradients
CtrlGen workshop, NeurIPS 2021 [paper]
Tomasz Korbak, Hady Elsahar, German Kruszewski, Marc Dymetman
Machine learning is shifting towards general-purpose pretrained generative models. However, due to their generic training methodology, these models often fail to meet some downstream requirements (e.g. avoiding hallucination in abstractive summarization or producing the correct format in automatic code generation). This raises the important question of how to adapt pre-trained generative models to a new task without destroying their capabilities. Recent work has suggested solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on three different control objectives across two tasks: summarization with T5 and code generation with GPT-Neo.
Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs
CtrlGen workshop, NeurIPS 2021 [paper]
Bryan Eikema, Germán Kruszewski, Hady Elsahar, Marc Dymetman
We propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality, while providing explicit convergence bounds and diagnostics. QRS capitalizes on the availability of high-quality global proposal distributions obtained from deep learning models. We demonstrate the effectiveness of QRS sampling for discrete EBMs over text for the tasks of controlled text generation with distributional constraints and paraphrase generation. We show that we can sample from such EBMs with arbitrary precision at the cost of sampling efficiency.
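The quality/efficiency trade-off can be illustrated with a minimal sketch. It assumes the acceptance rule min(1, P(x) / (beta * q(x))), where P is the unnormalized EBM score, q is the proposal, and beta is the trade-off parameter. The function names and the three-token toy distribution below are hypothetical and are not taken from the paper's code.

```python
import math
import random

def quasi_rejection_sample(sample_proposal, log_p, log_q, beta, n_draws, seed=0):
    """Draw n_draws proposals and keep each with prob. min(1, P(x) / (beta * q(x))).

    Assumed acceptance rule (see lead-in). A larger beta gives samples closer to
    the target but accepts fewer proposals; a smaller beta does the opposite.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        x = sample_proposal(rng)
        log_ratio = log_p(x) - log_q(x) - math.log(beta)
        accept_prob = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        if rng.random() < accept_prob:
            accepted.append(x)
    return accepted

# Hypothetical toy problem: uniform proposal over three tokens,
# unnormalized target scores P(a)=4, P(b)=1, P(c)=0.
TOKENS = ["a", "b", "c"]
SCORES = {"a": 4.0, "b": 1.0, "c": 0.0}

def sample_proposal(rng):
    return rng.choice(TOKENS)

def log_q(x):
    return math.log(1.0 / len(TOKENS))

def log_p(x):
    return math.log(SCORES[x]) if SCORES[x] > 0 else float("-inf")

# beta = 12 upper-bounds P(x)/q(x) here, so this reduces to exact rejection sampling.
exact = quasi_rejection_sample(sample_proposal, log_p, log_q, beta=12.0, n_draws=5000)
# beta = 1 accepts more proposals but distorts the sampled distribution.
loose = quasi_rejection_sample(sample_proposal, log_p, log_q, beta=1.0, n_draws=5000)
```

In this toy setting, the samples drawn with beta = 12 follow the normalized target (roughly 80% "a"), while beta = 1 keeps more of the proposals at the cost of a flatter distribution.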
Energy-Based Models for Code Generation under Compilability Constraints
NLP4Prog workshop at ACL 2021 [paper]
Tomasz Korbak, Hady Elsahar, Marc Dymetman, German Kruszewski
In this work, we define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences in a programming language. Our proposed approach is able to improve compilability rates without sacrificing the diversity and complexity of the generated samples.
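As a rough illustration of such a constraint, the sketch below scores a sample as the product of a language-model probability and a binary compilability filter. It assumes Python source and uses the built-in `compile` as a stand-in checker. The `lm_prob` argument is a hypothetical placeholder for the pre-trained model's probability, and none of these names come from the paper's code.

```python
def is_compilable(source: str) -> bool:
    """Return True if `source` parses and compiles as a Python module."""
    try:
        compile(source, "<sample>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False

def ebm_score(source: str, lm_prob: float) -> float:
    """Unnormalized EBM score: model probability times a 0/1 compilability filter.

    `lm_prob` is assumed to be supplied by the pre-trained generative model.
    """
    return lm_prob * (1.0 if is_compilable(source) else 0.0)

good = "def add(x, y):\n    return x + y\n"
bad = "def add(x, y:\n    return x + y\n"
```

Under this scoring, `bad` receives a score of zero regardless of how likely the model finds it, while `good` keeps the model's original probability.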
A Distributional Approach To Controlled Text Generation
ICLR 2021 (Oral presentation, top 2.1%)
Muhammad Khalifa*, Hady Elsahar*, Marc Dymetman*
* first author equal contribution
[Paper] [code] [Blogpost] [Twitter Thread]
We propose a novel approach to Controlled Text Generation, relying on Constraints over Distributions, Information Geometry, and Sampling from Energy-Based Models.
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
(∀ ∗ et al.) EMNLP 2020 Findings
[Paper] [Code] [summary]
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. ‘Low-resourced’-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages.
Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released at https://github.com/masakhane-io/masakhane-mt.