
Dynamic Transfer Learning for Low-Resource Neural Machine Translation

Updates

[July 2020] Updated the repo with scripts and notes on experimental settings


This repo implements the following papers and their associated features, based on OpenNMT-tf 1.15:

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

Adapting Multilingual Neural Machine Translation to Unseen Languages

Experimental Settings


Requirements

Data

Experiments use the TED Talks corpus from Qi et al., chosen for its low-resource nature: more than 50 languages paired with English, with ~5k to ~200k parallel examples per pair.

./scripts/get-data.sh

Preprocessing

Prepare data for one or more source-target pairs (if flag is specified, the target-language ID is appended to the source side):

./scripts/build-training-data.sh ['src1-en en-src1 src2-en en-src2'] [flag] [exp-id]
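For illustration, a concrete invocation for one language pair in both directions. The pair list, the literal flag argument, and the exp-id below are hypothetical choices, and the language-ID token shown in the comment follows the common <2tgt> multilingual NMT convention rather than a format verified against these scripts:

```bash
# Hypothetical example: build training data for Pt-En in both directions,
# appending the target-language ID to each source sentence.
./scripts/build-training-data.sh 'pt-en en-pt' flag pt-en-base

# With the flag set, a source line carries its target-language ID, e.g.:
#   <2en> olá mundo   ->   hello world
```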

Preprocess the data (cleaning, detokenization, and subword segmentation with SentencePiece):

./scripts/preprocess.sh [exp-id] [subword-size]
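For example, continuing with the hypothetical pt-en-base experiment from above (the 8k subword size is an illustrative choice, not a value prescribed by the repo):

```bash
# Hypothetical example: segment experiment pt-en-base with an 8k subword vocabulary.
./scripts/preprocess.sh pt-en-base 8000
```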

Pre-Training Parent Model

Train a parent model on a relatively high-resource pair (e.g., Portuguese-English / Pt-En).

./train.sh [exp-id] [gpu-device]
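A minimal sketch of a training run, assuming the hypothetical pt-en-base experiment prepared above and a single GPU:

```bash
# Hypothetical example: train the Pt-En parent model on GPU 0.
./train.sh pt-en-base 0
```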

Progressive Adaptation (ProgAdapt) to New Translation Directions


Steps for ProgAdapt: adapting the parent model Pt-En to the low-resource child pair Galician-English / Gl-En.

Data

./scripts/build-training-data.sh 'gl-en' [child-model_exp-id]

Data Preprocessing

./scripts/preprocess.sh [child-model_exp-id] [subword-size]

ProgAdapt Training

Training first customizes the parent model by taking into consideration the newly generated vocabulary of the child model (Gl-En):

./train-dynamic-tl.sh [parent-model_exp-id] [child-model_exp-id] [gpu-device]
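Putting the three steps together, an end-to-end sketch; the exp-ids, subword size, and GPU device are hypothetical placeholders:

```bash
# Hypothetical end-to-end ProgAdapt run: Pt-En parent -> Gl-En child.
./scripts/build-training-data.sh 'gl-en' gl-en-child   # build child training data
./scripts/preprocess.sh gl-en-child 8000               # clean + subword segmentation
./train-dynamic-tl.sh pt-en-base gl-en-child 0         # adapt the parent on GPU 0
```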

Progressive Growth (ProgGrow) with New Translation Directions


ProgGrow differs from ProgAdapt by retaining the Pt-En parent model's translation direction while learning the direction of the new low-resource pair Gl-En (child model).

Data

./scripts/build-training-data.sh 'pt-en gl-en' flag [child-model_exp-id]

Data Preprocessing

./scripts/preprocess.sh [child-model_exp-id] [subword-size]

ProgGrow Training

./train-dynamic-tl.sh [parent-model_exp-id] [child-model_exp-id] [gpu-device]
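As above, an end-to-end sketch with hypothetical placeholders; note the literal flag argument, which appends the target-language ID so that the two translation directions can share one model:

```bash
# Hypothetical end-to-end ProgGrow run: keep Pt-En while adding Gl-En.
./scripts/build-training-data.sh 'pt-en gl-en' flag pt-gl-en-child
./scripts/preprocess.sh pt-gl-en-child 8000
./train-dynamic-tl.sh pt-en-base pt-gl-en-child 0
```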

More Options


At transfer-learning time you can optionally:

  • Load specific components of the parent model. See load_weights in config_adapt.yml for more options:

['encoder', 'decoder', 'shared_embeddings', 'src_embs', 'tgt_embs', 'optim', 'projection'].

  • Freeze sub-networks (i.e., selectively optimize only the encoder or the decoder). See freeze in config_adapt.yml for options; an illustrative sketch of load_weights and freeze follows this list.

  • In addition to encoder- and/or decoder-only customization, you can pre-train a parent model with a shared encoder-decoder vocabulary and then customize it for the child model. See the --shared_vocab and --new_shared_vocab options in ./train-dynamic-tl.sh.
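A minimal sketch of how these settings might look in config_adapt.yml. The keys load_weights and freeze are named in the notes above, and the component names come from the list of valid values, but the exact layout shown here is an assumption rather than the file's verified schema:

```yaml
# Hypothetical config_adapt.yml fragment (layout assumed, not verified):
load_weights:   # parent components used to initialize the child model
  - encoder
  - decoder
  - src_embs
freeze:         # sub-networks excluded from optimization during adaptation
  - decoder
```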

Note: to replicate the experiments reported in our work, please see further details in the experimental section of each paper.

References


@article{lakew2018transfer,
  title={Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary},
  author={Lakew, Surafel M. and Erofeeva, Aliia and Negri, Matteo and Federico, Marcello and Turchi, Marco},
  journal={arXiv preprint arXiv:1811.01137},
  year={2018}
}

@article{lakew2019adapting,
  title={Adapting Multilingual Neural Machine Translation to Unseen Languages},
  author={Lakew, Surafel M. and Karakanta, Alina and Federico, Marcello and Negri, Matteo and Turchi, Marco},
  journal={arXiv preprint arXiv:1910.13998},
  year={2019}
}