
We leverage 14 datasets as OOD test data and evaluate 21 popularly used models on 8 NLU tasks. Our findings show that OOD accuracy in NLP tasks deserves more attention: a significant performance decay relative to ID accuracy is observed in all settings.


YangLinyi/GLUE-X


GLUE-X

We collect 14 publicly available datasets as out-of-distribution (OOD) test data and evaluate popularly used models on 8 classic NLP tasks. Our findings show that OOD accuracy in NLP tasks deserves more attention: a significant performance decay relative to in-distribution (ID) accuracy is observed in all settings.

Fine-tune your language model

Please check out these examples from Hugging Face Transformers to fine-tune your custom models.

Out-of-Domain Tests (OOD)

The data for all OOD tests can be found here.
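The README does not say how the ID-to-OOD performance decay is quantified. The following minimal sketch assumes a classification task with integer labels. It computes accuracy on an in-distribution and an out-of-distribution test set, then reports the absolute and relative drop. The helper names `accuracy` and `performance_decay` and the toy predictions are hypothetical. GLUE-X may use a different metric or different code.

```python
from __future__ import annotations

from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def performance_decay(id_acc: float, ood_acc: float) -> tuple[float, float]:
    """Return (absolute drop, relative drop) of OOD accuracy w.r.t. ID accuracy.

    Hypothetical helper: the relative drop is the absolute drop divided by
    the ID accuracy, and is defined as 0.0 when the ID accuracy is 0.
    """
    absolute = id_acc - ood_acc
    relative = absolute / id_acc if id_acc > 0 else 0.0
    return absolute, relative


if __name__ == "__main__":
    # Toy, hypothetical predictions for illustration only (not GLUE-X results).
    id_preds, id_labels = [1, 0, 1, 1, 0, 1, 0, 1], [1, 0, 1, 1, 0, 1, 1, 1]
    ood_preds, ood_labels = [1, 1, 0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 1, 0]

    id_acc = accuracy(id_preds, id_labels)
    ood_acc = accuracy(ood_preds, ood_labels)
    abs_drop, rel_drop = performance_decay(id_acc, ood_acc)
    print(f"ID acc: {id_acc:.3f}  OOD acc: {ood_acc:.3f}")
    print(f"absolute drop: {abs_drop:.3f}  relative drop: {rel_drop:.1%}")
```

This example reports both numbers because an absolute drop of a few points means more for a task with lower ID accuracy. The relative drop makes decays comparable across tasks.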

Main Contributors

Shuibai Zhang (code and experiment implementation); Linyi Yang (guidance and experiment design); Wei Zhou (website implementation)

Citation

If you find this work helpful for your research, please consider citing the paper as follows.

@article{yang2022glue,
  title={GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective},
  author={Yang, Linyi and Zhang, Shuibai and Qin, Libo and Li, Yafu and Wang, Yidong and Liu, Hanmeng and Wang, Jindong and Xie, Xing and Zhang, Yue},
  journal={arXiv preprint arXiv:2211.08073},
  year={2022}
}
