Repo for paper "IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning" accepted to the Findings of ACL 2023

GeekDream-x/IDOL


IDOL

  • 📚 Repo for our paper "IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning" accepted to the Findings of ACL 2023. [Link]
  • 🏆 Ranked 1st on the ReClor leaderboard from December 2022 to October 2023.

Pre-training Dataset (LGP)

Step 1

Download the Wikipedia data from WikiDumps, then extract the article text with WikiExtractor.
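As a rough illustration of what comes out of this step, the sketch below iterates over articles in one extracted file. It assumes WikiExtractor's default (non-JSON) output, where each article is wrapped in a `<doc id=... url=... title=...>` ... `</doc>` block. The function name `iter_articles` and the sample text are hypothetical and not part of this repo.

```python
import re
from typing import Iterator, Tuple

# Assumption: each article is wrapped as <doc id="..." url="..." title="..."> ... </doc>,
# which is WikiExtractor's default plain-text layout.
DOC_PATTERN = re.compile(
    r'<doc\b[^>]*\btitle="([^"]*)"[^>]*>\n?(.*?)</doc>', re.DOTALL
)


def iter_articles(raw: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) pairs from the contents of one extracted file."""
    for match in DOC_PATTERN.finditer(raw):
        title, body = match.group(1), match.group(2).strip()
        if body:  # skip articles that are empty after extraction
            yield title, body


# Hypothetical sample standing in for a file such as text/AA/wiki_00.
sample = (
    '<doc id="1" url="https://example.org/1" title="Logic">\n'
    "Logic\n\nLogic is the study of correct reasoning.\n"
    "</doc>\n"
    '<doc id="2" url="https://example.org/2" title="Empty">\n'
    "</doc>\n"
)
articles = list(iter_articles(sample))
```

Each yielded text can then be passed on to the filtering and labeling in Step 2.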

Step 2

Extract logic-related texts and, after tokenization, assign them LCP labels using the functions in scripts/LGP/utils.py. Taking RoBERTa as an example, the IDOL pre-training dataset for RoBERTa is available at GoogleDrive.
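The sketch below gives a minimal picture of this kind of labeling; it is not the code in scripts/LGP/utils.py. It assumes that LCP labels are per-token category ids marking which tokens belong to a logical indicator, uses whitespace tokens instead of a model tokenizer, and relies on a made-up indicator lexicon. The names `INDICATORS`, `label_tokens`, and `is_logic_related` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: indicator phrase -> category id.
# The categories and ids are illustrative only, not the ones used by the repo.
INDICATORS: Dict[Tuple[str, ...], int] = {
    ("because",): 1,    # premise-like
    ("due", "to"): 1,
    ("therefore",): 2,  # conclusion-like
    ("however",): 3,    # adversative-like
    ("not",): 4,        # negation-like
}
OTHER = 0  # label for tokens that are not part of any indicator
MAX_LEN = max(len(phrase) for phrase in INDICATORS)


def label_tokens(tokens: List[str]) -> List[int]:
    """Assign a category id to every token, preferring the longest indicator match."""
    labels = [OTHER] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(lowered[i:i + n])
            if phrase in INDICATORS:
                labels[i:i + n] = [INDICATORS[phrase]] * n
                i += n
                break
        else:
            i += 1  # no indicator starts at this position
    return labels


def is_logic_related(labels: List[int]) -> bool:
    """Keep a text only if it contains at least one indicator token."""
    return any(label != OTHER for label in labels)


tokens = "The match was cancelled due to rain , therefore we left".split()
labels = label_tokens(tokens)
```

With a subword tokenizer such as RoBERTa's, the labels would additionally need to be aligned to subword pieces, which this sketch leaves out.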

IDOL Pre-training

  • During pre-training with IDOL, models learn via MLM and LCP simultaneously.
  • For the training environment dependencies, refer to ./idol_environment.yml. For the transformers library, use the version provided in ./transformers.

  • Steps

    1. cd /scripts/pretrain
    2. Put LGP in ./data
    3. Change the values of parameters to your preferred ones in logic_pretrain.sh
    4. sh logic_pretrain.sh
    
  • Examples of checkpoints further pre-trained with IDOL are available at:

    | Model | Link | Model | Link |
    |---|---|---|---|
    | BERT | Google Drive | RoBERTa | Google Drive |
    | ALBERT | Google Drive | DeBERTa | Google Drive |
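To make "learning via MLM and LCP simultaneously" more concrete, here is a library-free sketch of a joint objective. It assumes the two losses are token-level cross-entropies combined by a weighted sum. The weight `lcp_weight`, the function names, and the use of -100 as an ignore label (a convention borrowed from PyTorch) are assumptions, not details confirmed by this repo.

```python
import math
from typing import Sequence

IGNORE = -100  # positions with this target do not contribute to the loss


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood over positions whose target is not IGNORE."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == IGNORE:
            continue
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


def joint_loss(mlm_logits, mlm_targets, lcp_logits, lcp_targets, lcp_weight=1.0):
    """Assumed combination: MLM loss plus a weighted LCP loss."""
    mlm = cross_entropy(mlm_logits, mlm_targets)
    lcp = cross_entropy(lcp_logits, lcp_targets)
    return mlm + lcp_weight * lcp


# Two positions; only the second one is masked for MLM (toy vocabulary of 3).
mlm_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mlm_targets = [IGNORE, 2]
# Both positions carry an LCP label (2 toy categories).
lcp_logits = [[0.0, 0.0], [0.0, 0.0]]
lcp_targets = [0, 1]
loss = joint_loss(mlm_logits, mlm_targets, lcp_logits, lcp_targets)
```

In an actual training run both heads would share one encoder, so a single backward pass through the combined loss updates the model for both tasks.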

Downstream Fine-tuning

Our implementation builds on the official framework provided by the ReClor team, with some customizations. The example in /scripts/finetune supports ReClor, LogiQA, and RACE.

1. cd /scripts/finetune
2. Put the downstream task datasets in ./data
3. Change the values of parameters to your preferred ones # especially task_name
4. sh run_ft.sh

Citation

@inproceedings{xu-etal-2023-idol,
    title = "{IDOL}: Indicator-oriented Logic Pre-training for Logical Reasoning",
    author = "Xu, Zihang  and
      Yang, Ziqing  and
      Cui, Yiming  and
      Wang, Shijin",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.513",
    pages = "8099--8111",
    abstract = "In the field of machine reading comprehension (MRC), existing systems have surpassed the average performance of human beings in many tasks like SQuAD. However, there is still a long way to go when it comes to logical reasoning. Although some methods for it have been put forward, they either are designed in a quite complicated way or rely too much on external structures. In this paper, we proposed IDOL (InDicator-Oriented Logic Pre-training), an easy-to-understand but highly effective further pre-training task which logically strengthens the pre-trained models with the help of 6 types of logical indicators and a logically rich dataset LoGic Pre-training (LGP). IDOL achieves state-of-the-art performance on ReClor and LogiQA, the two most representative benchmarks in logical reasoning MRC, and is proven to be capable of generalizing to different pre-trained models and other types of MRC benchmarks like RACE and SQuAD 2.0 while keeping competitive general language understanding ability through testing on tasks in GLUE. Besides, at the beginning of the era of large language models, we take several of them like ChatGPT into comparison and find that IDOL still shows its advantage.",
}

