Collected corpus for named entity recognition pre-training

zliucr/NER-BERT

NER-BERT

This repository contains the large-scale named entity recognition (NER) corpus collected for pre-training in the paper "NER-BERT: A Pre-trained Model for Low-Resource Entity Tagging".

The corpus used for NER pre-training can be downloaded from here. It consists of the following files:

  • annotated_ner_data_train.txt: training data for NER pre-training.
  • augmented_ner_data_train.txt: the data in annotated_ner_data_train.txt after balancing it across entity categories (as described in the paper). We suggest using this version of the corpus for NER model pre-training.
  • annotated_ner_data_dev.txt: development data for evaluating NER pre-training.
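The file layout is not described above. As a starting point, here is a minimal Python sketch that reads one of these files and counts entity mentions per category, which is one quick way to inspect the category distribution that the augmented version is meant to balance. It assumes a CoNLL-style, two-column BIO format (one token and its tag per line, blank lines between sentences); that format, the tag names, and the helpers `read_sentences` and `count_entity_types` are hypothetical, so check a few lines of the actual files before relying on it.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# ASSUMPTION (not documented in this repository): each non-empty line holds a
# token and its BIO tag separated by whitespace, e.g. "Peter B-person", and a
# blank line separates sentences. Inspect the files before relying on this.

Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield each sentence as a list of (token, tag) pairs."""
    sentence: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        token, tag = line.rsplit(maxsplit=1)  # the tag is the last column
        sentence.append((token, tag))
    if sentence:  # the input may not end with a blank line
        yield sentence


def count_entity_types(sentences: Iterable[Sentence]) -> Counter:
    """Count entity mentions per category; a mention starts at a "B-" tag."""
    counts: Counter = Counter()
    for sentence in sentences:
        for _, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


# Tiny in-memory sample in the assumed format (not taken from the corpus).
SAMPLE = """Peter B-person
Blackburn I-person
visited O
Paris B-location

Google B-organization
hired O
Mary B-person
"""

sentences = list(read_sentences(SAMPLE.splitlines()))
counts = count_entity_types(sentences)
print(len(sentences), dict(counts))

# On the real data (same format assumption):
# with open("augmented_ner_data_train.txt", encoding="utf-8") as f:
#     print(count_entity_types(read_sentences(f)).most_common())
```

If the format assumption holds, running `count_entity_types` on both annotated_ner_data_train.txt and augmented_ner_data_train.txt should show how the balancing step changed the per-category counts.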

If you use any resources in this repository in your work, please cite the following paper:

@article{liu2021ner,
  title={NER-BERT: a pre-trained model for low-resource entity tagging},
  author={Liu, Zihan and Jiang, Feijun and Hu, Yuxiang and Shi, Chen and Fung, Pascale},
  journal={arXiv preprint arXiv:2112.00405},
  year={2021}
}
