I am trying to do relation extraction on a document, and I have a few questions about the annotation format used to fine-tune the model.
Is multiple linking (1-N relations) possible, and is it accepted by the model? For example:
vaccine X links to 1st date of vaccination
vaccine X links to 2nd date of vaccination
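To make the 1-N case concrete, here is a minimal sketch of the example above written as (subject, relation, object) triples. The relation name and the grouping helper are hypothetical and only illustrate what I mean by one entity linking to several others; this is not PaddleNLP's or Label Studio's actual format.

```python
from collections import defaultdict

# Hypothetical triples for illustration only: one subject linked to two objects.
# The relation name "date of vaccination" is an assumption, not a PaddleNLP label.
triples = [
    ("vaccine X", "date of vaccination", "1st date of vaccination"),
    ("vaccine X", "date of vaccination", "2nd date of vaccination"),
]


def group_by_subject(triples):
    """Collect every object that shares the same (subject, relation) pair."""
    grouped = defaultdict(list)
    for subject, relation, obj in triples:
        grouped[(subject, relation)].append(obj)
    return dict(grouped)


links = group_by_subject(triples)
for (subject, relation), objects in links.items():
    print(f"{subject} --{relation}--> {objects}")
```

My question is whether annotations with this shape survive preprocessing and fine-tuning, or whether only one link per entity is kept.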
What do the generated train/dev/test.txt files contain? I preprocessed my data, but the generated .txt files contain a lot of content I cannot interpret, so I would like to understand the actual input format the model expects. I annotated my data according to the Label Studio guide provided by PaddleNLP, but the contents of the training/validation data files are not clear. Here is a sample from the train.txt file I got.
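For anyone who wants to look at such a file without scrolling through very long values, here is a rough sketch of how its records could be previewed. It assumes that each line of the file is a single JSON object, which I have not confirmed; the function names and the truncation length are my own choices.

```python
import json


def summarize_line(line, max_len=60):
    """Return {key: short preview} for one line, assumed to be a JSON object."""
    record = json.loads(line)
    summary = {}
    for key, value in record.items():
        # Non-string values (lists, dicts, numbers) are re-serialized for display.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        if len(text) > max_len:
            text = text[:max_len] + f"... ({len(text)} chars)"
        summary[key] = text
    return summary


def preview_file(path, n=1):
    """Print a truncated view of the first n records in the file."""
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            for key, preview in summarize_line(line).items():
                print(f"{key}: {preview}")
```

Calling `preview_file("train.txt")` lists the field names of the first record with shortened values, which makes it easier to ask about specific fields.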
Once the model has been fine-tuned, does it also produce text detection and recognition results for the document, or only the relation extraction results? I have fine-tuned PaddleOCR weights for detection and recognition, and I was wondering whether they could be used with PaddleNLP.
If you could clarify these points, that would be very helpful! Thanks in advance.
piarosebelledelapaz changed the title from "[Question]: Data annotation and pre processing" to "[Question]: Data annotation and pre processing for Relation Extraction" on May 16, 2024