
Source codes of our paper "BERT Based Topic-Specific Crawler" - ASYU 2021 conference.


yahyatawil/BERT-Topic-Specific-Crawler


BERT Based Topic-Specific Crawler

This paper presents a multi-threaded web crawler that uses Sentence-BERT (S-BERT), a sentence-embedding variant of Bidirectional Encoder Representations from Transformers (BERT). S-BERT is used to compute the similarity between the predefined classes and the text of each downloaded web page.
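The similarity step can be illustrated with a minimal sketch, which is not the notebook's code. It assumes that each page's text and each class description have already been embedded as vectors (in practice by an S-BERT model, for example through the encode method of the sentence-transformers library). It also assumes cosine similarity as the measure, a common choice for S-BERT embeddings. The toy vectors, the 0.5 threshold, and the function names are hypothetical. A page is assigned to the class with the highest similarity, or rejected when no class reaches the threshold.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_page(
    page_vec: Sequence[float],
    class_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[str]:
    """Return the most similar class, or None if no class reaches the threshold."""
    if not class_vecs:
        return None
    scores = {name: cosine_similarity(page_vec, vec) for name, vec in class_vecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Toy 3-dimensional "embeddings"; real S-BERT vectors have hundreds of dimensions.
topic_vecs = {"sports": [1.0, 0.0, 0.0], "politics": [0.0, 1.0, 0.0]}
print(classify_page([0.9, 0.1, 0.0], topic_vecs))  # sports
print(classify_page([0.0, 0.0, 1.0], topic_vecs))  # None
```

In a topic-specific crawler, a page that returns None would typically be discarded and its outgoing links not followed. The threshold trades recall for precision.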


Authors

Yahya Tawil and Saed Alqaraleh

ASYU 2021 Conference Paper Presentation

The presentation of the paper is available on YouTube.

Organization

This repository is organized as follows:

  • Code: contains downloader_V_0_10.ipynb, the crawler, written and tested in Google Colab, and evaluator.ipynb, a simple notebook that evaluates the crawler's results (a CSV file) by counting the true positives and false positives for each topic.

  • Report: the report, as both a PDF and its LaTeX source.

  • Results: example CSV files produced by running the crawler for several hours.
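The per-topic evaluation performed by evaluator.ipynb can be sketched roughly as follows. This is an illustration, not the notebook's code. The CSV layout is not described here, so the column names url, predicted_topic and true_topic are hypothetical. A row counts as a true positive for its predicted topic when it matches the ground-truth label, and as a false positive otherwise. Precision per topic then follows as TP / (TP + FP).

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def count_tp_fp(csv_file: TextIO) -> Dict[str, Dict[str, int]]:
    """Count true/false positives per predicted topic.

    Assumes hypothetical columns 'predicted_topic' and 'true_topic'.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0})
    for row in csv.DictReader(csv_file):
        predicted = row["predicted_topic"]
        if predicted == row["true_topic"]:
            counts[predicted]["tp"] += 1
        else:
            counts[predicted]["fp"] += 1
    return dict(counts)


def precision(tp: int, fp: int) -> float:
    """Fraction of pages assigned to a topic that truly belong to it."""
    return tp / (tp + fp) if tp + fp else 0.0


# Small in-memory example standing in for a crawler output file.
sample = io.StringIO(
    "url,predicted_topic,true_topic\n"
    "http://a.example,sports,sports\n"
    "http://b.example,sports,politics\n"
    "http://c.example,politics,politics\n"
)
result = count_tp_fp(sample)
# result == {"sports": {"tp": 1, "fp": 1}, "politics": {"tp": 1, "fp": 0}}
for topic, c in result.items():
    print(topic, c["tp"], c["fp"], precision(c["tp"], c["fp"]))
```

To evaluate a real results file, open it with open(path, newline="") and pass the file object to count_tp_fp, after adjusting the column names to match the actual CSV header.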

Instructions

To run the crawler, open downloader_V_0_10.ipynb in Google Colab and run it there. Before running it, select the desired settings by editing the conf dictionary in the notebook.

Cite This

@INPROCEEDINGS{9599076,
  author={Tawil, Yahya and Alqaraleh, Saed},
  booktitle={2021 Innovations in Intelligent Systems and Applications Conference (ASYU)}, 
  title={BERT Based Topic-Specific Crawler}, 
  year={2021},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ASYU52992.2021.9599076}}
