This repo will be used to gather, and possibly also visualize, data for a research paper. The paper looks into differences in user engagement between threads and single tweets. Its aim is to highlight possible benefits of using Twitter threads for science communication.
- make sure Python and Git are installed
- install needed packages

      pip install pandas
      pip install pysimplegui
      pip install git+https://github.com/JustAnotherArchivist/snscrape.git

- run tweet-scraper.py

      python tweet-scraper.py

- add Tweet / Thread / Both mechanic
- improve UX and add status messages
- expose output tweet properties in GUI
- add cancel scrape button
- compile
- implement visualization / plots
- address the second caveat (a thread whose initial tweet is a reply gets categorized as a reply)
- change GUI theme on macOS
- Numbers don't always add up. E.g. Tweet Count = 100 -> 25 threads, 15 replies, 55 tweets (95 in total). This usually happens if not all parts of a thread are included in a given corpus size.
- If a thread's initial tweet is a reply, this initial data point is categorized as a reply, and therefore not included in the threads output.
- The scraper is in dire need of more testing. Expect bugs and always check your output for errors (and feel free to contact me).
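
One rough way to spot the first caveat in your own output is to tally the category labels and compare the total to the requested Tweet Count. The sketch below assumes each output row carries exactly one label. The label names and the `summarize_categories` helper are hypothetical and not part of tweet-scraper.py, so adapt them to whatever your output actually contains.

```python
from collections import Counter


def summarize_categories(labels, requested_count):
    """Count rows per category and return (counts, unaccounted).

    `labels` is any iterable of category strings (hypothetical names,
    not taken from tweet-scraper.py). `unaccounted` is the difference
    between the requested Tweet Count and the number of labelled rows.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return counts, requested_count - total


# Made-up labels mirroring the example in the caveat:
# 25 threads, 15 replies, 55 tweets for a Tweet Count of 100.
labels = ["thread"] * 25 + ["reply"] * 15 + ["tweet"] * 55
counts, unaccounted = summarize_categories(labels, requested_count=100)

print(dict(counts))
print("unaccounted:", unaccounted)
```

A non-zero `unaccounted` value does not necessarily mean something went wrong. It may just reflect thread parts that fell outside the corpus, as described in the first caveat.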