Skip to content

A dataset containing 1000+ walks-into-a-bar jokes scraped from the internet.

License

Notifications You must be signed in to change notification settings

s-bose/Walks-into-a-bar-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Walks Into A Bar Dataset

kaggle

This dataset contains 1434 bar jokes webscraped from various sources on the internet.

The sources used are listed below.

Name URL
grammarbook https://www.grammarbook.com/blog/definitions/walks-into-a-bar/
thrillist https://www.thrillist.com/culture/best-walks-into-a-bar-jokes
jokojokes https://jokojokes.com/walks-into-a-bar-jokes.html
gamertelligence https://www.gamertelligence.com/walks-into-a-bar-jokes/

Files

  • The main dataset can be found in data/jokes.csv.
  • The primary notebook used for scrapping the aforementioned websites is notebooks/walks_into_bar_scrapper.ipynb.
  • notebooks/seleniumconfig.py is a helper module for obtaining a chrome WebDriver with predefined configurations.
  • Note - Running the scrapper notebook requires installing all the packages in requirements.txt. Additionally, a chromedriver executable suitable for your operating system needs to be present in the root directory.

Disclaimer

Please note that the data has been webscrapped with minimal editing of the original text. Therefore some jokes might be repeated, or might be NSFW. Certain websites had user-provided jokes, which as a result might not conform to the general structure of a walks-into-a-bar joke.

Feel free to contribute to this dataset if you can come across further sources for bar jokes.

Further Links

kaggle

About

A dataset containing 1000+ walks-into-a-bar jokes scraped from the internet.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published