
website scraper for text with conversion to markdown.md and directory structuring


johnconnor-sec/scrapedown


scrapedown

A website scraper that extracts text and converts it to Markdown (.md). Each file is placed in a directory named after the directory it was found under on the site, so the output file structure replicates the site's.
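The directory mirroring described above can be sketched as a mapping from a page URL to an output path. This is a minimal illustration, not the tool's actual code: the function name `url_to_markdown_path`, the `output` root directory, and the choice of `index.md` for directory-style URLs are all assumptions made for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_to_markdown_path(url: str, output_root: str = "output") -> PurePosixPath:
    """Map a page URL to a .md path that mirrors the site's directory layout.

    Hypothetical helper: the name, the default root, and the index.md
    convention are assumptions, not taken from scrapedown itself.
    """
    parsed = urlparse(url)
    # Split the URL path into its non-empty segments.
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or parsed.path.endswith("/"):
        # Directory-style URL: store the page as index.md inside that directory.
        segments.append("index")
    # Drop any extension (e.g. ".html") from the last segment and use ".md".
    stem = PurePosixPath(segments[-1]).stem
    return PurePosixPath(output_root, parsed.netloc, *segments[:-1], stem + ".md")


if __name__ == "__main__":
    print(url_to_markdown_path("https://example.com/docs/guide/intro.html"))
    print(url_to_markdown_path("https://example.com/docs/"))
```

Under these assumptions, a page found under `/docs/guide/` on the site ends up under `docs/guide/` in the output, which is the replication of the site's structure that the description refers to.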


Use with caution

To install:

git clone https://github.com/johnconnor-sec/scrapedown

cd scrapedown

pip install poetry

poetry shell

poetry install

Run it with:

python3 main.py

The tool now also includes the links gathered from the site and produces cleaner Markdown output.
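As a rough idea of what gathering links from a page can look like, here is a small sketch that uses only the Python standard library. It is not scrapedown's implementation: the `LinkCollector` class and the `links_as_markdown` function are hypothetical names, and the output format (a Markdown bullet list of autolinks) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def links_as_markdown(html: str, base_url: str) -> str:
    """Return the page's links as a Markdown bullet list (assumed format)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return "\n".join(f"- <{link}>" for link in collector.links)


if __name__ == "__main__":
    sample = '<p><a href="/about">About</a> <a href="https://other.org/x">X</a></p>'
    print(links_as_markdown(sample, "https://example.com/docs/"))
```

Resolving each `href` against the page URL keeps relative links usable once they are written into a Markdown file outside the original site.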

This is completely free to anyone who thinks it's cool. If anything, I think it could work for gathering data for LLMs, note-taking, or finding interesting endpoints.

Just clone it and, after installing the dependencies, run python3 main.py and watch it work.

If you'd like to make this project better, please show me what you have made!
