[FEAT]: Website scraping depth #1190

shatfield4 · 2024-04-25T21:23:16Z

What would you like to see?

We should create a data connector that scrapes websites to a configurable depth.

It should collect all <a> tags on the page, keep the links that match the original domain name, and then follow those links X levels deep.
All of this should be done with the existing Puppeteer instance that is already used for scraping links.
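
To make the request concrete, here is a minimal sketch of a depth-limited, same-domain crawl. It is only an illustration of the idea, not the connector's actual implementation: `sameDomainLinks`, `crawl`, and the injected `getHrefs` callback are hypothetical names, and "matches the original domain name" is assumed here to mean "same hostname over http(s)".

```javascript
// Hypothetical sketch: none of these names come from the AnythingLLM codebase.

// Keep only the links that point at the same hostname as `pageUrl`.
function sameDomainLinks(pageUrl, hrefs) {
  const base = new URL(pageUrl);
  const links = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against the page
    } catch {
      continue; // skip malformed hrefs
    }
    // Assumption: "same domain" means same hostname, http(s) only.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    if (url.hostname !== base.hostname) continue;
    url.hash = ""; // treat #fragment variants as the same page
    links.add(url.href);
  }
  return [...links];
}

// Breadth-first crawl that follows links at most `maxDepth` levels from `startUrl`.
// `getHrefs(url)` is an injected async function returning the href values
// of every <a> tag on that page.
async function crawl(startUrl, maxDepth, getHrefs) {
  const start = new URL(startUrl);
  start.hash = "";
  const visited = new Set([start.href]);
  let frontier = [start.href];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      const hrefs = await getHrefs(url);
      for (const link of sameDomainLinks(url, hrefs)) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return [...visited]; // every page discovered, in discovery order
}
```

With Puppeteer, `getHrefs` could plausibly navigate with `page.goto(url)` and then return `page.$$eval("a", (as) => as.map((a) => a.href))`. Injecting it as a callback keeps the depth logic testable without launching a browser.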

flefevre · 2024-05-01T10:53:15Z

Is there any documentation on website querying?
I have seen that AnythingLLM is compatible with websites, but I could not find the documentation.
Thanks for any links or help.

shatfield4 added the enhancement (New feature or request) and feature request labels Apr 25, 2024

shatfield4 self-assigned this Apr 25, 2024

shatfield4 mentioned this issue Apr 26, 2024

[FEAT] Website depth scraping data connector #1191

Merged

10 tasks

shatfield4 linked a pull request Apr 26, 2024 that will close this issue

[FEAT] Website depth scraping data connector #1191

Merged

10 tasks

timothycarambat closed this as completed in #1191 May 14, 2024
