WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
Updated Jun 10, 2024 - Python
Makes saving pages in bulk to the Wayback Machine much easier
metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives
A list of things related to software, literature, and other content for 🕣 Memento
Parser for WARC (aka WebArchive) files
Quick Cache and Archive search buttons
An Awesome List for getting started with web archiving
A tool for detecting viruses and NSFW material in WARC files
Seeder - Czech web archive curation tool and public site
A social media open post web archiving tool
An archival thumbnail visualization server
Wayback Machine API interface & a command-line tool
Digital archive of web pages related to the Guild of Information Networks
Parse a Heritrix crawl.log into an XML sitemap
Given four bytes, download a random file from web archives implementing the UKWA Shine interface
Digital Preservation of HTTP in documentary heritage.
Client app for the httpreserve package that generates CSV, JSON, HTTP, and BoltDB output
News Archiver, Data Aggregation for CNN and Fox News