webarchiving

Here are 46 public repositories matching this topic...

iipc / awesome-web-archiving

An Awesome List for getting started with web archiving

awesome awesome-list webarchiving

Updated May 8, 2024

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

N0taN3rd / Squidwarc

Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, with or without a head

crawler chrome crawling chrome-headless browser-automation headless-chrome webarchiving webarchives high-fidelity-preservation puppeteer

Updated May 19, 2020
JavaScript

N0taN3rd / node-warc

Parse And Create Web ARChive (WARC) files with node.js

warc web-archiving webarchive web-archives webarchiving warc-files chrome-remote-interface pupeteer

Updated Jan 3, 2023
JavaScript

ArchiveTeam / wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

crawler scraper downloader spider lua ftp scraping crawling archiving wget crawl zstd crawlers warc webarchiving archiveteam wget-lua

Updated Jan 29, 2024
C

harvard-lil / warc-gpt

WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.

ai warc webarchiving rag

Updated Jun 10, 2024
Python

peterk / warcworker

A dockerized, queued, high-fidelity web archiver based on Squidwarc

archiving preservation webarchiving webarchives high-fidelity-preservation

Updated Jul 19, 2020
Python

machawk1 / awesome-memento

A list of things related to software, literature, and other content for 🕣 Memento

awesome memento awesome-list webarchiving memento-rfc

Updated May 29, 2024

commoncrawl / cc-notebooks

Various Jupyter notebooks about Common Crawl data

jupyter-notebook aws-athena commoncrawl common-crawl webarchiving webgraph-framework

Updated Jun 2, 2022
Jupyter Notebook

oduwsdl / tmvis

An archival thumbnail visualization server

visualization nodejs archive memento webarchiving timemap tmvis webpage-changes

Updated Mar 25, 2024
JavaScript

cipher387 / quickcacheandarchivesearch

Quick Cache and Archive search buttons

webarchive webarchiving google-cache yandex-cache baidu-cache

Updated May 11, 2024
JavaScript

ArchiveTeam / WebArchiver

Decentralized web archiving

python crawler web decentralized archiving archiver warc webarchiving

Updated Aug 7, 2018
Python

httpreserve / httpreserve

Digital Preservation of HTTP in documentary heritage.

archives code4lib wayback internetarchive digipres webarchiving digitalpreservation documentary-heritage digital-repositories waybackmachine

Updated May 26, 2023
Go

toimik / WarcProtocol

Parser for WARC (aka WebArchive) files

warc webarchive webarchiving warc-files webarchives warc-format warc-reader warc-record

Updated May 22, 2024
C#

ruarxive / awesome-digital-preservation

Awesome list dedicated to digital and data preservation tools, sources, services and so on.

crawler list awesome awesome-list warc digital-preservation archival webarchiving

Updated Oct 8, 2022

WebarchivCZ / Seeder

Seeder - Czech webarchive curating tool and public site

government django tools czech czech-republic archive webarchive webarchiving webarchives

Updated May 21, 2024
Python

peterk / munin-indexer

A social media open post web archiving tool

archiving preservation webarchiving high-fidelity-preservation

Updated Apr 3, 2024
JavaScript

News-Archiver / news-archiver

News Archiver, Data Aggregation for CNN and Fox News

javascript mysql cnn scraping-websites webarchiving foxnews

Updated Apr 23, 2023
JavaScript

N0taN3rd / node-cdxj

Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js

webarchive web-archives webarchiving cdxj

Updated Jul 20, 2017
JavaScript

ArchivingToolsForWBM / AdvancedInternetArchiving

Makes saving pages in bulk to the Wayback Machine much easier

web-archiving webarchiving

Updated Jun 9, 2024
HTML
