# webarchiving

Here are 46 public repositories matching this topic...

ibnesayeed / archival-tests

A set of web archival replay test cases

testing memento webarchive webarchiving replay-tests archival-replay

Updated Oct 25, 2021
HTML

httpreserve / million-dollar-webpage

HTTPreserve Analysis of Million Dollar Web Page

digital-humanities web-archiving code4lib harvard webarchiving computing-history boingboing

Updated Jun 2, 2021

arquivo / dspace-link-extractor

Extracts links from DSpace repositories

java tika sitemaps webarchiving

Updated Nov 2, 2023
Java

N0taN3rd / node-cdxj

Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with Node.js

webarchive web-archives webarchiving cdxj

Updated Jul 20, 2017
JavaScript

shawnmjones / government-sites-archive-projects

This repository contains work done to determine how much of www.guideline.gov and qualitymeasures.ahrq.gov was archived.

archiving-datasets webarchiving webarchive-discovery

Updated Jul 16, 2018
Jupyter Notebook

SimonKocurek / Trebis

Offline storage of website data on Android

android room webview offline storage kotlin-android jetpack archived webarchiving

Updated Jun 22, 2018
Kotlin

UbuntuCZ / archiver

Tool for archiving web pages to the Wayback Machine

crawler wayback-archiver webarchiving

Updated Dec 30, 2018
Kotlin

mijho / crawl-log2xml

Parse a Heritrix crawl.log into an XML sitemap

sitemap crawl sitemap-generator sitemap-xml webarchive heritrix webarchiving deno heritrix3

Updated Sep 30, 2023
TypeScript

httpreserve / wayback

A restricted API in Golang for the (semi)-exposed functions of the Internet Archive.

archives code4lib internetarchive webarchiving digitalpreservation

Updated Dec 22, 2021
Go

athenekilta / arkisto

Digital archive of web pages related to the Guild of Information Networks

html php archive webarchiving

Updated Feb 9, 2024
HTML

httpreserve / phantomjsscreenshot

A wrapper for phantom.js commands for headless screenshots.

code4lib webarchiving digitalpreservation websnapshot httpreserve

Updated Jun 2, 2021
Go

pierlauro / MDBubing

From WARC records to MongoDB documents

crawler crawling warc webarchive webarchiving warc-files warc-format warc-record bubing

Updated Nov 3, 2020
Java

gitdev-bash / webArchiver

An archiving utility with an interface for web servers.

webserver archiving webarchive webarchiving

Updated Aug 3, 2021
Python

PaladhiDinesh / Web-Science

https://bl.ocks.org/PaladhiDinesh/raw/56e1843c31960ecfe919/ The assignments are mainly based on crawling data from websites, web archiving, analyzing the data, and writing reports using Python, LaTeX, and R. Includes studies of the Web's properties, protocols, algorithms, and societal effects.

python latex twitter-api d3js webcrawler webscraping webarchiving

Updated May 12, 2017
Python

httpreserve / workbench

Client app for httpreserve pkg that generates CSV, JSON, HTTP, and BoltDB

archives boltdb code4lib internetarchive webarchiving digitalpreservation digital-repositories

Updated May 26, 2023
JavaScript

ibnesayeed / awesome-web-archiving

An Awesome List for getting started with web archiving

awesome awesome-list webarchiving

Updated Mar 23, 2020

mgunn001 / WebArchiving-SeminarCourse

Class page for ODU CS 791 / 891 Web Archiving Seminar

webarchiving archived-webpages

Updated Dec 6, 2017
HTML

exponential-decay / moonshine

Given four bytes, download a random file from web archives implementing the UKWA Shine interface

archives glam code4lib digipres webarchiving warclight file-formats ukwa

Updated Sep 8, 2023
Go

TarekJor / wpull

Wget-compatible web downloader and crawler.

crawler backup bookmarks wget web-archiving browsers preservation web-page webarchiving wpull web-downloader web-pages web-browsers

Updated Dec 20, 2017
HTML

MozillaCZ / phpbbcrawler

Link crawler for a phpBB forum

crawler tool phpbb wayback-archiver webarchiving

Updated Jun 19, 2022
Java
