blogust

Digital Archiving

https://crlf.link/mem/offline/

Why

Digital content is ephemeral. Corporations have no monetary incentive to preserve things, so they don’t.

The Internet Archive is a great way to preserve things for the public good, but you can also keep an archive for yourself.

How

Archive.org
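Before archiving a page yourself, it can help to check whether the Wayback Machine already holds a copy. The following is a minimal sketch that queries the public availability endpoint at archive.org. The helper names (`availability_url`, `closest_snapshot`, `lookup`) are made up for illustration, and the response shape is assumed to match what the endpoint currently returns.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Wayback Machine availability endpoint (returns JSON).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the query URL for a given page (hypothetical helper)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded response, if any.

    Assumes the payload looks like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str) -> Optional[str]:
    """Ask the Wayback Machine for the closest snapshot (needs network)."""
    with urllib.request.urlopen(availability_url(page_url)) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("https://crlf.link/mem/offline/")` returns a snapshot URL when one exists and `None` otherwise. The parsing is kept separate from the network call so it can be checked offline.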

Download the HTML, then convert it to Markdown with Pandoc
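The download-then-Pandoc step can be sketched roughly as below. This assumes `pandoc` is installed and on the PATH; the function names, the `archive` output directory, and the file-naming scheme are all hypothetical choices, not part of any tool mentioned here.

```python
import subprocess
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Derive a file stem from a URL (hypothetical naming scheme)."""
    parts = urlparse(url)
    path = parts.path.strip("/").replace("/", "_")
    return f"{parts.netloc}_{path}" if path else parts.netloc


def pandoc_command(html_path: Path, md_path: Path) -> list[str]:
    """Build the pandoc invocation that converts HTML to Markdown."""
    return [
        "pandoc",
        "--from", "html",
        "--to", "markdown",
        "--output", str(md_path),
        str(html_path),
    ]


def archive(url: str, out_dir: Path = Path("archive")) -> Path:
    """Download a page, keep the raw HTML, and write a Markdown copy."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = local_name(url)
    html_path = out_dir / f"{stem}.html"
    md_path = out_dir / f"{stem}.md"

    # Save the page exactly as served (needs network access).
    with urllib.request.urlopen(url) as response:
        html_path.write_bytes(response.read())

    # Convert with pandoc; raises CalledProcessError if pandoc fails.
    subprocess.run(pandoc_command(html_path, md_path), check=True)
    return md_path
```

Calling `archive("https://crlf.link/mem/offline/")` would write both files under `archive/` and return the Markdown path. Keeping the raw HTML next to the Markdown means the conversion can be redone later if Pandoc's output turns out to be lossy.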

Markdownr

ArchiveBox: see their GitHub community page for a long list of resources.

https://github.com/mozilla/readability: Mozilla’s library for extracting the main content from web pages.