Hello fedi

I would like to find a way to keep a cold copy of the articles I view, something like a self-hosted archive.is … but one that runs automatically for every site I visit.

Please help me find out how or where!

@fluffy

1) Yes, you can do this up to a point. However, manual capture works better than automatic.

2) One approach is Pale Moon plus ScrapBook X. This approach is simple, but it only works for some sites.

3) For near-perfect captures, learn to use WARC toolsets. I've tested OpenWayback and pywb and suggest starting with those two. This isn't plug and play, but I've gotten it to work pretty well.

4) Links:

github.com/webrecorder/pywb
github.com/iipc/openwayback/wi
loc.gov/preservation/digital/f
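
For readers who have not seen the WARC format mentioned in point 3, here is a minimal sketch of what a single "response" record looks like on disk. It uses only the Python standard library and is not how pywb or OpenWayback write their files. The function name build_warc_response is hypothetical, and real archives usually also include other record types and per-record gzip compression, which this sketch leaves out.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response(target_uri: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response (status line, headers, body) in one WARC/1.0 record.

    Hypothetical helper for illustration only.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines are CRLF-separated, and a blank line ends the header.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLFs after the content block.
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    with open("example.warc", "wb") as fh:
        fh.write(build_warc_response("http://example.com/", raw))
```

The point of the format is that the original HTTP exchange is stored byte for byte along with the URL and capture time, which is why replay tools can reproduce a page more faithfully than a saved HTML file.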

@oldcoder Thanks for the detailed explanation.

It looks like I’ll have to do some building to get exactly what I want!

@fluffy

1) These tools do work, though in different ways. One of the WARC tools saves captures in multiple formats. You'll find that only the WARC format and multimedia files work well.

2) I started to do this in 2006 using pre-Quantum Firefox and the original ScrapBook extension. It's nice to still be able to read websites that are long gone.

3) There's one more approach that's tricky but automatic, as you wished: set up a copy of Squid to decrypt HTTPS and cache content forever.

@oldcoder My main grievance is that I occasionally want to refer to an article I read, or to find it by searching my browser history. But if I store a link in my database, years later it is gone, and browser history is no longer full-text searchable, let alone far into the past.

Is there some reason you don’t automatically archive all of the web pages you visit?
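
To make the "searchable history" wish concrete, here is a rough sketch of a local page index using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table layout, the function names open_archive, save_page and search, and the use of a plain substring match instead of a real full-text index. It also assumes something else (a proxy or a browser extension) hands it the page text.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the hypothetical archive database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT, fetched TEXT, body TEXT)"
    )
    return db


def save_page(db, url, fetched, body):
    """Store one captured page: its URL, capture time (ISO text), and text."""
    db.execute(
        "INSERT INTO pages (url, fetched, body) VALUES (?, ?, ?)",
        (url, fetched, body),
    )
    db.commit()


def search(db, phrase):
    """Return (url, fetched) for pages whose text contains the phrase."""
    # Escape LIKE wildcards so the phrase is matched literally.
    escaped = (
        phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = db.execute(
        "SELECT url, fetched FROM pages "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY fetched",
        (f"%{escaped}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_archive()
    save_page(db, "http://example.com/a", "2020-01-01T00:00:00Z",
              "Notes on setting up Squid as a caching proxy")
    save_page(db, "http://example.com/b", "2020-01-02T00:00:00Z",
              "A recipe for lentil soup")
    print(search(db, "squid"))
```

A substring scan like this gets slow on a large archive. SQLite's FTS5 extension would be the natural next step where it is available in the local build.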

@fluffy

1) Regarding the grievance: Yes, I started to write my own tools in 1995 to capture websites for this and other reasons.

2) Q. Why not capture everything? A. In the past, disks cost a lot more, and complete, automated captures were more difficult, especially for HTTPS sites.

But 1 TB disks, even SSDs, are cheap now and the Squid approach is both fast and automatic. So I'll probably capture more pages automatically in the future.
