A specific example that comes to mind: I want to implement decentralized web crawling. In particular, I want to build a tool for quickly creating mirrors of web pages and storing them on Skynet. From the research I have done, browser-based web crawling is not really a thing. While I might be able to achieve something by, say, transpiling a Node crawler into browser JS, I think it would be a better use of time to use a standard web crawler that runs on a server.
But this introduces an element of trust. Even if the app and the mementos (web archives) are stored on Skynet, with state tracked through the registry, you would still need to depend on a specific server with a specific piece of software. Here are my recommendations for mitigating this:
Server-based software components should be open source. This should go without saying. On top of this, it should be as easy as possible for others to deploy their own instance of the server software. It is not enough for others to be able to take over if your own server goes down; people should be able to actively use alternatives while it is still running.
Within the settings of the app, although a default server may be specified, the option should exist to specify an alternative server through a URL. Ideally there would be an option to persist these alternative server options (including the de-selection of the default option) through something like a SkyID account.
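As a rough sketch of the server-choice setting described above, here is how an app might turn its settings into a list of usable servers. The `ServerSettings` shape, the `resolveServers` name, and the HTTPS-only rule are all assumptions of mine for illustration; nothing here is tied to a real Skynet or SkyID API, and persistence is left out (the settings object is plain JSON, so it could be stored wherever the app keeps user state).

```typescript
// Hypothetical settings shape: one built-in default plus user-supplied alternatives.
interface ServerSettings {
  defaultServer: string;        // URL shipped with the app
  defaultEnabled: boolean;      // false = the user has de-selected the default
  alternativeServers: string[]; // URLs entered by the user
}

// Returns the servers the app may use, in priority order:
// the default first (if still enabled), then the user's alternatives.
function resolveServers(settings: ServerSettings): string[] {
  const candidates = [
    ...(settings.defaultEnabled ? [settings.defaultServer] : []),
    ...settings.alternativeServers,
  ];

  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of candidates) {
    let url: URL;
    try {
      url = new URL(raw); // throws on malformed input
    } catch (_err) {
      continue; // skip entries that are not valid URLs
    }
    if (url.protocol !== "https:") continue; // assumption: only accept HTTPS servers
    const normalized = url.href; // normalized form, used to drop duplicates
    if (!seen.has(normalized)) {
      seen.add(normalized);
      result.push(normalized);
    }
  }
  return result;
}
```

The point of keeping `defaultEnabled` separate from the list is that de-selecting the default is itself a user choice worth persisting, rather than something the app silently re-adds on the next load.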
Then there is the generated data. Normally with a centralized archive provider you can be sure that their software is serving their archive because… it’s their server. But in this decentralized storage environment, mementos, or alleged “mementos,” can come from rogue servers or from absolutely nowhere at all. When a memento is prepared, its content should be hashed and that hash should be signed by the server. A client can then verify the hash of the contents and verify that the signature comes from a server-owned key. Whether the source is trustworthy is up to the end user, but you should at least be able to trust that the memento (or other computed/retrieved data object) comes from where it says it does.
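To make the hash-then-sign idea concrete, here is a minimal sketch using Node's built-in `crypto` module. The choices of SHA-256 and Ed25519, the `SignedMemento` shape, and the function names `attestMemento` and `verifyMemento` are my own assumptions for illustration, not a defined format. How the client learns the server's public key in the first place is a separate problem that this does not address.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical envelope the server would store alongside the memento.
interface SignedMemento {
  content: Buffer;   // the archived bytes
  sha256: string;    // hex digest of content
  signature: string; // base64 Ed25519 signature over the digest
}

// Server side: hash the content, then sign the hash with the server's private key.
function attestMemento(content: Buffer, serverPrivateKey: KeyObject): SignedMemento {
  const digest = createHash("sha256").update(content).digest();
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, digest, serverPrivateKey);
  return {
    content,
    sha256: digest.toString("hex"),
    signature: signature.toString("base64"),
  };
}

// Client side: recompute the hash, compare it, then check the signature
// against the public key of the server the memento claims to come from.
function verifyMemento(memento: SignedMemento, serverPublicKey: KeyObject): boolean {
  const digest = createHash("sha256").update(memento.content).digest();
  if (digest.toString("hex") !== memento.sha256) {
    return false; // content does not match the recorded hash
  }
  return verify(null, digest, serverPublicKey, Buffer.from(memento.signature, "base64"));
}
```

For a quick trial, a key pair can be generated with `generateKeyPairSync("ed25519")`, passing `privateKey` to `attestMemento` and `publicKey` to `verifyMemento`. A successful check only says which key signed the memento; whether that server is worth trusting is still the end user's call.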
To distill this into three basic principles: interoperable independent compute servers, freedom of server choice, and server attestation of outputs. Are there any dimensions I am missing?