Trust Spheres and Scalability
I've seen a lot of questions recently relating to Sia's limited scalability, and I wanted to explain what scalability means on the Sia network.
What makes decentralization difficult is that you don't trust anyone. If we trusted the world, we could rely on centralized models. The advantage to decentralization is that you don't need to trust anybody, but that comes at a heavy cost. Trustless decentralized storage is actually impossible. The whole point is that you are giving your data to someone else instead of keeping it yourself, and if you are giving away the data then there's a chance you won't get it back. So, Sia doesn't try to do trustless decentralized storage, and instead tries to do trust-minimal decentralized storage. We achieve minimal-trust data by: giving data to many hosts but only requiring that a few of them are available, paying the hosts for storing the data - but not until they've stored it for a long time, and by penalizing the hosts for losing the data. We make sure that the incentives are properly aligned, and after aligning the incentives we make sure that we're still tolerant to loss.
I want to introduce the idea of a 'trust sphere'. A trust sphere is a group of people or nodes that all trust each other. On today's network, each Sia daemon runs a separate trust sphere, meaning that the daemon doesn't trust anything but its own internal state. To gain confidence that data put on the cloud is safe, the daemon creates contracts with a large number of hosts. Each contract pays its host, and penalizes the host if the contract is not fulfilled. Each contract takes space on the blockchain, because without using the blockchain we are unable to align the incentives, especially if we plan on going offline, or if we expect the price of data to be changing at all. The host needs some reason to keep the data even if we aren't watching the host, and even if someone else comes along and tries to offer a better price. (Similarly, the host needs a blockchain so that it knows it's going to get paid after storing the data.) So we need, at a minimum, one contract on the blockchain per host we're going to use in our trust model. Sia is set up so that, out of N total hosts, only M have to follow the rules. You can pick any value for these, but it's way better if N is large and M is small.
Putting these contracts on the blockchain is expensive. Sia's block size is 2MB, and each relationship takes up about 2kb. So that means, in each block, we can form 1000 relationships. Every year, Sia can form about 52,000,000 relationships. Each trust sphere is going to use some volume of those relationships. Relationships have a lifetime, and some trust spheres might want lifetimes of 60 months (the maximum recommended time), some might want lifetimes of 3 days (for maximum reliability). And some trust spheres might want relationships with 20 hosts at a time (the recommended minimum), others might want trust relationships with 300 at a time (the recommended maximum).
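The capacity arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the block size and per-relationship size are the figures quoted in the post, while the 10-minute block interval is an assumption (it is what the 52,000,000 figure implies), and the function names are made up for illustration.

```python
# Back-of-the-envelope on-chain capacity, using the figures quoted in the post.
BLOCK_SIZE_BYTES = 2_000_000    # 2MB block (from the post)
RELATIONSHIP_BYTES = 2_000      # ~2kb per relationship (from the post)
BLOCK_INTERVAL_MINUTES = 10     # ASSUMPTION: implied by ~52,000,000 per year


def relationships_per_block(block_size=BLOCK_SIZE_BYTES,
                            relationship_size=RELATIONSHIP_BYTES):
    """How many relationships fit in one block if nothing else uses the space."""
    return block_size // relationship_size


def blocks_per_year(interval_minutes=BLOCK_INTERVAL_MINUTES):
    """Number of blocks produced in a 365-day year."""
    return (365 * 24 * 60) // interval_minutes


def relationships_per_year():
    """Upper bound on relationships the chain can record per year."""
    return relationships_per_block() * blocks_per_year()


if __name__ == "__main__":
    print(relationships_per_block())  # relationships per block
    print(relationships_per_year())   # relationships per year
```

This treats every byte of every block as available for contracts, so it is an upper bound rather than a realistic throughput figure.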
So Sia only has room for so many relationships per year, and each trust sphere is using a bunch of relationships. The nice thing is that once you have set up your relationship, you can put as much data on it as you want. You can give each host a large volume of data and it really doesn't affect how much you use the blockchain. If you plan on doing HUGE amounts of data, you need to be careful to pick hosts that can support HUGE amounts of data, but otherwise there are essentially no limits.
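To get a feel for how quickly trust spheres eat into the yearly budget, here is a rough sketch. It assumes that a sphere re-forms each host relationship on-chain once per lifetime and that the whole 52,000,000-per-year budget goes to contracts. Neither assumption is spelled out in the post, and the function names and example inputs are hypothetical.

```python
# Rough model of how many trust spheres the yearly relationship budget supports.
RELATIONSHIPS_PER_YEAR = 52_000_000  # yearly budget quoted in the post


def sphere_relationships_per_year(hosts, lifetime_days):
    """Relationships one sphere puts on-chain per year.

    ASSUMPTION: each of the sphere's host relationships is re-formed
    on-chain once every `lifetime_days`.
    """
    return hosts * 365 / lifetime_days


def max_trust_spheres(hosts, lifetime_days, capacity=RELATIONSHIPS_PER_YEAR):
    """Number of identical spheres that fit in the yearly budget."""
    return int(capacity / sphere_relationships_per_year(hosts, lifetime_days))


if __name__ == "__main__":
    # 20 hosts with one-year relationships
    print(max_trust_spheres(hosts=20, lifetime_days=365))
    # 100 hosts with 73-day relationships (five renewals per year)
    print(max_trust_spheres(hosts=100, lifetime_days=73))
```

Shorter lifetimes and more hosts per sphere both shrink the number of spheres the chain can carry, which is why the choice of N and of contract duration matters so much for scalability.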
52,000,000 relationships per year is probably enough for somewhere between 50,000 and 1,000,000 trust spheres total. Bigger players are probably going to want more relationships. So the question is, how do we get Sia to scale given the limitations? We have a few options:
- Increase the blocksize. This has a whole host of problems and tradeoffs, and is being heavily explored in the Bitcoin world right now. It's an option for us, but not a great option because the scaling is linear.
- Decrease the size of contracts. There might be some room here for optimization. Today, a full contract (precursor transaction, file contract, file contract revision, storage proof) is about 2kb. In theory, we could probably get this down to about 1kb. Lots of work, not a ton of payoff. Scalability boost is 2x.
- Decrease the number of relationships per trust sphere. This is where it starts to get interesting. In the current setup, each trust sphere connects directly to a host. But maybe trust spheres could connect to a hub, and then leverage the hub's relationships. The thing is, the hub is controlling the relationship, so you need some way to guarantee that the hub can only make changes that you approve of. This is difficult when the hub is talking to 100s or 1000s of nodes, because they'd basically all have to write signatures that say 'yeah, this change doesn't hurt me', and a single party could block the change. An alternative approach might be to make a new type of 'append-only' file contract, such that you connect to a hub with a huge penalty for losing data (and assume that they'll outsource the data), and then you have some proof-of-fraud that you can create/submit if the hub makes a change which is not append-only. The scaling potential from using hubs is probably 100-10,000x, but there seem to be significant security tradeoffs here: you very much leave the crypto world and get stuck in the game-theory and weakness-to-denial-of-service world.
- Increase the number of humans per trust sphere. (Similar: increase the amount of data per trust sphere.) This is the route that Sia is currently preferring. In the case of enterprises, we're optimizing for data per trust sphere. If you've got 10,000 people on the Sia network each storing 100,000 TB, Sia is a pretty healthy ecosystem. On the other hand, that means it's probably completely inaccessible to anyone storing less than 100TB of data, simply because the transaction fees will outprice them. But if you trust, for example, your local library, then everyone in your town could get access to Sia as a single trust sphere through the local library. You still end up needing to trust someone else, but instead of trusting someone random you are trusting someone you know, and someone you can sue if they break the trust. Scalability here is probably between 5x and 1000x.
- Make relationships semi-permanent and renewable. This is the technique used by the lightning network. Relationships are basically two way payment channels, which means that you can send money in both directions. If the host is spending all of their money back through the renters that are paying them, they can essentially refresh the dollar value of the contracts, and end up storing a very large amount of data for what on-chain appears to be a very small amount of money (but, the chain doesn't see all of the circulation that's happening which makes this possible). This means that instead of lasting a few weeks, relationships could be made to last forever. Scalability boost is probably between 3x and 100x.
- Blocksize safety improvements from Bitcoin. It's unclear what direction these are going to go, but things like IBLT, weak blocks, and a handful of other accomplishments make it likely that the 2MB blocks of today will be able to be increased to 4MB or 8MB without increasing node cost or miner centralization, and without requiring advances in cpu speeds or network speeds. Scalability here is 2x-20x.
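One way to play with the ranges in the list above is to multiply together whichever options you think will actually land. This is a sketch only: it assumes the boosts compose multiplicatively, which the post does not claim, and the dictionary keys are labels invented here. The plain blocksize increase is left out because no multiplier is given for it.

```python
from math import prod

# (low, high) scalability multipliers as quoted in the list above.
# The key names are hypothetical labels, not Sia terminology.
BOOSTS = {
    "smaller_contracts": (2, 2),
    "hubs": (100, 10_000),
    "shared_trust_spheres": (5, 1_000),
    "renewable_relationships": (3, 100),
    "blocksize_safety": (2, 20),
}


def combined_boost(names):
    """Return (low, high) combined multiplier for the chosen options.

    ASSUMPTION: the individual boosts are independent and multiply.
    """
    lows = [BOOSTS[name][0] for name in names]
    highs = [BOOSTS[name][1] for name in names]
    return prod(lows), prod(highs)


if __name__ == "__main__":
    # Only the two least speculative options land.
    print(combined_boost(["smaller_contracts", "blocksize_safety"]))
    # Shared trust spheres plus long-lived relationships.
    print(combined_boost(["shared_trust_spheres", "renewable_relationships"]))
```

The upper ends get large very quickly, so the low ends are the numbers worth taking seriously when only a few of these improvements are expected to come through.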
We're still actively exploring the options. If we do end up being a primarily-enterprise solution, we actually get some scalability bonuses. It's not really reasonable to expect a home user to run a $20/mo node that can support 8MB blocks, but it is reasonable to expect an enterprise user to run a $100/mo node that can support 16MB blocks (at least, it is when your data costs are 100x that), which means that we go from 50 million relationships per year to 400 million relationships per year. So, in the enterprise world Sia really does not seem to have scaling issues. 10,000 enterprises is a lot, and if we get there it's likely that we'll be able to do some brute-force stuff to get to 100,000 or 1,000,000 enterprises, which again is a huge number.
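The enterprise numbers can be sanity-checked with linear scaling from the 2MB baseline. This sketch assumes capacity grows in direct proportion to block size and that the budget is split evenly between enterprises. Both are simplifications, and the function names are made up for illustration.

```python
# Linear scaling of the yearly relationship budget with block size.
BASELINE_BLOCK_MB = 2                          # today's block size (from the post)
BASELINE_RELATIONSHIPS_PER_YEAR = 52_000_000   # today's yearly budget (from the post)


def relationships_per_year(block_mb):
    """Yearly budget at a given block size, ASSUMING purely linear scaling."""
    return BASELINE_RELATIONSHIPS_PER_YEAR * block_mb // BASELINE_BLOCK_MB


def relationships_per_enterprise(block_mb, enterprises):
    """Even share of the yearly budget per enterprise trust sphere."""
    return relationships_per_year(block_mb) // enterprises


if __name__ == "__main__":
    print(relationships_per_year(16))                 # budget with 16MB blocks
    print(relationships_per_enterprise(16, 10_000))   # share for each of 10,000 enterprises
```

With 16MB blocks and 10,000 enterprises, each one gets tens of thousands of relationships per year, which is far more than the 20 to 300 hosts a single sphere is expected to use at a time.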
There's also a possibility (though it seems mostly unlikely at this point) that we could get some type of SPV data storage working. This again means that end-user security is reduced, but perhaps is still high enough to be worth using and then brings the transaction costs low enough that it makes sense to be a separate trust sphere.
The Sia protocol today can pretty easily support most of the world's enterprise customers. The Sia protocol today definitely cannot support most of the world's consumers, but in the near term it won't have to. Until Sia has a market cap that looks like Bitcoin, Sia will be fully accessible to enterprises and end-users alike. After that, it will mostly only be available to enterprises, but anyone storing more than about 100TB of data should find the transaction-fee overhead to be under 10%. As that number grows, it'll actually make economic sense to continue increasing the blocksize.
A handful of potential theoretical improvements to the Sia protocol offer a collective potential boost anywhere from 100x to 10,000x in scalability. That makes Sia accessible to billions of people, though not good enough to support an entire Internet of Things. In all likelihood, only a handful of these potential scalability improvements will end up coming through, though there may be some others over the next 3-5 years which surprise us and add a ton of benefits. We'd probably need to hardfork to take advantage of some of these improvements, and we are unlikely to have any of them in the next 18 months.
Interesting, I see Internet of Things mentioned in your post. Can Sia's smart contracts be expanded into such things?
I like it. Thanks a lot for updating Taek :)
Sia does not do anything more than storage, but you could configure an IoT device to use Sia's storage layer, while using Ethereum or some other platform to do other actions. Except, Sia won't hit IoT scale, so unless that device is moving massive amounts of data, it'll need to participate in using a shared trust sphere.
I bet in 3 or 4 years some new approaches will be available, even though the current solution is good enough. I know of two unique pieces of research on scalability. One is Bob McElrath's "Braiding the Blockchain" - (https://scalingbitcoin.org/hongkong2015/presentations/DAY2/2_breaking_the_chain_1_mcelrath.pdf). The other is Serguei Popov's "Tangle" - (http://18.104.22.168/tangle.pdf). Both are DAG-like consensus designs (but I am not a technical person and have no idea what a DAG is). Tangle is being implemented in IOTA right now. IOTA is also using a light PoW with Tangle. The PoW part is not in the Tangle white paper. It will be interesting to see how Tangle works in IOTA over the next couple of months.
Sergio Lerner's comments on DAGs - https://bitcointalk.org/index.php?topic=1177633.0
A new research paper on scalability - "On Scaling Decentralized Blockchains" http://fc16.ifca.ai/bitcoin/papers/CDE+16.pdf
Now it makes sense to me why scalability is limited.