Upload incentive



  • I agree that it's good enough for normal usage, but it seems to also allow for a DoS?

    Because if I create a very large number of (fake) hosts that offer competitive pricing for storage but are not interested in making money or keeping the data, the whole network will become unreliable for new files as soon as they make up a substantial percentage of the network.

    And since all it takes to create a fake host is an IP address and very little CPU/disk, it doesn't seem that expensive to double or even triple the network in size.

    @LjL said:

    I think the "reputation" system may be boosted a lot by making it distributed (like the rest of Sia), instead of private to the individual renters. The network as a whole should know which hosts are reliable.

    This is exactly what is missing to make Sia finished. Without it, the whole network will become unreliable in storing new data during attacks, especially for new users.

    @Taek said:

    1. Paid downloads. A host that is offline is missing out on revenue from serving files to renters. Other hosts will be getting that revenue instead. For files that are not accessed frequently, this is less useful.

    Is there anything implemented for this already? Because I can think of some use cases where the uploader is not the downloader, and payment would be necessary to compensate the host for the bandwidth.


  • admins

    Of course, this is easier said than done...

    Yes, much easier said :P The problems here are significant, and really we need some breakthroughs in research before we can expect to have a sufficiently reliable system in the real world. As far as I'm aware, the cutting-edge of research hasn't made much progress on this.

    I agree that it's good enough for normal usage, but it seems to also allow for a DoS?

    Because if I create a very large number of (fake) hosts that offer competitive pricing for storage but are not interested in making money or keeping the data, the whole network will become unreliable for new files as soon as they make up a substantial percentage of the network.

    That's called a Sybil attack, and it is a separate class of problem from what I was discussing above. We [will] fight Sybil attacks using a proof-of-burn mechanism that makes it expensive to be a host. If you are a host with lots of storage that is making lots of money, it's a cost of business and it won't be a huge percentage. But if you are trying to emulate a huge amount of fake storage, you're going to need a ton of money to pull off the attack. You'll need to burn many times as many coins as the rest of the network combined. If the average host is burning 1% of their profit, you'll need to burn enough coins to match 5-10% of the entire historical profit of the whole network. It won't be cheap, and you won't be getting those coins back.
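The arithmetic behind that estimate can be sketched as follows. This is only an illustration of the numbers in the paragraph above: the function name, the 1,000,000-coin network profit, and the 5x/10x multiples are assumptions chosen for the example, not values taken from Sia.

```python
def sybil_burn_cost(network_profit, honest_burn_rate, attacker_multiple):
    """Coins an attacker must burn to out-burn the honest hosts.

    network_profit:    total historical profit of all honest hosts (hypothetical)
    honest_burn_rate:  fraction of profit an average host burns (e.g. 0.01)
    attacker_multiple: how many times the honest burn the attacker must match
    """
    honest_burn = network_profit * honest_burn_rate
    return honest_burn * attacker_multiple


network_profit = 1_000_000  # hypothetical figure, in coins
for multiple in (5, 10):
    cost = sybil_burn_cost(network_profit, 0.01, multiple)
    share = cost / network_profit
    print(f"{multiple}x the honest burn: {cost:,.0f} coins ({share:.0%} of network profit)")
```

The point of the sketch is that the attacker's cost scales with the whole network's accumulated profit, while an honest host only pays a small fraction of its own.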

    This is exactly what is missing to make Sia finished. Without it, the whole network will become unreliable in storing new data during attacks, especially for new users.

    Not true. It's fine, if suboptimal, to have renters be their own sole source when determining which hosts have good reputation. New users are more at risk, but we can distribute trusted reputation 'starting points' that people can optionally download to limit the amount of risk. Because all of the proofs are on the blockchain, as are all of the announcements, there are a number of things we can do to minimize the risk for people who are newer or aren't online very often.

    Is there anything implemented for this already? Because I can think of some use cases where the uploader is not the downloader, and payment would be necessary to compensate the host for the bandwidth.

    It's on the way, but it's not done yet. Should be done in the next sprint or two.



  • @Taek said:

    Because all of the proofs are on the blockchain, as are all of the announcements, there are a number of things we can do to minimize the risk for people who are newer or aren't online very often.

    This is why I mentioned somewhere at some point that perhaps the fact you have a blockchain in Sia may provide a "hook" to a distributed reputation system, even though that's a problem that can't otherwise be solved yet.

    Namely, couldn't there be a way for renters to gauge a host's past reliability (so, its reputation) based on whether they had transactions with the same renters repeatedly, and whether any of those resulted in forfeited collateral? Or something else of that sort.
    I suppose that since address reuse is avoided, it's not easy (or even desirable) to be able to follow these things, but, just wondering.


  • admins

    Namely, couldn't there be a way for renters to gauge a host's past reliability (so, its reputation) based on whether they had transactions with the same renters repeatedly, and whether any of those resulted in forfeited collateral? Or something else of that sort.

    It's difficult because both successful file contracts and failed file contracts can be faked. You can easily create a file contract that pays out to the host with both collateral and payment that has a fake Merkle root, meaning it'll be guaranteed to fail. Anyone looking will think that the host failed to solve the proof, because it went to their address. But... the host did not agree to the file contract.

    Similarly, it's easy for a host to make a fake renter and upload files to itself, making it look like the host is honest, when in fact all of the renters and data were fake.

    We might be able to get somewhere if we look for histories where both the host and the renter involved in the file contract have histories with many other renters and hosts, but then you lose anonymity and have to engage in some form of address reuse. Additionally, someone could still create an entire ecosystem of fakes and it would be hard to tell the fake participants from the real participants. Once you've been around for a while, you may have a set of hosts which you trust to be real. From there, you can look for renters who have used the hosts and then look at what other hosts the renters have also had success with.

    But again, it's not clear that an attacker couldn't manipulate this in some way. And it's a lot to implement. But we might be able to achieve this at some point. I'm sceptical.
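The "start from hosts you trust, then follow their renters" idea above could be sketched roughly like this. It assumes that renter and host identities are visible and stable on-chain (in other words, the address reuse mentioned above), and the data layout and function name are hypothetical. As noted, an attacker who fakes an entire ecosystem could still game it.

```python
def candidate_hosts(trusted_hosts, successful_contracts):
    """Rank unknown hosts by how many renters of trusted hosts also used them.

    trusted_hosts:        set of host IDs the user already trusts
    successful_contracts: list of (renter_id, host_id) pairs for contracts
                          whose storage proofs succeeded (hypothetical layout)
    Returns a dict mapping host ID -> number of distinct vouching renters.
    """
    trusted = set(trusted_hosts)

    # Step 1: renters who have had success with a host we already trust.
    known_renters = {renter for renter, host in successful_contracts if host in trusted}

    # Step 2: other hosts that those renters have also had success with.
    vouchers = {}
    for renter, host in successful_contracts:
        if renter in known_renters and host not in trusted:
            vouchers.setdefault(host, set()).add(renter)

    return {host: len(renters) for host, renters in vouchers.items()}


contracts = [
    ("renter1", "hostA"), ("renter1", "hostC"),
    ("renter2", "hostA"), ("renter2", "hostC"),
    ("renter3", "hostD"), ("renter3", "hostE"),  # no link to any trusted host
]
print(candidate_hosts({"hostA"}, contracts))
```

Hosts that are only ever used by renters with no connection to a trusted host never show up in the result, which is the property the paragraph above is after.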



  • The only way I would feel safe storing important backups in Sia would be if I had a very good whitelist of trusted hosts in my client.

    Now in that case, suppose instead of IPs of Sia hosts it contained names like 'Amazon S3', 'Dropbox', 'Google', would you call that a decentralized storage client? That's my problem with seeing Sia as really decentralized storage right now, if it needs a trustlist by design.

    But maybe the strong point of Sia is more the built-in way to pay for storage, instead of being really distributed. But that payment system becomes much less needed when you already need to trust the host, because if I know who is hosting my data, I could just as well ask for a bill afterwards (like Amazon, Google, etc.).

    I still really love the idea and technology behind Sia, but it seems it will work unreliably for new users without those 'reputation starting points', and that removes the decentralization, because if Bitcoin had a whitelist of miners whose blocks are trusted, would you still call it really decentralized?


  • admins

    Now in that case, suppose instead of IPs of Sia hosts it contained names like 'Amazon S3', 'Dropbox', 'Google', would you call that a decentralized storage client? That's my problem with seeing Sia as really decentralized storage right now, if it needs a trustlist by design.

    Decentralization is a spectrum. You can be more decentralized, and you can be less decentralized. The case where you are storing data across multiple providers is more decentralized than just giving data to a single provider, and is the core way that Sia operates.

    I still really love the idea and technology behind Sia, but it seems it will only work until people start attacking.

    What types of attacks are you concerned about? A host that is dropping data or has a low uptime is going to have a much lower revenue (and probably a negative revenue) than a host that has high uptime and is not dropping data. The difference is going to be 10x or more. Sia is additionally a highly competitive network: only the cheapest and most reliable hosts get selected for storage, and only hosts that can keep up are going to be able to profit. One of the major strengths of Sia is that if you want to cheat the network, or attempt corruption, it's going to be extremely expensive.

    The only way I would feel safe storing important backups in Sia would be if I had a very good whitelist of trusted hosts in my client.

    I believe that you are just missing the strengths of Sia. The whole idea behind Sia is that the blockchain enables you to trust unknown hosts because you know that they are going to be hurting financially if they are unreliable. You don't need to know a host's reputation to be certain that they will be losing money for losing your data. This is combined with redundancy. If a large number of the hosts holding your data go offline, your data is still recoverable! Sia has a large number of protections against malicious hosts.

    Our upcoming whitepaper will go into this with much greater detail.



  • @Taek said:

    Sia is additionally a highly competitive network: only the cheapest and most reliable hosts get selected for storage, and only hosts that can keep up are going to be able to profit.

    If the client automatically selects the most reliable hosts (and not only the cheapest), then how does it determine this? Because I always supposed it just picks them randomly from the pool of available hosts.

    I believe that you are just missing the strengths of Sia. The whole idea behind Sia is that the blockchain enables you to trust unknown hosts because you know that they are going to be hurting financially if they are unreliable.

    If they provide the proofs to the blockchain, they are not really hurt financially, because they make the same money as a host that is actually uploading data back to the renter, and they even save on bandwidth. But as long as the vast majority of the network acts sanely and in its own best interest, Sia will work fine.

    I was just worried about the case where the majority of your peers don't return data, but I didn't realise the host had to burn coins to take the contract in the first place, so you're right that it's too costly.

    If a large number of the hosts holding your data go offline, your data is still recoverable! Sia has a large number of protections against malicious hosts.

    This is not really true. I have many years of experience repairing data using Reed-Solomon (Par2 on Usenet), and it works very well when you are missing relatively small chunks, for example 1% of your data. But in Sia you're much more likely to lose larger parts (because the same host stores many parts while the network is still small), and those erasure codes are not efficient if you are missing 25% of the data. So my feeling is that it could be more efficient to just store it two or three times than to store it once plus repair data.

    EDIT: I actually measured it now, and the repair data for a 10 MB file is only 100 KB larger than storing the file twice, so I was wrong about the above.



  • @Muis The erasure codes are as efficient as you tune them to be, by setting the desired amount of redundancy. If they've only worked with 1% of missing data in your experience, that means you were using them with a very low redundancy of little more than 1x. I believe that currently, Sia uses 6x, which would allow for well more than 25% of hosts to go offline, although plans are to reduce that as the network gets more reliable.
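The relationship between redundancy and tolerated loss can be made concrete with a small sketch. It assumes an ideal k-of-n erasure code (any k of the n pieces are enough to recover the file) and hosts that fail independently; the 4-of-24 split and the 75% uptime are hypothetical values picked to give 6x redundancy, not Sia's actual parameters.

```python
from math import comb


def tolerated_loss_fraction(redundancy):
    """Fraction of pieces that can be lost under an ideal k-of-n code,
    where redundancy = n / k."""
    return 1 - 1 / redundancy


def recovery_probability(k, n, p_online):
    """Probability that at least k of n pieces are reachable when each
    host is independently online with probability p_online."""
    return sum(
        comb(n, i) * p_online**i * (1 - p_online) ** (n - i)
        for i in range(k, n + 1)
    )


# 6x redundancy tolerates losing 5/6 of the pieces.
print(f"6x redundancy tolerates {tolerated_loss_fraction(6):.1%} loss")

# Tolerating only 1% loss corresponds to barely more than 1x redundancy.
print(f"1% loss tolerance needs {1 / (1 - 0.01):.4f}x redundancy")

# Hypothetical 4-of-24 layout (6x) with every host online only 75% of the time.
print(f"P(recoverable) = {recovery_probability(4, 24, 0.75):.12f}")
```

Under these assumptions a file stays recoverable even when a large share of hosts is offline, which is the tuning knob described above.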

    But Taek can cover all this, and other points you touched, in better detail.

    I will point out, however, that if you really insist on only using "trusted" hosts of the Amazon S3 type, then there are other decentralized solutions that do not have all these peer-to-peer and blockchain mechanisms attempting to let you trust any host: namely, you should have a look at Tahoe-LAFS (I hope Taek won't be annoyed by me mentioning the competition, since... it's not really competition).



  • I will point out, however, that if you really insist on only using "trusted" hosts of the Amazon S3 type, then there are other decentralized solutions that do not have all these peer-to-peer and blockchain mechanisms

    No, I'm only interested in Sia because of the P2P aspects, and I was worried about having to use a whitelist, but I did not really realise it also contained a proof-of-burn requirement for hosts. That makes it a lot easier for me to trust any random node, because I know I am not likely to be surrounded by spoofed ones.

