Drive Pooling...



  • On one of my hosts, instead of using RAID, which in my opinion is a bit dated, I used StableBit DrivePool to pool all of my disks together, which in turn lets me enable file/folder replication and so on. DrivePool then presents the pooled disks as a single large drive. However, because Sia pre-allocates space on a disk, it will only use the space on a single drive, not the large pooled drive.

    Will this always be the way Sia hosting works, i.e. pre-allocating disk space?



  • Sia technically doesn't pre-allocate it. You can right-click the file and see that the "Size on disk" is nowhere near the actual file size (until you accrue contracts and start receiving data).
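
The gap between a file's nominal size and the space it really occupies can also be checked from a script. Below is a minimal Python sketch, assuming a POSIX system where `os.stat` exposes `st_blocks` (it does not on Windows). The helper names are made up for illustration, and nothing here is taken from Sia's code or shows how Sia creates its files.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_size, allocated_bytes) for a file.

    st_blocks is counted in 512-byte units on POSIX systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file whose length is `size` without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # File name borrowed from the thread, used here only as a label.
        path = os.path.join(d, "siahostdata.dat")
        make_sparse_file(path, 100 * 1024**2)  # 100 MiB apparent size
        apparent, allocated = apparent_and_allocated(path)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On filesystems that support sparse files, the allocated figure stays far below the apparent size until data is written. On filesystems that do not, the two figures will be close.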

    You are just using software RAID instead of hardware RAID. (IIRC, StableBit is essentially just RAID 1E.)

    You might find it better to simply use each drive separately, and add each drive as a share folder in Sia.

    --
    SiaMining.com -- Your PPS Sia Pool.
    I'm not affiliated with, and don't work on, the Sia/Nebulous team.



  • @xurious You are giving wrong info. Currently Sia pre-allocates ALL the space before any contracts are set up.
    The siahostdata.dat files use the full disk space. I.e., if you set a 100 GB maximum in the client, Sia pre-allocates all 100 GB immediately and siahostdata.dat consumes 100 GB on disk.

    And this is wrong and very bad behavior. The client should reserve space on a disk only after signing a contract with the renter(s) (though perhaps before the renter actually uploads files to the host, to ensure availability on the first request without delays), and only in the volume for which such contracts were signed.

    Until that moment, the disk space belongs to me (the host owner) and I want to be able to use it for other purposes.



  • @Mad_Max
    You are right about Sia preallocating the offered storage space.
    But I don't agree with the principle of using the space for anything else until a contract is signed. That would enable intentionally bad hosts that offer (pretend to offer) space they don't really have while collecting the contract creation fees.

    That is why the storage price in SC should rise. Hosting is not a "money for free" scheme. Hosts put collateral at risk and make the storage space unavailable for anything else.

    I don't offer all my storage at the current prices, I offer only my "spare" space. If the demand (and price) rises, I'll increase the offered space and buy additional drives.



  • @reinisp
    If a "bad" host has no real space on its disks, it cannot collect any contract creation fees.
    The right algorithm, as I see it:

    1. The host owner sets the total maximum space the Sia client is allowed to use. Example: 2000 GB.
    2. The client checks that this space actually exists and is available to the program (by trying to create file(s) up to this size), but does not pre-allocate it with huge empty files: it creates them and deletes them again. (Otherwise all 2000 GB is wasted: nobody can use it and nobody will ever pay anything for it, while the disks run 24/7 to store 2000 billion useless zero bytes.)
    3. While the host has no incoming contracts, the client should not use any actual disk space (not counting the SC blockchain and other overhead).
    4. After some time, a renter sends a contract offer to this host. For example: rent 50 GB for 3 months.
    5. The client pre-allocates 50 GB on disk (e.g. creates a 50 GB empty file), and only after pre-allocation signs the contract and sends it back to the renter.
    6. Once the contract is signed by both sides, the space is pre-allocated by the host, the host's collateral for 50 GB is frozen in the contract, and the renter's payment for up to 50 GB * 3 months of storage plus the contract creation fee is frozen in the same contract.

    At this moment the 50 GB is still not used by anybody, but it is dedicated/reserved for this specific renter and ready to use at any time. Any uploads from the renter to the host happen ONLY after the contract is signed and the disk space is pre-allocated.

    And if a bad host tries to cheat after signing the contract, he will simply lose the collateral allocated to this contract. This loss will be much bigger than the contract creation fee risked by the renter (typical contract creation fees are now in the 1-5 SC range and typical collateral is in the 100-500 SC per TB*month range, so in this example 50 GB * 3 months = 15-75 SC of collateral lost by the rogue host).
    So a "bad" host will punish himself.

    P.S.
    Sia already works like the algorithm above, with one exception: the current Sia client preallocates the space at step 2, although this should only happen at step 5.
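
The six steps proposed above can be sketched in a few lines. This is an illustration under stated assumptions, not Sia's implementation: the `HostStorage` class, its method names, and the one-file-per-contract layout are all hypothetical. It swaps the create-and-delete probe of step 2 for a `shutil.disk_usage` query, and it relies on `os.posix_fallocate`, which is available on Unix only.

```python
import os
import shutil
import tempfile


class HostStorage:
    """Hypothetical host-side bookkeeping for the proposal; not Sia's code."""

    def __init__(self, folder, max_bytes):
        self.folder = folder
        self.max_bytes = max_bytes   # step 1: limit set by the host owner
        self.reserved = {}           # contract id -> bytes reserved on disk

    def free_to_offer(self):
        """Steps 2-3: what could be offered right now, without holding it."""
        on_disk = shutil.disk_usage(self.folder).free
        unreserved = self.max_bytes - sum(self.reserved.values())
        return max(0, min(on_disk, unreserved))

    def reserve_for_contract(self, contract_id, size):
        """Step 5: allocate just before signing. False means decline."""
        if size <= 0 or contract_id in self.reserved:
            return False
        if size > self.free_to_offer():
            return False
        path = os.path.join(self.folder, f"{contract_id}.dat")
        try:
            with open(path, "wb") as f:
                # Ask the filesystem to allocate the blocks (Unix only).
                os.posix_fallocate(f.fileno(), 0, size)
        except OSError:
            if os.path.exists(path):
                os.remove(path)
            return False
        self.reserved[contract_id] = size
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        host = HostStorage(folder, max_bytes=4 * 1024**2)
        print(host.reserve_for_contract("contract-1", 1024**2))      # fits
        print(host.reserve_for_contract("contract-2", 8 * 1024**2))  # too big
```

In this sketch the host only signs (step 6) when `reserve_for_contract` returns `True`, so no disk space is held for offers that never arrive.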



  • @Mad_Max
    I think offering something you cannot deliver is cheating or fraud. Why should a renter or the whole network assume you really have as much space as you have announced (shared)? If the shared space is not guaranteed, it is useless information for the network and for host scoring. It is then simply a local variable.
    Have you considered what happens if someone wants to rent a terabyte? Should every host (in ranking order) attempt to create a file to test that there is 1 TB of space available? Currently that would not be a big problem, but imagine the Sia network with thousands or more potential contracts every second, some for a couple of GB, some for a couple of TB...

    Just do not offer/share storage space you cannot afford to share, or at a price you cannot afford. If you need the space for yourself, share less!



  • @mad_max: I am almost certain Sia thin-provisions the space used for hosting. I know this because I have 5 TB offered to the network, but my VM is only a few dozen GB in size.

    The reason the host preallocates space is to ensure the host has room for the contracts it accepts. Imagine accepting a 1 TB contract but having only 500 GB on the disk. The client slowly fills the disk up, as it is unaware how much of the contract will actually be used.

    Until that moment, the disk space belongs to me (the host owner) and I want to be able to use it for other purposes.

    You HAVE used it, though: you set the Sia folder size in the UI/siad. If you want to use the space for other purposes, simply choose a smaller size.

    You get penalized for having less than 4 TB of free space.

    The host (it's just for testing, hence the cheap pricing): https://siahub.info/host/3002

    The VM: https://supload.com/rJSNAcTV-

    Edit: @reinisp A "fraud" host simply punishes itself. It doesn't damage the network. It's straightforward to spoof the amount of disk space you have. The only possible way to ensure the space actually exists would be for Sia to preallocate it by writing incompressible data across the entire storage. Not that it would change anything.

    As it stands now, there's no reason NOT to pad all hosts with 4 TB of free disk space, as it neutralizes the penalty. Once the underlying storage is full, simply disable accepting new contracts.



  • @xurious said in Drive Pooling...:

    As it stands now, there's no reason NOT to pad all hosts with 4 TB of free disk space, as it neutralizes the penalty. Once the underlying storage is full, simply disable accepting new contracts.

    Indeed. Some even claim they have 500 TB or 1 PB of storage, but there's no way of knowing whether that's true today, or just an intention, or even just bragging!



  • @xurious , @maol
    Indeed. I had never looked at the filesystem utilisation, only at the siahostdata.dat files. These files appear to be the size set when they were added, but the filesystem utilisation says "nearly empty"...

    Anyway, I am against cheating. For me it would be OK to keep the files filled with incompressible random bytes as proof of having what I offer.
    The sad thing is that I can't tell others "Look, the Sia network has nearly 3 petabytes of space" without lying, because I know that if cheating is possible, people will do it.
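
Why random bytes rather than zeros matters for such a proof can be seen with a small experiment. The following is a minimal Python sketch using only the standard library. The `compressed_ratio` helper is a name invented here, and zlib stands in for whatever compression a filesystem or storage layer might apply.

```python
import os
import zlib


def compressed_ratio(data):
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    zeros = bytes(one_mib)               # what an empty, zero-filled file holds
    random_bytes = os.urandom(one_mib)   # stand-in for incompressible filler

    print(f"zeros:  {compressed_ratio(zeros):.4f}")
    print(f"random: {compressed_ratio(random_bytes):.4f}")
```

Zero-filled data shrinks to a tiny fraction of its size, while random data does not shrink at all, which is why only the latter actually occupies the space it claims on a compressing storage layer.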



  • @reinisp

    Anyway, I am against cheating.

    However, it is not cheating. No one loses here except the host who claims to have more space than they do. All the rules are being followed.

    The renter is secured against lack of space by collateral.

    The network doesn't care how much space each host has.

    For me it would be OK to keep the files filled with incompressible random bytes as proof of having what I offer.

    This is more something to look into. However, it does add extra cost for each host that isn't necessary for the security of the network.



  • @reinisp said in Drive Pooling...:

    @Mad_Max
    I think offering something you cannot deliver is cheating or fraud. Why should a renter or the whole network assume you really have as much space as you have announced (shared)? If the shared space is not guaranteed, it is useless information for the network and for host scoring. It is then simply a local variable.

    It is not cheating or fraud until money is taken. Fraud is all about money: no money, no fraud possible.
    So the available space should be checked and reserved before payment is committed to a contract between host and renter, not at first client setup.

    And I agree: this is almost useless information for the network and host scoring, because it
    is very easily counterfeited by dishonest hosts but strongly penalizes small honest hosts.
    It should simply be a local variable, an instruction to the local client, not the basis for host ratings.

    On a fully distributed trustless network without any centralized authorities, the only reliable guarantee is a cryptographically signed contract and the money put at risk in it.

    Have you considered what happens if someone wants to rent a terabyte? Should every host (in ranking order) attempt to create a file to test that there is 1 TB of space available? Currently that would not be a big problem, but imagine the Sia network with thousands or more potential contracts every second, some for a couple of GB, some for a couple of TB...

    Nothing interesting happens. Creating an empty test file is a simple, fast operation: only a fraction of a second is needed to perform such a check "on the fly", and its speed does not depend on the size of the file. And if we think about a distant future with a much bigger flow of contracts, the blockchain will be the main bottleneck, not the speed of file creation on hosts. The blockchain currently used in Sia is barely able to handle a few contracts per second as a sustained average rather than a short spike. And renters will in any case have to wait a few minutes for new contracts to be confirmed in the blockchain before they start using them to store data. A second of extra delay does not affect anything here.
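
The "create a test file and delete it" check described above can be timed directly. Below is a rough Python sketch, assuming a Unix system where `os.posix_fallocate` is available. The `probe_space` function is a hypothetical helper, not something the Sia client is known to contain.

```python
import os
import tempfile
import time


def probe_space(folder, size):
    """Try to allocate `size` bytes in `folder`, then delete the probe file.

    Returns (ok, seconds). Uses os.posix_fallocate, which is Unix only.
    """
    start = time.perf_counter()
    fd, path = tempfile.mkstemp(dir=folder, suffix=".probe")
    try:
        os.posix_fallocate(fd, 0, size)
        ok = True
    except OSError:
        ok = False  # for example, not enough free space on the device
    finally:
        os.close(fd)
        os.remove(path)
    return ok, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for size in (1024**2, 64 * 1024**2):
            ok, seconds = probe_space(folder, size)
            print(f"{size} bytes: ok={ok}, took {seconds:.4f} s")
```

How closely the timing matches the "independent of size" claim depends on the filesystem: where allocation is supported natively the call tends to be quick, but where it is not, the C library may emulate it by touching every block, which does scale with size.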

    Just do not offer/share storage space you cannot afford to share, or at a price you cannot afford. If you need the space for yourself, share less!

    This would work fine only if there were no big penalties for having less than 4000 GB of free storage space,
    as is the case in other distributed storage projects such as Storj. You simply set the volume you want to share, and it is used only when renters need it. There's no point in offering more than you actually want to share.

    But such penalties for free space exist in Sia. In the current network you need to WASTE 4000 GB of disk space to effectively rent out just 20 GB. If you rent out only 20 GB, you will be paid only for 20 GB, but the actual space used will be 4020 GB. If you want to rent out 100 GB, you need 4100 GB to avoid penalties.
    This nonsense provokes falsification of available-space data and distortion of network statistics by host owners.



  • @xurious said in Drive Pooling...:

    @mad_max: I am almost certain Sia thin-provisions the space used for hosting. I know this because I have 5 TB offered to the network, but my VM is only a few dozen GB in size.

    You probably use virtual machine software, which by default automatically compresses the VM's image data.
    On a real machine and a real drive, all 5 TB are used even if not a single contract with renters has been signed yet.

    The reason the host preallocates space is to ensure the host has room for the contracts it accepts. Imagine accepting a 1 TB contract but having only 500 GB on the disk. The client slowly fills the disk up, as it is unaware how much of the contract will actually be used.

    AFAIK in the current implementation such a big single request from a renter will be automatically split on the renter side into many 25-50 GB contracts with many hosts (minimum 40 hosts, I think?). And even if such contracts can exist and reach hosts, simply preallocate the disk space immediately before signing the contract, not days / weeks / years before the offer for such a POSSIBLE contract is received by the host. If there is not enough space available at that moment, the contract will not be concluded and the renter simply selects the next host from the long queue, and the host owner misses a good deal (so he will be financially motivated to avoid such situations whenever possible).



  • @Mad_Max

    You probably use virtual machine software,

    Correct, ESXi (shown in the image.)

    which by default automatically compresses the VM's image data.

    Incorrect. VMware ESXi does not have any capability for compression. ESXi, being an enterprise product, expects your storage to take care of compression if you need it. There is also no deduplication capability. Both operations are "heavy" and reduce overall performance in general scenarios.

    On a real machine and a real drive, all 5 TB are used.

    Without diving into the code, I'm assuming Sia uses a method similar to this: https://stackoverflow.com/a/7970410

    "...use these functions to pre-allocate the clusters for the file and avoid fragmentation..."

    The sectors are pre-allocated, but not actually used. Granted, YOU can not use them for anything, but they are empty and unused until filled with contract data.



  • @xurious , @Mad_Max
    From this discussion I conclude:
    The current use/implementation of the "shared space size" parameter is bad. On one hand the network needs to know the total available space; on the other hand its use in the host score leads to cheating. I don't know whether the existence of the space is currently checked before contract creation. If not, it is cheating/fraud (collecting the contract fees). The network should be fair regardless of the relative size of the fee: everything should be trustworthy and transparent, including the declared available space. If availability is checked before contract creation (assuming it can only be declared and not guaranteed), it should not be included in the host score calculation.

    I even think that including the shared space as a parameter in the scoring does more harm than good for performance. I think it is preferable for a renter if data can be uploaded to or downloaded from many smaller hosts rather than fewer but bigger hosts, if it happens concurrently, as the main limiting factor will be connection bandwidth anyway. Bigger hosts share their bandwidth between more renters, so each is slower for any single renter. With more but smaller hosts, the renter's performance would be limited more by the renter's available bandwidth than by the hosts' bandwidth.

    A side note: I think using compression or dedup for filesystems storing encrypted data is a complete waste of resources.
    And I still think that intentionally providing false numbers to get a better score is cheating.



  • @reinisp said in Drive Pooling...:

    I think we have come to a consensus.
    Using the amount of free space to evaluate host ratings does not guarantee anything (unless there is additional verification when signing a contract with a renter, but then the original space reservation is redundant and unnecessary). But it encourages the falsification of statistics by dishonest hosts and leads to centralization among honest ones.

    If you are an honest host planning to rent out only 1 TB, then an additional "spare" 4 TB (to avoid penalties and rating degradation) actually worsens your spending/income ratio 5 times. But if you are a huge disk farm planning to rent out, say, 50 TB, then an additional 4 TB adds only 8% in overhead costs. This is the way to centralize resources, although the stated objective of the project is the direct opposite.
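
The arithmetic behind the 5x and 8% figures is easy to reproduce for other host sizes. The sketch below is a small Python helper, where `padding_overhead` is a made-up name and the 4 TB default is the penalty threshold quoted in this thread, taken here as an assumption.

```python
def padding_overhead(rented_tb, padding_tb=4.0):
    """Return (total_tb, total/rented ratio, padding as a percent of rented)."""
    total = rented_tb + padding_tb
    return total, total / rented_tb, 100.0 * padding_tb / rented_tb


if __name__ == "__main__":
    for rented in (1.0, 50.0):
        total, ratio, percent = padding_overhead(rented)
        print(f"rent {rented} TB: provision {total} TB, "
              f"{ratio:.2f}x the rented space, {percent:.0f}% padding")
```

The fixed padding dominates for small hosts and nearly vanishes for large ones, which is the centralizing effect described above.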

    A side note: I think using compression or dedup for filesystems storing encrypted data is a complete waste of resources.
    And I still think that intentionally providing false numbers to get a better score is cheating.

    Yep, but only after the space is filled with actual, useful encrypted data received from the renter. Until that moment, it is one of many ways to easily fake available-space stats and get a high rating in the host queue: empty files compress to almost nothing, and you can fill all the space with other data while the client still thinks there is >4 TB of free space available on the disk(s).
    Yes, you can call it cheating. But it is cheating against other (honest) hosts, not against renters: a dishonest advantage in the competition for contracts.
    And the only reliable way to avoid this cheating is to make it meaningless by removing this stat from the rating calculation.
    Even filling empty files with random incompressible data will not help much (but will cause huge slowdowns in the client's work), because compression/deduplication is only one of the ways to fake this data. There are other ways too. ANY variable calculated on the local client side can be faked and cannot be trusted.

    Host age and host uptime, on the other hand, are calculated independently by other clients, and the host owner cannot fake them in principle. Therefore their use in the host rating calculation makes sense and is useful.

