Core Development: Small Sector Support

Considering that the foundation will increasingly take over protocol development, it would be logical to move the Nebulous Labs 'Small Sector Support' roadmap entry to the foundation roadmap. This would benefit most, if not all, development activity on the Sia protocol.

Small sector support is paramount to optimizing small-file handling and reducing the storage expenses that come with padding small files.

Although it is possible to pack multiple files into a single sector, there are a number of danger points in doing so: (i) One has to wait for additional files to become available for batching and packing. When this is done client-side, one runs the risk of not having enough files available within a reasonable amount of time (forcing the client to flush the sector without proper packing), or of an interruption of service causing the entire upload batch to fail. (ii) Object storage architectures work with single-object requests, making it non-trivial to accomplish sector packing, as each object has to be successfully uploaded before proceeding to the next.


I agree that this is an important feature to pursue. I’m not sure that it actually requires “small sectors,” though. What people really want is the ability to upload less than 4 MiB at a time, and I think it’s possible to achieve that without switching to a completely variable sector size.

First, some background on sectors (feel free to skip this if you are already familiar):

There are two basic units of storage on Sia: the segment and the sector. A segment is 64 bytes, and it is the smallest “addressable” unit: when you download from a host, you must download a multiple of 64 bytes, and those bytes have to be “aligned” to an offset that is a multiple of 64. That’s because our Merkle trees have a leaf size of 64 bytes.
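To make the alignment rule concrete, here is a minimal sketch (with invented names, not Sia's actual API) of the check a host would apply to a read request:

```go
package main

import "fmt"

// SegmentSize is the Merkle leaf size: the smallest addressable unit.
const SegmentSize = 64

// validRead reports whether an (offset, length) download request respects
// segment alignment: both must be multiples of 64 bytes.
func validRead(offset, length uint64) bool {
	return length > 0 && offset%SegmentSize == 0 && length%SegmentSize == 0
}

func main() {
	fmt.Println(validRead(128, 256)) // true: both are multiples of 64
	fmt.Println(validRead(100, 256)) // false: offset is not segment-aligned
}
```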

Relatedly, when a host completes a storage proof, that proof is for a single (randomly-chosen) segment, and this proof must match the Merkle root of the most recent contract revision. But aside from that, consensus does not impose any restrictions on how the host’s data is laid out, or how renters read/write that data. To take this to an extreme, a host could be running a SQL server used by the renter; as long as the renter trusts the host to faithfully update their SQL database, they can make “revisions,” and the host can provide a storage proof to consensus when asked.

In practice, the “trust” part is not easy to solve. We don’t know how to create an efficient SQL database with verifiably-faithful reads and writes. But we do know how to do this with a big flat file made of segments. So that’s the structure that the host uses.

However, segments are kind of unwieldy. Consider storing a 64 MiB file; that’s 1 million segments. In order to download the file later, we need to know what segments are in it; but if we store the hash of every segment, that’s 32 MiB of metadata! Alternatively, we could store just an offset and a length, identifying which part of the “big flat file” on the host contains our 64 MiB; but then we lose content-addressability, along with other important cryptographic properties.

This is where sectors come in. A sector is just 65536 contiguous segments, totaling 4 MiB of data. Like a segment, a sector can be addressed by its Merkle root. We can also construct Merkle proofs within a sector. Consequently, when we store file metadata, we can store a small number of sector roots rather than a much larger number of segment roots, without any loss of security. Our 64 MiB file now requires just 16 hashes instead of a million. (By the way, our choice of 4 MiB as the sector size was mostly arbitrary, attempting to strike a good balance between small and large files.)
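The arithmetic is easy to sanity-check; here is a small back-of-the-envelope sketch using the constants from this post (the names are illustrative):

```go
package main

import "fmt"

const (
	SegmentSize       = 64                              // bytes per Merkle leaf
	SegmentsPerSector = 65536                           // leaves per sector
	SectorSize        = SegmentSize * SegmentsPerSector // 4 MiB
	HashSize          = 32                              // bytes per Merkle root
)

func main() {
	const fileSize = 64 << 20 // a 64 MiB file

	segments := fileSize / SegmentSize
	sectors := fileSize / SectorSize
	fmt.Println(segments, "segment roots =", (segments*HashSize)>>20, "MiB of metadata") // 1048576 roots = 32 MiB
	fmt.Println(sectors, "sector roots =", sectors*HashSize, "bytes of metadata")        // 16 roots = 512 bytes
}
```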

Since sectors are the “unit of practical storage,” we built the host around sectors rather than segments. And the most important design decision here was to store contract data as a “big flat file” made of sectors. This means that sectors must be aligned to 4 MiB boundaries, and that the total size of a contract must be a multiple of 4 MiB. This is where the “minimum file size” comes from: since the host only deals in sectors, if you want to upload less than 4 MiB, you need to add padding.
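As a sketch of that layout (again with invented names): sector i lives at byte offset i*SectorSize in the flat file, and storing anything smaller than a sector still costs a full sector.

```go
package main

import "fmt"

const SectorSize = 1 << 22 // 4 MiB

// sectorOffset: sector i lives at byte offset i*SectorSize in the
// contract's "big flat file".
func sectorOffset(i uint64) uint64 { return i * SectorSize }

// paddedSize is the on-disk cost of storing n bytes when the host only
// deals in whole sectors; this is where the "minimum file size" comes from.
func paddedSize(n uint64) uint64 {
	return (n + SectorSize - 1) / SectorSize * SectorSize
}

func main() {
	fmt.Println(sectorOffset(3))     // 12582912: sector 3 starts at the 12 MiB mark
	fmt.Println(paddedSize(1 << 10)) // 4194304: a 1 KiB upload still occupies 4 MiB
}
```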

(end recap)

With all that in mind, let’s go over some potential solutions to the problem.

Option 1: Overhaul the host to operate on variable-size sectors. This prevents lots of nice optimizations in the host code that assume fixed-size sectors. There’s a very good reason why “block storage” exists: it is vastly simpler and more efficient than variable-size storage. Another issue is that this approach reveals the true size of the renter’s files to the host, which can be a significant privacy concern. (Although I suppose in practice you would still pad to 64 bytes, which is not quite as bad.) Another annoyance with this approach is that both the renter and host need to track the size of every sector, which makes lightweight renters more challenging.

Option 2: Add padding on the host side. That is, the renter says “here’s 1 KiB; assume that the rest of the sector is zeros.” The main issue here is that the renter would probably expect to pay less to store this 1 KiB file; after all, 99.97% of the sector will be zeros. But this assumes that the host can store 1 KiB much more efficiently than a full sector, and in order for that to be true, the host would have to violate its nice, efficient, fixed-size storage model. So in practice, hosts might still charge full price for padded sectors, and store fully-padded sectors on their disks. This is obviously a problem if the goal is to store millions of tiny files; it greatly decreases the host’s efficiency. This approach also suffers from the same privacy problem as option 1.
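For illustration, here is roughly what host-side padding amounts to (hypothetical code, not the actual host implementation). If the host materializes the padded buffer on disk, it has saved the renter some upload bandwidth but still spends a full sector of storage:

```go
package main

import (
	"bytes"
	"fmt"
)

const SectorSize = 1 << 22 // 4 MiB

// padToSector extends data with zeros to a full sector. A host that
// actually materializes this buffer stores a full sector either way, which
// is why hosts might still charge full price for padded sectors.
func padToSector(data []byte) []byte {
	sector := make([]byte, SectorSize)
	copy(sector, data)
	return sector
}

func main() {
	sector := padToSector(bytes.Repeat([]byte{0xAA}, 1024)) // "here's 1 KiB"
	fmt.Println(len(sector))                                // 4194304: a full sector nonetheless
}
```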

Option 3: Pack multiple files into one sector. This option was described quite well by @meije-storewise, so I won’t reiterate it here. I will note, though, that this is the only option (that I’m aware of) that doesn’t leak filesize metadata to the host.

Option 4: Pack files on the host side. Sort of a fusion of options 2 and 3; the idea is that the host maintains a special “sector buffer”, which only becomes a “true sector” once the renter has uploaded 4 MiB. One upside of this is that it avoids the pricing problem of option 2. I suspect that there is a fair amount of complexity lurking here, though; for example, what happens if you upload half a sector, then upload a full sector? Is the full sector directly written as a “true sector,” or is the first half of it appended to the buffer, then flushed, then the remaining half stored in the buffer?
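To make the flush question concrete, here is a hypothetical sketch of such a buffer; the policy choice I'm describing shows up in whether incoming full sectors are appended to the buffer or committed directly:

```go
package main

import "fmt"

const SectorSize = 1 << 22 // 4 MiB

// sectorBuffer accumulates small uploads until a full sector exists.
type sectorBuffer struct {
	buf []byte
}

// write appends data and returns any complete "true sectors" now ready to
// commit. This version always appends first, i.e. the "split the incoming
// full sector across the buffer" policy; a host could instead commit
// aligned full sectors directly and leave the buffer untouched.
func (sb *sectorBuffer) write(data []byte) [][]byte {
	sb.buf = append(sb.buf, data...)
	var full [][]byte
	for len(sb.buf) >= SectorSize {
		full = append(full, sb.buf[:SectorSize])
		sb.buf = sb.buf[SectorSize:]
	}
	return full
}

func main() {
	var sb sectorBuffer
	sb.write(make([]byte, SectorSize/2))       // half a sector: buffered
	full := sb.write(make([]byte, SectorSize)) // then a full sector arrives
	fmt.Println(len(full), len(sb.buf))        // 1 2097152: one sector flushed, half remains buffered
}
```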

I’m not sure which option is best (and there are certainly more I haven’t thought of), but I’d probably go with option 2 if I had to pick one today, in the hopes that an efficient implementation could be engineered.

Anyway, my intent here was to demonstrate that there are a number of ways we can tackle this problem that don’t involve fully-variable sector sizes. So I’d suggest that we call this feature “small file support” rather than “small sector support.”

Would be great for @Taek to weigh in here as well, since there’s probably something I missed.


Originally we planned to support small files in siad directly. You don't need small sectors on the network; you can patch around it in software with clever segmenting and sector management in your renter. We now plan to have Skynet support small files via a smaller sector size, for a few reasons:

  1. The fundamental Skylink needs to be a full sector. No easy way around that.
  2. Within the renter, we have compatibility promises related to backups and snapshots, and supporting small sectors while also preserving backup and snapshot functionality is very complicated, substantially more complicated than it seems on the surface.
  3. It looks like adding small sector support to the host is not going to be as difficult as originally imagined. As far as the host is concerned, sectors only exist in two places: on disk and in an in-memory map. We could add support for a smaller sector size by changing the in-memory map to also track how large each sector is (see the sketch below), and we could add support on disk by creating a new file for small sectors. The only tricky piece of code would be figuring out how to allocate between small sector storage and large sector storage, but in the worst case we could even punt that up the stack to e.g. SiaCentral by having the hosts choose it.
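As a rough sketch of that map change (invented names, not the actual host code), the in-memory map would gain a size field and a pointer to the small-sector file:

```go
package main

import "fmt"

// sectorLocation is what the host's in-memory map would track per sector;
// the size field is the new part.
type sectorLocation struct {
	file   string // which on-disk file holds it (large-sector vs. small-sector file)
	offset int64  // byte offset within that file
	size   int64  // 4 MiB today; 64 KiB for a small sector
}

func main() {
	sectors := make(map[[32]byte]sectorLocation) // keyed by sector Merkle root

	var root [32]byte // placeholder Merkle root
	sectors[root] = sectorLocation{file: "small-sectors.dat", offset: 0, size: 64 << 10}
	fmt.Println(sectors[root].size) // 65536
}
```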

I believe we could accomplish option 2, @nemo, just by adding compression to siamux on the plaintext. Then the renter could upload a string of zeros, and the compression (even very basic, low-compute-cost compression) would catch it and crush it down to nothing. This saves on bandwidth but, as you mentioned, does not save on storage.
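As a quick demonstration (standard library only; the siamux compression itself is the proposed change, not something that exists today), even the cheapest DEFLATE level crushes 4 MiB of zeros down to a few kilobytes:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

func main() {
	zeros := make([]byte, 1<<22) // 4 MiB of zeros, e.g. sector padding

	var out bytes.Buffer
	w, err := flate.NewWriter(&out, flate.BestSpeed) // lowest compute cost
	if err != nil {
		panic(err)
	}
	w.Write(zeros)
	w.Close()

	fmt.Printf("%d bytes -> %d bytes on the wire\n", len(zeros), out.Len())
}
```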

Changing the on-disk storage model for hosts is pretty significant; the on-disk model has huge implications for things like latency, and switching from fixed size to variable size could add a huge amount of complexity depending on how it's done.

Option 4 also needs to consider what happens when part of a packed sector expires and the rest of it does not. For the economics of the Sia network to work out, we really need to make sure that the host has all abuse vectors covered, and option 4 opens up a ton of possibilities that are difficult to program around, so I wouldn't go down that path.


TL;DR: I think adding support for a 64 KiB sector size to the host is the only option we have that doesn't involve a massive amount of engineering complexity. 64 KiB could easily be any power of two, but I think it would be difficult and engineering-intensive to support more than one additional size.


Thanks David and Luke for the input. It appears that the interests of the community are aligned on this topic. Thus far Skynet Labs, Filebase, StoreWise, and a number of other community members have explicitly stated that they are in favor of small sector support. Speaking for StoreWise, 64 KiB sectors would help us a great deal by reducing upload bandwidth and storage expenses and increasing our overall scalability. We would gladly see this proposal incorporated into the foundation roadmap for 2021.
