Small Grant: Sia Virtual Block Device (sia_vbd)

Ha, replicate is certainly too strong a word. I wrote in the proposal:

and later when I clarified things

I chose to compare how sia_vbd structures data to how Git works because I expect most technical people, especially developers, to be familiar with Git. This should help convey the concept more clearly. A commit in Git is very similar to what I referred to as State earlier. I even thought about naming it Commit, but felt that might lead to confusion. I’m still on the lookout for a good name to capture this concept.

Just like a Git commit refers to a specific state of the repository, a sia_vbd State refers to a specific state of the virtual block device—complete with an ID and everything. It offers similar benefits, such as deduplication, snapshotting (tagging), and branching. Importantly, nothing is overwritten; we just clean up unused data later during garbage collection, ideal for how Sia’s object storage works.

Being aware of where data is located and ensuring quick access when needed is one of the core competencies of sia_vbd. For example, a block could be:

  • Already buffered in our heap, ready to use
  • In the local disk cache
  • In the local write-ahead log (WAL)
  • Packed in one or more chunks, accessible via renterd
  • Nonexistent
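In Rust terms, these possibilities could be modeled roughly like this. This is an illustrative sketch only, not sia_vbd's actual types; the variant and function names are my own:

```rust
// Illustrative only (not sia_vbd's actual types): the places a block
// might live, ordered from fastest to slowest by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum BlockLocation {
    Heap,      // already buffered in memory, ready to use
    DiskCache, // in the local disk cache
    Wal,       // in the local write-ahead log (WAL)
    Chunk,     // packed in one or more chunks, accessible via renterd
}

// Pick the cheapest of all known locations for a block;
// `None` means the block is nonexistent.
fn best_location(known: &[BlockLocation]) -> Option<BlockLocation> {
    known.iter().copied().min()
}

fn main() {
    let known = [BlockLocation::Chunk, BlockLocation::DiskCache];
    // The local disk cache beats a round-trip to renterd.
    assert_eq!(best_location(&known), Some(BlockLocation::DiskCache));
}
```

Deriving `Ord` on the enum makes the declaration order double as a cost ranking, which keeps the "fastest available source" decision a one-liner.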

sia_vbd sees the big picture and does its best to retrieve blocks in the most efficient manner possible. This fairly comprehensive understanding is crucial for handling the biggest challenges in making sia_vbd usable in practice. Let me explain further:

A naive approach could look like this:

Reading:

  1. A read request comes in to read 1200 bytes at offset 47492749221.
  2. We calculate the block number(s) and relative offset(s), then request the data directly from renterd.
  3. The data is streamed directly to the nbd client.

Writing:

  1. A write request is received to write [0x4e, 0xab, 0x01 …] to offset 47492749221.
  2. We calculate the affected block(s) and download the associated object(s) from renterd.
  3. The block(s) are modified based on the data from the write request.
  4. We delete the object(s) downloaded in step 2 via renterd's API, as they are now outdated.
  5. The new block(s) are uploaded and stored as new object(s) under the same names as the ones we just deleted.
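The block arithmetic in step 2 of both paths boils down to integer division and remainder. Here is a minimal sketch, with the 256 KiB block size chosen purely for illustration:

```rust
// Minimal sketch of the block arithmetic in step 2.
// The 256 KiB block size is an assumption for illustration.
const BLOCK_SIZE: u64 = 256 * 1024;

/// Maps a request of `len` bytes (len > 0) at absolute `offset` to
/// (first block number, offset within first block, last block number).
fn affected_blocks(offset: u64, len: u64) -> (u64, u64, u64) {
    let first = offset / BLOCK_SIZE;
    let rel = offset % BLOCK_SIZE;
    let last = (offset + len - 1) / BLOCK_SIZE;
    (first, rel, last)
}

fn main() {
    // The 1200-byte read at offset 47492749221 from the example above
    // happens to touch a single block:
    let (first, rel, last) = affected_blocks(47_492_749_221, 1200);
    println!("block {first}, relative offset {rel}, last block {last}");
    assert_eq!((first, rel, last), (181_170, 120_741, 181_170));
}
```

A request that straddles a block boundary simply yields `last > first`, in which case multiple objects would have to be fetched or rewritten.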

This approach is certainly enticing—it’s easy to understand and straightforward to implement. However, while this method would work technically, it quickly collapses under real-world conditions.

Here’s why:

Latency and Throughput

Reading from (and to a lesser extent writing to) the Sia storage network is a high-latency affair that can vary wildly—it’s the nature of the beast. This is especially pronounced when reading lots of small objects; the Time to First Byte can sometimes run into seconds if you’re unlucky. We would end up with a block device whose throughput is measured in KiB per second, making it unusable in practice.

sia_vbd will do its best to avoid this by trading off implementation simplicity for lower latency and higher throughput:

The main aspects to achieve this are:

  1. Blocks are not tightly coupled to their “location”; instead, they are identified by their content (hash).
  2. Blocks are heavily cached locally.
  3. New, previously unknown blocks are first committed to the local Write-Ahead Log (WAL) before being batch-written to renterd, packed in Chunks.
  4. Because sia_vbd has the full picture, it can anticipate the need for a certain block before it is requested and prepare it ahead of time (e.g. read-ahead).
  5. Again, because of this full understanding, sia_vbd can rearrange the read queue and serve requests for blocks we have available locally.
  6. Further, blocks can be prepared in the background while read requests are waiting in the queue and then served in order of availability.
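Points 5 and 6 amount to reordering the read queue by local availability. A minimal sketch, with names of my own choosing rather than the actual implementation:

```rust
use std::collections::HashSet;

// Illustrative sketch of points 5 and 6: serve read requests for
// locally available blocks first, while the remaining blocks are
// fetched from renterd in the background.
fn reorder_queue(queue: &[u64], local: &HashSet<u64>) -> Vec<u64> {
    let (ready, pending): (Vec<u64>, Vec<u64>) =
        queue.iter().copied().partition(|b| local.contains(b));
    // Locally available blocks go first; the rest keep request order.
    ready.into_iter().chain(pending).collect()
}

fn main() {
    // Blocks 2 and 5 are cached locally; 1 and 3 live on the network.
    let local: HashSet<u64> = [2, 5].into_iter().collect();
    let reordered = reorder_queue(&[1, 2, 3, 5], &local);
    assert_eq!(reordered, vec![2, 5, 1, 3]);
}
```

The real scheduler would of course also respect protocol-level ordering constraints; this only shows the basic idea of serving by availability.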

There are limits, of course, and we cannot make the latency-related limitations go away completely. A significant part of the development time will be dedicated to this. It will require a lot of testing and fine-tuning to get it to the point where it works well enough for at least the most typical workloads. In the paper I linked to above, latency is specifically mentioned as the most time-consuming aspect of their implementation, and they mention “nearly 6 ms” when testing using S3. sia_vbd has to deal with latencies that are at least one or even two orders of magnitude higher—with much worse edge cases! Aggressive methods are required to get this to work.

Scaling

sia_vbd needs to be able to handle multi-TiB-sized block devices without breaking much of a sweat—a TiB is not as big as it used to be.

A naive 1-object-per-block approach would quickly lead to millions of tiny objects that need to be managed by renterd. The overhead would quickly become overwhelming:

Example:

  • Block Size: 256 KiB
  • Blocks needed per TiB: 4,194,304
  • Sia Objects per TiB: 4,194,304

An even more naive approach could use a block size identical to the advertised sector size of the virtual block device. This would make it even easier to implement because every read/write request would exactly map to a single block. However, it would look even more extreme on the backend:

  • Block Size: 4 KiB
  • Blocks needed per TiB: 268,435,456
  • Sia Objects per TiB: 268,435,456

So, the most direct approach (1 sector == 1 vbd block == 1 Sia object) would require a whopping 268 million objects to store a single TiB!

Clearly, this is not going to scale very far. That’s why the design of sia_vbd stores multiple blocks packed together into Chunks. Here is how the above looks with Chunks:

  • Block Size: 256 KiB
  • Chunk Size: 256 blocks
  • Blocks needed per TiB: 4,194,304
  • Chunks needed per TiB: 16,384

Approximately 16,000 objects per TiB are much more manageable than the numbers we saw with the simpler approaches. This design trade-off allows sia_vbd to scale to practically usable block device sizes at the cost of the Chunk indirection.
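The object counts above can be verified with a few lines of Rust (the constants and function names here are illustrative):

```rust
// Object-count arithmetic from the examples above.
const TIB: u64 = 1024 * 1024 * 1024 * 1024;
const KIB: u64 = 1024;

// Objects needed per TiB when every block is its own Sia object.
fn objects_per_tib(block_size: u64) -> u64 {
    TIB / block_size
}

// Objects needed per TiB when blocks are packed into chunks.
fn chunks_per_tib(block_size: u64, blocks_per_chunk: u64) -> u64 {
    objects_per_tib(block_size) / blocks_per_chunk
}

fn main() {
    println!("256 KiB blocks, 1 object each: {}", objects_per_tib(256 * KIB));
    println!("4 KiB blocks, 1 object each:   {}", objects_per_tib(4 * KIB));
    println!("256 KiB blocks, 256 per chunk: {}", chunks_per_tib(256 * KIB, 256));
}
```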

Initial Storage Size

When creating a new virtual block device, sia_vbd needs to initialize the whole structure. Without the deduplication properties of sia_vbd’s design, a naive approach would require writing the full amount of data, even if it’s all the same, like 0x00. For instance, if we create a new 1 TiB device, we would need to write a full TiB of blocks containing nothing but [0x00, 0x00, ...]. This would not only be very slow but also very wasteful, as a full 1 TiB of data would need to be written to and stored on the Sia network.

By making blocks content-addressable, immutable, and not tightly bound to their location, sia_vbd gains deduplication ability. When creating a new device, we end up with a structure that looks somewhat like this for a 1 TiB device:

  • 1 Block (256 KiB of 0x00) with ID 86bb2b521a10612d5a1d38204fac4fa632466d1866144d8a6a7e3afc050ce7ae (Blake3 hash)
  • 1 Cluster (256 references to the block ID above) with ID cac35ec206d868b7d7cb0b55f31d9425b075082b (Merkle Root of Block IDs)
  • 1 State (16384 references to the cluster ID above) with ID afe04867ec7a3845145579a95f72eca7 (Merkle Root of Cluster IDs)

The Block will be stored in a single Chunk, taking up only a few bytes due to compression. There will be a single Cluster metadata object taking roughly 8 KiB of storage (32 bytes per Block ID * number of blocks, plus headers). A single State metadata object will take about 512 KiB of storage (32 bytes per Cluster ID * the number of clusters, plus headers).

Compared to the naive approach, sia_vbd can initialize a new 1 TiB block device in a few milliseconds. It will only require about 530 KiB of active storage in its empty state, compared to 1 TiB when using the naive approach.
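Those metadata sizes check out with a quick back-of-the-envelope calculation (32-byte IDs assumed, header overhead ignored; function names are illustrative):

```rust
// Back-of-the-envelope check of the metadata sizes above.
// Assumes 32-byte content IDs (e.g. Blake3); headers are ignored.
const ID_SIZE: u64 = 32;

// Size of one Cluster metadata object: one ID per referenced block.
fn cluster_meta_bytes(blocks_per_cluster: u64) -> u64 {
    blocks_per_cluster * ID_SIZE
}

// Size of one State metadata object: one ID per referenced cluster.
fn state_meta_bytes(clusters: u64) -> u64 {
    clusters * ID_SIZE
}

fn main() {
    // 1 TiB device: 256 blocks per cluster, 16,384 clusters in total.
    println!("cluster metadata: {} KiB", cluster_meta_bytes(256) / 1024);
    println!("state metadata:   {} KiB", state_meta_bytes(16_384) / 1024);
}
```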

I hope this makes it clear that the approach sia_vbd takes was chosen with care and the trade-offs are well worth it compared to a simpler approach. The naive approach would just not be very practical in real-world situations.

Yes, but in your point 6 you mention a database, which I am questioning in the design. I am not speaking about anything else. A DB adds more logic and another element to take care of.

With this said, don’t take my comments the wrong way. I am very interested in seeing sia_vbd out. I just have a hard time finding your logic optimal.

Your naive approach #1 is a little unrealistic, but I understand you did it that way to make a point. Are you aware of the 40 MB optimal object size in Sia (default setting; there are discussions on how to change it)? Like the sectors, this means that a 1-byte file in Sia will use 40 MB. With this said, I believe your block size should be around 32 MB (the closest power of 2 to 40 MB). If you use a hashing function to find the Sia object, you can map sector X, offset Y, and read Z bytes to /sia_bucket/x1/x2/x3/chunkX@leafY. Like a tree, but using directories.

I am passing on this knowledge from my experience with FUSE, which is a little different from NBD. FUSE has read/write() functions that require a path, offset, and size. I was able to do L2 caching that writes 1 or 2 MB blocks (these could be your chunks), and the read/write functions were able to find my cached block, calculate the offset, and return the data.

Also, have you thought about the filesystems on top of the NBD? Btrfs, which is starting to become a standard (it was declared stable in Linux 5.10.x, if I’m not mistaken), has deduplication and compression out of the box. ZFS has them as well. I think your 1 TiB of zeroes example could be well covered by having one of these filesystems on top of a sia_vbd.

I look forward to seeing your project approved.

Would you try to implement sia_vbd with ublk?
https://github.com/ublk-org/ublksrv

When I referred to Database, I didn’t mean a specific implementation approach. Maybe Inventory is a better term and doesn’t carry the same associations. Regardless, this is purely an implementation detail for later.

sia_vbd will operate as a single process in userspace without any external dependencies—such as an RDBMS—except for common system libraries like libc, just as stated in the proposal. Internally, it will likely use SQLite in some capacity, just like renterd does.

Yup. And the way renterd reduces the impact this has on storage efficiency through Upload packing is a great example of trading off implementation simplicity for efficiency.

Users are certainly free to use the block device however they see fit. That being said, not all possible uses will be practical. For example, adding a sia_vbd device to a ZFS zpool is probably not the best idea—but I also don’t anticipate this to be very common.

For now, the focus will be on more typical scenarios. One use case where I believe the Sia storage network can truly shine is offsite backups. sia_vbd’s ability to create snapshots/tags virtually for free is great for this.

Appreciate it!

For this, I have already done SIAFS.

Ha, interesting that you mention ublk. I’ve been wanting to experiment with it since I stumbled upon the official Rust library earlier this year. Once it matures a bit more, I’ll definitely dive into it.

However, for this project, ublk isn’t really relevant. sia_vbd handles the server-side (aka target) of a network-accessible block device. The clients (initiators) make it appear as a local device, and those already exist.

The first part of the project involves exporting the virtual block device via the NBD protocol. Linux has had an in-kernel NBD client for ages, and Windows has an installable driver that acts as an NBD client.

The second part of the project aims to implement the iSCSI protocol, which has even broader existing support. However, it’s significantly more complex and will require extensive compatibility testing, so that’s out of scope for now.

In the future, adding ublk as an additional way to access sia_vbd could be an option if it makes sense. It would eliminate the network layer, making it potentially more efficient. On the downside, it’s Linux-only and limits the virtual block device to the same machine that runs sia_vbd.

I am glad to receive your reply.

I have seen your SIAFS; great work.

ublk can provide a virtual block device in user mode. Can it be mapped to clients through the NBD protocol or the iSCSI protocol (such as SCST) on the upper layer?

I look forward to seeing your project approved.

ublk is a fairly recent addition to the Linux kernel. It’s a building block (no pun intended) that allows implementing block devices in userspace in a performant and efficient manner. It could be used, for example, to implement an nbd client in userspace—this is actually one of the examples on the official ublk GitHub:

As I mentioned in my previous reply, Linux has had a native in-kernel nbd client for a very long time, so ublk isn’t really relevant for sia_vbd.

Thanks for your latest proposal to The Sia Foundation Grants Program.

After review, the committee has decided to approve your proposal. Your provided info was excellent, and your linked Github repo from your previous grant with the comprehensive guide on how to install and run was appreciated.

We’ll reach out to your provided email address for onboarding. Program onboarding can take a few weeks to complete, so please adjust your timelines accordingly. Congratulations!

November 2024 Progress Report

What progress was made on your grant this month?

  • Studied the NBD protocol specification.
  • Examined several existing open-source implementations to improve understanding.
  • Logged network traffic between existing NBD clients and servers.
  • Utilized Wireshark to inspect traffic and gain deeper insights into the protocol.
  • Developed an initial, basic implementation of an NBD server.
  • Successfully tested the server against Linux’s built-in NBD client. The client can:
    • Connect
    • Handshake
    • Negotiate session details
    • Transition to the Transmission Phase
    • Read, Write, Flush (with a stub backend)
    • Orderly end the session

Links to repos worked on this month:

What will you be working on this month?

  • Complete the full implementation of the NBD server, including support for the more modern protocol variant (Structured Reply).
  • Begin development on the actual Sia backend.
  • Release Milestone 1.

Hello,

Thank you for your progress report!

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Milestone 1 Released

Milestone 1 of sia_vbd, the first public release, is now available!

This version is a preview. While it doesn’t yet include a renterd-backed persistence layer, it brings in all the key design elements discussed above:

  • Fully Usable Virtual Block Device
  • Content-Addressable and Deduplicated Blocks, Clusters & States
  • Transactional Writes: Every Commit leads to a new, addressable state
  • Easy Branching & Snapshotting is possible

All of these features are implemented in a way that aligns with the typical expectations of block device consumers.

sia_vbd includes a brand-new, purpose-built NBD server. The NBD protocol has evolved significantly since my last implementation. The main new features are Structured Replies and Extended Headers, which significantly enhance the protocol’s functionality and performance. However, none of the existing Rust NBD server libraries support these newer features.
So I decided to develop a new implementation from scratch. It took a bit more time upfront, but the outcome has been great. The new server includes advanced features like:

  • Structured Replies
  • Extended Headers
  • Multiple Connections
  • Extended Handshake Options: Such as block size preferences
  • Optimal Zero Handling

Despite these enhancements, the server remains fully backwards compatible with older clients. During development, it has been continuously tested against Linux’s built-in client (which does not yet support the newer protocol features) and nbdublk, a modern userland NBD client that does support the latest protocol enhancements.

The server was designed with low latency in mind, and performance has been excellent. Both ext4 and xfs have been used during testing, and both work seamlessly.

What’s Next

The next release will feature a fully functional, renterd-backed backend.

Git Repository

December 2024 Progress Report

What progress was made on your grant this month?

  • Milestone 1 was released. For detailed information, refer to the release announcement above.
  • The NBD server implementation has been completed, including full support for Structured Replies and Extended Headers.
  • Intensive testing against Linux’s built-in NBD client and the more modern nbdublk was conducted.
  • The key elements of sia_vbd’s design were implemented
  • A fully functional virtual block device with an in-memory backend was released

Links to repos worked on this month:

What will you be working on this month?

  • The persistence layer will be implemented
  • Multi-tier caching will be made available
  • The renterd backend will be completed
  • The release of Milestone 2!

Hello,

Thank you for your progress report!

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Hey @rrauch, this is a reminder to submit your progress report for January.

Apologies for the delay; I spent the last few weeks in heavy crunch mode in order to get the release out.

Here is my belated progress report:


January 2025 Progress Report

What progress was made on your grant this month?

  • Developed a data serialization format that serves as the foundation for all permanently stored data.
  • Implemented the Write-Ahead Log (WAL) with support for transactional writes.
  • Created the Inventory system to keep track of all components.
  • Built Chunks with Zstd compression and indexing.
  • Implemented the Repository for centralized storage.
  • Updated the renterd_client Rust library, adding necessary functionality for the Repository.
  • Added automatic crash recovery.
  • Enabled automatic syncing of the repository.
  • Introduced WAL Garbage Collection.
  • Rewrote Chunk indexing to overcome HTTP header size limitations by introducing Manifests.
  • Testing, testing and more testing.
  • Developed a user-friendly CLI for easy volume creation and deletion.
  • Added support for Docker and systemd.
  • Released Milestone 2.

Links to repos worked on this month:

What will you be working on this month?

  • Implement caching, which is the most critical feature still missing.
  • If time allows, add Chunk Garbage Collection.

These last few weeks have been extremely busy, and I had to deal with a significant setback.
But in the end, a lot of progress was made. The difficult parts are all done, and I don’t expect any major issues with the remaining functionality.

Milestone 2 Released

Milestone 2 of sia_vbd, the first beta release, is now available!

This version is almost fully functional:

  • NBD (Network Block Device) support
  • Cross-Platform: Runs on every platform where renterd is available.
  • Immutability: Writes never modify existing data; any change leads to a new overall state (Snapshot). Previously
    held data remains available (until eventual garbage collection).
  • Content-Addressed Storage: All data is hashed and identified by its content ID for integrity and deduplication.
  • Content Compression: Transparently compresses content (Zstd) before uploading.
  • Transactional Writes: Atomic writes with automatic rollback on failure.
  • Write-Ahead Logging: Records transactions in a local, durable WAL before being committed to eventual storage.
  • Crash Tolerant: Detects when the local, WAL-recorded state is ahead of the committed backend state.
  • Background Synchronization: Continuously uploads new data to the backend in the background, allowing fast writes
    and avoiding blocking reads.
  • Multiple Block Devices and Backends: Supports multiple block devices, across one or more renterd instances.
  • Single Binary, Single Process: Delivered as a single, self-contained binary that runs as a single
    process, making deployment easy and straightforward.
  • Highly Configurable: While coming with reasonable default settings, sia_vbd offers many additional options to
    configure and fine-tune.
  • CLI Interface: Includes an easy-to-use CLI for common operations.
  • Docker and systemd support.

However, some functionality is still missing:

  • Caching: Caching is not yet implemented. Without caching, most data must be re-read multiple times from
    renterd, resulting in very slow performance due to the high latency of each read operation. Performance
    will improve significantly once caching is in place.
  • Garbage Collection: Garbage collection is currently not available, causing volumes to grow indefinitely.
    Implementing GC will allow obsolete data to be deleted over time.
  • Resizing: Block Devices cannot be resized for the time being.
  • Branching CLI Support: Although branching functionality has been implemented, users currently cannot interact with
    it. CLI functions will be added to enable branch operations.
  • Tags: Tagging is not currently supported.

Test Drive

A Docker image is available to give it a quick try:

docker pull ghcr.io/rrauch/sia_vbd
docker run -it --rm ghcr.io/rrauch/sia_vbd --help

This release lacks caching, so performance will be much slower compared to the upcoming release.

More details about how to use sia_vbd can be found here:


Milestone 3 Released

Milestone 3 of sia_vbd, the second beta release, was released a few days ago!

This version adds the most important functions that were still missing in the previous release:

  • Caching: The previous release lacked any sort of caching, so performance was very slow. M3 contains a persistent caching layer for both metadata and block data. The cache is configurable and structured into two levels: L1 (in-memory) and L2 (on-disk).
  • Garbage Collection: Due to the lack of GC in the previous release, volumes would grow indefinitely. In this release, automatic garbage collection is performed in the background, and unreferenced data is eventually deleted.
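The two-level lookup can be sketched in a few lines. This is a minimal illustration of the L1/L2 idea, not sia_vbd's actual API; the struct and its in-memory stand-in for the on-disk layer are my own:

```rust
use std::collections::HashMap;

// Minimal sketch of a two-level cache lookup (illustrative, not
// sia_vbd's actual API). L1 is in-memory; L2 here is a HashMap
// standing in for the on-disk layer. L2 hits are promoted to L1.
struct TieredCache {
    l1: HashMap<u64, Vec<u8>>,
    l2: HashMap<u64, Vec<u8>>, // placeholder for the on-disk cache
}

impl TieredCache {
    fn get(&mut self, key: u64) -> Option<Vec<u8>> {
        if let Some(v) = self.l1.get(&key) {
            return Some(v.clone());
        }
        if let Some(v) = self.l2.get(&key).cloned() {
            self.l1.insert(key, v.clone()); // promote to L1
            return Some(v);
        }
        None // miss in both tiers: fetch from renterd
    }
}

fn main() {
    let mut cache = TieredCache { l1: HashMap::new(), l2: HashMap::new() };
    cache.l2.insert(7, vec![1, 2, 3]);
    assert_eq!(cache.get(7), Some(vec![1, 2, 3])); // served from L2
    assert!(cache.l1.contains_key(&7));            // and promoted to L1
}
```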

Milestone 3 is feature complete with the exception of the following:

  • Resizing: Block Devices cannot be resized for the time being.
  • Branching CLI Support: Although branching functionality has been implemented, users currently cannot interact with it. CLI functions will be added to enable branch operations.
  • Tags: Tagging is not currently supported.

The next release will contain these missing functions and will be feature complete.

Test Drive

The Docker image has been updated and is available here:

docker pull ghcr.io/rrauch/sia_vbd
docker run -it --rm ghcr.io/rrauch/sia_vbd --help

Detailed usage instructions and examples can be found here:

February 2025 Progress Report

What progress was made on your grant this month?

  • A persistent caching layer was added to significantly improve performance.
  • Implemented two-level caching: L1 (in-memory) & L2 (on-disk)
  • Added configurability for the cache. Resource limits and file system path can be configured on a per-volume basis.
  • Enabled automatic tracking of unreferenced (obsolete) data and metadata.
  • Introduced automatic background garbage collection
  • More testing was performed
  • Additional sections have been added to the README, with a detailed list of all configuration options, as well as explanations of the concepts behind sia_vbd and the terminology used.
  • Released Milestone 3.

Links to repos worked on this month:

What will you be working on this month?

  • Implement the last missing features: Resizing, Branching & Tagging
  • Release!

Version 0.4.0 Released

Version 0.4.0 of sia_vbd is out!

Progress Since the Previous Release

This release adds all remaining features that were still missing in the previous release:

Branching

Volumes can have more than a single branch. New branches can be created from any existing branch, tag or commit. Branches can be instantiated, modified and deleted without affecting the state of any other branch. Please note: Only one branch can be active at any given time.

sia_vbd branch --help

Tagging

Tags are very similar to branches and can also be created from any existing branch, tag or commit. The main difference is that tags cannot be instantiated. However, they can be used as a source of a new branch. Any data associated with an existing tag is guaranteed to not be garbage collected.

sia_vbd tag --help

Resizing

Existing Volumes can be freely resized with the CLI. Resizing only works while the Volume is stopped. Please be careful when shrinking: any data beyond the shrink-point will be lost!
Resizing only affects the selected branch, so it’s possible to create a tag or branch before resizing and roll back in case of accidental data loss.

sia_vbd volume resize --help

Get sia_vbd

sia_vbd is available from its Github Repository:

The Docker image has been kept up-to-date and is available here:

docker pull ghcr.io/rrauch/sia_vbd
docker run -it --rm ghcr.io/rrauch/sia_vbd --help

Detailed usage instructions, including configuration options and examples can be found in the Readme.

Caveat

sia_vbd currently does NOT support the recently released renterd version 2, due to a number of breaking API changes.