My understanding of Sia storage is that it uses volumes, which are actually huge files. Is that correct? Obviously the data in such a file is structured in some way, since transfers and stored files are at least chunked. So basically each volume file constitutes a filesystem of its own.
So my question is: why use that scheme?
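To make sure we are talking about the same thing, here is a minimal Go sketch of what I mean by a "structured huge file": fixed-size slots addressed by index inside one big file. This is only my assumption of the general idea, not hostd's actual code; `slotSize`, `slotOffset`, `writeSlot` and `readSlot` are names I made up, and the 4 MiB figure is just an illustrative value.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// slotSize is an assumed fixed chunk size (4 MiB); it is not taken from hostd.
const slotSize = 4 << 20

// slotOffset returns the byte offset of slot i inside the volume file.
func slotOffset(i int64) int64 { return i * slotSize }

// writeSlot stores one chunk at the position of slot i.
func writeSlot(f *os.File, i int64, data []byte) error {
	if len(data) > slotSize {
		return fmt.Errorf("chunk of %d bytes exceeds slot size", len(data))
	}
	_, err := f.WriteAt(data, slotOffset(i))
	return err
}

// readSlot reads the first n bytes of slot i.
func readSlot(f *os.File, i int64, n int) ([]byte, error) {
	buf := make([]byte, n)
	_, err := f.ReadAt(buf, slotOffset(i))
	return buf, err
}

func main() {
	dir, err := os.MkdirTemp("", "volume")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// One big file plays the role of the volume.
	f, err := os.Create(filepath.Join(dir, "volume.dat"))
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := writeSlot(f, 3, []byte("hello")); err != nil {
		panic(err)
	}
	got, err := readSlot(f, 3, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(slotOffset(3), string(got)) // prints "12582912 hello"
}
```

In a layout like this the application keeps its own index of which slot holds which chunk, which is exactly the bookkeeping a filesystem already does for ordinary files.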
Filesystems such as ZFS already abstract the hardware, so whatever “volume” is created in hostd has no relation to the actual hardware. RAID, growing the filesystem, caching, error correction, compression, striping across hard drives, etc. are all already handled by the filesystem.
So we have two layers of filesystems at work. That will just make things worse performance-wise and could even lead to some pathological cases.
In the case where the underlying filesystem is known to be very inefficient, and we know hostd’s implementation is better, then OK, letting it use even raw devices might make sense. That is actually what some databases like Oracle do. But those are systems on a very different level. I think that for a Linux/BSD-based system, it would be better to leverage the OS filesystem's capabilities rather than reinventing the wheel.
I admit I have not looked at the code in depth (I am completely new to Go, actually), so I do not know for sure how hostd stores things in each volume. But I am pretty certain that if each chunk is at least several MB and is simply written as a separate file, maybe in a directory tree that is not too wide, then the native OS filesystem can do just as well, if not much better.
Also, I saw some references to striping volumes for better throughput. For the reasons above, I think that is a complete waste of effort, which would be put to much better use advancing the core Sia tools and server. It is also almost guaranteed to degrade performance on a system that already does striping at the filesystem or even the hardware level.
My opinion is overall:
- use a shallow tree of individual files, rooted at whatever folder is designated as the volume
- if volumes are to be used for whatever reason (maybe a small system with a couple of drives and no advanced filesystem like ZFS), then make that optional. Likewise, make any features like striping across volumes, compression, etc. completely optional.
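To illustrate the first point, here is a rough Go sketch of a file-per-chunk layout with a one-level directory tree. It is only a sketch under my own assumptions: `chunkPath` and `storeChunk` are hypothetical names, and SHA-256 is used purely as a stand-in for whatever identifier hostd really uses for a chunk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// chunkPath maps a chunk to <root>/<first 2 hex chars>/<full hex hash>,
// which gives at most 256 subdirectories, one level deep.
// SHA-256 is an assumption for the example, not hostd's identifier.
func chunkPath(root string, data []byte) string {
	sum := sha256.Sum256(data)
	name := hex.EncodeToString(sum[:])
	return filepath.Join(root, name[:2], name)
}

// storeChunk writes the chunk as its own file and returns its path.
func storeChunk(root string, data []byte) (string, error) {
	p := chunkPath(root, data)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return "", err
	}
	// Write under a temporary name, then rename, so a reader never
	// sees a half-written chunk under its final name.
	tmp := p + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return "", err
	}
	if err := os.Rename(tmp, p); err != nil {
		return "", err
	}
	return p, nil
}

func main() {
	root, err := os.MkdirTemp("", "volume")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	p, err := storeChunk(root, []byte("example chunk"))
	if err != nil {
		panic(err)
	}
	rel, err := filepath.Rel(root, p)
	if err != nil {
		panic(err)
	}
	fmt.Println(rel) // path relative to the volume folder
}
```

With chunks of several MB each, even a few TB of storage means only a few thousand files per subdirectory, and allocation, caching and redundancy are left entirely to the underlying filesystem.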
Again, feel free to elaborate on where and how I am wrong, all opinions welcome.