In addition to the above progress report, I would like to provide further details about what has happened so far, why certain decisions were made, and how these decisions affect the project. Furthermore, I am asking for community feedback on a few aspects and the overall direction from here.
1. About renterd_client - the Rust Client Library
Although not strictly part of the initial grant proposal, I decided to move the low-level renterd API interactions into a separate library (or crate, as they are called in Rust-land). My reasons were:
- Separation of concerns: It is easier to reason about and work with a dedicated library.
- Better testability: renterd_client has almost 100 Sans-IO unit tests.
- Improved productivity: Working with a tested, idiomatic library streamlines development.
- Reusability: The code needs to be implemented anyway. As a standalone crate, it can be reused across projects. Future Rust projects will only need to add a single dependency to interface with renterd.
This approach required extra effort initially, but I believed it would pay off in the end. The implementation took about two weeks, during which I had to simultaneously consult the API docs and the renterd source code, and send manual requests with curl to double-check everything. Frankly, this process was somewhat tedious, but the resulting library has been rock-solid.
Once this project progresses further, I plan to publish the library on crates.io, the main Rust crate repository, making it easily accessible to any Rust-based project. I hope this library will be a useful tool for the Sia ecosystem, providing a reusable, well-tested component that more projects can be built upon.
There is still some work to be done, as I had to modify a third-party crate (reqwest-file) to make everything work. I intend to contribute my changes back to the original crate. However, reqwest-file appears to be largely inactive, so I may need to fork it.
2. NFS vs Sia Objects - Semantic Differences and the “Need for State”
When I started working on the NFS gateway, my plan was to keep it basically stateless—no persistent state across restarts. This would not only simplify development but also improve the user experience. Just the binary and config, no other state. Simple.
However, when examining the differences between how NFS expects a file system to work and the reality of how Sia Objects function, I encountered a significant issue:
NFS is Inode-Based, Not Path-Based
NFS expects every inode to have a permanent ID (a 64-bit uint). Emphasis on permanent—this ID is never supposed to change. Nearly all operations use this ID to refer to the inode in question. Sia Objects, on the other hand, are path-based and lack a separate ID (at least not via the renterd object API; I’m not sure about the low-level details). This semantic gap needs to be bridged somehow. Here are a few approaches:
Idea 1: Naive Stateless Mapping
Derive an ID from the object’s path using a hasher. 64-bit is large enough to disregard the collision risk, provided a decent hashing algorithm is used, e.g., XXH3.
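A minimal sketch of this mapping (illustrative only: it uses the standard library's `DefaultHasher` to stay dependency-free, whereas a real build would pull in an XXH3 implementation such as the xxhash-rust crate, since `DefaultHasher`'s output is not guaranteed stable across Rust releases):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: derive a pseudo-permanent inode ID from an
// object's path. Stand-in hasher; XXH3 would be used in practice.
fn path_to_inode_id(path: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Same path, same ID -- the stateless property that makes Idea 1 attractive.
    assert_eq!(path_to_inode_id("/videos/a.mp4"), path_to_inode_id("/videos/a.mp4"));
    // But a rename changes the ID, which NFS clients must never observe.
    assert_ne!(path_to_inode_id("/videos/a.mp4"), path_to_inode_id("/videos/b.mp4"));
}
```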
Pros:
- Easy to implement, no persistent state needed
Cons:
- Changing the object’s path (e.g., rename or move) will change its ID
- Worse, if it’s a directory, all underlying object IDs will change recursively
Conclusion: Disqualified
Idea 2: Manually Assigning an ID
Build an in-memory representation of the file system hierarchy by getting the full list of all objects in a bucket at startup. Assign each object an ID (using a simple counter) and store it in a data structure on the heap. Periodically resync in the background to add/remove inodes as needed.
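The counter-based assignment could look roughly like this (an illustrative sketch, not the actual implementation; note how an in-band rename keeps the ID stable, but the whole table evaporates on restart):

```rust
use std::collections::HashMap;

/// In-memory inode table for Idea 2: IDs come from a plain counter,
/// so they are path-independent -- but they live on the heap only and
/// may be assigned differently after a restart.
struct InodeTable {
    next_id: u64,
    by_path: HashMap<String, u64>,
}

impl InodeTable {
    fn new() -> Self {
        InodeTable { next_id: 1, by_path: HashMap::new() }
    }

    /// Assign a fresh ID on first sight, reuse it afterwards.
    fn id_for(&mut self, path: &str) -> u64 {
        if let Some(&id) = self.by_path.get(path) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.by_path.insert(path.to_string(), id);
        id
    }

    /// An in-band rename keeps the ID stable -- the main advantage
    /// over the hash-based mapping of Idea 1.
    fn rename(&mut self, from: &str, to: &str) {
        if let Some(id) = self.by_path.remove(from) {
            self.by_path.insert(to.to_string(), id);
        }
    }
}

fn main() {
    let mut table = InodeTable::new();
    let id = table.id_for("/tmp/a.txt");
    table.rename("/tmp/a.txt", "/tmp/b.txt");
    assert_eq!(table.id_for("/tmp/b.txt"), id); // ID survives the rename
}
```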
Pros:
- Inode IDs are independent of the object’s path
- In-band modifications, such as rename/move, are immediately reflected in the data structure
- Super-fast metadata operations with no need for a separate cache
Cons:
- Slow startup as all buckets need to be read completely before the gateway is ready
- Increased memory usage
- Risk of stale metadata since resyncs won’t happen often
- Changes could lead to IDs being assigned differently after a restart
Conclusion: Disqualified
The Showstopper: Stale IDs Can Lead to Data Loss
The major issue with both non-persistent approaches above is that IDs can change while NFS clients assume they cannot. The worst-case scenario: a user deletes the file /tmp/useless.tmp, which the NFS client believes has ID 100413, while sia-nfs now links 100413 to /very/important/file, resulting in the wrong file being deleted. This scenario is unacceptable. Thus, I accepted that some persistent state is necessary.
Idea 3: Assign IDs at First Sight and Store Persistently
This approach stores the state in an embedded SQLite database. The schema is intentionally simple. Here’s an example:
| id    | entry_type | name      | parent |
|-------|------------|-----------|--------|
| 10001 | D          | dir_name  | 101    |
| 10002 | F          | file_name | 10001  |
Whenever metadata is received from renterd, it gets synced with the relevant entries in the database: new entries are assigned a new ID, missing entries are deleted, and existing entries are mapped to their ID. Importantly, this approach uses an AUTOINCREMENT ID, ensuring IDs always increase and are never reused. I’m using SQLite via the excellent sqlx crate. The database engine is embedded in the binary, along with the schema and any migration scripts, so sia-nfs remains a single binary and runs as a single process.
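The sync step can be sketched without SQLite as follows (illustrative pure-Rust model of the semantics, not the actual sqlx-backed code; the counter mimics an AUTOINCREMENT column: strictly increasing, never reusing a freed ID):

```rust
use std::collections::HashMap;

/// SQLite-free model of the Idea 3 sync step. `next_id` plays the role
/// of the AUTOINCREMENT counter; `by_name` stands in for the rows of
/// the inode table.
struct PersistentIds {
    next_id: u64,
    by_name: HashMap<String, u64>,
}

impl PersistentIds {
    fn new() -> Self {
        PersistentIds { next_id: 10001, by_name: HashMap::new() }
    }

    /// Reconcile the stored table with a fresh directory listing:
    /// vanished names are deleted, new names get a fresh ID, and
    /// existing names keep their ID. Returns (name, id) pairs.
    fn sync(&mut self, listing: &[&str]) -> Vec<(String, u64)> {
        // Drop entries that no longer exist in the bucket.
        self.by_name.retain(|name, _| listing.contains(&name.as_str()));
        let mut out = Vec::new();
        for &name in listing {
            let id = match self.by_name.get(name) {
                Some(&id) => id,
                None => {
                    let id = self.next_id;
                    self.next_id += 1; // never decremented: IDs are never reused
                    self.by_name.insert(name.to_string(), id);
                    id
                }
            };
            out.push((name.to_string(), id));
        }
        out
    }
}

fn main() {
    let mut ids = PersistentIds::new();
    let first = ids.sync(&["dir_name", "file_name"]);
    assert_eq!(first[0].1, 10001);
    // "file_name" disappears, "new_file" appears: the freed ID 10002 is
    // NOT handed out again, mirroring AUTOINCREMENT semantics.
    let second = ids.sync(&["dir_name", "new_file"]);
    assert_eq!(second[1], ("new_file".to_string(), 10003));
}
```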
Pros:
- Inodes are independent of the object path and persist across restarts
- The aforementioned data-loss scenario is not possible (as long as the database file remains available)
- Fast startup time (no prefetching of objects)
- Memory usage doesn’t change significantly
- All normal file system operations remain possible
- Performance is not noticeably affected
- No additional runtime dependencies or maintenance required
Cons:
- Persistent storage is required (a couple of MiB should suffice for most cases)
- More implementation work compared to other approaches
- Larger binary, as SQLite runtime is linked into it
- Requires a C compiler when building from source, in addition to rustc
Conclusion: Winner, winner, chicken dinner
3. NFS(v3) Read and Write Operations Are Stateless
Reading from and writing to a file in a typical file system involves the following steps:
```
open(inode) -> file_handle
read(file_handle, offset, buffer) -> bytes_read
write(file_handle, offset, buffer) -> bytes_written
close(file_handle)
```
However, NFS version 3 (as implemented by nfsserve) simplifies this process:
```
reading: read(inode, offset, buffer) -> bytes_read
writing: write(inode, offset, buffer) -> bytes_written
```
NFSv3 skips the open/close steps, using the inode ID instead of file handles. This design likely reduces network traffic, which is beneficial when open/close operations are inexpensive, such as in a local file system.
However, this stateless approach presents specific challenges:
Reading
Each read operation requires a full GET request, rather than streaming the file content via a single request. Small read buffers can result in numerous tiny requests to the renterd object API, which is resource-intensive and slows down read speed.
To mitigate this, I’ve implemented a DownloadManager. It reuses connections for the same file and offset, opens new ones as needed, and closes idle connections after a period of inactivity.
While this approach reduces the problem in theory, in practice NFS clients often read ahead, sending multiple non-contiguous read requests simultaneously, which undermines the DownloadManager. I’ve made further adjustments to address this behavior, but more work is needed.
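The reuse heuristic can be sketched like this (names and structure are illustrative, not the actual sia-nfs code): a download stream is reused only when a read continues exactly where the previous one left off; any non-contiguous read forces a fresh ranged request.

```rust
use std::collections::HashMap;

/// Toy model of the DownloadManager's connection-reuse idea.
/// Key: (inode id, next expected offset) -> bytes streamed so far.
struct DownloadManager {
    open_streams: HashMap<(u64, u64), u64>,
    opened: u32, // how many "connections" had to be opened
}

impl DownloadManager {
    fn new() -> Self {
        DownloadManager { open_streams: HashMap::new(), opened: 0 }
    }

    fn read(&mut self, inode: u64, offset: u64, len: u64) {
        if let Some(streamed) = self.open_streams.remove(&(inode, offset)) {
            // Sequential continuation: reuse the existing stream.
            self.open_streams.insert((inode, offset + len), streamed + len);
        } else {
            // Non-contiguous read (e.g. client read-ahead): a fresh
            // ranged GET against renterd is unavoidable.
            self.opened += 1;
            self.open_streams.insert((inode, offset + len), len);
        }
    }
}

fn main() {
    let mut dm = DownloadManager::new();
    dm.read(42, 0, 4096);     // first read: opens a connection
    dm.read(42, 4096, 4096);  // sequential: reused
    dm.read(42, 65536, 4096); // read-ahead jump: new connection
    assert_eq!(dm.opened, 2);
}
```

This also shows why aggressive client read-ahead hurts: every jump in offset defeats the reuse and costs a new request.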
Writing
Writing is more complex. Sia Objects are immutable; they cannot be modified or amended. Therefore, an upload has to happen in a single operation. In a stateful environment (open, write, write, …, close), this is manageable, but NFSv3’s stateless operations require a workaround.
I’ve created an UploadManager, analogous to the DownloadManager, which starts and reuses uploads for matching file-and-offset writes. Without a close command, the upload is automatically finalized after a period of inactivity.
This solution isn’t perfect. Very slow writers may cause incomplete uploads, and clients expecting immediate read access after writing will experience blocking until the upload completes.
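The inactivity-based auto-close can be sketched as follows (illustrative only, not the real UploadManager): writes are accepted only if they append contiguously, and a background task would finalize the upload once the writer has gone quiet.

```rust
use std::time::{Duration, Instant};

/// Toy model of an in-flight upload. NFSv3 never sends `close`, so the
/// upload is finalized once no write has arrived for a while.
struct PendingUpload {
    next_offset: u64,
    last_write: Instant,
}

impl PendingUpload {
    fn new() -> Self {
        PendingUpload { next_offset: 0, last_write: Instant::now() }
    }

    /// Accept a write only if it continues the upload contiguously;
    /// a Sia object upload cannot seek backwards.
    fn write(&mut self, offset: u64, len: u64) -> bool {
        if offset != self.next_offset {
            return false; // out-of-order write: cannot be appended
        }
        self.next_offset += len;
        self.last_write = Instant::now();
        true
    }

    /// A background task would poll this and complete the upload
    /// once the writer has been idle long enough.
    fn should_finalize(&self, idle_timeout: Duration) -> bool {
        self.last_write.elapsed() >= idle_timeout
    }
}

fn main() {
    let mut up = PendingUpload::new();
    assert!(up.write(0, 4096));
    assert!(up.write(4096, 4096));
    assert!(!up.write(0, 10)); // non-contiguous: rejected
    assert!(up.should_finalize(Duration::ZERO)); // idle threshold reached
}
```

The timeout is the weak point named above: a writer slower than the idle threshold looks "finished" and triggers a premature finalize.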
Conclusion
This aspect of the project has been particularly challenging. The current workarounds are functional but not flawless. The UploadManager is still in progress, and the DownloadManager requires further refinement to handle read-ahead behavior more effectively.
Given these constraints, the solutions provided are satisfactory. NFSv4, which follows a more traditional "open/read/write/close" model, does not have these issues. Unfortunately, nfsserve does not yet support NFSv4.
4. Editing Existing Files (aka Partial Writing)
As mentioned earlier, Sia Objects are immutable and cannot be edited or amended, only overwritten. This differs from typical file system behaviour, creating a gap that needs addressing. However, it’s unclear if implementing this is a good idea. Here’s why:
The way to emulate traditional file system behavior is to implement a WAL (Write-Ahead Log). When a client writes to an existing file, the data is first written to a local WAL, which tracks the data and offset of every write operation until the client is finished. Meanwhile, we “overlay” the WAL onto the Sia Object so that, to the local user, the file appears modified.
After the user completes all modifications (again detected through inactivity, due to the lack of a close operation), we create a new temporary object in the bucket. We then stream data from the original object, merged with data from the WAL, into the new object. Once done, the old object is deleted and the temporary object is renamed, effectively replacing the old one.
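The overlay itself is conceptually simple. A minimal sketch (illustrative; the real flush would stream rather than buffer the whole object in memory):

```rust
/// Apply the logged (offset, data) writes on top of the original
/// object's bytes, producing the view the client sees -- and the
/// stream that would be uploaded as the replacement object.
fn overlay(base: &[u8], wal: &[(usize, Vec<u8>)]) -> Vec<u8> {
    let mut merged = base.to_vec();
    for (offset, data) in wal {
        let end = offset + data.len();
        // Grow the file if a write extends past the current end.
        if end > merged.len() {
            merged.resize(end, 0);
        }
        merged[*offset..end].copy_from_slice(data);
    }
    merged
}

fn main() {
    let base = b"hello world".to_vec();
    // Two logged writes: patch "world" -> "there", then append "!".
    let wal = vec![(6, b"there".to_vec()), (11, b"!".to_vec())];
    assert_eq!(overlay(&base, &wal), b"hello there!".to_vec());
}
```

Note that even a tiny `wal` still forces the entire `base` to be re-streamed on flush, which is exactly the cost problem described next.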
This process is costly, both in implementation effort and for the user. To illustrate:
A user edits a 10 GiB file, modifying only 100 bytes. The user expects a simple, quick, and cheap operation, since only 100 bytes changed. In reality, we must download 10 GiB, upload 10 GiB, and store a new 10 GiB object. This behavior is unexpected and can be costly and frustrating for users.
Request for Feedback
I am reaching out to the community for discussion on whether this should be implemented. Personally, I believe this is not a good idea. However, I welcome other opinions. If it’s decided that this is necessary, I will proceed with the implementation. Until then, I will put this on hold.
5. Potential Extension of Project Scope
Below is a rough diagram illustrating how sia-nfs works:
```
                ---------------------- sia-nfs ---------------------
NFS client <--> | NFS Frontend <--> VFS Layer <--> renterd_client | <--> renterd
```
Most of the heavy lifting is done in either the VFS Layer or the renterd_client crate. The NFS Frontend part is NFS-specific, but it’s actually only a few lines of code (aside from the DownloadManager and UploadManager mentioned earlier). All the complex logic is implemented in a protocol-agnostic way. At this point, adding support for additional frontend protocols is relatively simple.
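As a purely hypothetical sketch of what such a protocol-agnostic boundary could look like (trait name, methods, and the toy backend are all illustrative, not the actual sia-nfs API), each frontend would translate its own protocol into the same small set of VFS calls:

```rust
use std::collections::HashMap;

/// Hypothetical protocol-agnostic VFS interface. An NFS, FTP, or
/// WebDAV frontend would each drive this same trait.
trait Vfs {
    fn lookup(&self, parent: u64, name: &str) -> Option<u64>;
    fn remove(&mut self, inode: u64) -> bool;
}

/// Toy in-memory backend standing in for the real VFS layer.
struct MemVfs {
    entries: HashMap<(u64, String), u64>,
}

impl Vfs for MemVfs {
    fn lookup(&self, parent: u64, name: &str) -> Option<u64> {
        self.entries.get(&(parent, name.to_string())).copied()
    }
    fn remove(&mut self, inode: u64) -> bool {
        let before = self.entries.len();
        self.entries.retain(|_, &mut id| id != inode);
        self.entries.len() != before
    }
}

fn main() {
    let mut vfs = MemVfs {
        entries: HashMap::from([((1, "file.txt".to_string()), 42)]),
    };
    // Any frontend, regardless of wire protocol, issues the same calls:
    assert_eq!(vfs.lookup(1, "file.txt"), Some(42));
    assert!(vfs.remove(42));
    assert_eq!(vfs.lookup(1, "file.txt"), None);
}
```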
Request for Feedback
The project scope has been clear from the start and was outlined in the grant proposal. However, as the project has progressed - and it has progressed better than expected, putting me ahead of my original timeline - it has become evident that adding additional protocols could be an easy enhancement.
I am now seeking feedback from the community on whether there is interest in extending the project scope to become a multi-protocol Sia gateway, instead of only supporting NFS. My proposal includes adding support for:
- FTP - due to its widespread use
- WebDAV - because all major operating systems, including Windows, support mounting it out of the box
Both protocols appear to have usable Rust server libraries, making their implementation straightforward. To clarify: I am not requesting additional funding, as this extra work fits within the already granted budget. This opportunity arises from the architectural decisions made and the better-than-expected progress; adding additional protocols is essentially free. However, it does modify the project scope and would necessitate a name change (possibly to sia-vfs).
I am opening this up for community discussion. In my opinion, this would be a beneficial enhancement. Please share your thoughts on this proposal.