Introduction
Project Name: SiaHub
Name of the organization or individual submitting the proposal: Salih Toruner
Include links to previous projects and/or brief background information about the individual or organization in 100 words or less.
Salih Toruner is an infrastructure engineer with a background in Linux administration, blockchain infrastructure, and developer tooling. He holds a bachelor’s degree in electrical and electronics engineering. His most recent grant-funded project, Stellar Command Insights (SCI), was a real-time CLI monitoring and visualization tool funded by the Stellar Community Fund. He has shipped full-stack developer tools from concept through release as a solo developer.
Describe your project.
SiaHub will be a self-hostable model hub that is wire-compatible with Hugging Face, the dominant repository for open-weight AI models. The difference from Hugging Face itself is the storage layer underneath. Where Hugging Face writes model weights to Amazon S3, SiaHub will write them to the Sia decentralized storage network. The project is intended both as open-source infrastructure under Apache 2.0 and as a hosted service I will operate commercially, so the deliverables of this grant remain a live, maintained entry point for users who choose not to self-host.
A working deployment is reachable at https://siahub.app, source at https://github.com/bytemaster333/siahub, documentation at https://docs.siahub.app, and a walkthrough video at https://www.youtube.com/watch?v=dupe_7GcJ8I.
The compatibility runs at the protocol level rather than the UI level. A user who sets the environment variable HF_ENDPOINT=https://siahub.app and then runs hf upload or hf download from the standard huggingface_hub and hf_xet clients gets byte-identical round-trips against a SiaHub deployment. No fork of the client, no patched binary, and no cooperation from Hugging Face is required. The user’s existing toolchain keeps working, but the bytes land on Sia.
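As a minimal sketch of that redirection when the CLI is driven from a script, the helper below builds an environment with HF_ENDPOINT overridden. The helper name siahub_env and the repository id in the comment are hypothetical, and the sketch assumes the hf CLI is already installed and on PATH.

```python
import os

def siahub_env(endpoint: str = "https://siahub.app") -> dict:
    """Return a copy of the current environment with HF_ENDPOINT redirected.

    The process-wide environment is left untouched, so only child processes
    started with this mapping talk to the SiaHub deployment.
    """
    env = dict(os.environ)
    env["HF_ENDPOINT"] = endpoint
    return env

# Example (not executed here): run the unmodified CLI against SiaHub.
# "my-org/my-model" is a placeholder repository id.
#   import subprocess
#   subprocess.run(["hf", "download", "my-org/my-model"],
#                  env=siahub_env(), check=True)
```

A self-hosting operator would pass their own deployment URL instead of the default.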
The system will be delivered as a Docker Compose stack of three services. siahub-cas, written in Rust with Axum, will implement the Hugging Face Hub API surface that the hf CLI calls, including repository creation, preupload negotiation, commits, file resolution, and the Git-LFS batch protocol. Alongside that surface, it will implement the Xet content-addressable storage protocol used by hf_xet. Xet is the content-addressed transport Hugging Face uses for large weight files. It chunks each file, packs the chunks into deduplicated bundles called xorbs of up to 64 MiB each, and records the reconstruction metadata in shards so a downloader can rebuild the original file from xorb byte ranges.
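The chunk, xorb, and shard relationship can be illustrated with a simplified sketch. It is not the Xet algorithm: it assumes fixed-size chunks and SHA-256 where Xet uses content-defined chunking and its own hash, and the names pack, reconstruct, and Xorb are hypothetical. What it does show is the shape of the idea: distinct chunks are stored once, xorbs are capped in size, and a list of (xorb, chunk) terms is enough to rebuild the file.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024           # assumption: fixed-size chunks for illustration
MAX_XORB_BYTES = 64 * 1024 ** 2  # 64 MiB cap per xorb, as described above

@dataclass
class Xorb:
    chunks: list = field(default_factory=list)  # raw chunk bytes, in order
    size: int = 0

def pack(data: bytes, max_xorb_bytes: int = MAX_XORB_BYTES,
         chunk_size: int = CHUNK_SIZE):
    """Return (xorbs, terms); terms holds one (xorb_index, chunk_index)
    pair per chunk of the file and plays the role of shard metadata."""
    xorbs = [Xorb()]
    seen = {}   # chunk digest -> (xorb_index, chunk_index)
    terms = []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).digest()  # stand-in hash, not Xet's
        if digest not in seen:
            # Start a new xorb when the current one would exceed the cap.
            if xorbs[-1].chunks and xorbs[-1].size + len(chunk) > max_xorb_bytes:
                xorbs.append(Xorb())
            xorb = xorbs[-1]
            seen[digest] = (len(xorbs) - 1, len(xorb.chunks))
            xorb.chunks.append(chunk)
            xorb.size += len(chunk)
        terms.append(seen[digest])
    return xorbs, terms

def reconstruct(xorbs, terms) -> bytes:
    """Rebuild the original bytes from the stored chunks and the terms."""
    return b"".join(xorbs[x].chunks[c] for x, c in terms)
```

In the real protocol the terms reference byte ranges inside xorbs rather than list indices, which is why range reads against storage matter so much for download performance.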
siahub-gateway, written in Go with chi, will serve those byte ranges over HTTP from signed URLs, including the multi-range responses required by the Xet client. It is backed by a whole-xorb disk LRU cache, with simultaneous requests for the same content collapsed into a single underlying fetch.
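The request-collapsing behaviour can be sketched as follows. The gateway itself is planned in Go; this Python version only illustrates the technique, and the class name SingleFlight and its do method are hypothetical. The first caller for a key performs the fetch, and callers that arrive while it is in flight wait and share its result.

```python
import threading

class SingleFlight:
    """Collapse concurrent calls for the same key into one underlying fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, box = entry
        if leader:
            try:
                box["value"] = fetch()
            except Exception as exc:
                box["error"] = exc  # waiters see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()
        if "error" in box:
            raise box["error"]
        return box["value"]
```

Once the fetch completes the key is removed, so later requests are expected to be answered from the disk cache rather than from this in-flight table.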
siahub-console, built on Vite, React, Tailwind, and shadcn/ui, will be the operator UI for API keys, the model catalog, asset inventory, usage statistics, and a Leaflet world-map of the Sia hosts holding each xorb. A static documentation site on Astro and Starlight rounds out the surface, alongside an hf-proxy shim. The shim is a stateless reverse proxy that lets a user keep their existing huggingface.co account while redirecting only the large-file bytes through Sia, by rewriting a single response header on the path through Hugging Face.
The bytes themselves move through the official Sia SDK against a self-hosted indexd daemon, the Sia Foundation’s indexer that handles consensus, wallet, contract formation, host scoring, and sector placement. The architecture takes direct advantage of Sia’s sector-scoped byte-range download, where a download(object, {offset, length}) call pulls only the storage sectors that overlap the requested range. This is the property that lets the Xet protocol’s chunk-and-range read pattern map cleanly onto Sia.
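To make the range-to-sector relationship concrete, the sketch below computes which sectors a byte range touches. It is a deliberate simplification: it treats an object as a flat sequence of fixed-size sectors and ignores how erasure coding lays data out across hosts. The 4 MiB sector size is an assumption, and sectors_for_range is a hypothetical helper, not an SDK function.

```python
SECTOR_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB sectors

def sectors_for_range(offset: int, length: int,
                      sector_size: int = SECTOR_SIZE) -> range:
    """Return the indices of sectors overlapping [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)
```

Under these assumptions, a 1 MiB read near the start of a 64 MiB xorb touches a single sector instead of sixteen, which is the saving the Xet read pattern depends on.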
Siahub Reference Project
Who benefits from your project?
The most direct beneficiaries are Sia hosts and the Sia ecosystem at large. The Hugging Face Hub today carries roughly 77 PB across approximately two million models, an order of magnitude larger than total active Sia network capacity. Capturing even a fraction of one percent of that workload pushes contract formation, sector placement, and recurring storage and bandwidth fees onto Sia hosts at a moment of structurally low host utilization.
The workload shape is favorable. Model artifacts are large, append-mostly, immutable after upload, read-heavy, and dedup-friendly. They align with Sia’s strengths and avoid its weaknesses. SiaHub will be the first credible AI/ML model-hosting workload on Sia and the first non–Hugging Face implementation of the Xet content-addressable storage protocol on any storage network.
Machine-learning practitioners who already use the Hugging Face toolchain gain an alternative storage substrate without abandoning that toolchain. Concrete use cases include independent researchers seeking jurisdictional resilience for open-weight models, sovereign-AI initiatives in Europe and elsewhere requiring non-US storage, university labs needing decade-scale reproducibility for published checkpoints, quantization and fine-tune farms whose deduplication savings compound across thousands of related artifacts, and researchers in geographies where huggingface.co is blocked and who currently rely on the hf-mirror.com and olah mirror infrastructure pattern.
Self-hosting operators form a third audience. The same Compose stack that runs the public reference deployment is the self-host target, so any operator with Docker, a domain, and a Sia-funded recovery phrase can stand up the project from a single git clone. This is useful to enterprises with data-residency requirements, regulated industries, sovereign-AI consortia, and academic groups operating air-gapped clusters.
How does the project serve the Foundation’s mission of user-owned data? What problem does your project solve?
The world’s open-weight AI distribution runs through one cloud region. Hugging Face stores its large model and dataset files on AWS S3 in us-east-1, fronted by CloudFront. Both the legacy Git-LFS path and the current Xet path back onto a single CloudFront-fronted bucket.
This concentration has visible costs. Models have been removed under DMCA, trademark, and payment-processor pressure, and Hugging Face has been blocked in China since May 2023, pushing researchers there onto third-party mirror infrastructure. A single legal entity, in a single jurisdiction, on a single cloud region, sits between the world’s open-weight AI and the people who use it.
SiaHub addresses this without asking users to leave the toolchain they already use. Because it implements the Hub and Xet wire contracts that the unmodified hf CLI already speaks, switching is a one-line change to HF_ENDPOINT in a shell. Bytes that previously flowed to S3 instead flow through indexd to shards held by independent Sia hosts. Each file is split with Reed-Solomon erasure coding so the original can be reconstructed even if some hosts are offline, and each erasure-coded slab is encrypted with a per-slab ChaCha20 key derived client-side before any host RPC.
The user-owned property is enforced at the key layer. Recovery requires only the operator’s BIP-39 recovery phrase, the standard mnemonic from which the Sia App Key is derived. There is no platform tenant, no payment-processor escalation, and no third-party key custody. The operator holds the keys, controls the contracts, and can move or migrate the data without depending on any intermediary. The project plugs into Sia exclusively through the official indexd daemon and the official Sia SDKs, which aligns with the Foundation’s April 2026 grant-thematic focus on building with SDKs and on indexd.
Are you a resident of any jurisdiction on that list?
No. I am not a resident of any jurisdiction listed under FATF increased monitoring or active OFAC sanctions.
Will your payment bank account be located in any jurisdiction on that list?
No. The payment bank account will be held in a jurisdiction outside the FATF and OFAC lists referenced above.
Grant Specifics
Amount of money requested and justification with a comprehensive breakdown of expenses:
The total requested amount is $60,000 USD, paid out monthly across the five-month development term against the milestone schedule below. No marketing, community, or non-development items are included.
| Category | Detail | Amount (USD) |
|---|---|---|
| Development labor | One developer, full-time, five months — Rust CAS, Go gateway, React console, Compose stack, integration tests, documentation site | $58,000 |
| Infrastructure | VPS with NVMe SSD and outbound bandwidth for the reference deployment and integration-test runner; domains and TLS for siahub.app and subdomains; Siacoin funding for the indexd wallet covering contract formation and host pinning fees | $2,000 |
| Total | | $60,000 |
Development labor is the dominant line because the core engineering sits in service code rather than in glue or tooling. It includes Xet protocol fidelity in Rust, multi-range HTTP handling in Go, Sia SDK integration on both the write and read paths, and signed-URL minting and rotation. Across five monthly milestones, the schedule produces a verifiable artifact every thirty days.
What is the high-level architecture overview for the grant? What security best practices are you following?
The system runs as a Docker Compose stack of three SiaHub services and three off-the-shelf supporting services.
siahub-cas, built on Rust, Axum, Tokio, and sqlx, is the control plane and the only service that writes to Sia. It terminates client traffic, validates authentication, persists the catalog and Xet metadata in Postgres, calls sdk.upload followed by sdk.pin_object against indexd, and mints HMAC-signed URLs for the gateway to serve.
siahub-gateway, built on Go, chi, and the official Sia Go SDK, is the data plane. It validates those signed URLs, serves single-range and multi-range responses against a whole-xorb disk LRU cache, and on cold miss issues an offset-and-length Sia download.
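The eviction policy of such a cache can be sketched in a few lines. This is an in-memory stand-in, not the planned Go implementation: it keeps bodies in a dictionary rather than on disk, and the class name SizeBoundedLRU is hypothetical. It shows eviction driven by a total byte budget rather than by entry count.

```python
from collections import OrderedDict

class SizeBoundedLRU:
    """In-memory stand-in for a whole-xorb cache evicted by total bytes."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        body = self._items.get(key)
        if body is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return body

    def put(self, key, body: bytes):
        if len(body) > self.max_bytes:
            return  # larger than the whole budget: do not cache
        if key in self._items:
            self.used -= len(self._items.pop(key))
        self._items[key] = body
        self.used += len(body)
        # Evict least recently used entries until the budget is respected.
        while self.used > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

A disk-backed version would additionally write to a temporary file and rename it into place, so a partially written xorb is never visible to readers.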
siahub-console is an operator UI authenticated via GitHub OAuth, with TanStack Router for routing and TanStack Query for server state. The supporting services are a digest-pinned indexd, Postgres for the catalog, Xet metadata, sessions, and append-only usage log, and Redis for rate-limit token buckets. Public traffic terminates at nginx with auto-renewing Let’s Encrypt certificates.
The split between the CAS and the gateway is the project’s primary structural protection. The CAS holds the Sia App Key and is the only service with write authority, while the gateway runs read-only and only serves bytes for URLs the CAS has signed. Signed URLs are short-lived, rotated by key identifier so a key change does not invalidate live downloads, and rejected on the gateway with the specific status code that triggers the Xet client’s retry path.
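A minimal sketch of signing and verifying with a key identifier is shown below. The message layout, the function names sign and verify, and the key ids are assumptions made for illustration, not the format SiaHub uses. The point is that the verifier looks the key up by id, so URLs signed with the previous key stay valid while both keys are configured.

```python
import hashlib
import hmac
import time

def sign(path: str, expires: int, key_id: str, keys: dict) -> str:
    """Return a hex HMAC-SHA256 signature over key id, expiry, and path."""
    msg = f"{key_id}\n{expires}\n{path}".encode()
    return hmac.new(keys[key_id], msg, hashlib.sha256).hexdigest()

def verify(path: str, expires: int, key_id: str, sig: str,
           keys: dict, now=None) -> bool:
    """Accept only unexpired URLs signed by a key that is still configured."""
    now = time.time() if now is None else now
    if key_id not in keys or now >= expires:
        return False
    expected = sign(path, expires, key_id, keys)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Retiring a key is then a matter of removing its id from the mapping once every URL signed with it has expired.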
Authentication into the CAS accepts three bearer-token shapes: opaque API keys stored as SHA-256 hashes and shown to the user only at creation, SiaHub-issued Xet tokens for native traffic, and Hugging Face Xet tokens verified against Hugging Face’s public keys for traffic transiting the proxy shim. Read, write, and admin scopes are enforced by middleware. Console sessions are server-side and revocable on logout, and GitHub OAuth uses standard CSRF protection.
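The hash-only storage of opaque API keys can be sketched as below. The "sk_" prefix, the key length, and the function names are illustrative assumptions. Only the SHA-256 digest would be persisted, and the plaintext is returned once to the caller.

```python
import hashlib
import hmac
import secrets

def create_api_key():
    """Return (plaintext, stored_hash); only the hash is meant to be stored."""
    plaintext = "sk_" + secrets.token_urlsafe(32)  # prefix is illustrative
    stored_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, stored_hash

def check_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_hash)
```

A plain SHA-256 is reasonable here because the keys are long random strings rather than user-chosen passwords, so there is no low-entropy input for an attacker to guess.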
On the protocol surface, xorb upload bodies are hash-verified before any Sia call so bad payloads are rejected immediately and consume no contract bandwidth. Shards are version-checked and cross-checked against the catalog. Hash encoding uses the upstream merklehash crate to avoid a silent-corruption failure mode known to affect custom implementations. Rate limits run as Redis token buckets keyed per API key and per IP, budgeted by both request count and bytes per minute to accommodate parallel large-model uploads. The Sia recovery phrase stays in the operator-owned environment file, read once at startup to derive the App Key, and never reaches the database or logs.
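The dual budget can be sketched with two token buckets that must both admit a request. This in-memory version stands in for the Redis-backed limiter, and the class names and per-minute parameters are hypothetical. Time is passed in explicitly so the behaviour is easy to follow and test.

```python
class TokenBucket:
    """A bucket that refills continuously up to a fixed capacity."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = now

    def has(self, cost: float, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        return self.tokens >= cost

    def take(self, cost: float) -> None:
        self.tokens -= cost

class DualLimiter:
    """Admit a request only if both the request and byte budgets allow it."""

    def __init__(self, req_per_min: float, bytes_per_min: float, now: float = 0.0):
        self.requests = TokenBucket(req_per_min, req_per_min / 60.0, now)
        self.bytes = TokenBucket(bytes_per_min, bytes_per_min / 60.0, now)

    def allow(self, nbytes: int, now: float) -> bool:
        # Check both budgets first so a rejection consumes nothing.
        if self.requests.has(1, now) and self.bytes.has(nbytes, now):
            self.requests.take(1)
            self.bytes.take(nbytes)
            return True
        return False
```

Keying per API key and per IP then amounts to keeping one DualLimiter per key and one per address, and requiring both to allow the request.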
Timeline with measurable objectives and goals.
Development is broken into five monthly milestones totaling $60,000. Each milestone produces a verifiable artifact independently testable from prior work, with budget weighted toward the milestones carrying the heaviest deliverable load.
Milestone 1 — Foundations and Sia integration (Month 1, $12,000)
The first milestone establishes the foundational stack: Compose orchestration, the indexd integration that derives the App Key and gates startup on a healthy Sia connection, and the database scaffolding the rest of the project builds on.
Deliverables:
- A docker-compose.yml bringing up indexd, Postgres, and Redis under healthcheck-gated depends_on chains, so dependent services do not start until their dependencies report healthy.
- A bootstrap binary, siahub-cas-register, that performs the indexd connection handshake, derives the App Key from the operator-supplied recovery phrase, and writes it back to a shared .env file.
- A custom readiness probe blocking CAS startup until consensus is synced, the wallet has a confirmed balance above a configured threshold, and the account holds a minimum number of contracts.
- Postgres migration scaffolding for the auth, repo, Xet, LFS, and usage tables.
- A .env.example enumerating every required variable.
- A reproducible Sia SDK round-trip script that uploads a fixture and exercises the download(object, {offset, length}) path end-to-end against both self-hosted indexd and the hosted sia.storage endpoint.
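The readiness gate in the list above can be sketched as a pure function over a status snapshot. The IndexdStatus shape and its field names are hypothetical and do not reflect the indexd API; the three conditions are the ones named in the deliverable.

```python
from dataclasses import dataclass

@dataclass
class IndexdStatus:
    """Hypothetical snapshot of indexd state; not the real API shape."""
    synced: bool
    confirmed_balance: int  # any consistent unit
    contracts: int

def readiness(status: IndexdStatus, min_balance: int, min_contracts: int):
    """Return (ready, reasons); reasons lists every unmet condition."""
    reasons = []
    if not status.synced:
        reasons.append("consensus not synced")
    if status.confirmed_balance <= min_balance:
        reasons.append("confirmed balance not above threshold")
    if status.contracts < min_contracts:
        reasons.append("too few contracts")
    return (not reasons, reasons)
```

Returning the reasons rather than a bare boolean lets the probe log exactly what an operator is still waiting on during a long first sync.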
Acceptance criteria: The Compose stack brings up cleanly against both self-hosted indexd and sia.storage, the readiness probe correctly gates startup, and the SDK round-trip script completes successfully on both backends.
Milestone 2 — siahub-cas: HF Hub API and Xet protocol (Month 2, $12,000)
The second milestone implements the Rust CAS workspace, covering both the Hugging Face Hub API surface and the Xet content-addressable storage protocol.
Deliverables:
- The Hugging Face Hub API surface covering the routes huggingface_hub actually exercises: repository creation, README YAML validation, preupload classification, the xet-write-token and xet-read-token mints, NDJSON multi-operation commit parsing, the Git-LFS batch protocol with both basic HTTP and xet adapters, the resolve paths that 302 to signed gateway URLs, and the public catalog endpoints. Multi-branch and multi-tag support is included from the start, with immutable tags.
- The Xet content-addressable storage surface covering xorb upload with body hash-verification before any Sia call; the dual POST /shards and POST /v1/shards paths, since production clients use the former while the OpenAPI spec declares the latter, and registering only one breaks production silently; chunk dedup queries that let the client skip re-uploading content already on the server; and both V1 and V2 reconstruction endpoints, since xet-core tries V2 first and falls back only on failure.
- The Sia write path as a four-stage pipeline: in-memory hash verification, cache write, enqueue a Sia pin job onto a Redis stream, and a separate stateless pinner worker calling sdk.upload and sdk.pin_object. A xorb state machine moves rows through uploading → pinning → pinned, with orphaned as a terminal state for unrecoverable failures.
- The three-shape bearer-token validator, the GitHub OAuth flow, and the per-key Redis token-bucket rate limiter.
- An automated test that exercises the unmodified hf_xet client against the running CAS and confirms byte-identical reconstruction.
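The xorb state machine named in the write-path deliverable can be sketched as a transition table. The four state names come from the list above. Which states may move to orphaned, and treating pinned as terminal, are assumptions made for this illustration.

```python
# Allowed transitions; the edges into "orphaned" are an assumption.
ALLOWED = {
    "uploading": {"pinning", "orphaned"},
    "pinning": {"pinned", "orphaned"},
    "pinned": set(),    # assumed terminal once the object is pinned
    "orphaned": set(),  # terminal state for unrecoverable failures
}

def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if current not in ALLOWED:
        raise ValueError(f"unknown state: {current}")
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Enforcing the table in one place means a retried or duplicated pin job cannot move a row backwards, for example from pinned to uploading.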
Acceptance criteria: Automated hf_xet round-trip tests pass against the running CAS, and the server passes a schematic OpenAPI validator against the official cas.openapi.yaml.
Milestone 3 — siahub-gateway: signed-URL byte serving, multi-range, disk LRU (Month 3, $12,000)
The third milestone implements the Go gateway and closes the read path.
Deliverables:
- Signed-URL verification using HMAC-SHA256 with constant-time compare and key rotation, so current and previous keys are both accepted during the rotation window.
- A whole-xorb disk LRU cache that writes atomically, SHA-verifies on every write so a corrupted body never reaches the cache, and evicts by total-size budget.
- Concurrent cold misses on the same xorb collapsed into a single underlying fetch, so simultaneous requests for the same content do not multiply Sia load.
- Range handling covering both single-range requests, returned as 206 Partial Content with a correct Content-Range header, and multi-range requests, returned as multipart/byteranges with a freshly generated boundary and per-part Content-Range headers.
- Streamed response bodies with no whole-xorb buffering in memory, and client disconnects cancelling the inflight Sia call within one second via request context.
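The multi-range packaging from the range-handling item can be sketched as follows. This illustrates the wire format only: it buffers the body in memory, which the real gateway is specified not to do, and the function names are hypothetical. The response status for both single and multi-range replies is 206.

```python
import secrets

def parse_ranges(header: str, size: int):
    """Parse 'bytes=a-b,c-d' into inclusive (start, end) pairs clamped to
    size. Returns [] for a malformed header or no satisfiable range."""
    if not header.startswith("bytes="):
        return []
    out = []
    for spec in header[len("bytes="):].split(","):
        start_s, sep, end_s = spec.strip().partition("-")
        if not sep:
            return []
        try:
            if start_s == "":  # suffix form: the last N bytes
                start, end = max(0, size - int(end_s)), size - 1
            else:
                start = int(start_s)
                end = int(end_s) if end_s else size - 1
        except ValueError:
            return []
        end = min(end, size - 1)
        if 0 <= start <= end:
            out.append((start, end))
    return out

def multipart_byteranges(body: bytes, ranges,
                         content_type: str = "application/octet-stream"):
    """Return (headers, payload) for a multipart/byteranges response."""
    boundary = secrets.token_hex(16)  # fresh boundary per response
    parts = []
    for start, end in ranges:
        head = (f"--{boundary}\r\n"
                f"Content-Type: {content_type}\r\n"
                f"Content-Range: bytes {start}-{end}/{len(body)}\r\n\r\n")
        parts.append(head.encode() + body[start:end + 1] + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/byteranges; boundary={boundary}"}
    return headers, b"".join(parts)
```

Each part carries its own Content-Range header, which is how a client maps the returned bytes back to the ranges it asked for.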
Acceptance criteria: A CAS-plus-gateway integration test that uploads a xorb, requests a byte range with a cold miss flowing through Sia and into the cache fill before the response, repeats the request and confirms the cache hit, requests an expired-token URL and confirms a 403, requests a multi-range and confirms the multipart packaging, and load-tests cold popular-xorb fanout to confirm at least a 10× reduction in underlying Sia fetches.
Milestone 4 — siahub-console, documentation site, hf-proxy shim (Month 4, $15,000)
The fourth milestone delivers the operator-facing surface, including the console that operators and end users actually see, the documentation site, and the bridge shim.
Deliverables:
- siahub-console, built with Vite, React, Tailwind, and shadcn/ui, served behind TanStack Router for type-safe routes covering the dashboard, login, onboarding, keys, models, asset inventory, stats, host map, and an operator-only diagnostic page.
- An onboarding walkthrough that auto-populates hf upload and hf download snippets with the user’s freshly issued API key.
- API key creation, listing, and revocation, with one-time plaintext display at creation.
- A public model catalog and per-model detail view with downloads tracking and ready-to-copy CLI, Python, and curl snippets.
- An asset inventory with pin-state badges, and a per-asset detail page showing the Sia object ID, referencing repos, and per-host shard health.
- A stats dashboard with per-key usage breakdowns and a Leaflet world map plotting the Sia hosts holding the user’s xorbs.
- An operator-only diagnostic page surfacing the health of Postgres, Redis, indexd, OAuth, contracts, and wallet balance.
- An Astro and Starlight documentation site at docs.siahub.app covering quickstart, upload and download references, self-hosting, the Hugging Face bridge mode, API reference, and architecture.
- The hf-proxy shim, a stateless Go reverse proxy in front of huggingface.co that rewrites the X-Xet-Cas-Url response header so repository metadata stays on Hugging Face while Xet bytes detour through a SiaHub gateway.
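The header rewrite performed by the shim can be sketched as a pure function over response headers. The shim itself is planned in Go; this Python version only shows the transformation, the function name is hypothetical, and the target URL in the usage line is an example value.

```python
def rewrite_cas_url(headers: dict, siahub_cas_url: str) -> dict:
    """Return a copy of the headers with X-Xet-Cas-Url pointed at SiaHub.

    Header names are matched case-insensitively, and every other header
    is passed through unchanged.
    """
    out = {}
    for name, value in headers.items():
        if name.lower() == "x-xet-cas-url":
            out[name] = siahub_cas_url
        else:
            out[name] = value
    return out

# Usage with an example target:
#   rewrite_cas_url(upstream_headers, "https://cas.siahub.app")
```

Because nothing else in the response changes, repository metadata and authentication continue to come from Hugging Face while the client is sent elsewhere for the large-file bytes.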
Acceptance criteria: An outside-tester usability check, where a fresh user signs in with GitHub, generates an API key, copies the onboarding command, runs hf upload of a small test file, sees the resulting xorbs in the asset catalog, and downloads the file back through the gateway.
Milestone 5 — End-to-end verification and live reference deployment (Month 5, $9,000)
The fifth milestone closes the loop with full integration testing and the live deployment.
Deliverables:
- A multi-GB Hugging Face round-trip integration test: a scripted run that downloads a real .safetensors model from huggingface.co, uploads it to a SiaHub stack, downloads it back, and asserts byte-identical SHA-256 hashes across all files. A separate run exercises both Git-LFS and Xet code paths against a smaller corpus of seeded reference models.
- A live reference deployment at https://siahub.app, https://cas.siahub.app, https://gateway.siahub.app, and https://docs.siahub.app, fronted by nginx with auto-renewing Let’s Encrypt certificates.
- A make deploy workflow that brings a fresh Linux host from an empty .env to a public-HTTPS-reachable deployment in under thirty minutes, and is safe to re-run on the same host.
- Self-hosting documentation covering deployment, GitHub OAuth setup, the choice between self-hosted indexd and the hosted sia.storage indexer, and backup and restore procedures.
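The byte-identity assertion in the round-trip test can be sketched as a directory comparison by SHA-256. The function names are hypothetical, and the sketch assumes the original and the re-downloaded model have each been saved to a local directory.

```python
import hashlib
from pathlib import Path

def sha256_file(path, bufsize: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so multi-GB weights fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(bufsize):
            h.update(block)
    return h.hexdigest()

def tree_digests(root) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def mismatches(original_dir, downloaded_dir) -> list:
    """Return relative paths that are missing on one side or differ."""
    a, b = tree_digests(original_dir), tree_digests(downloaded_dir)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

The test passes when mismatches returns an empty list, and a non-empty result names exactly which files to inspect.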
Acceptance criteria: The full acceptance checklist: clean lints across all three runtimes, byte-identical multi-GB round-trip against a real Hugging Face model, and an external tester signing in to siahub.app, completing onboarding, running a copy-paste hf upload against a real model, and confirming round-trip bytes via sha256sum.
The Grants Committee retains the right to accept, modify, or reject these milestones.
Who is the target user for your project?
Two distinct user populations will consume the same software, plus a third running it inside their own infrastructure as a private hub.
The proximal user is the operator running a SiaHub deployment, whether the maintainer of the siahub.app reference deployment or a self-hosting team. The operator interacts with the system through the console, the bootstrap binary, and the .env file, and is responsible for the recovery phrase, OAuth client secret, Sia wallet balance, and the choice between self-hosted indexd and the hosted sia.storage indexer.
The downstream user is an existing Hugging Face user, such as a researcher, MLOps engineer, quantization-farm maintainer, or sovereign-AI team, who has installed huggingface_hub and hf_xet and runs hf upload and hf download in their workflow. For this user, SiaHub appears as a one-line configuration change to HF_ENDPOINT, after which the CLI commands they already know continue to work and the bytes they upload land on Sia rather than S3.
The third population is enterprises and research groups running the entire stack inside their own infrastructure boundary. Examples include banks, regulated industries, sovereign-AI consortia, and academic labs operating air-gapped clusters. They need data residency and jurisdictional control without giving up the toolchain their researchers use elsewhere.
What are your plans for this project following the grant?
The repository remains Apache 2.0 and openly developed, and siahub.app will continue running as a live service beyond the grant period rather than going offline once milestones are signed off. The hosted deployment will be sustained through usage-based monetization, including per-tenant rate-limit tiers and paid storage and bandwidth allowances built on the metering hooks scoped into Milestone 2. Self-hosters retain full feature parity under Apache 2.0; the hosted service competes on operations, not on withholding features.
Automated tests exercise the running CAS against unmodified hf_xet clients on every change, so protocol drift against new xet-core and indexd releases is caught early.
Potential risks that will affect the outcome of the project:
The most consequential risk cluster lives in xet-core protocol details and stability. The repository explicitly disclaims stability on its internal crates, and the hf_xet wheel often changes ahead of the OpenAPI spec. Specific failure modes include a hash-encoding subtlety where a custom implementation produces silent corruption, the dual /shards and /v1/shards registration where missing one breaks production, and V2 reconstruction silently corrupting client deserialization without proper multi-range support. Mitigation pins xet_core_structures at exact versions, registers both shard paths explicitly, gates V2 reconstruction behind a feature flag that only flips on after multi-range support is verified end-to-end, and uses the upstream merklehash crate as the single source of truth for hash conversion.
A second risk is indexd first-run sync time. Chain sync, contract formation, and host approvals can run from minutes to multi-hour windows, and declaring the service ready prematurely produces failing CAS startups. The readiness probe gates explicitly on consensus sync, a confirmed wallet-balance threshold, and a minimum contract count, and the documentation sets the expectation explicitly. Operators who do not want to fund a wallet can point at the hosted sia.storage indexer instead.
Development Information
Will all of your project’s code be open-source?
Yes. The entire codebase will be released under the Apache License 2.0. This includes siahub-cas, siahub-gateway, siahub-console, the hf-proxy shim, the documentation site, the bootstrap utility, and all Compose, migration, and operator scripts.
Third-party runtime dependencies are all OSI-licensed and Apache-2.0–compatible. The xet_core_structures crate from Hugging Face, licensed Apache 2.0, is used for hash encoding and shard parsing. The xet_client crate is used as a development dependency in the test suite only. The official Sia Go and Rust SDKs from the Sia Foundation, licensed MIT, are runtime dependencies in the gateway and CAS respectively. The indexd daemon ships as a digest-pinned upstream container image.
Leave a link where code will be accessible for review.
Source code is available at https://github.com/bytemaster333/siahub. The repository is licensed under Apache 2.0 and will continue to host all code developed under this grant.
Do you agree to submit monthly progress reports?
Yes. I will submit a monthly progress report on forum.sia.tech covering that month’s milestone deliverables, links to merged commits and tagged releases, and any pitfalls or scope adjustments encountered.
Do you agree to designate a point of contact for committee questions and concerns?
Yes. Salih Toruner - [email protected]
Provide links to previous work or code from all team members.
- bytemaster333/siahub: xet-compatible model hub on sia
- bytemaster333/Arbisight: Real-time CLI Telemetry & Alerting for Stylus on Arbitrum. Visualize command usage, catch errors instantly, and monitor developer flows with ease.
- bytemaster333/stellar-command-insights-release: Pre-packaged releases and installation scripts for Stellar Command Insights (SCI).
- bytemaster333/Hashirama: The Starknet Appchain Orchestrator. A production-grade Kubernetes Operator to deploy and manage Starknet Madara L3 chains with a single YAML. Includes a built-in UI dashboard. 🏯🚀
Have you developed a proof of concept for this idea already?
Yes. A reference implementation is available at https://github.com/bytemaster333/siahub and reachable live at https://siahub.app, with a walkthrough video at https://www.youtube.com/watch?v=dupe_7GcJ8I.
This reference implementation is not the deliverable of the proposed grant. The grant’s deliverables are the production-grade system described in the milestone schedule above, built and verified across the five months under the budget and acceptance gates described. The reference is offered as evidence that the technical claims are tractable.
Do you agree to participate in a demo at our grants committee meeting at significant milestones or after the grant’s completion, or to work with our grants team to help showcase your project to the community?
Yes. A walkthrough video covering the hf upload and hf download round-trip against siahub.app and the operator console is available at https://www.youtube.com/watch?v=dupe_7GcJ8I, and I will demo live at the Committee’s request.
Contact Info
Email: [email protected]
Any other preferred contact methods:
- Discord: @salihtoruner
- LinkedIn: https://www.linkedin.com/in/salih-toruner-4919b3212/








