Small Grant: Chi-voice pilot

Princess · January 14, 2026, 9:57am

Project Name

Chi Voice (Pilot): Community-Collected Multilingual Audio Dataset

Name Of The Organization Or Individual Submitting The Proposal
Princess Innocent

Describe Your Project

Overview

Chi Voice is a lightweight pilot platform for collecting and organizing short audio recordings of underrepresented and indigenous languages. Native speakers contribute spoken translations of simple English prompts (words, phrases, or sentences), creating a structured, ethically sourced audio dataset for linguistic research and early-stage speech AI development.

Many languages in the Global South lack even minimal speech datasets, not because of complexity but because of tooling barriers. Chi Voice focuses narrowly on collection, labeling, and verifiable storage of audio samples, rather than attempting to solve long-term hosting, model training, or large-scale distribution in this phase.

This proposal funds a small, well-defined pilot that demonstrates how Sia can be used as a content-addressed archival backend for reproducible language datasets, while keeping application logic intentionally simple.

https://chivr.tech/

This grant focuses on hardening and formalizing the prototype into a clean, auditable dataset pipeline.

Who Benefits

Linguists & language researchers – access to rare, labeled speech samples
AI & NLP developers – bootstrapping data for low-resource languages
Educational institutions – open datasets for study and preservation
Language communities – representation and digital preservation of spoken heritage

Technical Scope

Architecture Philosophy

Chi Voice is intentionally designed as a centralized web service (Web2-style) that uses Sia as a decentralized, content-addressed storage layer, not as a fully decentralized application.

This keeps the system simple, auditable, and aligned with Sia’s technical reality.

Storage & Data Flow

Audio files are uploaded to Sia using the new S5 TypeScript gateway client as a thin integration layer.
This library provides a stable, developer-friendly interface for content-addressed uploads and CID resolution while delegating all storage guarantees and contract management to renterd.
Chi Voice does not rely on S5 for peer-to-peer networking or experimental protocol features; it is used strictly as an application-layer client for interacting with Sia-backed storage.

The application never claims perpetual or automatic storage guarantees.
Storage is budgeted, renewable, and explicit.

Metadata Stored Per Recording

Language name
Language code (Glottolog)
Prompt (word / phrase / sentence)
CID (content-addressed pointer to audio)
Optional transcription (when available)
Timestamp
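
The fields above could be captured in a small TypeScript type with a validation helper. This is a minimal sketch, not the project's actual schema: the field names follow the example API response in this proposal, while `PromptType`, `RecordingMetadata`, and `validateRecording` are hypothetical names. The Glottocode check assumes the usual shape of four alphanumeric characters followed by four digits (e.g. `baba1266`).

```typescript
// Hypothetical names; field names mirror the proposal's example response.
type PromptType = "word" | "phrase" | "sentence";

interface RecordingMetadata {
  language: string;        // human-readable language name
  language_code: string;   // Glottolog code (Glottocode)
  prompt: string;          // English prompt the speaker translated
  type: PromptType;
  cid: string;             // content-addressed pointer to the audio
  recording_text?: string; // optional transcription
  created_at: string;      // date string, e.g. "2025-01-12"
}

// Assumed Glottocode shape: four lowercase alphanumerics plus four digits.
const GLOTTOCODE = /^[a-z0-9]{4}\d{4}$/;

// Returns a list of problems; an empty list means the record looks valid.
function validateRecording(r: RecordingMetadata): string[] {
  const errors: string[] = [];
  if (r.language.trim() === "") errors.push("language is empty");
  if (!GLOTTOCODE.test(r.language_code)) errors.push("language_code is not a Glottocode");
  if (r.prompt.trim() === "") errors.push("prompt is empty");
  if (r.cid.trim() === "") errors.push("cid is empty");
  if (Number.isNaN(Date.parse(r.created_at))) errors.push("created_at is not a date");
  return errors;
}
```

Running a check like this before a record is written would catch some of the metadata errors listed under Risks before they reach the public dataset.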

Public REST API (Read-Only)

A simple REST API exposes metadata and storage pointers:

Example query parameters

language
language_code
type (word / phrase / sentence)
date_range

Response

{
  "language": "Babanki",
  "language_code": "baba1266",
  "prompt": "Good morning",
  "cid": "sia://...",
  "recording_text": "optional",
  "created_at": "2025-01-12"
}

Researchers resolve audio files directly via CID through standard gateways.

The API remains read-only, reducing scope, cost, and operational risk.
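
As an illustration of how the query parameters above might be applied server-side, here is a minimal in-memory filter. It is a sketch under assumptions: `RecordingRow`, `RecordingQuery`, and `filterRecordings` are hypothetical names, and the proposal does not specify how `date_range` is encoded, so an inclusive `from`/`to` pair of `YYYY-MM-DD` strings is assumed.

```typescript
// Hypothetical types; only the fields needed for filtering are listed.
interface RecordingRow {
  language: string;
  language_code: string;
  type: "word" | "phrase" | "sentence";
  created_at: string; // "YYYY-MM-DD"
}

interface RecordingQuery {
  language?: string;
  language_code?: string;
  type?: RecordingRow["type"];
  date_range?: { from: string; to: string }; // assumed encoding, inclusive
}

// Keeps only the rows that match every parameter present in the query.
function filterRecordings<T extends RecordingRow>(rows: T[], q: RecordingQuery): T[] {
  return rows.filter((row) => {
    if (q.language !== undefined &&
        row.language.toLowerCase() !== q.language.toLowerCase()) return false;
    if (q.language_code !== undefined && row.language_code !== q.language_code) return false;
    if (q.type !== undefined && row.type !== q.type) return false;
    if (q.date_range !== undefined) {
      // Dates in the same YYYY-MM-DD format compare correctly as strings.
      if (row.created_at < q.date_range.from || row.created_at > q.date_range.to) return false;
    }
    return true;
  });
}
```

For example, `filterRecordings(rows, { language_code: "baba1266", type: "phrase" })` would return only Babanki phrase recordings. A real deployment would more likely push these filters into a database query.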

API Access Model (Freemium)

Chi Voice provides API access using a lightweight freemium model:

Free Tier

API key for all registered users
Generous free quota (e.g. 1,000 requests/month)
Designed for students, linguists, and small research projects

Paid Tier

Higher request limits for institutional or commercial users
Enables sustainable maintenance without restricting access to data

Enforcement

API keys tracked per user
Simple usage counters and quota enforcement
No complex billing or on-chain logic in this phase
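
The usage counters and quota enforcement described above can be sketched as a per-key, per-month counter. This is an illustrative in-memory version only: `QuotaTracker` and its methods are hypothetical names, the monthly reset on UTC calendar months is an assumption, and a production service would persist the counts in a database.

```typescript
// Hypothetical sketch: counts requests per API key and UTC calendar month.
class QuotaTracker {
  private readonly counts = new Map<string, number>();
  private readonly monthlyLimit: number;

  constructor(monthlyLimit: number) {
    this.monthlyLimit = monthlyLimit;
  }

  // Bucket key such as "my-key:2026-01"; a new month starts a fresh count.
  private bucket(apiKey: string, now: Date): string {
    return apiKey + ":" + now.toISOString().slice(0, 7);
  }

  // Records one request and returns true, or returns false if the quota is spent.
  tryConsume(apiKey: string, now: Date = new Date()): boolean {
    const key = this.bucket(apiKey, now);
    const used = this.counts.get(key) ?? 0;
    if (used >= this.monthlyLimit) return false;
    this.counts.set(key, used + 1);
    return true;
  }

  // Requests left for this key in the current month.
  remaining(apiKey: string, now: Date = new Date()): number {
    const used = this.counts.get(this.bucket(apiKey, now)) ?? 0;
    return Math.max(0, this.monthlyLimit - used);
  }
}
```

With the free-tier figure from this proposal, `new QuotaTracker(1000)` would allow 1,000 calls to `tryConsume` per key each month; an API handler could answer with HTTP 429 once it returns `false`.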

Why Sia Is the Right Fit (Pilot Framing)

Sia is used as a verifiable, content-addressed archival layer, not as a promise of perpetual storage.

Each audio file is uniquely identified by its CID
Researchers can verify dataset integrity independently
Storage costs are predictable and budgeted in advance
Contracts can be renewed transparently as the dataset grows

This approach mirrors how researchers already treat physical archives: explicit funding, explicit renewal, and auditability.
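
The integrity check that content addressing makes possible can be illustrated with a plain hash comparison. This sketch deliberately uses SHA-256 from Node's built-in `crypto` module as a stand-in: S5 CIDs have their own hashing and encoding, which is not reproduced here, and `sha256Hex` and `verifyDownload` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Stand-in digest: SHA-256 as lowercase hex. Not the S5 CID format.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// True when the downloaded bytes hash to the digest published with the dataset.
function verifyDownload(data: Uint8Array, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.toLowerCase();
}
```

The principle is the same whatever the hash: a researcher recomputes the digest of the downloaded audio and compares it with the published identifier, so any altered or corrupted file is detected without trusting the server.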

How Does The Project Serve The Foundation’s Mission Of User-owned Data?

1. Decentralized Preservation of Cultural Knowledge

Indigenous languages are disappearing faster than they can be documented. By using Sia:

We store cultural data securely and immutably.
We consolidate the datasets required for model training.
The project demonstrates how Web3 tools can protect heritage, not just finance, and we hope the Foundation sees its value and potential.

2. Data Ownership for Indigenous Contributors

Chi is designed so that native speakers contribute voice recordings with full knowledge and consent — and their contributions are stored on Sia’s decentralized network.

This ensures:

Transparency: Contributors can verify and access the content they help create.
Autonomy: No corporation, government, or institution can lock or alter the cultural data once it’s on Sia.

3. Model for Future Decentralized Datasets

Chi Voice will serve as a replicable framework for other regions and cultures to follow.

By showing how Sia can power large-scale, ethically sourced voice datasets, we:

Encourage developers and researchers to use Sia for decentralized data hosting
Create momentum for a new standard of AI dataset sovereignty

Are you a resident of any jurisdiction on that list? No
Will your payment bank account be located in any jurisdiction on that list? No

Grant Specifics

Amount of money requested and justification with a reasonable breakdown of expenses:

Total Requested: $10,000

Item	Amount (USD)
Developer fees	$8,000
OpenAI (GPT-5) API fees (estimated 25 tokens per output, $75/m)	$1,000
Web hosting & storage fees (first 12 months only)	$1,000

What are the goals of this small grant? Please provide a general timeline for completion.

Our goals are:

Improve the existing web-app and user interface.
Integrate Sia storage via the S5 TypeScript client
Develop a simple public API that gives developers access to the recordings library

Month 1

Finalize data schema
Integrate Sia storage via the S5 TypeScript client
Generate stable CIDs
Test storage integration

Month 2

Build public read-only REST API
Implement API keys + quota enforcement
Test API functionality

Month 3

Public dataset release
Documentation and example queries
Test and Release

Risks & Mitigations

Low participation in rare languages
- Targeted outreach and focused prompts
Audio quality variance
- Client-side recording guidance
Metadata errors
- Community review and duplicate sampling
Connectivity issues
- Short recordings and retry-friendly uploads
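
The "retry-friendly uploads" mitigation could take the form of a retry wrapper with exponential backoff. The sketch below is generic and does not use any S5 client API: `backoffDelayMs` and `withRetry` are hypothetical helpers, and the default delays and attempt count are assumptions chosen for illustration.

```typescript
// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs task until it succeeds or maxAttempts is reached, waiting between tries.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 4,
  delayFor: (attempt: number) => number = backoffDelayMs,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No wait after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(delayFor(attempt));
    }
  }
  throw lastError;
}
```

A caller would wrap whatever upload function the application uses, for example `withRetry(() => uploadRecording(blob))`, where `uploadRecording` is a placeholder name. Short recordings keep each retry cheap on slow connections.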

Development Information

Will all of your project’s code be open-source? Yes
Leave a link where code will be accessible for review:
https://github.com/Chi-voice/chivr

Do you agree to submit monthly progress reports?
Yes — we will submit reports on our progress here on the forum.

Contact info

Email:
[email protected]

[email protected]

mecsbecs · January 16, 2026, 2:11am

Thank you for your proposal @Princess! This will be presented at next Tuesday’s Grants Committee meeting and a response will be posted here before the end of next week.

pcfreak30 · January 16, 2026, 2:18am

Something I would consider for evaluation is the focus on renterd here, even though it's indirect via S5. Ironically, redsolver requesting a grant for Vup, which implicitly solves the indexd aspect for S5, means it's somewhat moot.

But there are inner-ecosystem dependencies here that should be taken into account, and the grant should not focus on renterd if it cannot be adapted to indexd with minimal effort (and that might be a redsolver question?).

Lastly, be sure the S5 focus is on the TS and Rust S5 iteration that redsolver & jules just completed, and not the legacy v0 version from 2024.

Kudos.

Princess · January 20, 2026, 2:47pm

Your insight is always appreciated.
We’re in the right ballpark.
I’ll be working with the S5 TS client that jules developed and recently completed.

mecsbecs · January 22, 2026, 10:10pm

Thanks for your continued interest and proposal to The Sia Foundation Grants Program.

After review, the Committee has decided to approve your proposal. Congratulations! They’re excited to see what you can accomplish with this grant.

We’ll reach out to your provided email address for onboarding. Onboarding can take a couple of weeks, so prepare to adjust your timelines accordingly.

Princess · January 24, 2026, 10:10pm

Thanks for the feedback @mecsbecs

I’m looking forward to kicking off.
I’ll be looking out for the mail.
Also, I just added my second email. My school-provided email will stop being used soon.

mecsbecs · January 28, 2026, 7:52pm

Hi @Princess - forwarded the onboarding email to the second email address on Monday and I’ve just sent a follow-up email. Please reply so we can get started on your onboarding.

Princess · January 30, 2026, 11:43pm

Hello @mecsbecs

The identity verification link showed as expired when I tried accessing it.

mecsbecs · February 2, 2026, 2:55pm

Hi @Princess - a new one has been emailed to you.

mike76 · February 3, 2026, 5:44am

Hi @Princess, you may want to exchange any private information via email further on. No need to do it on the Forum.

mecsbecs · February 26, 2026, 6:51pm

@Princess - your monthly progress report was due yesterday on February 25th.

Please post your progress report ASAP following the template here in order for your technical review to be conducted in a timely manner.

Princess · March 1, 2026, 12:16pm

February Progress Report

What progress was made on your grant this month?

Please summarize your progress in 3-5 sentences or bullet points:

Set up an S5 node for Chi-voice (https://chivr-app.com)
renterd instance running on zen
Tested uploads/downloads to renterd via S5

Detail tasks worked on this month per milestone with the appropriate Pull Request(s) links as outlined in the guide

Milestone	Tasks	Pull request(s)	Additional notes
S5 Integration	Integrated S5 to handle uploads and downloads	feat: S5/Sia decentralised archival integration by Princessinn · Pull Request #1 · Chi-voice/chivr · GitHub	File uploads/downloads were successful, with a CID returned

Summarize any problems that you ran into this month and how you’ll be solving them.

Please summarize your issues into a few sentences or bullet points:

I had no technical problems

What will you be working on next?

Please summarize your development goals into a few sentences or bullet points:

Milestone 2: Develop Public REST API

PS: The code is pushed to the same organization but in a new repo: GitHub - Chi-voice/chivr