Small Grant: Chi

Project Name: Chi, A Community-Powered Platform for Multilingual Audio Data Collection

Name of the organization or individual submitting the proposal: Princess Innocent

Describe your project.

Overview

Chi is a lightweight, privacy-conscious web application designed to collect, organize, and store audio recordings of indigenous languages. The platform enables native speakers to record translations of English words, phrases, and sentences using their own voices, forming a foundational dataset for its future AI translation model.
The Global South is home to over 5,000 spoken languages, yet most speech AI systems ignore or underrepresent them because accessible, labeled, and inclusive audio datasets are lacking. Without data, these languages risk digital extinction.

Existing efforts (like Mozilla’s Common Voice) barely scratch the surface of African language diversity and often rely on written text, which excludes non-literate speakers.

Chi solves this by:

  • Empowering native speakers to record spoken translations of English prompts in their own languages.

  • Storing that data securely and permanently on decentralised storage, giving researchers, developers, and communities access to ethically sourced language data.

Who Benefits From Your Project?

Linguists and Researchers
Accessibility, Representation and Preservation of Indigenous Languages
AI and NLP Developers
Educational Institutions

How Does The Project Serve The Foundation’s Mission Of User-owned Data?

  1. Decentralized Preservation of Cultural Knowledge

Indigenous languages are disappearing faster than they can be documented. By using Sia:

  • We store cultural data securely and immutably, ensuring it outlives local servers, political changes, or institutional neglect.

  • The project demonstrates how Web3 tools can protect heritage, not just finance.

  2. Data Ownership for Indigenous Contributors

Chi is designed so that native speakers contribute voice recordings with full knowledge and consent — and their contributions are stored permanently on Sia’s decentralized network, not a private or centralized database.

This ensures:

  • Transparency: Contributors can verify and access the content they help create.

  • Autonomy: No corporation, government, or institution can lock, alter, or delete the cultural data once it’s on Sia.

  3. Model for Future Decentralized Datasets

Chi will serve as a replicable framework for other regions and cultures to follow.
By showing how Sia can power large-scale, ethically sourced voice datasets, we:

  • Encourage developers and researchers to use Sia for decentralized data hosting

  • Create momentum for a new standard of AI dataset sovereignty

Are you a resident of any jurisdiction on that list? No

Will your payment bank account be located in any jurisdiction on that list? No

Grant Specifics

Amount of money requested and justification with a reasonable breakdown of expenses:

Use of Funds: $8000 requested

Item | Amount (USD)
Salary (full-stack platform development) | $6,000
Admin/hosting/storage fees (1 year) | $2,000
Total | $8,000

What are the goals of this small grant? Please provide a general timeline for completion.

Our goal is to deliver a functional web platform for multilingual audio data collection by the end of the grant.

Month 1: Web App MVP. A production-ready web interface for task-based audio recording with user onboarding, metadata tagging (language, location, gender, etc.), and consent forms.

Month 2: Admin Dashboard. Basic moderation tools for reviewing and managing submissions, quality control, and metadata tagging.

Month 3: Sia-Based Storage Integration. Audio files and metadata stored via S5 for decentralized, verifiable, and permanent access.
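
To make the metadata tagging, consent, and "verifiable" parts of the timeline a bit more concrete, here is a rough sketch of what a single submission record could look like. Everything in it is an assumption for illustration: the `Submission` fields, the function names, and the use of SHA-256 as a generic checksum are hypothetical and are not the final schema or the hashing scheme that S5 or Sia use internally.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Submission:
    """Hypothetical record for one recorded translation (field names are assumptions)."""
    prompt_text: str     # English prompt shown to the speaker
    language: str        # self-reported language of the recording
    location: str        # self-reported region
    gender: str          # self-reported, may be left as an empty string
    consent_given: bool  # contributor accepted the consent form
    audio_sha256: str    # checksum of the raw audio bytes


def build_submission(audio: bytes, prompt_text: str, language: str,
                     location: str, gender: str, consent_given: bool) -> Submission:
    # Refuse to create a record without explicit consent.
    if not consent_given:
        raise ValueError("consent is required before a recording is stored")
    if not audio:
        raise ValueError("empty recording")
    digest = hashlib.sha256(audio).hexdigest()
    return Submission(prompt_text, language, location, gender, consent_given, digest)


def to_json(sub: Submission) -> str:
    # Sorted keys so the same record always serializes to the same text.
    return json.dumps(asdict(sub), sort_keys=True, ensure_ascii=False)


def verify(audio: bytes, sub: Submission) -> bool:
    # True when the audio bytes still match the checksum kept in the metadata.
    return hashlib.sha256(audio).hexdigest() == sub.audio_sha256
```

In this sketch the JSON produced by `to_json` would be uploaded next to the audio file, and `verify` lets anyone who later downloads both confirm that the recording has not changed. The actual upload step is left out on purpose, since it depends on the storage client chosen in Month 3.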

Potential risks that will affect the outcome of the project:

Low Participation in Rare Languages
Some indigenous languages may have few active speakers, limiting dataset diversity.

Poor Audio Quality
Background noise or unclear recordings may affect usability of submissions.

Incorrect Language Labeling
Users may misidentify dialects, leading to inaccurate metadata.

Internet Access Constraints
Contributors in rural areas may face challenges uploading recordings due to weak connectivity.

Development Information

Will all of your project’s code be open-source?

Yes

Leave a link where code will be accessible for review.

Do you agree to submit monthly progress reports?

Yes, we will submit reports on our progress here on the forum.

Contact info

Email: [email protected]

“By using Sia, we store cultural data securely and immutably, ensuring it outlives local servers, political changes, or institutional neglect.”

That is not how Sia works. In reality, Sia is more fragile than basically every other form of data storage as you have to have a computer online constantly to repair bad sectors.

I’m not saying this is necessarily a bad proposal, but it’s important to understand the tech you’re working with.

In other words, Sia is not BTC ordinals or Arweave. Not all data is on the blockchain, and this is the most misunderstood thing about blockchain in general due to the hype of the industry…

You are paying for the data to stay up for as long as you wish for it to stay up.

If you want permanent, immutable storage, Sia is not it. I would also say I don’t think forever storage of anything other than metadata will end well, but that is an entirely different debate.

Kudos.

I appreciate your comment… but the statement was made in regard to Sia’s decentralised nature (storing data across multiple computers). If the world does not return to the Stone Age, we do expect Sia to continue growing with demand.

Sure, but you said “we store cultural data securely and immutably, ensuring it outlives local servers, political changes, or institutional neglect.”

Data absolutely will not survive on Sia for more than a couple of months with institutional neglect or your local servers dying. It might on disk or tape, not on Sia.

Again, I appreciate you stating that. :)
Please correct me if I’m wrong, but since we intend to consolidate the collected data in decentralized storage, Sia is ideal, no?

That depends. Decentralized does not mean the data is “onchain forever”. You have to pay to keep the data stored.

So it is more a question of what your goals are first, not of whether Sia is ideal… because we need to understand your goals first.

Thanks for your proposal to The Sia Foundation Grants Program.

After review, the committee has decided to reject your proposal citing the following reasons:

  • Upon reading your responses to questions in the forum thread, your understanding of how Sia works seems to need a better foundation before tackling a grant project.
  • Your GitHub profile doesn’t have much activity. Are you able to point us to some previous work that indicates an ability to complete a technical project?
  • While not required for a small grant proposal, the committee wondered if you had any ideas in mind to mitigate the risks you mentioned.

We’ll be moving this to the Rejected section of the forum. Thanks again for your proposal, and you’re always welcome to submit new requests if you feel you can address the committee’s concerns.