# Standard Grant: Chi

Project Name: Chi, A Community-Powered Platform for Multilingual Audio Data Collection

Name of the organization or individual submitting the proposal: Princess Innocent

Describe your project

Overview

Chi is a lightweight, privacy-conscious platform designed to collect, organize, and store audio recordings of indigenous languages. The platform enables native speakers to record translations of English words, phrases, and sentences using their own voices, forming a foundational dataset for its future AI translation model.

The Global South is home to over 5,000 spoken languages, yet most speech AI systems ignore or underrepresent them because of a lack of accessible, labeled, and inclusive audio datasets. Without data, these languages risk digital extinction.

Endangered languages are currently dying at an accelerated rate because of globalization, mass migration, cultural replacement, and linguicide. Approximately 454 known languages have become extinct in recent times, and over 3,000 spoken languages (43% of the total) are considered endangered.

Existing efforts (like Mozilla’s Common Voice) barely scratch the surface of Asian and African language diversity and often rely on written text, which excludes non-literate speakers.

Whilst providing users with a list of all spoken languages, Chi solves this by:

  • Empowering native speakers to record spoken translations of AI-generated prompts in their own languages.
  • Storing that data securely on decentralized storage, giving researchers, developers, and communities access to ethically sourced language data.

Chi web currently has:

  • 70+ users contributing
  • 200+ recordings
  • Over 90 languages from 3 continents recorded and counting.

These figures update automatically and can be viewed at the bottom of the home screen. A link to the proof of concept is given below.


Who Benefits From Your Project?

  • Linguists and Researchers
  • Accessibility, Representation, and Preservation of Indigenous Languages
  • AI And NLP Developers
  • Educational Institutions

How Does The Project Serve The Foundation’s Mission Of User-owned Data?

1. Decentralized Preservation of Cultural Knowledge

Indigenous languages are disappearing faster than they can be documented. By using Sia:

  • We store cultural data securely and immutably.
  • We consolidate the dataset required for model training.
  • The project demonstrates how Web3 tools can protect heritage, not just finance, and we hope the Foundation sees its value and potential.

2. Data Ownership for Indigenous Contributors

Chi is designed so that native speakers contribute voice recordings with full knowledge and consent — and their contributions are stored on Sia’s decentralized network.

This ensures:

  • Transparency: Contributors can verify and access the content they help create.
  • Autonomy: No corporation, government, or institution can lock or alter the cultural data once it’s on Sia.

3. Model for Future Decentralized Datasets

Chi Voice will serve as a replicable framework for other regions and cultures to follow.

By showing how Sia can power large-scale, ethically sourced voice datasets, we:

  • Encourage developers and researchers to use Sia for decentralized data hosting
  • Create momentum for a new standard of AI dataset sovereignty

Are you a resident of any jurisdiction on that list? No
Will your payment bank account be located in any jurisdiction on that list? No


Grant Specifics

Amount of money requested and justification with a reasonable breakdown of expenses:

Use of Funds: $35,000 requested

| Item | Amount (USD) |
| --- | --- |
| Salary (full-stack platform development), improvements on existing PWA, mobile app development (cross-platform), Sia integration (via S5) | $32,000 |
| Admin/OpenAI API fees/hosting/app platforms and storage fees (1 year) | $3,000 |

What are the goals of this standard grant? Please provide a general timeline for completion.

Our goals are:

  • Develop a mobile application for the Chi platform.
  • Improve the existing web-app, UI/UX and add interesting features.
  • Integrate Sia storage.

Month 1: Sia Storage Integration

  • Integrate Sia storage
  • Move and store existing and future Audio files and metadata via S5

Month 2-3: Mobile App Development (Cross-Platform)

  • Define mobile-specific features and UI changes.
  • Implement UI design.
  • Integrate audio input/output for mobile.
  • Implement push notifications.
  • Add offline mode & language caching.
  • Connect with back-end.
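
The offline mode listed above could work roughly as follows: recordings are written to a local queue first and uploaded only when an attempt succeeds. This is a minimal sketch, not the Chi implementation. The class name `OfflineUploadQueue`, the JSON queue file, and the `upload` callback (a stand-in for whatever back-end or S5 client the app ends up using) are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


class OfflineUploadQueue:
    """Hypothetical queue that keeps recordings locally until upload succeeds."""

    def __init__(self, queue_file: Path) -> None:
        self.queue_file = queue_file
        # Reload anything left over from a previous session.
        if queue_file.exists():
            self.pending: List[Dict[str, str]] = json.loads(queue_file.read_text())
        else:
            self.pending = []

    def _save(self) -> None:
        self.queue_file.write_text(json.dumps(self.pending))

    def enqueue(self, audio_path: str, language: str, prompt_id: str) -> None:
        """Record the metadata of a new recording and persist the queue."""
        self.pending.append(
            {"audio_path": audio_path, "language": language, "prompt_id": prompt_id}
        )
        self._save()

    def flush(self, upload: Callable[[Dict[str, str]], bool]) -> int:
        """Try to upload every pending item; keep the ones that fail.

        `upload` is an assumed callback that returns True on success.
        Returns the number of items uploaded in this pass.
        """
        still_pending = []
        uploaded = 0
        for item in self.pending:
            try:
                ok = upload(item)
            except OSError:
                # Treat network-level failures as "try again later".
                ok = False
            if ok:
                uploaded += 1
            else:
                still_pending.append(item)
        self.pending = still_pending
        self._save()
        return uploaded
```

The app would call `flush` whenever connectivity returns. Because the queue is persisted after every change, recordings made offline survive an app restart.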

Month 4: Testing and Fixes

  • Optimize for performance and battery use
  • Internal QA and bug fixes.
  • Beta release to test group.

Month 5: App Release

  • App Store & Play Store listing setup.
  • Official app launch

Potential risks that will affect the outcome of the project:

  • Low Participation in Rare Languages
    Some indigenous languages may have few active speakers, limiting dataset diversity.

  • Poor Audio Quality
    Background noise or unclear recordings may affect usability of submissions.

  • Incorrect Language Labeling
    Users may misidentify dialects, leading to inaccurate metadata.

  • Internet Access Constraints
    Contributors in rural areas may face challenges uploading recordings due to weak connectivity.

  • Legal Risk
    Collecting voice recordings raises data privacy and consent obligations.


Mitigations

  • Partner with local communities, NGOs and language groups to drive targeted outreach.
  • Provide in-app audio quality checks and guides for optimal recording.
  • Use verification by native speakers and cross-check with multiple submissions of same language.
  • Enable offline recording with later upload when connectivity improves.
  • Obtain explicit user consent, provide clear terms of use and comply with data protection laws.
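
As an illustration of the in-app audio quality check mentioned above, the sketch below flags recordings that are too short, too quiet, or clipped. It assumes signed 16-bit PCM samples. The function name `check_quality` and every threshold are hypothetical values chosen for the example, not measured ones. It does not detect background noise, which would need a more involved method.

```python
import math
from typing import List, Sequence

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def check_quality(
    samples: Sequence[int],
    sample_rate: int,
    min_seconds: float = 1.0,            # assumed minimum useful length
    min_rms_dbfs: float = -40.0,         # assumed loudness floor
    max_clipped_fraction: float = 0.01,  # assumed tolerance for clipping
) -> List[str]:
    """Return a list of human-readable problems; empty means the clip passed."""
    if not samples:
        return ["recording is empty"]

    problems = []

    duration = len(samples) / sample_rate
    if duration < min_seconds:
        problems.append("recording is too short")

    # Root-mean-square level relative to full scale, in dBFS.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")
    if rms_dbfs < min_rms_dbfs:
        problems.append("recording is too quiet")

    # Samples sitting at full scale usually mean the input was overdriven.
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE)
    if clipped / len(samples) > max_clipped_fraction:
        problems.append("recording is clipped")

    return problems
```

A client could run this before upload and ask the contributor to re-record when the returned list is not empty.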

Development Information

Will all of your project’s code be open-source? Yes
Leave a link where code will be accessible for review:
https://github.com/Chi-voice

Do you agree to submit monthly progress reports?
Yes — we will submit reports on our progress here on the forum.

Have you developed a proof of concept for this idea already?
Yes, it can be accessed at https://chivr.tech/


Contact info

Email: [email protected]

Hello, a few criticisms:

  • "Move and store existing and future Audio files and metadata via S5": S5 is not going to be fully ready until roughly December (see "S5 v1: Rewrite it in Rust (Large Grant Proposal)"). I view starting this before then as a very large risk.
  • You have 2 things in your line items. You should break that down for the community.
  • "Improve the existing web-app, UI/UX and add interesting features.": Is the existing web app FOSS? If so, you should clarify where. If not, you would need to FOSS it.
  • You will likely want to start with a small grant.

One last major issue I see is "We store cultural data securely and immutably.", which makes me question whether you understood that Sia data is not stored forever (see "Small Grant: Chi - #3 by pcfreak30"). You should clarify what your intentions are if you do understand that users will have to keep paying to keep data online.

Kudos.

Thanks @pcfreak30
I hope this clarifies things.

  • If the work on S5 won’t be ready till December, I could move that work further down my timeline.

  • I intend to improve the UI to match the additions that would come with the mobile app. By interesting features I mean leaderboards, badges, challenges, and other features that will make the platform engaging for regular folks.

  • Yes it is FOSS.

  • We noticed people were excited to invite their friends to check out the platform (the proposal was submitted at 70+ users, and we are now at 80+). While I appreciate your opinion, we believe this is what is best for the platform.

  • You brought this point up again, even though I stated this,

forming a foundational dataset for its future AI translation model

as our plan in both submissions. But I’ll state this again: anyone with a degree in Software Engineering should understand how Sia works. As on Sia, users pay for storage on other storage networks such as Walrus and Storj, and I also know that, among decentralized storage options, Sia is top. Chi plus Sia is meant to collect and consolidate the dataset required for its AI model. Our intention is not just to store the data, but to work with it as soon as we can.

  • I could elaborate on our business model if that is what the community needs.

Thanks for your proposal to The Sia Foundation Grants Program.

After review, the Committee has decided to reject your proposal citing the following reasons:

  • There is still no proof of previous work, and no code in the GitHub shared for the proof of concept.
  • The proof of concept also does not have any of the advertised 200+ recordings accessible, and shows task error messages.
  • The grant budget is lacking in detail.

We’ll be moving this to the Rejected section of the forum. Thanks again for your proposal, and you’re always welcome to submit new requests if you feel you can address the Committee’s concerns.

Hello @mecsbecs,
thanks for the feedback.
I’ll be sure to resubmit with the required details.

With regard to members of the committee not being able to access (verify) recordings of other users:

  • Currently there is no feature that lets one user see the recordings of another.
  • Task errors occur when a user has not yet completed their current task; GPT doesn’t create a new task until the existing task is complete.
  • When the platform requires that, we’ll make sure the users’ consent is obtained.
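
The task behavior described above (no new task until the current one is complete) can be sketched as below. This is a guess at the logic, not the platform's code. `TaskAssigner`, `TaskPendingError`, and the `generate_prompt` callback, which stands in for the GPT call, are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Task:
    prompt: str
    completed: bool = False


class TaskPendingError(Exception):
    """Raised when a user asks for a new task before finishing the current one."""


@dataclass
class TaskAssigner:
    # Assumed stand-in for the call that asks GPT for a new prompt.
    generate_prompt: Callable[[], str]
    current: Dict[str, Task] = field(default_factory=dict)

    def next_task(self, user_id: str) -> Task:
        task = self.current.get(user_id)
        if task is not None and not task.completed:
            # This is the situation that surfaces as a "task error".
            raise TaskPendingError(f"user {user_id} still has an open task")
        new_task = Task(prompt=self.generate_prompt())
        self.current[user_id] = new_task
        return new_task

    def complete(self, user_id: str) -> None:
        """Mark the user's current task as done, e.g. after a recording is saved."""
        self.current[user_id].completed = True
```

One possible design change would be to return the open task instead of raising, so that a first-time visitor sees the pending prompt rather than an error message.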

Below are some snapshots of some current analytics:

Users

Recordings

I built some AI tools before my final exams and will push all of those to the GitHub as well.

Thanks again.

Hi @Princess, given the note in your proposal that your plans for this platform would include:

giving researchers, developers, and communities access to ethically sourced language data.

We were anticipating these recordings would be accessible from the MVP. Are you saying this functionality will be included later if your next grant proposal is successful?

Thanks, and good luck with your exams.


Exactly, @mecsbecs. We would also update our terms to make sure they include explicit consent from users on how this data will be used, since we would be giving the community access to it.

I just posted the grant proposal with the information the committee required.
