AccessDoc Pilot

Introduction

Project Name
AccessDoc Pilot — SDK-based public document publishing on Sia

Proposed fit within the current funding themes
Sharing files through platforms without vendor lock-in

Name of the organization or individual submitting the proposal
Tochukwu Samuel

Role
Project Lead and Engineer

Background
Engineer focused on web applications, backend systems, and developer tooling. My recent work includes web applications, backend integrations, and tooling workflows where clear scope, release discipline, and inspectable outputs matter.

Describe your project

AccessDoc Pilot is a narrow publication workflow for organizations that need to publish public-facing documents such as notices, forms, meeting packets, policies, and scanned records.

The product does one job: it accepts an existing source document, generates a web-readable HTML derivative for supported inputs, keeps the original source document available, and exposes the published result through a public item page and a read-only JSON API.

For this pilot, a published item consists of:

  • a source object
  • a derivative object
  • application metadata
  • publication state maintained by the application
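
As an illustration only, the item record could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names, field names, state values, and sample identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class PublicationState(Enum):
    # Hypothetical state names; the proposal only says that publication
    # state is maintained by the application.
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    PUBLISHED = "published"


@dataclass(frozen=True)
class StoredObject:
    object_id: str      # identifier returned by the storage layer (assumed)
    content_type: str
    size_bytes: int


@dataclass
class PublishedItem:
    source: StoredObject                          # the original document
    derivative: StoredObject                      # the generated HTML
    metadata: dict = field(default_factory=dict)  # application metadata
    state: PublicationState = PublicationState.DRAFT


# Illustrative values only.
item = PublishedItem(
    source=StoredObject("obj-source-001", "application/pdf", 482_113),
    derivative=StoredObject("obj-derivative-001", "text/html", 96_540),
    metadata={"title": "Council meeting packet", "language": "en"},
)
```

Keeping the source and derivative in one record is what lets the public page and the JSON endpoint refer to them as a single published item.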

The current MVP proves the core workflow. The grant work is to standardize and harden the publication layer on the Sia SDK object model: upload the source and derivative as objects, attach application metadata, pin the required objects, and expose the published item through a stable public page and read-only JSON endpoint.

Version one is intentionally narrow. It supports PDFs and common scanned documents within configured limits. It does not attempt to be:

  • a document management suite
  • a collaboration workspace
  • a private notes product
  • a compliance platform
  • a general CMS replacement

Why this use case is needed now

A common way to publish public-facing material is still to upload a raw PDF or scanned file and stop there. That keeps the source file online, but it often leaves the information harder to search, harder to read on mobile, and harder to inspect programmatically than it needs to be.

AccessDoc addresses the narrower gap between “the file is online” and “the information is usable on the web.” It keeps the original document available while producing a web-readable derivative for supported documents. The pilot is scoped around publishing and access, not full document remediation.

Why this approach is a better fit than the obvious alternatives

Raw PDF posting keeps the source file available, but it usually leaves searchability, mobile reading, and structured access worse than necessary.

A general CMS can publish pages, but it does not usually treat the original document and generated derivative as one application-level published item with a stable read-only item endpoint.

A full remediation or document-governance suite is broader, heavier, and more expensive than this grant should attempt.

A notes or workspace product solves a different job entirely. AccessDoc is about publishing existing public documents, not creating or syncing private editable content.

Current MVP

Live app
www.accessdoc.xyz

Demo video

Demo credentials
Username: admin
Password: admin

GitHub repository

Scope of the funded work

This grant is for turning the existing MVP into a narrower and more reliable public pilot with:

  • an SDK-based publication path for source and derivative objects
  • a clearer item metadata model
  • stronger text extraction and OCR handling for the document types in scope
  • more reliable HTML derivative output for supported inputs
  • a read-only JSON API for published items
  • review and publication-status controls
  • stronger testing, deployment, and documentation
  • a short maintenance window after release

How does the projected outcome serve the Foundation’s mission of user-owned data?

This proposal is submitted under the Building with SDKs theme and is positioned as a file-publication workflow without vendor lock-in.

AccessDoc is not framed here as a generic web application with decentralized storage added later. The hosted control plane is conventional web infrastructure. The core of the proposal is the publication layer: the source and derivative are uploaded as objects, metadata is attached at the application layer, and the required objects are pinned so they remain part of the application’s tracked publication state.

That makes Sia part of the product model rather than a replaceable storage detail. The result is a narrow, inspectable application where the original source document and the derived public representation are not confined to a proprietary document portal as the only usable copy.

This is a practical user-owned-data use case: a publisher keeps control of its publication artifacts while also exposing a public page and a structured read-only interface for others to use.

Are you a resident of any jurisdiction on that list?
No

Will your payment bank account be located in any jurisdiction on that list?
No

Grant Specifics

Amount of money requested and justification with a reasonable breakdown of expenses

$10,000 USD total

The MVP already proves the core workflow. The grant work is for hardening, narrowing, and releasing it as an SDK-based public pilot.

The budget is based on 160 hours at $62.50 per hour and is tied to specific implementation work.

Budget breakdown

  • SDK publication pipeline and item metadata model: 30 hours, $1,875
  • Text extraction, OCR handling, and HTML derivative hardening: 30 hours, $1,875
  • Public item page and read-only JSON API: 26 hours, $1,625
  • Review controls, publication status flow, and reviewer access path: 18 hours, $1,125
  • Testing, deployment hardening, and documentation: 24 hours, $1,500
  • Sample documents, example published items, and release preparation: 16 hours, $1,000
  • 30-day maintenance window and one patch release if needed: 16 hours, $1,000

Total: 160 hours, $10,000
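
For anyone who wants to cross-check the figures, the breakdown above can be recomputed from the hourly rate. The short script below is only a verification aid; the variable and function names are illustrative.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("62.50")

# Hours per work item, copied from the breakdown above.
BUDGET_HOURS = {
    "SDK publication pipeline and item metadata model": 30,
    "Text extraction, OCR handling, and HTML derivative hardening": 30,
    "Public item page and read-only JSON API": 26,
    "Review controls, publication status flow, and reviewer access path": 18,
    "Testing, deployment hardening, and documentation": 24,
    "Sample documents, example published items, and release preparation": 16,
    "30-day maintenance window and one patch release if needed": 16,
}


def line_cost(hours: int) -> Decimal:
    """Cost of one budget line at the stated hourly rate."""
    return hours * HOURLY_RATE


total_hours = sum(BUDGET_HOURS.values())
total_cost = sum(line_cost(h) for h in BUDGET_HOURS.values())

for name, hours in BUDGET_HOURS.items():
    print(f"{name}: {hours} h, ${line_cost(hours):,.2f}")
print(f"Total: {total_hours} h, ${total_cost:,.2f}")
```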

What is the high-level architecture overview for the grant? What security best practices are you following?

High-level architecture

  • hosted staff upload and review interface
  • backend API for validation, item records, and publication control
  • worker process for text extraction, OCR, and derivative generation
  • application database for publication state and metadata fields
  • Sia SDK publication layer for source and derivative objects
  • public viewer and read-only JSON API for published items

Request and publication flow

  1. A staff user uploads a supported source document.
  2. The backend validates file type, file size, and required metadata.
  3. A worker extracts text or runs OCR where needed for supported inputs.
  4. The system generates the HTML derivative.
  5. The backend uploads the source and derivative as objects, attaches application metadata, and pins the required objects.
  6. The application records publication state and the related object identifiers.
  7. Once approved, the item is exposed through the public page and read-only JSON endpoint.
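
The seven steps above can be sketched as one function plus an approval step. This is a minimal sketch under stated assumptions: `InMemoryObjectStore` is a hypothetical stand-in for the publication layer and does not represent the Sia SDK API, `extract_text` is a placeholder for the extraction and OCR worker, and the limits, type list, and names are illustrative.

```python
import hashlib
import html
from dataclasses import dataclass

MAX_BYTES = 20 * 1024 * 1024                       # assumed size limit
ALLOWED_TYPES = {"application/pdf", "image/png",   # assumed supported types
                 "image/jpeg", "image/tiff"}


class InMemoryObjectStore:
    """Hypothetical stand-in for the publication layer (not the Sia SDK)."""

    def __init__(self) -> None:
        self.objects = {}
        self.pinned = set()

    def upload(self, data: bytes, metadata: dict) -> str:
        object_id = hashlib.sha256(data).hexdigest()[:16]
        self.objects[object_id] = (data, dict(metadata))
        return object_id

    def pin(self, object_id: str) -> None:
        self.pinned.add(object_id)


@dataclass
class ItemRecord:
    title: str
    source_id: str
    derivative_id: str
    state: str = "in_review"   # nothing is public until approved


def extract_text(data: bytes) -> str:
    # Placeholder for step 3; a real worker would parse the PDF or run OCR.
    return data.decode("utf-8", errors="replace")


def render_html(title: str, text: str) -> str:
    # Step 4: escape everything that came from the upload.
    return (f"<article><h1>{html.escape(title)}</h1>"
            f"<p>{html.escape(text)}</p></article>")


def publish_flow(store, data: bytes, content_type: str, title: str) -> ItemRecord:
    # Step 2: validate type, size, and required metadata.
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {content_type}")
    if not data or len(data) > MAX_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    if not title.strip():
        raise ValueError("title is required")
    text = extract_text(data)                                # step 3
    derivative = render_html(title, text).encode("utf-8")    # step 4
    # Step 5: upload both objects with metadata, then pin them.
    source_id = store.upload(data, {"title": title, "role": "source"})
    derivative_id = store.upload(derivative, {"title": title, "role": "derivative"})
    store.pin(source_id)
    store.pin(derivative_id)
    # Step 6: record publication state and object identifiers.
    return ItemRecord(title, source_id, derivative_id)


def approve(record: ItemRecord) -> ItemRecord:
    # Step 7: only an explicit approval makes the item public.
    record.state = "published"
    return record
```

In this sketch, a call such as `publish_flow(InMemoryObjectStore(), pdf_bytes, "application/pdf", "Notice")` returns a record in the `in_review` state; the item is exposed only after `approve` is called.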

Security and implementation practices

  • no credentials will be committed to the repository
  • secrets will remain server-side through environment configuration
  • uploads will be type-limited and size-limited
  • filenames and user-supplied metadata will be validated and sanitized
  • staff endpoints will require authentication, and publish actions will require authorization
  • the public API will remain read-only
  • HTTPS/TLS will be required in deployed environments
  • dependencies will be pinned and reviewed regularly
  • release artifacts will be checksummed
  • failed or low-quality conversions will remain in review state instead of being auto-published
  • the product will make no permanence claims and will treat storage as budgeted and renewable
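
Three of the practices above (type and size limits, filename sanitization, and checksummed artifacts) can be illustrated with standard-library code. This is a sketch only: the size limit, the extension list, and the function names are assumptions, and a production check would inspect file contents more thoroughly than the single PDF signature test shown here.

```python
import hashlib
import re
import unicodedata
from pathlib import PurePosixPath

MAX_UPLOAD_BYTES = 25 * 1024 * 1024                 # assumed limit
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg",       # assumed supported types
                      ".jpeg", ".tif", ".tiff"}


def sanitize_filename(raw: str) -> str:
    """Reduce a user-supplied filename to a safe basename."""
    # Drop directory components, including Windows-style separators.
    name = PurePosixPath(raw.replace("\\", "/")).name
    name = unicodedata.normalize("NFKC", name)
    # Replace anything outside a conservative character set, then strip
    # leading dots so the result is never a hidden file or "..".
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name[:120] or "upload"


def validate_upload(filename: str, data: bytes) -> str:
    """Return the sanitized filename, or raise ValueError if rejected."""
    safe_name = sanitize_filename(filename)
    extension = PurePosixPath(safe_name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")
    if not data or len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    # The extension alone is not trusted: PDFs must start with the PDF marker.
    if extension == ".pdf" and not data.startswith(b"%PDF-"):
        raise ValueError("file content does not look like a PDF")
    return safe_name


def sha256_checksum(data: bytes) -> str:
    """Hex SHA-256 digest, e.g. for publishing alongside a release artifact."""
    return hashlib.sha256(data).hexdigest()
```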

Work will ship as small, reviewable pull requests with testing notes and README updates, so reviewers can inspect progress directly.

What are the goals of this small grant? Please provide a general timeline for completion.

The goal of this grant is to deliver a working SDK-based public pilot that proves one narrow publication flow:

A publisher can upload a supported source document, generate a web-readable HTML derivative, upload the source and derivative as objects, and expose the resulting published item through a public page and a read-only JSON API.

A second goal is to keep the pilot narrow enough to be clearly deliverable and clearly useful.

Timeline

Week 1

  • audit the current MVP
  • define the item metadata model and publication-state model
  • harden validation for supported uploads

Week 2

  • standardize the source and derivative publication path on the Sia SDK object flow
  • improve text extraction and supported OCR handling
  • tighten failure handling for low-quality source material

Week 3

  • harden HTML derivative output for the document types in scope
  • finalize the public item page
  • finalize the read-only JSON API

Week 4

  • add review and publication-status controls
  • strengthen testing and deployment flow
  • prepare sample documents and example published items

Week 5

  • finalize README and run/build instructions
  • ship the pilot release
  • publish stable demo documentation
  • begin the 30-day maintenance window

Acceptance criteria

  • a supported PDF or scan can be uploaded within configured limits
  • the system generates a web-readable HTML derivative for supported inputs
  • the source object and derivative object are both represented in one published item record
  • the read-only JSON API returns the item metadata and publication information
  • review and publication-status controls are present before an item goes live
  • the demo is deployable and end-to-end functional
  • README instructions are sufficient to build and run the project
  • the product and documentation contain no permanence claims
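
To make the JSON criterion concrete, the read-only endpoint could build its response along these lines. This is a hedged sketch: the field names, the `published` state value, and the sample record are assumptions for illustration, not the final API contract.

```python
import json
from typing import Optional

# Assumed allowlist of fields that may appear in the public response.
PUBLIC_FIELDS = ("id", "title", "published_at",
                 "source_object", "derivative_object")


def public_item_payload(record: dict) -> Optional[str]:
    """Return the JSON body for a published item, or None if not public."""
    if record.get("state") != "published":
        return None  # unpublished items are never exposed
    # Copy only allowlisted fields so internal data cannot leak by default.
    body = {key: record[key] for key in PUBLIC_FIELDS if key in record}
    body["metadata"] = dict(record.get("metadata", {}))
    return json.dumps(body, sort_keys=True, indent=2)


# Illustrative record; identifiers and values are made up for the example.
example_record = {
    "id": "item-001",
    "title": "Council meeting packet",
    "state": "published",
    "published_at": "2025-01-15T10:00:00Z",
    "source_object": "obj-source-001",
    "derivative_object": "obj-derivative-001",
    "metadata": {"language": "en"},
    "reviewer_notes": "internal only",
}

print(public_item_payload(example_record))
```

Building the response from an allowlist rather than removing private fields means a newly added internal field stays private unless someone deliberately exposes it.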

What are your plans for this project following the grant?

The grant includes a 30-day maintenance window for bug fixes, OCR edge cases that fall within scope, and one patch release if needed.

Beyond that funded period, the project has a concrete follow-on plan.

First 90 days after the grant

  • triage feedback from reviewers and early users
  • improve deployability and setup clarity for third-party testers
  • publish additional sample document packs for evaluation
  • refine metadata and review flow based on real usage

Next stage after the pilot

If the narrow pilot proves useful, the next stage will focus on a small number of adjacent improvements rather than broad platform expansion. Likely candidates are:

  • support for a few additional document profiles that fit the same publication model
  • lightweight multi-publisher support
  • stronger exportability of item metadata and publication records

Feedback channels

  • Sia Forum thread updates
  • GitHub issues

Sustainability path

The core project will remain open-source. A realistic sustainability path is managed hosting, sponsored deployments, or support agreements for organizations that want the workflow without maintaining the stack themselves.

This is not intended as a one-off experiment. The grant is the first stage of a longer-term SDK-based application with a deliberately narrow and inspectable publication workflow.

Potential risks that will affect the outcome of the project

The largest risk is scope expansion. Public document publishing can easily turn into a much larger accessibility or content-governance product. The mitigation is a strict version-one boundary:

  • one source document per published item
  • one generated derivative
  • one read-only public API
  • one narrow publication model
  • one narrow set of supported source types
  • no claim of full compliance across all document types

A second risk is poor source quality. OCR and text extraction can fail on low-quality scans or inconsistent source material. The mitigation is to support only a narrow set of source types in version one, keep difficult items in review state, and preserve the original source even when a derivative needs manual review.
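
The "keep difficult items in review state" mitigation can be expressed as a simple gate after conversion. The sketch below is illustrative: the thresholds, state names, and the idea of a single numeric OCR confidence score are assumptions, and real values would be tuned against sample documents.

```python
from typing import Optional

MIN_TEXT_CHARS = 50         # assumed: less text than this suggests a failed extraction
MIN_OCR_CONFIDENCE = 0.85   # assumed threshold on a 0.0-1.0 scale


def conversion_outcome(extracted_text: str,
                       ocr_confidence: Optional[float]) -> str:
    """Decide where an item goes after derivative generation.

    ocr_confidence is None when the source had a text layer and OCR
    was not needed.
    """
    if len(extracted_text.strip()) < MIN_TEXT_CHARS:
        return "needs_review"
    if ocr_confidence is not None and ocr_confidence < MIN_OCR_CONFIDENCE:
        return "needs_review"
    return "ready_for_approval"


def can_publish(state: str, approved_by: Optional[str]) -> bool:
    """An item goes live only after conversion passed and a reviewer approved it."""
    return state == "ready_for_approval" and bool(approved_by)
```

Under these assumptions a poor scan never reaches the public page automatically, while its source object stays stored and available to the reviewer.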

A third risk is trying to broaden the project into a general content platform too early. The mitigation is to keep the funded scope tied to one publishing workflow and judge expansion only after the narrow pilot is proven.

Development Information

Will all of your project’s code be open-source?
Yes. The project will be open-source.

Leave a link where code will be accessible for review.

Do you agree to submit monthly progress reports?
Yes. Monthly progress reports will be submitted in the forum thread and will link completed tasks to the relevant pull requests and testing notes.

Contact info

Email
[email protected]

Hey, you may want to repost in the Grants category to make your proposal visible.

Hi, thank you! I just edited it.

Hello @Buidl - welcome to the Sia community and thank you for your proposal.

Could you detail how your proposed project is materially different from this one:

Thank you for the question @mecsbecs

We see AccessDoc as materially different from DecaNotes because the two products are built for different jobs.

DecaNotes is a privacy-focused note application. It is built around creating, updating, deleting, encrypting, and syncing Markdown notes across devices.

AccessDoc is a narrow publishing workflow for public-facing source documents. A user uploads an existing PDF or scan, the system extracts text or runs OCR where needed, generates a searchable HTML page, preserves the original source file and the generated derivative together as one package, and exposes that item through a public page and a read-only metadata API.

So while both can use Sia-backed storage, that is where the similarity ends. The actual workflow, content model, and output are different. DecaNotes manages private editable notes. AccessDoc manages public source documents plus generated public derivatives for read-only access.

That boundary is also intentional in the current pilot scope. AccessDoc does not include note creation, note editing, private workspaces, client-side encrypted personal notes, collaboration, or cross-device sync. The current scope is limited to public-facing documents, source-and-derivative packaging, a read-only API, and review controls for publishing. For that reason, we do not see AccessDoc as duplicating DecaNotes.

Hi @Buidl - thank you for your explanation. Your proposal comes at an interesting time, however, as we will be releasing new Grants Program funding guidelines next week. This also means the next Grants Committee meeting will be held on April 28th (with April 22 as the proposal submission deadline) to allow for adequate time for these new guidelines to be reviewed & incorporated into proposals.

Please review these guidelines when they’re released next week and then tag me when/if you’ve updated your proposal accordingly to be reviewed.

Hi @Buidl - the new funding guidelines have been posted:

After reviewing, please tag me to either look over your edited proposal or to let me know if this proposal should be moved to ‘Inactive.’

Hi @mecsbecs, thank you for the update. We have now updated our proposal accordingly. Please let me know if you have any questions.
Thank you.