Project Name: Data Guardian
Name of the organization or individual submitting the proposal: Emmanuel Damilare Adediji
Project Description
Background
We’re surrounded by data. It’s everywhere: our lifestyles, choices and core values are all expressed as data and influenced by it. From personal data to video files, pictures, designs, schematics and numerical metrics, a great deal depends on data.
In fact, it’s safe to say that every company is a data company. Many individuals, companies and governmental parastatals store, use and dispense both processed and unprocessed data at petabyte scale.
In this age of data breaches, cyber warfare, hacking, hijacking and other cyber-attacks, the adversary’s main aim is almost always data: gaining access to or insight into it, stealing it, tampering with it, or denying legitimate users access to it.
This has led to unauthorized and unauthenticated access to data, data theft and piracy, libel, slander and blackmail based on confidential, classified or personal data, cyber-terrorism, cyber-bullying, and the list goes on.
For individuals, this could mean embarrassment, depression and suicidal inclinations. For enterprises, this could mean loss of profit and revenue, lawsuits from customers, and closure by governmental regulatory and compliance bodies.
It is worth noting, however, that the governments of various countries have put compliance laws and regulations in place. But what is a law without the means to enforce it?
The Response
To forestall the aforesaid and even worse, individuals and enterprises try different ways to protect their data, among them cyber-security measures and policy enforcement. Aside from general protection, they also want protection that covers aspects unique to their use-cases - for instance, “Can I set this data’s expiration?”, “Will this data be accessible outside France - we don’t want it to be!”, “This design is a business secret, we don’t want it accessed outside a 10-meter radius of our office!”, “How can I meet the digital security compliance requirements and regulations in my country regarding my business?”
In short, they want not only to secure and protect their data in ways unique to them, but also holistically govern their data even when it is sent outside their domain.
Networks and Protocols
As mentioned above, individual and enterprise data usually finds itself in various computing scenarios, and hence on various networks and protocols. Although a true cryptographic approach to cyber-security focuses primarily on safeguarding at the data layer, there is an intricate relationship between data and the networks and protocols through which it travels. Diverse networks and protocols are available to individuals, enterprises, governmental parastatals and organizations to move, share and store data.
We discuss these under the following headings:
- High Level Networks
These include high level networks and protocols designed to move, share and retrieve data. The most common of these is HTTP(s)/REST. However, networks and protocols are diverse and varied -
- Alternative Web: SOAP, XML-RPC, gRPC, JSON-RPC, GraphQL, WebSocket, WebRTC, etc.
- Messaging: AMQP, MQTT, CoAP, Kafka, etc.
In our case, we focus on HTTP(s)/REST.
For Storage and Retrieval -
IPFS, BitTorrent, Swarm, NFS, iSCSI, Fibre Channel, Sia, Storj, etc.
In this case, we focus on both NFS/iSCSI and Sia.
- Low Level Networks/Protocols:
Low level Networks and Protocols - TCP, UDP, IP multicast, IRC, FTP, SSH, NNTP, RTP
Mail and Messaging Networks - XMPP, IMAP, POP3
The majority of the aforementioned low level networks/protocols are used in building the higher level networks/protocols.
How do we securely store, protect, govern, retrieve and share data moving through or resting in these kinds of networks?
As one can readily see, not all data movement and storage is done via REST or HTTP - especially in enterprises and large organizations, where data moves across a myriad of networking infrastructures.
Technology Exegesis
NFS (Network File System) and iSCSI (Internet Small Computer Systems Interface) are protocols for file and block storage and retrieval.
NFS is a distributed file system protocol that allows users to access files over a network as if they were stored locally. It operates on top of TCP/IP, making it readily available on most network infrastructures.
iSCSI is a block-level storage networking protocol that allows storage devices to be accessed over a standard IP network. It encapsulates SCSI commands within TCP/IP packets, enabling block-level storage access over standard Ethernet networks.
Whereas these two protocols allow for a seamless storage and retrieval experience, they are usually built on centralized storage systems.
Sia, by contrast, offers a major advantage of the storage of the future - a truly decentralized storage solution.
Project Specifics
The system is proposed to be a highly robust and scalable NFS/iSCSI infrastructure, but with the Sia storage engine handling persistence underneath.
This will be accomplished through the following architecture:
- The Client Layer:
This layer acts as the user-facing frontend for NFS/iSCSI. The user can interact with it as if it were a true NFS/iSCSI interface, with traditional and common functionalities such as mounting and un-mounting shares, discovering targets, reading and writing, managing and enforcing permissions and policies, etc.
- The Abstraction Layer:
This serves as a middleware between the client layer and the server gateway infrastructure. It sits between the traditional protocol implementation and the actual communication with the gateway.
This layer translates the traditional NFS/iSCSI protocol commands into appropriate API calls to the gateway service, effectively hiding the underlying complexity of the decentralized storage network from the frontend application.
In the other direction, it also translates the responses coming from the server gateway into forms compatible with the client layer described above.
- The Server Gateway:
This layer is the custom server infrastructure through which well-defined APIs are exposed. These APIs will expose functionalities similar to the chosen traditional storage protocol (NFS or iSCSI), such as storage and retrieval, and more besides (governance parameters).
Behind the scenes, the gateway handles both compute and memory intensive tasks such as concurrency, asynchronous strategies, cryptography, key management and rotation, (de-)compression, caching, authentication, authorization, error-handling, policy enforcement and data governance operations. It also handles both persistence and retrieval operations by interacting with the underlying distributed storage engine (Sia), using different optimization approaches and strategies, e.g. chunking, content-addressing, data corruption prevention, etc., for speed and scalability.
Optionally, transactions involving storage space procurement and allocation via Siacoin are also exposed in this layer.
Furthermore, aside from the API, the Gateway layer also exposes a Web UI so that the admin user can easily set configuration, procure storage space, set and enforce general policies, permissions and governance parameters, manage stored data, and perform other operations that determine and control the access and usability of data in motion or at rest.
- The Storage Layer:
The persistence layer is a highly robust decentralized layer; in this specific case, Sia storage.
The server gateway layer utilizes optimization strategies such as chunking, content-addressing, etc. to ensure effective storage and retrieval over this layer.
Furthermore, the portion of storage available to each user on the storage engine is mapped by the server gateway based on the storage engine’s API.
Moreover, transactions involving storage space procurement and allocation via Siacoin are handled in this layer.
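The flow through the layers described above can be sketched roughly as follows. This is only an illustrative sketch, not the final design: the class names (AbstractionLayer, Gateway, InMemoryStore), the operation names and the tiny chunk size are all hypothetical, and an in-memory dictionary stands in for the Sia storage engine so the example runs on its own.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # deliberately tiny, for illustration only


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the Sia-backed storage layer."""
    blobs: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # Content addressing: a chunk's address is the hash of its bytes.
        address = hashlib.sha256(data).hexdigest()
        self.blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self.blobs[address]
        # Corruption check: stored bytes must still hash to their address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"chunk {address} failed its integrity check")
        return data


@dataclass
class Gateway:
    """Hypothetical gateway: splits files into chunks, keeps a manifest per path."""
    store: InMemoryStore
    manifests: dict = field(default_factory=dict)

    def write_file(self, path: str, data: bytes) -> list:
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        self.manifests[path] = [self.store.put(chunk) for chunk in chunks]
        return self.manifests[path]

    def read_file(self, path: str) -> bytes:
        return b"".join(self.store.get(addr) for addr in self.manifests[path])


class AbstractionLayer:
    """Translates file-style operations from the client into gateway calls."""

    def __init__(self, gateway: Gateway):
        self.gateway = gateway

    def handle(self, op: str, path: str, data: bytes = b"") -> bytes:
        if op == "WRITE":
            self.gateway.write_file(path, data)
            return b""
        if op == "READ":
            return self.gateway.read_file(path)
        raise NotImplementedError(op)


store = InMemoryStore()
gateway = Gateway(store)
layer = AbstractionLayer(gateway)
layer.handle("WRITE", "/share/a.txt", b"hello world")
print(layer.handle("READ", "/share/a.txt"))
```

In the real system the store would be replaced by calls to the Sia API, and encryption would be applied to each chunk before it is handed to the storage layer.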
How the projected outcome serves the Foundation’s mission of user-owned data
The proposed system takes a combined approach:
By integrating the various aforementioned technology stacks, the projected outcome facilitates user-owned data in the following ways:
Decentralized Control: Users control where their data resides, eliminating reliance on centralized storage providers.
Data Privacy: Fast and strong encryption protects user data from unauthorized access, even by Sia network operators.
Familiar Interface: Traditional protocols like NFS/iSCSI offer a comfortable way to interact with decentralized storage.
Future-Proof System: A blend of the distributed system and post-quantum cryptography, exposed through the ease of use of the NFS/iSCSI layer, helps keep user data protected against future quantum attacks and breaches.
Optional Interface: Exposing settings and configuration over the web also serves as a plus regarding flexibility and ease of use.
Granular Access Control: Users define who can gain access to their data, and how, where and when it can be accessed, through various policy enforcements, permission controls and data governance rules. In addition, they can also revoke access at any time.
Zero Trust Protection: In a multi-connected environment, this adds a layer of security in which protection travels with the data even outside the domain of the original data owner.
Transparency: Open-source code allows users to understand and verify data storage and management.
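To make the granular access control idea concrete, here is a minimal sketch of how a policy check (who, where, when, plus revocation) could be evaluated. The Policy fields and the is_access_allowed function are hypothetical names chosen for illustration; how the requester’s country is determined is left out and is simply assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Policy:
    """Hypothetical per-object governance policy."""
    allowed_users: set
    allowed_countries: Optional[set] = None  # None means "anywhere"
    expires_at: Optional[datetime] = None    # None means "never expires"
    revoked_users: set = field(default_factory=set)


def is_access_allowed(policy: Policy, user: str, country: str, now: datetime) -> bool:
    # Who: the user must be listed and must not have been revoked.
    if user in policy.revoked_users or user not in policy.allowed_users:
        return False
    # Where: if a country restriction exists, the request must come from inside it.
    if policy.allowed_countries is not None and country not in policy.allowed_countries:
        return False
    # When: access ends once the expiration time is reached.
    if policy.expires_at is not None and now >= policy.expires_at:
        return False
    return True


policy = Policy(
    allowed_users={"alice", "bob"},
    allowed_countries={"FR"},
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(is_access_allowed(policy, "alice", "FR", now))  # inside France, before expiry
print(is_access_allowed(policy, "alice", "DE", now))  # outside France

policy.revoked_users.add("bob")  # access can be revoked at any time
print(is_access_allowed(policy, "bob", "FR", now))
```

In this sketch every rule must pass for access to be granted, so a missing or failed check denies access by default.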
Conclusion:
The Sia Foundation’s mission of user-owned data is realized through a well-designed and robust system that leverages decentralized storage, traditional protocols, abstraction layers, secure gateways, and data privacy and governance principles.
This combined approach not only empowers data owners (viz. individuals or enterprises), but also gives them back control over their valuable data, aligning with the core values of the internet of the future - a truly secure, decentralized and distributed web where, no matter its location, user data can truly be said to be OWNED.
Grant Specifics
Amount of money requested and justification with a reasonable breakdown of expenses:
Budget Breakdown:
Development (3 Months): $6,912 (48 hrs/week, $12/hr)
Justification: This hourly rate reflects my experience in Software development and expertise with relevant technology stacks. Working 48 hours per week for 3 months allows for focused and efficient development.
Deployment: $900
Justification:
Cloud Resources: $500 (Estimated cost for a CPU/IO bound virtual machine instance on a cloud platform, leveraging the free tier as much as possible)
CI/CD: $200 (For automated builds and deployment)
Hosting/Domain Name: $200 (Cost of domain registration and general hosting for the gateway service)
Hardware: $1,400
Justification:
1 MacBook Pro: ($1300) This budget allows for a used MacBook Pro with sufficient performance for development. Exploring used options helps maximize budget allocation for essential project components.
Alternative: Had existing hardware been available, I would have used it, but I am in a resource-constrained environment.
1 5G Router for Internet Connectivity: ($100)
Justification: As I live in a developing country with unstable internet access, having a standby device for internet access will boost my productivity and workflow.
Workspace Rental: $500
Justification: Renting a dedicated workspace with constant electricity is necessary for the fast and effective completion of the project. As I live in a developing country with unsteady power supply, this will boost my productivity and workflow.
Contingencies: $200
Justification: Unforeseen and unplanned situations and circumstances
Total Project Budget: $9,912
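As a quick cross-check of the figures above, the line items can be summed as in the short sketch below. The 12-week figure is an assumption: it is inferred from $6,912 at $12/hr and 48 hrs/week, treating the 3 months as 12 working weeks.

```python
# Line items as stated in the budget breakdown (USD).
HOURLY_RATE = 12
HOURS_PER_WEEK = 48
WEEKS = 12  # assumption: 3 months counted as 12 working weeks

development = HOURLY_RATE * HOURS_PER_WEEK * WEEKS
deployment = 500 + 200 + 200  # cloud resources + CI/CD + hosting/domain name
hardware = 1300 + 100         # used MacBook Pro + 5G router
workspace = 500
contingencies = 200

total = development + deployment + hardware + workspace + contingencies
print(development, deployment, hardware, total)  # prints: 6912 900 1400 9912
```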
Project Timeline:
Month 1:
Project setup, environment configuration, and system design.
Development of core functionalities: Abstraction Layer, Server Gateway service and infrastructure, data chunking/distribution.
Month 2:
Client Layer: NFS/iSCSI client interface and experience.
Persistence Layer: Integration with Sia API, security and optimization implementations.
Unit testing and core functionality verification.
Month 3:
Implementation of advanced features within all layers (progressive).
Modification and Refinement.
System documentation and user testing.
Success Metrics:
Functional Server Gateway enabling data access via NFS/iSCSI protocols.
Functional Abstraction Layer enabling call translation and communication.
Integration with Sia network for decentralized storage.
Functional Client Layer - interface for the NFS/iSCSI protocol.
Secure data storage with various blends of fast, strong and post-quantum encryption at rest and in transit.
Permissions, Governance and Policy enforcement configurations.
User testing and positive feedback on usability.
Additional Considerations:
This proposal prioritizes core functionalities within budget constraints. Other advanced features can be implemented later.
Open-source libraries and tools will be used whenever possible to minimize costs.
The project will prioritize collaboration with the Sia community for feedback and potential integration into existing Sia tools.
In summary, I have outlined a cost-effective and well-defined plan to develop a secure decentralized storage gateway on top of traditional protocols. By utilizing my skills, exploring cost-saving measures, and focusing on core user needs, this project aims to deliver a valuable contribution to the Sia ecosystem.
What are the goals of this small grant?
The goals of the small grant map to the following project scope:
Development Environment Setup:
Purchase a laptop suitable for software development.
Set up necessary development tools and software licenses.
Workspace and Living Expenses:
Secure a conducive temporary workspace with stable electricity during the development period.
Allocate funds for living expenses during development.
System Development:
Design and Implement the aforesaid NFS/iSCSI Frontend Client layer (see description above)
Develop and Implement the aforesaid Abstraction (Middleware) layer within the Frontend application for NFS/iSCSI compatibility (see description above)
Design and Implement the aforesaid Core Server Gateway Layer, focusing on essential functionalities - concurrency, encryption, key management, policy enforcement, permission configuration, data governance, etc. (see description above)
Design and Implement essential Data Chunking, Storage, Distribution, Corruption prevention, content addressability and retrieval functionalities for interacting with Sia.
Integrate with core Sia API functionalities for file management.
Security:
Implement encryption of data chunks at rest (balancing strong and fast algorithms; non-deterministic and post-quantum schemes; key management and rotation);
Authorization and Authentication;
Implement Secure Communication Protocols for critical interactions within the system (e.g. by utilizing HTTPS for API calls).
Monitoring and Logging:
Design and Implement monitoring and logging of algorithmic level operations
Design and Implement monitoring and logging for system level operations
Testing:
Conduct both unit and integration testing to ensure core functionalities work as intended.
Documentation:
Develop essential and robust user and developer documentation, especially for the server gateway and the various User Interfaces, to ensure a seamless experience for both technical and non-technical users.
Potential risks that will affect the outcome of the project:
A holistic treatment of potential risks that might affect the outcome of the project to build a decentralized storage gateway for Sia with NFS/iSCSI access:
(1) Technical Risks:
Security Vulnerabilities: Security vulnerabilities in the custom server gateway, the Sia API integration or underlying libraries could expose user data or compromise the system’s integrity. It is expedient to stay updated with security patches and conduct thorough security audits.
Integration Challenges: Integrating Sia’s functionalities with traditional protocols like NFS/iSCSI could be complex. Compatibility issues or unexpected behavior might arise during integration and require a very thorough development and testing effort.
Data Chunking and Distribution: Implementing efficient data chunking and distribution algorithms, especially for large files, could be challenging. Inefficiencies could lead to performance issues, data corruption or wasted storage space on the Sia network.
Sia Network Immaturity: Being a relatively young project compared to established, battle-tested centralized storage solutions, the network could experience technical glitches or unexpected behaviors that impact the server gateway’s functionality and hence the NFS/iSCSI functionality built on it.
Scaling, Fault Tolerance & Latency: Developing a multi-tiered system like this could be demanding in both compute and memory resources. Vertical and horizontal scaling concerns might not be well addressed.
Also, error handling, retries, restarts and supervision might not be well handled, which could lead to system crashes and irrecoverable failures.
Moreover, the myriad storage, retrieval and other operations performed by both the core server gateway and the persistence layer could cause high latency, and in turn slow execution and crashes.
Testing Limitations: Implementing comprehensive testing for a system interacting with a decentralized network like Sia could be difficult. Insufficient testing could lead to bugs or unforeseen issues surfacing after deployment.
(2) Project Management Risks:
Limited Resources: Developing a project as a single engineer presents resource constraints. Furthermore, time constraints might necessitate compromises in functionality depth or security compared to a larger team approach.
Scope Creep: As the project progresses, new features or functionalities might seem desirable. When not carefully managed, however, scope creep can lead to delays, budget overruns, and a departure from the core focus - the user-owned data goals.
Unexpected Delays: Unexpected technical hurdles, dependency issues with libraries, or unforeseen personal challenges could lead to project delays. Maintaining a flexible schedule and clear communication - such as the monthly reporting - could serve as essential strategies for mitigating such risks.
(3) External Risks:
Sia Network Disruption: While unlikely, major disruptions within the Sia network, for instance - a large-scale outage or security breach, could impact not only the server gateway’s functionality, but also the user experience and access to data.
Fluctuations in Siacoin Price: The price of Siacoin could fluctuate significantly. This could affect the cost of storing data on the Sia network, hence, impacting user adoption of the proposed infrastructure.
Proposed Risk Mitigation Strategies:
Continuous Logging and Monitoring: Continuously monitoring the Server Gateway and the Sia network for stability, performance and error issues through logging and a robust monitoring strategy helps mitigate poor oversight and hence undetected errors and crashes. Regularly reviewing and updating libraries and dependencies would also help to address known vulnerabilities and application-level bugs.
Phased Development: Implementing project features in phases while focusing on core functionalities first makes for quick testing and iteration before expanding to more complex adjunct features later.
Robust Architecting, Designing and Development: A well architected, designed and developed system and infrastructure mitigates the risk of scaling, fault-tolerance and latency issues. Technical concerns like concurrency, asynchronous solutions, optimization strategies, design patterns, caching, etc. could be built into the very core of the system.
Thorough and Holistic Testing: While comprehensive and holistic test coverage might be challenging, it could be approached by prioritizing both unit and feature testing to ensure that all parts of the system operate as intended. Also, using community testing tools or soliciting community feedback during development would be taken into consideration.
Clear Communication: Clear communication will be maintained with potential users and the community regarding the project’s strengths, weaknesses, limitations, and other risks associated with using such a system.
Contingency Planning: Developing contingency plans for potential network disruptions or unexpected resource limitations could be taken into consideration. Scope creep and unexpected delays could be controlled by allotting another month to the project duration (extending it to four (4) months would give room to cover such cases).
All these, however, would depend on the timeline and constraints finally imposed on the scope of the project.
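As one example of the fault-tolerance measures mentioned under the technical risks and mitigations, transient failures when talking to the storage network could be handled with retries and exponential backoff. The sketch below is an assumption about how this might look, not the final implementation: call_with_retries, its parameters and the flaky_upload stand-in are hypothetical names, and ConnectionError is used simply as an example of a transient error.

```python
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let a supervisor or the caller decide
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("storage host unreachable")
    return "stored"


delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky_upload, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the helper easy to unit test, since the test can record the delays rather than wait for them.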
Development Information
Will all of your project’s code be open-source? Of course, all will be open source.
Leave a link where code will be accessible for review.
Monthly progress reports submission:
Project’s progress reports will be submitted monthly on the forum
Contact info
Email: [email protected]
Other preferred contact methods: