Grant Proposal: Sia data sources for Grafana

Introduction

Project Name: Sia data sources for Grafana

Name of the organization or individual submitting the proposal: Bustedware LLC

Describe your project.
We seek funding to develop a Grafana data source tailored specifically for Sia hostd, renterd, and walletd services. This project aims to enhance monitoring and observability within the Sia ecosystem, providing users with a powerful tool to visualize and analyze key metrics from these products. Our data source will seamlessly integrate with Grafana, offering users an intuitive and unified platform to monitor the health, performance, and utilization of their Sia hostd and renterd services.

By creating this dedicated Grafana data source, we aspire to empower Sia users with insights into the behavior of their storage infrastructure. With your support, we are excited to bring this project to life and contribute to the advancement of Sia’s monitoring capabilities, ultimately strengthening the ecosystem’s resilience and growth.

Who benefits from your project?
A Sia Grafana data source plugin would benefit a wide range of stakeholders within the Sia network and ecosystem, including:

  1. Sia Network Operators: Hosts and renters on the Sia network would greatly benefit from the plugin. Host operators can monitor their hardware and network performance, track storage utilization, and optimize their offerings based on data insights. Renters can analyze their storage usage and reliability of hosts, helping them make informed decisions about their data storage strategies.
  2. Developers and Engineers: Those responsible for maintaining and optimizing the Sia network infrastructure would find the plugin invaluable for monitoring the health and performance of hostd. It could help them identify bottlenecks, track trends, and optimize the overall system for better efficiency.
  3. Enterprise Users: Enterprises utilizing Sia for decentralized cloud storage can leverage the plugin to ensure that their storage infrastructure is performing as expected. They can monitor usage, uptime, and overall system health, providing them with the confidence to use Sia for critical data storage needs.
  4. Data Analysts: The plugin can provide valuable data for analysts who want to analyze trends, usage patterns, and other metrics related to Sia’s storage services. This can lead to better insights into user behavior and contribute to the improvement of the Sia ecosystem.
  5. Community Members: Enthusiasts and community members interested in contributing to the Sia network can use the plugin to gain a deeper understanding of its performance and contribute to its growth by making informed suggestions and optimizations.
  6. Application Developers: Developers building applications that utilize the Sia network can use the plugin to monitor the storage aspect of their applications and ensure their users have a seamless experience.

Overall, the Sia Grafana data source plugin has the potential to benefit anyone involved with the Sia ecosystem, from individual users to enterprises, by providing valuable insights and data for more effective monitoring, management, and decision-making.

How does the project serve the Foundation’s mission of user-owned data?
The Grafana data source plugin for Sia's hostd and renterd would provide a single-pane-of-glass view of your entire storage infrastructure, improving the availability of user-owned data and offering insight into performance.

Grant Specifics

  • $12,000 for part time salary

Timeline with measurable objectives and goals:
Month 1: Grafana datasource is installed, authenticated, and connected to hostd. Basic dashboard is built and displaying hostd metrics
Month 2: Grafana datasource is installed, authenticated, and connected to renterd. Basic dashboard is built and displaying renterd metrics
Month 3: Grafana datasource is installed, authenticated, and connected to walletd. Basic dashboard is built and displaying walletd metrics

Potential risks that will affect the outcome of the project:
I have developed Grafana plugins in the past, and I know that an extremely customized data source can complicate the work and increase the lead time to project completion. I have primarily been running siad, so there is also an initial learning curve for running the hostd, renterd, and walletd services.

Development Information

Will all of your project’s code be open-source?
Yes

Leave a link where code will be accessible for review.
https://github.com/bustedware/siagrafana

Do you agree to submit monthly progress reports?
Yes

Contact info

Email: [email protected]

Hello @bustedware,

Thanks for submitting this proposal. The committee has a few concerns they’d like to address before making a decision.

Using the Sia software as a data source relies on that data being stored locally in databases. While this is the case for hostd, renterd and walletd do not currently do that. What would be your solution here?

Feasibility aside, what other metrics or interesting data would you intend to pull for the software? If you have a list of preferred metrics for each app, our team can identify which are currently possible and which we might be able to build endpoints for.

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Using the Sia software as a data source relies on that data being stored locally in databases.

Not necessarily. Although I think that it would be a good extension of the plugin to be able to source data which is stored on the Sia Network… but that was not originally part of the vision for this effort.

I want this particular Grafana plugin to leverage the APIs for hostd, renterd, and walletd found here:
hostd: hostd
renterd: renterd
walletd: walletd

I haven’t fully reviewed all the available API endpoints in each service but as an example the metrics endpoint in hostd has lots of good information that could be displayed in a dashboard hostd

Absolutely if we need to plot the values over time it would need to be persisted somewhere but that doesn’t necessarily have to be the Sia Network.

If the foundation is interested, I can also develop a telegraf OUTPUT plugin which would store data on the Sia Network but I think we would need to double the amount of time and double the requested capital that I originally asked for as part of this grant. I am the original author for the telegraf MongoDB output plugin. I was planning to make a separate grant proposal for the telegraf effort but I’m happy to combine it with this one. The telegraf output plugin would be all encompassing, meaning anything telegraf can pickup would be stored on the Sia Network. I’m not certain if the Sia Network can persist data that frequently so I would probably need to batch/throttle writes for it.

Let me know if you need any additional information or if the foundation is interested in setting up a conference call to discuss further.

This is interesting. Can you point me to a repo?


Hello @bustedware,

Thanks for the updates to your proposal. I know we’ve already reached out to clarify some questions that the committee had, but I wanted to post a formal response to that effect.

The committee wanted to potentially correct a misunderstanding regarding one of our responses, and clarify other elements of your proposal. Once you discuss this with our team, we’ll post a summary of that discussion here as well as next steps regarding this grant.

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Hello @bustedware,

Thanks for the proposal, and your answers to the team’s questions in Discord. Congratulations, the committee found them satisfactory and has voted to approve your grant proposal.

We’ll reach out soon via email for onboarding.

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Report for November 2nd, 2023

Summary

  • Started developing custom Prometheus exporter in golang. Once complete I will cut PRs to hostd, renterd, and walletd to negate the need to run a separate exporter
  • Developed python script to generate telegraf configuration for all hostd, renterd, and walletd GET API endpoints.
  • Connected prometheus json_exporter to hostd /metrics API endpoint and verified it's queryable and displays in the prometheus dashboard. Configuring json_exporter is bulky, and I think writing a custom exporter would be cleaner.
  • Currently troubleshooting an issue with running the configuration in telegraf. It does not appear that the http input plugin performs the API requests in parallel, and therefore it does not finish collecting all the data within 60 seconds (which is 50 seconds above the default collection interval). Looking at parallelizing the data collection, setting intervals for each http input definition, writing a custom collector, or some combination thereof.
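One of the options mentioned above, parallelizing the data collection, could look roughly like the following. This is a minimal sketch and not the telegraf plugin itself: `fetch_json` and `collect` are hypothetical helper names, authentication is left out, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import json
import urllib.request


def fetch_json(url, timeout=10.0):
    """GET a URL and decode the JSON body (authentication is not handled here)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect(urls, fetch=fetch_json, max_workers=8):
    """Fetch every URL concurrently and return {url: result or exception}."""
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:
            # Keep one slow or failing endpoint from aborting the whole batch.
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, results))
```

With a thread pool, the total collection time is bounded by the slowest endpoints rather than by the sum of all response times, which is the property the serial http input lacks.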

Log

  • deployed hostd and renterd using docker on the testnet and synced. verified successful API calls.

  • stood up an instance of prometheus and looking at existing exporters as well as writing a custom exporter.

  • encountered issues while connecting to zen testnet peers in walletd. posted in discord #core-dev and it was identified to be a bug.

  • explored configuring the http input plugin for telegraf. data like the number of siacoins is interpreted as a string in telegraf's http input plugin. golang does not support uint128 out of the box. libraries exist to handle them but they are not native to either telegraf or grafana. writing a custom collector could resolve the issue if grafana doesn't display strings nicely like prometheus does.

  • worked on rebasing the bustedware/telegraf repository and compiled telegraf.

  • connected json_exporter to prometheus and collected sample API endpoints off hostd.

  • recorded response times for hostd, renterd, and walletd API endpoints.

  • updated README.md files with examples running services on testnet / docker for hostd, renterd, walletd. pull request for renterd , walletd , and hostd which
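The uint128 issue noted in the log above can be illustrated with a small conversion helper. This is a sketch only, assuming the API returns the amount as a decimal string of hastings (1 SC = 10^24 hastings); the function names are hypothetical and not part of telegraf, Grafana, or the Sia services.

```python
from decimal import Decimal

HASTINGS_PER_SC = 10 ** 24  # 1 SC = 10^24 hastings


def hastings_to_sc(value: str) -> Decimal:
    """Convert a decimal string of hastings into siacoins exactly."""
    # Python ints are arbitrary precision, so a uint128 value fits without loss.
    whole, frac = divmod(int(value), HASTINGS_PER_SC)
    return Decimal(f"{whole}.{frac:024d}")


def hastings_to_float_sc(value: str) -> float:
    """Lossy variant for systems that store samples as float64."""
    return float(hastings_to_sc(value))


print(hastings_to_sc("1500000000000000000000000"))
```

A custom collector could apply the lossy conversion before emitting the sample, trading exactness for a numeric value that dashboards can plot.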

API response times (seconds)

  • timings are from services running out of docker environment
hostd /api/alerts - 0.007305622100830078
hostd /api/accounts - 0.004000425338745117
hostd /api/metrics - 0.003999948501586914
hostd /api/settings - 0.003999948501586914
hostd /api/state/consensus - 0.0030002593994140625
hostd /api/state/host - 0.002999544143676758
hostd /api/syncer/peers - 0.0030508041381835938
hostd /api/tpool/fee - 0.002552509307861328
hostd /api/wallet - 0.003999233245849609
hostd /api/wallet/pending - 0.0029993057250976562
hostd /api/volumes - 0.0030019283294677734
renterd /api/bus/accounts - 2.040757417678833
renterd /api/bus/alerts - 2.041416883468628
renterd /api/bus/autopilots - 2.0466368198394775
renterd /api/bus/consensus/state - 2.045085906982422
renterd /api/bus/contracts - 2.0511515140533447
renterd /api/bus/contracts/prunable - 2.032635450363159
renterd /api/bus/contracts/sets - 2.039788007736206
renterd /api/bus/hosts?offset=0&limit=-1 - 2.0698041915893555
renterd /api/bus/hosts/allowlist - 2.044011354446411
renterd /api/bus/hosts/blocklist - 2.0518131256103516
renterd /api/bus/params/gouging - 2.0311686992645264
renterd /api/bus/params/upload - 2.035468578338623
renterd /api/bus/settings - 2.034477472305298
renterd /api/bus/state - 2.035883903503418
renterd /api/bus/stats/objects - 2.043243885040283
renterd /api/bus/syncer/address - 2.0334699153900146
renterd /api/bus/syncer/peers - 2.0379533767700195
renterd /api/bus/txpool/recommendedfee - 2.03950834274292
renterd /api/bus/txpool/transactions - 2.038769483566284
renterd /api/bus/wallet - 2.0478787422180176
renterd /api/bus/wallet/outputs - 2.0494179725646973
renterd /api/bus/wallet/pending - 2.0277810096740723
renterd /api/bus/wallet/transactions - 2.0682806968688965
renterd /api/bus/webhooks - 2.041905164718628
renterd /api/autopilot/config - 2.0473508834838867
renterd /api/autopilot/state - 2.047243118286133
renterd /api/worker/id - 2.040794849395752
renterd /api/worker/rhp/contracts - 3.458223819732666
renterd /api/worker/state - 2.047581911087036
renterd /api/worker/stats/downloads - 2.038945198059082
renterd /api/worker/stats/uploads - 2.0578346252441406
walletd /api/consensus/network - 2.049074649810791
walletd /api/consensus/tip - 2.0537116527557373
walletd /api/consensus/tipstate - 2.042001485824585
walletd /api/syncer/peers - 2.0476555824279785
walletd /api/txpool/transactions - 2.043919563293457
walletd /api/txpool/fee - 2.0477449893951416
walletd /api/wallets - 2.0392003059387207
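As a rough check on whether polling these endpoints one after another fits inside a collection interval, the timings can be summed per service. The sketch below uses only a four-line sample copied from the list above; `serial_totals` is a hypothetical helper name.

```python
from collections import defaultdict

# A small sample copied from the timings above, not the full list.
SAMPLE = """\
hostd /api/metrics - 0.003999948501586914
renterd /api/bus/state - 2.035883903503418
renterd /api/worker/rhp/contracts - 3.458223819732666
walletd /api/consensus/tip - 2.0537116527557373
"""


def serial_totals(report: str) -> dict:
    """Sum response times per service, i.e. the cost of polling endpoints serially."""
    totals = defaultdict(float)
    for line in report.splitlines():
        if " - " not in line:
            continue  # skip headings and notes
        target, seconds = line.rsplit(" - ", 1)
        service = target.split()[0]
        totals[service] += float(seconds)
    return dict(totals)


for service, total in serial_totals(SAMPLE).items():
    print(f"{service}: {total:.3f} s if polled serially")
```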

Hello @bustedware,

Thank you for your progress report. However, we do require links to public open-source repos in progress reports as part of our grant requirements.

The repo you provided in your grant proposal, bustedware/siagrafana · GitHub, is empty. Can you please link to repos which have been updated this month?

Regards,
Kino on behalf of the Sia Foundation and Grants Committee

Thank you for bringing this to my attention, I submitted my progress to the repo yesterday

Report for December 2nd, 2023

Summary

  • Submitted PR to hostd to add Prometheus metrics API endpoints
    • currently working on moving the endpoints to a different port and adding command line switch similar to addr and http. also looking at writing a prometheus exporter and keeping the endpoints out of hostd but have not started this work.
  • Created hostd fork with Prometheus metrics API endpoints.
  • Created hostd Grafana dashboard which closely mirrors the hostd UI. The dashboard is for displaying a single host at a time and you can filter hosts by network (Mainnet / Zen Testnet). The dashboard makes it easy to switch between various hostd services to get their metrics.
  • Started work on a forked grafana which adds a drop down selector so that you can colorize thresholds by label instead of by value. Currently the block height panel color is static but it would be nice to colorize it based on synced status. A couple bugs and things to do on this effort:
    • selector is added with labels from promql query - finished
    • changing threshold color wipes out the labels from the selector menu for stat panels but not time series panels. - bug
    • selector always highlights first value and current value when making a selection. - bug
    • need to apply colorization logic based on selector value - todo
    • note that this is a work in progress and is not required to import the dashboards from the siagrafana repository. long term goal is to submit a PR and have this baked into master grafana repo.
  • walletd anagami network started and began mining. will build Prometheus metrics API endpoints based off this synced walletd network

Log

  • Funded various wallets on Mainnet and Zen Testnet for hostd and renterd instances

  • Resolved an issue with getting walletd to start syncing by adding -addr localhost:9981

  • walletd gets stuck syncing at 12.6% with this error message syncing with ->173.235.144.230:9981 failed after 0 blocks: block 55949::88081e99 is invalid: transaction 8 is invalid: file contract 0 ends after v2 hardfork

  • 100% complete adding Prometheus metrics for hostd service

  • renterd has a lot of GET API endpoints compared to hostd but I don’t think a majority of them need a respective Prometheus endpoint.

  • walletd does not have very many API endpoints.

  • need to create a sample dashboard showing metrics from 2 hostd services. this will simply be a watered down version of the current dashboard and without the host selector at the top.

Report for January 2nd, 2024

Summary

  • PR213 was closed and PR259 opened for hostd which adds a -prometheus command line option on startup. By default the prometheus endpoints are disabled. When enabled, the metrics are hosted on a different port than the other API endpoints.
  • Created a python script to generate prometheus configuration files for hostd and walletd prometheus endpoints.
  • Opened PR40 for walletd which includes prometheus endpoints.
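A configuration generator like the one mentioned in the summary above might be sketched as follows. This is not the script from the siagrafana repository: the job names, targets, ports, and metrics paths are hypothetical placeholders, and only the standard `scrape_configs`, `job_name`, `metrics_path`, and `static_configs` keys of a Prometheus configuration are used.

```python
def scrape_config(jobs):
    """Render a minimal prometheus.yml body from (job_name, target, metrics_path) tuples."""
    lines = ["scrape_configs:"]
    for job_name, target, metrics_path in jobs:
        lines += [
            f'  - job_name: "{job_name}"',
            f'    metrics_path: "{metrics_path}"',
            "    static_configs:",
            f'      - targets: ["{target}"]',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical hosts, ports, and paths, for illustration only.
JOBS = [
    ("hostd", "localhost:19980", "/metrics"),
    ("walletd", "localhost:19981", "/metrics"),
]

print(scrape_config(JOBS))
```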

Log

  • Anagami testnet started for walletd in the itshappening branch.
  • Running a modified version of the itshappening branch in walletd which includes prometheus endpoints.

Thanks for the report! Please follow the progress report template found here in the future.

Final Progress Report for February 2nd, 2024

What progress was made on your grant this month?

  • [complete] convert sia’s hostd, renterd, and walletd endpoints to prometheus endpoints and built into service (forks)
  • [complete] automate building prometheus configuration and deploying grafana dashboards
  • [incomplete] merge to sia official repositories
  • [complete] presentation: https://www.youtube.com/watch?v=UfH0jgraxww&t=324s
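For readers curious how the dashboard deployment step can be automated, here is a minimal sketch that builds a request for Grafana's HTTP API (`POST /api/dashboards/db` with a bearer token). It is an illustration under assumptions, not the automation from the siagrafana repository: `dashboard_request`, the URL, and the key are hypothetical.

```python
import json
import urllib.request


def dashboard_request(grafana_url, api_key, dashboard):
    """Build (but do not send) a POST to Grafana's dashboard endpoint."""
    payload = {
        # A null id asks Grafana to create the dashboard rather than update by id.
        "dashboard": {**dashboard, "id": None},
        # Replace an existing dashboard that has the same uid.
        "overwrite": True,
    }
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
req = dashboard_request("http://localhost:3000/", "example-key", {"title": "hostd", "uid": "hostd"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it to a running Grafana instance.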

Summarize any problems that you ran into this month and how you’ll be solving them.

Please summarize your issues into a few sentences or bullet points:

  • hostd prometheus encoder was developed mid-January, so I spent a little bit of time syncing my hostd fork to the main repo in order to strip out the encoder and ensure that the presentation was stable and as complete as possible. it was a good thing I did, as I feel like I nearly ran out of time towards the end getting the dashboards, automation, and endpoints all lined up with enough monitoring data to not look boring during the presentation
  • did not have an S3 settings object to reference for the renterd prometheus settings endpoint

What will you be working on next?

Since this is a final report, I'm going to dump everything remaining that could or should still be looked at:

  • move grafana api key, hostd, renterd, and walletd secrets out of siagrafana.json.
  • create grafana playlist with each dashboard created using the api. (a playlist is just a collection of dashboards that can automatically scroll around on the screen)
  • create grafana custom siacoin unit. a lot of the panels today have hardcoded units like “SC” or “mSC”, “pSC”. a custom siacoin grafana unit could automatically perform this conversion
  • this forked grafana started as a result of this effort. the fork's goal was to allow a panel such as block height to be colorized by a label while displaying the block height value. tl;dr not possible, see the top of the README.md in this repo for more details.
  • troubleshoot why metrics like active hosts, used storage, and total storage don’t match siastats (perhaps siastats includes historical hosts?). certain other stats also don’t match the hostd ui but could be due to time windows
  • build combined dashboards for users who are running multiple hostds, renterds, and/or walletds.
    • for example if running 2 hostds a dashboard example where a panel calculates the combined storage of all running hostds. or perhaps a dedicated section in the current hostd dashboard.
  • perform yet another pass of all prometheus metrics and endpoints for storage level optimization. this blurb from a prometheus linkedin article I found summarizes pretty well:

Labels can cause some issues for Prometheus, such as high cardinality, label conflicts, and label misuse. High cardinality occurs when there are too many unique combinations of labels and values for a metric, leading to a large number of time series that consume more memory and disk space. Label conflicts arise when different labels or values are used for the same metric across different sources or systems. Label misuse happens when labels are used for purposes not suitable for the Prometheus data model, such as storing raw data or events. These challenges can result in inefficient or incorrect queries, dashboards, or alerts.

just to make sure prometheus is as optimized as it can be

  • many of the metrics involving siacoin are output using ExactString() and should be updated to use Siacoins() like in the example here
  • potentially perform automatic rollups for certain metrics to reduce cpu load on grafana end users. ex: % accepting contracts and % scanned metrics are used like this:

sum(renterd_host_settings_acceptingcontracts{instance="$hosts"} == 1) / (sum(renterd_host_settings_acceptingcontracts{instance="$hosts"} == 0) + sum(renterd_host_settings_acceptingcontracts{instance="$hosts"} == 1)) * 100

we can reduce the cpu load in grafana by moving this work into the renterd service.
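The arithmetic behind such a rollup is small. The sketch below shows it in Python purely for illustration; the real change would live in renterd's Go code, and `percent_accepting` is a hypothetical name. It assumes each host reports the flag as 0 or 1.

```python
def percent_accepting(flags):
    """Percentage of hosts whose accepting-contracts flag is 1; None if there are no hosts."""
    flags = list(flags)
    if not flags:
        return None  # avoid dividing by zero when no hosts are known
    accepting = sum(1 for flag in flags if flag == 1)
    return 100.0 * accepting / len(flags)


print(percent_accepting([1, 1, 0, 1]))  # 75.0
```

Exposing the result as a single precomputed gauge would let a dashboard panel read one series instead of aggregating one series per host on every refresh.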

  • [bug] the dashboard panel ids are not unique, making editing some panels tedious. need to pass through the dashboard template json files and ensure uniqueness to each id.
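A pass over the dashboard template files to make panel ids unique could be sketched like this. It assumes the usual dashboard JSON layout, with a top-level `panels` list and optional nested `panels` inside row panels; `renumber_panels` is a hypothetical helper, and anything that refers to a panel by its old id would need updating separately.

```python
def renumber_panels(dashboard):
    """Assign sequential ids to every panel, including panels nested inside rows."""
    next_id = 1

    def walk(panels):
        nonlocal next_id
        for panel in panels:
            panel["id"] = next_id
            next_id += 1
            walk(panel.get("panels", []))

    walk(dashboard.get("panels", []))
    return dashboard  # modified in place and returned for convenience


# Made-up dashboard fragment with duplicate ids.
example = {
    "title": "hostd",
    "panels": [
        {"id": 2, "type": "row", "panels": [{"id": 2, "type": "stat"}]},
        {"id": 2, "type": "timeseries"},
    ],
}
renumber_panels(example)
print([p["id"] for p in example["panels"]])  # [1, 3]
```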

Thank you!

Hello @bustedware,

Thank you for your progress report and congrats on completing your grant! We will reach out to you about the wind-down of your grant soon.

Regards,
Kino on behalf of the Sia Foundation and Grants Committee