Flash Health Monitoring on Torizon
Hey there, Leo here! Some years ago, I worked on an innovation project - by now archived - named Flash Analytics Tool. One of the project goals was to research and apply methods to estimate flash memory lifespan on Embedded Linux - more specifically, on Torizon in its early days - thus helping our customers develop flash-friendly applications and estimate how long their devices could be expected to last in the field.
Back then, I learned that the e.MMC 5.0 standard provides, out-of-the-box, a coarse estimate of the flash health status in increments of 10%. While this didn’t meet all our requirements for lifetime estimation, it did give a useful high-level view of how worn the flash is.
Such data allows you, for example, to do preventive maintenance on a product before it fails in the field.
Since then, Torizon has evolved and received many interesting new features, including the Torizon Platform Services integration, which in turn enabled device monitoring.
As of TorizonCore 5.7.0, mmc-utils was also added to the TorizonCore distribution, and it reminded me of the work done in Flash Analytics. Running mmc-utils in a container has always been possible, and would otherwise be the preferred way, but now nothing stops us from using it right away from the base OS.
Having all tools available, I decided to give it a try and monitor things in a device fleet using the Torizon Platform Services.
- Learn how to read and monitor standard eMMC health data on TorizonCore.
- Learn how to visualize it on a time series chart in the Torizon Platform Services.
First and foremost, no pun intended on this section’s title! That said, let’s recap some important concepts - at a high level - to make the best out of this article. Feel free to skip it, or come back as needed.
There are mainly two base technologies behind flash: NAND and NOR. They are named after their respective architectures at the transistor level. Even though NAND costs less and is available in a wide range of high-capacity ICs, the software stack needed to support it is much more complex because of the way it operates.
- You can execute read, write and erase operations
- It is split into blocks
- Cells: the smallest division of a raw NAND. A cell may store:
  - SLC: 1 bit per cell - lowest density, highest cost, highest reliability and longest lifespan
  - MLC: 2 bits per cell
  - TLC: 3 bits per cell
  - QLC: 4 bits per cell - the opposite of SLC: highest density, lowest cost, lowest reliability and shortest lifespan
  - It is important to state that MLC, TLC, and QLC can be configured to operate in pseudo-SLC (pSLC) mode, which increases lifespan and reliability at the cost of reduced storage capacity.
- Pages: the smallest array of cells that can be accessed in a single read or program (switch bits from 1 to 0) operation.
- Eraseblocks: the smallest array of pages that can be erased (switch bits from 0 to 1) in a single operation, usually around 512 kB to 4 MB.
Last but not least, eMMC manufacturers overprovision the ICs with extra eraseblocks known as reserved blocks. They are not exposed as additional storage. Instead, they replace eraseblocks as those become bad, extending the life of the device. Some blocks often become bad very early, much sooner than expected, so the reserved blocks are also there to ensure the nominal capacity is met in an eMMC’s early days.
Over time, eraseblocks wear out. A worn-out eraseblock may still be readable, but it can no longer be programmed or erased. This is called a bad block.
Since the eMMC controller knows how many blocks there are inside the IC and is capable of identifying and marking bad blocks, it can also calculate the percentage of the flash that is still available and the percentage that is worn out. The results of this calculation are exposed as standardized values that can be easily read from the Linux user space, as this article explains:
- Device lifetime estimation type A: lifetime estimation for eraseblocks configured as MLC - which is the default for the user area partition (where the OS and data are stored) of MLC eMMCs. If you don’t configure your device as pSLC, you will most likely see this value increase over time. Data is provided in steps of 10%:
  - For example, 0x02 means 10%-20% device lifetime reached.
- Device lifetime estimation type B: lifetime estimation for eraseblocks configured as SLC - usually the boot area partition (where the bootloader is stored) blocks and those configured by the user in pSLC mode. Usually, the bootloader area is barely touched and you most likely won’t see this indicator change significantly over the product lifespan. Data is provided in steps of 10%:
  - For example, 0x02 means 10%-20% device lifetime reached.
- Pre EOL information: overall status for reserved blocks. This indicator signals that the eMMC lifespan is near its end. Possible values are:
  - 0x00 - Not defined.
  - 0x01 - Normal: consumed less than 80% of the reserved blocks.
  - 0x02 - Warning: consumed 80% of the reserved blocks.
  - 0x03 - Urgent: consumed 90% of the reserved blocks.
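To make the encoding above concrete, here is a minimal shell sketch that turns the raw register values into human-readable text. The function names are hypothetical (they are not part of mmc-utils), and the handling of 0x0B as "exceeded" reflects my reading of the eMMC standard rather than anything stated in this article, so treat it as an assumption.

```shell
# Hypothetical helpers, for illustration only.

# Translate a lifetime estimation value (type A or B), e.g. 0x02, into the
# percentage range of device lifetime already used.
life_time_range() {
    value=$(( $1 ))    # shell arithmetic understands the 0x prefix
    if [ "$value" -ge 1 ] && [ "$value" -le 10 ]; then
        echo "$(( (value - 1) * 10 ))%-$(( value * 10 ))%"
    elif [ "$value" -eq 11 ]; then
        echo "exceeded"        # assumption: 0x0B means past the estimated lifetime
    else
        echo "not defined"
    fi
}

# Translate a Pre EOL information value into the labels listed above.
pre_eol_label() {
    case $(( $1 )) in
        1) echo "normal" ;;
        2) echo "warning" ;;
        3) echo "urgent" ;;
        *) echo "not defined" ;;
    esac
}

life_time_range 0x02
pre_eol_label 0x01
```

Keeping the decoding separate from the reading makes it easy to reuse the same helpers in a script, a container, or a test.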
If you want to learn more about eMMC, read the articles Flash Memory Overview on Toradex Products, eMMC (Linux), or watch the webinar Flash Memory in Embedded Linux Systems.
In the Toradex BSP, the default eMMC device is symlinked to /dev/emmc. This is quite convenient as it standardizes access across our entire range of SoMs. To read each of the aforementioned properties individually:
sudo mmc extcsd read /dev/emmc | grep -i EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A
sudo mmc extcsd read /dev/emmc | grep -i EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B
sudo mmc extcsd read /dev/emmc | grep -i EXT_CSD_PRE_EOL_INFO
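Each of the commands above reads the whole Extended CSD again. As a sketch, you could capture the output once and pick registers out of it. The helper name `extcsd_value` is hypothetical, and the sample lines are an assumption about how mmc-utils prints these registers, so compare them with the output on your own device first.

```shell
# Hypothetical helper: print the value of one EXT_CSD register.
# $1 is the register name; the `mmc extcsd read` output is read from stdin.
# Assumption: the register appears as "... [NAME]: 0x01" on a single line.
extcsd_value() {
    awk -v name="$1" 'index($0, "[" name "]") { print $NF; exit }'
}

# Demo on canned text, so it can be tried without an eMMC at hand.
sample_dump='eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x02
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01'

printf '%s\n' "$sample_dump" | extcsd_value EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A

# On a device, read once and reuse the dump:
#   dump=$(sudo mmc extcsd read /dev/emmc)
#   printf '%s\n' "$dump" | extcsd_value EXT_CSD_PRE_EOL_INFO
```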
Fluent Bit, which describes itself as an end-to-end observability pipeline, is a fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is widely used in cloud and containerized environments.
Monitoring functionality in this framework is described through input, filter, and output plugins. I won’t dive deep into Fluent Bit itself, as there is great documentation available on docs.fluentbit.io.
Specific to Torizon, you need to add one input and one filter entry to send custom data and make it readily available in the Platform Services. I also won’t go in-depth here, as we have a dedicated article about device monitoring in TorizonCore.
sudo mmc extcsd read /dev/emmc | \
    grep -e EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A \
         -e EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B \
         -e EXT_CSD_PRE_EOL_INFO | \
    rev | \
    cut -c 1 | \
    jq -R -c -s \
        'split("\n") | { "emmc_life_time_est_typ_a": .[0], "emmc_life_time_est_typ_b": .[1], "emmc_pre_eol_info": .[2] }'
To better understand it, I suggest you run it adding one step at a time: first, run mmc alone, then run mmc | grep, mmc | grep | rev, and so on.
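Note that the pipeline above keeps only the last character of each matching line, so the values arrive as one-character strings, and a value such as 0x0a would show up as "a". If you would rather send decimal numbers, here is a rough awk-only alternative. It is a sketch, not what I ran on my fleet: the function name `emmc_health_json` is hypothetical, the input line format is an assumption about mmc-utils output, and I have not verified how the Platform Services chart numbers versus strings.

```shell
# Hypothetical helper: reads `mmc extcsd read` output on stdin and prints one
# JSON object with the three health registers as decimal numbers.
emmc_health_json() {
    awk '
        # Convert a string such as "0x0a" to its decimal value (POSIX awk has
        # no built-in hex parser).
        function hex2dec(s,    i, n, d) {
            s = tolower(s)
            sub(/^0x/, "", s)
            n = 0
            for (i = 1; i <= length(s); i++) {
                d = index("0123456789abcdef", substr(s, i, 1)) - 1
                n = n * 16 + d
            }
            return n
        }
        /\[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A\]/ { a = hex2dec($NF) }
        /\[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B\]/ { b = hex2dec($NF) }
        /\[EXT_CSD_PRE_EOL_INFO\]/               { e = hex2dec($NF) }
        END {
            printf "{\"emmc_life_time_est_typ_a\":%d,\"emmc_life_time_est_typ_b\":%d,\"emmc_pre_eol_info\":%d}\n", a, b, e
        }
    '
}

# On a device (needs root):
#   sudo mmc extcsd read /dev/emmc | emmc_health_json
```

Because it matches each register by name, this variant also does not depend on the order in which the lines are printed.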
[INPUT]
    Name          exec
    Tag           emmc_health
    Command       mmc extcsd read /dev/emmc | grep -e EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A -e EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B -e EXT_CSD_PRE_EOL_INFO | rev | cut -c 1 | jq -R -c -s 'split("\n") | { "emmc_life_time_est_typ_a": .[0], "emmc_life_time_est_typ_b": .[1], "emmc_pre_eol_info": .[2] }'
    Parser        json
    Interval_Sec  300

[FILTER]
    Name          nest
    Match         emmc_health
    Operation     nest
    Wildcard      *
    Nest_under    custom
sudo systemctl restart fluent-bit
- Because Fluent Bit is a daemon and runs as root, there is no need to use sudo in the configuration file.
- To be shown in the Platform Services, data must be nested under custom. This is specific to Torizon and how we implement it on the server side.
- The time interval of 300 seconds (5 minutes) is unrealistically small for measuring flash health in increments of 10%, as provided by the eMMC standard. The ideal interval might depend on how much your application writes. From a very simple test I ran to wear the flash as fast as possible, it took roughly half a day to see a 10% increment. Especially for devices on a cellular connection, to save bandwidth, it makes sense to increase the interval.
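To get a feeling for the trade-off in the last point, a quick back-of-the-envelope calculation shows how many data points per day each device sends for a given Interval_Sec. The helper name `samples_per_day` is hypothetical and only here for illustration.

```shell
# Hypothetical helper: number of data points sent per day for a given
# Interval_Sec value (integer division, 86400 seconds in a day).
samples_per_day() {
    echo $(( 86400 / $1 ))
}

samples_per_day 300     # prints 288
samples_per_day 3600    # prints 24
```

With a value that changes in steps of 10%, even an hourly reading leaves plenty of points between two increments in my accelerated test, so a longer interval costs very little in resolution.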
You can also get extra inspiration and learn more from another example, the Disk usage custom metric.
You might be wondering how to replicate this configuration without doing it manually over and over again, for hundreds or maybe thousands of devices.
To answer that question, use the TorizonCore Builder tool to capture the Fluent Bit changes and create a custom TorizonCore image. This workflow allows you to install the custom TorizonCore during production programming with Toradex Easy Installer and send updates to devices already deployed in the field.
Before switching our focus from TorizonCore to the Torizon Platform Services, it is a good time to go grab a cup of coffee or have a look at your social network feeds. It might take a few minutes until data is sent to and shows up on the platform.
- Either: in the device management section → select a device → click the action View Detail → make sure you are in the device information tab:
- Or: in the fleet management section → select a fleet → click the action View Detail → make sure you are in the fleet overview tab:
No matter which you choose, the option to customize metrics will be presented to you. It should be intuitive to add and configure a chart, so I won’t describe it step-by-step. Here is my eMMC Health chart configuration:
And it’s all set! All you have to do is wait for data points to arrive at the platform over time. In my setup, I’ve used a fleet with 8 devices:
In the chart above, we can see that the device BR-J-08 has an unusually high health degradation. Looking at it more closely:
The affected device BR-J-08 ran a script over the weekend that wears out the flash on purpose. I don’t recommend running it on your devices at all. In any case, you can find the script in Appendix I - wearing the flash on purpose.
To learn more about device monitoring, watch our webinar on-demand Secure Device Monitoring - Check Health, Resources and Performance and read the article Device monitoring in TorizonCore.
- Now that TorizonCore includes mmc-utils, you can use it out-of-the-box for device monitoring (even though you could already do it from a container before).
- Sending any metric you have access to into the Platform Services is easy. You are not constrained to the defaults.
- Visualizing fleet and device time series data is easy and powerful. You can identify anomalies in a timely manner and fix them before the device fails.
- I can imagine a bright future when we implement device monitoring alarms. Specifically for the eMMC, it would be so convenient to set an alarm on the Pre EOL information.
Given all of that, I'd love to hear about your experiences and learn what is important to you, what you like, and what you think is missing.
See you on my next blog, bye!
- https://labs.toradex.com/projects/flash-analytics-tool - a Toradex Labs project, from the early days of Torizon. Even though it’s archived, you can get some inspiration from it.
- https://developer.toradex.com/linux-bsp/how-to/hardware-related/flash-memory-overview-on-toradex-products - the basics of flash memory on Toradex SoMs.
- https://developer.toradex.com/linux-bsp/how-to/boot/emmc-linux/ - useful practical information about eMMC and Linux.
- https://developer.toradex.com/torizon/how-to/torizon-updates/device-monitoring-in-torizoncore/ - how to monitor parameters in TorizonCore.
- https://developer.toradex.com/torizon/how-to/torizon-updates/torizon-platform-services-web-interface - the basics of the Platform Services web UI.
- https://developer.toradex.com/torizon/working-with-torizon/image-customization/capturing-changes-in-the-configuration-of-a-board-on-torizoncore - a how-to guide for making OS services customization reproducible, using TorizonCore Builder
- https://developer.toradex.com/torizon/in-depth/torizoncore-builder/torizoncore-builder-tool-customizing-torizoncore-images/ - an overview of TorizonCore Builder, how to install it, basic usage, and more.
- https://developer.toradex.com/torizon/how-to/torizon-updates/first-steps-with-torizon-remote-updates/ - deploy changes captured with TorizonCore Builder to devices in the field.
- https://developer.toradex.com/easy-installer/#what-is-the-toradex-easy-installer - install your customized TorizonCore on up to thousands of boards during production programming.
- https://developer-archives.toradex.com/software/torizon/release-details - keep an eye on new features and known bugs.

Passively watching the flash wear takes a really long time. Here is the super simple script I left running over the weekend to wear the flash quickly:
while true; do
    dd if=/dev/urandom of=/home/torizon/testfile bs=4096 count=250000
    sync
    rm /home/torizon/testfile
    sync
done
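For reference, the amount of data written on each pass of the loop follows directly from the dd parameters. The variable names below are just for illustration.

```shell
# Data written by one pass of the loop: block size times block count.
bs=4096
count=250000
bytes=$(( bs * count ))

echo "$bytes bytes per pass"                      # prints 1024000000 bytes per pass
echo "$(( bytes / 1024 / 1024 )) MiB per pass"    # prints 976 MiB per pass
```

In other words, every iteration writes roughly 1 GB to the user area before the file is deleted and the cycle starts again.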
