Apalis imx6 corrupted kernel VFAT partition on eMMC at boot

Hello, I have a bunch of Apalis iMX6 V1.1A installed on some custom boards which sometimes fail to boot. These boards were bought more than a year ago and updated to V1.1B by flashing the Toradex Easy Installer first. Then a partition table with a custom linux kernel and rootfs was flashed months ago using the Toradex Easy Installer infrastructure.
Sometimes in case of fast power switch on-off-on (this is the only way I have to shut down the boards as I have no other ways to force a clean shutdown), for unknown reasons the u-boot cannot mount/find the uImage file, thus bricking the board. Please see the log here:

U-Boot 2016.11-2.7.4+g1b121c6 (Oct 04 2017 - 22:34:15 +0000)

CPU:   Freescale i.MX6D rev1.5 at 792 MHz
Reset cause: POR
I2C:   ready
DRAM:  512 MiB
PMIC:  device id: 0x10, revision id: 0x11, programmed
MMC:   FSL_SDHC: 0, FSL_SDHC: 1, FSL_SDHC: 2
auto-detected panel vga-rgb
Display: vga-rgb (640x480)
In:    serial
Out:   serial
Err:   serial
Model: Toradex Apalis iMX6 Dual 512MB V1.1A, Serial# 04862190
Net:   using PHY at 7
FEC [PRIME]
Hit any key to stop autoboot:  0

Booting from internal eMMC chip...
reading imx6q-apalis-eval.dtb
51808 bytes read in 17 ms (2.9 MiB/s)
reading uImage
Error reading cluster
** Unable to read file uImage **

emmcboot failed

I tried recovering the board by booting the Toradex Easy Installer from RAM and verified the FAT partition (where the uImage file is stored) and its content. The VFAT partition was faulty (flag bit was set), although the uImage file was healthy. I recovered the partition using fsck.vfat and the issue at boot disappeared.
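
To make the "flag bit" check concrete, here is a minimal Python sketch that reads the boot sector of a FAT partition and reports whether it is marked dirty. It assumes the flag in question is the FAT state byte (bit 0 of the byte at offset 37 on FAT12/16, offset 65 on FAT32), which is the flag fsck.vfat reports when a filesystem was not unmounted cleanly. The function names and the device path in the usage note are illustrative assumptions and do not come from the Toradex tooling.

```python
import struct

FAT_STATE_DIRTY = 0x01  # bit 0 of the boot-sector state byte


def fat_dirty_flag(boot_sector: bytes) -> bool:
    """Return True if the FAT boot sector is marked as not cleanly unmounted."""
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector (missing 0x55AA signature)")
    # The 16-bit sectors-per-FAT field at offset 22 is zero on FAT32,
    # which moves the state byte from offset 37 to offset 65.
    (fat_length_16,) = struct.unpack_from("<H", boot_sector, 22)
    state_offset = 65 if fat_length_16 == 0 else 37
    return bool(boot_sector[state_offset] & FAT_STATE_DIRTY)


def read_boot_sector(device_path: str) -> bytes:
    """Read the first 512 bytes of a partition or image file (read-only)."""
    with open(device_path, "rb") as dev:
        return dev.read(512)
```

For example, `fat_dirty_flag(read_boot_sector("/dev/mmcblk0p1"))` would inspect the first eMMC partition, assuming that is where the VFAT partition lives on your layout. The sketch only reads; repairing the partition is still a job for fsck.vfat.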

I have some questions:

  • I have both Apalis iMX6 V1.1A and V1.1B. They share the same u-boot, kernel and rootfs. Up to now, the issue appeared only on V1.1A boards. Is this issue fixed in V1.1B?
  • Searching online, I found someone else who had a similar problem using iMX6 custom boards (not Apalis). They solved the issue by disabling the Spread Spectrum feature at boot. Is it possible to do the same on these boards? Is it something I can do in u-boot?
  • Do you have any suggestions, such as “version X of u-boot solves that issue”, or any other help on this matter? My u-boot and kernel are quite old and I could update to a more recent release in the future, if that would fix the issue.

The instruments where these boards are mounted should work for many years and bricking them (aka: not being able to boot) as it happened with these V1.1A boards is something that scares me.

Thank you for your help,

Hi @hyperion

Welcome to the Toradex Community!

Could you provide the software version of your module? Which carrier Board are you using?

Then a partition table with a custom linux kernel and rootfs was flashed months ago using the Toradex Easy Installer infrastructure.

What changes have you done and how did you create the custom kernel and rootfs?

Sometimes in case of fast power switch on-off-on (this is the only way I have to shut down the boards as I have no other ways to force a clean shutdown), for unknown reasons the u-boot cannot mount/find the uImage file, thus bricking the board. Please see the log here:

Why are you not running the reboot command? Have you seen this error also with the Toradex U-Boot and kernel?

I tried recovering the board by booting the Toradex Easy Installer from RAM and verified the FAT partition (where the uImage file is stored) and its content. The VFAT partition was faulty (flag bit was set), although the uImage file was healthy. I recovered the partition using fsck.vfat and the issue at boot disappeared.

Is the Issue solved or does it occur again after fast power off-on?

I have some questions:

  • I have both Apalis iMX6 V1.1A and V1.1B. They share the same u-boot, kernel and rootfs. Up to now, the issue appeared only on V1.1A boards. Is this issue fixed in V1.1B?
  • Searching online, I found someone else who had a similar problem using iMX6 custom boards (not Apalis). They solved the issue by disabling the Spread Spectrum feature at boot.

Where did you get this Information?

My u-boot and kernel are quite old and I could update to a more recent release in the future, if that would fix the issue.

You should update to BSP 2.8b5 and check if this issue is still present.

The instruments where these boards are mounted should work for many years and bricking them (aka: not being able to boot) as it happened with these V1.1A boards is something that scares me.

Our modules are designed to run 24/7 in critical applications. Many of our customers use these modules in extreme environmental conditions, so they should work for many years.

How often do you do write operations on the flash?

Best regards, Jaski

Hello Jaski,

thank you for your interest in this issue.

Could you provide the software version
of your module? Which carrier Board
are you using?

The u-boot version is

*U-Boot 2016.11-2.7.4+g1b121c6 (Oct 04 2017 - 22:34:15 +0000)*

The kernel is:

*Linux TP500-020107 3.10.17-1.1-thermo-00001-ged459347f9e0-svn-dirty #20 SMP PREEMPT Fri Dec 15 12:51:06 CET 2017 armv7l GNU/Linux*

which is a custom version with some minor modifications in a peripheral driver since a backported CAN driver was used instead of the included one.
The rootfs is based on BSP version 2.4 (angstrom v2014.12).
The carrier board is custom made. The board exposes some buses, Ethernet, USB and serial. No human interface. However, when the issue appears, moving the Apalis iMX6 to a Toradex Evaluation Board does not solve the issue. The only way to fix it is to reflash the board OR run fsck.vfat on the VFAT partition from the command line using the Toradex Easy Installer in RAM. The VFAT partition is never mounted when the system boots. Nobody writes to or uses the VFAT partition, since it only contains the uImage file and the device tree configuration. This partition is read only by the bootloader.
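
One way to double-check the claim that the VFAT partition is never mounted is to scan the kernel's mount table on a running board. The following is a minimal Python sketch that assumes the usual /proc/mounts layout (device, mount point, filesystem type, options, then two numeric fields). The function name is illustrative.

```python
from typing import List, Tuple


def vfat_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (device, mount_point, options) for every FAT entry in a mount table."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("vfat", "msdos"):
            found.append((fields[0], fields[1], fields[3]))
    return found


def current_vfat_mounts() -> List[Tuple[str, str, str]]:
    """Inspect the mount table of the running Linux system."""
    with open("/proc/mounts") as mounts:
        return vfat_mounts(mounts.read())
```

An empty result from `current_vfat_mounts()` would support the statement above. Any entry whose options start with `rw` would mean something in the rootfs mounts the partition writable after all.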

What changes have you done and how did
you create the custom kernel and
rootfs?

I haven’t managed that build so I don’t know exactly how it was built. The kernel was built using standard make and rootfs was built as per Toradex docs.

Why are you not running the reboot
command? Have you seen this error also
with the Toradex Uboot and Kernel?

Because the instruments where these boards are supposed to run have only a power switch on mains. There is no way to have the user issue a reboot command otherwise.

Is the Issue solved or does it occur
again after fast power off-on?

The issue is not solved. I cannot provide detailed statistics on how often the issue arises relative to the number of boards, but we have fewer than 10 V1.1A boards here and the issue has appeared in at least 5 cases. These boards are not entirely under my control, so I cannot power cycle them to see whether a board bricks itself. In the next few days I will run more tests, power cycling a V1.1A board to see if there is a pattern. So far the issue has happened only on Apalis iMX6 V1.1A boards. We have fewer V1.1B boards here, and none of them has had this issue.
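
For the planned power-cycle tests, the failure can be detected automatically from the serial console output. Here is a minimal Python sketch that scans one captured U-Boot log for the error lines seen in the log at the top of this thread. How the console is captured and how the power is switched are left out, and the marker list is an assumption based on that single log.

```python
# Substrings taken from the failing boot log posted in this thread.
FAILURE_MARKERS = (
    "Error reading cluster",
    "** Unable to read file",
    "emmcboot failed",
)


def boot_failures(log_text):
    """Return the lines of one captured console log that match a failure marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


def count_failed_boots(logs):
    """Count how many captured logs (one string per power cycle) show a failure."""
    return sum(1 for log_text in logs if boot_failures(log_text))
```

Feeding one log string per power cycle into `count_failed_boots` gives a failure count per board, which would make it easier to compare V1.1A and V1.1B boards under the same test.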

Where did you get this Information?

I got the spread spectrum information from here: https://community.nxp.com/thread/444400

You should update to Bsp 2.8b5 and
check if this Issue is still present.

I will do so. Maybe the u-boot has been changed so that it can read through the broken VFAT partition.

Maybe you have some insight into this kind of timing issue or known errata (maybe the spread spectrum thing mentioned above) that could prevent the eMMC from booting in such cases.

Thank you,

Hi
Thanks for your answer.

You have an old U-Boot and kernel version, which is no longer supported. It is not recommended to use this version for production. Please update to the latest BSP at your earliest convenience.

Maybe you have some insight into this kind of timing issue or known errata (maybe the spread spectrum thing mentioned above) that could prevent the eMMC from booting in such cases.

The spread spectrum is not at all related to this vfat partition issue.

Please let us know your results once you have done more tests.

Best regards, Jaski