Register read error on Colibri T30

Hello,

I’m facing a problem with a GPIO configuration.
I want to use pin 25 (GPIO PO3) as a key input (using gpio_keys). Everything worked fine with a standard module, but since my company upgraded to the industrial version, the behavior has become erratic.

After investigation, it appears that the problem comes from the GPIO_INT_LVL register. I added some traces in the kernel: when it works, bit 19 (interrupt on both edges) is set, and when it doesn’t, that bit is cleared.
With my traces in “gpio-tegra.c”, I can see that the bit is correctly set by the driver, but something clears it later.

Do you know which drivers access this register? Or which changes between the standard and industrial versions of the Colibri T30 could be involved?

Regards,
François

GPIO PO3 is by default the debug UART’s CTS signal (aka FF_CTS), and I guess the UART IP may very well influence its configuration unless you make sure this is not the case. That said, I am not aware of any differences between the non-IT and IT variants in this respect. What exact BSP version are you talking about? What carrier board are you using? Is there anything else about your environment which may influence this behaviour?

Hello Marcel,

Thank you for your answer. I don’t use UARTA (UARTB is the only port in use). I tried to change the default function in the pinmux declaration, but it didn’t change anything.

I don’t know the exact revision of the BSP we’re using; I will check with my project manager next week. All I can tell is that it’s based on kernel 3.1.10. I’m using the Colibri on a custom board, and the difference in behavior between the industrial and standard modules happens on the very same board.

The problem does indeed seem to be influenced by the environment, because if we remove all the peripherals from our board, the problem disappears. What I can’t explain is why, with the same version of the software, the problem happens only with IT modules.

I will probably be able to provide more info next week. Thanks for your help.

Regards,
François

More info about the BSP used:

Kernel: 2.5Beta2 linux-toradex.git - Linux kernel for Apalis, Colibri and Verdin modules

U-Boot: 2.6Beta2 Toradex System/Computer on Modules - Linux BSP Release

Rootfs/Userspace: custom based on Buildroot 2015.05

Hi,

I’m still working on this problem, and I found some new clues that make me think the problem is ultimately not entirely related to GPIO PO3.

I added new traces in the kernel, and it turns out that the value read from the GPIO_INT_LVL register is sometimes wrong. As an example, it may happen that over 3 consecutive reads (with no writes in between), I get the value “0x00080800” (which is what is expected), then “0x00000800” (bit 19 is cleared) and finally “0x00080800” again! I think that bit 19 was never cleared, but that the read failed.

I wrote a user-space program that reads this register every ms and reports if the value changed (which should never happen, because only the gpio-keys driver writes it, at initialization). I found that when I’m playing video, I periodically get the following messages:
NVMMLITE_TVMR: EOS detected
TVMR: Processing of EOS
TVMR: Sending Stream End Event
TVMR: Processing of EOS Done
and when these messages are displayed, the read errors occur.
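
For readers who want to reproduce the check, here is a minimal sketch of the change-detection loop described above. It is not the program used in this thread: `watch_register`, `reg_read_fn`, `struct replay` and `replay_read` are hypothetical names, and the hardware access is replaced by a callback so that the logic can be exercised off-target with recorded values. On the module, the callback would read the mapped register and the loop would sleep about 1 ms between reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accessor type: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Read the register n_reads times and report every change between
 * consecutive reads. Returns the number of changes seen. */
size_t watch_register(reg_read_fn read_reg, void *ctx, size_t n_reads)
{
    assert(read_reg != NULL);
    if (n_reads == 0)
        return 0;

    uint32_t old = read_reg(ctx);
    size_t changes = 0;

    for (size_t i = 1; i < n_reads; i++) {
        uint32_t cur = read_reg(ctx);
        if (cur != old) {
            printf("Read register value changed:\nold = 0x%08x\nnew = 0x%08x\n",
                   (unsigned)old, (unsigned)cur);
            changes++;
            old = cur;
        }
        /* On the target, sleep about 1 ms here (e.g. with nanosleep). */
    }
    return changes;
}

/* Hypothetical stand-in for the hardware: replays recorded values. */
struct replay {
    const uint32_t *values;
    size_t len;
    size_t pos;
};

uint32_t replay_read(void *ctx)
{
    struct replay *r = ctx;
    assert(r != NULL && r->len > 0);
    uint32_t v = r->values[r->pos];
    if (r->pos + 1 < r->len)
        r->pos++;   /* stay on the last value once the recording ends */
    return v;
}
```

Replaying the three reads quoted earlier (0x00080800, 0x00000800, 0x00080800) through `watch_register` reports two changes, one for the bad read and one for the return to the expected value.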

Has anybody ever encountered this kind of problem? I hope that there is a solution, because what happens here could potentially happen on other registers, which is a real problem for the reliability of the whole software…

François

As usual I would recommend testing the same with our latest BSP. That said I’m really not aware of any such issue.

Hi Marcel,

Thanks for your help and recommendations. We first tried on an Iris board with a stock Toradex BSP image and a kernel compiled at the same revision as ours, and could reproduce the error where the GPIO_INT_LVL register for port O was sometimes read incorrectly.
I then updated to the latest BSP 2.7b2 kernel sources and the problem no longer appeared. Bisecting through the kernel commits, I isolated the fix to the commit “apalis/colibri_t30: initialise tps65911 GPIOs”.

The patch is not 100% clear to me and I’d like to better understand the changes to ensure that this commit definitely fixes our issues. Could you tell me a bit more about the TPS65911 GPIO configuration and how the TPS65911 GPIO pins are connected to the Tegra? From board-colibri_t30-power.c and the tps6591x.c driver, it seems that the GPIO2 pin of the TPS65911 is connected to a “VDD_CORE enable” pin of the T30.

Do you think that without this configuration we may have had some inconsistencies in the powering of the GPIO controller in sleep, and so sometimes read inconsistent values?

As a quick test, you can run the attached gpio-test.c program, which programs the GPIO port O GPIO_INT_LVL register with the value 0x00888800 and then continuously reads this register.

We noticed that when we executed this program in the background and then launched a video playback with gst-launch, wrong values were read out of the register (0x00808800).

Steps to reproduce:

  1. Colibri T30 IT module on Iris Carrier board
  2. BSP 2.7b2
  3. Kernel at commit before “apalis/colibri_t30: initialise tps65911 GPIOs”
  4. On target configure GPIO pins O3 and O7 as GPIO inputs with GPIOTool
  5. Compile and run gpio-test.c
  6. Play a video with gst-launch

$ ./gpio-test &
$ gst-launch filesrc -v location=/resources/Videos/turn_off_cycler.mp4 ! decodebin2 ! xvimagesink

Messages from gpio-test indicating read errors are displayed when the video is started:

Read register value changed:
old = 0x00888800
new = 0x00808800
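
To see which bit differs between the two values, the old and new reads can be XORed. The sketch below does that and maps the bit to a pin and a field. It assumes the per-port GPIO_INT_LVL layout implied by the constants in gpio-tegra.c as I understand them (bits 0-7 level, bits 8-15 edge select, bits 16-23 both-edge, one bit per pin in each byte); the helper names are hypothetical and the layout should be checked against the Tegra 3 reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest bit that differs between two reads, or -1 if equal. */
int first_changed_bit(uint32_t old, uint32_t cur)
{
    uint32_t diff = old ^ cur;
    for (int bit = 0; bit < 32; bit++)
        if (diff & (UINT32_C(1) << bit))
            return bit;
    return -1;
}

/* Assumed layout: each byte holds one bit per pin of the port. */
int bit_to_pin(int bit)
{
    assert(bit >= 0 && bit < 24);
    return bit % 8;
}

/* Assumed layout: 0 = level, 1 = edge select, 2 = both edges. */
int bit_to_field(int bit)
{
    assert(bit >= 0 && bit < 24);
    return bit / 8;
}
```

For the values above, 0x00888800 XOR 0x00808800 is 0x00080000, so the differing bit is bit 19. Under the assumed layout that is the both-edge bit of pin 3 (PO3), the same bit as in the kernel traces reported earlier in the thread; the bits for O7 read back unchanged.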

I still cannot fully explain what happens. Can you imagine a rational explanation given the effects we saw and the patch which solved it?

Thanks for your support,

Romain

Thanks for your help and recommendations.

You are very welcome.

We first tried on an Iris board with a stock Toradex BSP image and a kernel compiled at the same revision as ours, and could reproduce the error where the GPIO_INT_LVL register for port O was sometimes read incorrectly. I then updated to the latest BSP 2.7b2 kernel sources and the problem no longer appeared. Bisecting through the kernel commits, I isolated the fix to the commit “apalis/colibri_t30: initialise tps65911 GPIOs”.

The patch is not 100% clear to me and I’d like to better understand the changes to ensure that this commit definitely fixes our issues. Could you tell me a bit more about the TPS65911 GPIO configuration and how the TPS65911 GPIO pins are connected to the Tegra? From board-colibri_t30-power.c and the tps6591x.c driver, it seems that the GPIO2 pin of the TPS65911 is connected to a “VDD_CORE enable” pin of the T30.

No, that is the enable pin of the TPS62362 PMIC providing the VDD CORE voltage.

Do you think that without this configuration we may have had some inconsistencies in the powering of the GPIO controller in sleep, and so sometimes read inconsistent values?

So you are saying you do use sleep modes and such? Of course even misbehaving DVFS could cause all sorts of such instabilities.

As a quick test, you can run the attached gpio-test.c program, which programs the GPIO port O GPIO_INT_LVL register with the value 0x00888800 and then continuously reads this register.

I would really not recommend programming any GPIOs behind the Linux kernel’s back, as this could have severe side effects, especially together with DVFS and/or sleep modes.

We noticed that when we executed this program in the background and then launched a video playback with gst-launch, wrong values were read out of the register (0x00808800).

Hm, this could really point to misbehaving DVFS and/or idle/sleep mode. However, it could also just be some sort of locking issue between the Linux kernel and user space, given that rather ugly approach.

Steps to reproduce:

Colibri T30 IT module on Iris Carrier board

BSP 2.7b2

Kernel at commit before “apalis/colibri_t30: initialise tps65911 GPIOs”

On target configure GPIO pins O3 and O7 as GPIO inputs with GPIOTool

The GPIOTool is really not meant for anything other than some interactive peeking and poking. Any “real” configuration should be done in the Linux kernel itself.

Compile and run gpio-test.c

Play a video with gst-launch

$ ./gpio-test &
$ gst-launch filesrc -v location=/resources/Videos/turn_off_cycler.mp4 ! decodebin2 ! xvimagesink

Well, that one may be rather RAM-buffer bound, requiring careful locking.

Messages from gpio-test indicating read errors are displayed when the video is started:

Read register value changed:
old = 0x00888800
new = 0x00808800

I still cannot fully explain what happens. Can you imagine a rational explanation given the effects we saw and the patch which solved it?

To me, the whole test case really looks way too fishy. I suggest you try a clean use case for proper validation.