We found a problem when using the CAN-FD bus on a Verdin module mounted on a Dahlia carrier board.
When running candump on the Verdin board and receiving frames of different lengths, some bytes are occasionally “wrong” and look like a copy of another position in the frame.
Monitoring the CAN bus traffic with an IXXAT USB converter shows no wrong bytes.
There are no Error Frames generated on the bus.
We tried different bit rates, with and without BRS, and all of them show this behavior.
Any help fixing this issue is much appreciated.
Hi @swisstinu and Welcome to the Toradex Community!
Thanks for contacting Toradex Support.
Could you provide the exact version of the software you are using (output of uname -a)?
Have you made any changes to the kernel or device tree? If so, please share them.
Regarding your test setup, could you provide the commands used on the host and target sides?
I cloned the kernel from the toradex_5.4-2.1.x-imx branch at commit 1266d0110fce, applied the PREEMPT_RT patch 5.4.77-rt43, and built a fully preemptible (RT) kernel.
No modifications were done to the device-tree.
ip link set can0 type can bitrate 1000000 sample-point 0.75 dbitrate 1000000 dsample-point 0.75 fd on
On Verdin (Rx): candump can0 -e -d -x
On PC (Tx):
a script which loops:
cansend can0 123#00
cansend can0 123#0001
…
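The sender loop above can be written with a small frame generator instead of typing each line. This is a minimal sketch, assuming the pattern continues as in the exact script posted later in the thread (a start byte, then 01, 02, ...). `make_frame` and `send_loop` are hypothetical helper names, not part of can-utils; only `cansend` comes from can-utils, and with the `ID#data` syntax used here it sends classic CAN frames.

```shell
# Hypothetical helper: print the cansend argument for CAN ID 123 with a payload
# of LEN bytes (LEN >= 1): the start byte FIRST followed by 01, 02, ...
# (FIRST was 00 in the first test and 0C in the later one).
make_frame() {
    local first=$1 len=$2 data i
    data=$first
    i=1
    while [ "$i" -lt "$len" ]; do
        data="$data$(printf '%02X' "$i")"
        i=$((i + 1))
    done
    printf '123#%s\n' "$data"
}

# Hypothetical helper: send payload lengths 1..MAX on interface IFACE forever,
# like the loop on the PC side. Requires can-utils; stop it with Ctrl-C.
send_loop() {
    local iface=$1 first=$2 max=$3 len
    while :; do
        len=1
        while [ "$len" -le "$max" ]; do
            cansend "$iface" "$(make_frame "$first" "$len")"
            len=$((len + 1))
        done
    done
}
```

For example, `send_loop can0 00 5` sends the 1- to 5-byte frames starting with 00, and `send_loop can0 0C 5` sends the same frames as the later test script.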
Note: the same kernel, with the device tree for Apalis and Ixora, works without issues on an Apalis iMX8QM.
Thanks for the input. The Apalis iMX8QM uses the SoC’s FlexCAN controller, whereas the Verdin module uses an external SPI CAN controller. So the issue might be in that driver in combination with the real-time kernel. Let me try to reproduce it using an RT image.
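To confirm which CAN controller driver an interface is actually bound to, the standard sysfs `device/driver` symlink of the network interface can be read. This is a minimal sketch; `can_driver` is a hypothetical helper name, and the driver name it prints depends on the kernel (for example the FlexCAN driver on the Apalis iMX8QM versus the MCP2518FD SPI driver on the Verdin).

```shell
# Hypothetical helper: print the name of the kernel driver bound to network
# interface $1, using the sysfs symlink /sys/class/net/<iface>/device/driver.
can_driver() {
    local link="/sys/class/net/$1/device/driver"
    [ -e "$link" ] || { echo "no driver found for $1" >&2; return 1; }
    basename "$(readlink -f "$link")"
}
```

Run `can_driver can0` on each module to see which driver handles can0.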
I tested on my side with the software version Linux verdin-imx8mm 5.4.77-rt43-5.2.0-devel+git.1266d0110fce and I don’t see any wrong bytes. Using candump with different frame lengths, the received frames were padded with 00 at the end.
Could you install the following image on your side and check if you still see the issue?
TDX Wayland with XWayland RT 5.2.0-devel-20210124+build.200
As you recommended, I installed the TDX Wayland with XWayland RT 5.2.0-devel-20210124+build.200 image and ran candump via SSH.
But I can still see the issue: it occurs for a while, then disappears, and later reappears. The frame is not padded with 0; it looks more like a copy of the first byte.
can0 RX B - 123 [01] 00
can0 RX B - 123 [02] 00 01
can0 RX B - 123 [03] 00 01 00
can0 RX B - 123 [04] 00 01 02 00
can0 RX B - 123 [05] 00 01 02 03 04
can0 RX B - 123 [01] 00
can0 RX B - 123 [02] 00 01
can0 RX B - 123 [03] 00 01 02
can0 RX B - 123 [04] 00 01 02 03
can0 RX B - 123 [05] 00 01 02 03 04
Then I changed my script to use 0C as the first byte instead of 00, to show that the frame is not simply padded.
Then I receive output like this:
can0 RX B - 123 [02] 0C 01
can0 RX B - 123 [03] 0C 01 0C
can0 RX B - 123 [04] 0C 01 0C 01
can0 RX B - 123 [05] 0C 01 0C 01 02
can0 RX B - 123 [01] 0C
can0 RX B - 123 [02] 0C 01
can0 RX B - 123 [03] 0C 01 0C
can0 RX B - 123 [04] 0C 01 0C 01
can0 RX B - 123 [05] 0C 01 02 03 04
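Spotting corrupted frames by eye is tedious when the problem comes and goes. This is a minimal sketch of an offline checker, assuming the payload pattern of the test script (start byte, then 01, 02, 03, ...) and the candump `-x` line layout shown above; `check_candump` is a hypothetical helper name.

```shell
# Hypothetical checker: read candump -x output on stdin and print every frame
# whose payload does not match the test pattern (start byte, then 01 02 03 ...)
# or whose byte count differs from the [len] field.
check_candump() {
    awk '{
        # Locate the "[len]" field; the data bytes follow it.
        for (f = 1; f <= NF; f++) if ($f ~ /^\[[0-9]+]$/) break
        if (f > NF) next
        n = substr($f, 2, length($f) - 2) + 0
        bad = (NF - f != n)
        # Byte 0 is the start byte (00 or 0C); byte i (i >= 1) must equal i in hex.
        for (i = 1; f + 1 + i <= NF; i++)
            if ($(f + 1 + i) != sprintf("%02X", i)) bad = 1
        if (bad) print "CORRUPT: " $0
    }'
}
```

For example, log on the Verdin with `candump -x can0 > rx.log`, stop it after a while, and run `check_candump < rx.log` to list only the bad frames.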
Here is my exact test script:
#!/bin/bash
# Endlessly send frames with CAN ID 0x123 and payloads of 1 to 5 bytes
while :
do
    cansend can0 123#0C
    cansend can0 123#0C01
    cansend can0 123#0C0102
    cansend can0 123#0C010203
    cansend can0 123#0C01020304
done
If I use a fixed frame size, no data is corrupted:
can0 RX - - 123 [5] 0C 01 02 03 04
can0 RX - - 123 [5] 0C 01 02 03 04
can0 RX - - 123 [5] 0C 01 02 03 04
Could you maybe test again on your side using this script?
I thought you had issues with FlexCAN, but since you are seeing the same thing as I am, and @jaski.tx says it is an SPI controller, I checked and it is indeed the same MCP2518FD. I see similar behavior with this chip, and as in your case, “it can occur for some time and then hide and later occur again”. After several transfers with varying payload sizes, the driver enters a bad state and corrupts the received message payload. Usually the 3rd and 4th payload bytes get overwritten with the 1st and 2nd. Several similar transfers later, the driver returns to a good state and everything is fine; then it keeps alternating between bad and good.
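The overwrite pattern described above can be checked mechanically on a captured frame. This is a minimal sketch, assuming the expected payload is known (as with the test script); `explain_copies` is a hypothetical helper that reports, for each wrong byte, which earlier position of the expected payload its value matches. It only describes the symptom, not its root cause in the driver or chip.

```shell
# Hypothetical helper: compare an expected payload ($1) with a received one ($2),
# both as space-separated hex bytes. For each wrong byte, print "i<-j" when the
# byte at 0-based position i holds the value expected at an earlier position j,
# or "i<-?" if no earlier position matches.
explain_copies() {
    awk -v want="$1" -v got="$2" 'BEGIN {
        n = split(want, e, " ")
        split(got, r, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            if ((r[i] "") == (e[i] "")) continue
            src = "?"
            for (j = 1; j < i; j++)
                if ((r[i] "") == (e[j] "")) { src = j - 1; break }
            out = out (out == "" ? "" : " ") (i - 1) "<-" src
        }
        print out
    }'
}
```

For example, `explain_copies '0C 01 02 03' '0C 01 0C 01'` prints `2<-0 3<-1`: the 3rd and 4th bytes (0-based positions 2 and 3) hold the values of the 1st and 2nd.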
I’m going to integrate the MCP2518FD into our board, which uses a Colibri module. I have already verified with other tools that the messages on the bus are not damaged, so the MCP2518FD should receive them correctly.
I previously had working code on the VF61’s Cortex-M4 that talked to the CAN bus through the MCP2518FD, and I didn’t notice this issue there, though perhaps I missed it. It will take me some time to re-enable the MCP2518FD on the M4 so I can recheck whether the issue also happens on bare metal.
If someone confirms whether or not this is an MCP2518FD hardware issue before I do, please let the community know.
BTW, it doesn’t depend on whether messages are sent in FD format or with BRS on or off; the issue occurs even when receiving standard CAN messages.
There are two drivers for the MCP2518FD: one from Martin Sperl and another from Marc Kleine-Budde. The latest version of the first one has this issue, and the second one didn’t work for me: Marc’s version sends up to 4 messages, then complains that the buffer is full and doesn’t receive anything. Perhaps I missed something in the device tree or didn’t integrate it properly.
I was able to get rid of the issue. Using Marc Kleine-Budde’s mcp251xfd driver from 30 Sep 2020, backported to the Toradex 5.4.77 kernel, I can no longer reproduce the misbehavior with my script.
Could you please point me to where you got the driver sources? The versions of mcp251xfd I tried were unable to receive any messages and sent only up to 4 messages, then complained about a buffer shortage and didn’t send anything more until the interface was brought down and up again.
This version works on my side on the Verdin. However, if I stress the bus with many frames sent without delays, I get the warning “mcp251xfd spi2.0 can0: RX-0: MAB overflow detected”. Maybe this could be improved with different SPI settings.
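To compare SPI settings objectively, it helps to count the warnings under the same stress load. This is a minimal sketch, assuming the warning format quoted above; `mab_overflow_summary` is a hypothetical helper that reads kernel log lines on stdin and counts MAB overflow warnings per interface and RX FIFO.

```shell
# Hypothetical helper: read kernel log lines on stdin and count
# "<iface>: RX-<n>: MAB overflow detected" warnings per interface and RX FIFO.
mab_overflow_summary() {
    awk '/MAB overflow detected/ {
        ifc = "?"; fifo = "?"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^can[0-9]+:$/) ifc = substr($i, 1, length($i) - 1)
            if ($i ~ /^RX-[0-9]+:$/) fifo = substr($i, 1, length($i) - 1)
        }
        count[ifc " " fifo]++
    }
    END { for (k in count) print k, count[k] }'
}
```

Running `dmesg | mab_overflow_summary` after each stress run prints one line per interface and FIFO in the form `can0 RX-0 <count>`, which makes runs with different SPI settings easy to compare.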