CAN packets dropped under high bus load on Apalis iMX8 with Boot2Qt image and running GUI

Hello.
I am testing the Apalis iMX8 to evaluate it for our use case (an intensive care ventilator). One part of the testing is the CAN bus. While testing under full load on an error-free CAN bus at a bitrate of 1 Mbit/s, I noticed dropped CAN packets.

b2qt-apalis-imx8:/# ip -s link show can1
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
link/can 
RX: bytes  packets  errors  dropped overrun mcast   
0          16003669 0       120     0       0       
TX: bytes  packets  errors  dropped carrier collsns 
0          0        0       0       0       0 

The CAN bus load is generated by my TMDSRM48USB evaluation board running my own test program. That program generates zero-length CAN SFF frames with a period of 48…49 microseconds, and I noticed that the number of transmitted packets does not equal the number of packets received by the application on the Apalis iMX8. When I checked the statistics, I saw that packets are dropped at the driver level.
CAN packets are dropped only while the standard Boot2Qt demo is running.
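
For reference, the frame rate implied by these numbers can be estimated with a short calculation. This is a back-of-the-envelope sketch, assuming classic CAN framing with an 11-bit identifier and ignoring stuff bits (which make real frames slightly longer); the function names are made up for illustration.

```python
# Nominal bit count of a classic CAN data frame with an 11-bit identifier (SFF),
# excluding stuff bits: SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4
# + CRC 15 + CRC delimiter 1 + ACK 1 + ACK delimiter 1 + EOF 7 = 44 bits.
SFF_OVERHEAD_BITS = 44
INTERFRAME_BITS = 3  # mandatory idle bits between two frames


def sff_frame_time_us(data_len: int, bitrate: int) -> float:
    """Shortest frame-to-frame period in microseconds, ignoring stuff bits."""
    if not 0 <= data_len <= 8:
        raise ValueError("classic CAN carries 0..8 data bytes")
    bits = SFF_OVERHEAD_BITS + INTERFRAME_BITS + 8 * data_len
    return bits * 1_000_000 / bitrate


def frames_per_second(period_us: float) -> float:
    """Frame rate for a given frame-to-frame period in microseconds."""
    return 1_000_000 / period_us


if __name__ == "__main__":
    print(sff_frame_time_us(0, 1_000_000))  # zero-length frame at 1 Mbit/s
    print(round(frames_per_second(48)))     # rate at the observed 48 us period
```

At a 48 microsecond period this comes to roughly 20'800 frames per second, which is the rate the receiving side has to keep up with.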

What can I do about this? I don’t need real-time behavior, but a valid CAN stream shouldn’t lead to dropped packets.
As expected, #renice -20 [pid] didn’t change the situation.

Regards,
Vitaliy

It seems the problem is caused by the top/htop utilities, which I run in another terminal while testing.
When htop is not running, CAN packets are not dropped, even with the default NI=0 and the GUI running at Full HD resolution.

Regards, Vitaliy

Good to hear that your issue is solved. Could you update the hardware version of your module in Environment?

Thanks and best regards,
Jaski

Sorry, not solved.
Without htop, fewer CAN SFF frames are dropped than with htop, but drops still occur.

So I ask again: what should I do to eliminate the packet drops?

Hi @Vitaliy
Can you check what happens if you move your application to a Cortex-A72 core?

taskset -c 4,5 <your program>
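
The same pinning can also be done from inside a program. The following is a minimal sketch, assuming (as in the taskset command above) that CPUs 4 and 5 are the Cortex-A72 cores; choose_cpus and pin_current_process are hypothetical helper names, and the affinity calls are Linux-only.

```python
import os


def choose_cpus(wanted, available):
    """Return the wanted CPU indices that actually exist on this machine."""
    chosen = set(wanted) & set(available)
    if not chosen:
        raise ValueError(f"none of {sorted(wanted)} is available")
    return chosen


def pin_current_process(wanted=(4, 5)):
    """Restrict the calling process to the given cores (Linux only).

    The default (4, 5) is an assumption taken from the taskset example.
    """
    available = os.sched_getaffinity(0)   # CPUs the process may run on now
    cpus = choose_cpus(wanted, available)
    os.sched_setaffinity(0, cpus)         # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

Calling pin_current_process() early in the program has the same effect as starting it through taskset -c 4,5.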

Regards,
Stefan

Hello, Stefan.

After 3 minutes of running the functional test under full CAN bus load (I ran the executable on the Apalis iMX8 as you recommended):
sent 3’831’515 packets, received 3’831’394 packets, dropped 121 packets. The application was run from the serial console.

htop was not running; periodically (3 times in total) I ran #ip -s link show can1 to check for dropped packets. The check was done over Ethernet (ssh) in a separate terminal.

Another run: 62 of 1’037’601 sent packets were dropped; this time I ran both my application and the check from the serial terminal, without an Ethernet connection.

Regards, Vitaliy

Hi @Vitaliy

I overlooked that you’re using the RT extension. I therefore guess that the priority of the CAN interrupt is not high enough. Can you try to raise this priority with the chrt command?
https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
I recommend increasing it to 99 for a first test. You can then decrease it again later:

chrt -p 99 <pid>

You can check the priorities of all your processes with the following command:

ps -eo pid,cmd,rtprio

Regards,
Stefan

Hi, @stefan_e.tx
After following all your recommendations and a 2-minute run: total sent 3’743’132, dropped 37, successfully received 3’743’095.
I suppose the problem is not in my application but in the Linux kernel or the SocketCAN driver, and changing the priority of my application has no effect. In htop, I see a red bar for my application, not a green one. Here is a screenshot of htop.

Running #ping -f pid simultaneously from two terminals on my PC doesn’t cause any Ethernet packets to be dropped, so IMHO the Ethernet driver is correctly designed, in contrast to the CAN driver.

Sometimes I see much more serious problems: the overrun counter is not zero.

b2qt-apalis-imx8:~# ip -s link show can1
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
    link/can
    RX: bytes  packets  errors  dropped overrun mcast
    0          9987413  1       74      1       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
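
The drop rate in a counter dump like the one above can be computed with a few lines of Python. This is only a sketch: parse_ip_stats and rx_drop_ratio are hypothetical helper names, and the ratio assumes that the packets column does not already include the dropped frames.

```python
SAMPLE = """\
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
    link/can
    RX: bytes  packets  errors  dropped overrun mcast
    0          9987413  1       74      1       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
"""


def parse_ip_stats(text: str) -> dict:
    """Parse the RX/TX counter tables printed by `ip -s link show <dev>`."""
    stats = {}
    lines = [line.strip() for line in text.splitlines()]
    # Each header line ("RX: ..." or "TX: ...") is followed by its value line.
    for header, values in zip(lines, lines[1:]):
        if header.startswith(("RX:", "TX:")):
            direction, names = header[:2], header[3:].split()
            stats[direction] = dict(zip(names, map(int, values.split())))
    return stats


def rx_drop_ratio(stats: dict) -> float:
    """Dropped frames as a fraction of all frames seen (see assumption above)."""
    rx = stats["RX"]
    return rx["dropped"] / (rx["packets"] + rx["dropped"])


if __name__ == "__main__":
    print(f"{rx_drop_ratio(parse_ip_stats(SAMPLE)):.2e}")
```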

Regards,
Vitaliy

Hi @Vitaliy

No, I couldn’t see the first comment. We haven’t verified CAN on Linux RT, and NXP probably hasn’t either, so a bug is not impossible. However, I didn’t mean that you should increase the priority of your application. You should increase the priority of the CAN interrupt, because overruns like this can happen if that interrupt is delayed. Can you please post the output of the following command?

 ps -eo pid,cmd,rtprio

Regards,
Stefan

@stefan_e.tx
Here it is (see the linked attachment).

Hi @Vitaliy

I would definitely not set your application to priority 99, because it can now block other interrupts. However, I can’t see the CAN interrupt at all. This could be because something changed in the kernel that I’m not aware of, or because the interface was not up. I would expect an irq/60 task somewhere. Can you post what /proc/interrupts shows? Please make sure the interface is up and running when you run the commands.
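
Once the /proc/interrupts output is available, the CAN line can be picked out with a small script. This is a sketch, assuming the usual layout (a header row of CPU names, then one row per interrupt); find_irqs is a hypothetical helper, and the sample text including the device name in it is invented for illustration and does not come from this board.

```python
# Made-up sample in the usual /proc/interrupts layout (NOT from the board).
SAMPLE = """\
           CPU0       CPU1
  3:      12345       6789     GICv3  30 Level     arch_timer
 60:     513251          0     GICv3 270 Level     xxxxxxxx.can
Err:          0
"""


def find_irqs(interrupts_text: str, needle: str):
    """Return (irq, total_count, description) for rows mentioning needle."""
    lines = interrupts_text.splitlines()
    n_cpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    found = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts, desc = fields[:n_cpus], " ".join(fields[n_cpus:])
        if needle in desc and all(c.isdigit() for c in counts):
            found.append((label.strip(), sum(map(int, counts)), desc))
    return found


if __name__ == "__main__":
    for irq, total, desc in find_irqs(SAMPLE, "can"):
        print(irq, total, desc)
```

On the target, the text would come from reading /proc/interrupts while the interface is up and receiving.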

Regards,
Stefan

Hmm, I checked again on my system. Are you sure you’re using the RT extension? Did you compile the kernel yourself or what kind of image did you install? What is the output of:

zcat /proc/config.gz |grep PREEMPT

Hi, @stefan_e.tx
I can check the system (kernel configuration options) only on Tuesday.
As far as I remember, RT_PREEMPT appeared in the uname -a output and in the system name shown during boot.

Of course, can1 was up and receiving packets when I ran ps; otherwise my application can’t run because binding the CAN socket fails.

I use the image from meta-b2qt-embedded-qbsp-x86_64-apalis-imx8-5.14.1.qbsp downloaded from my Qt account page, and I did not compile anything.

Regards, Vitaliy

Hi @Vitaliy

What is the output of zcat /proc/config.gz |grep PREEMPT?

and I not compile anything.

You can compile the kernel yourself with the RT patch, as explained here.

Best regards,
Jaski

Hi, @jaski.tx
Here is the output:

b2qt-apalis-imx8:~$ zcat /proc/config.gz |grep PREEMPT
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
b2qt-apalis-imx8:~$ uname -a
Linux b2qt-apalis-imx8 4.14.117-0+ge43e3a26e1b7 #1 SMP PREEMPT Fri Nov 22 18:59:43 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
b2qt-apalis-imx8:~$

Hi @Vitaliy

Okay, you don’t use the RT extension, so the chrt steps are pointless. However, this makes it even stranger.
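
This conclusion can also be checked mechanically from the grep output. A minimal sketch, assuming that an RT-patched kernel would show CONFIG_PREEMPT_RT_FULL=y (4.x RT patch series) or CONFIG_PREEMPT_RT=y (newer kernels); preemption_model is a hypothetical helper name.

```python
# Output of `zcat /proc/config.gz | grep PREEMPT` as posted in this thread.
POSTED_CONFIG = """\
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
"""


def preemption_model(config_text: str) -> str:
    """Classify a kernel config as 'rt', 'preempt', 'voluntary' or 'none'."""
    enabled = {
        line.split("=", 1)[0]
        for line in config_text.splitlines()
        if line.endswith("=y") and not line.startswith("#")
    }
    # Check the RT options first: RT kernels may also set CONFIG_PREEMPT.
    if enabled & {"CONFIG_PREEMPT_RT", "CONFIG_PREEMPT_RT_FULL"}:
        return "rt"
    if "CONFIG_PREEMPT" in enabled:
        return "preempt"
    if "CONFIG_PREEMPT_VOLUNTARY" in enabled:
        return "voluntary"
    return "none"


if __name__ == "__main__":
    print(preemption_model(POSTED_CONFIG))
```

For the posted configuration this reports the ordinary preemptible kernel, not the RT variant, which matches the "SMP PREEMPT" string in the uname -a output.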

We have to try to reproduce your issue. Can you maybe share your test program? Or do you just run candump?

Regards,
Stefan

Hi, @stefan_e.tx

I don’t need to run any application (except the standard Boot2Qt demo, which starts automatically after the board is powered on) to reproduce the issue.
It is easily reproduced with the following steps:

  1. flash the standard Boot2Qt image from Qt Creator or TEZI
  2. $su
  3. #ip link set can0 up type can bitrate 1000000
  4. run a program on the other side to generate CAN traffic; you don’t need to run any user application on the Toradex Apalis iMX8
  5. run #ip -s link show can0 and see something like the output below:

#ip -s link show can0
2: can0:  mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          513251316 48      1115    48      0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0

For convenience, here is my code running on the Apalis iMX8 side.

If you want, I can attach my program running on the RM48USB side (generation of full CAN traffic with SFF frames).
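
The linked receiver program is not reproduced in this thread. Purely as an illustration (this is not the program referred to above), a frame counter on a raw SocketCAN socket could look like the sketch below. It assumes Linux and Python's built-in AF_CAN support; unpack_frame and count_frames are hypothetical names, and an interpreted receiver may itself fall behind at this frame rate.

```python
import socket
import struct
import time

CAN_FRAME_FMT = "=IB3x8s"                    # can_id, length, padding, data
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)
CAN_SFF_MASK = 0x7FF                         # 11-bit standard identifier


def unpack_frame(raw: bytes):
    """Split a raw classic CAN frame into (standard id, payload bytes)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return can_id & CAN_SFF_MASK, data[:length]


def count_frames(ifname: str, seconds: float) -> int:
    """Count frames received on ifname for roughly `seconds` seconds."""
    received = 0
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((ifname,))
        sock.settimeout(0.1)                 # so the deadline is checked regularly
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            try:
                raw = sock.recv(CAN_FRAME_SIZE)
            except socket.timeout:
                continue
            unpack_frame(raw)
            received += 1
    return received
```

Comparing the value returned by count_frames("can0", 180) with the sender's own counter gives the number of frames lost end to end.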

Here is the output of $cat /proc/interrupts.

Hi @stefan_e.tx

candump can’t be used to generate full CAN traffic.
There is another issue (discovered after my first tests).

Look at these plots. They show the distribution density of the time between frames when sending 1’000’000 CAN packets from the Apalis iMX8.
The first plot shows that the actual time delta between CAN SFF frames on the bus ranges from 54 to 1110 microseconds; with the SocketCAN implementation on the Apalis iMX8 it is impossible to generate traffic with the minimum possible delta of 47 microseconds.
(linked images: zoomed plot, full plot)

I have the .gnuplot script that generates these plots and the original text file with 1’000’000 records; each record is the time between two consecutive CAN packets. If needed, I can attach them.
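
A distribution like this can also be summarized without gnuplot. The following is a sketch only: it assumes the record file holds one time delta in microseconds per line, which is a guess about the file format, and load_deltas, delta_histogram and summarize are hypothetical names.

```python
from collections import Counter


def load_deltas(path):
    """Read one delta (microseconds) per non-empty line; format is assumed."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def delta_histogram(deltas_us, bin_width_us=10):
    """Map the lower edge of each bin to the number of deltas falling into it."""
    hist = Counter((int(d) // bin_width_us) * bin_width_us for d in deltas_us)
    return dict(sorted(hist.items()))


def summarize(deltas_us):
    """Minimum, maximum and mean of the deltas."""
    return {
        "min": min(deltas_us),
        "max": max(deltas_us),
        "mean": sum(deltas_us) / len(deltas_us),
    }


if __name__ == "__main__":
    sample = [54, 55, 61, 1110]  # illustrative values, not the measured data
    print(summarize(sample))
    print(delta_histogram(sample))
```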

Hi @Vitaliy

We need to reproduce this issue on our side; then we will get back to you.

Best regards,
Jaski

Hi @Vitaliy

I would ask for a bit more patience, since we have not yet had time to reproduce the issue.

Best regards,
Jaski