CAN performance problem with MCP25XXFD

Hi,

we are building Yocto Linux using the official Toradex meta layers for the mentioned platform. During first tests of the drivers/net/can/spi/mcp25xxfd CAN driver we saw some stability problems. After patching the driver to the latest version and applying 3 other bugfix patches, the driver runs stably. However, even under medium bus load the driver consumes a lot of CPU and loses messages on the Rx side.

In the test setup we use 2 CANs: the CAN from the SOM and a second CAN with the same CAN controller (MCP2518FD) on our carrier board. For the CAN load test we connect both CAN lines and terminate the bus with 120 Ohm. Then we start the “cansequence” tool, which sends an incrementing integer sequence that is checked on the receiver side. So both CAN lines are operated in Rx and Tx direction at the same time.

The attached image shows the CPU load of the involved processes. It sums up to 40% CPU load at only 55% bus load. In addition, many CAN messages get lost. The messages are transmitted correctly, but they get lost during Rx handling.
Have you seen such problems already when testing the CAN driver under load?
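
For reference, the receiver-side check amounts to looking for gaps in the received sequence numbers. Below is a minimal Python sketch of that idea, not the cansequence implementation itself. The function name count_lost and the one-byte counter wrapping at 256 are assumptions made for illustration.

```python
def count_lost(received, modulo=256):
    """Count frames missing from a stream of wrapping sequence numbers.

    `received` is an iterable of sequence numbers in arrival order.
    `modulo` is the assumed wrap-around point of the counter.
    """
    lost = 0
    expected = None  # unknown until the first frame arrives
    for seq in received:
        if expected is not None and seq != expected:
            # Distance from the expected number to the one that arrived,
            # taking wrap-around into account.
            lost += (seq - expected) % modulo
        expected = (seq + 1) % modulo
    return lost
```

A check like this cannot see a loss of an exact multiple of `modulo` frames, and a reordered frame would be counted as a large gap, so the result is best read as a lower-bound indicator.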

BR,
Harald

Hi Harald, did you see this thread?
https://www.toradex.com/community/questions/60859/mcp2517fdmcp2518fd-can-controller-driver-errors-on.html?childToView=61191#answer-61191
I think Toradex are evaluating internally at the moment.

Hi @edwaugh,

A high interrupt rate is a performance killer anyway. Whether it is a VF50 with no L2 cache or an iMX8, both will suffer a lot from a high interrupt rate. What is your message rate? Bus load doesn’t really matter, but message rate does: it is directly proportional to the interrupt rate.
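
To illustrate why the same bus load can mean very different message rates, here is a rough Python estimate. It assumes classic CAN frames with 11-bit identifiers, ignores stuff bits, and uses the nominal frame length of 47 + 8n bits (including the 3-bit interframe space) for n data bytes. The bit rate, bus load and payload sizes are example inputs.

```python
def frame_bits(payload_bytes: int) -> int:
    """Nominal bits on the wire for a classic CAN frame with an 11-bit ID.

    Includes the 3-bit interframe space, ignores stuff bits.
    """
    if not 0 <= payload_bytes <= 8:
        raise ValueError("classic CAN payload is 0..8 bytes")
    return 47 + 8 * payload_bytes


def frames_per_second(bitrate: float, bus_load: float, payload_bytes: int) -> float:
    """Approximate frame rate for a given bit rate and bus load (0.0..1.0)."""
    return bitrate * bus_load / frame_bits(payload_bytes)


if __name__ == "__main__":
    for n in (1, 8):
        rate = frames_per_second(1_000_000, 0.55, n)
        print(f"{n} data byte(s): about {rate:.0f} frames/s")
```

Under these assumptions, 55% load at 1 Mbit/s is roughly 5,000 frames per second with 8-byte payloads and roughly 10,000 with 1-byte payloads. If the controller raises about one interrupt per frame, short frames therefore cost about twice as much as long ones at the same bus load.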

I didn’t know about cansequence. The source code doesn’t look like it has many threads, but in your screenshot I see many cansequence instances. One for send, one for receive, and another two for what? What I learned in the past with FlexCAN is that for better CAN performance only one application should access the CAN socket; I had to recombine 3 different executables into a single one. So first try one machine acting as sender and another one as receiver.

Edward

Hi @harricane and Welcome to the Toradex Community!

Regarding your Setup, what is your communication speed?
Could you share some commands or scripts to do your tests?

Thanks and best regards,
Jaski

Hi Edward,

yes, a high interrupt rate is bad for performance, but it cannot be avoided in realtime systems. We have a system with a 1 kHz update rate where several messages are sent over CAN. We operate the 2 CAN lines at 1 Mbit/s.
Regarding test setup: both CANs are sending and receiving to/from each other.
The attached script is used to run the test. As you see cansequence is started for each line and for each direction (Rx/Tx). The frequency parameter of the tool is not available in the original version.
Your advice to try only one sender and one receiver scales down the problem quite linearly. This seems not to be a big issue. Most time is spent in driver work.

BR,
Harald


Hi Jaski,

I answered your question in my response to Edward. So far it looks like the overhead of Linux interrupt processing plus the processing of SPI transfers is quite costly. A more detailed analysis with tools like Kernelshark could show where most of the time is spent.

BR,
Harald

Hi @harricane,

Linux is not a real-time OS. But a 1 kHz CAN message rate is OK even for much slower Colibris than iMX8. BTW, how many cores do you have? 40% CPU usage is out of 100% * number of cores. For 2 cores it really means only 20% of total CPU usage, for 4 cores only 10%.

My advice was regarding a single CAN controller instance. SocketCAN allows many apps to share a single CAN controller. I meant that many CAN apps sharing a single CAN interface use more CPU than a single application providing the same functionality as those separate apps. That’s just my experience. All the separate apps were sending/receiving. One cansequence for Rx and another one for Tx is perhaps different, but the kernel still feeds both open sockets with Rx data…
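
As a sketch of what I mean by a single application, the Python below handles Rx and Tx on one raw SocketCAN socket in one process. It is only an illustration: the interface name "can0", the helper names and the `handle` callback are assumptions, and it covers classic CAN frames only (Linux only).

```python
import socket
import struct

# Layout of struct can_frame for classic CAN:
# can_id (u32), can_dlc (u8), 3 padding bytes, 8 data bytes.
CAN_FRAME = struct.Struct("=IB3x8s")


def pack_frame(can_id: int, data: bytes) -> bytes:
    """Build the 16-byte raw frame expected by a CAN_RAW socket."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return CAN_FRAME.pack(can_id, len(data), data.ljust(8, b"\x00"))


def unpack_frame(raw: bytes):
    """Split a raw frame into (can_id, data), trimming data to can_dlc."""
    can_id, dlc, data = CAN_FRAME.unpack(raw)
    return can_id, data[:dlc]


def open_can(ifname: str = "can0") -> socket.socket:
    """Open one raw CAN socket; "can0" is an assumed interface name."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.bind((ifname,))
    return sock


def serve(sock, handle, count: int) -> None:
    """Receive `count` frames and send any replies on the same socket.

    `handle(can_id, data)` returns a (can_id, data) tuple to send, or None.
    """
    for _ in range(count):
        can_id, data = unpack_frame(sock.recv(CAN_FRAME.size))
        reply = handle(can_id, data)
        if reply is not None:
            sock.send(pack_frame(*reply))
```

With one socket per interface the kernel has to deliver each received frame only once, instead of copying it to every process that has the interface open.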

“Most time is spent in driver work.”

How do you know? Sometimes top/htop may fool you. Of course the driver plays a central role, but CPU usage also depends on what is done in your apps. For example, sending each received CAN message immediately over TCP could double the interrupt rate…
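
One way to cross-check top is to sample the aggregate "cpu" line of /proc/stat twice and look at how the elapsed time splits between user, system, irq and softirq. The Python below is a minimal sketch of that. The function names are made up for illustration, and the field order follows the documented /proc/stat layout.

```python
# First eight counters of the aggregate "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(x) for x in parts[1:1 + len(FIELDS)])))


def shares(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time per category between two samples."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def read_cpu() -> dict:
    """Take one sample from /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return parse_cpu_line(f.readline())
```

Call read_cpu() before and after a fixed test interval and pass both samples to shares(). Note that work done in kernel threads, including threaded interrupt handlers, is accounted as system time rather than irq time, so the irq and softirq columns alone can understate driver cost.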

Edward

Hi @harricane,

I hope you wired your MCP25XXFD instances to different SPI ports?

Edward

Hi @Edward,

yes, we use two different SPI ports. We did the same CAN test on the i.MX8M Plus, which has an on-chip FlexCAN controller. There the CPU load shown in top for the driver kernel threads was significantly lower, i.e. ~5% per CAN line. This is not unexpected, since a lot of time is consumed by SPI processing. If top does not show wrong values for the kernel threads, then it gives a good indication of the problem. As said, a deep analysis has to be done with Kernelshark. However, I do not expect a lot of potential for performance boosts with the SPI CAN controller operated under Linux. I assume that with a realtime OS like SysBIOS or FreeRTOS the controller could be operated under all load conditions in a stable way.
Your comment that CPU usage increases with the number of open CAN sockets is true. But as far as I saw, the actual driver work is independent of the number of open sockets. Rx messages are pushed to the network stack from IRQ context and Tx messages are handled by the softirq kernel thread(s).
I was hoping that Toradex did already some CAN performance tests and have some ideas how to deal with the problem … if possible at all.

BR
Harald

Hi @Edward,

yes, we use two separate SPI ports. I wrote a more detailed answer but somehow it always gets deleted in this forum. Strange …

Harald

hi @harricane

Your answer was just in moderation; I have now released it.
Thanks for the information. We have done some tests, but not fully finished.
I will provide you some results this week.

Best regards,
Jaski

Hi @harricane

Yes, SocketCAN is costly. Yes, top gives an indication but does not always show the right numbers; user app behavior may significantly affect the numbers top/htop shows for kernel threads. Yes, SPI is costly. I believe SDMA is able to handle several SPI transfers, including hardware CS toggling in the right places, without CPU intervention. But it is not implemented. For eCSPI, which is able to queue up to 64 bytes, SDMA is practically never used with MCP25XXFD.

The mcp25xxfd driver performs significantly better than the mcp251xfd driver. Both are for the same chips, MCP2517FD/MCP2518FD. You may try it.

Compared to sub-100 MHz MCUs, which handle FlexCAN easily at top speeds, the Linux driver puts tons of load on much more capable CPUs. It looks unavoidable. Someone in this Community wondered how I run 1 Mbps CAN on Linux, since it is so impractical. That’s just to give an idea of whether anyone will bother much to speed up CAN on Linux. Yes, I do use 1 Mbps CAN; for best results I use an M4 helper. Without the M4, the mcp25xxfd driver meets my needs for a single CAN instance: it is capable of handling all 1 Mbps Rx CAN traffic on iMX7D, of course with well-made user space software.

Best Regards

Edward

Hi @Edward,

thanks for your feedback. This confirms our test results.

BR,
Harald

Hi @harricane,

@Edward described the different CAN peripherals very well. In my first tests, I had similar results. Do you need any support for your issue, or is it clear which way you will continue for your application?

Best regards,
Jaski

Hi @jaski.tx,

we are currently investigating options to enable SDMA for the SPI data transfers. As a second option we are checking whether we can use the M4 for the CAN processing.
For now we have no further question.

Thanks,
Harald

Hi @harricane

Thanks very much for your feedback. I also think that processing CAN messages on the M4 could be a solution. Regarding SDMA for SPI transfers, why do you think it is not working for iMX8MM?

Best regards,
Jaski

Hi @jaski.tx,

Regarding the SDMA on SPI topic: I just referred to the statement of @Edward, who said “… I believe SDMA is able to handle several SPI transfers involving hardware CS toggling in the right places without CPU intervention. But it is not implemented”. Can you confirm that SDMA for SPI works for the imx8m?

I have another question. Do you know an implementation of the mcp25xxfd device driver for the M4?

BR,
Harald

Regarding the SDMA, there are some NXP errata around SPI transfers as well, and not all are corrected yet. Currently we are avoiding the issue by using a GPIO chip select, which is obviously not the best option for performance.

“I have another question. Do you know an implementation of the mcp25xxfd device driver for the M4?”

No, we don’t have any implementation. But we are considering this option internally.

Best regards,
Jaski