Best interface for low-latency data transfer from external microcontroller to Linux?

m.sauer · May 4, 2020, 3:40pm

Hello,

we would like to transfer real-time critical measurement values from an external Cortex-M4 processor to the Linux user space program running on our Apalis iMX8QM. In the future, we plan to port the code from the external Cortex-M4 to the internal Cortex-M4 on the iMX8-SoC to maximize performance. But for the moment we don’t want to change the external system that is taking the actual measurements.

The external processor provides the measurement results at a rate of 1 to 4 kHz. Each measurement result is 8 bytes long and should be transferred to Linux user space as fast as possible.
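For orientation, here is a quick budget at the upper end of that range (4 kHz, 8-byte results). It assumes that each result should ideally reach user space before the next one is produced:

```latex
T_{\min} = \frac{1}{4\,\mathrm{kHz}} = 250\,\mu\mathrm{s}, \qquad
R_{\max} = 8\,\mathrm{B} \times 4000\,\mathrm{s}^{-1} = 32\,\mathrm{kB/s} = 256\,\mathrm{kbit/s}
```

The throughput is modest for either interface. The 250µs period is the tighter constraint, because it also has to absorb driver and scheduling delays.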

Could you advise which interface would give the best performance for this task? On the external Cortex-M4 we have SPI and UART available.

UART would be easier to implement on the Linux side, but I don’t know whether buffering in the TTY layer will degrade real-time performance. In terms of theoretical bandwidth, SPI would be slightly faster. However, on our current setup the setup time of an SPI transfer is longer than the 8-byte transfer itself.
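The sketch below puts the bandwidth comparison in numbers by computing the raw wire time of one 8-byte result. It assumes 8N1 UART framing (10 bits per byte) and ignores SPI chip-select setup and all driver overhead. The function names are hypothetical. Under these assumptions, 115200 baud needs about 694µs per result, which exceeds the 250µs period at 4 kHz. At 921600 baud a result takes about 87µs, and SPI at a hypothetical 10 MHz clock takes about 6.4µs.

```c
#include <assert.h>

/* Raw wire time in microseconds for `bytes` bytes on a UART, assuming 8N1
 * framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte. */
double uart_frame_us(unsigned bytes, unsigned baud)
{
    assert(baud > 0);
    return (double)bytes * 10.0 * 1e6 / (double)baud;
}

/* Raw wire time in microseconds for `bytes` bytes on SPI: one clock per bit,
 * ignoring chip-select setup time and all driver/DMA overhead. */
double spi_frame_us(unsigned bytes, unsigned sclk_hz)
{
    assert(sclk_hz > 0);
    return (double)bytes * 8.0 * 1e6 / (double)sclk_hz;
}
```

These are lower bounds only. The setup time mentioned above comes on top of them.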

Any recommendations would be highly appreciated.

Best Regards,
M.Sauer

m.sauer · May 7, 2020, 2:29pm

Dear @diego_b.tx,

thank you very much for your thoughts! So no delay from buffering layers is to be expected, no matter which interface we choose? I’m asking because we had trouble with delays when we tried to use RPMsg to communicate with the M4 on the iMX7 SoC: RPMsg linux driver and delays - Technical Support - Toradex Community

diego_b.tx · May 11, 2020, 8:50am

Dear @m.sauer

You’re welcome.
Well, there can be delays, but they are mostly independent of which communication interface you use. Keep in mind that Linux is not a real-time operating system, so you always have to expect delays. These depend on many factors, e.g. system load, the number of processes and interrupts, the architecture of the system and of the application, and more. The Linux kernel is a very complex system, and it’s not possible to give a general, accurate estimate of the expected delays. Therefore you have to evaluate your particular use case (ideally with a prototype) to see whether you can meet your requirements. If you can’t meet them right away, you can start optimizing your system.

You have the option to build our images with the PREEMPT_RT patch. It is designed to reduce latency and improve responsiveness, which makes Linux more suitable for desktop and real-time applications. Still, it doesn’t make Linux an RTOS; it only brings Linux closer to real-time behavior. The reference distro configuration can be found at layers/meta-toradex-distro/conf/distro/tdx-x11-rt.conf in our meta-layers.
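On the application side, PREEMPT_RT only helps if the receiving thread actually runs with a real-time scheduling policy and does not page-fault. The following is a minimal sketch. It assumes the process has CAP_SYS_NICE and CAP_IPC_LOCK (or runs as root). `make_realtime` is a hypothetical helper, and the priority is only an example value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: lock all current and future memory (no page faults
 * in the hot path), then switch the calling process to SCHED_FIFO at `prio`,
 * clamped to the valid range. Returns 0 on success, -1 on failure. */
int make_realtime(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    assert(lo > 0 && hi >= lo);
    if (prio < lo) prio = lo;
    if (prio > hi) prio = hi;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");            /* typically EPERM or ENOMEM */
        return -1;
    }
    struct sched_param sp = { .sched_priority = prio };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");  /* typically EPERM without privileges */
        return -1;
    }
    return 0;
}
```

Call it once at start-up, e.g. `make_realtime(80)`, before entering the receive loop.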

Best regards
Diego

m.sauer · May 11, 2020, 2:27pm

Dear @diego_b.tx,

thank you very much for the feedback! I know that Linux isn’t an RTOS, even with PREEMPT_RT enabled.

But I would expect the latency of a communication interface to be in the range of the worst-case latency measured by cyclictest. For a system like the iMX8, that’s typically below 100µs.
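For comparison with cyclictest, you can measure the same kind of number inside the application. The approach is to sleep until absolute deadlines and record how late each wake-up is. This is a simplified sketch of the idea behind cyclictest, not its actual implementation; the function name and period are assumptions. Run it under SCHED_FIFO on the PREEMPT_RT kernel to get meaningful worst-case values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by ns nanoseconds, normalising tv_nsec. */
static void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* a - b in nanoseconds. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/* Sleep until deadlines spaced period_ns apart and return the worst
 * observed wake-up latency (actual wake-up minus deadline) in ns. */
int64_t max_wakeup_latency_ns(long period_ns, int loops)
{
    struct timespec next, now;
    int64_t worst = 0;

    assert(period_ns > 0 && period_ns < NSEC_PER_SEC);
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        ts_add_ns(&next, period_ns);
        /* clock_nanosleep returns the error number directly */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t lat = ts_diff_ns(&now, &next);
        if (lat > worst)
            worst = lat;
    }
    return worst;
}
```

For example, `max_wakeup_latency_ns(250000, 100000)` exercises the 4 kHz period for 25 s.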

During our experiments with RPMsg, we encountered latencies of up to 20 ms, which is far higher than I would have expected. As far as I understand, buffering in the TTY layer caused this behaviour.

We haven’t found the time yet to test the fix to the char driver mentioned here, but I hope it improves the RPMsg latency.

Would you expect delays of >>100µs when using UART or SPI on the Cortex-A side to communicate with an external Cortex-M processor?

diego_b.tx · May 12, 2020, 7:59am

Dear @m.sauer

As already mentioned, the delays depend heavily on many factors, so it is not possible to make a general statement or estimate. In my opinion, a typical latency of 100µs is quite ambitious, and I am not sure you can really reach it, even with an optimized driver. Additionally, even if you reach such a low typical value, you still need to expect much higher values in worst cases from time to time. You may want to check this page, where we show the distribution of delays with different system settings.

The driver implementation has a huge impact on the delay, and I don’t know the details or any expected numbers for the UART or SPI driver. A driver may buffer data or handle events with different priorities. Therefore I really suggest that you start experimenting and optimizing, because that’s the only way to get real numbers and experience. You probably won’t get around looking deeper into the driver in use and making some customized optimizations. The same applies to the RPMsg driver. In the end, that’s the difference from a microcontroller, where you can even work at the clock level. A maximum latency of 100µs (including the worst case) on an external communication interface in Linux is probably not achievable, or at least requires considerable effort.

Best regards
Diego

m.sauer · May 12, 2020, 4:17pm

Dear @diego_b.tx,

thank you very much for sharing your thoughts on this topic! I’m aware that we might not achieve 100µs latency, but we’ll at least try to get as low as possible.

I agree that the driver in use will most likely limit the performance. However, I think this use case is not uncommon. I was hoping that someone had already tried something similar and could share their experience on whether the TTY driver or the SPI driver performs better with regard to worst-case latency.

The page you mentioned gives some rough estimates; on the iMX8 the worst-case latencies seem even better.

diego_b.tx · May 13, 2020, 6:30am

Dear @m.sauer

You are welcome.
You can also find some discussions about the same or at least a similar topic in the community, e.g.:
imx6/linux: Large and inconsistent latency when issuing an irq triggered spi read
Apalis i.MX6 UART latency
SPI and realtime

And I saw that you already started a new discussion here.

We would be happy to hear your experience once you’ve done some tests!

Best regards
Diego

m.sauer · May 13, 2020, 3:24pm

Dear @diego_b.tx,

thank you very much for the links, we’ll definitely have a thorough look at them. The other discussion you mention actually concerns communication with an IC over SPI. For connecting an external Cortex-M processor, we could use either SPI or UART, which is not possible when talking to the IC.

We’re currently investigating whether our system performs better when only toggling a GPIO using GPIOD, which would indicate that the SPI driver isn’t real-time capable.
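For reference, toggling a GPIO from user space goes through the GPIO character device. The sketch below calls the v1 character-device ABI from `<linux/gpio.h>` directly; libgpiod 1.x wraps this interface, and kernel 5.10 added a v2 ABI. The chip path, line offset and label are placeholders, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/gpio.h>

/* Request one line of a GPIO chip (e.g. "/dev/gpiochip0") as an output,
 * initially high. Returns the line-handle fd, or -1 on error. */
int gpio_request_output(const char *chip, unsigned offset, const char *label)
{
    assert(chip != NULL && label != NULL);
    int chip_fd = open(chip, O_RDWR | O_CLOEXEC);
    if (chip_fd < 0)
        return -1;

    struct gpiohandle_request req;
    memset(&req, 0, sizeof req);
    req.lineoffsets[0] = offset;
    req.lines = 1;
    req.flags = GPIOHANDLE_REQUEST_OUTPUT;
    req.default_values[0] = 1;   /* idle high, so the marker is a falling edge */
    strncpy(req.consumer_label, label, sizeof req.consumer_label - 1);

    int ret = ioctl(chip_fd, GPIO_GET_LINEHANDLE_IOCTL, &req);
    close(chip_fd);              /* the line handle fd stays valid on its own */
    return ret < 0 ? -1 : req.fd;
}

/* Drive the requested line to 0 or 1. Returns 0 on success, -1 on error. */
int gpio_set(int line_fd, int value)
{
    struct gpiohandle_data data;
    memset(&data, 0, sizeof data);
    data.values[0] = value ? 1 : 0;
    return ioctl(line_fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL, &data) < 0 ? -1 : 0;
}
```

To produce a scope-visible marker that does not involve the SPI driver, call `gpio_set(fd, 0)` at the point of interest and `gpio_set(fd, 1)` afterwards.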

jaski.tx · May 22, 2020, 6:53am

Hi

You are welcome.

We’re currently investigating whether our system performs better when only toggling a GPIO using GPIOD, which would indicate that the SPI driver isn’t real-time capable.

So you would write your own driver that emulates SPI communication over GPIOs, and you expect it to be faster and more stable than the hardware SPI block? This does not depend on the driver; Linux is not a real-time OS.

Best regards,
Jaski

diego_b.tx · May 6, 2020, 8:47am

Dear @m.sauer

Basically, you can use either one; at this data rate you won’t reach the limits of either protocol. With UART you may have to choose a higher baud rate.
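Configuring the UART for this kind of traffic mostly means switching the tty to raw mode. In raw mode the line discipline neither waits for a newline nor translates bytes. The following is a minimal sketch. It assumes fixed 8-byte frames and an already opened tty file descriptor (the device node depends on the module and device tree). `uart_configure` and `read_frame` are hypothetical helpers. A real protocol would also need a way to resynchronize on frame boundaries (e.g. a start byte), which is omitted here.

```c
#define _DEFAULT_SOURCE          /* for cfmakeraw() and B921600 on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <termios.h>
#include <unistd.h>

#define FRAME_LEN 8

/* Raw mode: no echo, no line editing, no CR/LF translation. read() is
 * satisfied once FRAME_LEN bytes are available (VMIN), with no inter-byte
 * timer (VTIME = 0). Returns 0 on success, -1 on error. */
int uart_configure(int fd, speed_t baud)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0)
        return -1;
    cfmakeraw(&tio);
    tio.c_cflag |= CLOCAL | CREAD;   /* ignore modem lines, enable receiver */
    tio.c_cc[VMIN] = FRAME_LEN;
    tio.c_cc[VTIME] = 0;
    if (cfsetispeed(&tio, baud) != 0 || cfsetospeed(&tio, baud) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Read exactly one frame, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or end of file. */
int read_frame(int fd, uint8_t frame[FRAME_LEN])
{
    size_t got = 0;
    assert(frame != NULL);
    while (got < FRAME_LEN) {
        ssize_t n = read(fd, frame + got, FRAME_LEN - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}
```

Typical use: open the device with `O_RDWR | O_NOCTTY`, call `uart_configure(fd, B921600)`, then call `read_frame()` in the receive loop.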

In general, it’s up to you. Personally, I would go for UART, because it’s a simple protocol that can also be debugged easily from another host. Additionally, you may be able to reuse more components later when porting to the Cortex-M4 of the SoC, since that communication (RPMsg) uses TTY as well. In any case, I highly recommend introducing a good abstraction that allows you to switch to another protocol easily.
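In C, such an abstraction can be a small table of function pointers. The application then only calls `transport_read()`, and the UART, SPI or later RPMsg back end can be swapped. This is a hypothetical sketch. The in-memory back end stands in for a real one and is mainly useful for testing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 8

/* Hypothetical transport interface: one function to fetch the next
 * 8-byte result, plus an opaque context for the back end. */
struct transport {
    int (*read_frame)(void *ctx, uint8_t frame[FRAME_LEN]);
    void *ctx;
};

static int transport_read(const struct transport *t, uint8_t frame[FRAME_LEN])
{
    assert(t != NULL && t->read_frame != NULL);
    return t->read_frame(t->ctx, frame);
}

/* Example back end: replays frames from a memory buffer. */
struct mem_source {
    const uint8_t *data;
    size_t len, pos;
};

static int mem_read_frame(void *ctx, uint8_t frame[FRAME_LEN])
{
    struct mem_source *s = ctx;
    if (s->len - s->pos < FRAME_LEN)
        return -1;                 /* no complete frame left */
    memcpy(frame, s->data + s->pos, FRAME_LEN);
    s->pos += FRAME_LEN;
    return 0;
}
```

A UART back end would wrap a `read()` loop on the tty file descriptor in the same `read_frame` signature.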

I hope this recommendation helps, even if it is pretty general.

Best regards
Diego

m.sauer · May 22, 2020, 9:58am

Dear @jaski.tx,

we’re not trying to write our own SPI driver. We’re only trying to show that the system’s RT performance is better when it only toggles a GPIO than when it communicates over SPI. This would indicate that there is room for improvement in the SPI communication, because the driver may not be written to be RT-aware.

Here are some first results:

CH1: falling edge after waiting a predefined time
CH2+3: Falling edge after SPI transfer is finished
CH4: _CS-line

Source-code and further information: [iMX8] SPI Real Time Performance - Technical Support - Toradex Community
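Readers who want to reproduce this kind of measurement from user space can time a single 8-byte full-duplex transfer through spidev as shown below. This is a sketch, not the code from the linked thread; the device node, SPI mode and clock rate are assumptions. The measured time includes the system call and driver overhead, and that overhead is where the jitter discussed here appears.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>
#include <linux/spi/spidev.h>

#define FRAME_LEN 8

/* Open a spidev node (e.g. "/dev/spidev0.0") and set mode and max clock.
 * Returns the fd or -1 on error. */
int spi_open(const char *dev, uint8_t mode, uint32_t speed_hz)
{
    assert(dev != NULL);
    int fd = open(dev, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ioctl(fd, SPI_IOC_WR_MODE, &mode) < 0 ||
        ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed_hz) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Run one 8-byte full-duplex transfer and return its duration in ns as seen
 * from user space (syscall + driver + wire time), or -1 on error. */
int64_t spi_xfer_timed(int fd, const uint8_t tx[FRAME_LEN], uint8_t rx[FRAME_LEN],
                       uint32_t speed_hz)
{
    struct spi_ioc_transfer xfer;
    struct timespec t0, t1;

    memset(&xfer, 0, sizeof xfer);
    xfer.tx_buf = (uintptr_t)tx;   /* spidev takes buffer addresses as __u64 */
    xfer.rx_buf = (uintptr_t)rx;
    xfer.len = FRAME_LEN;
    xfer.speed_hz = speed_hz;
    xfer.bits_per_word = 8;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int ret = ioctl(fd, SPI_IOC_MESSAGE(1), &xfer);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (ret < 0)
        return -1;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

Collecting many of these durations and comparing the spread with the pure wire time (6.4µs for 8 bytes at 10 MHz) shows how much of the variation comes from the software path.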

jaski.tx · May 22, 2020, 11:58am

Hi @m.sauer

This would indicate that there is room for improvement in the SPI communication, because the driver may not be written to be RT-aware.

Are you talking about delay or throughput of the data?

Real-time means that your system becomes deterministic, not super fast.

Please have a look here and here.

Best regards,
Jaski

m.sauer · May 22, 2020, 12:14pm

@jaski.tx,

Thank you very much for the links; I’m aware of the difference between latency and throughput. What bothers us at the moment is not that the SPI transfer takes too long (throughput), but that its jitter is more than 100% of the maximum SPI transfer time.

Our first priority is to reduce the jitter of the SPI transfer duration (falling edges on CH2 and CH3 above). This jitter is much larger than the duration of the two SPI transfers.

SPI transfer #1: time between first _CS LOW (=CH4) and the falling edge on CH3
SPI transfer #2: time between second _CS LOW (=CH4) and the falling edge on CH2
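With many such measurements (from the scope or from user-space timestamps), a simple summary makes the jitter comparable to the transfer time. The sketch below assumes peak-to-peak jitter (maximum minus minimum), which is one common definition. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct jitter_stats {
    int64_t min_ns, max_ns, mean_ns;
    int64_t jitter_ns;             /* peak-to-peak: max - min */
};

/* Summarise n transfer durations, e.g. _CS low to falling edge. */
struct jitter_stats jitter_summary(const int64_t *d, size_t n)
{
    struct jitter_stats s = {0, 0, 0, 0};
    int64_t sum = 0;

    assert(d != NULL && n > 0);
    s.min_ns = s.max_ns = d[0];
    for (size_t i = 0; i < n; i++) {
        if (d[i] < s.min_ns) s.min_ns = d[i];
        if (d[i] > s.max_ns) s.max_ns = d[i];
        sum += d[i];
    }
    s.mean_ns = sum / (int64_t)n;  /* integer mean, truncated */
    s.jitter_ns = s.max_ns - s.min_ns;
    return s;
}
```

Dividing `jitter_ns` by the transfer duration gives the percentage figure used above.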

jaski.tx · June 4, 2020, 8:17am

Hi @m.sauer

Did this help you?

Best regards,
Jaski

m.sauer · June 5, 2020, 8:05am

Unfortunately, not yet. See error report in the other thread.

jaski.tx · June 5, 2020, 11:39am

OK, thanks. Let’s wait for Stefan’s response.

Best regards,
Jaski