Slow tcpip comms and packet monitor on WCE7

Our Colibri T20 acting as a tcpip client receives data from a remote board which includes an ADC+MCU and a W5500 SPI to Ethernet bridge.

Although the MCU gets samples at the correct rate from its local ADC, the T20 gets the data at a lower rate and we lose samples.

It’s a 100 Mbps Ethernet link, and the required data flow is only <2 Mbps. Every packet the MCU sends arrives correctly at the T20, but fewer packets than expected are actually sent, as if the socket were overloaded.

Is there a packet sniffing tool to debug TCP/IP on WCE7+T20?

Could it be a delayed ACK situation caused by the T20? We are using a TcpClient in C# at the T20 end.

Note: We are using a USB Hub that channels all Ethernet traffic into the USB port of the Iris carrier board using ASIX chips. And we are debugging at the same time as receiving packets from the ADC. Possibly too much data through the USB port…

Dear @Henry

The USB as bottleneck is one possible explanation. I think we first should understand the structure of the data transferred from the MCU to the T20. For example if there are many TCP packets with very little actual data payload - such as one sample per TCP packet - the protocol overhead can get huge.
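To put rough numbers on that overhead, here is a minimal sketch in Python. It assumes standard Ethernet framing (8-byte preamble, 14-byte header, 4-byte FCS, 12-byte inter-frame gap, 46-byte minimum payload) and IPv4 and TCP headers of 20 bytes each with no options. The function names and the sample payload sizes are purely illustrative, and the ACK traffic flowing back is ignored.

```python
# Assumed per-frame costs on the wire (standard Ethernet, IPv4, TCP without options).
ETH_PREAMBLE = 8       # preamble + start-of-frame delimiter
ETH_HEADER = 14        # destination MAC, source MAC, EtherType
ETH_FCS = 4            # frame check sequence
ETH_IFG = 12           # mandatory inter-frame gap
ETH_MIN_PAYLOAD = 46   # shorter payloads are padded up to this size
IP_HEADER = 20
TCP_HEADER = 20


def wire_bytes(payload_len):
    """Bytes occupied on the wire by one TCP segment carrying payload_len bytes."""
    ip_packet = IP_HEADER + TCP_HEADER + payload_len
    eth_payload = max(ip_packet, ETH_MIN_PAYLOAD)
    return ETH_PREAMBLE + ETH_HEADER + eth_payload + ETH_FCS + ETH_IFG


def efficiency(payload_len):
    """Fraction of the wire time that carries application data."""
    return payload_len / wire_bytes(payload_len)


# Illustrative payload sizes only, not measured values.
for n in (4, 64, 1024):
    print(f"{n:5d} payload bytes -> {wire_bytes(n):5d} wire bytes, "
          f"{100 * efficiency(n):.1f}% efficient")
```

The per-packet cost also shows up as per-packet work in the network stack and drivers, which is usually the more important effect on an embedded target.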

I recommend sniffing the TCP traffic in order to understand where the bottleneck is. Unfortunately there is no such tool for WinCE.
It should be possible to use a (managed) Ethernet Switch in Hub mode, so you can attach a PC and sniff the data between MCU and T20 with Wireshark or a similar tool.

Or you can take the trial-and-error approach: Collect multiple samples on the MCU side before sending them, so that the data transferred at once is in the range of 1 kB. Does this help?

Regards, Andy

We don’t have access to a managed Eth switch but if push comes to shove we will need to get hold of one.

Packing more than 16 bytes before sending the data does not help. Packing 30 sets of 16 bytes in one packet or sending 16 bytes in each packet does not make any difference. The transmission always takes around 40% more time than expected for a given number of samples.

Yet we know that the MCU is getting samples at the right rate from the ADC.

We are starting to suspect that the ASIX bridge and USB Hub on a custom connection board could be the root cause. If we uncover the root cause we will post it here.

Dear @Henry

There’s currently nothing I can do on the Toradex side to help you.
One option is to create a small PC test application which sends dummy data to the Colibri module in place of the microcontroller. In this configuration it would be easy to use a Colibri Evaluation Board instead of your custom board, and to sniff the network traffic, e.g. with Wireshark.
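Such a PC-side test sender could look roughly like the following Python sketch. The function name, its parameters and the pacing scheme are hypothetical. Note also that TCP is a byte stream: disabling Nagle's algorithm with TCP_NODELAY makes it likely, but does not guarantee, that each send goes out as its own packet.

```python
import socket
import time


def send_dummy_packets(host, port, packet_len, packets_per_second, total_packets):
    """Send total_packets zero-filled blocks of packet_len bytes at a paced rate.

    Hypothetical helper for illustration only. Returns the number of blocks sent.
    """
    payload = bytes(packet_len)
    interval = 1.0 / packets_per_second
    sent = 0
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so small writes are not coalesced by the sender.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        deadline = time.perf_counter()
        for _ in range(total_packets):
            sock.sendall(payload)
            sent += 1
            # Pace against an absolute deadline so timing errors do not accumulate.
            deadline += interval
            delay = deadline - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
    return sent
```

It would be called with the Colibri's IP address and the port the receiving application listens on. The sleep granularity of a desktop OS is coarse, so at high packet rates the sender will emit short bursts rather than evenly spaced packets.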

Regards, Andy

Dear @andy.tx

Debugging our C# code while running Toradex Task Manager 2.0, we have noticed that both cores in our T20 are mostly idle until TcpClient.Connect(ip, port) is called.

Then one core reaches 99% CPU use and the other one is at 60%. This happens even while the code is stopped at a breakpoint just after the connection is made.

It makes “some” sense because the data acquisition board starts sending samples over the socket as soon as the socket connection is established.

But is it reasonable that the ASIX USB driver running in the background takes that much CPU time? The CPU usage is clearly linked to our loss of samples.

We are puzzled about this. Any suggestions?

Dear @Henry

I have no simple explanation for such a high CPU load. I tried to reproduce the situation by sending a large number of small ethernet packets from my PC to the Colibri, but I never reached such a high load.

I suggest the following tests:

  1. Make sure the issue is not related to the fact that you are debugging over the same Ethernet interface: Copy the executable (in release mode) to the Colibri and let it run. Disconnect any device from the Ethernet which is not absolutely required. Do you still see similar CPU loads?
  2. Please tell me more details about the Ethernet communication:
     • Is the dataflow uni-directional from the DAQ device to the Colibri?
     • Are you using TCP or UDP packets?
     • How many bytes are in a packet?
     • How many packets do you send per second?
  3. I did all my tests using a native C/C++ tool. Maybe you need to move parts of your code from C# to the native C/C++ domain in order to optimize performance.

Regards, Andy

Dear @andy.tx

Thanks for your message.

We have tried test 1 as suggested in your note. We generated a release version and ran it after disconnecting the USB connection (we are debugging via USB). The CPU usage is about the same (100% for one core and 60-70% for the other core).

The data flow is unidirectional. The DAQ device starts sending packets of 15 bytes as soon as the socket connection is created. The data rate is 240,000 bytes/second (16,000 packets of 15 bytes each, per second).
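As a quick sanity check of these figures, the short Python calculation below derives the payload bit rate and the time available per packet. The wire-rate estimate is an assumption: it supposes that every 15-byte packet travels as its own TCP segment with 78 bytes of Ethernet, IPv4 and TCP overhead (no header options), which has not been verified with a capture.

```python
PACKETS_PER_SECOND = 16_000
PAYLOAD_BYTES = 15

# Assumed overhead per segment: 8 preamble + 14 Ethernet header + 20 IPv4
# + 20 TCP + 4 FCS + 12 inter-frame gap. Not measured on the real link.
ASSUMED_OVERHEAD_BYTES = 78

payload_bytes_per_s = PACKETS_PER_SECOND * PAYLOAD_BYTES
payload_mbit_per_s = payload_bytes_per_s * 8 / 1e6
packet_interval_us = 1e6 / PACKETS_PER_SECOND
wire_mbit_per_s = (PACKETS_PER_SECOND
                   * (PAYLOAD_BYTES + ASSUMED_OVERHEAD_BYTES) * 8 / 1e6)

print(f"payload: {payload_bytes_per_s} bytes/s = {payload_mbit_per_s:.2f} Mbit/s")
print(f"time budget per packet: {packet_interval_us:.1f} us")
print(f"estimated wire rate: {wire_mbit_per_s:.2f} Mbit/s")
```

The payload works out to 1.92 Mbit/s, consistent with the "<2Mbps" stated at the start of the thread, but the receiver has only 62.5 microseconds to handle each packet, so the packet rate rather than the byte rate is the figure to watch.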

Could you paste your C/C++ code, or even better, post the VS2008 solution? We will then change the IP for the remote data generator and see what happens on our board.

Dear @Henry

I can’t send you the source code, but I can share the binary utilities I used (winsock.exe and Winsock_pc.exe).

The two utilities are functionally identical; the first runs on the Colibri T20, the second on a PC.
You can use the utilities as follows:

  1. On the Colibri, simply start winsock.exe without any parameters. It will listen to an incoming TCP connection on port 8123.
  2. Open the network dialog on the Colibri to find its IP address. In my case this is 10.0.1.30.
  3. On the PC, open a command window and enter
    Winsock_pc.exe 10.0.1.30 -n 16 -i 1 -l 15
    This will send 16 TCP packets every millisecond, each 15 bytes long.

This generates a CPU load of 4-5% on both cores in my setup.

Regards, Andy

Dear @andy.tx

Using winsock_pc as the dummy data generator on a PC and winsock on the Colibri as the data receiver, the CPU usage on the embedded module is perfect, around 5%.

Using our C# code on the Colibri and your dummy data generator on a PC, the CPU usage is still 5%. Good.

But with our data acquisition board as the data generator and our code receiving the data on the embedded module, the two CPU cores go up to 90% and 60%.

We have also observed the Mobile Device Center connection dropping, as well as the VS debugger crashing (they share the USB with the ACQ board).

Our guess is that the USB bus is probably flooded with samples even though flow control should have stopped the ACQ board from sending more data.

It would be convenient to be able to generate a log of TCPIP packets and post it for your feedback. Is there a convenient way to do it on WCE7 without generating a new OS image?

Dear @Henry

I’m afraid there is no way to log the TCP/IP packets on WEC7.
I recommend you make the network traffic available through an Ethernet hub; then you can use Wireshark or similar software on the PC to monitor the traffic.

Regards, Andy