Apalis iMX6D problem with transferring huge files through 100Mbit ethernet connection

Hello,
I have encountered a blocker while developing software for our client based on the Apalis iMX6D module. When sending huge files (>1.1GB) through scp from the Ixora development board to a host PC with a 100Mbit Ethernet card, the transfer is terribly slow or even stalls. Transfers in the opposite direction (from the host PC to the Apalis module) work fine at a constant 12MB/s. When we set the host card to 1Gbit or 10Mbit, the same configuration works well. We have verified our host PC connection with a different 100Mbit Ethernet device to prove that our test setup is correct.

When the transmitted packets are small, as in an SSH session for example, the problem does not disturb normal work, but TCP retransmissions can still be observed.

Wireshark shows multiple TCP retransmissions when sending a huge file.

Some information about the setup:

  • Apalis iMX6D V.1.1A 512MB and V1.1B 1GB
  • Baseboard: Ixora V1.0A or our custom baseboard with Micrel KSZ8895 (100Mbit) switch
  • Linux apalis-imx6 4.9.67-+g1db9f06709ad #1 SMP Sat Feb 3 16:19:58 CET 2018 armv7l GNU/Linux
  • Image built with Yocto, cloned from toradex-bsp-platform.git -b LinuxImageV2.8
  • Image recipe: console-tdx-image.bb
  • Host PC Ethernet Card: Intel® 82577LC Gigabit Ethernet PHY (100Mbit speed set up from userspace)
  • Also tested with: KSZ8895 (100Mbit switch)

We need to fix this issue because our baseboard is equipped with a KSZ8895 (100Mbit) switch. Temporarily we have decreased the Ethernet speed to 10Mbit for development, but for our target system that is not enough.

Has anyone encountered similar issues? Any help will be appreciated.

Update:
The problem is confirmed by Toradex:

We did some tests on our side. You are right, the 100Mbit full duplex is causing packet loss on Ethernet communication. But if you set the speed on the Toradex module to half duplex mode, then you get a good speed and no packet loss.

We have tested half duplex mode by setting it with:

ethtool -s eth0 speed 100 duplex half

Unfortunately, it does not help either.

Hi,
thank you for your answer. Indeed, the issue is connected with a duplex mismatch. I checked the kernel logs after connecting to a network manually set up for 100Mb full duplex, and the fec driver prints:

[66809.477282] fec 2188000.ethernet eth0: Link is Up - 100Mbps/Half - flow control off

Which is obviously wrong. When I try to force 100Mb full duplex on the Apalis with ethtool, and my host PC Ethernet card has auto-negotiation DISABLED and the link fixed to 100Mb full duplex, the fec driver prints:

[ 205.526106] fec 2188000.ethernet eth0: Link is Up - 10Mbps/Half - flow control off

Which is even worse…
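
Checking the negotiated link against the intended one can be automated. The following is a minimal Python sketch, not part of the original posts: it parses "Link is Up" lines in the format shown above and compares them with the expected speed and duplex. The function names and the regular expression are illustrative assumptions; only the two log lines are taken from this thread.

```python
import re

# Pattern for the "Link is Up" line format shown in the kernel log above.
LINK_RE = re.compile(
    r"(?P<iface>\S+): Link is Up - (?P<speed>\d+)(?P<unit>[MG])bps/(?P<duplex>Half|Full)"
)

def parse_link_line(line):
    """Return (iface, speed_mbps, duplex) or None if the line does not match."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    speed = int(m.group("speed")) * (1000 if m.group("unit") == "G" else 1)
    return m.group("iface"), speed, m.group("duplex").lower()

def matches_expected(line, speed_mbps, duplex):
    """True when the logged link matches the configuration that was intended."""
    parsed = parse_link_line(line)
    return parsed is not None and parsed[1:] == (speed_mbps, duplex)

# The two log lines quoted in this post.
LOG_LINES = [
    "[66809.477282] fec 2188000.ethernet eth0: Link is Up - 100Mbps/Half - flow control off",
    "[ 205.526106] fec 2188000.ethernet eth0: Link is Up - 10Mbps/Half - flow control off",
]

if __name__ == "__main__":
    for line in LOG_LINES:
        print(parse_link_line(line), "as intended:", matches_expected(line, 100, "full"))
```

Feeding it the output of dmesg after each link change gives a quick way to spot a link that came up in a mode other than the one configured.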

The setup that I found working for the Ixora board is: Apalis module fixed to a 100Mb full duplex link and host Ethernet card set up for auto-negotiation. Then everything works fine, as you described: 12MB/s in both directions.

Unfortunately, a similar setup still does not work for our baseboard configuration with the KSZ8895 switch. I am still trying different configurations of the KSZ8895 switch and the Apalis (system and device tree).

So if you say the problem occurs after some time, what is your use case? Is this data stored somewhere on the flash of the Apalis or on a mass storage device connected through USB?

I also noticed strange behaviour earlier, where packet losses appeared after some time, but now, after discovering the Ethernet link negotiation problems, I think that was just a random case and is probably not relevant.

Just for information: the data sent was stored on the eMMC of the Apalis module. To ensure that the problem is not storage related, we also tested an SSD drive connected to mSATA.
We also tested the connection speed with iperf3 and observed packet losses at 100Mbit.

If you connect a switch between your host and the Apalis module, then you should fix the speed on the switch to 100Mbit (if possible) and also fix the speed on the other devices in the network to 100Mbit. Additionally, flow control should be enabled.
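
On the Linux side, this advice could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: the interface name eth0 and the helper names are illustrative, and whether the fec driver accepts pause settings through ethtool -A on this image has not been verified in the thread. By default the script only prints the commands.

```python
import subprocess

def fixed_link_cmd(iface, speed_mbps, duplex):
    """Build the ethtool command that forces speed/duplex and disables autoneg."""
    return ["ethtool", "-s", iface, "speed", str(speed_mbps),
            "duplex", duplex, "autoneg", "off"]

def flow_control_cmd(iface, enabled=True):
    """Build the ethtool command that switches RX/TX pause frames on or off."""
    state = "on" if enabled else "off"
    return ["ethtool", "-A", iface, "rx", state, "tx", state]

def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # "eth0" is an assumption; adjust to the interface on the target.
    apply([fixed_link_cmd("eth0", 100, "full"), flow_control_cmd("eth0", True)])
```

Keeping the dry run as the default makes it easy to review the exact commands before they change a link that may be carrying the SSH session itself.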

We have tried to set up a fixed link speed of 100Mbit full duplex on the switch. Nevertheless, the fec kernel driver shows that the link speed is 10Mbit half duplex. Flow control is on. We have verified the switch setup on the other port, connected to a PC: the same setup gives the expected results there. The switch registers also have the expected values.

I suppose that the information about the link type is somehow lost between the KSZ9031 PHY and the iMX6. I know how to read PHY registers in U-Boot, but I still need to find out how to get the values in userspace while the system is running (our switch power supply is controlled by a GPIO pin, so the switch is turned off during the boot procedure).

You can use ethtool or mii for reading PHY registers. Ethtool is already installed in the Toradex images. If you want to use mii, you can compile and install the package.

Reading the registers properly was only possible with mii-tool. The PHY Control register (0x1f, bits 1f.5 and 1f.3) indicates 100BASE-TX full duplex, as expected.

 1000 796d 0022 1622 0501 45e1 0005 2001
 0000 0000 4000 0000 0000 4002 03ff 3000
 0000 00f4 0000 4841 4802 0000 0000 0200
 0000 0000 0000 0500 0000 0000 0000 0328

The only two things that are not clear for me are:

  • Register 13h – Digital PMA/PCS Status, bit 13.1 (value = 0), which indicates 100BASE-TX
    link status not OK (this does not seem to agree with register 0x1f)
  • Register 0h – Basic Control, bit 0.8 Duplex Mode. Its value is 0, which implies half duplex. That is also not consistent with status register 0x1f. However, I’m not sure whether this bit is valid when auto-negotiation is on. The KSZ9031 documentation does not explain it clearly.
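
The bits discussed in these two points can be decoded mechanically. The following is a minimal Python sketch, not part of the original post: it parses the dump shown above, decodes register 0 using the standard IEEE 802.3 clause 22 bit layout, and reads only the two bits of register 0x1f that the post names. The helper names are hypothetical. As far as the clause 22 definition goes, the manual speed and duplex bits in register 0 do not determine the link configuration while auto-negotiation is enabled (bit 0.12 set), which may explain why bit 0.8 disagrees with register 0x1f; this should be checked against the KSZ9031 datasheet before relying on it.

```python
# Register dump copied from the post above (32 registers, 16-bit hex words).
DUMP = """
1000 796d 0022 1622 0501 45e1 0005 2001
0000 0000 4000 0000 0000 4002 03ff 3000
0000 00f4 0000 4841 4802 0000 0000 0200
0000 0000 0000 0500 0000 0000 0000 0328
"""

def parse_dump(text):
    """Turn whitespace-separated hex words into a list indexed by register number."""
    return [int(word, 16) for word in text.split()]

def decode_bmcr(value):
    """Decode register 0 (Basic Control) per IEEE 802.3 clause 22."""
    # Speed selection: bit 6 is the MSB, bit 13 is the LSB.
    speed_code = (((value >> 6) & 1) << 1) | ((value >> 13) & 1)
    return {
        "autoneg_enabled": bool(value & (1 << 12)),
        "forced_speed_mbps": {0: 10, 1: 100, 2: 1000}.get(speed_code),
        "forced_full_duplex": bool(value & (1 << 8)),
    }

def decode_phy_control(value):
    """Decode only the two bits of register 0x1f that the post refers to."""
    return {
        "speed_100": bool(value & (1 << 5)),    # bit 1f.5
        "full_duplex": bool(value & (1 << 3)),  # bit 1f.3
    }

if __name__ == "__main__":
    regs = parse_dump(DUMP)
    print("reg 0x00:", decode_bmcr(regs[0x00]))
    print("reg 0x13 bit 1:", (regs[0x13] >> 1) & 1)
    print("reg 0x1f:", decode_phy_control(regs[0x1F]))
```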

With mii-tool I am not able to write PHY registers. I need to find another tool or modify the driver. Could you compare this register dump with a dump from your working configuration with 100Mbit full duplex?

Hi

I set up 100Mbit, full duplex and autoneg off on the Apalis iMX6 with this command:
ethtool -s eth0 speed 100 duplex full autoneg off

My register dump is the following:

2100 794d 0022 1622 0501 c5e1 000d 2001
0000 0000 4000 0000 0000 4002 03ff 3000
0000 00f4 0000 3879 4802 0000 0000 0200
0000 0000 0000 0500 0000 0000 0000 0328
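
The comparison requested earlier in the thread can be done mechanically. Below is a minimal Python sketch, assuming both dumps list registers 0x00 to 0x1f in order. CUSTOM_BOARD_DUMP is the dump posted from the custom baseboard and TORADEX_DUMP is the one above; both constant names and the helper names are hypothetical. Note that the two dumps were taken in different modes (auto-negotiation enabled versus forced 100Mbit full duplex), so differences in the standard control, status, link partner and auto-negotiation expansion registers (0, 1, 5, 6) are to be expected.

```python
CUSTOM_BOARD_DUMP = """
1000 796d 0022 1622 0501 45e1 0005 2001
0000 0000 4000 0000 0000 4002 03ff 3000
0000 00f4 0000 4841 4802 0000 0000 0200
0000 0000 0000 0500 0000 0000 0000 0328
"""

TORADEX_DUMP = """
2100 794d 0022 1622 0501 c5e1 000d 2001
0000 0000 4000 0000 0000 4002 03ff 3000
0000 00f4 0000 3879 4802 0000 0000 0200
0000 0000 0000 0500 0000 0000 0000 0328
"""

def parse_dump(text):
    """Turn whitespace-separated hex words into a list indexed by register number."""
    return [int(word, 16) for word in text.split()]

def diff_registers(a, b):
    """Return (register, value_a, value_b) for every register that differs."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

if __name__ == "__main__":
    changes = diff_registers(parse_dump(CUSTOM_BOARD_DUMP), parse_dump(TORADEX_DUMP))
    for reg, ours, theirs in changes:
        # Show changed bits as an XOR mask to make single-bit differences visible.
        print(f"reg 0x{reg:02x}: {ours:04x} -> {theirs:04x} (xor {ours ^ theirs:04x})")
```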

Hi,
I have reproduced your configuration and it works well when the Ixora is connected directly to the Ethernet port of my PC or through a commercial Ethernet switch (D-Link GO-SW-8G).

When I connect it through the KSZ8895 switch on our board, or try to use the Apalis module on our board, TX still gets stuck at 100Mbit FD/HD and Wireshark shows multiple retransmissions. We have checked the KSZ8895 switch separately with different devices, both native 100Mbit and 1Gbit manually set to 100Mbit FD/HD, and it worked properly… Only the setup iMX6->KSZ9031RNX(RGMII)->KSZ8895MQX(ETH) has a problem with transmission.

I have found a post on the NXP community where someone had the same problem:

The solution described there is:

Our design had not correctly connected the ENET_CRS_DV reset line from IMX-6q to the PHY. Once we fixed this, both tx/rx is working fine at 100BaseT now

Another interesting topic is:

I have found a fixup in the kernel sources named ksz9031rn_phy_fixup (mach-imx6q.c:104), which probably fixes the skew.
Now I am compiling the kernel with additional prints to see if this fixup is applied.

Both solutions suggest hardware-related problems with the KSZ9031 and iMX6 connection. Could you confirm whether ENET_CRS_DV is properly wired on the Apalis iMX6 boards and defined in the device tree (we don’t have access to the schematics), and also whether ksz9031rn_phy_fixup is used in the recent kernel?

I checked with a hardware engineer: the output ENET_CRS_DV is correctly wired. Can you send us a schematic of the wiring on your carrier board between Apalis iMX6 → KSZ9031RNX(RGMII)->KSZ8895MQX(ETH) through OTRS? Thanks.

We have similar problems with duplex mode under WEC2013 (Apalis iMX6Q V1.1A on a custom board).

We have to set the connection to 100 MBit half duplex via registry.
With autonegotiation full duplex is used and the transfer rate is below 100 kb/s.

Hi erwin, check the post below. This issue is connected to a duplex mismatch. If you fix the speed on both sides, you should get the speed you want without any packet loss.

Hi @sgh: Could you ask a new question with a description of the details needed for the issue? Thanks.

Hi

By new question, I meant starting a new thread. Could you do this, please? Thanks.

Hi

The issue seems to be connected to a duplex mismatch.
I tried the following:

  1. Direct host to target module connection, with a speed of 100MBit/s and full duplex set on both host and module. Transfer speed in both directions is 11.2MB/s and there is no packet loss.
  2. Host to target module connection through a 100Mbit switch, with auto-negotiation on host and target.
    Transfer speed in both directions is 11.2MB/s and there is no packet loss.

This is the result of the tests:
bigfile.tar.gz 100% 1496MB 11.2MB/s 02:14

Do you also lose ping packets? We have a very similar problem. Sending 100 pings/s, we see packet loss of anything from 0/4000 packets to several hundred/4000. Right off the production line, one device might work well while the next in line suffers from massive packet loss.

Did you find any solution?

@jaski.tx I’m unsure what you mean, but here is a more elaborate question.

Do you also lose ping packets?

We have the following system: a single PCB with an STM32 microcontroller connected via MII to a KSZ8895MQ switch.

I can ping the micro; it typically responds in around 720us. The speed is 100Mb/s, autoneg on.
In Linux I run this command to test:

$ sudo ping -c 120000 -i 0.003 

This gives me this result:

120000 packets transmitted, 119914 received, 0.0716667% packet loss, time 1357ms
rtt min/avg/max/mdev = 0.284/0.719/7.006/0.105 ms

So 86 packets lost.

Doing the same test at 10Mb/s autoneg on, gives this result:

120000 packets transmitted, 119983 received, 0.0141667% packet loss, time 860ms
rtt min/avg/max/mdev = 0.593/0.949/6.823/0.078 ms

So 17 packets lost.

Every test we have done indicates that the error rate is somewhat proportional to the link speed. Auto-negotiation on/off makes no difference.
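
The loss figures can be recomputed directly from the ping summary lines. The following is a minimal Python sketch, not part of the original post; the function name and the regular expression are illustrative assumptions, and only the two summary lines are taken from the results above.

```python
import re

# Matches the counters in a Linux ping summary line such as the ones above.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")

def lost_packets(summary):
    """Return (lost_count, loss_percent) computed from a ping summary line."""
    m = SUMMARY_RE.search(summary)
    if m is None:
        raise ValueError("not a ping summary line: %r" % summary)
    sent, received = int(m.group(1)), int(m.group(2))
    lost = sent - received
    return lost, 100.0 * lost / sent

# Summary lines quoted in this post (100Mb/s run first, then 10Mb/s).
RUN_100M = "120000 packets transmitted, 119914 received, 0.0716667% packet loss, time 1357ms"
RUN_10M = "120000 packets transmitted, 119983 received, 0.0141667% packet loss, time 860ms"

if __name__ == "__main__":
    for label, line in (("100Mb/s", RUN_100M), ("10Mb/s", RUN_10M)):
        lost, percent = lost_packets(line)
        print(f"{label}: {lost} lost ({percent:.7f}%)")
```

Deriving the count from the two counters avoids transcription slips and makes it easy to compare runs across several boards.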

Now you would say that the STM32-micro could be the problem. I said that too. But it turns out that doing the same ping-test between two PCs connected to the switch gives the same result. Connecting the PCs directly using a crossover cable results in no packet loss.

To make matters worse, grabbing 2 consecutive PCBs off the production line and testing them sometimes gives much worse results: several hundred or thousand packets lost.

Yup - will do that. Recently we have had some kind of progress. I will post a new thread when our current investigations are done tomorrow.

Perfect. Thanks very much for your Input.

@mateusz28
How is the KSZ8895 clocked? Does it have a dedicated crystal?