Hello,
I have encountered a blocker while developing software for our client based on the Apalis iMX6D module. When sending huge files (>1.1GB) through scp from the Ixora development board to a host PC with a 100Mbit Ethernet card, the transfer is terribly slow or even stalls. Transfers in the opposite direction (from host PC to Apalis module) work fine at a constant 12MB/s. When we set the host card to 1Gbit or 10Mbit, the same configuration works well. We have verified our host PC connection with a different 100Mbit Ethernet device to prove that our test setup is correct.
When the transmitted packets are small, for example in an SSH session, the problem does not disturb normal work, but TCP retransmissions can be seen as well.
Wireshark shows multiple TCP retransmissions when sending a huge file:
Host PC Ethernet Card: Intel® 82577LC Gigabit Ethernet PHY (100Mbit speed set up from userspace)
Also tested with: KSZ8895 (100Mbit switch)
We need to fix this issue because our baseboard is equipped with a KSZ8895 (100Mbit) switch. As a temporary measure we reduced the Ethernet speed to 10Mbit for development, but that is not enough for our target system.
Has anyone encountered similar issues? Any help will be appreciated.
We did some tests on our side. You are right, 100Mbit full duplex is causing packet loss on the Ethernet communication. But if you set the speed on the Toradex module to half duplex mode, then you get good speed and no packet loss.
We have tested half duplex mode by setting it with:
Hi,
thank you for your answer. Indeed, the issue is connected with a duplex mismatch. I checked the kernel logs after connecting to a network manually set to 100Mb full duplex, and the fec driver prints:
[66809.477282] fec 2188000.ethernet eth0: Link is Up - 100Mbps/Half - flow control off
Which is obviously wrong. When I try to force 100Mb full duplex on the Apalis with ethtool while my host PC Ethernet card has auto-negotiation DISABLED and its link fixed at 100Mb full duplex, the fec driver prints:
[ 205.526106] fec 2188000.ethernet eth0: Link is Up - 10Mbps/Half - flow control off
Which is even worse…
The setup that I found working for the Ixora board is: Apalis module fixed at a 100Mb full duplex link and host Ethernet card set to auto-negotiation. Then everything works fine as you described: 12MB/s in both directions.
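For reference, a minimal sketch of how such a fixed link is commonly configured with ethtool. The interface name eth0 is taken from the kernel log above; the helper name force_link_cmd is made up for illustration, and the exact commands used in this thread are not shown, so treat the details as assumptions. Note that a port left on auto-negotiation facing a partner with auto-negotiation disabled can usually still detect the speed but typically falls back to half duplex, which would match the 100Mbps/Half log line above.

```shell
#!/bin/sh
# Sketch: pin the module's link to 100 Mbit/s full duplex.
# IFACE defaults to eth0 (assumption taken from the kernel log lines).
IFACE="${IFACE:-eth0}"

# Build the ethtool invocation as a string so it can be reviewed before running.
# force_link_cmd is a hypothetical helper, not part of ethtool.
force_link_cmd() {
    # $1 = interface, $2 = speed in Mbit/s, $3 = duplex (full|half)
    printf 'ethtool -s %s speed %s duplex %s autoneg off\n' "$1" "$2" "$3"
}

# Print the command for review only; nothing is changed on the system here.
force_link_cmd "$IFACE" 100 full
```

Running the printed command requires root. Afterwards, `ethtool eth0` reports the resulting Speed, Duplex and Auto-negotiation state, which can be compared with what the fec driver logs.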
Unfortunately, a similar setup still does not work for our baseboard configuration with the KSZ8895 switch. I am still trying different configurations for the KSZ8895 switch and the Apalis (system and device tree).
So if you say the problem occurs after some time, what is your use case? Is this data stored somewhere on the flash of the Apalis or on a mass storage device connected through USB?
I also noticed strange behaviour earlier, where packet losses appeared only after some time, but now that I have discovered the Ethernet link negotiation problems I think that was just a random case and probably not relevant.
Just for information: the data sent was stored on the eMMC memory of the Apalis module. To make sure the problem is not storage related, we also tested an SSD drive connected to mSATA.
We also tested the connection speed with iperf3 and observed packet losses at 100Mbit.
If you connect a switch between your host and the Apalis module, then you should fix the speed on the switch to 100Mbit (if possible) and also fix the speed on the other devices in the network to 100Mbit. Additionally, flow control should be enabled.
We have tried to set a fixed link speed of 100Mbit full duplex on the switch. However, the fec kernel driver still reports that the link is 10Mbit half duplex. Flow control is on. We have verified the switch setup on another port connected to a PC, where the same setup gives the expected results. The switch registers also hold the expected values.
I suppose that the information about the link type is somehow lost between the KSZ9031 PHY and the iMX6. I know how to read PHY registers in U-Boot, but I still need to find out how to get the values in userspace while the system is running (our switch power supply is controlled by a GPIO pin, so the switch is turned off during the boot procedure).
You can use ethtool or mii-tool for reading PHY registers. Ethtool is already installed in the Toradex images. If you want to use mii-tool, you can compile and install the package.
The only two things that are not clear to me are:
Register 13h – Digital PMA/PCS Status, bit 13.1 (value = 0), which indicates 100BASE-TX Link Status Not OK (this does not agree with register 0x1f?)
Register 0h – Basic Control, bit 0.8 Duplex Mode. Its value is 0, which implies Half Duplex. This is also inconsistent with status register 0x1f. However, I’m not sure if this bit is valid when auto-negotiation is on. The KSZ9031 documentation does not explain it clearly.
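As a sketch of how the Basic Control register (register 0h) can be decoded by hand: the bit positions below follow the standard IEEE 802.3 Clause 22 layout (bit 12 auto-negotiation enable, bit 8 duplex mode, bits 6 and 13 speed select). The function name decode_bmcr and the sample values are illustrative only and are not taken from the register dump in this thread. In that standard layout the duplex and speed bits are control bits that select the forced mode, so with auto-negotiation enabled they need not match the negotiated result; the KSZ9031 datasheet should be checked to confirm this for the PHY.

```shell
#!/bin/sh
# Decode the IEEE 802.3 Clause 22 Basic Control register (register 0).
# decode_bmcr is a hypothetical helper; pass the raw value, e.g. 0x2100.
decode_bmcr() {
    reg=$(( $1 ))
    # Bit 12: auto-negotiation enable
    if [ $(( (reg >> 12) & 1 )) -eq 1 ]; then autoneg=on; else autoneg=off; fi
    # Bit 8: duplex mode (1 = full, 0 = half)
    if [ $(( (reg >> 8) & 1 )) -eq 1 ]; then duplex=full; else duplex=half; fi
    # Bits 6 (MSB) and 13 (LSB): forced speed selection
    case "$(( (reg >> 6) & 1 ))$(( (reg >> 13) & 1 ))" in
        00) speed=10 ;;
        01) speed=100 ;;
        10) speed=1000 ;;
        *)  speed=reserved ;;
    esac
    printf 'autoneg=%s speed=%s duplex=%s\n' "$autoneg" "$speed" "$duplex"
}

decode_bmcr 0x2100   # sample: forced 100 Mbit/s, full duplex
decode_bmcr 0x1040   # sample: auto-negotiation on, duplex bit cleared
```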
With mii-tool I am not able to write PHY registers. I need to find another tool or modify the driver. Could you compare this register dump to a dump from your working configuration with 100Mbit full duplex?
Hi,
I have reproduced your configuration and it works well when the Ixora Eval is connected directly to the Ethernet port of my PC or through a commercial Ethernet switch (D-Link GO-SW-8G).
When I connect it through the KSZ8895 switch on our board, or use the Apalis module mounted on our board, TX still gets stuck at 100Mbit FD/HD and Wireshark shows multiple retransmissions. We have checked the KSZ8895 switch separately with different devices, both native 100Mbit and 1Gbit manually set to 100Mbit FD/HD, and it worked properly… Only the setup iMX6->KSZ9031RNX(RGMII)->KSZ8895MQX(ETH) has a problem with transmission.
I have found a post on the NXP community where someone had the same problem:
The solution described there is:
Our design had not correctly connected the ENET_CRS_DV reset line from IMX-6q to the PHY. Once we fixed this, both tx/rx is working fine at 100BaseT now
Another interesting topic is:
I have found a fix in the kernel sources named ksz9031rn_phy_fixup (mach-imx6q.c:104) which probably adjusts the RGMII skew settings.
Now I am compiling the kernel with additional prints to see if this fix is applied.
Both solutions suggest hardware-related problems with the connection between the KSZ9031 and the iMX6. Could you confirm whether ENET_CRS_DV is properly wired on the Apalis iMX6 boards and defined in the device tree (we don’t have access to the schematics), and also whether ksz9031rn_phy_fixup is used in the recent kernel?
I checked with a hardware engineer: the output ENET_CRS_DV is correctly wired. Can you send us a schematic of the wiring on your carrier board between Apalis iMX6->KSZ9031RNX(RGMII)->KSZ8895MQX(ETH) through OTRS? Thanks.
Hi erwin, check the post below. This issue is connected to a duplex mismatch. If you fix the speed on both sides, you should get the speed you want without any packet loss.
The issue seems to be connected to a duplex mismatch.
I tried the following:
Direct host to target module connection, with speed set to 100Mbit/s and full duplex on both host and module. Transfer speed in both directions is 11.2MB/s and there is no packet loss.
Host to target module connection through a 100Mbit switch, with auto-negotiation on host and target. Transfer speed in both directions is 11.2MB/s and there is no packet loss.
This is the result of the tests:
bigfile.tar.gz 100% 1496MB 11.2MB/s 02:14
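As a quick sanity check on that scp line, the average rate is simply the size divided by the elapsed time. A small sketch (the helper name avg_rate is made up for illustration):

```shell
#!/bin/sh
# Average transfer rate in MB/s from a size in MB and a duration in seconds.
# avg_rate is a hypothetical helper name.
avg_rate() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# 1496MB in 02:14 (2*60 + 14 = 134 seconds), as in the scp output above
avg_rate 1496 134
```

The result agrees with the 11.2MB/s that scp reports, which is close to what a 100Mbit/s link can carry (12.5MB/s raw, before protocol overhead).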
Do you also lose ping packets? We have a very similar problem. Sending 100 pings/s, we see packet loss of anything from 0/4000 packets to several hundred/4000. Right off the production line, one device might work well while the next in line suffers from massive packet loss.
@jaski.tx I’m unsure what you mean, but here is a more elaborate question.
Do you also lose ping packets?
We have the following system: a single PCB with an STM32 microcontroller connected via MII to a KSZ8895MQ switch.
I can ping the micro; it typically responds in around 720us. The speed is 100Mb/s with autoneg on.
In Linux I run this command to test:
$ sudo ping -c 120000 -i 0.003
This gives me this result:
120000 packets transmitted, 119914 received, 0.0716667% packet loss, time 1357ms
rtt min/avg/max/mdev = 0.284/0.719/7.006/0.105 ms
So 86 packets lost.
Doing the same test at 10Mb/s autoneg on, gives this result:
120000 packets transmitted, 119983 received, 0.0141667% packet loss, time 860ms
rtt min/avg/max/mdev = 0.593/0.949/6.823/0.078 ms
So 17 packets lost.
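The loss figures can be recomputed from the transmitted and received counts in the two ping summaries above. A minimal sketch (ping_loss is a made-up helper name):

```shell
#!/bin/sh
# Lost packets and loss percentage from ping's transmitted/received counts.
# ping_loss is a hypothetical helper name.
ping_loss() {
    awk -v tx="$1" -v rx="$2" \
        'BEGIN { printf "%d lost (%.4f%%)\n", tx - rx, (tx - rx) * 100 / tx }'
}

ping_loss 120000 119914   # counts from the 100Mb/s run
ping_loss 120000 119983   # counts from the 10Mb/s run
```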
Every test we have done indicates that the error rate is somewhat proportional to the link speed. Auto-negotiation on/off makes no difference.
Now you would say that the STM32 micro could be the problem. I said that too. But it turns out that doing the same ping test between two PCs connected to the switch gives the same result. Connecting the PCs directly using a crossover cable results in no packet loss.
To make matters worse, grabbing 2 consecutive PCBs off the production line and testing them sometimes gives much worse results: several hundred or thousand packets lost.