[lime-dev] MediaTek routers, ethernet and VLAN 802.1ad

Ilario Gelmetti iochesonome at gmail.com
Wed Aug 7 18:30:41 UTC 2019


Dear all,
I was testing LibreMesh (together with Gio and SAn, lime-packages master
branch compiled on top of OpenWrt's openwrt-18.06 branch) on a
MediaTek-based router: YouHua WR1200JS.
Everything works fine apart from the routing on cabled connections.
It seems that these routers do not like VLANs of type 802.1ad on cable.
It could be an OpenWrt bug or a bug on the device.
Can anyone check and confirm on other MediaTek devices, please?

Here is a list of what I tested:

* setting the routing protocols to run on 802.1q interfaces rather than
on 802.1ad ones: the routing protocols see each other via cable and it
works well. Two identically configured routers see each other as
neighbours via eth0-1_17 in Babeld (queried with "echo dump | nc ::1
30003"). We usually don't do this as it gave problems with TP-Link
routers; it can be done by giving a third argument in /etc/config/lime,
like "list protocols babeld:17:8021q".

* listening with Wireshark on the laptop, I receive broken IPv6
multicast packets from the cable. They are correctly marked as VLAN
802.1ad ID 17 but the rest of the packet content is Error/Malformed.

* creating an 802.1ad interface on my laptop (e.g. "ip link add link
enp0s25 name enp0s25.17 type vlan proto 802.1ad id 17; ip link set
enp0s25.17 up"), adding a /24 IP address on both sides and pinging from
the router to the laptop: my laptop receives the router's ARP requests
and answers, but the router keeps asking as if it did not receive the
answer.

* while pinging from the laptop (10.2.1.2) to the router (10.2.1.1) on
the newly created tagged cabled interface, I connected to the router via
wifi and ssh and ran tcpdump on it:
** running it on eth0 shows that my ARP requests physically reach the
router and are properly tagged ("tcpdump -i eth0 -nn -e vlan"):
21:03:45.354344 54:ee:75:7a:c2:1f > ff:ff:ff:ff:ff:ff, ethertype
802.1Q-QinQ (0x88a8), length 64: vlan 1, p 0, ethertype 802.1Q-QinQ,
vlan 17, p 0, ethertype ARP, Request who-has 10.2.1.1 tell 10.2.1.2,
length 42
** running it on eth0-1_17 shows broken UDP packets (the same Malformed
IPv6 multicast packets I received with Wireshark) which are likely
generated by Babeld, BUT NO ARP request at all:
21:05:45.395359 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce5!] UDP, length 81
21:05:49.255355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e9!] UDP, length 12
21:05:53.225372 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e8!] UDP, length 12
21:05:57.385373 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e7!] UDP, length 12
21:06:01.245355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce1!] UDP, length 81

* flashing the YouHua router with OpenWrt 18.06.4 as downloaded from the
OpenWrt website and creating the 802.1ad interface using the ip command
(installing the ip-full package, "ip link add link eth0.1 name eth0-1_17
type vlan proto 802.1ad id 17; ip link set eth0-1_17 up; ip address add
10.2.1.1/24 dev eth0-1_17"): it still does not ping (neither my laptop's
ARP requests nor its ARP answers get to eth0-1_17)

* on the same clean router, using nping I sent a raw ethernet packet on
the eth0-1_17 interface (using the command "nping --send-eth
--source-mac ff:ff:ff:ff:ff:ff --dest-mac ff:ff:ff:ff:ff:ff --data
aaaabbbbccccddddeeeeffffffffeeeeddddccccbbbbaaaa -e eth0-1_17 -N
8.8.8.8") and captured it on the laptop.
What I got is broken (notice that instead of "aa aa bb bb cc cc" on the
second line, I have "aa aa 0e 9c cc cc").
This is what I captured on enp0s25 (plain ethernet):
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 08 00 4c 14 ab ea 00 01 aa aa 0e 9c cc cc
0020   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0030   aa aa d6 5f 25 ff fe eb 7e ac ae 2c 00 16 b7 e6
0040   4a c6 4f ee f2 fa

And this is what I captured on enp0s25.17 (VLAN 802.1ad ID 17 interface):
0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 08 00
0010   2c 48 cb b2 00 05 aa aa 9a 9a cc cc dd dd ee ee
0020   ff ff ff ff ee ee dd dd cc cc bb bb aa aa 64 68
0030   63 70 20 31 2e 32 38 2e 34 0c 07 4f 70 65 6e 57
0040   72 74

The last part of the packet, both when listening on enp0s25 and on
enp0s25.17, varies: usually it has no printable representation, while
sometimes it can be read as:

0030   aa aa 64 68 63 70 20 31 2e 32 38 2e 34 0c 07 4f ..dhcp 1.28.4..O
0040   70 65 6e 57 72 74                               penWrt

where 1.28.4 looks like the BusyBox version on the router; I have no
idea why or how this got here.
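
To pin down exactly which bytes of the nping payload were altered, the
captured frame can be compared against the payload that was sent. A
minimal Python sketch follows; the helper name payload_diff is made up
for illustration, the frame is the enp0s25 dump above, and aligning the
payload by its last 8 bytes (which arrived intact in this capture) is
only one possible way to line the two up:

```python
# Payload given to nping with --data.
SENT = bytes.fromhex("aaaabbbbccccddddeeeeffffffffeeeeddddccccbbbbaaaa")

# Frame captured on enp0s25 (plain ethernet), copied from the dump above.
CAPTURED = bytes.fromhex(
    "ffffffffffffffffffffffff88a80011"
    "080008004c14abea0001aaaa0e9ccccc"
    "ddddeeeeffffffffeeeeddddccccbbbb"
    "aaaad65f25fffeeb7eacae2c0016b7e6"
    "4ac64feef2fa"
)

def payload_diff(frame, payload, tail_len=8):
    """Align payload inside frame using its last tail_len bytes.

    Returns (start_offset, [(payload_index, sent_byte, captured_byte), ...]).
    Hypothetical helper, written only for this comparison.
    """
    tail_at = frame.find(payload[-tail_len:])
    if tail_at < 0:
        raise ValueError("payload tail not found in frame")
    start = tail_at + tail_len - len(payload)
    if start < 0:
        raise ValueError("frame too short to contain the payload")
    got = frame[start:start + len(payload)]
    diffs = [(i, s, g) for i, (s, g) in enumerate(zip(payload, got)) if s != g]
    return start, diffs

start, diffs = payload_diff(CAPTURED, SENT)
print(f"payload starts at frame offset {start:#06x}")
for i, s, g in diffs:
    print(f"payload byte {i}: sent {s:02x}, captured {g:02x}")
# payload starts at frame offset 0x001a
# payload byte 2: sent bb, captured 0e
# payload byte 3: sent bb, captured 9c
```

The same function can be pointed at the enp0s25.17 dump to compare the
altered bytes between the two captures.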

Capturing the packet with tcpdump from inside the router, listening on
eth0-1_17 I got:
0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
0010   00 34 f5 88 00 00 40 01 6a 2e 0a 02 01 01 08 08
0020   08 08 08 00 2f 89 c8 71 00 05 aa aa bb bb cc cc
0030   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0040   aa aa

then, listening on eth0.1 I got:
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 45 00 00 34 21 a6 00 00 40 01 3e 11 0a 02
0020   01 01 08 08 08 08 08 00 26 19 d1 e1 00 05 aa aa
0030   bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
0040   cc cc bb bb aa aa

and listening on eth0:
0000   ff ff ff ff ff ff ff ff ff ff ff ff 81 00 00 01
0010   88 a8 00 11 08 00 45 00 00 34 4c e4 00 00 40 01
0020   12 d3 0a 02 01 01 08 08 08 08 08 00 c8 4f 2f ab
0030   00 05 aa aa bb bb cc cc dd dd ee ee ff ff ff ff
0040   ee ee dd dd cc cc bb bb aa aa

So all three captures taken from inside the router look good.
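
The tag stack in the eth0 capture can also be checked mechanically
instead of by eye. The following Python sketch walks the VLAN tags of
that frame; parse_tags is a hypothetical helper written for this
message, and it only knows the two TPIDs that appear in these tests:

```python
import struct

# Tag protocol identifiers seen in the captures above.
VLAN_TPIDS = {0x8100: "802.1Q", 0x88A8: "802.1ad"}

def parse_tags(frame):
    """Return ([(vlan_type, vlan_id), ...], inner_ethertype, payload_offset)."""
    offset = 12  # skip destination and source MAC addresses
    tags = []
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in VLAN_TPIDS:
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append((VLAN_TPIDS[ethertype], tci & 0x0FFF))  # low 12 bits = VID
        offset += 4  # each tag is TPID (2 bytes) + TCI (2 bytes)
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return tags, ethertype, offset + 2

# Frame captured on eth0 inside the router, copied from the dump above.
ETH0 = bytes.fromhex(
    "ffffffffffffffffffffffff81000001"
    "88a800110800450000344ce400004001"
    "12d30a020101080808080800c84f2fab"
    "0005aaaabbbbccccddddeeeeffffffff"
    "eeeeddddccccbbbbaaaa"
)

tags, ethertype, payload_at = parse_tags(ETH0)
print(tags)                 # [('802.1Q', 1), ('802.1ad', 17)]
print(f"{ethertype:#06x}")  # 0x0800
print(payload_at)           # 22
```

This agrees with the reading above: an outer 802.1Q tag with ID 1, an
inner 802.1ad tag with ID 17, then a regular IPv4 packet.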

For comparison, I used the same nping command on a TP-Link WDR3600
router and the packet captured on my laptop looks perfectly fine.
Sniffing on enp0s25:
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 45 00 00 34 88 39 00 00 40 01 6f 59 0a 0d
0020   69 1a 08 08 08 08 08 00 da bd 1d 3d 00 05 aa aa
0030   bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
0040   cc cc bb bb aa aa

And capturing on enp0s25.17:
0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
0010   00 34 33 0e 00 00 40 01 c4 84 0a 0d 69 1a 08 08
0020   08 08 08 00 60 93 97 67 00 05 aa aa bb bb cc cc
0030   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0040   aa aa
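
For readers less familiar with the two VLAN types compared in these
tests: on the wire, the tag differs only in its tag protocol identifier
(0x8100 for 802.1q, 0x88a8 for 802.1ad), while the VLAN ID is encoded
the same way. A small Python sketch, not taken from LibreMesh or
OpenWrt, that builds the 4-byte tag for ID 17 in both flavours (the
function name vlan_tag is made up for illustration):

```python
import struct

TPID_8021Q = 0x8100   # IEEE 802.1Q customer tag
TPID_8021AD = 0x88A8  # IEEE 802.1ad service tag

def vlan_tag(tpid, vid, pcp=0, dei=0):
    """Return the 4-byte VLAN tag: TPID followed by the TCI (PCP, DEI, VID)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)

print(vlan_tag(TPID_8021Q, 17).hex())   # 81000011
print(vlan_tag(TPID_8021AD, 17).hex())  # 88a80011
```

The second value is the "88 a8 00 11" sequence visible right after the
MAC addresses in the enp0s25 dumps above.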

In case this bug is a hardware one affecting all the MediaTek-based
routers, I would suggest considering running Babeld on the br-lan bridge
without any VLAN (neither 802.1q nor 802.1ad) rather than on eth0-1_17.
BMX6 was already running on the bridge, and to prevent it from also
running inside BATMAN-adv we were using this ebtables rule:
https://github.com/libremesh/lime-packages/blob/master/packages/lime-proto-bmx6/src/bmx6.lua#L133-L134
We could do the same for Babeld (and for consistency I would also not
use VLAN for it on wireless mesh interfaces).

Thanks && ciao;
Ilario
