[lime-dev] MediaTek routers, ethernet and VLAN 802.1ad

guifipedro guifipedro at gmail.com
Fri Aug 23 09:24:01 UTC 2019


+1 to configuring networks and devices without VLANs in the routing part.
And you provided a great example of why not to use them.

Using VLANs (by default, and for a community network) is very awkward,
especially when dealing with hardware that changes over time. We are
talking about low-cost hardware that changes between revisions: remember
the case of the Ubiquiti NanoStation M5, which used to have eth0 and eth1
(physical interfaces) and then changed to eth0.1 and eth0.2 (VLANs on an
internal switch).

In our community network we are removing VLANs from the cabled part
(depending on the network setup, you can still put that traffic in a
VLAN on the managed switch). We would also like to remove VLANs
completely from the wifi-radio part, but that means reflashing the whole
network (it is not a backward-compatible change); we will see how we
manage it.

On Sun, Aug 11, 2019 at 8:05 PM Ilario Gelmetti <iochesonome at gmail.com> wrote:
>
> Dear all,
> I posted a revised version of this email also on the OpenWrt forum [1] but
> have received no answer so far.
> Can anyone reproduce this on other MediaTek-based routers (e.g. most of the
> Xiaomi ones), with the instructions included in the forum post [1] please?
> In my opinion we have to positively or negatively confirm this *before*
> the next LibreMesh release, as if it is real I think we should make
> LibreMesh use Babeld on br-lan (as it already happens for BMX6) and
> without VLAN.
> Thanks!!
> Ilario
>
> [1]
> https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346
>
>
> On 8/7/19 8:30 PM, Ilario Gelmetti wrote:
> > Dear all,
> > I was testing LibreMesh (together with Gio and SAn, lime-packages master
> > branch compiled on top of OpenWrt's openwrt-18.06 branch) on a
> > MediaTek-based router: YouHua WR1200JS.
> > Everything works fine apart from the routing on cabled connections.
> > It seems that these routers do not like VLANs of type 802.1ad on cable.
> > It could be an OpenWrt bug or a bug in the device.
> > Can anyone check and confirm on other MediaTek devices please?
> >
> > Here I make a list of what I tested:
> >
> > * setting the routing protocols to run on 802.1q interfaces rather than
> > on 802.1ad (we usually don't do this as it gave problems with TP-Link
> > routers; it can be done by giving a third argument in /etc/config/lime,
> > like "list protocols babeld:17:8021q"): the routing protocols see each
> > other via cable and it works well (two identically configured routers see
> > each other as neighbours via eth0-1_17 in Babeld, queried with "echo dump |
> > nc ::1 30003")
> >
> > * listening with Wireshark on the laptop, I receive broken IPv6 multicast
> > packets from the cable. They are correctly marked as VLAN 802.1ad
> > ID 17 but the rest of the packet content is Error/Malformed.
> >
> > * creating an 802.1ad interface on my laptop (e.g. "ip link add link
> > enp0s25 name enp0s25.17 type vlan proto 802.1ad id 17; ip link set
> > enp0s25.17 up"), adding a /24 IP on both sides and pinging from the
> > router to the laptop. My laptop receives the router's ARP requests and
> > answers, but the router keeps asking as if it did not receive the answer.
> >
> > * while pinging from the laptop (10.2.1.2) to the router (10.2.1.1) on
> > the just created tagged cabled interface, I connect via wifi and ssh to
> > the router and run tcpdump on it:
> > ** running it on eth0 shows that my ARP requests physically reach the
> > router and are properly tagged ("tcpdump -i eth0 -nn -e vlan"):
> > 21:03:45.354344 54:ee:75:7a:c2:1f > ff:ff:ff:ff:ff:ff, ethertype
> > 802.1Q-QinQ (0x88a8), length 64: vlan 1, p 0, ethertype 802.1Q-QinQ,
> > vlan 17, p 0, ethertype ARP, Request who-has 10.2.1.1 tell 10.2.1.2,
> > length 42
> > ** running it on eth0-1_17 shows broken UDP packets (the same Malformed
> > IPv6 multicast packets I received with Wireshark) which likely are
> > generated by Babeld, BUT NO ARP request at all:
> > 21:05:45.395359 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
> > UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
> > ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce5!] UDP, length 81
> > 21:05:49.255355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
> > UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
> > ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e9!] UDP, length 12
> > 21:05:53.225372 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
> > UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
> > ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e8!] UDP, length 12
> > 21:05:57.385373 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
> > UDP (17) payload length: 20) fe80::d65f:25ff:feeb:7ead.6696 >
> > ff02::1:6.6696: [bad udp cksum 0x77a8 -> 0xa0e7!] UDP, length 12
> > 21:06:01.245355 IP6 (class 0xc0, flowlabel 0x854bc, hlim 1, next-header
> > UDP (17) payload length: 89) fe80::d65f:25ff:feeb:7ead.6696 >
> > ff02::1:6.6696: [bad udp cksum 0x77ed -> 0x7ce1!] UDP, length 81
> >
> > * flashed the YouHua router with OpenWrt 18.06.4 as downloaded from the
> > OpenWrt website and created the 802.1ad interfaces using the ip command
> > (installing the ip-full package, "ip link add link eth0.1 name eth0-1_17
> > type vlan proto 802.1ad id 17; ip link set eth0-1_17 up; ip address add
> > 10.2.1.1/24 dev eth0-1_17") and still it does not ping (my laptop's ARP
> > requests and my laptop's ARP answers do not get to eth0-1_17)
> >
> > * on the same clean router, using nping I sent a raw ethernet packet on
> > the eth0-1_17 interface (using the command "nping --send-eth
> > --source-mac ff:ff:ff:ff:ff:ff --dest-mac ff:ff:ff:ff:ff:ff --data
> > aaaabbbbccccddddeeeeffffffffeeeeddddccccbbbbaaaa -e eth0-1_17 -N
> > 8.8.8.8") and captured it on the laptop.
> > What I got is broken (notice that instead of "aa aa bb bb cc cc" on the
> > second line, I have "aa aa 0e 9c cc cc").
> > This is when capturing on enp0s25 (plain ethernet)
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
> > 0010   08 00 08 00 4c 14 ab ea 00 01 aa aa 0e 9c cc cc
> > 0020   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
> > 0030   aa aa d6 5f 25 ff fe eb 7e ac ae 2c 00 16 b7 e6
> > 0040   4a c6 4f ee f2 fa
> >
> > And this is when capturing on enp0s25.17 (VLAN 802.1ad ID 17 interface)
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 08 00
> > 0010   2c 48 cb b2 00 05 aa aa 9a 9a cc cc dd dd ee ee
> > 0020   ff ff ff ff ee ee dd dd cc cc bb bb aa aa 64 68
> > 0030   63 70 20 31 2e 32 38 2e 34 0c 07 4f 70 65 6e 57
> > 0040   72 74
> >
> > The last part of the packet, whether captured on enp0s25 or on
> > enp0s25.17, varies: usually it has no printable ASCII rendering, but
> > sometimes it reads as:
> >
> > 0030   aa aa 64 68 63 70 20 31 2e 32 38 2e 34 0c 07 4f ..dhcp 1.28.4..O
> > 0040   70 65 6e 57 72 74                               penWrt
> >
> > where 1.28.4 looks like the busybox version on the router, no idea why
> > or how this got here.
> >
> > Capturing the packet with tcpdump from inside the router, listening on
> > eth0-1_17 I got:
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
> > 0010   00 34 f5 88 00 00 40 01 6a 2e 0a 02 01 01 08 08
> > 0020   08 08 08 00 2f 89 c8 71 00 05 aa aa bb bb cc cc
> > 0030   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
> > 0040   aa aa
> >
> > then, listening on eth0.1 I got:
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
> > 0010   08 00 45 00 00 34 21 a6 00 00 40 01 3e 11 0a 02
> > 0020   01 01 08 08 08 08 08 00 26 19 d1 e1 00 05 aa aa
> > 0030   bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
> > 0040   cc cc bb bb aa aa
> >
> > and listening on eth0:
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 81 00 00 01
> > 0010   88 a8 00 11 08 00 45 00 00 34 4c e4 00 00 40 01
> > 0020   12 d3 0a 02 01 01 08 08 08 08 08 00 c8 4f 2f ab
> > 0030   00 05 aa aa bb bb cc cc dd dd ee ee ff ff ff ff
> > 0040   ee ee dd dd cc cc bb bb aa aa
> >
> > so that all these three captures taken from inside the router look good.
> >
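The tag stack in the three router-side captures above can be checked mechanically. Below is a minimal Python sketch, not part of the original tests: the names parse_hexdump and vlan_stack are made up for illustration, and only the first two lines of each quoted dump are used. It assumes a VLAN tag is a 2-byte TPID (0x8100 for 802.1q, 0x88a8 for 802.1ad) followed by a 2-byte TCI whose low 12 bits are the VLAN ID.

```python
import struct

# TPIDs (tag protocol identifiers) that announce a 4-byte VLAN tag.
VLAN_TPIDS = {0x8100: "802.1q", 0x88A8: "802.1ad"}

# First two lines of the router-side captures quoted above.
ETH0_1_17 = """
0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
0010   00 34 f5 88 00 00 40 01 6a 2e 0a 02 01 01 08 08
"""
ETH0_1 = """
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 45 00 00 34 21 a6 00 00 40 01 3e 11 0a 02
"""
ETH0 = """
0000   ff ff ff ff ff ff ff ff ff ff ff ff 81 00 00 01
0010   88 a8 00 11 08 00 45 00 00 34 4c e4 00 00 40 01
"""


def parse_hexdump(text):
    """Turn 'offset   hex bytes' lines into a bytes object."""
    data = bytearray()
    for line in text.strip().splitlines():
        fields = line.split()
        # The first field is the offset column; the rest are byte values.
        data += bytes(int(f, 16) for f in fields[1:])
    return bytes(data)


def vlan_stack(frame):
    """Return ([(tag type, VLAN ID), ...], inner EtherType) for a frame."""
    tags = []
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in VLAN_TPIDS:
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append((VLAN_TPIDS[ethertype], tci & 0x0FFF))  # drop PCP/DEI bits
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return tags, ethertype


for name, dump in (("eth0-1_17", ETH0_1_17), ("eth0.1", ETH0_1), ("eth0", ETH0)):
    tags, inner = vlan_stack(parse_hexdump(dump))
    print(name, tags, hex(inner))
```

With these dumps the sketch reports no tag on eth0-1_17, one 802.1ad tag with ID 17 on eth0.1, and an 802.1q tag with ID 1 followed by an 802.1ad tag with ID 17 on eth0, each time with inner EtherType 0x0800 (IPv4). That matches the "look good" reading above.
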
> > As a comparison, I used the same nping command on a TP-Link WDR3600
> > router and the packet captured on my laptop looks perfectly ok, sniffing
> > on enp0s25:
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
> > 0010   08 00 45 00 00 34 88 39 00 00 40 01 6f 59 0a 0d
> > 0020   69 1a 08 08 08 08 08 00 da bd 1d 3d 00 05 aa aa
> > 0030   bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
> > 0040   cc cc bb bb aa aa
> >
> > And capturing on enp0s25.17:
> > 0000   ff ff ff ff ff ff ff ff ff ff ff ff 08 00 45 00
> > 0010   00 34 33 0e 00 00 40 01 c4 84 0a 0d 69 1a 08 08
> > 0020   08 08 08 00 60 93 97 67 00 05 aa aa bb bb cc cc
> > 0030   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
> > 0040   aa aa
> >
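To compare the two laptop-side captures on plain enp0s25 (YouHua vs. WDR3600) without reading hex by eye, here is a small Python sketch. It is only an illustration: parse_hexdump and check_payload are hypothetical helper names, and it assumes the payload can be located by searching for its first two bytes (aa aa) after the MAC addresses.

```python
# The 24 bytes passed to nping with --data.
SENT = bytes.fromhex("aaaabbbbccccddddeeeeffffffffeeeeddddccccbbbbaaaa")

# Laptop-side captures on enp0s25, copied from the dumps quoted above.
YOUHUA = """
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 08 00 4c 14 ab ea 00 01 aa aa 0e 9c cc cc
0020   dd dd ee ee ff ff ff ff ee ee dd dd cc cc bb bb
0030   aa aa d6 5f 25 ff fe eb 7e ac ae 2c 00 16 b7 e6
0040   4a c6 4f ee f2 fa
"""
WDR3600 = """
0000   ff ff ff ff ff ff ff ff ff ff ff ff 88 a8 00 11
0010   08 00 45 00 00 34 88 39 00 00 40 01 6f 59 0a 0d
0020   69 1a 08 08 08 08 08 00 da bd 1d 3d 00 05 aa aa
0030   bb bb cc cc dd dd ee ee ff ff ff ff ee ee dd dd
0040   cc cc bb bb aa aa
"""


def parse_hexdump(text):
    """Turn 'offset   hex bytes' lines into a bytes object."""
    data = bytearray()
    for line in text.strip().splitlines():
        fields = line.split()
        # The first field is the offset column; the rest are byte values.
        data += bytes(int(f, 16) for f in fields[1:])
    return bytes(data)


def check_payload(frame, sent=SENT):
    """Return (payload start offset, list of corrupted payload offsets)."""
    start = frame.find(sent[:2], 12)  # search after the two MAC addresses
    if start < 0:
        return None, []
    received = frame[start:start + len(sent)]
    # Compare byte by byte; offsets are relative to the payload start.
    bad = [i for i, (got, want) in enumerate(zip(received, sent)) if got != want]
    return start, bad


print("YouHua :", check_payload(parse_hexdump(YOUHUA)))
print("WDR3600:", check_payload(parse_hexdump(WDR3600)))
```

For the YouHua capture this gives a payload start of 26 with payload bytes 2 and 3 corrupted (the "0e 9c" noticed above); for the WDR3600 it gives a start of 46 and no corrupted bytes. The payload starts 20 bytes earlier in the YouHua frame, which is the size of the IPv4 header ("45 00 00 34 ...") that is visible in the WDR3600 capture but seems to be missing in the YouHua one, where the ICMP header ("08 00 4c 14 ...") directly follows the EtherType.
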
> > In case this bug is a hardware one affecting all the MediaTek-based
> > routers, I would suggest considering running Babeld on the br-lan bridge
> > without any VLAN (neither 802.1q nor 802.1ad) rather than on eth0-1_17.
> > BMX6 was already running on the bridge, and to keep it from also running
> > inside BATMAN-adv we were using this ebtables rule:
> > https://github.com/libremesh/lime-packages/blob/master/packages/lime-proto-bmx6/src/bmx6.lua#L133-L134
> > we could do the same for Babeld (and for consistency I would also not
> > use VLAN for it on wireless mesh interfaces).
> >
> > Thanks && ciao;
> > Ilario
> >
>
>
> --
> Ilario Gelmetti
> iochesonome at gmail.com
> igelmetti at iciq.es
> ilario.gelmetti at estudiants.urv.cat
>
> _______________________________________________
> lime-dev mailing list
> lime-dev at lists.libremesh.org
> https://lists.libremesh.org/mailman/listinfo/lime-dev

