This new field reflects the TQ to the selected gateway.
Before this commit, if you had connectivity issues in a larger mesh,
it was a tedious task to understand which nodes are affected and which
are not. By providing this new value for each node, it becomes easier
to see which nodes are affected by the connectivity issues and which
are not.
The new field "gateway_tq" is located at the top level of the
statistics resource (next to "gateway" and "gateway_nexthop"):
gluon-neighbour-info -d ::1 -r statistics
{
...
"gateway": "02:a1:71:04:09:10",
"gateway_nexthop": "88:e6:40:20:90:10",
"gateway_tq": 193,
...
}
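A consumer of this field could, for example, flag nodes whose gateway TQ
is low. The following Python sketch assumes the statistics JSON has already
been fetched (for instance with the command above). The threshold of 100
and the function name are hypothetical, and TQ is taken to range from 0 to
255 as in batman-adv.

```python
import json

# Hypothetical threshold; batman-adv TQ values range from 0 (worst) to 255 (best).
LOW_TQ_THRESHOLD = 100

def gateway_link_state(statistics_json, threshold=LOW_TQ_THRESHOLD):
    """Classify a node by the "gateway_tq" field of its statistics resource."""
    stats = json.loads(statistics_json)
    tq = stats.get("gateway_tq")
    if tq is None:
        # Field is absent, e.g. no gateway selected or firmware without this field.
        return "unknown"
    return "degraded" if tq < threshold else "ok"

sample = ('{"gateway": "02:a1:71:04:09:10", '
          '"gateway_nexthop": "88:e6:40:20:90:10", "gateway_tq": 193}')
print(gateway_link_state(sample))
```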
With the new role-based interface configuration, it would be better to
rename the wan/wan6 interfaces to uplink/uplink6, but that would cause
unnecessary churn for the firewall configuration, so it is left for a
later update.
As all interfaces with the 'uplink' role are in the br-wan bridge, it is
not possible to assign these to the 'mesh' role independently - instead,
br-wan is added as a mesh interface as soon as a single interface has
both the 'uplink' and 'mesh' roles. The UCI section for this
configuration is now called 'mesh_uplink' instead of 'mesh_wan'.
For all interfaces that have the 'mesh' role but not the 'uplink' role, a
second configuration 'mesh_other' is created. If there is more than one
such interface, all these interfaces are bridged as well (creating a
bridge 'br-mesh_other'). This replaces the 'mesh_lan' section with its
optional 'br-mesh_lan' bridge, but can also include interfaces that were
not considered "LAN" when interface roles are modified (via site.conf
or manually).
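The mapping from roles to mesh sections described above can be sketched as
follows. This is only an illustration of the rules in this message, not the
actual Gluon upgrade script; the function name and the input format (a dict
of interface name to role set) are assumptions.

```python
def mesh_sections(roles_by_iface):
    """Map interface roles to the mesh UCI sections described above.

    roles_by_iface: dict of interface name -> set of role names.
    Returns a dict of section name -> device used for meshing.
    """
    sections = {}
    # br-wan becomes a mesh interface as soon as one interface has both roles.
    if any({"uplink", "mesh"} <= roles for roles in roles_by_iface.values()):
        sections["mesh_uplink"] = "br-wan"
    # Interfaces with 'mesh' but without 'uplink' go into mesh_other.
    others = sorted(
        name for name, roles in roles_by_iface.items()
        if "mesh" in roles and "uplink" not in roles
    )
    if len(others) == 1:
        sections["mesh_other"] = others[0]
    elif len(others) > 1:
        # More than one such interface: they are bridged together.
        sections["mesh_other"] = "br-mesh_other"
    return sections

example = {
    "eth0": {"uplink", "mesh"},
    "eth1": {"mesh"},
    "eth2": {"mesh"},
    "eth3": {"client"},
}
print(mesh_sections(example))
```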
This removes PKG_VERSION and PKG_RELEASE from most Makefiles, as the
value was never useful for Gluon packages; instead, PKG_VERSION is set
to 1 in gluon.mk.
It also removes two other weird definitions:
- gluon-iptables-clamp-mss-to-pmtu replicating the old PKG_VERSION logic
from gluon-core, but without the fixed PKG_BUILD_DIR to prevent
unnecessary rebuilds
- gluon-hoodselector setting GLUON_VERSION=3
Allow the transmission of IPv6 multicast packets as long as they are not
flooded through the whole mesh.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
With batman-adv 2020.4 and the corresponding backports to batman-adv
v2019.2, several more bugs were found and fixed regarding the batman-adv
multicast optimizations feature.
Also a "wakeup-call" feature was added to the Linux bridge IGMP/MLD
snooping code in Gluon to work around issues with Android devices.
With batman-adv now at v2019.2, multicast-to-multi-unicast conversion
is supported, too. This means that even if there are a few outdated nodes,
these and all other recipients will be served multicast packets via unicast
as long as the number of receiving nodes does not exceed the multicast
fanout setting (default: 16). If it is exceeded, batman-adv will revert
to broadcast flooding automatically.
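As a rough illustration of the fanout rule described above, the following
Python sketch picks a delivery mode from the number of receiving nodes. It
is a simplification and not batman-adv code; the function name is
hypothetical and special cases (such as no receivers at all) are ignored.

```python
DEFAULT_MULTICAST_FANOUT = 16  # default stated in the text above

def delivery_mode(num_receiving_nodes, fanout=DEFAULT_MULTICAST_FANOUT):
    """Return how a multicast packet would be delivered under the fanout rule."""
    if num_receiving_nodes <= fanout:
        # One unicast copy per receiving node.
        return "multi-unicast"
    # Too many receivers: fall back to flooding the whole mesh.
    return "broadcast"

for receivers in (3, 16, 17):
    print(receivers, delivery_mode(receivers))
```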
Long story short, with all these extra measures in place, let's reenable
the batman-adv multicast optimizations to reduce the layer 2 overhead
and in preparation for multicast applications in the future.
This feature is enabled by default anyway, so removing the
"batctl multicast_mode 0" override is sufficient.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
With very bad timing, it is possible that the teardown script of a
gluon_mesh interface runs when bat0 was just created, but primary0 is not
yet added to it. Although there is no hardif to remove in this case,
bat0 will still be deleted, because there is no hardif in bat0.
Disable the interface removal logic by passing `-M` to `batctl interface`.
As a partial fix to #496, do not touch the MAC address of the WAN
interface when using VXLANs (as only the MAC address of the VXLAN
interface matters to batman-adv).
In addition this PR contains:
- split of gluon-respondd provider into multiple source files
- minor additional cleanups in gluon-mesh-babel respondd provider
(untested, as the babel respondd provider already doesn't compile prior
to these changes...)
This reverts commit 9b1eb40fe7.
With the batman-adv v2019.2 upgrade reverted (c1a7733956), the batman-adv
multicast-to-multi-unicast feature is not available yet. Without it, the
batman-adv multicast optimizations are very unlikely to take effect; for
example, a few outdated nodes would disable them.
To avoid confusion and divergence, with a few communities having it enabled
and most having it implicitly deactivated, just deactivate it for all for
now until batman-adv is updated to v2019.2 or greater again.
The new routing_algo site.conf value BATMAN_IV_LEGACY is introduced. With
these changes, the routing_algo setting becomes mandatory.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
We cannot add the same file (here: /lib/gluon/mesh-batman-adv/compat) to
two installed packages. Therefore, instead of determining the compat
version number from this file, infer it from the batman-adv release
version number.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Several fixes and enhancements related to multicast were added upstream
in batman-adv. So let's give the batman-adv multicast optimizations
another go.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
The batctl v2013.4 build was removed from the batman-adv-legacy package
as current upstream batctl releases work with batman-adv-legacy,
too.
As a replacement we need to add the upstream batctl dependency to
gluon-mesh-batman-adv-14 to have a batctl available again here.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
The commit ee63ed42fe ("gluon-mesh-batman-adv: List neighbors with
non-best direct link") removed the check whether a neighbor has
BATADV_ATTR_FLAG_BEST set. But consumers may still want to filter out or
mark neighbors which don't have this flag set. To assist with such a
feature, enhance the neighbor object with an extra boolean "best" attribute
which stores whether BATADV_ATTR_FLAG_BEST was found or not.
Reported-by: Vincent Wiemann <webmaster@codefetch.de>
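A consumer could use the "best" attribute roughly as in the following Python
sketch. The input layout (a dict of neighbor MAC address to attributes), the
"tq" values and the function name are illustrative assumptions; only the
boolean "best" attribute comes from this change. Treating a missing
attribute as "not best" is also an assumption.

```python
def split_neighbours(neighbours):
    """Split neighbours into those with and without the "best" flag."""
    best, non_best = {}, {}
    for mac, attrs in neighbours.items():
        # Assumption: a missing "best" attribute is treated as not best.
        target = best if attrs.get("best", False) else non_best
        target[mac] = attrs
    return best, non_best

sample = {
    "02:00:00:00:00:01": {"tq": 255, "best": True},
    "02:00:00:00:00:02": {"tq": 180, "best": False},
}
best, non_best = split_neighbours(sample)
print(sorted(best), sorted(non_best))
```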
Links between two direct neighbors are not always the best route between
these devices. The flag BATADV_ATTR_FLAG_BEST would not be set for these
originator entries and the respondd module would just ignore this entry.
This causes missing links in meshviewer and similar tools. And when the
link quality is nearly equal but fluctuates slightly, these links will
appear and disappear on the map from time to time.
Fixes: 2e0e24a992 ("announce neighbours using alfred/gluon-announce")
The number of local wifi clients is currently counted in two different
ways:
* asking the kernel wifi layer for the number of clients on the 2.4GHz and
5GHz bands
* asking batman-adv for the number of non-timed out entries in the local
translation table with WiFi flag
The wifi24 and wifi5 counts and the TT wifi client count are
reported via respondd to various consumers. The ffrgb meshviewer
displays these values as:
* 2,4 GHz: wifi24
* 5 GHz: wifi5
* other: (TT local wifi+non-wifi clients) - (wifi24 + wifi5)
But the local translation table is holding entries much longer than the
wifi layer. It can therefore easily happen that a wifi client disappears in
the kernel wifi layer and batman-adv still has the entry stored in the
local TT.
The ffrgb meshviewer would then show this count in the category "other".
This often results in confusion because "other" is usually for ethernet
clients. And nodes with a frequently disappearing larger group of clients
(near bus stations or larger intersections) often show most clients under
the group "other" even when the device doesn't have a LAN ethernet port.
It is better for presentation to calculate the total number of wifi clients
by summing up wifi24 + wifi5, and to get the total number of clients (non-wifi
+ wifi) by adding that sum to the number of non-wifi clients in the local
batman-adv translation table.
Fixes: 89a9d8138c ("gluon-mesh-batman-adv-core: Announce client count by frequency")
Reported-by: Pascal Wettin <p.wettin@gmx.de>
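The difference between the old and the new calculation can be illustrated
with a short Python sketch. The function names and the sample numbers are
made up for illustration; only the arithmetic follows the description above.

```python
def old_other(tt_wifi, tt_non_wifi, wifi24, wifi5):
    """"other" as derived by the viewer from the old counts."""
    return (tt_wifi + tt_non_wifi) - (wifi24 + wifi5)

def new_counts(tt_non_wifi, wifi24, wifi5):
    """Total wifi and total clients as calculated after this change."""
    wifi = wifi24 + wifi5
    return {"wifi": wifi, "total": wifi + tt_non_wifi}

# Example: 5 clients are really connected via wifi, 1 via ethernet, but the
# local TT still holds 9 wifi entries because stale ones have not timed out.
print(old_other(tt_wifi=9, tt_non_wifi=1, wifi24=3, wifi5=2))
counts = new_counts(tt_non_wifi=1, wifi24=3, wifi5=2)
print(counts, "other:", counts["total"] - counts["wifi"])
```

With these sample numbers the old calculation reports 5 "other" clients,
while the new totals leave exactly the one non-wifi client in that category.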
The commit b3762fc61c ("gluon-client-bridge: move IPv4 local subnet route
to br-client (#1312)") moves the IPv4 prefix from the local-port interface
to br-client. A client requesting an IPv4 connection to the IPv4 anycast
address of the node (the device running gluon) will cause the following
packets:
1. ARP request from client to get the MAC address of the anycast
IPv4 address
2. ARP reply from node to client with the anycast MAC address for the IPv4
anycast address
3. IPv4 packet from client which requires reply (for example ICMP echo
request)
4. ARP request for the client MAC address for its IPv4 address in prefix4
(done with the mac address of br-client and transmitted over br-client)
5. IPv4 packet from node (transmitted over br-client with br-client MAC
address) as reply for the client IPv4 packet (for example ICMP echo
reply)
Steps 4 and 5 are problematic here because the packets use the node-specific
MAC address of br-client instead of the anycast MAC address. The client
will receive the ARP packet with the node-specific MAC address and change
its own neighbor IP (translation) table. This will, for example, break
access to the status page of the connected device or the anycast DNS
forwarder implementation when the client roams to a different node.
This reverts commit b3762fc61c and adds
upgrade code to remove local_node_route on existing installations.
The commit b3762fc61c ("gluon-client-bridge: move IPv4 local subnet route
to br-client (#1312)") moves the IPv4 prefix from the local-port interface
to br-client. A client requesting an IPv4 connection to the IPv4 anycast
address of the node (the device running gluon) will cause the following
packets:
1. ARP request from client to get the MAC address of the anycast
IPv4 address
2. ARP reply from node to client with the anycast MAC address for the IPv4
anycast address
3. IPv4 packet from client which requires reply (for example ICMP echo
request)
4. ARP request for the client MAC address for its IPv4 address in prefix4
(done with the mac address of br-client and transmitted over br-client)
5. IPv4 packet from node (transmitted over br-client with br-client MAC
address) as reply for the client IPv4 packet (for example ICMP echo
reply)
Step 4 is extremely problematic here. ARP replies with the anycast IPv4
address must not be sent or received via bat0 - especially not when they
contain a node-specific MAC address as source. If this happens anyway,
the wrong MAC address is stored in the batadv DAT cache and the ARP packet
may even be forwarded to clients. The latter is especially true for ARP
requests, which are broadcast and will be flooded to the complete mesh.
Clients will see these ARP packets and change their own neighbor IP
(translation) table. They will then try to send the packets for IPv4
anycast addresses to the completely wrong device in the mesh. This will, for
example, break access to the status page of the connected device or the
anycast DNS forwarder implementation. Especially the latter causes extreme
latency when clients try to connect to a server using a domain name, or even
breaks the connection setup process completely. Both are caused by the
unanswered DNS requests, which at first glance look like packet loss.
A node must therefore take care of:
* not transmitting ARP packets related to the anycast IPv4 address over
bat0
* dropping ARP packets related to the anycast IPv4 address when they are
received on bat0 from a still broken node
* not accepting ARP replies related to the anycast IPv4 address on the
local node when they come from bat0
Fixes: b3762fc61c ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)")
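The three requirements listed above boil down to one predicate: an ARP
packet that refers to the anycast IPv4 address must never cross bat0, in
either direction. The following Python sketch only models that decision; it
is not the firewall rule set used by this change, and the function name,
parameters and example addresses are hypothetical.

```python
def drop_arp(iface, sender_ip, target_ip, anycast_ip, mesh_iface="bat0"):
    """Return True if an ARP packet crossing `iface` should be dropped.

    Applies to both directions (transmit and receive) on the mesh interface.
    """
    refers_to_anycast = anycast_ip in (sender_ip, target_ip)
    return iface == mesh_iface and refers_to_anycast

ANYCAST = "10.0.0.1"  # example value only

# ARP packet with the anycast address as sender, seen on bat0: dropped.
print(drop_arp("bat0", ANYCAST, "10.0.5.23", ANYCAST))
# The same packet on br-client is fine.
print(drop_arp("br-client", ANYCAST, "10.0.5.23", ANYCAST))
# Unrelated ARP traffic on bat0 is not affected.
print(drop_arp("bat0", "10.0.5.23", "10.0.7.42", ANYCAST))
```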
In multidomain setups, VXLAN is enabled by default, but can be disabled in
domain configs using the mesh/vxlan option. In single domain setups, the
mesh/vxlan option is mandatory.
The UCI option for legacy mode is removed.
Fixes #1364
net.ipv6.conf.br-client.forwarding is moved from gluon-client-bridge to
gluon-mesh-batman-adv, as the setting is not useful with non-bridged
protocols.
The RFC standard multicast querier interval is 125s. Our querier uses an
interval of 20s for better support of roaming clients, but our robustness
setting of 3 causes external queriers using the standard interval to
time out after only 60s, leading to frequent "querier appeared/disappeared"
messages. Increase robustness so that external queriers with any interval
<180s are supported.
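The arithmetic behind these numbers can be sketched in Python. It uses the
simplified relation implied by this message (timeout = robustness x own
query interval); the function names are hypothetical, and the resulting
robustness value is derived from that relation, not quoted from the change.

```python
OWN_QUERY_INTERVAL_S = 20  # our querier's interval, as stated above

def querier_timeout(robustness, query_interval=OWN_QUERY_INTERVAL_S):
    """Seconds after which an external querier is considered gone."""
    return robustness * query_interval

def min_robustness(target_timeout, query_interval=OWN_QUERY_INTERVAL_S):
    """Smallest robustness whose timeout reaches target_timeout seconds."""
    return -(-target_timeout // query_interval)  # ceiling division

print(querier_timeout(3))    # timeout with the old robustness setting
print(min_robustness(180))   # robustness needed for a 180s timeout
```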