Ethernet loop in office

mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
Does anyone have experience with network loops, spanning tree protocol and redundant lines between switches?

Our switches are supposed to use spanning tree protocol to provide redundancy, and we have redundant lines to our access switches. Network specialists installed this over a year ago. A few months ago we noticed strange things in our network: continuous pings to different virtual machines sometimes time out, although nobody changed anything before or after. So we thought the VMs were overloaded or had a storage problem. Long story short, we checked the hardware and VMware without result; everything works fine. A network specialist from HPE analysed the log files and says we have a network loop. Funny, because there isn't any loop apart from the redundant paths to the access switches, and those are handled by spanning tree: one line is set to "forwarding" and the second to "blocking".

Does anyone know of tools for checking a network for Ethernet loops? In the office we use HPE and Aruba components. At home I use Netgear (the cheapest 8-port managed switch), which already has loop detection built in, and also UniFi, which can detect loops and use spanning tree protocol as well. I wonder why more expensive hardware shouldn't be able to detect an Ethernet loop.
Actually, the Aruba switches say there's no loop and spanning tree is working fine.
But HPE's log file analysis showed that two MAC addresses keep appearing on two ports of our redundantly connected core switches.
Can somebody give me some advice on what I should check?
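
One way to make "a MAC address appears on two ports" measurable is to count how often each MAC changes port on the same switch. The following is only a minimal sketch: the switch name, port labels and MAC addresses are made up, and it assumes the MAC-table entries have already been collected into (timestamp, switch, port, mac) tuples by some other means.

```python
from collections import defaultdict

def count_mac_moves(observations):
    """Count how often each MAC changes port on the same switch.

    observations: iterable of (timestamp, switch, port, mac) tuples,
    e.g. taken from periodic MAC-table dumps (hypothetical input format).
    """
    last_port = {}
    moves = defaultdict(int)
    for _ts, switch, port, mac in sorted(observations):
        key = (switch, mac.lower())
        if key in last_port and last_port[key] != port:
            moves[key] += 1
        last_port[key] = port
    return dict(moves)

# Invented sample data: one MAC bounces between ports 1 and 9, one stays put.
sample = [
    (1, "core-1", "1", "00:11:22:33:44:55"),
    (2, "core-1", "9", "00:11:22:33:44:55"),
    (3, "core-1", "1", "00:11:22:33:44:55"),
    (1, "core-1", "2", "66:77:88:99:aa:bb"),
    (3, "core-1", "2", "66:77:88:99:aa:bb"),
]
flaps = count_mac_moves(sample)
for (switch, mac), n in flaps.items():
    print(f"{mac} moved {n} times on {switch}")
```

A single move can be harmless (a device that was replugged); a MAC that keeps moving back and forth between the same ports is the pattern worth chasing.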

mozarella

Answers

  • kltaylorkltaylor Member, Beta Tester Posts: 570 ✭✭✭✭✭
    What was the purpose initially for the loop to be implemented?
    "There's a fine line between audacity and idiocy."
    -Warden Anastasia Luccio, Captain
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    Actually we didn't implement any loop. We replaced all switches (access switches and core switches) to build a new network with new ESXi hosts. The new network is designed so that most of the access switches are connected redundantly to the core switches, using spanning tree protocol.
    For example, we have two core switches called infrz1cs1 (infrastructure Rechenzentrum (server room) 1, core switch 1) and infrz1cs2. We also have two access switches (48 ports, SFP+ ports) called infrz1as1 and infrz1as2, each connected via two DAC cables to both core switches. So there are 4 lines. For example, infrz1as1 port A1 is connected to infrz1cs1 port 1, and infrz1as1 port A2 is connected to infrz1cs2 port 1.
    cs1 and cs2 are also connected to each other via ports 9 and 10. MSTP (spanning tree) always disables the second path, so there shouldn't be any loop.

    Maybe somebody connected a switch twice to another switch by mistake. How can I find that? All I notice is that some servers (VMs) sometimes don't answer pings and just time out. Most of the time it works well.
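
To see how many links spanning tree has to keep out of the forwarding path in a layout like the one described, a rough sketch with union-find can help. It only counts redundant links. It does not model root bridge election or path costs, so which links actually end up blocked on the real switches may differ. The core-side ports for infrz1as2 are not given in the post, so those labels are left without port numbers, and ports 9 and 10 are treated here as two separate links (an assumption; if they were aggregated they would count as one).

```python
def split_links(links):
    """Split links into a loop-free set and the redundant remainder (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    forwarding, redundant = [], []
    for a, b, label in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append(label)   # would close a loop
        else:
            parent[root_a] = root_b
            forwarding.append(label)
    return forwarding, redundant

links = [
    ("infrz1cs1", "infrz1cs2", "cs1 port 9 <-> cs2"),
    ("infrz1cs1", "infrz1cs2", "cs1 port 10 <-> cs2"),
    ("infrz1as1", "infrz1cs1", "as1 A1 <-> cs1 port 1"),
    ("infrz1as1", "infrz1cs2", "as1 A2 <-> cs2 port 1"),
    ("infrz1as2", "infrz1cs1", "as2 <-> cs1"),
    ("infrz1as2", "infrz1cs2", "as2 <-> cs2"),
]
forwarding, redundant = split_links(links)
print("loop-free set:", forwarding)
print("must be blocked or aggregated:", redundant)
```

With four switches, any loop-free set has exactly three links, so three of the six links here are redundant. Any cable added outside this plan, such as a second patch lead between two switches, would show up as one more redundant link.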
  • kltaylorkltaylor Member, Beta Tester Posts: 570 ✭✭✭✭✭
    I have to be honest, I don't have experience with that at all.  It seems like it would be similar to a network consisting of a firewall, domain server, and smart switches.
    Without using advanced routing through the firewall to achieve multiple subnets on the same network (where my experience lies) trying to find what you're looking to find is going to be an ordeal for sure.
    "There's a fine line between audacity and idiocy."
    -Warden Anastasia Luccio, Captain
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    We don't have different subnets within the LAN like you mentioned. Other subnets (DMZ, multiple internet connections, VPN...) are separated by the firewall and are not "inline" in the LAN.

    DNS, DHCP, Windows Active Directory, file services and other services run in Windows VMs on our three ESXi hosts.
  • VioletChepilVioletChepil London, UKMember Posts: 2,474 admin
    Ok I'll see if any other of the network experts have anything to add on this topic!
    @TheCustomCave @Romulus @Pooh @Hronos @Idroy @GadgetVirtuoso - anything to add on this topic? 

    Community Manager at Fing

  • PoohPooh Member, Beta Tester Posts: 675 ✭✭✭✭✭
    Sorry @VioletChepil - this is something this Bear's got no experience on... but I am following the thread with interest.
    People say nothing is impossible, but I do nothing every day.
  • TheCustomCaveTheCustomCave Member, Beta Tester Posts: 48 ✭✭✭
    Wireshark may be worth looking at. That could at least give you some idea of which switch is going crazy.
    I've had some similar issues with some of my switches with constant arp flooding. My usual response is to check the ports from within the management console of the switch itself, see what's being hammered the most. Usually loops would cause total blackouts on the network rather than sporadic ping responses so I'm not entirely convinced it's down to the loop.
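
The "see what's being hammered the most" check can be sketched as a comparison of two snapshots of per-port broadcast counters. This is only an illustration: the counter values below are invented, and how the counters are read from the switch (management console, SNMP or similar) is left out. Counter wrap-around is not handled.

```python
def busiest_ports(before, after, interval_s, top=3):
    """Rank ports by broadcast packets per second between two counter snapshots."""
    rates = {
        port: (after[port] - before[port]) / interval_s
        for port in before
        if port in after
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical broadcast-packet counters, read 60 seconds apart.
before = {"1": 1000, "2": 5000, "9": 200}
after = {"1": 1600, "2": 5300, "9": 60200}

ranking = busiest_ports(before, after, interval_s=60)
for port, rate in ranking:
    print(f"port {port}: {rate:.1f} broadcasts/s")
```

A port whose broadcast rate is far above the rest is a reasonable place to start following the cable.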

  • RomulusRomulus Member, Beta Tester Posts: 34 ✭✭✭
    I have no experience in this area either. But you might want to try and audit the devices on your network to make sure you have nothing added by users. Examples: Laptop connected wired and wirelessly to the network that is bridging. A rogue WAP that someone added, someone plugging cables where they should not.

    You should also consider a bad cable, a stupid bad cable in a core part of our network at work caused us months of poor performance.
    I think using wireshark should help but it's probably not going to be an easy thing to track down with that.
  • kltaylorkltaylor Member, Beta Tester Posts: 570 ✭✭✭✭✭
    TheCustomCave said:
    Wireshark may be worth looking at. That could at least give you some idea of which switch is going crazy.
    I've had some similar issues with some of my switches with constant arp flooding. My usual response is to check the ports from within the management console of the switch itself, see what's being hammered the most. Usually loops would cause total blackouts on the network rather than sporadic ping responses so I'm not entirely convinced it's down to the loop.

    Wireshark can provide a lot of useful information about the traffic being received through TCP ports.
    "There's a fine line between audacity and idiocy."
    -Warden Anastasia Luccio, Captain
  • kltaylorkltaylor Member, Beta Tester Posts: 570 ✭✭✭✭✭
    Romulus said:
    I have no experience in this area either. But you might want to try and audit the devices on your network to make sure you have nothing added by users. Examples: Laptop connected wired and wirelessly to the network that is bridging. A rogue WAP that someone added, someone plugging cables where they should not.

    You should also consider a bad cable, a stupid bad cable in a core part of our network at work caused us months of poor performance.
    I think using wireshark should help but it's probably not going to be an easy thing to track down with that.
    I completely agree with the suggestion for an audit.  As I've stated before here, mute/block it and see who yells about it. =)
    "There's a fine line between audacity and idiocy."
    -Warden Anastasia Luccio, Captain
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    Thanks for your suggestions. Really interesting approaches to check. I'll try my best.

  • HronosHronos Member, Beta Tester Posts: 283 ✭✭✭✭
    Pooh said:
    Sorry @VioletChepil - this is something this Bear's got no experience on... but I am following the thread with interest.
    Same here! hehehe
    I hope to learn something more!
    Keep looking up!
  • inhinh Member, Beta Tester Posts: 2

    Spanning tree is kind of dated at this point.


    I would use port channels.

  • kltaylorkltaylor Member, Beta Tester Posts: 570 ✭✭✭✭✭
    inh said:

    Spanning tree is kind of dated at this point.


    I would use port channels.

    I was thinking the same thing but was waiting for validation through another poster.
    "There's a fine line between audacity and idiocy."
    -Warden Anastasia Luccio, Captain
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    Some news about this topic. We had a specialist in house. Log file analysis across nearly all switches found MAC address flapping: the MAC addresses themselves don't change, but one MAC is seen on different ports. That should indicate an Ethernet loop. But the switches should detect Ethernet loops, and the spanning tree config is correct.
    I just noticed a device that is connected to an unmanaged switch behind a managed switch, so I could see its MAC address on the managed switch in front of the unmanaged one. I also saw its IP address (200.200.200.xx). This doesn't fit our LAN's IP segment, and it's not a private IP address either. As a first step, I've locked out this MAC address on this switch. The next step is to find the device and check it.
    But I wonder why Fingbox didn't see this device. The MAC address doesn't show up in Fingbox's list. The device is hard-wired to the switch. When I set up a notebook with a similar IP address and connected it to another switch (in my part of the office), I got an answer from that device (of course before the MAC address lockout was set). Why didn't Fingbox recognize this device? Because of the different IP address?
    Normally Fingbox recognizes devices before they get an IP address over DHCP, as soon as the first communication is running.
  • VioletChepilVioletChepil London, UKMember Posts: 2,474 admin
    Hi @mozarella - @Robin and I are having a bit of a hard time trying to understand what Fingbox should be detecting in this scenario? 
    So the device connecting to the second switch is not being detected?
    @kltaylor @Pooh @Marc wondering if you may have additional thoughts/insight on this as network set-up is more professional.

    Community Manager at Fing

  • MarcMarc Member, Beta Tester Posts: 483 ✭✭✭✭✭
    While I've not had much experience with Fingbox in a commercial setting, I, and I would venture others, have seen quirky behavior when the item being identified is remote to the Fingbox, either via range extenders, mesh networks or in your case, switches.  Could it be that because this is a few hops away, Fing could be having issues discovering and or identifying your hosts?
    Thats Daphnee, she's a good dog...
  • inhinh Member, Beta Tester Posts: 2
    My guess is you had the port set up either as an access port with a different VLAN, or as a trunk port with a tag and/or a native VLAN set.  Fing does not detect VLANs/subnets other than the one it is directly on. 
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    There's no VLAN or trunk port. The firm's LAN is only one subnet / VLAN 0.
    It's a flat system where only switches are connected. We also have some LWL / fibre connections into different buildings. But it's one big LAN / VLAN 0, and MAC-level communication (broadcast etc.) is possible.
    Normally, once a device is connected, Fingbox detects it. But we have a device connected to one switch. The LAN is 192.168.15x.xxx, but this device is set to 200.200.200.53, and I wonder why Fingbox can't detect it. The IP is outside our IP range, so it can't be reached from LAN addresses, but the MAC address is in the same Ethernet.
    I used a notebook connected to our Ethernet (normally it has IP 192.168.15x.xxx), set its IP to 200.200.200.54, and could then ping 200.200.200.53. This means the device is reachable and connected to the Ethernet. Only its IP doesn't fit the LAN's IP range.
    If Fingbox can detect devices just through their physical connection to a switch (via ARP), then Fingbox should also detect this device. Fingbox also detects devices when they get connected to the LAN, even before the DHCP server has handed out an IP address.
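
The notebook test above can be reproduced on paper with Python's standard `ipaddress` module. A host sends ARP requests directly only for targets inside its own subnet, which is why the notebook at 200.200.200.54 got an answer while hosts with LAN addresses did not. The sketch below rests on assumptions: the exact LAN range is not given in the thread, so 192.168.0.0/16 stands in for it, the notebook's LAN address 192.168.1.10 is invented, and the /24 mask on the 200.200.200.x side is a guess. A scanner that only probes its own subnet would, by the same logic, never ask for 200.200.200.53; whether that is how Fingbox behaves is not confirmed here.

```python
import ipaddress

# Assumption: stand-in for the real LAN range, which the thread only gives as 192.168.15x.xxx.
lan = ipaddress.ip_network("192.168.0.0/16")
stray = ipaddress.ip_address("200.200.200.53")

def on_link(interface, target):
    """True if target lies in the interface's own subnet (host ARPs for it directly)."""
    return ipaddress.ip_address(target) in ipaddress.ip_interface(interface).network

print(stray in lan)                                    # False: outside the LAN range
print(stray.is_private)                                # False: not an RFC 1918 address
print(on_link("200.200.200.54/24", "200.200.200.53"))  # True: notebook with the test IP
print(on_link("192.168.1.10/16", "200.200.200.53"))    # False: ordinary LAN host
```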

  • VioletChepilVioletChepil London, UKMember Posts: 2,474 admin
    Ok I'm checking if @Robin can offer any additional advice on this one. He'll be online tomorrow. 

    Community Manager at Fing

  • RobinRobin Administrator Posts: 143 admin
    Hi @mozarella
    As you mentioned the IP address subnets are different so I believe the device is sharing some other network gateway than the one on which Fingbox is activated. You can check this by going into the router settings and then assigning the same subnet IP address and then perform a scan again to see if the fingbox is able to detect the device or not.
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    Actually there's no other gateway. The gateway is a clustered WatchGuard firewall system which is connected to different internet connections. So the LAN is just one subnet, and Fingbox sits in it.
    If I change the router's IP, the complete LAN goes down. DNS and DHCP in particular are not provided by the WatchGuard cluster; they're served by Windows AD servers.
    I can set the odd IP range on a notebook and then use netscanner to check whether there are other IPs of this range within the LAN.
  • VioletChepilVioletChepil London, UKMember Posts: 2,474 admin
    Ok @mozarella - let us know how you get on.

    Community Manager at Fing

  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    Yes, got something interesting. I set the odd IP address on a notebook and ran a network / IP scan. Indeed, I found two more devices in the same IP range. I've added lockout-mac entries for those MAC addresses on the closest managed switch. I did this because my office is not close to that part of the company, so I can't just walk over. I'll need to check the network wiring within the next few days.
    But this was not the main trouble in our network; the problems are still alive. I need to go on checking, doing log file analysis of the switches and so on...
  • mozarellamozarella Member, Beta Tester Posts: 79 ✭✭✭
    I think I've got the solution. I'm not sure the problem is really gone, but a few days have passed since we changed something and I hardly get ping timeouts anymore.
    I did a deep network analysis "on foot", not just log file analysis. One part of the firm had only unmanaged switches, so I put a managed switch in those places. That way I could pull the log files right after I got ping timeouts. Through these log files I could see and follow the ports that switched. In the end, I disabled different clients one at a time, each time the timeouts came back. Suddenly the timeouts stopped, and I knew which part to check. A workstation far, far away from the server room was connected to an old Wi-Fi router. This Wi-Fi router caused the problem. After replacing this router with a new model plus a 5-port switch, we haven't had any network trouble anymore.
    Actually i don't know, why this wifi-router caused the problem, because i can't really set strange options. Maybe the daisy-chain was too long?
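
The approach described in this post, pulling switch logs right after a ping timeout, amounts to matching timestamps. A minimal sketch of that matching step follows. The timestamps and log messages are invented for illustration and do not reflect any real switch log format, and the 30-second window is an arbitrary choice.

```python
from datetime import datetime, timedelta

def events_near_timeouts(timeouts, log_events, window_s=30):
    """Return the log events that fall within window_s seconds of any ping timeout."""
    window = timedelta(seconds=window_s)
    return [
        (when, text)
        for when, text in log_events
        if any(abs(when - t) <= window for t in timeouts)
    ]

# Hypothetical data: one timeout, two log entries (message texts are made up).
timeouts = [datetime(2020, 1, 1, 10, 0, 5)]
log_events = [
    (datetime(2020, 1, 1, 10, 0, 0), "port 17: MAC address moved"),
    (datetime(2020, 1, 1, 11, 30, 0), "port 3: link up"),
]

hits = events_near_timeouts(timeouts, log_events)
for when, text in hits:
    print(when.isoformat(), text)
```

Only entries close to a timeout remain, which narrows down the ports worth following; the switch clocks and the pinging host need to be reasonably in sync for this to work.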
  • HronosHronos Member, Beta Tester Posts: 283 ✭✭✭✭
    mozarella said:
    Actually i don't know, why this wifi-router caused the problem, because i can't really set strange options. Maybe the daisy-chain was too long?
    Maybe that, maybe an incompatibility with the old firmware on that Wi-Fi router. Sometimes that's a problem and pretty difficult to address.
    Keep looking up!