Network Event Status 


If the info has spelling and grammar errors, please remember that network engineers post here, live, without any executive review.


10/18/16 Much of the day. Odd routing. We found that Cogent, a large BGP peer, was mistakenly announcing routes that we had stopped sending to them. There was some sort of problem where they were stating they were the preferred route for some. We shut the session and things went back to the normal fast speed. So if you had slowness, that was much of it. We also found that Amazon, another peer, had issues, and still may, with some resources on their network. The lag we have seen is intermittent. It may still occur - unknown, as it's not in our network.



10/2/16 1:25 PM NOC Event. Over 1:38 PM. DNS servers used by many inside the FIC network went down due to a failed UPS when Palo Alto had a power spike on the ground. The UPS is now bad. We replaced the UPS and restarted the DNS servers. This is a perfect example (one of them) of why you want to run your own DNS in your network for your users to resolve domain names. Set up the FIC DNS entries first, then add a 3rd like Google's 8.8.8.8 - your network will not be down due to DNS. The 8.8.8.8 server is not as fast as the FIC name servers, but it is a good backup - better than nothing.
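
For anyone who wants to see the "FIC first, then a 3rd as backup" ordering spelled out, here is a rough sketch in Python. It is only an illustration, not our tooling: the two primary addresses are placeholders from the documentation range (the real FIC name server IPs are not listed in this post), and `query` is a hypothetical function you would supply to do the actual lookup against one server.

```python
from typing import Callable, Sequence, Tuple

# Placeholder addresses only (RFC 5737 documentation range); the real
# FIC name server IPs are NOT given in this post. Substitute your own.
PRIMARY_SERVERS = ["192.0.2.53", "192.0.2.54"]
# Third entry as a backup, per the advice above.
BACKUP_SERVER = "8.8.8.8"


def resolve_with_fallback(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each server in order and return (server, answer) from the first
    one that responds. `query(name, server)` is a hypothetical lookup
    function that raises OSError (timeouts included) when a server is down.
    """
    for server in servers:
        try:
            return server, query(name, server)
        except OSError:
            # This server did not answer; move on to the next one in the list.
            continue
    raise OSError(f"no DNS server answered for {name}")
```

Called with `PRIMARY_SERVERS + [BACKUP_SERVER]`, the backup is only reached when both primaries fail to answer, which is the same behavior you get from listing three name servers in order on a host or in DHCP.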



9/9/2016 5:43pm We have the logs and also managed to grab core memory dump files from the routers. These indicate a hack attempt. (We were able to note the IP addresses and blackhole them.) The router just happened to be in a very busy mode, as a carrier with 700,000 prefixes had dropped and was coming back online. This occurs in the background and customers do not notice, as packets route properly via other network paths. So normally this is not even an issue, except the inbound hack occurred at the same time and the CPU load from BGP was already running with zero idle time. The router code decided to reboot, as that created a system-health-check failure. A reboot on a router with so many BGP sessions can cause other network issues, as it takes time to establish all its sessions and build tables for optimal routing once again. The network appears to be stable. We are watching as always.

9/9/2016 3:59pm More details... a routing protocol would not sync properly with neighboring routers. Until the sync was complete, the routes were not exchanged with other routers. Routes are settling fine. Tables now have 1/2 a million - 1 million more to go.

9/9/2016 3:45pm Northern California: a core router rebooted - we did not do it - investigating and will post more soon.



8/25/2016 This morning a router was rebooted at about 6:40 AM. After the reboot no one noticed that a trunk port was disabled. This caused some sites to be unavailable to Northern California sites. We think the router had a poor save command and somehow that card got dropped. All should be well.







             Fiber Internet Center LLC          SF 650-330-0428         LA & OC 714-619-0146      SD 619-398-0268         © 2012