Losing Ethernet connection randomly

Hey All,

I'm here again, this time with a strange question...

We maintain the IT infrastructure in a school. We have multiple buildings on the school campus. We have a PRIVA Systems HVAC building automation system and remote controller (industrial PCs).

In one building, let's say the main building, we have the main automation system that manages the HVAC; it controls all the heating and airflow systems.

The maintenance engineers can log in to the system remotely via the TeamViewer app or Windows Remote Desktop from a PC in our maintenance office, which we can use to manage the system as well.

From the beginning we have had problems with the office system, and only that one PC. Sometimes the PC loses its connection to the internet, yet the PC is still running.

We can't ping the PC, and we can't connect by RDP or by the TeamViewer app either... nothing.

It has a fixed IP address, all settings are normal... no strange errors or whatever in the logs...

The PC went back to the manufacturer, and they checked it over a period of 2 months... In that time we hooked up another PC with the same settings as the original... and guess what! No problems at all!!

The manufacturer didn't find anything on their side either...

Now the big issue is that they (the firm that installed the industrial PC) say: it is your fault if the device isn't running properly, because it is an internal network issue...

We can't find anything... the strange thing is that the laptop works fine with the same settings...

The industrial PC (PRIVA SX100) runs normally for several weeks, then suddenly there is no connection. Sometimes it runs for 2 days, sometimes 4... before the "error" occurs.

What is your suggestion?

Okay, we can replace the %#*@ thing... but we have to find the problem so we can explain it to several interested parties (and pay the cheque or... :) ).

Can I check a specific log? ...

All the power options are set to never sleep. Also the network adapter power options are set to never sleep.

There is one thing that I noticed:

We saw that the secondary DNS was causing trouble, so we deleted it and filled in only the primary DNS, which in this case is the same as the default gateway.

The secondary DNS is provided by the ISP.

After we removed the secondary DNS, the PC ran well. When it failed again and we rebooted the PC, we saw that the secondary DNS was back... How is this possible?

Could this be the main problem?

Thanks in advance!


1 Answer

Most Helpful Answer

You could have a few possible issues here:

The first is that the IP addressing of your internal network may be creating an issue. Let's say you are using non-routable (private network) addressing internally, and your PRIVA automation system in the boiler room has a fixed IP address in the address block you are using. The network path between the boiler room and your office needs to be checked out. Ideally, you have a direct connection (no routers or switches) to the hub your office PC is sitting on. If you can't achieve that, then make sure you only have layer 2 switching (no routing) devices between the two systems. Then, on your management PC, create a hosts file entry with the IP address and the host name. That way the system's routing table sees this connection as a direct connection (same IP subnet on a flat network).
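As a rough sketch of the hosts-file idea, the Python below builds one hosts-file line and checks whether two addresses fall in the same subnet, which is the "flat network" condition described above. The host name `priva-hvac`, the addresses, and the /24 prefix length are placeholders for illustration, not values confirmed in this thread. On Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts.

```python
import ipaddress


def hosts_entry(ip: str, hostname: str) -> str:
    """Return one hosts-file line; raises ValueError if ip is not a valid address."""
    ipaddress.ip_address(ip)  # validation only, the result is not needed
    return f"{ip}\t{hostname}"


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if both addresses fall in the same network of the given prefix length."""
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


# Placeholder values for illustration only.
print(hosts_entry("192.168.1.50", "priva-hvac"))
print(same_subnet("192.168.1.50", "192.168.1.20"))  # True: same /24
print(same_subnet("192.168.1.50", "192.168.0.20"))  # False: traffic must be routed
```

If the second check comes back False for your two machines, a router sits in the path and the hosts entry alone will not make the connection "direct".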

Now let's move on... Your management PC in your office likely has a DHCP address assignment. That's fine, but you'll want to reserve the IP address within the DHCP server (if you want, you can also do this with the boiler system). That way the DHCP server won't alter the settings, which could have messed you up. Ref: What Is DHCP

Now the bully in the room! Windows itself has a name service, WINS. I like to say: you WINS, you lose! I hate it! It always messes you up, as the pecking order is WINS first, then it falls back to the IP name services (hosts file and DNS). What can happen is that the WINS database gets corrupted, and depending on the names you assign to your systems, it can confuse the lookup. So what I do is use a different name for the Windows system (for WINS) than the TCP/IP name I assign to the same system, for example 'wacme-school.com' and 'acme-school.com'. That way WINS is bypassed! Ref: Windows Internet Name Service (WINS) Overview

As to gaining access to the Internet from this system (outwards), your DHCP settings should be fine: set the nearest router's IP address as your gateway and DNS, then use the helper function within the router to forward the request to your in-house DNS (if you have one) and then to the ISP's DNS.

As to the reverse, gaining access to the PRIVA automation system from the Internet: I would recommend you do this through a VPN tunnel so you can control who is gaining access to the system (either one), and I would recommend you limit access, if you can, to the office system only (not the system in the boiler room). Then piggyback the connection over to the PRIVA automation system in the boiler room. That way you have a second door before someone can gain access to the system (you never know these days...), and you limit it to as few people as needed (don't forget to change everyone's passwords if someone leaves, and delete their system account and VPN access as well).

As to how to monitor things... You'll need to set up an RMON probe to poll the systems (not too often now!). Then review its logs for what happened. Ref: RMON
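This is not RMON, but the same "poll from an independent machine and keep a log" idea can be sketched in a few lines of Python. It is a minimal sketch under assumptions: the target address is a placeholder, and the check is a plain TCP connect to the RDP port (3389) rather than a real probe. Only state changes are recorded, so the log shows when the box dropped off and when it came back.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def poll(host: str, port: int, interval_s: float, rounds: int, check=tcp_reachable) -> list:
    """Check the target `rounds` times and return timestamped UP/DOWN changes."""
    changes = []
    last = None
    for i in range(rounds):
        up = check(host, port)
        if up != last:  # record transitions only, not every sample
            stamp = datetime.now().isoformat(timespec="seconds")
            changes.append((stamp, "UP" if up else "DOWN"))
            last = up
        if i < rounds - 1:
            time.sleep(interval_s)
    return changes
```

Run from a second machine on the same network, something like `poll("192.168.1.50", 3389, interval_s=60, rounds=1440)` would cover roughly a day at one check per minute, which is gentle enough not to disturb the target.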

Last piece here... As you have multiple buildings, one of the issues you can encounter is a ground loop. Review with your cabler the ANSI/TIA-607-B Generic Telecommunications Bonding and Grounding (Earthing) for Customer Premises standard. You may want to ask them if they have BICSI-certified people on their staff. You may need to have a power meter probe set up across the buildings to monitor each building's ground.


Comments:

Hey Dan,

First of all, thank you very much!

I'll give you further information on the issue...

The PC in the boiler room has a fixed IP on another subnet (via an outlet to the Ethernet switch and patch rack). From there it is directly connected to the ISP router (which is in the internal range of 192.168.1.1).

We tested the outlet connector and connected the PC directly to the ISP router (same problem)... and we hooked up a test laptop for 2 months or more, and that was stable!

Our internal DHCP comes from another router in another IP subnet, 192.168.0.1. It gives out all the other IP addresses to the computers etc. We have a fixed pool in this subnet as well, but anyway...

I will try to make the hosts file (I haven't done this in practice before, only saw it in theory)...

I will try to find more info on WINS. I know the netsh command, but I have never really worked with it.

We have no internal DNS; we only use the default gateway (in the school it's all standalone computers, no server edition... schools, you know, no budget...).

As far as the VPN is concerned:

We can't fully manage the HVAC PC. Even though it's in our building and we paid for it, we are only responsible for our network and not for the PC, which belongs to a contractor (the HVAC engineers) who financed a lot in the building. But I will talk to one of the engineers about setting up a VPN connection together.

I will keep looking to solve this problem. I will let you know if your suggestions have worked for me.

Is it possible for me to create SNMP traps? Or to use Wireshark and let it run on the HVAC PC? Then I can see (I guess) when it goes wrong? I've used Wireshark a couple of times (in a lab environment for a couple of courses, but it's been a long time).

Thanks again. Kevin


@kevind - One of the issues I was trying to get away from is having devices cross different internal subnets. This is where the router can become an issue, as the routing table will need an explicit route between the subnets (port A to port B, and bidirectionally), since the route caching can expire. Review your router's docs on how. You can also set up an IP tunnel. I try to stay away from tunnels, but they are useful! The risk is that if you alter your network you can mess it up if you forget about them.

Today, even in a school, we need to build smarter networks to guard against infiltration by rogues, which is why resources like HVAC systems need to be more tightly controlled, and why a VPN is a smart way to protect your assets!

As for SNMP traps, sure, you can do that, but it's less useful on the system itself. Remember, if it can't access the net it can't tell anyone. That is why I use RMON probes: an independent system polls the systems in question (don't poll too frequently, as you'll interfere with the system's primary purpose). The probe should trace the connection and tell you which side of the path was disconnected. If you can, monitor the switch for any network storms, as those can affect things at the media layers. A good way to think of this: you're in a building on the top floor, and all you can see is the floor below you, nothing lower. So if the 1st floor fails you'll go crashing down, but you won't know why! Was it the 1st floor, the 2nd floor, or a higher one that failed? That is your position on the 7th floor of the ISO/OSI model.
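To make the "which side of the path was disconnected" point concrete, here is a small sketch that walks a list of hops from nearest to farthest and reports the first one that stops answering. The hop addresses other than 192.168.1.1 are hypothetical, and the probe is passed in as a function, so the example uses simulated results in place of a real ping or TCP check.

```python
from typing import Callable, Optional, Sequence


def first_failing_hop(hops: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Walk the path from nearest to farthest; return the first hop that does not answer.

    Returns None when every hop answers.
    """
    for hop in hops:
        if not is_up(hop):
            return hop
    return None


# Hypothetical path as seen from the monitoring machine: switch, router, HVAC PC.
path = ["192.168.0.2", "192.168.1.1", "192.168.1.50"]

# Simulated probe results standing in for real checks (ping, TCP connect, ...).
simulated = {"192.168.0.2": True, "192.168.1.1": True, "192.168.1.50": False}

print(first_failing_hop(path, simulated.__getitem__))  # 192.168.1.50
```

If the router still answers while the HVAC PC does not, the fault is at or near the PC rather than somewhere upstream.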


Hi Dan,

So this is how I solved the problem (hopefully for good)...

I blocked WINS... and I saw there was an issue with IPv6 as well, but I couldn't solve it because, as mentioned earlier, we cannot access all the settings.

I checked the logs again (after reconnecting to the PC, which was hard because it was continuously failing to access the network).

I saw that there was a major issue with DNS resolution.

After trying multiple things, I filled in an open DNS server as the secondary DNS (Google...) and suddenly everything was working fine again.

Still a mystery...

IPv6 was on, but everything was going over IPv4 (fixed IP...). When I unchecked IPv6, it couldn't find its network connection; when I re-checked IPv6, it found its network again but had no connection to the internet (DNS problem...).
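For anyone hitting the same symptom, a quick way to tell a DNS failure from a real loss of connectivity is to test name resolution and a raw IP connection separately. The sketch below is only an illustration: the three-way classification is a simplification, and the host name and test address you pass in are your own choice.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """True if the system resolver returns at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def can_connect(ip: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to a literal IP address succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(resolves: bool, connects: bool) -> str:
    """Classify the two test results into a rough diagnosis."""
    if not connects:
        return "connectivity problem: no IP path to the test address"
    if not resolves:
        return "DNS problem: IP traffic works but names do not resolve"
    return "network and DNS both working"
```

For example, `diagnose(can_resolve("example.com"), can_connect("8.8.8.8", 53))` tests resolution through the configured DNS servers and, separately, a direct TCP connection to Google's public DNS address.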


@kevind - So your ISP's DNS is not reliable from your connection point! That may be related to the limits of your internet connection and other activity going on on your network or your ISP's network.

What can happen here is that the connection pipe is not able to sustain the data flows in both directions. This is a common issue with asymmetric connections, as the flow inward to you is limited but the flow outward is bigger.

So the priority of the traffic then becomes an issue: the DNS query is not prioritized as highly as the ACKs and other TCP flows.

What you may want to consider, if the problem shows up again, is creating your own caching DNS within your network. Or set up local hosts files on the systems that talk to each other within the local intranet (not the Internet), which was one of the directions I recommended at the start.
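A real caching DNS would be a dedicated service on a router or server, but the idea behind it can be sketched in-process: remember answers for a while, and if the upstream lookup fails, keep serving the last known answer. The class name, the 300-second lifetime, and the serve-stale behaviour below are assumptions made for illustration.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


def system_lookup(name: str) -> List[str]:
    """Resolve name to IPv4 addresses through the operating system resolver."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Cache lookups for ttl_s seconds; fall back to a stale answer on failure."""

    def __init__(self, lookup: Callable[[str], List[str]] = system_lookup,
                 ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, name: str) -> List[str]:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # fresh cached answer, no upstream query
        try:
            addrs = self._lookup(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            if hit is not None:
                return hit[1]  # upstream DNS is failing: serve the stale answer
            raise
        self._cache[name] = (now, addrs)
        return addrs
```

The point of the design is that a flaky upstream DNS only hurts the first lookup of a name; after that, the local copy keeps things working through short outages.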
