Strange reset during the day

Started by Alain Boulet, December 26, 2012, 09:15:51 PM

TomW

For what it is worth:

I have not noticed any resets since I quit watching the Classics with the Local App for long periods of time.

I monitor it with a local log and via an online graph page using a little binary RossW cooked up that uses the ethernet and reads Modbus registers. Thanks Ross!
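
For readers unfamiliar with this kind of monitoring, here is a minimal Python sketch of how a Modbus TCP "read holding registers" request is framed and how the reply is decoded. It is not RossW's tool, and the register address, register count, and unit ID in the example are placeholders rather than values from the Classic's register map.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def build_read_request(transaction_id, unit_id, start_address, count):
    """Return the bytes of a Modbus TCP 'read holding registers' request."""
    # PDU: function code, starting register address, number of registers
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, count)
    # MBAP header: transaction id, protocol id (always 0),
    # number of bytes that follow (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_response(frame):
    """Return the 16-bit register values carried in a response frame."""
    _tid, _proto, _length, _unit, function, byte_count = struct.unpack(
        ">HHHBBB", frame[:9]
    )
    if function & 0x80:
        # In an exception reply the byte after the function code is the error code
        raise ValueError("Modbus exception code %d" % byte_count)
    data = frame[9:9 + byte_count]
    return list(struct.unpack(">%dH" % (byte_count // 2), data))


# Placeholder request: 2 registers starting at address 100 from unit 1
request = build_read_request(transaction_id=1, unit_id=1, start_address=100, count=2)
print(request.hex())
```

To use it against a real device, the request bytes would be sent over a TCP socket (502 is the registered Modbus TCP port) and the reply passed to `parse_read_response`.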

Just FYI and not certain it helps sort out what is going on with resets.

Tom
Do NOT mistake me for any kind of "expert".

( ͡° ͜ʖ ͡°)


24 Trina 310 watt modules, SMA SunnyBoy 7.7 KW Grid Tie inverter.

I thought that they were angels, but much to my surprise, We climbed aboard their starship and headed for the skies

aroxburgh

#91
boB:

I have been copying files back from a local server all day today, and no resets with local traffic approaching 100 Mbit/s (see attached screen dump).

The difference is that I have not been causing congestion on the DSL connection (described a couple of postings ago).

It would seem that a sure way to cause the resets is to saturate the Internet connection, even though the local network traffic is many times higher than the 1.3 Mb/s DSL limitation.

As before, I am no longer using the Local App (and have not for several weeks now), so my most recent resets seem to somehow involve the My MidNite traffic traversing the Internet from my Classic to MidNite Solar.

Note that I did not get any of the random resetting behavior when my Classic 150 was new out of the box last November (though not quite factory-fresh).  ;)  Due to its manufacture date, its firmware had to be updated before it could even talk to other devices over Ethernet.  It was only after Ethernet was added that the resetting problem started happening, and it got worse when the Local App was running. Only belatedly did I notice that resets happen most often when my 1.3 Mb/s DSL connection is congested for seconds or minutes at a time. I have not yet proven that all of the resets correlate with Internet congestion.

Therefore, it would seem to be very clear that the cause of the random resets is the Classic's Ethernet hardware and/or software drivers, and/or something about the way that the application layer is talking to the Ethernet driver. It would be great if others can glance over at their Network Meter widget (as well as the Classic display) whenever they hear the Classic resetting...and report their findings here.
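
For anyone who does log their findings, one way to check the congestion theory is to compare reset timestamps against the periods when the Internet link was saturated. The Python below is only a sketch: the function name, the 30-second slack, and the sample timestamps are all made up for illustration and are not measurements from anyone's system.

```python
from datetime import datetime, timedelta


def split_resets_by_congestion(reset_times, congestion_windows,
                               slack=timedelta(seconds=30)):
    """Separate resets that fall inside (or within `slack` of) a congestion
    window from those that do not."""
    inside, outside = [], []
    for t in reset_times:
        hit = any(start - slack <= t <= end + slack
                  for start, end in congestion_windows)
        (inside if hit else outside).append(t)
    return inside, outside


# Made-up sample data: one congestion window and two resets
windows = [(datetime(2013, 6, 3, 10, 0, 0), datetime(2013, 6, 3, 10, 5, 0))]
resets = [datetime(2013, 6, 3, 10, 2, 10), datetime(2013, 6, 3, 14, 30, 0)]

inside, outside = split_resets_by_congestion(resets, windows)
print(len(inside), "of", len(resets), "resets fell in a congestion window")
```

A reset that lands in the "outside" list is the interesting case, since it would suggest a cause other than link congestion.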

Good luck with setting up an Ethernet hub to monitor all of the packet traffic in and out of your Classic.

Al
aj4rf
Surveyor SV-235 travel trailer with 1.2 kW PV (6 x Grape Solar GS-3-195, Unirac Solarmount); MidNite Classic 150, MNBCM; 410 Ah @ 12 V (two Trojan L16RE-B); Magnum MS2812 2800 W pure sine inverter, ME-ARC50, BMK; Magnite E-Panel; power transfer cam switch; Dometic 459530 High Efficiency Aircon

TomW

Coincidence or not, my Classics have not been sending data to mymidnite during the time of not seeing resets, either?

Just what I am (not) seeing here.

Tom

RossW

Quote from: TomW on June 03, 2013, 08:39:46 PM
Coincidence or not, my Classics have not been sending data to mymidnite during the time of not seeing resets, either?

As reported to Bob in IRC yesterday - mine reset shortly before midday local time.
Mine is not set to send data to mymidnite at all. This is (I think) the 2nd time I've caught mine reset in as many weeks.
3600W on 6 tracking arrays.
7200W on 2 fixed array.
Midnite Classic 150
Outback Flexmax FM80
16 x LiFePO4 600AH cells
16 x LiFePO4 300AH cells
Selectronics SP-PRO 481 5kW inverter
Fronius 6kW AC coupled inverter
Home-brew 4-cyl propane powered 14kVa genset
2kW wind turbine

aroxburgh

Quote from: rossw on June 03, 2013, 09:58:50 PM
As reported to Bob in IRC yesterday - mine reset shortly before midday local time.
Mine is not set to send data to mymidnite at all. This is (I think) the 2nd time I've caught mine reset in as many weeks.

Ross:
My apologies for asking such an obvious question, but is your clock off by 12 hours? (Not so likely, since it is a 24-hour setting.)
Also, did the first reset that you mention from the past two weeks happen at a similar time, i.e. just before the end of 12 hours or 24 hours?

Moreover, was solar generation rate near maximum during these resets? (Others have suggested this, although none of my resets have correlated with solar output.)

Finally, is your Classic actually connected to your LAN?
If so, perhaps your device is actually active on your LAN, in which case problems in the Ethernet interface/driver/Classic firmware could interact with other network traffic on your LAN.

What are your Classic's network settings?

Thanks for your input!  ;D
Al

aroxburgh

Ross:

IRC? I'm not familiar with that in a MidNite Solar context. Please fill me in.

Thanks!
Al

RossW

Quote from: aroxburgh on June 04, 2013, 01:59:25 PM
Quote from: rossw on June 03, 2013, 09:58:50 PM
As reported to Bob in IRC yesterday - mine reset shortly before midday local time.
Mine is not set to send data to mymidnite at all. This is (I think) the 2nd time I've caught mine reset in as many weeks.

Ross:
My apologies for asking such an obvious question, but is your clock off by 12 hours? (Not so likely, since it is a 24-hour setting.)

Nope, clock is close.
   ./classicmodbus -t `cat classic.addr`                     
   ID Solar2
   ClassicTime 06:34:47  05/06/2013
   SystemTime 06:35:24 05/06/2013
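
The same comparison can be scripted. This Python sketch parses the two clock lines shown above and prints the difference; it assumes the date is printed as day/month/year, which the output itself does not confirm.

```python
from datetime import datetime


def parse_clock_line(line):
    """Split a line such as 'ClassicTime 06:34:47  05/06/2013' into
    (label, datetime). Assumes a day/month/year date."""
    label, time_part, date_part = line.split()
    stamp = datetime.strptime(time_part + " " + date_part, "%H:%M:%S %d/%m/%Y")
    return label, stamp


output = """ID Solar2
ClassicTime 06:34:47  05/06/2013
SystemTime 06:35:24 05/06/2013"""

# Keep only the two clock lines and index them by their label
clocks = dict(parse_clock_line(line) for line in output.splitlines() if "Time" in line)
skew = clocks["SystemTime"] - clocks["ClassicTime"]
print(skew.total_seconds())  # 37.0
```

So the Classic's clock trails the host by 37 seconds here, nowhere near a 12-hour offset.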


Quote
Also, did the first reset that you mention from the past two weeks happen at a similar time, i.e. just before the end of 12 hours or 24 hours?

"near" yes - about 90 minutes from a 12-hour boundary I think.

Quote
Moreover, was solar generation rate near maximum during these resets? (Others have suggested this, although none of my resets have correlated with solar output.)

One was near a rapid transition from low to near-full output as the last of the fog broke up.

Quote
Finally, is your Classic actually connected to your LAN?
If so, perhaps your device is actually active on your LAN, in which case problems in the Ethernet interface/driver/Classic firmware could interact with other network traffic on your LAN.

What are your Classic's network settings?

Thanks for your input!  ;D

Well, yes, it is connected to the LAN, as it has always been - and I haven't said otherwise. What I did say is that it is not configured to send data to mymidnite.

The classic, and three small PLCs are in my "battery room", where I achieve isolation by using a WRT54GL wireless router (reflashed to run DD-WRT) in bridged mode. None of the other devices connected to its ethernet are experiencing any problems, nor have they in the couple of years they have been running.

RossW

Quote from: aroxburgh on June 04, 2013, 02:02:18 PM
Ross:

IRC? I'm not familiar with that in a MidNite Solar context. Please fill me in.

Internet Relay Chat.  Bob and Ryan are frequently in attendance.
(As are several other midnite forum members!)

You can rock up and say g'day -  irc.rossw.net  in channel #otherpower

aroxburgh

#98
Quote from: rossw on June 04, 2013, 04:41:20 PM
Well, yes, it is connected to the LAN, as it has always been - and I haven't said otherwise. What I did say is that it is not configured to send data to mymidnite.

Ross, thank you for your reply. However, since your Classic is connected to your LAN, it will be interacting with it, sending and receiving packets...DHCP, ARP, broadcast UDP packets, etc. Therefore, the question of what are the network settings of a Classic that does the random reset thing is quite relevant. When you get time to look, please post your Classic network settings; I would find it useful to compare my settings (which we included in an earlier post) with yours.

The relevance of discussing your Classic's Ethernet interface is that much of the evidence suggests that things happening on the LAN connected to the Classic cause a condition, either in the Ethernet hardware, the Ethernet driver, or the Classic firmware code, that leads to a watchdog timeout. The resets happen irrespective of whether the Local App is in use (in my case it is no longer even installed). I don't know yet if turning off communication to Ryan's mymidnite data collection effort makes any difference to the frequency of resets. Actually, perhaps Ryan could answer that one already, since he will probably be aware if there are any Classic users experiencing the resets who are not sending data to mymidnite.

Thanks!
Al

RossW

Quote from: aroxburgh on June 04, 2013, 07:39:03 PM
Thank you for your reply. However, since your Classic is connected to your LAN, it will be interacting with it, sending and receiving packets...DHCP, ARP, broadcast UDP packets, etc. Therefore, the question of what are the network settings of a Classic that does the random reset thing is quite relevant. When you get time to look, please post your Classic network settings; I would find it useful to compare my settings (which we included in an earlier post) with yours.

OK. Pretty sure I have STP enabled, and as the wireless link is bridged to the internal 5-port *SWITCH*, there should be minimal "other" traffic hitting it. Sure, broadcasts will, and ARP requests (but not replies, except those to the Classic itself) will, but there will not be much other unsolicited traffic hitting it.

We already know it's "chattering" DHCP far more than it should. There's something funny going on there, and I suppose it's possible that will have some effect. I have the classic set to obtain an address via DHCP, but my DHCP server always gives it the same IP. It's on a live IP, I don't run NAT, so I'd prefer not to print the addresses in a public forum ;)
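
To put a number on "far more than it should", the observed DHCP request count can be compared with what a well-behaved client would send. The Python sketch below assumes the client renews at the default T1 of half the lease time (per RFC 2131); the lease length and request count in the example are hypothetical, not values from this network.

```python
def expected_renewals(lease_seconds, observed_seconds, t1_fraction=0.5):
    """Rough number of renewals a client should send in `observed_seconds`
    if it renews once every T1 = t1_fraction * lease."""
    return observed_seconds / (lease_seconds * t1_fraction)


def chatter_ratio(observed_requests, lease_seconds, observed_seconds):
    """How many times more requests were seen than expected."""
    return observed_requests / expected_renewals(lease_seconds, observed_seconds)


# Hypothetical example: 24-hour lease, 240 requests seen in 24 hours
print(expected_renewals(86400, 86400))   # 2.0
print(chatter_ratio(240, 86400, 86400))  # 120.0
```

A ratio close to 1 would mean the client is behaving normally; a large ratio supports the "chattering" observation.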

Primary and secondary DNS are being assigned by the DHCPD, and point to a caching, recursive nameserver on the local network. Gateway is my local filtering box which in turn forwards packets in and out via the actual router.

Which other network details (specifically) do you want?

aroxburgh

Ross: That amount of detail is perfect for now. After I get my hands on an old "free-pile" Ethernet hub, I'll be back on track with Wireshark. However, I'm also hoping that boB will get himself a hub and do some more exhaustive packet sniffing as well.

The mystery in my setup is why I can trigger the reset by causing congestion on my (relatively slow, compared to my LAN) DSL modem.  As described earlier, my current favorite source of congestion (not by choice, due to a computer recovery) is Microsoft's software update service, which can put out very steady packet streams at a high rate.

The commonality between your system (which does not talk to mymidnite) and mine (which does) is the Classic's Ethernet hardware, Ethernet driver, and the Classic's firmware. Somewhere in there there has to be a serious "boog".  ;)

Ideally, I'd like to see the Ethernet hardware put through a standards compliance process, and run an exhaustive battery of timing, noise, and congestion stress tests using a data pattern generator and eye test scope.

Al

RossW

Quote from: aroxburgh on June 04, 2013, 08:40:01 PM
Ross: That amount of detail is perfect for now. After I get my hands on an old "free-pile" Ethernet hub, I'll be back on track with Wireshark. However, I'm also hoping that boB will get himself a hub and do some more exhaustive packet sniffing as well.

Sniffing on my gateway box for the IP and/or MAC address of the MidNite means I can pretty much see what's going on without needing a hub. (Just for reference: as hubs become harder to find, many managed switches have TAP or port-mirroring functionality which allows you to achieve the same thing.)
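
As a rough illustration of capturing by MAC address on a gateway, the Python sketch below assembles a tcpdump command line. The interface name, MAC address, and output file are hypothetical; this is not the command Ross actually runs.

```python
import shlex


def build_capture_command(interface, mac_address, pcap_path):
    """Return a tcpdump argv list that records frames to or from one MAC."""
    return [
        "tcpdump",
        "-i", interface,               # capture interface on the gateway
        "-n",                          # do not resolve addresses to names
        "-w", pcap_path,               # write raw packets to a file
        "ether", "host", mac_address,  # frames sent by or addressed to this MAC
    ]


# Hypothetical values: substitute the real interface and the Classic's MAC
cmd = build_capture_command("em0", "00:11:22:33:44:55", "classic.pcap")
print(shlex.join(cmd))
```

Note that this filter matches broadcasts sent by the device but not broadcasts sent to it by other hosts, so a wider filter is needed to see everything arriving at its port. The resulting file can be opened in Wireshark.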

Quote
The mystery in my setup is why I can trigger the reset by causing congestion on my (relatively slow, compared to my LAN) DSL modem.

With zero insight into the software, my musing is unfounded. However, it's possible that there's *something* in the MidNite that periodically goes off to do "stuff", e.g. NTP sync or DNS lookups for NTP.
If your outside link is congested, a blocking request could cause the WD (watchdog) to time out.
My config (FreeBSD box with ipfw and dummynet) allows me to introduce delays, artificial bandwidth restrictions, "packet loss" etc. to test such theories as may evolve.
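
The blocking-request theory can be sketched as a toy simulation in Python. This models a generic single-threaded loop, not the Classic's actual firmware (which nobody here has seen), and every number in it is hypothetical.

```python
def first_watchdog_trip(iteration_durations, watchdog_timeout):
    """Return the index of the first loop iteration that starves the
    watchdog, or None if it is never starved.

    The loop is assumed to kick the watchdog once at the end of each
    iteration, so the watchdog fires when any single iteration runs
    longer than its timeout.
    """
    for index, duration in enumerate(iteration_durations):
        if duration > watchdog_timeout:
            return index
    return None


# Hypothetical durations in seconds: three quick passes, then one pass that
# blocks on a DNS lookup while the uplink is congested.
durations = [0.01, 0.01, 0.01, 5.0, 0.01]
print(first_watchdog_trip(durations, watchdog_timeout=2.0))   # 3
print(first_watchdog_trip(durations, watchdog_timeout=10.0))  # None
```

If something like this is happening, a network call with its own timeout shorter than the watchdog period would not trip it, which is the kind of thing the dummynet delays could help confirm or rule out.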

dgd

Quote from: aroxburgh on June 04, 2013, 07:39:03 PM
Quote from: rossw on June 04, 2013, 04:41:20 PM
Well, yes, it is connected to the LAN, as it has always been - and I haven't said otherwise. What I did say is that it is not configured to send data to mymidnite.

.. much of the evidence suggests that things happening on the LAN connected to the Classic cause a condition, either in the Ethernet hardware, the Ethernet driver, or the Classic firmware code

Al,

I have a Classic that is not network connected but is serial-port connected to a rPi that collects data every few minutes via Modbus. The firmware is the latest. This Classic has reset twice in 10 days, and once I was watching as the output wattage rose quickly from about 120 watts to 1400 watts as a raincloud moved away and the cloud-edge brightness rapidly increased.
So I have no doubt there may be network issues causing watchdog resets, but there may be other reasons too.
I am waiting on boB's minor firmware release that sets Modbus registers identifying what caused the reset. I suspect that when he starts to get this feedback the debugging process will accelerate..  :P
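
For comparison with the Ethernet case, here is a minimal Python sketch of how a serial Modbus read request is framed, assuming the serial link uses standard Modbus RTU framing (the post does not say which framing is in use). The slave ID, register address, and count are placeholders, not values from the Classic's register map.

```python
import struct


def modbus_crc16(data):
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_rtu_read_request(slave_id, start_address, count):
    """Return an RTU 'read holding registers' (function 0x03) frame."""
    body = struct.pack(">BBHH", slave_id, 0x03, start_address, count)
    # The CRC is appended low byte first
    return body + struct.pack("<H", modbus_crc16(body))


# Placeholder request: 10 registers starting at address 0 from slave 1
frame = build_rtu_read_request(slave_id=1, start_address=0, count=10)
print(frame.hex())
```

The point of the comparison is that none of this touches the Ethernet hardware or driver, so a reset on a serial-only unit cannot be blamed on LAN traffic.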

dgd
Classic 250, 150,  20 140w, 6 250w PVs, 2Kw turbine, MN ac Clipper, Epanel/MNdc, Trace SW3024E (1997), Century 1050Ah 24V FLA (1999). Arduino power monitoring and web server.  Off grid since 4/2000
West Auckland, New Zealand

aroxburgh

#103
Ross:
Nice monitoring setup. I have used managed switches, but never for packet sniffing, preferring to just grab a hub off the shelf to do that. Last time I checked, suitable managed switches were still far more expensive than the EN104TP hub product that is still available from Netgear for about $110. Many of the old hubs sitting around are faster than the Netgear, which is limited to 10 Mbps.

dgd:
Yes, there is enough evidence to suggest multiple causes for the reset. Still, it may turn out that there is a single underlying cause. I'll be watching developments with great interest once boB gets the USB/serial debug enhanced. With new and improved tools in place, hopefully all of this discussion will soon become a distant memory.... I have other projects on hold, such as diverting excess power from my Classic 150 to heat my hot water...speaking of which, I wonder if anyone on the forum has experience with installing an aftermarket 12 V heater in a Suburban 120 VAC/propane 6 gallon water heater. Would I be better to just replace it with one of the triple-power-source variety?

Al

dgd

Quote from: aroxburgh on June 05, 2013, 02:23:35 AM
.. I have other projects on hold, such as diverting excess power from my Classic 150 to heat my hot water...

I would suggest moving on with these other projects as they, the hot water one anyway, should not be affected by these random resets. I suppose the good news is that the Classic just keeps working despite these resets, so it's really only reporting that gets messed up, and in the big RE scheme that's only a sideshow.

Dgd