Strange reset during the day

Started by Alain Boulet, December 26, 2012, 09:15:51 PM


RossW

Quote from: aroxburgh on May 22, 2013, 06:20:23 PM

It is great to fix problems, but work-arounds can  be effective too, at least as a stop-gap measure.

Therefore, can you give us a list of the offending "certain routers", as well as a list of the proven good models, if any?
Also, does the data you've received from Classic users indicate that there is any dependency on DHCP vs static IP, or any other router or Classic settings?

This way we should be able to determine if the problem only occurs with certain routers/settings, or actually does occur with all routers.

I started working with Bob, Ryan and Andrew on this a week back - but after an initial flurry of activity they've been off on other work for a bit.

My own observations are that in a purely DHCP setup the Classic is constantly requesting and re-requesting an address. Where I was giving it an address from a pool, its address was bouncing around, in many cases every few seconds; sometimes it'd last a few minutes.

I still have the classic set for dhcp, but I'm now always assigning it the same IP based on its MAC address, so I can "use" it but still monitor the dhcp problem. It's FREQUENT... - here's the last 20-odd minutes worth:

Thu May 23 08:21:23
Thu May 23 08:21:23
Thu May 23 08:23:56
Thu May 23 08:23:57
Thu May 23 08:26:28
Thu May 23 08:26:29
Thu May 23 08:29:01
Thu May 23 08:29:02
Thu May 23 08:31:34
Thu May 23 08:31:34
Thu May 23 08:34:07
Thu May 23 08:34:07
Thu May 23 08:36:39
Thu May 23 08:36:39
Thu May 23 08:39:12
Thu May 23 08:39:12
Thu May 23 08:41:44
Thu May 23 08:41:46
Thu May 23 08:41:46
Thu May 23 08:44:18
Thu May 23 08:44:18
Thu May 23 08:46:50
Thu May 23 08:46:51
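
The spacing in the timestamps above is easier to judge as numbers. What follows is only a minimal Python sketch: it treats requests that land within 5 seconds of each other as one burst (that window is an assumption, as are the function names) and prints the gap between bursts.

```python
# Timestamps copied from the log above, one line per DHCP request seen.
LOG = """\
Thu May 23 08:21:23
Thu May 23 08:21:23
Thu May 23 08:23:56
Thu May 23 08:23:57
Thu May 23 08:26:28
Thu May 23 08:26:29
Thu May 23 08:29:01
Thu May 23 08:29:02
Thu May 23 08:31:34
Thu May 23 08:31:34
Thu May 23 08:34:07
Thu May 23 08:34:07
Thu May 23 08:36:39
Thu May 23 08:36:39
Thu May 23 08:39:12
Thu May 23 08:39:12
Thu May 23 08:41:44
Thu May 23 08:41:46
Thu May 23 08:41:46
Thu May 23 08:44:18
Thu May 23 08:44:18
Thu May 23 08:46:50
Thu May 23 08:46:51
""".splitlines()


def to_seconds(line):
    """Seconds since midnight, taken from the trailing HH:MM:SS field."""
    hh, mm, ss = (int(x) for x in line.split()[-1].split(":"))
    return hh * 3600 + mm * 60 + ss


def burst_starts(lines, window=5):
    """Start time of each burst; requests within `window` s count as one burst."""
    starts, last = [], None
    for t in map(to_seconds, lines):
        if last is None or t - last > window:
            starts.append(t)
        last = t
    return starts


def intervals(starts):
    """Gaps in seconds between consecutive burst starts."""
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    gaps = intervals(burst_starts(LOG))
    print(gaps, "mean %.1f s" % (sum(gaps) / len(gaps)))
```

On these timestamps the gaps come out between 152 and 154 seconds, i.e. a request burst roughly every two and a half minutes.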


Andrew thinks it may be the DHCPD sending odd values for option 58 and/or option 59 (the renewal and rebinding timers, T1 and T2). However, I'm using isc-dhcpd (arguably the "reference" dhcpd) which doesn't (in my version anyway) have any way to set or override these options for IPv4.
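
For reference, option 58 is the renewal time (T1) and option 59 the rebinding time (T2). When a server sends neither, RFC 2131 has the client default to T1 = 0.5 x lease and T2 = 0.875 x lease. The Python sketch below just works those defaults out and checks the ordering the RFC requires; the function names are made up for illustration and nothing here is taken from the Classic firmware or from isc-dhcpd.

```python
def default_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = int(lease_seconds * 0.5)     # renewal time, option 58
    t2 = int(lease_seconds * 0.875)   # rebinding time, option 59
    return t1, t2


def timers_look_sane(lease_seconds, t1, t2):
    """True when 0 < T1 < T2 < lease, the ordering RFC 2131 requires."""
    return 0 < t1 < t2 < lease_seconds


if __name__ == "__main__":
    lease = 86400  # a one-day lease, chosen only as an example
    t1, t2 = default_timers(lease)
    print(t1, t2, timers_look_sane(lease, t1, t2))
```

Comparing the T1/T2 values in a captured DHCPACK against these defaults would be one way to see whether the server is sending anything odd.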

Once Andrew gets back on-deck, I think it's our intention to capture some packets and delve into what's being asked, offered, accepted and rejected. My lease times are certainly not a few seconds :)

I'm not (noticing) any rebooting issues with my classic, even though it's constantly doing DHCP requests, but I suppose I should go check! What's the best way to determine it??


% ./classicmodbus `cat classic.addr` 4275 4120\>8
ID Solar2
4275 = 1 (0x1)
4120 = 4


This suggests to me that it's currently in BulkMppt mode, and last "resting" state was "Anti-Click. Not enough power available (Wake Up)" (likely, since sun only came up 90 mins ago, roughly)
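
The decoding used above can be sketched in Python as follows. Only the two values actually named in this thread are in the lookup tables (stage 4 = BulkMppt, reason 1 = Anti-Click); every other code is reported as unknown rather than guessed, and the function names are hypothetical.

```python
# Only values that appear in this thread; anything else is left unknown.
STAGE_NAMES = {4: "BulkMppt"}
REST_REASONS = {1: "Anti-Click. Not enough power available (Wake Up)"}


def charge_stage(reg4120):
    """High byte of register 4120, the same shift as the >8 in the command above."""
    return (reg4120 >> 8) & 0xFF


def describe(reg4120, reg4275):
    """Human-readable summary of the two registers."""
    stage = charge_stage(reg4120)
    stage_name = STAGE_NAMES.get(stage, "unknown stage %d" % stage)
    reason = REST_REASONS.get(reg4275, "unknown reason %d" % reg4275)
    return "stage: %s; last reason for resting: %s" % (stage_name, reason)


if __name__ == "__main__":
    # 0x0400 is an example raw value whose high byte is 4.
    print(describe(0x0400, 1))
```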
3600W on 6 tracking arrays.
7200W on 2 fixed array.
Midnite Classic 150
Outback Flexmax FM80
16 x LiFePO4 600AH cells
16 x LiFePO4 300AH cells
Selectronics SP-PRO 481 5kW inverter
Fronius 6kW AC coupled inverter
Home-brew 4-cyl propane powered 14kVa genset
2kW wind turbine

dgd

This strange reset occurs when Classics are configured with a static IP address.
Does this debugging, which is leaning towards DHCP culpability, mean the DHCP processing in the Classic is somehow contributing to the random reset issue, despite the use of static IP addressing?

Dgd
Classic 250, 150,  20 140w, 6 250w PVs, 2Kw turbine, MN ac Clipper, Epanel/MNdc, Trace SW3024E (1997), Century 1050Ah 24V FLA (1999). Arduino power monitoring and web server.  Off grid since 4/2000
West Auckland, New Zealand

TomW

Quote from: dgd on May 22, 2013, 09:54:41 PM
This strange reset occurs when Classics are configured with a static IP address.
Does this debugging, which is leaning towards DHCP culpability, mean the DHCP processing in the Classic is somehow contributing to the random reset issue, despite the use of static IP addressing?

Dgd

My Classics have both had static IPs from the start. The resets occur on my solar Classic, but I cannot tell with the wind Classic yet because it is not in service; it is just sitting there powered up, waiting for a cable upgrade from the turbine.

Just FYI.

Tom.
Do NOT mistake me for any kind of "expert".

( ͡° ͜ʖ ͡°)


24 Trina 310 watt modules, SMA SunnyBoy 7.7 KW Grid Tie inverter.

I thought that they were angels, but much to my surprise, We climbed aboard their starship and headed for the skies

RossW

Quote from: dgd on May 22, 2013, 09:54:41 PM
This strange reset occurs when Classics are configured with a static IP address.
Does this debugging, which is leaning towards DHCP culpability, mean the DHCP processing in the Classic is somehow contributing to the random reset issue, despite the use of static IP addressing?

OK, I missed that the random resets were only with static IPs.
Mine is still using DHCP, but always assigned the same IP. Same effect as static, but a different mechanism - and perhaps a helpful distinction.

Graphing my classic "State" and "Reason for resting" seems to be a quick way to spot when (or if) it happens?

boB

Quote from: rossw on May 23, 2013, 03:52:31 AM
Quote from: dgd on May 22, 2013, 09:54:41 PM
This strange reset occurs when Classics are configured with a static IP address.
Does this debugging, which is leaning towards DHCP culpability, mean the DHCP processing in the Classic is somehow contributing to the random reset issue, despite the use of static IP addressing?

OK, I missed that the random resets were only with static IPs.

Tom's problem is a bit different than what I have seen and heard.
My only DHCP problems have been with DHCP enabled, not static.

How can a DHCP server or router give a Classic a new IP address in static mode ?
I don't see how that is possible ?

Now, resetting in static mode is a different story and I have heard of that.
But TomW's issue doesn't quite fit the exact same scenario as I have seen with "certain"
routers and DHCP IP address giver-outers.

boB
K7IQ 🌛  He/She/Me

dgd

There appear to be two different issues here: DHCP processing and its problems, which may or may not be causing resets, AND a random reset that probably has nothing to do with networking.
Again I noticed the random reset on my static IP Classic connected to PVs. Again it occurred with fast-changing input amps: after a couple of swings from 2 amps to 18 amps at 55 to 75 volts, it reset (probably just coincidence). No Local App connection was active and it was not talking to My Midnite.
Is there any way to set a Modbus register value that indicates the calling routine that starts the WD timer running? This would be useful tech feedback for MN when WD resets occur.

Dgd

boB

Quote from: dgd on May 23, 2013, 07:42:20 PM
There appear to be two different issues here: DHCP processing and its problems, which may or may not be causing resets, AND a random reset that probably has nothing to do with networking.
Again I noticed the random reset on my static IP Classic connected to PVs. Again it occurred with fast-changing input amps: after a couple of swings from 2 amps to 18 amps at 55 to 75 volts, it reset (probably just coincidence). No Local App connection was active and it was not talking to My Midnite.
Is there any way to set a Modbus register value that indicates the calling routine that starts the WD timer running? This would be useful tech feedback for MN when WD resets occur.

Dgd

Very interesting observation !

Yes, there is a register but I will have to release an interim version that just has those registers as we talked
about a bit earlier in this thread.  I'll just have to do that this weekend.

boB



RossW

Just came back inside and noticed the first time I've seen mine do a reset - anything I can check from it that will help, Bob??

Graphs that MAY show what was going on; one in particular MAY be helpful:

This one needs some explaining:  the blue is the actual number of average watts AS MEASURED by a precision pyranometer located about 20 feet from the arrays.

The red line is the peak power measured in the sample time (5 minutes)

The green line is the calculated power each sample time, using the calculated position of the sun for this time and day of the year - so it's basically the "clear-sky power".

The black line is a calculated "possible power" available for a tracking array perpendicular to the sun's rays, at the current measured actual solar energy. (Pyranometers are cosine corrected to measure power on a horizontal plane.)

What this picture shows is that the fog finally broke up and solar energy rapidly spiked at about the time the classic appears to have rebooted.


This one is the actual amps measured from each of the arrays


Power from the classic, as reported by the classic.


Daily amphours/10, and daily kilowatt-hours as reported by the classic.


PV voltages as measured by the classic and my old standalone logging.


Battery voltage as reported by the classic.


PV in and Battery output current as reported by the classic.


Temperatures as reported by the classic


Classic current state (MSB of register 4120) and Reason for Resting (register 4275)

boB


I definitely see the classic reset its kW-hours and amp-hours and the other jumps at
around 13:00 in the afternoon.  Yes, looks like the sun finally came out around 13:00
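
The kW-hour and amp-hour drop is itself a usable reset detector if the values are being logged. A minimal Python sketch, assuming the daily kWh counter only ever rises during a day (so any drop means a restart or the normal midnight rollover); the sample readings are invented for illustration and are not Ross's data.

```python
def find_resets(samples):
    """Return the labels of samples where the daily counter went down.

    samples: list of (label, daily_kwh) in time order.
    """
    resets = []
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        if cur < prev:
            resets.append(label)
    return resets


if __name__ == "__main__":
    # Hypothetical 5-minute readings, made up to show the idea.
    readings = [("12:50", 2.1), ("12:55", 2.2), ("13:00", 0.0), ("13:05", 0.1)]
    print(find_resets(readings))
```

A drop just after midnight would need to be filtered out, since the daily totals are expected to clear then anyway.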

On the bottom graph, MSB of stage/stage is 4 which is StatDispBulkMppt.

Also looks like the Classic never goes to Resting except when it reset the
one time.

so, the clock is set and auto-restart is not enabled in tweaks ?

Definitely weird.

Lots of good information there, Ross !
boB

RossW

Quote from: boB on May 26, 2013, 05:08:58 AM

so, the clock is set and auto-restart is not enabled in tweaks ?

Definitely weird.


# ./classicmodbus -t `cat classic.addr`               
ID Solar2
ClassicTime 19:19:13 26/05/2013
SystemTime 19:19:45 26/05/2013


Clock is definitely "close enough", and I confirm auto-restart is definitely *off*
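
To put a number on "close enough", the two lines of output above can be compared directly. A small Python sketch; the format string is inferred from the output shown, and the helper name is hypothetical.

```python
from datetime import datetime

FMT = "%H:%M:%S %d/%m/%Y"  # matches "19:19:13 26/05/2013"


def clock_offset_seconds(classic_time, system_time):
    """System clock minus Classic clock, in seconds."""
    classic = datetime.strptime(classic_time, FMT)
    system = datetime.strptime(system_time, FMT)
    return (system - classic).total_seconds()


if __name__ == "__main__":
    print(clock_offset_seconds("19:19:13 26/05/2013", "19:19:45 26/05/2013"))
```

For the readings above the difference is 32 seconds, which includes however long the query itself took.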

aroxburgh

Hi Bob et al:

I have been noticing some very interesting clues to do with the Classic resets, and what I'm seeing explains a lot  of the reports I'm reading on the forum.

I have been restoring my PC due to an impending hard drive failure. As a result, I have been doing a lot of software downloads, many of which (e.g., Microsoft Office with many hundreds of MB) max-out my (admittedly rather puny) 1.3 Mb/s Internet bandwidth (see attached pictures, "First" and "Second", taken at two widely-spaced moments during continued network congestion when my Classic was "resetting like crazy"...see the total bytes count, all at 1.3 Mb/s...i.e., hours of congestion on my Internet connection).

In what follows I was not using the Local App, which I have not yet re-installed on my (restored) PC.

This morning while doing the downloads, I started noticing resets occurring at short intervals, often only a minute or two apart. By the time I realized what was happening, the downloads were mostly done, so starting my stopwatch I only got three successive lap times (approximate time between resets), as follows (mm:ss):

00:00    <---stopwatch started just after a reset
10:13    <--- 20 packets and then a reset
02:24    <---3 or 4 packets and then a reset
01:19    <---2 packets and then a reset

When the "saturation" of my Internet connection relaxed to more usual levels, the resets stopped happening.

Later on, maxing out the connection caused resets to resume.

So I got to thinking: I think I know what's going on...the Classic is dropping packets in its connection to Ryan's mymidnite data collection project...I'll bet they aren't using a handshake to protect the data!

So, I made Wireshark my next re-install, and lo and behold, on 192.168.1.111 (the fixed IP address of my Classic on my local network), I could see UDP packets.

Now, as is well known, TCP uses a handshake, acknowledgements and automatic retransmission to protect the data transfer (the TCP transport layer resends any segment that is lost or fails its checksum), which is why it works so well for http. UDP, however, does not. UDP can still be protected by an application layer error check, but typically that is not done, since UDP itself provides no retransmission capability. Therefore, TCP is the protocol of choice for highly reliable, mission critical, data transfer. UDP often sees use for audio and video streaming, because of lower data integrity requirements, since it is relatively easy to mask (not the same as correct) data errors.
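
As an illustration of what an application layer check on top of UDP can look like, here is a minimal Python sketch. The framing (a sequence number and length in front, a CRC32 behind) is invented for this example and is not anything the Classic or My Midnite is known to use; a receiver would drop any datagram for which `unpack_datagram` returns `None` and could use the sequence number to notice gaps.

```python
import struct
import zlib

HEADER = struct.Struct("!IH")   # sequence number, payload length
TRAILER = struct.Struct("!I")   # CRC32 over header + payload


def pack_datagram(seq, payload):
    """Wrap payload with a sequence number, its length and a CRC32."""
    head = HEADER.pack(seq, len(payload))
    crc = zlib.crc32(head + payload) & 0xFFFFFFFF
    return head + payload + TRAILER.pack(crc)


def unpack_datagram(datagram):
    """Return (seq, payload), or None if the datagram is truncated or damaged."""
    if len(datagram) < HEADER.size + TRAILER.size:
        return None
    seq, length = HEADER.unpack_from(datagram)
    body_end = HEADER.size + length
    if len(datagram) != body_end + TRAILER.size:
        return None
    (crc,) = TRAILER.unpack_from(datagram, body_end)
    if zlib.crc32(datagram[:body_end]) & 0xFFFFFFFF != crc:
        return None
    return seq, datagram[HEADER.size:body_end]


if __name__ == "__main__":
    good = pack_datagram(7, b"hello")
    bad = good[:8] + bytes([good[8] ^ 0xFF]) + good[9:]  # corrupt one payload byte
    print(unpack_datagram(good), unpack_datagram(bad))
```

This only detects loss and damage; recovering from it still needs an acknowledgement and resend scheme, which is the part TCP provides for free.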

I'll bet also that the Classic's Ethernet interface and your s/w driver are not faring well when the wire is near saturated; perhaps the Ethernet interface gives bogus status under conditions of heavy network congestion. If the Classic software continues to shove out UDP packets regardless of congestion, then there may be a lot of data loss in the logging software that Ryan is using, which will also have its consequences (although that is unlikely to have any influence on the Classic resetting behavior).

From what I've read on the forum, it sounds entirely possible that the Ethernet interface status you are reading in the Classic network driver could be different depending on local network equipment. As I've said previously, my Classic Ethernet data passes through Netgear Pro switch ---> Ethernet port on Netgear Universal Dual Band WiFi Range Extender (WN2500RP) ---> 108 Mb/s 802.11n connection to Linksys WRT610n router ---> Ethernet connection to a D-Link router ---> Ethernet connection to a ZyXEL 1.3 Mb/s DSL interface service. As you can see, there is a fair collection of diverse gear between my Classic and the Internet, with the bottleneck being the DSL connection.

Please seriously consider changing from UDP to TCP in the future, although you'll then have to create a mechanism for dealing with possible out-of-order packets. There would seem to be plenty of time for TCP retransmission caused by adverse network conditions, since after-all, your packets from the Classic are 30 seconds apart...

Here's one of my Classic's 60 byte packets sent to MidNite, of which Wireshark informs me, 6 bytes or 48 bits are actual payload:

0000  ff ff ff ff ff ff 60 1d  0f 00 1b 59 08 00 45 00   ......`. ...Y..E.
0010  00 22 02 31 00 00 40 11  b6 83 c0 a8 01 6f ff ff   .".1..@. .....o..
0020  ff ff 06 9a 12 12 00 0e  6c f5 c0 a8 01 6f f6 01   ........ l....o..
0030  00 00 00 00 00 00 00 00  00 00 00 00


Broken down, approximately as follows, by Wireshark:

ff ff ff ff ff ff     Destination Address = Broadcast
60 1d 0f 00 1b 59     Source Address = MidNite Solar Classic
08 00                 Packet type contained in Ethernet frame = IP
45                    IP version 4, header length 20 bytes
00                    Explicit congestion notification: Not-ECT (Not ECN-Capable Transport) <<<<<<<???
00 22                 Total length 34
02 31                 ID
00 00                 Flags and fragment offset = 0
40                    Time to live = 64
11                    Protocol = UDP
b6 83                 Header checksum (correct)
c0 a8 01 6f           Source: 192.168.1.111
ff ff ff ff           Destination: 255.255.255.255 (broadcast)
06 9a                 Source port 1690 (06 97 = 1687 also seen)
12 12                 Destination port 4626
00 0e                 Length 14
6c f5                 Bad checksum <<<<<<<<<<<<<<<<<<<<<<<<<<<<<???
c0 a8 01 6f           Source address: 192.168.1.111 (repeated inside the payload)
c0 a8 01 6f f6 01     Data payload (6 bytes)
00 00 00 00 00 00 00 00 00 00 00 00     Padding
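
The breakdown above can be cross-checked without Wireshark. The following is a minimal Python sketch, not anything from MidNite: it takes the 60 bytes exactly as pasted in the hex dump, pulls out the IP and UDP fields, and recomputes both checksums with the standard Internet checksum (one's-complement sum of 16-bit words, with the UDP pseudo-header included). It assumes the dump was transcribed faithfully; the function names are made up for illustration.

```python
import struct

FRAME = bytes.fromhex(
    "ff ff ff ff ff ff 60 1d 0f 00 1b 59 08 00 45 00 "
    "00 22 02 31 00 00 40 11 b6 83 c0 a8 01 6f ff ff "
    "ff ff 06 9a 12 12 00 0e 6c f5 c0 a8 01 6f f6 01 "
    "00 00 00 00 00 00 00 00 00 00 00 00"
)


def internet_checksum(data):
    """One's-complement of the one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decode(frame):
    """Decode an Ethernet/IPv4/UDP frame and recompute both checksums."""
    ip_start = 14                                   # after the Ethernet header
    ihl = (frame[ip_start] & 0x0F) * 4              # IP header length in bytes
    ip = frame[ip_start:ip_start + ihl]
    total_len = struct.unpack("!H", ip[2:4])[0]
    udp = frame[ip_start + ihl:ip_start + total_len]  # excludes Ethernet padding
    src_port, dst_port, udp_len, udp_sent = struct.unpack("!4H", udp[:8])
    ip_sent = struct.unpack("!H", ip[10:12])[0]
    # Recompute each checksum with its own checksum field zeroed.
    ip_calc = internet_checksum(ip[:10] + b"\x00\x00" + ip[12:])
    pseudo = ip[12:20] + struct.pack("!BBH", 0, ip[9], udp_len)
    udp_calc = internet_checksum(pseudo + udp[:6] + b"\x00\x00" + udp[8:])
    return {
        "src_ip": ".".join(str(b) for b in ip[12:16]),
        "dst_ip": ".".join(str(b) for b in ip[16:20]),
        "src_port": src_port,
        "dst_port": dst_port,
        "payload": udp[8:],
        "ip_checksum": (ip_sent, ip_calc),      # (as sent, recomputed)
        "udp_checksum": (udp_sent, udp_calc),   # (as sent, recomputed)
    }


if __name__ == "__main__":
    for key, value in decode(FRAME).items():
        print(key, value)
```

Run against the bytes as posted, the source port 06 9a decodes as 1690 and the destination port 12 12 as 4626. The recomputed checksums come out as 0xb683 for the IP header and 0x6cf5 for UDP, which are the same values that appear in the dump. So for this pasted frame, at least, the "bad checksum" flag does not seem to follow from the packet contents themselves; what else might have triggered it can't be determined from the dump alone.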


I'm not sure about the significance of the two items marked with <<<<<<<<<<<<<<, although the effect of that would be moot if the packet doesn't actually get sent out due to congestion, and gets overwritten in the Classic transmit buffer...a situation possibly exacerbated by faulty network hardware or drivers in the Classic. Clearly there should be some sort of congestion monitoring and control, irrespective of whether the protocol is UDP or TCP.

From what I've seen in my own Classic, and from reading others' posts, the random resets have nothing to do with the amount of sunlight, or with whether the IP address is static or assigned by DHCP. It seems that they have everything to do with network congestion caused by the 1.3 Mbps limitation of the DSL modem (even though I don't fully understand the local ~100 Mbps fast network flow control).

-Al Roxburgh
aj4rf

Surveyor SV-235 travel trailer with 1.2 kW PV (6 x Grape Solar GS-3-195, Unirac Solarmount); MidNite Classic 150, MNBCM; 410 Ah @ 12 V (two Trojan L16RE-B); Magnum MS2812 2800 W pure sine inverter, ME-ARC50, BMK; Magnite E-Panel; power transfer cam switch; Dometic 459530 High Efficiency Aircon

boB

Al,  ain't Wireshark great ??!!!!!!?

The UDP packets you are seeing are Classic advertising packets.
Those are what is used to let the local app (and router ?) know where the
Classic is so it can find it.

The advertise broadcast should happen around every 10 seconds on port 0x1212

"ff ff ff ff ff ff       Destination Address = Broadcast"
and
"12 12          Destination port 4626"

Not sure why the bad checksum though.

Everything else is TCP/IP I'm pretty sure.
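
For anyone who wants to watch the advertise packets without Wireshark, here is a minimal Python sketch of a listener on UDP port 0x1212. The payload parsing is a guess based on the 6 bytes Al posted (c0 a8 01 6f f6 01): the first four bytes are the Classic's IP address, and the last two read as a little-endian number give 502, which happens to be the standard Modbus TCP port. Whether that is what the firmware intends is an assumption, and the function names are hypothetical.

```python
import socket
import struct

ADVERTISE_PORT = 0x1212  # 4626, the port named above


def parse_advert(payload):
    """Split the 6-byte payload into (dotted IP, 16-bit little-endian value).

    Reading the last two bytes as a port number is an assumption.
    """
    if len(payload) < 6:
        raise ValueError("expected at least 6 bytes, got %d" % len(payload))
    ip = socket.inet_ntoa(payload[:4])
    (value,) = struct.unpack("<H", payload[4:6])
    return ip, value


def listen_once(timeout=30.0):
    """Wait for one broadcast on the advertise port; None if nothing arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", ADVERTISE_PORT))
    sock.settimeout(timeout)
    try:
        data, (sender, _port) = sock.recvfrom(64)
        return sender, parse_advert(data)
    except socket.timeout:
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    # Parse the payload from Al's capture; call listen_once() on a live network.
    print(parse_advert(bytes.fromhex("c0a8016ff601")))
```

Logging the arrival time of each advert would also show any gap in the roughly 10 second cadence, which might be another way to timestamp a reset.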

I know that UDP is used for streaming etc. where you don't care so much.

The Local App only reads the Classic once every 2 seconds these days
and when calling into My Midnite it is even less often.

I wonder though if there might be something to UDP and TCP/IP packets
lining up and "walking" on top of each other ?  Pretty far fetched but ?

I tend to agree that power and sunlight do not have much, if anything,
to do with the resets although I can't rule anything out yet.

Please keep up the good detective work though !  It can only help.

boB

aroxburgh

Bob:

Interesting! The fact that I'm seeing the broadcast UDP packets but not seeing the TCP packets proves that WS is not putting the Netgear switch shared by my PC and the Classic into "hub" mode...I no longer have the old 100 Mb/s Ethernet hub that I formerly used for such purposes. Next time I'm in the "Freecycle" store I'll see if I can pick one up, hopefully at a bargain price. Last time I was doing this kind of packet sniffing seriously, years ago, was in the Wi-Fi space, where the actual space (pun intended) necessarily acts like a hub, since there can only be one packet "in the air" so to speak, at a time (Linksys USB Wi-Fi dongle and Backtrack Linux...great package that comes with WS and lots of other network tools).

Back to my posted observations...TCP packets...hmmm...that explains why I was seeing so little packet-to-packet change in the 6-byte payload of the UDP packets. Clearly, with a hub I would have seen a lot more...

Anyhow, it does look like a case of extreme congestion in the DSL connection is having a weird effect on the TCP packets (that I cannot see in my current setup). Nevertheless, next time I notice a reset, I'll make sure I notice what is happening on the Network Meter gadget's traffic history graph...

Of course, irrespective of congestion on the network, the basic problem is the way that the Classic responds to it (and to any other relevant events). Check everything, including any hardware errata that the chip manufacturer may have published. Talk to them too, if possible, and to the chip designers in particular. Also have your Ethernet drivers reviewed by the same people....

Thank you for the further input on the resetting problem. From my side, as an end-user, I'll continue to investigate. (I'll also promise to keep my naive optimism better under control!  ;) )

Al
aj4rf   

boB

Al, I also noticed that when using wireshark but with the Local App, I could NOT see the outgoing packets TO the Classic but only the return packets FROM the Classic.  I'm not sure why that is ?  i.e.  most all of the destination packets were for my laptop. I even tried changing over to wired Ethernet.  No change.
WS does know the return packets are modbus and disassembles those packets correctly.
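
For anyone trying to match up what Wireshark shows, this is roughly what the missing outgoing request and the visible reply look like on the wire. A minimal Python sketch of a Modbus TCP "Read Holding Registers" (function 0x03) exchange; it only builds and parses bytes and sends nothing. Asking for register 4120 as address 4119 assumes the usual one-offset between register numbers and wire addresses, and unit ID 1 is likewise a guess.

```python
import struct


def read_holding_request(transaction_id, unit_id, address, count):
    """Build a Modbus TCP Read Holding Registers (0x03) request frame."""
    pdu = struct.pack("!BHH", 0x03, address, count)
    # MBAP header: transaction id, protocol id (0), length of what follows, unit id
    mbap = struct.pack("!HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_holding_response(frame):
    """Return the list of 16-bit register values from a 0x03 response frame."""
    _tid, _proto, _length, _unit, func, byte_count = struct.unpack("!HHHBBB", frame[:9])
    if func & 0x80:
        # In an exception response the byte after the function code is the code.
        raise ValueError("Modbus exception code %d" % frame[8])
    return list(struct.unpack("!%dH" % (byte_count // 2), frame[9:9 + byte_count]))


if __name__ == "__main__":
    request = read_holding_request(1, 1, 4119, 1)
    print(request.hex(" "))
    # Example reply carrying one register with the value 0x0400.
    reply = bytes.fromhex("00 01 00 00 00 05 01 03 02 04 00")
    print(parse_holding_response(reply))
```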

I bet that the same thing happens with My Midnite packets.  Let us know if you figure out how to make wireshark see the outgoing packets.
I have looked online for an answer but not intensely.

Thanks for the CSI work !
boB

aroxburgh

Quote from: boB on June 02, 2013, 02:38:41 PM
Let us know if you figure out how to make wireshark see the outgoing packets.
I have looked online for an answer but not intensely.

boB:

I looked on amazon.com and found that there is still at least one hub on the market!!!
Netgear EN104TP 4-Port 10 Mbps Ethernet Hub RJ-45 with Uplink Button
Price:   $108.00

If you hook up your  PC running wireshark, your Classic, and a wire to your Internet router/switch, you should be able to see all packets coming and going from the Classic. This approach keeps things simple, without any of that "promiscuous mode" hokey-pokey required if a managed switch is used instead of a hub. Should not matter that it does not support 100 Mbps.

This item is too spendy for me at present, so I'd encourage you to get one and try it. I'll continue to look for a hub from  "Freecycle" or similar.

It is definitely a hub, and indeed costs several times the price of a similar-looking Netgear 4-port switch (high price = low sales volume). People sometimes confuse "hub" with "switch", even more so nowadays since switches dominate, whether or not we need the additional performance a switch offers. Some of the amazon reviewers confirm it is an actual hub:

"I needed another Ethernet Hub for a software development project I'm working on. Hubs are difficult to find these days as low cost network switches are a dime a dozen, and provide isolation between network ports. But for network hardware development you do not want this isolation when you are trying to monitor network traffic between devices. That is where an Ethernet Hub comes in. Yes you can use "managed" network switches but those are more expensive than the garden variety network switch. This Hub was available at a fair price, and works fine. If I need another one I would purchase again."

and

"When you need a hub and not a switch, this is what you need. It is perfect for running Wireshark. At this price I should have bought an extra one. The one I received works great and looks brand new with no scratches or scuffs."

-Al
aj4rf