The blackbox project

Started by zoneblue, September 15, 2013, 08:48:04 PM


zoneblue

Update: I added the extra switch, lengthened the cron interval to 2 mins and took off the extra network load, and it hasn't lost a connection since. So one of, or all three of, those factors are involved.

Load average is now 0.13. I'll let it run for the day, then tomorrow undo each of the three changes, one at a time.
6x300W CSUN, ground mount, CL150Lite, 2V/400AhToyo AGM,  Outback VFX3024E, Steca Solarix PL1100
http://www.zoneblue.org/cms/page.php?view=off-grid-solar

RossW

Quote from: zoneblue on September 19, 2013, 03:29:55 PM
Ross, do you have a list of return values for newmodbus?

It always returns either a 0, or a 1.
It will return a 1 for anything I considered to be a "bus-related error".
Failed to connect, failed to resolve a host, data length error, modbus command response error, write error, etc.

It will return a 0 any other time - displaying a help message, normal results, required parameter missing etc.
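
As a rough sketch of how a calling script could act on that exit status: the `poll_once` wrapper and its result labels below are hypothetical, and any command can stand in where newmodbus would go. Because 0 also covers help output and missing parameters, the sketch additionally treats an empty result as "no data". Telling help text apart from real readings would need a check on the output format, which is not attempted here.

```shell
#!/bin/sh
# Hypothetical wrapper: run a polling command and classify the outcome.
# "$@" is the command to run, e.g. ./newmodbus -w <addr> <registers>.
poll_once() {
    out=$("$@" 2>/dev/null)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "bus-error"    # non-zero exit: treated as a bus-related error
    elif [ -z "$out" ]; then
        echo "no-data"      # exit 0 but nothing was printed
    else
        echo "ok"           # exit 0 and some output was produced
    fi
}
```

For example, `poll_once ./newmodbus -w "$(cat classic.addr)" 4115-4118/10` would print one of the three labels, which a cron script could log or count.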


Quote
Ross, your pattern of non-problems matches what I see when I replicate this whole setup on a bigger computer. I had assumed yours was also a "bigger computer", but if it's 1.2G maybe not much so. What is it, how many cores etc.?

FreeBSD 8.1-RELEASE #0: Sat Sep 18 17:20:47 EST 2010
CPU: VIA Esther processor 1200MHz (1200.01-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6a9  Family = 6  Model = a  Stepping = 9
  Features=0xa7c9baff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,APIC,SEP,MTRR,PGE,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,PBE>
  Features2=0x181<SSE3,EST,TM2>
  VIA Padlock Features=0x3fcc<RNG,AES,AES-CTR,SHA1,SHA256,RSA>
real memory  = 1073741824 (1024 MB)

It's a single-core device. Basically a carputer that I've got running as my home gateway/firewall/file-store/etc

Quote
Also, are you running it the same way as Tom, newmodbus -wi? Your interval is longer at 5 mins.

# grep class /etc/crontab
# Get data from classic
*/5   *   *   *   *   rossw   ~rossw/classic.sh

and classic.sh contains:


#! /bin/sh

# Address is captured from broadcasts by the classic.
# Run this:
# tcpdump -nl udp and port 4626|awk '{sub("\.[0-9]*$","",$3); print $3 > "classic.addr"; close("classic.addr")}' &

./newmodbus -w `cat classic.addr` 4115-4118/10 4119 4121-4122/10 4125 4132-4134/10 4120\>8 4275 4131  > classicdata.tmp
if [ -s classicdata.tmp ]; then mv classicdata.tmp classicdata; fi

wdt=`grep "^4131 " classicdata | /usr/local/bin/gawk '{print and($2, 128)}'`
if [ "${wdt:-0}" -eq 128 ]; then
        mail -s "Midnight watchdog reset $$" (page-me-email-address) < classicdata
        ./newmodbus  `cat classic.addr` 4131=12800
fi
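
The watchdog check in the script comes down to testing one bit of the value read from register 4131. Here is a minimal sketch of the same test using POSIX shell arithmetic instead of gawk's `and()`. The function name `wdt_flag_set` is made up for illustration, and reading bit 128 as the watchdog-reset flag is only inferred from the script above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if bit 128 is set in the given value.
# An empty or missing value is treated as 0 so the test never errors out.
wdt_flag_set() {
    val=${1:-0}
    [ $(( val & 128 )) -eq 128 ]
}

# Example use, assuming lines shaped like "4131 <value>" in classicdata:
# reg=$(awk '$1 == "4131" {print $2}' classicdata)
# if wdt_flag_set "$reg"; then echo "watchdog flag set"; fi
```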

3600W on 6 tracking arrays.
7200W on 2 fixed array.
Midnite Classic 150
Outback Flexmax FM80
16 x LiFePO4 600AH cells
16 x LiFePO4 300AH cells
Selectronics SP-PRO 481 5kW inverter
Fronius 6kW AC coupled inverter
Home-brew 4-cyl propane powered 14kVa genset
2kW wind turbine

zoneblue

Ross, interesting, so basically the same way I'm calling it. Interesting also on the board: VIA x86. I have a really old one of those lying around here someplace, a 400MHz Eden model. Might try it. At this point I'm trying to narrow down the factors involved. This might tell us something about x86 vs ARM.

Tom, here's something for you to try: increase the cron interval from one minute to two minutes. I've found that's made a huge difference. Only one dropped connection yesterday. I'd be interested to know what happens on the rPi. Also watch out for any single refused connections, as opposed to the ethernet lockup.

TomW

Some possibly pertinent system details:

Info for both units.

Web Access is disabled (mymidnite)
Both use port :502

Classic 150V (rev 4)   
Firmware:     
- Classic Rev: 1401
MNGP rev 1370 04/08/2013
Solar is #1871
Wind is #7805

I have some more data on the communication lockouts I am getting:

These are all the Solar Classic locking out communication.
##########
2013-09-17 03:14:

Solar stopped logging. No solar in, battery at 25.5V or so.
Middle of the night so no heavy network activity. Reset ~6:30AM with power cycle.

2013-09-17 unknown time.
no data gathered.

2013-09-18 16:34:

Lockout during low power FLOAT/EQ no direct sun, some bursts from wind.
Volts about 28.8 / 29

2013-09-20 21:08
No solar in batteries ~24.5

#######

All of these required a power cycle to get the Classic back online. Apparently still functions fine as a controller.

Has anyone tried using a different port? I have not. Might be worth a try. Sure can't hurt.

Both Classic 150's are on the same 4 port switch which is directly connected to a Dlink router and wireless AP. The Solar Classic has a short cat5 to the switch. The Wind Classic has a longer cat5 cable to the switch.

My gut feeling here is that there is an issue with my Solar Classic just not playing well with the logging App because the Wind Classic is fine. The recent data I collected while limited seems to disprove one of my older theories that it was related to high power throughput somehow. Most occurred during low or no power times.

I think Ryan is going to swap a Classic he has been logging from for mine to see if he can reproduce the fault. 

Just an update.



Do NOT mistake me for any kind of "expert".

( ͡° ͜ʖ ͡°)


24 Trina 310 watt modules, SMA SunnyBoy 7.7 KW Grid Tie inverter.

I thought that they were angels, but much to my surprise, We climbed aboard their starship and headed for the skies

TomW

Ok, I switched ports on both of them to :555. Edited and recompiled newmodbus for port 555.

Note, I could not do this with the Local App. Well, there is an option to do it but it just does not take. I did it on the MNGP "Misc" menu.

See if it changes the behaviour.

Just another change to see what happens.

Tom

boB


Tom, I think I ~may~ have a fix for Classics that stop communicating and you have to reset the
Classic to get it back up and online.  ~maybe~...  Hopefully !!

There are a couple of little things to add regarding the Whizbang Junior and I would like to
also include those.  This should be just a couple of days.  I would be very interested to
see if this beta code fixes your lock-up problem.

boB
K7IQ 🌛  He/She/Me

TomW

Quote from: boB on September 21, 2013, 03:18:42 PM
  I would be very interested to
see if this beta code fixes your lock-up problem.
boB

boB;

Glad to give it a whirl. I just today reset both to use port 555 as it was one thing nobody mentioned and easy to try. So far so good after a few hours.

Running fine so far but typically I get a reset every 24 hours or so.

Send me a link or attachment and I will stick it on the problem Classic.

Tom

boB

Quote from: TomW on September 21, 2013, 03:28:27 PM
Quote from: boB on September 21, 2013, 03:18:42 PM
  I would be very interested to
see if this beta code fixes your lock-up problem.
boB

boB;

Glad to give it a whirl. I just today reset both to use port 555 as it was one thing nobody mentioned and easy to try. So far so good after a few hours.

Running fine so far but typically I get a reset every 24 hours or so.

Send me a link or attachment and I will stick it on the problem Classic.

Tom


OK, so not fully knowing everything I understand about all of this, it MAY be also that
this fix MAY have something to do with the resets.  I just now realized that possibility
but am yet not quite assured of it.

I know you believe you understand what you think I said but I'm not sure you realize
that what you heard is not necessarily what I meant.  Or something like that...

boB


TomW

Quote from: boB on September 21, 2013, 03:35:17 PM

OK, so not fully knowing everything I understand about all of this, it MAY be also that
this fix MAY have something to do with the resets.  I just now realized that possibility
but am yet not quite assured of it.

I know you believe you understand what you think I said but I'm not sure you realize
that what you heard is not necessarily what I meant.  Or something like that...

boB

Absitively, posolutely!

Never meant to imply I understood what you were mumbling about.  :o

Tom


zoneblue

You know how I (unscientifically) changed three things at once? Well, I've now isolated those by progressively undoing the changes. Increasing the sample period from one min to 2 mins was the thing that helped. Today there have been no lockouts, and the refused connections are down to fewer than 2 per day.

Bob, good news on the firmware. Ideally I want to get the sample interval down to about 15 seconds, while still allowing the local app in. Also looking forward to the WB Jr, so Blackbox can tell the complete story of production and consumption. Exciting.

Tom: which leads me to think that it's the close proximity of your two connection requests that is possibly your issue. Have you tried reversing the order that you call them in, to see if the problem child swaps or not?

TomW

Quote from: zoneblue on September 22, 2013, 01:53:41 AM

Tom: which leads me to think that it's the close proximity of your two connection requests that is possibly your issue. Have you tried reversing the order that you call them in, to see if the problem child swaps or not?

blue;

Switching ports to 555 has gotten me a good 24 + hour run of no lockouts.

I will let this run awhile and if / when I get another lock out I will try switching the order or inserting a sleep delay in the script. One thing at a time. Currently it polls the wind unit first and that one has no issues (lately) so there could be something in the order / proximity of the poll requests.
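
A sleep delay between the two polls could look something like the sketch below. The `poll_in_order` function, the `POLL_GAP` variable and the `poll_unit` command are placeholders rather than anything from the actual logging script. In practice `poll_unit` would be whatever invokes newmodbus for one Classic, and the 5 second gap is just an assumed starting value.

```shell
#!/bin/sh
# Hypothetical sketch: poll each unit in turn with a pause in between,
# so the two connection requests never land back to back.
POLL_GAP=${POLL_GAP:-5}     # seconds between units (assumed value)

# Stand-in so the sketch runs on its own; replace with the real poller call.
poll_unit() { echo "polling $1"; }

poll_in_order() {
    first=1
    for unit in "$@"; do
        # Pause before every unit except the first one.
        if [ "$first" -eq 0 ]; then sleep "$POLL_GAP"; fi
        first=0
        poll_unit "$unit"
    done
}

# Swapping the argument order swaps the polling order:
# poll_in_order wind solar
# poll_in_order solar wind
```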

If it turns out to be the port it will be a palm to forehead moment of why didn't I try that simple change before? If it keeps chugging along with the port change I will swap it back to an rPi and check it on that.

Just stumbling  along early on a Sunday.

Tom



TomW

Well, got a lock out about noon so swapping the port didn't help.

Switched the order in which they run and left the port alone. We'll see where it goes. If it changes which one locks me out, I will add a sleep delay in the script.

Just an update in case anyone is watching.

Tom

zoneblue

Having slept on this, there's one other possibly significant difference between our 'works' and 'not works' cases: the ARM boards typically have flash mass storage. Writes to common kinds of flash are slow.

I think my old 400MHz VIA board will be a good test. From memory it has a 2.5 inch hard drive in it, with Debian etch or thereabouts. I've also ordered a faster uSD card to try. Lastly, the Cubie has a SATA port, and that could be tested.

Alas, it's Monday again, at least here.

TomW

zoneblue;

I had a lock out today on the Solar (same Classic as before I switched polling order), so it is apparently NOT the order it polls them.

At a loss as to what to try next.

Just FYI.

Tom

zoneblue

Did you try 2 minute intervals? Guess we wait for the next firmware.

Cubie is still going, no lockouts for several days now. And it's rendering graphs, doing database queries with several tens of thousands of records in the table, and serving pages to sometimes as many as four computers.

Not bad for a wee lil board drawing little more than 1W!

I spent a while on the Outback forum yesterday seeing what those guys are doing in terms of monitoring. That forum is pretty much dominated by a certain pair of commercial operators. Not having any Outback gear, I'm gonna need some help to write a Blackbox module for the Mate 3. It's probably a ways off yet, as the focus is to get a 0.3 zip release out sometime.