Local App / MyMidnite

Started by Tons001, January 13, 2014, 10:09:22 AM


Tons001

After being okay for months, my Classic keeps losing the network, which keeps the Local App and MyMidNite from functioning. Oddly enough, it appears to be happening more and more frequently in the last 3-4 weeks. The only fix is to power down the Classic and restart, which dumps all of the settings from 12:00am on. I tried switching it from DHCP to static and the problem got no better. Any thoughts?
8 Sopray SR-90 panels, MN Classic 150 w/ WBjr, Sunxtender 12v/305ah, Trimetric 2025a, Morningstar SureSine Inverters & RelayDriver, IOTA DLS-55

TomW

tons;

First question will be: what firmware version are you using?

Second will be a description of your network setup: hubs, router, etc.

I would suggest you set the Classic for a static IP, not DHCP, and some folks need to set the secondary DNS (DNS Override) to 0.0.0.0. That is where I would start.

Good luck with it.

Tom
Do NOT mistake me for any kind of "expert".

( ͡° ͜ʖ ͡°)


24 Trina 310 watt modules, SMA SunnyBoy 7.7 KW Grid Tie inverter.

I thought that they were angels, but much to my surprise, We climbed aboard their starship and headed for the skies

atop8918

Have you changed anything on your network recently?

Also important: are you using any services like Dropbox or a BitTorrent client? Those seem to play havoc with networks and the Classic doesn't play well with them at the moment.


Tons001

Thanks to both of you.

Tom ... I am running firmware 1609. The Classic is hardwired to a Netgear gigabit switch, which is connected to an AT&T U-verse modem. The Classic was already set to static with the DNS override the last time it happened.

Atop ... Nothing changed recently. I do use Dropbox frequently. Odd that would mess with the Classic. I don't use any BitTorrent clients and never have.

I guess I will just keep resetting it when it happens. Funny enough, it always happens when I am out of town which leaves me frustrated because I can't check that everything is cool. Maybe my wife is pretending to be an IT wizard and messing with stuff while I am out of town......  ;D
8 Sopray SR-90 panels, MN Classic 150 w/ WBjr, Sunxtender 12v/305ah, Trimetric 2025a, Morningstar SureSine Inverters & RelayDriver, IOTA DLS-55

atop8918

Well, it's not so much that Dropbox messes with the Classic -- it messes with the network. My wife ran it on her MacBook and it kept crashing our WRT router because of the amount of traffic. The Classic has a pretty dainty TCP/IP stack so it can be very sensitive to lots of traffic on the network. If you are suddenly transferring bigger files then it might be the problem. Just to test, maybe disable Dropbox for an hour or two to see if it affects the Classic on MyMidNite at all.
Another solution might be to stick some network appliance between the Classic and the rest of your network. A $10 switch might do the trick or a $25 wireless bridge?
Still a 3rd solution might be for me to fix the damn networking stack...it's still on the list!

zoneblue

Quote from: atop8918 on January 14, 2014, 10:23:10 AM
Well, it's not so much that Dropbox messes with the classic -- it messes with the network. My wife ran it on her macbook and it kept crashing our WRT router because of the amount of traffic.

Well, we've all had sh*ty routers at one point or another that get royally confused when the number of threads gets much over a couple hundred. But unless the OP's router is actually crashing, I'm not sure this is his issue.

Quote
The Classic has a pretty dainty TCP/IP stack so it can be very sensitive to lots of traffic on the network. If you are suddenly transferring bigger files then it might be the problem. ...... Another solution might be to stick some network appliance between the Classic and the rest of your network. A $10 switch might do the trick or a $25 wireless bridge?

With the switched ports in the router, the Classic isn't actually going to see any of this traffic, and adding a switch isn't going to help. I've tried that, and the shorter cables you often recommend; neither helps, because neither addresses the problem.

Quote
Still a 3rd solution might be for me to fix the damn networking stack...it's still on the list!

NOW we are talking!

Andrew, it's got to be something to do with the timing of new connections, because you can run it at 1 sec intervals with a single connection no problems at all, but the stack will shut up shop every time within 14 hours if you open and close a connection once per minute. It's not like the bug is difficult to reproduce, right?
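For anyone wanting to compare the two access patterns side by side, here is a rough sketch, not a confirmed reproduction. The host, port, and request bytes are placeholders (nothing in this thread specifies them), and a short or empty read is simply counted as a missed reply.

```python
import socket
import time


def poll_persistent(host, port, request, polls, interval_s=1.0):
    """One connection, many polls: the pattern reported to run without trouble."""
    replies = 0
    with socket.create_connection((host, port), timeout=5) as sock:
        for _ in range(polls):
            sock.sendall(request)
            if sock.recv(256):
                replies += 1
            time.sleep(interval_s)
    return replies


def poll_reconnect(host, port, request, polls, interval_s=60.0):
    """A fresh connection per poll: the pattern reported to stall the stack."""
    replies = 0
    for _ in range(polls):
        try:
            with socket.create_connection((host, port), timeout=5) as sock:
                sock.sendall(request)
                if sock.recv(256):
                    replies += 1
        except OSError:
            break  # refused or timed out: stop and report how far we got
        time.sleep(interval_s)
    return replies
```

Running both against the same unit for the same wall-clock time, and logging the timestamp at which `poll_reconnect` gives up, would show whether the failure tracks the number of connections opened rather than the number of polls.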
6x300W CSUN, ground mount, CL150Lite, 2V/400AhToyo AGM,  Outback VFX3024E, Steca Solarix PL1100
http://www.zoneblue.org/cms/page.php?view=off-grid-solar

dgd

Quote from: zoneblue on January 14, 2014, 05:50:39 PM
Quote from: atop8918 on January 14, 2014, 10:23:10 AM
Still a 3rd solution might be for me to fix the damn networking stack...it's still on the list!

NOW we are talking!

Andrew, it's got to be something to do with the timing of new connections, because you can run it at 1 sec intervals with a single connection no problems at all, but the stack will shut up shop every time within 14 hours if you open and close a connection once per minute. It's not like the bug is difficult to reproduce, right?

Seems you have hit the nail square on the head here.
The Classic's TCP management has always been the blocker that stopped any meaningful black box development or user developed application programs that connect to the Classic via TCP.

I never thought TCP stack management was akin to rocket science so I have always been confused why it was such an issue for the Classic.

dgd
Classic 250, 150,  20 140w, 6 250w PVs, 2Kw turbine, MN ac Clipper, Epanel/MNdc, Trace SW3024E (1997), Century 1050Ah 24V FLA (1999). Arduino power monitoring and web server.  Off grid since 4/2000
West Auckland, New Zealand

RossW

Quote from: dgd on January 15, 2014, 01:30:06 AM
I never thought TCP stack management was akin to rocket science so I have always been confused why it was such an issue for the Classic.

I have to stand up for the midnite guys here - the stack isn't "rocket science" to those of us who have heaps of resources, but the processor in the classic is (as I understand it) a fairly modest device. And trying to coax all the goodies out of it that everybody wants, in the microscopic times we're prepared to let it have, is NOT actually trivial, as anyone who has tried to shoehorn complex realtime code into a teeny tiny device will attest.

Is there room for improvement? Sure.
Could it be done better? Little doubt.
Is it trivial? Not at all!

I hope the information provided by everyone (zoneblue, dgd, myself, tomw and others) helps Andrew get on top of it, but we must be realistic and not expect a full-blown stack worthy of a server-class FreeBSD box!
3600W on 6 tracking arrays.
7200W on 2 fixed array.
Midnite Classic 150
Outback Flexmax FM80
16 x LiFePO4 600AH cells
16 x LiFePO4 300AH cells
Selectronics SP-PRO 481 5kW inverter
Fronius 6kW AC coupled inverter
Home-brew 4-cyl propane powered 14kVa genset
2kW wind turbine

atop8918

Thanks, Ross.
Everyone is absolutely right here -- as customers you are absolutely right to expect a device like this to work as advertised and be reliable. As developers we only have a fixed amount of time and if we work on every single feature that customers want immediately and without pause we would soon be out of business, so we must balance features, bug fixes, and new products.
The stack works very well within certain parameters. For the most part the Local App works very well with the Classic in 95% of use cases which amounts to most customers' networks and requirements.

I have done extensive testing on the stack with a small number of different network topologies with a variety of standard network devices. I have a small Java program, which if I can dig up I'll release to the community here, which submits the Classic to repeated open-poll-close requests via a number of different threads. I run this program for a day against the Classic as part of each release of the network firmware so to answer your question, Zoneblue, the bug seems VERY difficult to reproduce -- I have yet to see a network problem with the Classic. I'm not claiming there is no problem, I'm saying that I cannot reproduce many of the issues I've heard complaints about. There are thousands of different network topologies any number of which produce different results. boB has a Verizon DSL router that used to crash the stack. I had the same Verizon router -- stack was fine every time.
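Until the Java program turns up, a rough Python stand-in for the same idea (repeated open-poll-close cycles from several threads) might look like the sketch below. This is not the actual test tool; the function names, thread counts, and the request bytes are all assumptions made for illustration.

```python
import socket
import threading


def open_poll_close(host, port, request, timeout_s=5.0):
    """One cycle: connect, send a request, read a reply, disconnect."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(request)
        return sock.recv(256)


def stress(poll_once, threads=4, cycles_per_thread=100):
    """Call poll_once() repeatedly from several threads and count outcomes."""
    counts = {"ok": 0, "failed": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(cycles_per_thread):
            try:
                ok = bool(poll_once())
            except OSError:
                ok = False  # refused, reset, or timed out
            with lock:
                counts["ok" if ok else "failed"] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counts
```

Usage would be something like `stress(lambda: open_poll_close(host, port, request))`, with `host`, `port`, and `request` filled in for the unit under test. Passing the poll as a callable keeps the harness independent of whatever protocol the poll actually speaks.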

The WRT54g Router I mentioned that crashed due to Dropbox was (and still is) one of the most popular routers to date. Whether or not it was a crap router is irrelevant, it was one of the most deployed units out there and you could argue that it was Dropbox's fault for not testing their software with it or Linksys's fault for not verifying their router with high-volume, constant traffic. Who's at fault? Does it matter since at the end of the day the software crashes the router so is the software more important to you or the router? I stopped using Dropbox, some other folks would switch out their router.

Putting a switch between the router and the Classic would definitely reduce traffic to the Classic -- many routers opt for a simple hub on their LAN connectors as it simplifies and reduces the cost of the device, so what appears on one port appears on all ports. A switch's sole purpose is to learn your network topology, at least at layer 2, so that traffic is forwarded only to where it was intended to go as opposed to tying up the whole network.
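The hub-versus-switch difference can be shown with a toy model. This is only an illustration of layer 2 address learning, with made-up port numbers and MAC strings; whether a particular router's LAN ports behave like the hub or the switch here is exactly the point disputed earlier in the thread.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def forward(self, in_port, src, dst, ports):
        return [p for p in ports if p != in_port]


class LearningSwitch:
    """Remembers which port each source address lives on."""

    def __init__(self):
        self.table = {}  # MAC address -> port

    def forward(self, in_port, src, dst, ports):
        self.table[src] = in_port
        out = self.table.get(dst)
        if out is None:
            # Unknown or broadcast destination: flood, just like a hub
            return [p for p in ports if p != in_port]
        return [] if out == in_port else [out]
```

With a PC on port 1, the router on port 2, and the Classic on port 3, the switch stops sending PC-to-router frames to port 3 once it has seen one frame from the router. Broadcast frames still reach every port in both models, so a switch does not shield a device from broadcast traffic.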

As Ross and Dgd mentioned, TCP/IP networking isn't rocket science sure, but that doesn't mean it's simple. I am working on a number of different projects at the moment and have already done some further development on the network stack. Once I have a chunk of time to dedicate to networking I will be sure to address as many issues as I can. To say that the Classic networking is broken and useless though is a bit of an overstatement. It works very well most of the time which one could also say of almost any software or hardware out there.

zoneblue

I am genuinely wanting to help, so if you need help reproducing the bug, PM me for an email address.

6x300W CSUN, ground mount, CL150Lite, 2V/400AhToyo AGM,  Outback VFX3024E, Steca Solarix PL1100
http://www.zoneblue.org/cms/page.php?view=off-grid-solar

dgd

Quote from: RossW on January 15, 2014, 01:41:25 AM

I have to stand up for the midnite guys here - the stack isn't "rocket science" to those of us who have heaps of resources, but the processor in the classic is (as I understand it) a fairly modest device. And trying to coax all the goodies out of it that everybody wants, in the microscopic times we're prepared to let it have, is NOT actually trivial, as anyone who has tried to shoehorn complex realtime code into a teeny tiny device will attest.

Well, I didn't think it was a fairly modest device. Last time I closely inspected a Classic 150 there was an LPC236 series cpu in there, can't remember if it was the 4 or 6 version but in any case it was well endowed with IO channels, AD conversion, voice recognition, ethernet, USB and flash memory. Again I seem to remember there were ethernet stack coding examples included in the cpu tech data and the architecture had a dedicated ethernet bus/comms channel. Memory addressing out to 4M but the Classic didn't seem to have any extra memory installed so there was just the meagre amount on the cpu - about 256k ram and 512k flash.
Anyway, it will be nice if Andrew gets the time to investigate the stack issues - and resolve them.
dgd
Classic 250, 150,  20 140w, 6 250w PVs, 2Kw turbine, MN ac Clipper, Epanel/MNdc, Trace SW3024E (1997), Century 1050Ah 24V FLA (1999). Arduino power monitoring and web server.  Off grid since 4/2000
West Auckland, New Zealand

atop8918

Yes it's the 2366 but your specs are incorrect -- there is no external memory interface so the addressing out to 4M is incorrect (incidentally I think you meant 4Gig), there is only 32k RAM available and an additional 16k RAM block for the Ethernet. The TCP/IP stack uses only the 16kB area for its purposes leaving the Classic application the full 32k for its own uses. There is 256k flash. The Network is also only allocated one 32k sector of Flash so we have room for the Application and the bootloader and various production values.

It is also true that there are a lot of examples incorporating TCP/IP stacks for this chip. These are EXAMPLES. They are not production-ready code. Please feel free to take the example code and program it into a 2366 and run it in your own application. You may notice the warnings and notices that the example code is example code and not production code. You may notice that the simple example code works for the simple examples. Now try to build a DHCP engine around it. Oh, there's a bug that popped up. Ah look at that, the example code didn't take into account the large size of the BOOTP/DHCP packet. Well, that's okay, let's go back and increase the size of the buffers. Huh, look at that. The Ethernet DMA engine can't use packets of that size. We'll have to tweak it. Oh no, the buffers have to be power-of-two, that means I can have 4k, or 8k, but then I can only use one buffer. Now, should I use buffer sharing or should I use multiple small buffers to handle the data? Buffer sharing won't work with this implementation since we don't have full access to the processor. So we'll have to write a little memory allocation library. Okay, now that's written and tested. Now the buffering works. Now the DHCP engine works. That's great! Oh no, a beta tester says the DHCP is not working on their router. Hmm. Looks like the router doesn't follow the DHCP RFCs. Let's debug to see how it deviates. That's odd, they aren't using the lease timers. Why wouldn't they use the lease timers, that's the sole method for determining how to time out on the address lease? There is no default supplied by the RFC. I suppose we'll put in a one-day default. Okay that is now working on that customer's router. Huh, another tester says the DHCP engine isn't working on his network. Huh, this router doesn't follow the spec either. Ah, no, this one's my fault: I'm using the broadcast address to get my address, I should be using the previously assigned one.
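The one-day fallback can be pictured with a small sketch. It walks the DHCP options field looking for option 51 (IP Address Lease Time in RFC 2132) and falls back to a day when the server leaves it out. This is an illustration in Python, not the Classic's firmware; the function and constant names are made up.

```python
import struct

DEFAULT_LEASE_S = 24 * 60 * 60  # one-day fallback, as described in the post
OPT_PAD, OPT_LEASE_TIME, OPT_END = 0, 51, 255


def lease_seconds(options: bytes) -> int:
    """Return the lease time from a DHCP options field, or the fallback."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            i += 1  # pad is a single byte with no length field
            continue
        if i + 1 >= len(options):
            break  # truncated option: give up and use the fallback
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        if code == OPT_LEASE_TIME and len(value) == 4:
            return struct.unpack("!I", value)[0]  # 32-bit big-endian seconds
        i += 2 + length
    return DEFAULT_LEASE_S
```

The input is assumed to be the options bytes that follow the DHCP magic cookie; locating that offset in the packet is left out.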
Okay, no problem, I'll add some way for the Application to save our last IP address and then tell us to use it when we restart. Oh, there's no EEPROM driver. Okay, let's write an EEPROM driver. Okay, that's tested and works. Now the app saves the IP and writes it back to us. Ok, that's working. Huh, now when I have two Classics that boot at the same time they swap addresses, or one doesn't get one. That's interesting. Okay, I'll take a look at what's happening on the scope. Reading through 50k lines of TCP/IP packets....I'd better write a script to help me parse this stuff. Ok the script is done, let's see what we have....aha! The BOOTP header starts with the same XID for every Classic. That's the problem. Okay, we'll use a unique number for the XID for each unit. Let's use the serial number or the device ID or something. Aha! That was it, it's working again. Hmm. Now sometimes the packets are getting mixed if I have a device behind this router. I wonder what that could be. Let's run it through the scope again. Wait a minute. The IP header is totally wrong. OH NO! The example code is not handling the IP flags properly. I guess I'll go and read the RFCs for the IP header flags. Now, let's go back and rewrite that portion of the example code to handle the header flags correctly. Hmm, that's working better but still not right. AHA! We're using the same number for our port allocations on each unit. It should be a random number otherwise things get mixed up. Well there's no source of randomness on the unit. I guess I'll write an LFSR-based generator. Okay, that's done, let's seed it with something "random". Well those numbers aren't very random. Let's try something else. Aha, that's working pretty well. These calculations show that we should be statistically good with up to ~1000 units on the same network before there is a collision now. Okay, DHCP is done and working. Let's test for reliability. I'll install WANem in between the device and the network.
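For readers unfamiliar with LFSR-based generators, here is a minimal sketch of the general technique: a textbook 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) seeded from a per-unit number and used to pick a local port. The tap choice, the seeding scheme, and the port range are assumptions for illustration; the thread does not say what the Classic actually uses.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def seed_from_serial(serial: int) -> int:
    """Fold a unit serial number into a nonzero 16-bit seed (hypothetical scheme)."""
    seed = (serial ^ (serial >> 16)) & 0xFFFF
    return seed or 0xACE1  # an all-zero state would never change


def next_port(state: int):
    """Return (new_state, port) with the port in the dynamic range 49152-65535."""
    for _ in range(16):  # shift in 16 fresh bits per port
        state = lfsr16_step(state)
    return state, 49152 + (state % 16384)
```

An LFSR is fully deterministic, so two units only diverge if their seeds differ, which is why the seeding step matters as much as the generator itself.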
No, no, no this isn't working at all. Any disruption to the packet flow totally breaks the stack. Let's debug. Aha, the EXAMPLE CODE isn't handling out-of-order packets. We'll have to add that in. Hmm, need some more buffering in here, the EXAMPLE CODE just uses one buffer, I'll have to change it to use the buffer allocator instead. Well, a couple of packets are getting through but it looks like the EXAMPLE CODE doesn't handle repeat packets very well. Ah, this is why, it's not freeing the buffer for the original packet. Hmmm, that's a doozy, we'll have to re-write this part of the EXAMPLE CODE. Okay, now we've got the address allocation working. Let's add a listening socket to the code. Argh, the EXAMPLE CODE doesn't do listening sockets. I'll have to write that in. Okay, got some code here along with the unit tests. Looks like it's working. I can open this socket and connect from a PC then write data back to the PC as received. That looks pretty good. We're running low on resources here though. 16k isn't much memory when we've got all this stuff going on and I still have to do the MyMidNite and security engine. Ok, we'll leave a single connection socket for the moment and see how our resources look when we're finished with the rest. Now let's integrate the stack with the rest of the application. Huh, the stack isn't working now. Ah the application timing isn't quite correct. I'll add some timers to the stack and call it more often from the application. Hmmm, it's working now but not as well as it was before. Ah, the linker is putting it in the wrong place and we're sharing the stack. Okay, let's go in to the linker script and make sure we're using the right sector. Hmmm, there's still something scribbling over my memory ... aha! The EXAMPLE CODE doesn't put the Ethernet RAM into the Ethernet RAM space. Let's go back and make sure the linker script puts Ethernet RAM in the Ethernet block.
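The out-of-order and repeat-packet handling described above can be sketched in a few lines. This is a simplified model in Python, not the Classic's code: it assumes segments never partially overlap, ignores sequence-number wraparound, and uses a small fixed cap (`max_pending`, a made-up name) to stand in for the limited buffer pool.

```python
class Reassembler:
    """Deliver byte segments in order, holding early ones and dropping repeats."""

    def __init__(self, next_seq=0, max_pending=4):
        self.next_seq = next_seq        # next byte offset we expect
        self.pending = {}               # seq -> payload held for later
        self.max_pending = max_pending  # stand-in for a tiny buffer pool

    def receive(self, seq, payload):
        """Accept one segment; return the bytes that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return b""  # repeat: drop it so no buffer stays tied up
        if seq != self.next_seq and len(self.pending) >= self.max_pending:
            return b""  # no buffer free: rely on the sender to retransmit
        self.pending[seq] = payload
        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

Dropping the repeat instead of storing it is the key step: holding on to a buffer for a packet that has already been delivered is how a small pool runs dry.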
Hmmm, that means I'm going to have to wrestle with the compiler a bit to make it put Ethernet RAM into a different segment. Let's bring up the manual and see how to do that. Well, seems straightforward. Ah, that's working. Aha, scribbler is gone, timing is right, looks like it's working. Oh no! When I go into <solar mode> it stops communicating on this router. Ah, the timing is changing. We'll have to add the network callback into this module as well. Ah, that's working now. It's time to start the DNS module now, now the MyMidNite connection, now the advertise connection....
Keep in mind that all the while and at the same time there are also internal projects going on, production support, MyMidNite development and testing, Local Application development and bug fixes (and features), new products popping up (MNLP), and a host of other things I'm not allowed to talk about yet :)! I'm not trying to be facetious (well maybe a little) but honestly, there is a lot more going on here than just "fixing bugs" in between sips of Margaritas on my deck chair overlooking the ocean! Hmmm. I need a deck chair. And a Margarita. And an ocean....

dgd

Andrew,
Thanks for the detailed reply. It's been a while since I looked up the cpu specs and the development programming examples, but I do remember thinking it was a good choice of ARM cpu for the Classic even though its memory was limited. I must have misunderstood the spec or looked at the wrong cpu but I thought there was the ability to address external memory up to 4gig. There must be no address and data buses, even multiplexed, as pin connections to the cpu.  :(

Maybe if MN gets to another hardware revision of the Classic it will include another cpu and the resources to have a multiple-connection TCP stack and perhaps a web server and data pages.

Despite all the workload you have it was good to read you plan to get time to revisit the Classic ethernet code.
dgd



Classic 250, 150,  20 140w, 6 250w PVs, 2Kw turbine, MN ac Clipper, Epanel/MNdc, Trace SW3024E (1997), Century 1050Ah 24V FLA (1999). Arduino power monitoring and web server.  Off grid since 4/2000
West Auckland, New Zealand

Tons001

Looks like I need to try and add that small ethernet switch between the 16 port switch and the classic. MyMidnite stopped working on Wednesday and now the local app can't find the classic. Kind of a bummer.
8 Sopray SR-90 panels, MN Classic 150 w/ WBjr, Sunxtender 12v/305ah, Trimetric 2025a, Morningstar SureSine Inverters & RelayDriver, IOTA DLS-55

ClassicCrazy

Quote from: Tons001 on January 19, 2014, 11:11:26 AM
Looks like I need to try and add that small ethernet switch between the 16 port switch and the classic. MyMidnite stopped working on Wednesday and now the local app can't find the classic. Kind of a bummer.

I was just having all kinds of weird stuff going on with my Local App. What I found was that I had been playing around with a Raspberry Pi plugged into the router. Everything worked, but then I switched some cables around in the router. When I looked at the DHCP list in the router, the Classic wasn't showing up. It seemed like it had the same address as my laptop was given. I switched the Cat 5 cables around to a different socket and all of a sudden the Classic showed up. And that is after I had restarted everything. So, as said above, it seems like the routers can get kind of confused in DHCP mode. Now if I can figure out how I blew my Raspberry Pi! But that is another story.
system 1
Classic 150 , 5s3p  Kyocera 135watt , 12s Soneil 2v 540amp lead crystal for 24v pack , Outback 3524 inverter
system 2
 5s 135w Kyocero , 3s3p 270w Kyocera  to Classic 150 ,   8s Kyocera 225w to Hawkes Bay Jakiper 48v 15kwh LiFePO4 , Outback VFX 3648 inverter
system 3
KID / Brat portable