Fast succession of 'all on' - 'all off' breaks system #57

Open
floesche opened this issue Mar 8, 2022 · 6 comments

Comments


floesche commented Mar 8, 2022

Sending 'all on' (0x01 0xFF) and 'all off' (0x01 0x00) several times in fast succession puts the G4 system into an undefined state and slows command execution by a factor of roughly 5000, into the range of seconds.

In a script, I sent an 'all on' followed by an 'all off' 15 times, as fast as possible. The error occurs every time I run the script, usually between the 4th and 10th iteration.

At some point (in the example below, at the 8th "all off"), the response is delayed significantly: by around 15 seconds in this example, and typically by 6 to 18 seconds (median around 9.2 s). Once this has happened, every following 'all on' command is delayed by an amount in the same range, while 'all off' commands are executed immediately.

This appears to be an error on the Main Host side: if I send the commands without waiting for a response, my script finishes long before the commands have been executed (with the delay). This suggests that the commands are stuck somewhere in the Main Host input queue.
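As a rough illustration of the "send without waiting for a response" variant described above, here is a minimal Python sketch. Only the two command byte sequences come from this issue; the function name `send_on_off_burst`, the use of Python, and the choice not to read any reply are assumptions made for illustration and are not taken from the original test script.

```python
import socket
import time

ALL_ON = bytes([0x01, 0xFF])   # 'all on' command bytes as quoted in this issue
ALL_OFF = bytes([0x01, 0x00])  # 'all off' command bytes as quoted in this issue


def send_on_off_burst(sock, iterations=15):
    """Send 'all on' then 'all off' `iterations` times without reading replies.

    `sock` is any connected stream socket. Returns the script-side send time of
    every command, in seconds relative to the start of the burst.
    """
    sent_at = []
    start = time.perf_counter()
    for _ in range(iterations):
        for command in (ALL_ON, ALL_OFF):
            sock.sendall(command)  # hypothetical: no response is awaited here
            sent_at.append(time.perf_counter() - start)
    return sent_at
```

To try it against a real system, one would open the connection with `socket.create_connection((host, port))`, using the IP address and TCP port that the Main Host prints at startup, and pass the resulting socket to `send_on_off_burst`. Comparing the returned send times with the Main Host timestamps should show whether the script really finishes before the commands are executed.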


The output inside the main host window contains the following text:

03/08/2022 15:46:38.319 :  Root Directory Path - C:\Program Files (x86)\HHMI G4\Support files
03/08/2022 15:46:38.321 :  PC Name - reiser-ww10.hhmi.org, IP Address - 10.102.40.39, TCP Port - 62222
03/08/2022 15:46:38.398 :  TCP Connection Established
03/08/2022 15:46:38.404 :  All-On received
03/08/2022 15:46:38.416 :  All-Off received
03/08/2022 15:46:38.664 :  All-On received
03/08/2022 15:46:38.668 :  All-Off received
03/08/2022 15:46:38.681 :  All-On received
03/08/2022 15:46:38.690 :  All-Off received
03/08/2022 15:46:38.703 :  All-On received
03/08/2022 15:46:38.704 :  All-Off received
03/08/2022 15:46:38.718 :  All-On received
03/08/2022 15:46:38.720 :  All-Off received
03/08/2022 15:46:38.734 :  All-On received
03/08/2022 15:46:38.735 :  All-Off received
03/08/2022 15:46:38.737 :  All-On received
03/08/2022 15:46:38.737 :  All-Off received
03/08/2022 15:46:53.991 :  All-On received
03/08/2022 15:46:53.991 :  All-Off received
03/08/2022 15:47:03.219 :  All-On received
03/08/2022 15:47:03.219 :  All-Off received
03/08/2022 15:47:45.610 :  All-On received
03/08/2022 15:47:45.610 :  All-Off received
03/08/2022 15:47:54.841 :  All-On received
03/08/2022 15:47:54.841 :  All-Off received
03/08/2022 15:47:58.034 :  All-On received
03/08/2022 15:47:58.034 :  All-Off received
03/08/2022 15:48:01.235 :  All-On received
03/08/2022 15:48:01.235 :  All-Off received
03/08/2022 15:48:19.498 :  All-On received
03/08/2022 15:48:19.498 :  All-Off received
03/08/2022 15:48:31.741 :  All-On received
03/08/2022 15:48:31.741 :  All-Off received
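To make the stalls in a log like this easier to spot, the gaps between consecutive entries can be computed from the timestamps. The following is a minimal sketch that assumes every log line has the `MM/DD/YYYY HH:MM:SS.mmm :  message` shape shown above; the helper names `parse_log_line` and `find_stalls` and the 1 s threshold are hypothetical choices, not part of the G4 software.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one Main Host log line into (timestamp, message)."""
    stamp, _, message = line.partition(" : ")
    parsed = datetime.strptime(stamp.strip(), "%m/%d/%Y %H:%M:%S.%f")
    return parsed, message.strip()


def find_stalls(lines, threshold_s=1.0):
    """Return (gap_seconds, message) for each entry logged more than
    `threshold_s` seconds after the entry before it."""
    entries = [parse_log_line(line) for line in lines if line.strip()]
    stalls = []
    for (prev_time, _), (curr_time, message) in zip(entries, entries[1:]):
        gap = (curr_time - prev_time).total_seconds()
        if gap > threshold_s:
            stalls.append((gap, message))
    return stalls
```

Applied to the entries from 15:46:38.737 through 15:47:03.219 above, this reports gaps of about 15.254 s and 9.228 s, both in front of an "All-On received" entry, which is consistent with the observation that only the 'all on' commands are delayed.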


floesche commented Mar 8, 2022

There is no problem when sending only hundreds of 'all on' commands or only hundreds of 'all off' commands in fast succession.


floesche commented Mar 16, 2022

Adding a delay between sending an 'all on' and an 'all off' command decreases the chance of triggering this problem. For example, with an additional pause of 1 ms the problem appears only after around 15 to 20 iterations, and with a pause of 1.5 ms it appears reliably only after 25 to 30 iterations, and so on. The longer the delay, the less likely the system is to break, but I have seen it break with a 2 ms delay after 2, 7, or 85 iterations, with a 3 ms delay after 89, 447, or 856 iterations, and with a 4 ms delay after 3147 or 7831 iterations.

The delays are introduced on the script side. According to the Main Host output, the commands actually arrive further apart than that. If the timestamps in that log can be trusted, most commands are received more than 10 ms apart, but whenever the script runs faster for whatever reason and the gap drops below 10 ms, the Main Host goes into the unrecoverable state.

This is possibly related to #48, and more detailed logging, as suggested in #53, might help identify the issue.
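One possible reason why script-side pauses of 1 to 4 ms do not match the spacing seen in the Main Host log is that ordinary sleep calls can overshoot short durations, depending on the operating system's timer resolution. This is an assumption, not something confirmed in this issue, and the language and pause mechanism of the original script are not stated. As a sketch of one way to rule it out, a busy-wait on a high-resolution clock keeps the pause close to the requested value at the cost of CPU time; `precise_pause` is a hypothetical helper name.

```python
import time


def precise_pause(duration_s):
    """Busy-wait until at least `duration_s` seconds have passed.

    Returns the elapsed time actually measured, so the caller can log
    how long each pause really took.
    """
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return elapsed
```

Calling `precise_pause(0.002)` between an 'all on' and an 'all off' and recording the returned values would give a script-side record of the pauses that can be compared against the Main Host timestamps.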

floesche added a commit to floesche/LED-Display_G4_Display-Tools that referenced this issue Mar 31, 2022
- add a test case for OnOff with delays for JaneliaSciComp#57
- add set active AI channels
- add start log
- add stop log

floesche commented Apr 1, 2022

(response via email 2022-04-01T09:24):

I think issue #57 is related to issue #21, since the All-off command is essentially the same thing as a stop-display. I am modifying it to actually send a blank frame instead of a stop-display command.


floesche commented Apr 1, 2022

Changing the all-off command to just turn off the LEDs would be good. My reading of your description in the TCP Commands.xlsx was that all-on, all-off, and fullscreen grayscale (0x02 0x05) would be faster versions of sending an actual frame in streaming mode, because the communication overhead is smaller. Most likely these three commands would even be faster than setting a pattern by ID, because technically DMA access is not necessary. Going forward, after your change to the all-off command, will that be a correct assumption?


floesche commented Apr 1, 2022

(response via email 2022-04-01T11:46):

Regarding the all-off command, the current software actually turns the display off, but what I am suggesting is to continuously stream a blank frame instead, so the transition between changing patterns is more seamless. The current all-off command sends a stop-display, which is why we are seeing issues similar to the quick start and stop display test you were doing in #21. There is less TCP overhead, but the DMA is still being used to stream the data. I am also removing the fullscreen grayscale command because it was used with the old arena, where we could control different grayscale levels. Now we only do 2 and 16.


floesche commented Apr 1, 2022

Sending a blank frame for all-off sounds good; that is how I read the description in your document. I don't need to understand why a DMA read is necessary at that point, since that would basically be reading a large array with known values (0xFF, 0x00 for on/off, 0xii for the intensity ii)… But even then the three commands should be as quick as sending a pattern ID.

I thought the Fullscreen Grayscale was a useful command - we could quickly test the different brightness levels or provide a specific illumination during the experiment. If the command is working I wouldn't remove it. I understand that this is only useful in the 16 grayscale mode, not the 2 grayscale mode.
