We are experiencing random errors on CB-series robots with PolyScope versions 3.13.x and 3.14.x. The scenario is the following:
- the main program is performing some relatively heavy calculations of the next waypoints, while another thread is moving the robot
- the thread writes a log message before and after every movel and movej command (for debugging)
- sometimes (once or twice a day) there is one movej that starts but never ends (thread crashing?)
- there is absolutely no error message anywhere
- the robot movement stops, and the main thread detects a timeout after some time (but no error message from the system)
There are no race conditions, no critical sections either. This is a clean structure but still there is an error somewhere.
Are there any known restrictions in controlling the robot from a thread, or any changes in the latest firmware versions 3.13.x and 3.14.x that may cause this to happen?
Our code structure has not changed in the past 2 years, and suddenly we are getting this error on more than one CB robot installation. However, we haven’t heard of the same issue happening on e-Series.
@Ebbe do you have any suggestions?
I have no knowledge of an issue like this having been introduced. But if it is related to the timing of controller execution, things might have changed marginally.
- Is the timeout detected by your own logic or by PolyScope?
- What kind of synchronization do you have between the different threads? Are any of the other threads controlling the robot movements?
- Have you diagnosed whether the thread is still running or not, by using join in a thread that is known to be working?
One simple thing to try would be to wrap each of your movements in a critical section.
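A minimal URScript sketch of the join-based check could look like the following. The names `move_thread` and `watcher` are hypothetical, and the body of `move_thread` is a placeholder for the existing movement logic. The idea is that `join` only returns when the joined thread has ended, so the log message appears if the motion thread terminated and does not appear if it is merely stuck inside a move.

```
def join_diagnostic():
  # Hypothetical stand-in for the existing motion thread.
  thread move_thread():
    while (True):
      # ... existing movement logic goes here ...
      sync()
    end
  end

  # Watcher thread: starts the motion thread and waits for it to end.
  thread watcher():
    handle = run move_thread()
    join handle
    # Reached only if move_thread has ended.
    textmsg("move_thread has ended")
  end

  run watcher()

  # Main program continues with its own work.
  while (True):
    sync()
  end
end
```

This is a diagnostic sketch under those assumptions, not a confirmed way to detect every kind of thread failure on the controller.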
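As a rough sketch of that suggestion, assuming a hypothetical variable `target_q` that holds the next joint target and placeholder acceleration and speed values, each movement in the thread could be wrapped like this:

```
# Sketch only: target_q is a hypothetical variable set by the main program.
textmsg("movej start")
enter_critical
movej(target_q, a=1.4, v=1.05)
exit_critical
textmsg("movej end")
```

According to the script manual, code inside a critical section finishes before another thread can start running, so the main program's calculations would presumably be paused for the duration of each move. That makes this more of a diagnostic experiment than a permanent structure: if the hang disappears, interleaving between the two threads becomes the likely suspect.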
Hi @Ebbe thanks for the quick reply.
The timeout is only detected by the program logic.
The main program is calculating waypoints, and the other thread is performing the movements. There is no other thread trying to control the robot at that time. The communication between the threads is done only through variables: the main program sets one variable (the move target position) and the thread sets another (the movement status, an integer). The real code is a little more complex, but this is the general idea.
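For readers following along, the structure described above can be sketched roughly as below. All names (`target_q`, `move_request`, `move_status`), the status codes, the motion parameters, and the 10 second timeout are assumptions made for illustration and are not taken from the production code.

```
def handshake_demo():
  # Shared variables (hypothetical names).
  global target_q = get_actual_joint_positions()
  global move_request = False
  global move_status = 0   # 0 = idle, 1 = moving, 2 = done

  # Motion thread: waits for a request, moves, reports status.
  thread move_thread():
    while (True):
      if (move_request):
        move_request = False
        move_status = 1
        textmsg("movej start")
        movej(target_q, a=1.4, v=1.05)
        textmsg("movej end")
        move_status = 2
      end
      sync()
    end
  end

  run move_thread()

  # Main program: publish a target, then wait with a timeout.
  target_q = get_actual_joint_positions()   # placeholder for a computed waypoint
  move_status = 0
  move_request = True
  waited = 0.0
  while (move_status != 2 and waited < 10.0):
    sleep(0.1)
    waited = waited + 0.1
  end
  if (move_status != 2):
    # Status 1 here means the log shows "movej start" but never "movej end".
    textmsg("movement timeout, status=", move_status)
  end
end
```

In this sketch the failure described in the first post would show up as a timeout with `move_status` still at 1.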
I suspected it could be a deadlock in the system when using get_inverse_kin, is_within_safety_limits, or get_inverse_kin_has_solution in the main program that is doing the calculations while a thread is moving the robot. We use these functions in our calculations and probably the move() commands use them internally as well.
The error occurs on robots in production, so we haven’t been able to diagnose whether the thread is still running or not, and we have not used the join command either. I’ll try to create a sample program on a test robot to reproduce the error and upload it as soon as possible.