Controller goes from running to stopped and requires a UR driver restart

I have a robot setup with two UR5e arms running with a single ROS1 UR driver. The robot performs a repetitive task consisting of 25 distinct actions split between the two arms. Things work well for multiple iterations, but at some point the scaled_pos_joint_traj_controller of each arm moves to the stopped state and cannot be restarted. I have tried to restart and respawn the controllers via controller_manager, but none of these attempts succeed; only a driver restart gets the controllers working again. Is there a way to restart controllers that have gone to the stopped state without restarting the driver?

$ rosrun controller_manager controller_manager list __ns:=left_arm
'joint_state_controller' - 'hardware_interface::JointStateInterface' ( running )
'scaled_pos_joint_traj_controller' - 'scaled_controllers::ScaledPositionJointInterface' ( stopped )
'force_torque_sensor_controller' - 'hardware_interface::ForceTorqueSensorInterface' ( running )

$ rosrun controller_manager controller_manager list __ns:=right_arm
'joint_state_controller' - 'hardware_interface::JointStateInterface' ( running )
'scaled_pos_joint_traj_controller' - 'scaled_controllers::ScaledPositionJointInterface' ( stopped )
'force_torque_sensor_controller' - 'hardware_interface::ForceTorqueSensorInterface' ( running )

$ rosrun controller_manager controller_manager start scaled_pos_joint_traj_controller __ns:=left_arm
Error when starting ['scaled_pos_joint_traj_controller'] and stopping
$ rosrun controller_manager controller_manager start scaled_pos_joint_traj_controller __ns:=right_arm
Error when starting ['scaled_pos_joint_traj_controller'] and stopping

$ rosrun controller_manager controller_manager unload scaled_pos_joint_traj_controller __ns:=left_arm
Unloaded 'scaled_pos_joint_traj_controller' successfully
$ rosrun controller_manager controller_manager spawn scaled_pos_joint_traj_controller __ns:=left_arm
Loaded 'scaled_pos_joint_traj_controller'
Error when starting ['scaled_pos_joint_traj_controller'] and stopping
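
To catch the stopped state automatically (for example from a watchdog script that polls each namespace), the `list` output shown above can be parsed. The following is only a minimal sketch in Python that assumes the output keeps the exact `'name' - 'type' ( state )` layout pasted above; `parse_controller_list` and `stopped_controllers` are hypothetical helper names, not part of controller_manager, and the sample text is copied from the left_arm listing.

```python
import re

# Accept straight quotes and the typographic quotes that forum software
# sometimes substitutes when terminal output is pasted.
_QUOTE = "['\u2018\u2019]"
_FIELD = "([^'\u2018\u2019]+)"

# Assumed line layout:  'name' - 'interface type' ( state )
_LINE = re.compile(
    r"^\s*" + _QUOTE + _FIELD + _QUOTE
    + r"\s*-\s*" + _QUOTE + _FIELD + _QUOTE
    + r"\s*\(\s*(\w+)\s*\)\s*$"
)


def parse_controller_list(text):
    """Return (name, interface_type, state) tuples; unrelated lines are skipped."""
    controllers = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            controllers.append(match.groups())
    return controllers


def stopped_controllers(text):
    """Return the names of all controllers reported as stopped."""
    return [name for name, _type, state in parse_controller_list(text)
            if state == "stopped"]


# Sample copied from the left_arm listing above.
SAMPLE = """\
'joint_state_controller' - 'hardware_interface::JointStateInterface' ( running )
'scaled_pos_joint_traj_controller' - 'scaled_controllers::ScaledPositionJointInterface' ( stopped )
'force_torque_sensor_controller' - 'hardware_interface::ForceTorqueSensorInterface' ( running )
"""

if __name__ == "__main__":
    print(stopped_controllers(SAMPLE))
```

This only detects the condition. It does not address why the subsequent `start` call fails.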

Looking at the driver logs, the controller stops correlate with socket send failures on the reverse interface. However, I don’t run the driver in a VM (unlike others who have reported this reverse interface issue); it runs in a Docker container on Ubuntu 20.04. I have tried tuning the TCP buffers, but this doesn’t fix the socket send failures.

Any help with resolving the controller restart issue and/or reverse interface issue is much appreciated. Please let me know if I need to provide additional information to help get to the bottom of this.

It is advised to use a PREEMPT_RT or lowlatency kernel in conjunction with the driver. See real_time_benchmarking.md in the UniversalRobots/Universal_Robots_ROS_Driver repository on GitHub (at commit 60f08359f6484d1671e12bab6e01fc49eb0fadc8) for details.
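
A quick way to see which kind of kernel the driver is actually running on is to inspect the kernel release and version strings. Since a Docker container shares the host's kernel, the check gives the same answer inside the container as on the host. The sketch below is a rough heuristic, not an official check: it assumes that lowlatency kernels carry `lowlatency` in the release string (as Ubuntu's do), that PREEMPT_RT kernels report `PREEMPT_RT` or `PREEMPT RT` in the version string, and that RT-patched kernels expose `/sys/kernel/realtime` containing `1`. `classify_kernel` and `read_rt_flag` are hypothetical helper names.

```python
import platform


def classify_kernel(release, version, rt_flag=""):
    """Classify a Linux kernel from its `uname -r` and `uname -v` strings.

    rt_flag is the content of /sys/kernel/realtime when that file exists
    (assumed to be "1" on RT-patched kernels), otherwise an empty string.
    """
    # Normalise "PREEMPT RT" and "PREEMPT_RT" to the same spelling.
    normalised = version.upper().replace(" ", "_")
    if rt_flag.strip() == "1" or "PREEMPT_RT" in normalised:
        return "PREEMPT_RT"
    if "lowlatency" in release.lower():
        return "lowlatency"
    return "generic"


def read_rt_flag(path="/sys/kernel/realtime"):
    """Return the file content, or an empty string if the file is absent."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # platform.release() and platform.version() correspond to
    # `uname -r` and `uname -v` on Linux.
    print(classify_kernel(platform.release(), platform.version(), read_rt_flag()))
```

If this prints `generic`, the scheduling latency of a stock kernel is one candidate explanation for the reverse interface send failures, though it does not rule out other causes.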