"No Controller" error on real robot

We found that in case of “bad programming” on the UR side, the UR program throws one of the following exceptions:

  • XMLRPC: Failed with exception: Unable to transport XML to server and get XML response back. libcurl failed to execute the HTTP POST transaction, explaining: Recv failure: Connection reset by peer
  • XMLRPC: Failed with exception: Unable to transport XML to server and get XML response back. libcurl failed to execute the HTTP POST transaction, explaining: Empty reply from server

One way to reproduce this error is to play a program that makes calls to the XMLRPC server “as fast as it can”, for example:

  • XMLRPC call
  • if(condition == TRUE) Do something

Imagine a program like this played with the “program loops forever” option and with the “condition” always set to FALSE.

We have a customer who experiences this problem every couple of hours. We recommended that he change the program, or at least insert a “wait” instruction between the XMLRPC node and the “if” node. With a wait of 200 ms, the frequency of the error drops from once every couple of hours to once every 12 hours, but the problem is still present, and the customer struggles to believe that it is a system problem and not a device problem.

Which XMLRPC server do you use?
“Connection reset by peer” sounds like a problem on the server side.

Our daemon is written in Python 2.7. We import xmlrpclib, and from SimpleXMLRPCServer we import SimpleXMLRPCServer.
We register some functions with the “register_function” method and then call the “serve_forever” method.
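
For readers following along, a daemon of the kind described above might look roughly like the sketch below. It uses the Python 3 module names (xmlrpc.server, xmlrpc.client); under Python 2.7 the equivalents are SimpleXMLRPCServer and xmlrpclib. The function get_status and the loopback address are assumptions made for illustration only, since the thread does not show the actual daemon.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_status():
    # Hypothetical function; the real daemon's functions are not shown in the thread.
    return "ok"


def start_daemon(port=0):
    # Port 0 lets the OS pick a free port (an assumption for this sketch).
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    server.register_function(get_status, "get_status")
    # serve_forever blocks, so it runs in a background thread here.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_daemon()
proxy = ServerProxy("http://127.0.0.1:%d" % server.server_address[1])
result = proxy.get_status()  # one XML-RPC round trip
server.shutdown()
server.server_close()
```

In this default configuration the server answers with HTTP/1.0 and closes the connection after every request.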

Thanks, I’ll try to reproduce this issue.

Hello guys!

We’re currently developing a URCap and we’re facing this problem, too.
After a couple of hours we’re getting an “XMLRPC: Failed with exception: Unable to transport XML to server and get XML response back. libcurl failed to execute the HTTP POST transaction, explaining: Recv failure: Connection reset by peer” error.

The daemon side is written in Java. Out of desperation we developed a little Python proxy that forwards the XMLRPC requests, but the error still remains.

Best regards,

Andi

Any news about your tests? Did you reproduce the issue?

Yes, I was able to reproduce the issue, and it’s queued to be fixed.
It is however difficult to reproduce the issue consistently. I’ll keep you posted if any workaround is possible.

I am very happy to see that UR is going to solve this issue. Many thanks to everyone in this community for supporting this thread.

Here is some useful information coming from our tests.

Issue description:
Calling an XMLRPC function from a URScript client randomly triggers an error. The robot program is stopped with the message “XMLRPC: Failed with exception: Unable to transport XML to server and get XML response back. libcurl failed to execute the HTTP POST transaction, explaining: Recv failure: Connection reset by peer”.

Versions affected:
Probably all Polyscope versions supporting XMLRPC. Tested on 3.4, 3.8, 3.9, 3.10, 3.12, 5.1, 5.2, 5.3, and 5.6 (the 5.6 release notes state that this issue is solved; unfortunately, it is not yet). Other versions are mentioned in this thread.

Observations:

  • XMLRPC server implemented and tested in C++ AND in Python. The issue can be invoked on both servers.

  • Using the CB-Series makes the probability of invoking this issue LOWER (slower timing)

  • If the called XMLRPC function takes some time for processing, the probability of invoking this issue is HIGHER.

  • If the URScript client calls XMLRPC function from more than one thread, the probability of invoking this issue is HIGHER.

  • The simple example located HERE demonstrates a combination of both aspects. With it, the issue is usually invoked within two days.

  • Unfortunately, one thread is enough to invoke this issue. It only takes more time (up to 12 days).

  • There is no functional workaround except for avoiding XMLRPC use.

Conclusion:
Using the XMLRPC interface in URScript is not a reliable solution and should not be used in a real application. Java supports exception handling, so it is possible to handle this issue from Java. It would be great to extend the URScript language with exception handling as well as to resolve the issue.
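
To illustrate what handling the failure on the client side could mean, here is a minimal sketch in Python rather than Java (an assumption made to keep the examples in one language). The helper call_with_retry and its parameters are hypothetical and not part of any UR API. It retries a call when a transport-level error such as “Connection reset by peer” is raised, which is only safe for functions that can be repeated without side effects.

```python
import time
import xmlrpc.client


def call_with_retry(func, retries=3, delay_s=0.05):
    """Call func(); on a transport-level failure, wait and try again."""
    last_error = None
    for _ in range(retries):
        try:
            return func()
        except (OSError, xmlrpc.client.ProtocolError) as err:
            # ConnectionResetError is a subclass of OSError.
            last_error = err
            time.sleep(delay_s)
    # All attempts failed: surface the last error to the caller.
    raise last_error


# Usage with a stand-in for a flaky remote call (hypothetical):
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionResetError("Connection reset by peer")
    return 42


value = call_with_retry(flaky_call, retries=3, delay_s=0)
```

With a real client, func would typically be a lambda wrapping a ServerProxy method call.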

My private opinion:
Stopping the robot program because the response to ONE (of millions of) XMLRPC calls was lost is really not an industrial approach. All industrial communication protocols (fieldbuses) ignore broken packets and continue processing with the next message.

I am having this issue constantly. I can replicate this issue by opening multiple sockets (4) in a short time span.

If I have an RTDE connection, an XMLRPC connection, a socket (30004) sending an error pop-up, and a socket sending some commands to a different device, my CB series (SV 3.10) will get the NO CONTROLLER fault. I did notice that running with my GUI causes the issue to be more likely to happen.

Is there a temporary workaround while this is getting fixed?

Hi vjanskosky,

Thank you for the detailed information. This is a fundamental functionality and we take it very seriously. I tried your example and noticed that the CPU load was ~160%.

For investigation I modified your example. It has now made over 160,000,000 successive calls without a single failure. So I think there is more to the issue than just a fixed probability of failure.

We still accept that this is a serious issue and are looking into finding the root cause and a fix. In the meantime, the recommendation is to reduce the load on the system.

A post was split to a new topic: Controller shuts down while urcap development

Hello @Ebbe,
During the development of a script, I ran into a “Disconnected from controller” issue due to a bug in my URScript. I simplified the script so that it only reproduces the issue. The bug is “nested critical sections”, caused by a function return in the middle of a critical section.

  1. When the script is entirely wrapped in a function: the “Disconnected from controller” issue happens shortly after running the program.
  2. When the script runs at top level (zero indentation): the “nested critical sections” issue is detected before the script runs, I guess by your syntax checker.

Here is the first case:

def main_function():
    global some_condition = True
    global another_condition = True

    def some_function():
        enter_critical
        if another_condition:
            textmsg("nested critical because of return: ", 1)
            # exit_critical
            return False
        end
        exit_critical
    end

    thread compute_thread():
        while some_condition:
            some_function()
            sync()
        end
    end

    myThread = run compute_thread()
    while some_condition:
        enter_critical
        textmsg("main : ", 1)
        exit_critical
        sync()
    end
    join myThread
end
main_function()

And this is the second case:

global some_condition = True
global another_condition = True

def some_function():
    enter_critical
    if another_condition:
        textmsg("nested critical because of return: ", 1)
        # exit_critical
        return False
    end
    exit_critical
end

thread compute_thread():
    while some_condition:
        some_function()
        sync()
    end
end

myThread = run compute_thread()
while some_condition:
    enter_critical
    textmsg("main : ", 1)
    exit_critical
    sync()
end
join myThread

Perhaps some URCaps contain such scripts!

@r4p, @raul.castillo.cruces, @tal, @ad1, @Ebbe
We have found a solution for the Python server, and probably also for the C++ HTTP server.

There was a configuration issue in our examples: the Python server was closing the connection after each request. If a program (or URCap) made a lot of requests, the Linux OS ran out of available ports. That in turn stopped the program because the RPC call failed.
The problem is more apparent on the e-Series, due to its higher control loop frequency.

The C++ example on the website is not affected, because it uses the pstream protocol instead of HTTP.

The C++ example in the urcap-sdk uses the default configuration, which closes the connection after 30 requests. It wasn’t causing problems in R&D conditions, but in theory it can also exhaust system resources, so the configuration was updated.

Customers and UR+ developers have to fix it on their side.

The Python SimpleXMLRPCServer handler should have the following property set:
server.RequestHandlerClass.protocol_version = "HTTP/1.1"
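
As a rough check of what this property changes, the sketch below sends three XML-RPC requests over a single TCP connection and records the client-side port used for each. It is written with Python 3 module names (under Python 2.7 the modules are SimpleXMLRPCServer, xmlrpclib and httplib); the ping function and the loopback address are assumptions for illustration. With the property set, the server keeps the connection open, so all three requests share one socket instead of consuming a new port each time.

```python
import http.client
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def ping():
    # Hypothetical function used only for this check.
    return "pong"


server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
# The setting from the post above: answer with HTTP/1.1 so the
# connection stays open between requests (keep-alive).
server.RequestHandlerClass.protocol_version = "HTTP/1.1"
server.register_function(ping, "ping")
threading.Thread(target=server.serve_forever, daemon=True).start()

body = xmlrpc.client.dumps((), methodname="ping")
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
results = []
local_ports = set()
for _ in range(3):
    conn.request("POST", "/RPC2", body, {"Content-Type": "text/xml"})
    # Record which local port this request went out on.
    local_ports.add(conn.sock.getsockname()[1])
    payload = conn.getresponse().read()
    results.append(xmlrpc.client.loads(payload)[0][0])

conn.close()
server.shutdown()
server.server_close()
```

If local_ports ends up with a single entry, the connection was reused for all requests.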

There is a minor caveat when using the Python server this way: only one connection can be served at a time, since by default SimpleXMLRPCServer is single-threaded. To make it work with multiple simultaneous connections (i.e. when the Java part of a URCap is making RPC requests at the same time as a program running on the controller), the server has to be configured with multithreading. The following code can be used in place of the standard SimpleXMLRPCServer:
# Python 2.7 module names (socketserver and xmlrpc.server in Python 3)
from SocketServer import ThreadingMixIn
from SimpleXMLRPCServer import SimpleXMLRPCServer

class MultithreadedSimpleXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    pass

server = MultithreadedSimpleXMLRPCServer(("", server_port))

The C++ Abyss server should add the following property in the server constructor:
keepaliveMaxConn(UINT_MAX)

Customers using C++ or Python pstream servers should not be affected.

The example code on the support site was updated, and the URCap SDK will be updated in a future release.
https://www.universal-robots.com/how-tos-and-faqs/how-to/ur-how-tos/xml-rpc-communication-16326/


Dear @mmi
I’ve found something strange: I am testing the daemon on the simulator 5.5.1, and if I add the line server.RequestHandlerClass.protocol_version = "HTTP/1.1" to the SimpleXMLRPCServer handler, I am no longer able to call the daemon functions from the UR script.
To be clearer: I am still able to call the daemon functions registered in the XMLRPC server through Java calls, but when it comes to the UR script, the program node that makes the call hangs there forever.
If I remove the line server.RequestHandlerClass.protocol_version = "HTTP/1.1", everything works fine.
Am I missing something? Do I need to switch to a newer version of Polyscope to implement the bugfix?

I’ve edited the above post. When you’re connecting from multiple clients to the RPC server, you need a multithreaded implementation.
For Python, a new MultithreadedSimpleXMLRPCServer class should be created in place of SimpleXMLRPCServer (look at the example above).
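
A small sketch of the multi-client situation, under stated assumptions: Python 3 module names, a hypothetical echo function, and two plain HTTP connections standing in for the Java part of a URCap and the URScript program. Both clients keep their connection open at the same time. With a single-threaded server and HTTP/1.1, the second client would wait until the first one disconnects, which matches the hang described above.

```python
import http.client
import threading
import xmlrpc.client
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer


class MultithreadedSimpleXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    # One handler thread per connection; daemon threads do not block exit.
    daemon_threads = True


def echo(value):
    # Hypothetical function used only for this sketch.
    return value


server = MultithreadedSimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.RequestHandlerClass.protocol_version = "HTTP/1.1"
server.register_function(echo, "echo")
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]


def call(conn, value):
    # Send one XML-RPC request over an already open connection.
    body = xmlrpc.client.dumps((value,), methodname="echo")
    conn.request("POST", "/RPC2", body, {"Content-Type": "text/xml"})
    return xmlrpc.client.loads(conn.getresponse().read())[0][0]


# Two clients holding persistent connections simultaneously.
java_like = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
script_like = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
first = call(java_like, "from java")
second = call(script_like, "from urscript")
third = call(java_like, "java again")

for conn in (java_like, script_like):
    conn.close()
server.shutdown()
server.server_close()
```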

Hello everyone,

I have a similar problem but with a different background. I developed a URCap that repeatedly polls the current status of the digital I/Os while the program node view is open, and sends some strings to a PLC when the view is opened or a button is pressed. I also use a C++ daemon to generate some parts of the script, which is also transferred via TCP sockets.
The problem is that when I add a URCap program node to the robot program, go to Installation --> Safety --> Robot limits, and edit the safety settings, every time I press confirm settings the “No Controller” error occurs, and only a restart of the robot fixes it.
Does anyone have an idea where this issue comes from? (SW versions tested: 5.8.0, 5.8.2, 5.9.4)

Thanks for your help in advance.

Hello everyone,

I got this “No Controller” error when I exceeded the maximum allowed number of threads (which is 50) in a program.
This happened because of too many Until conditions in motion commands.

This was on an e-Series UR16 running URSoftware 5.9.4.1031232.
