Struggling with UR Robot Faults and Protective Stops

I keep seeing the same issue come up with Universal Robots setups, so I wanted to sanity-check with people who work with these day to day.

When a UR robot hits an intermittent protective stop or fault, how do you usually figure out what led up to it?

For example:
Something runs fine for hours or days, then suddenly faults. The logs are there, but it’s hard to reconstruct the sequence of robot state, IO, forces, program context, etc. right before the stop.

In practice, do you:
Scrape logs manually?
Add ad-hoc script logging?
Reproduce by trial-and-error?
Just wait for it to happen again?
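
To make the ad-hoc logging option concrete, here is a minimal sketch of the kind of thing I mean: a rolling buffer that keeps the last N samples and snapshots them when a stop shows up. It is plain Python and not tied to any UR interface. The `PreStopRecorder` class, the sample dict, and its field names (such as `protective_stop`) are hypothetical placeholders for whatever your setup actually exposes.

```python
from collections import deque
import json


class PreStopRecorder:
    """Keep the last `capacity` samples; snapshot them when a stop is seen."""

    def __init__(self, capacity=500):
        # deque with maxlen silently drops the oldest sample once full
        self.buffer = deque(maxlen=capacity)
        self.snapshots = []

    def record(self, sample):
        # `sample` is a hypothetical dict, e.g.
        # {"t": 12.3, "program_line": 41, "protective_stop": False}
        self.buffer.append(sample)
        if sample.get("protective_stop"):
            # Freeze everything leading up to (and including) the stop
            self.snapshots.append(list(self.buffer))
            self.buffer.clear()

    def dump(self, path):
        # Write all captured pre-stop windows to a JSON file
        with open(path, "w") as f:
            json.dump(self.snapshots, f, indent=2)


# Usage with made-up samples: the stop is flagged on the fifth sample,
# so the snapshot holds the last three samples up to and including it.
rec = PreStopRecorder(capacity=3)
for i in range(5):
    rec.record({"t": i, "protective_stop": i == 4})
```

The fixed-size buffer is the point here: it keeps memory bounded over days of running while still preserving the window right before the stop.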

I’m especially curious:
What’s the most annoying fault you’ve had to debug recently?
How much time does this kind of issue usually cost you (or your customer)?

I’m genuinely trying to understand how people deal with this today, and whether I’m missing something obvious.