As far as I understand it, the flow is something like this:
There should be a cache/buffer on the server side as well. When the ATEM has a network hiccup, the stream to your users should be served from that buffer. Once the ATEM has enough bandwidth again, it will send more of its own cache to the server, which then refills the server-side buffer.
If the server-side buffer is emptied completely, the stream will pause.
If it fills up, I would expect it to drop frames from either the back or the front of the buffer, which would create jumps and skips in the stream.
If there is practically no buffer, I'd expect to see your issue, where every frame the server receives is pushed out immediately. But this is just speculation.
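
To make the three cases a bit more concrete, here is a rough Python sketch of how I picture it. This is a toy model under my own assumptions, not how the ATEM or any particular streaming server actually works: the names `simulate`, `arrivals` and `capacity` are made up, one "tick" stands for one frame interval, and dropping the oldest frame on overflow is just one of the two options I mentioned.

```python
from collections import deque


def simulate(arrivals, capacity):
    """Toy model of a server-side buffer (hypothetical, for illustration only).

    arrivals[t] is the number of frames the encoder delivers during tick t.
    capacity is the maximum number of frames the server buffer can hold;
    0 means "no buffer at all".
    Returns (sent, dropped): sent[t] is the list of frame numbers pushed to
    viewers at tick t, dropped is the list of frames discarded on overflow.
    """
    if capacity < 0:
        raise ValueError("capacity must be >= 0")

    buffer = deque()
    sent, dropped = [], []
    next_frame = 0

    for count in arrivals:
        incoming = list(range(next_frame, next_frame + count))
        next_frame += count

        if capacity == 0:
            # No buffer: forward whatever arrived, bursts and gaps included.
            sent.append(incoming)
            continue

        for frame in incoming:
            if len(buffer) == capacity:
                # Buffer full: discard the oldest frame (assumed policy).
                dropped.append(buffer.popleft())
            buffer.append(frame)

        # Steady playout: one frame per tick, or nothing if the buffer ran dry.
        sent.append([buffer.popleft()] if buffer else [])

    return sent, dropped


# A burst of 3 frames, then a hiccup (three empty ticks), then a catch-up burst.
arrivals = [3, 1, 0, 0, 0, 4, 1]

for capacity in (4, 2, 0):
    sent, dropped = simulate(arrivals, capacity)
    print(f"capacity={capacity} sent={sent} dropped={dropped}")
```

With a capacity of 4 the hiccup is mostly absorbed and the stream only pauses for one tick once the buffer is empty. With a capacity of 2 the bursts overflow, so frames get dropped and the output skips. With a capacity of 0 every burst and gap from the encoder goes straight out to the viewers, which is what I suspect you are seeing.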