So the best and cheapest way is to put a camera behind the ATEM,
take the audio from the panel, plug it into the camera, and then get the audio via HDMI.
Correct?
The cheapest way is to feed the audio into a camera, yes. But of course the camera sits in front of the ATEM (upstream of it), not behind it.
When video passes through a camera, the camera adds some delay. Professional cameras that can be genlocked add much less delay than semi-professional or consumer cameras without genlock.
The ATEM adds up to one more frame of delay on top of that, because it has to use its frame synchronizers to get the video sources in sync with its internal clock.
When you input the audio at the final stage of the ATEM, the video has already travelled a much longer path with several frames of delay, while the audio has taken a short path with minimal delay. The audio will therefore be ahead of the video and has to be delayed to get back in sync.
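To put a number on that, here is a rough sketch of the arithmetic in Python. The function name and the example figures (two frames of camera delay, one frame in the ATEM, 25 fps) are only assumptions for illustration; measure the real delay of your own cameras.

```python
def audio_delay_ms(camera_delay_frames: float,
                   atem_sync_frames: float,
                   fps: float) -> float:
    """Estimate how long directly injected audio must be delayed, in ms.

    camera_delay_frames: frames of delay added by the camera (assumed value)
    atem_sync_frames:    frames added by the ATEM frame synchronizer (0 to 1)
    fps:                 frame rate of the video standard in use
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_ms = 1000.0 / fps  # duration of one frame in milliseconds
    return (camera_delay_frames + atem_sync_frames) * frame_ms


# Hypothetical example: 2 frames in the camera, 1 frame in the ATEM, 25 fps.
delay = audio_delay_ms(2, 1, 25)
print(f"Delay the audio by about {delay:.0f} ms")
```

With those assumed figures the audio would need roughly three frames of delay. The real value depends on the camera and on where in the frame the synchronizer catches the signal.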
When you feed the audio into one of your cameras and route it through the internal audio mixer to the program out, the audio travels the same path as the video, also gets frame synchronized, and arrives in sync on the program out. Relative to the other cameras it can be off by up to one frame, but that is close enough to look lip-synced. At least, that holds when all the cameras are the same model. Mixing different semi-professional or consumer cameras like this could cause a problem, because each camera may add a different number of frames of delay.
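To see why mixed cameras are the risky case, here is a small Python sketch that compares each camera's delay with that of the camera carrying the audio. The camera names and frame counts are made up for the example, and the ATEM's own sub-frame variation is left out.

```python
def audio_offset_frames(camera_delays: dict, audio_camera: str) -> dict:
    """Return, per camera, how many frames its video lags the embedded audio.

    camera_delays: frames of delay per camera (assumed, measured by you)
    audio_camera:  name of the camera the audio is plugged into
    A positive value means the picture from that camera arrives later than
    the audio; a negative value means the picture arrives earlier.
    """
    reference = camera_delays[audio_camera]
    return {name: delay - reference for name, delay in camera_delays.items()}


# Hypothetical setup: two identical cameras and one slower consumer camera.
delays = {"cam1": 2, "cam2": 2, "cam3": 5}
for name, offset in audio_offset_frames(delays, "cam1").items():
    print(f"{name}: video lags audio by {offset} frame(s)")
```

In this made-up setup the two identical cameras stay in sync with the audio, while the third one ends up three frames late whenever it is on program.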