Depends on how much flexibility you need.
If you know all of the graphics you'll need well in advance, one method is to render the background animation out of After Effects as an image sequence with a pre-multiplied alpha channel, then import it into one of your ATEM media pools. From there, create a set of text-only lower thirds in something like Photoshop and export those to the other media pool. When you need to bring up a graphic, select both media player 1 and media player 2 as keys and the switcher should composite them on-air.
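Why the pre-multiplied setting matters: with pre-multiplied alpha, the fill colour has already been multiplied by the alpha value, so the keyer only has to scale the background and add the fill. Here is a minimal sketch of that math in Python. It is only an illustration of the general formula, not the ATEM's actual internals, and the function name and the 0.0 to 1.0 value range are my own assumptions.

```python
def composite_premultiplied(fill_rgb, alpha, background_rgb):
    """Key a pre-multiplied fill pixel over a background pixel.

    fill_rgb       -- (r, g, b) already multiplied by alpha, each 0.0..1.0
    alpha          -- key value, 0.0 (transparent) .. 1.0 (opaque)
    background_rgb -- (r, g, b) of the program video, each 0.0..1.0

    Formula for pre-multiplied sources: out = fill + (1 - alpha) * background
    """
    return tuple(f + (1.0 - alpha) * b for f, b in zip(fill_rgb, background_rgb))


# A half-transparent pure red graphic is stored as (0.5, 0, 0) with alpha 0.5.
pixel = composite_premultiplied((0.5, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
print(pixel)
```

If the keyer treats a pre-multiplied file as straight alpha (or the other way round), the fill gets scaled twice or not at all, which usually shows up as dark or bright fringes around soft edges.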
The problem with this technique is that you can't animate the text on together with the background animation, nor can you easily change graphics on the fly.
For a more professional setup you'll want to run your graphics through a real character generator. I'm a pretty big fan of CasparCG and what they are doing. It isn't for beginners though. You'll need a Windows server with an SDI card (or two) linked to your ATEM, which uses up at least 2 inputs (typically fill and key). Then you will need to export the animation to a Flash template and build a CasparCG template file. Once this is done you can not only animate the text on-screen with the graphics, but also change the text on the fly, drive the text from external sources and even have multiple elements on the screen at the same time!
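To give an idea of what "driving the text from external sources" looks like, here is a rough Python sketch that builds CasparCG AMCP commands (CG ADD and CG UPDATE) for a lower third. Treat it as an assumption-heavy illustration: the template name "lower_third", the field ids "f0" and "f1", the channel and layer numbers and the host are all made up, and your template may expect a different data format. Port 5250 is the usual AMCP default, but check your server config.

```python
import html
import socket


def build_template_data(fields):
    """Build the XML data string used by Flash-style templates.

    fields maps a (hypothetical) text field id such as "f0" to its text.
    """
    parts = ["<templateData>"]
    for field_id, text in fields.items():
        parts.append(
            '<componentData id="%s"><data id="text" value="%s"/></componentData>'
            % (html.escape(field_id, quote=True), html.escape(text, quote=True))
        )
    parts.append("</templateData>")
    return "".join(parts)


def _amcp_quote(value):
    """Wrap a string in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def cg_add(channel, layer, cg_layer, template, fields, play_on_load=True):
    """CG ADD: load a template, optionally playing its intro animation."""
    return "CG %d-%d ADD %d %s %d %s\r\n" % (
        channel, layer, cg_layer, _amcp_quote(template),
        1 if play_on_load else 0, _amcp_quote(build_template_data(fields)),
    )


def cg_update(channel, layer, cg_layer, fields):
    """CG UPDATE: change the text of a template that is already loaded."""
    return "CG %d-%d UPDATE %d %s\r\n" % (
        channel, layer, cg_layer, _amcp_quote(build_template_data(fields)),
    )


def send_amcp(command, host="127.0.0.1", port=5250):
    """Send one command to a CasparCG server and return its raw reply.

    host and port are assumptions; adjust them to your own server.
    """
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(command.encode("utf-8"))
        return conn.recv(4096).decode("utf-8", errors="replace")


# Build (but do not send) the commands for a two-line lower third.
add_cmd = cg_add(1, 20, 1, "lower_third", {"f0": "John Smith", "f1": "Reporter"})
update_cmd = cg_update(1, 20, 1, {"f0": "Jane Doe", "f1": "Anchor"})
print(add_cmd, update_cmd, sep="")
```

To actually put the graphic on air you would call something like send_amcp(add_cmd) from a machine that can reach the server. The text could just as easily come from a spreadsheet or a scoreboard feed, which is the whole point of going this route.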
With a CasparCG system taking care of your graphics, you could even use your empty media players on the ATEM for that nifty animated wipe (stinger) and get nifty transitions between critical scenes.
All depends on how much time you're willing to invest in the solution, how much gear you already have and how many inputs on your ATEM you are willing to burn up.