Are you describing something like this?
You'll likely want to animate the text and use it as a mask that reveals your video footage. However, since the text first enters from off-screen, you can't simply use the footage as a background: the text needs to reveal the footage as it slides in. You'll therefore need a separate background for the moments when the footage is not visible, i.e., when the text is off-screen.
You should also animate the footage's opacity, so that it appears as the text appears and remains visible as the text zooms out. You'll probably also want to optimize the composition for decent performance. For a bit more polish, you can enable motion blur on the text mask.
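If it helps to see the idea outside of Fusion, here is a rough per-pixel sketch of the blend described above, in plain Python. This is not Fusion's API or my actual node setup; the function names (`composite_pixel`, `footage_opacity`) and the linear fade are assumptions made purely for illustration.

```python
def composite_pixel(footage, background, mask, opacity):
    """Blend one RGB pixel: footage shows only where the text mask covers it.

    mask and opacity are floats in [0, 1]; colours are (r, g, b) tuples.
    """
    # Effective coverage is the text mask scaled by the animated footage opacity.
    alpha = mask * opacity
    return tuple(f * alpha + b * (1.0 - alpha) for f, b in zip(footage, background))


def footage_opacity(frame, fade_start, fade_end):
    """Linear ramp from 0 to 1 between two keyframes, clamped outside them."""
    if fade_end <= fade_start:
        raise ValueError("fade_end must be after fade_start")
    t = (frame - fade_start) / (fade_end - fade_start)
    return max(0.0, min(1.0, t))


# Inside the text, footage fully faded in: the footage pixel wins.
inside = composite_pixel((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, footage_opacity(10, 0, 10))
# Outside the text (mask 0): only the separate background is visible.
outside = composite_pixel((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.0, 1.0)
```

Wherever the mask is 0 the result is just the background, which is why a separate background is needed while the text is still off-screen.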
I can share my node setup, but how useful it will be depends on whether you're familiar enough with Fusion to adapt it to your needs.