RCModelReviews wrote:How will you get the Z dimension from two two-dimensional footage?
The same way you get it for a camera. Technically there is no difference between a camera track and an object track; it is only a question of the reference coordinate system. Either the camera moves and the object/world is stationary, or the object moves and the camera is stationary. One is the inverse transformation of the other. In practice you would still get a moving camera when solving it with the camera tracker, so that the object is stationary and the camera moves. It should be possible to invert that movement and apply the transformation to an object instead; it just takes some matrix juggling.
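A minimal sketch of that matrix juggling, assuming you already have a solved 4x4 camera world matrix per frame (the numbers here are made up for illustration): inverting the camera transform gives the equivalent object transform for a static camera.

```python
import numpy as np

# Hypothetical 4x4 world matrix from a solved camera track:
# camera translated to (0, 0, 5), identity rotation for simplicity.
cam_world = np.eye(4)
cam_world[:3, 3] = [0.0, 0.0, 5.0]

# Inverting the camera motion yields the equivalent object motion:
# a camera moving +5 on Z past a static object looks identical to a
# static camera with the object moving -5 on Z.
obj_world = np.linalg.inv(cam_world)

print(obj_world[:3, 3])  # -> [ 0.  0. -5.]
```

Done per frame, this turns a camera path into an object path (and vice versa).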
All the usual constraints still apply: you need enough parallax in the (object) motion, the object should be relatively big in the frame (otherwise errors are magnified), etc. Think of it as a camera track done on a limited scene, where the part covered by the object is all you can track of the entire environment. That helps wrap your head around why it is harder to do.
Depending on what you want to do, camera-tracking the object might be enough. For projections etc. it makes little difference what exactly is moving. When you want to render stuff with lighting, though, moving vs. static makes a visual difference, because the positions of the (stationary) lights either move in relation to the object or they don't. If the entire world is parented to the camera, then again it makes no difference.
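A quick way to see the lighting difference, with made-up numbers: express a static world light in the object's local space each frame. If the object moves, the light shifts relative to it (so shading changes); with the camera-moves formulation the light would stay put in object space.

```python
import numpy as np

def trans(x, y, z):
    """4x4 translation matrix (hypothetical helper for the example)."""
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

light_world = np.array([0.0, 10.0, 0.0, 1.0])  # a static world-space light

# Hypothetical object track: object slides +3 on X between two frames.
obj_frames = [trans(0, 0, 0), trans(3, 0, 0)]

# Light position as seen from the object's local space each frame:
light_local = [np.linalg.inv(m) @ light_world for m in obj_frames]
print(light_local[0][:3])  # -> [ 0. 10.  0.]
print(light_local[1][:3])  # -> [-3. 10.  0.]  lighting moved relative to the object
```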
A compound camera-and-object track, where both move, is solved by solving the camera first (from the static environment), then solving the object using the camera path as a constraint. Not sure how this should be approached in Fusion.
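The composition step of that two-stage solve can be sketched like this (all numbers hypothetical): once you have the camera's world path and the object's motion solved in camera space, the object's world-space transform per frame is just the product of the two.

```python
import numpy as np

def trans(x, y, z):
    """4x4 translation matrix (hypothetical helper for the example)."""
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

# Stage 1 result: camera path in world space (camera slides +1 on X per frame).
cam_path = [trans(0, 0, 0), trans(1, 0, 0)]

# Stage 2 result: object motion solved relative to the camera
# (object recedes +2 on Z per frame in camera space).
obj_in_cam = [trans(0, 0, 2), trans(0, 0, 4)]

# Compose per frame: camera-to-world applied after object-in-camera.
obj_world = [c @ o for c, o in zip(cam_path, obj_in_cam)]

print(obj_world[1][:3, 3])  # -> [1. 0. 4.]
```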