Probably because the video resolution and aspect ratio are different from those of photos. If you tried to use the full sensor (as you do with pictures), you would have to do some fancy interpolation of pixels, which would take more CPU time, probably reduce the quality, and could even reduce your maximum frame rate. Trying to compute a 4.28:1 ratio (made-up number) is taxing and tends toward unwanted artifacts, especially if the ratio is different horizontally and vertically, which it would be if you didn't crop the sensor in at least one direction.
So instead you use less of the sensor, and less sensor means a zoomed picture (because the lens is fixed and cannot change zoom levels). It's probably not 1:1 either. Instead, I'm guessing the iPad crops to an exact multiple of the video's resolution in both dimensions (2:1 or 4:1) to reduce the CPU time used while increasing the amount of light/data that goes into each pixel of the final file.
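To make the guess concrete, here's a quick back-of-the-envelope sketch. All the numbers below are made up for illustration (I haven't checked the actual iPad sensor or video specs):

```python
# Hypothetical sensor and video sizes -- made-up numbers for illustration only.
sensor_w, sensor_h = 2592, 1944   # full-sensor photo resolution (assumed)
video_w, video_h = 1280, 720      # target video frame (assumed)

# Using the whole sensor: the scale factors are non-integer AND unequal,
# so every output pixel would need interpolation, and the image would
# be stretched unless you also letterbox or crop.
print(sensor_w / video_w, sensor_h / video_h)  # awkward ratios like 2.025 vs 2.7

# Cropping to an exact 2:1 multiple of the video frame instead:
crop_w, crop_h = 2 * video_w, 2 * video_h      # a 2560 x 1440 window
# Now each output pixel is just a 2x2 block of sensor pixels -- cheap to
# compute, and each block's light is combined into one pixel. The sensor
# area outside the crop window is simply unused, which is why the video
# preview looks "zoomed in" compared to the photo preview.
print(crop_w, crop_h)
```

With these made-up numbers the crop window fits inside the sensor with room to spare, and the unused border is exactly the "zoom" you see when switching to video mode.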
I could probably grab the resolutions of each media format and do some math to make sure I'm making sense, but it's past my bedtime.
Night.