

What I think you are also seeing is AI sucking at some things and doing better than humans at others.
AI is pretty great at adding unit tests to code, for example, where humans do a just-OK job. Same for writing code for a small, direct, well-scoped problem.
AI is just OK at understanding product nuance and choices during larger implementations, or at getting end-to-end coding right for complex use cases.

Could be a lot of reasons. A big one I see, working at a large company myself, is that AI needs to draw from a lot of data to do its work, including a huge amount of contextual data. A company like MSFT inevitably needs to give its AI a walled-off, curated set of data and prevent any of it from leaking. Its AIs will not have the same amount of data to draw from that an AI outside MSFT has.