Some variables will certainly validate well against ERA5, but that is not the end goal. We want to predict future observations, namely in-situ observations and Level 2 data products.
ERA5 is probably not a good baseline for an observation forecast either; we know it performs poorly in many areas.
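Scoring against observations instead of a reanalysis means comparing the forecast at the observation locations. Here is a minimal sketch, assuming a regular lat/lon grid with ascending coordinates (no longitude wraparound) and bilinear interpolation to each station. The function names `bilinear` and `station_rmse` are hypothetical, and a real pipeline would also need quality control, time matching, and handling of Level 2 swath geometry.

```python
import bisect
import math

def bilinear(grid, lats, lons, lat, lon):
    """Bilinearly interpolate grid[i][j] (valued at lats[i], lons[j]) to the point (lat, lon).

    Assumes lats and lons are ascending with at least two entries each;
    longitude wraparound is not handled in this sketch.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so i + 1 and j + 1 stay valid.
    i = min(bisect.bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect.bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the point inside that cell.
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - t) * (1 - u) * grid[i][j] + t * (1 - u) * grid[i + 1][j]
            + (1 - t) * u * grid[i][j + 1] + t * u * grid[i + 1][j + 1])

def station_rmse(grid, lats, lons, stations):
    """RMSE of a gridded forecast against (lat, lon, observed_value) station records."""
    errors = [bilinear(grid, lats, lons, lat, lon) - obs for lat, lon, obs in stations]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Usage on a toy 2 x 2 grid.
grid = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear(grid, [0.0, 1.0], [0.0, 1.0], 0.5, 0.5))  # 1.5
print(station_rmse(grid, [0.0, 1.0], [0.0, 1.0], [(0.5, 0.5, 2.5), (0.0, 0.0, 0.0)]))
```

The same scoring function works for any gridded forecast, which makes it easy to put a model and ERA5 side by side against the same station records.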
It's hard
The NeurIPS best paper, Visual Autoregressive Modeling, outperforms diffusion models at image generation and is well suited to multi-scale spatio-temporal Earth science data. openreview.net/pdf?id=gojL6...
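VAR swaps next-token prediction for next-scale prediction: it generates coarse to fine, predicting a whole token map at each successively finer resolution, conditioned on the coarser maps. The multi-scale residual representation behind this can be sketched for a gridded field as below. This is a minimal sketch and not the paper's tokenizer: VAR quantizes each scale with a learned codebook, which is omitted here, and the average-pool downsampling, nearest-neighbour upsampling, square power-of-two grid, and function names (`residual_pyramid`, `reconstruct`) are assumptions for illustration.

```python
def downsample(field, s):
    """Average-pool an n x n field to s x s (n must be a multiple of s)."""
    f = len(field) // s
    return [[sum(field[i * f + a][j * f + b] for a in range(f) for b in range(f)) / (f * f)
             for j in range(s)] for i in range(s)]

def upsample(coarse, n):
    """Nearest-neighbour upsample an s x s map back to n x n."""
    f = n // len(coarse)
    return [[coarse[i // f][j // f] for j in range(n)] for i in range(n)]

def residual_pyramid(field, scales):
    """Split a field into coarse-to-fine components, one per scale in `scales`.

    Each component captures what the coarser ones left unexplained, so the
    upsampled components sum back to the field when the last scale equals n.
    """
    n = len(field)
    residual = [row[:] for row in field]
    components = []
    for s in scales:
        comp = downsample(residual, s)
        components.append(comp)
        up = upsample(comp, n)
        # Remove what this scale explains; finer scales only see the leftover detail.
        residual = [[r - u for r, u in zip(rrow, urow)] for rrow, urow in zip(residual, up)]
    return components, residual

def reconstruct(components, n):
    """Sum the upsampled components, coarsest first."""
    out = [[0.0] * n for _ in range(n)]
    for comp in components:
        up = upsample(comp, n)
        out = [[o + u for o, u in zip(orow, urow)] for orow, urow in zip(out, up)]
    return out

# Usage: a 4 x 4 toy field split into 1 x 1, 2 x 2 and 4 x 4 components.
field = [[float(4 * i + j) for j in range(4)] for i in range(4)]
components, leftover = residual_pyramid(field, [1, 2, 4])
print(components[0])  # [[7.5]], the field mean
```

In a VAR-style model the coarsest component would be generated first and each finer scale would only add the detail the coarser ones missed, which is the property that lines up with multi-scale Earth science fields.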
I'll be at AGU next week to present our efforts on AI data assimilation. Please reach out if you'd like to meet up and learn more.
agu.confex.com/agu/agu24/me...
If a transformer is not training, it is probably the data.
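One way to act on that rule of thumb is to audit the training inputs before touching the model. Here is a minimal sketch, assuming samples arrive as lists of per-channel floats; `check_channels` and its thresholds are hypothetical and would need tuning per variable.

```python
import math

def check_channels(samples, names, max_abs_mean=10.0, max_std=10.0, min_std=1e-6):
    """Flag common per-channel data problems: non-finite values, constant channels, unnormalized scales.

    samples: list of samples, each a list of floats with one entry per channel in `names`.
    The thresholds are illustrative defaults for standardized inputs, not tuned values.
    """
    problems = []
    for c, name in enumerate(names):
        values = [s[c] for s in samples]
        finite = [v for v in values if math.isfinite(v)]
        if len(finite) < len(values):
            # NaN or inf in the inputs will propagate into the loss and gradients.
            problems.append(f"{name}: {len(values) - len(finite)} non-finite values")
        if not finite:
            continue
        mean = sum(finite) / len(finite)
        std = math.sqrt(sum((v - mean) ** 2 for v in finite) / len(finite))
        if std < min_std:
            # A constant input carries no signal and breaks standardization (division by ~0).
            problems.append(f"{name}: near-constant (std={std:.2e})")
        elif abs(mean) > max_abs_mean or std > max_std:
            # Raw physical units (e.g. temperature in kelvin) can swamp the other channels.
            problems.append(f"{name}: looks unnormalized (mean={mean:.3g}, std={std:.3g})")
    return problems

# Usage: kelvin temperatures with a NaN, a standardized wind channel, and a constant channel.
samples = [[280.0, 0.1, 1.0], [281.0, -0.1, 1.0], [float("nan"), 0.0, 1.0]]
for problem in check_channels(samples, ["t2m", "u10", "const"]):
    print(problem)
```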