
Martin Gauch

@gauchm.bsky.social

Deep learning & earth science @ Google Research

62 Followers  |  110 Following  |  8 Posts  |  Joined: 05.03.2025

Latest posts by gauchm.bsky.social on Bluesky


Congratulations to Frederik Kratzert on winning this year's Arne Richter Award for outstanding research by an early career scientist. Fantastic presentation at #EGU25 this afternoon!

01.05.2025 12:17 | 👍 29    🔁 5    💬 1    📌 0

I can't compete with @kratzert.bsky.social's swag game, but I'll contribute a few old NeuralHydrology stickers that I found recently :)

22.04.2025 17:18 | 👍 1    🔁 0    💬 0    📌 0

Now on HESSD for open discussion: egusphere.copernicus.org/preprints/20...

They even let us keep the paper title (for now?!) 🙄

07.04.2025 11:48 | 👍 10    🔁 1    💬 1    📌 1

NeuralHydrology just got a little better, especially if you're building custom models :)

18.03.2025 09:00 | 👍 4    🔁 0    💬 0    📌 0
GitHub - neuralhydrology/neuralhydrology: Python library to train neural networks with a strong focus on hydrological applications.

Paper: eartharxiv.org/repository/v...

Code to reproduce is part of NeuralHydrology 1.12.0 which we just released: github.com/neuralhydrol...

Code to analyze results: github.com/gauchm/missi...

15.03.2025 12:50 | 👍 1    🔁 0    💬 0    📌 0
Median NSE and KGE across 531 basins at different amounts of missing input time steps. The dotted horizontal line provides the baseline of a model that cannot deal with missing data but is trained to ingest all three forcing groups at every time step. The dashed line represents the baseline of a model that uses the worst individual set of forcings (NLDAS). The shaded areas indicate the spread between minimum and maximum values across three seeds; the solid lines represent the median.
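For reference, the two scores in the caption as a plain NumPy sketch (NeuralHydrology ships its own implementations of these metrics, so this is just for illustration):

```python
import numpy as np

def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean of obs."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs: np.ndarray, sim: np.ndarray) -> float:
    """Kling-Gupta efficiency (Gupta et al., 2009): 1 is perfect."""
    r = np.corrcoef(obs, sim)[0, 1]  # linear correlation
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```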

All of the approaches work pretty well! Masked mean tends to perform a little better, but it's often quite close.

More details, experiments, figures, etc. in the paper.

All of this is joint work with @kratzert.bsky.social, @danklotz.bsky.social, Grey Nearing, Debby Cohen, and Oren Gilon.

15.03.2025 12:50 | 👍 1    🔁 0    💬 1    📌 0
Illustration of the attention embedding strategy. Each forcing provider is projected to the same size through its own embedding network. The resulting embedding vectors become the keys and values. The static attributes, together with a binary flag for each provider, serve as the query. The attention-weighted average of embeddings is passed on to the LSTM.

3) Attention: A more general variant of the masked mean that uses an attention mechanism to dynamically weight the embeddings in the average based on additional information, e.g., the basins' static attributes.
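A PyTorch sketch of one way to wire up that attention, following the figure description; the scaled dot-product scoring, the layer sizes, and all names are my assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class AttentionEmbedding(nn.Module):
    def __init__(self, input_sizes: list[int], n_static: int, embed_size: int):
        super().__init__()
        # One embedding network per forcing provider; the embeddings
        # serve as both keys and values.
        self.embeddings = nn.ModuleList(
            nn.Sequential(nn.Linear(n, embed_size), nn.Tanh()) for n in input_sizes
        )
        # Query built from static attributes plus one validity flag per provider.
        self.query = nn.Linear(n_static + len(input_sizes), embed_size)

    def forward(self, forcings: list[torch.Tensor], static: torch.Tensor) -> torch.Tensor:
        # forcings: list of [batch, time, features] tensors with NaNs at
        # missing steps; static: [batch, n_static].
        embedded, valid = [], []
        for x, embed in zip(forcings, self.embeddings):
            valid.append(~torch.isnan(x).any(dim=-1))    # [B, T]
            embedded.append(embed(torch.nan_to_num(x)))  # [B, T, E]
        keys = torch.stack(embedded, dim=2)              # [B, T, P, E]
        mask = torch.stack(valid, dim=-1)                # [B, T, P]
        # Broadcast static attributes over time and append the binary flags.
        q_in = torch.cat(
            [static.unsqueeze(1).expand(-1, keys.shape[1], -1), mask.float()], dim=-1
        )
        q = self.query(q_in).unsqueeze(2)                # [B, T, 1, E]
        # Scaled dot-product scores; missing providers are masked out.
        scores = (q * keys).sum(-1) / keys.shape[-1] ** 0.5  # [B, T, P]
        scores = scores.masked_fill(~mask, float("-inf"))
        # Softmax over providers; assumes >= 1 valid provider per time step.
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # [B, T, P, 1]
        return (weights * keys).sum(dim=2)  # weighted average -> LSTM input
```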

15.03.2025 12:50 | 👍 0    🔁 0    💬 1    📌 0
Illustration of the masked mean strategy. Each forcing provider is projected to the same size through its own embedding network. The resulting embeddings of valid providers are averaged and passed on to the LSTM.

2) Masked mean: Embed each group of inputs separately (a group being the inputs from one data provider) and average the embeddings that are available at a given time step. This is what we currently do in Google's operational flood forecasting model.
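A hedged PyTorch sketch of the masked mean, assuming one [batch, time, features] tensor per provider with NaNs at missing steps; the Linear + Tanh embedding networks are placeholder choices, not the operational model's architecture:

```python
import torch
import torch.nn as nn

class MaskedMeanEmbedding(nn.Module):
    def __init__(self, input_sizes: list[int], embed_size: int):
        super().__init__()
        # One embedding network per forcing provider, all projecting to the
        # same size so the embeddings can be averaged.
        self.embeddings = nn.ModuleList(
            nn.Sequential(nn.Linear(n, embed_size), nn.Tanh()) for n in input_sizes
        )

    def forward(self, forcings: list[torch.Tensor]) -> torch.Tensor:
        embedded, valid = [], []
        for x, embed in zip(forcings, self.embeddings):
            # A provider counts as missing at a step if any feature is NaN.
            mask = ~torch.isnan(x).any(dim=-1, keepdim=True)  # [B, T, 1]
            embedded.append(embed(torch.nan_to_num(x)) * mask)
            valid.append(mask)
        # Sum of valid embeddings divided by the number of valid providers
        # (clamped so all-missing steps don't divide by zero).
        n_valid = torch.cat(valid, dim=-1).sum(dim=-1, keepdim=True).clamp(min=1)
        return torch.stack(embedded).sum(dim=0) / n_valid  # input to the LSTM
```

Averaging, rather than concatenating, keeps the LSTM's input size fixed no matter which providers happen to be present.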

15.03.2025 12:50 | 👍 0    🔁 0    💬 1    📌 0
Illustration of the input replacing strategy. NaNs in the input data for a given time step are replaced by zeros, all forcings are concatenated, together with one binary flag for each forcing group which indicates whether that group was NaN or not. The resulting vector is passed through an embedding network to the LSTM.

In the paper we present and compare three ways to deal with those situations:
1) Input replacing: Just replace NaNs with some fixed value and concatenate the inputs with a flag to indicate missing data.
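A minimal PyTorch sketch of that input-replacing step; the function name, the zero fill value, and the tensor shapes are illustrative assumptions, not the paper's exact code:

```python
import torch

def input_replacing(forcings: list[torch.Tensor]) -> torch.Tensor:
    """Replace NaNs with zeros and append one validity flag per forcing group.

    forcings: list of [batch, time, features] tensors, one per data provider,
    where missing time steps are NaN.
    """
    parts, flags = [], []
    for x in forcings:
        # A group counts as missing at a step if any of its features is NaN.
        missing = torch.isnan(x).any(dim=-1, keepdim=True)  # [B, T, 1]
        parts.append(torch.nan_to_num(x, nan=0.0))          # NaN -> 0
        flags.append((~missing).float())                    # 1 = valid
    # Concatenate all groups plus their flags along the feature dimension;
    # the result goes through an embedding network into the LSTM.
    return torch.cat(parts + flags, dim=-1)

# Example: three providers with 4, 5, and 6 features each.
x = [torch.randn(8, 365, n) for n in (4, 5, 6)]
x[2][:, :100] = float("nan")   # e.g., one product starts 100 steps late
combined = input_replacing(x)  # [8, 365, 4 + 5 + 6 + 3]
```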

15.03.2025 12:50 | 👍 0    🔁 0    💬 1    📌 0
Different scenarios for missing input data: outages at individual time steps (top), data products starting at different points in time (middle), and local data products that are not available for all basins (bottom). All of these scenarios reduce the number of training samples for models that cannot cope with missing data (yellow, small box), while the models presented in this paper can be trained on all samples with valid targets (purple, large box).

Starting on bsky with a new preprint: "How to deal with missing input data"
doi.org/10.31223/X50...

Missing input data is a very common challenge in deep learning for hydrology: weather providers have outages, some data products start later than others, some only exist for certain regions, etc.

15.03.2025 12:50 | 👍 19    🔁 5    💬 1    📌 3
