We'll share the DOI when the dataset is approved!
04.03.2026 10:53
@graemeday.bsky.social
Professor, Head of Digital and Data-Driven Chemistry, School of Chemistry and Chemical Engineering at @unisouthampton.bsky.social Associate Editor at Chemical Science (@roysocchem.bsky.social) structure prediction, materials discovery
Thanks to @aichemyhub.bsky.social EPSRC @erc.europa.eu for funding.
Work led by Chris Taylor, Roohollah Hafizi and Hannah Gittins.
We're grateful to ARCHER2 and Southampton Uni HPC for the computing required for this project.
@unisouthampton.bsky.social
10/10
In short: the MACE-CSP models offer near-DFT-quality CSP at a fraction of the cost.
We're making the dataset (40+ million structures, 2.24 million DFT datapoints, MACE-CSP models) available. These models cover C, N, O, H and F. We will extend the element coverage in updates.
#compchemsky
9/10
Histogram of RMSD in atomic positions between predicted and experimentally determined crystal structures, with peak just below 0.2 Angstrom.
Geometries of the predicted structures reproduce experimentally determined (mainly X-ray diffraction) structures, typically to within 0.2 Angstrom RMSD in atomic positions.
This includes 'extrapolation' molecules that were unseen during training. The models generalise well.
8/10
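For readers unfamiliar with the metric, here is a minimal sketch of an RMSD in atomic positions. It assumes the atoms of the two structures are already matched one-to-one and overlaid; the post does not say how structures were aligned, and the coordinates below are hypothetical.

```python
import math

def rmsd(pred, ref):
    """Root-mean-square deviation between two matched lists of (x, y, z) positions."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared displacements over all atoms and all three components.
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

# Hypothetical coordinates in Angstrom (not taken from the dataset).
reference = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
predicted = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (2.1, 1.2, 0.1)]

print(f"{rmsd(predicted, reference):.3f}")  # 0.100
```

Each atom here is displaced by 0.1 Angstrom, so the RMSD is 0.1 Angstrom.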
Histogram of energy ranking of predicted structures corresponding to experimentally determined crystal structures, with most structures at the global energy minimum.
Using the largest MACE-CSP model, 970 (58%) of 1674 target (experimentally known) structures are located at the global energy minimum (or as the next-lowest-energy structure, where the global minimum is another observed structure), and 1452 (87%) are found within 2 kJ/mol of the global energy minimum.
7/10
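The percentages quoted above follow directly from the counts. A small sketch of that arithmetic, plus a hypothetical helper (`fraction_within`, not from the preprint) showing how such a statistic could be computed from relative lattice energies:

```python
def percent(count, total):
    """Share of `total` as a whole-number percentage."""
    return round(100 * count / total)

targets = 1674            # experimentally known target structures (from the post)
at_global_minimum = 970   # located at the global energy minimum
within_2_kj_mol = 1452    # found within 2 kJ/mol of the global minimum

print(percent(at_global_minimum, targets))  # 58
print(percent(within_2_kj_mol, targets))    # 87

def fraction_within(relative_energies, window):
    """Fraction of targets whose energy above the global minimum is <= window."""
    hits = sum(1 for e in relative_energies if e <= window)
    return hits / len(relative_energies)

# Hypothetical relative energies in kJ/mol, for illustration only.
example = [0.0, 0.0, 1.3, 2.5, 0.4]
print(fraction_within(example, 2.0))  # 0.8
```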
Testing the MACE-CSP machine learned models.
We apply the trained models, along with D3 dispersion correction, to the task of CSP: how well do they perform when added as a final stage in the structure prediction workflow?
The results are excellent, in terms of energies and geometries.
6/10
The ML potentials:
We train 3 machine learned potentials on the CSP-25 DFT dataset, using the MACE architecture. These three models offer a range of cost vs accuracy.
The models reproduce DFT energies and forces accurately.
5/10
The DFT dataset:
Planewave PBE calculations performed on 2.24 million crystal structures, covering the full low energy regions of the crystal structure landscapes + selected higher energy structures.
This is a structurally and chemically diverse set, created for ML potential training.
4/10
The dataset: crystal structure prediction performed on a diverse set of 1,600 organic molecules (examples in image).
The dataset quality is validated against the geometries of all known crystal structures of these molecules, which are also shown to rank at or near the global energy minimum.
3/10
The 2 contributions reported here that we're very excited to share are:
1. A dataset of 40+ million predicted organic molecular crystal structures
2. Machine learned potentials trained on DFT calculations for 2.24 million unique crystal structures, suitable for modelling the organic molecular solid state
2/10
We're pleased to share this preprint on @chemrxiv.org
Towards Foundation Models Trained from Crystal Structure Prediction of the Organic Molecular Solid State
40+ million predicted crystal structures + accurate ML potentials for organic molecular crystals
chemrxiv.org/doi/full/10....
1/10
We are looking forward to receiving your digital chemistry Lectureship nominations!
Please see here for details: rsc.li/chemsci-lectu...
#CompChem #MLChem #AIChem #MachineLearning
It's great to see Pedro's work on this week's cover of @chemicalscience.rsc.org
20.02.2026 20:52
New and HOT in Chemical Science!
"Exciton trapping with a twist" by Eric Vauthey et al. from the University of Geneva.
Read it for free here: pubs.rsc.org/doi/D5S...
SAUCE = sensible asymmetric units for crystal exploration
These methods transfer structural features from shorter or smaller crystal structure prediction calculations into the process of structure generation for more complex searches. Effectively, this lowers the dimensionality of the search space.
It's great to see this preprint out. doi.org/10.26434/che...
This work is a step towards making crystal structure prediction more affordable for complex molecular materials where the unit cell contains multiple symmetry-independent molecules.
Congratulations @stochasticchemist.bsky.social
The Chemical Science team welcomes Xianfeng Li from the Dalian Institute of Chemical Physics, Chinese Academy of Sciences, China as an Associate Editor!
Professor Li will be handling research on electrochemical energy storage and batteries.
This is great, @jennieemartin.bsky.social. Thanks for putting the time and work into this.
@unisouthampton.bsky.social
So, apart from the evolutionary method that we have developed, the work has produced a large, valuable dataset of crystal structures, their calculated energies and properties.
9/9
We search a moderately sized chemical space of approximately 136,000 aza-substituted polycyclic aromatic hydrocarbons for the best molecules. Through parameter testing and evaluation of the method, we have performed CSP on over 9000 unique molecules.
8/n
The approach will have broad applicability for materials discovery, wherever the property of interest is computable from the crystal structure. Here, we address electron mobility in organic semiconductors, where intermolecular electronic coupling depends strongly on crystal structure.
7/n
This is what we have done: CSP performed on-the-fly for an evolving population of molecules. We have recently shown that we can perform crystal structure prediction at large scale (doi.org/10.1039/D4FD...), so we're now making use of this capability.
6/n
The problem that we tackle here is that materials properties can depend strongly on the crystal structure. So, to evaluate the fitness of molecules in an evolving population, we need to predict their most probable crystal structures.
5/n
Generative ML methods are getting a lot of attention, but evolutionary methods are also effective: create a population of molecules and let them evolve towards a target property or set of properties, through mutations and cross-over operations on the chemical structures.
4/n
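To make the idea concrete, here is a toy sketch of an evolutionary loop with mutation and cross-over. It is not the method from the paper: the "molecule" is just a tuple of 0/1 flags (say, which ring positions are aza-substituted), and `fitness` is a placeholder standing in for the expensive step of running CSP and computing a property from the predicted crystal structures. All names and parameters are hypothetical.

```python
import random

def fitness(genome):
    # Placeholder only: a real fitness would run CSP and evaluate a property.
    # Here we simply count the substituted sites.
    return sum(genome)

def mutate(genome, rng):
    """Flip one randomly chosen site."""
    i = rng.randrange(len(genome))
    return genome[:i] + (1 - genome[i],) + genome[i + 1:]

def crossover(a, b, rng):
    """Single-point cross-over: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(n_sites=8, pop_size=10, generations=20, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_sites))
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
        best_history.append(max(fitness(g) for g in population))
    return max(population, key=fitness), best_history

best, history = evolve()
print(best, history[-1])
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.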
With the improving reliability of CSP, we want to make better use of these methods to accelerate the discovery of functional materials. We have had success in applying CSP to sets of molecules designed from chemical intuition; now we want approaches that search more broadly for new molecules.
3/n
This paper, led by @jayjohal.bsky.social, presents a major development in a long-term project: integrating crystal structure prediction (CSP) methods for organic molecules into an evolutionary method for exploring chemical space.
2/n
Schematic of an evolutionary algorithm for generating new organic molecules, with crystal structure prediction integrated into the fitness function calculation.
I'm excited to share the latest paper from our team, just published in Nature Communications: rdcu.be/eRTSs
"Exploring organic chemical space for materials discovery using crystal structure prediction-informed evolutionary optimisation"
#compchemsky #chemsky
1/n
Our best method reaches a top-1 accuracy of 47%, rising to 90% when the top 5 space groups are selected. That's very good, given what we know about polymorphism and the tight energetic spacing of structures with different space groups from crystal structure prediction studies.
#compchemsky #machinelearning #ML
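As a reminder of what top-1 and top-k accuracy mean here, a minimal sketch: a prediction counts as correct if the observed space group appears among the model's k highest-ranked guesses. The ranked lists and labels below are made up for illustration and are not results from the preprint.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases where the true label is among the first k ranked guesses."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Hypothetical ranked space-group predictions for four molecules.
ranked = [
    ["P21/c", "P-1", "C2/c"],
    ["P-1", "P21/c", "P212121"],
    ["P212121", "P21", "P21/c"],
    ["C2/c", "P21/c", "Pbca"],
]
observed = ["P21/c", "P21/c", "P21/c", "Pbca"]

print(top_k_accuracy(ranked, observed, 1))  # 0.25
print(top_k_accuracy(ranked, observed, 3))  # 1.0
```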
A new preprint from our team @unisouthampton.bsky.social
Can machine learning predict the space group preference of organic molecules?
Work by Hannah Gittins exploring random forest and graph neural network models to predict space group preferences of organic molecules.
doi.org/10.26434/che...
Congratulations @jennieemartin.bsky.social on this publication.
This work develops a similarity kernel for comparing molecular crystal structures, with evaluation on several ML tasks applied to CSP.
It's great to see this out now in Crystal Growth & Design @acs.org.
#chemSky #compChemSky