

AI for Crystal Materials - models and benchmarks

# AI for Crystal Materials: models and benchmarks

Here we have collected papers on the theme of "AI for crystalline materials" that have appeared in recent years at top machine learning conferences and journals (ICML, ICLR, NeurIPS, AAAI, npj journals, Nature Communications, etc.). See https://arxiv.org/abs/2408.08044 for details. We will keep this page updated.

Crystalline Material Physicochemical Property Prediction

| Method | Paper |
| --- | --- |
| SchNet | SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (NeurIPS2017) [Paper] [Code](https://github.com/atomistic-machine-learning/schnetpack) |
| CGCNN | Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties (Physical Review Letters, 2018) [Paper] [Code](https://github.com/txie-93/cgcnn) |
| MEGNET | Graph networks as a universal machine learning framework for molecules and crystals (Chemistry of Materials, 2019) [Paper] [Code](https://github.com/materialsvirtuallab/megnet) |
| GATGNN | Graph convolutional neural networks with global attention for improved materials property prediction (Physical Chemistry Chemical Physics, 2020) [Paper] [Code](https://github.com/superlouis/GATGNN) |
| ALIGNN | Atomistic line graph neural network for improved materials property predictions (npj Computational Materials, 2021) [Paper] [Code](https://github.com/usnistgov/alignn) |
| ECN | Equivariant networks for crystal structures (NeurIPS2022) [Paper] [Code](https://github.com/oumarkaba/equivariant_crystal_networks) |
| PotNet | Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction (ICML2023) [Paper] [Code](https://github.com/divelab/AIRS/tree/main/OpenMat/PotNet) |
| CrysGNN | CrysGNN: Distilling pre-trained knowledge to enhance property prediction for crystalline materials (AAAI2023) [Paper] [Code](https://github.com/kdmsit/crysgnn) |
| ETGNN | A general tensor prediction framework based on graph neural networks (The Journal of Physical Chemistry Letters, 2023) [Paper] |
| SCANN | Towards understanding structure–property relations in materials with interpretable deep learning (npj Computational Materials, 2023) [Paper] [Code](https://github.com/sinhvt3421/scann--material) |
| FAENet | FAENet: Frame Averaging Equivariant GNN for Materials Modeling (ICML2023) [Paper] |
| DTNet | Dielectric tensor prediction for inorganic materials using latent information from preferred potential (npj Computational Materials, 2024) [Paper] [Code](https://github.com/pfnet-research/dielectric-pred) |
| GMTNet | A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction (ICML2024) [Paper] [Code](https://github.com/divelab/AIRS/tree/main/OpenMat/GMTNet) |
| CEGANN | CEGANN: Crystal Edge Graph Attention Neural Network for multiscale classification of materials environment (npj Computational Materials, 2023) [Paper] [Code](https://github.com/sbanik2/CEGANN) |
| ComFormer | Complete and Efficient Graph Transformers for Crystal Material Property Prediction (ICLR2024) [Paper] [Code](https://github.com/divelab/AIRS/tree/main/OpenMat/ComFormer) |
| Crystalformer | Crystalformer: infinitely connected attention for periodic structure encoding (ICLR2024) [Paper] [Code](https://github.com/omron-sinicx/crystalformer) |
| Crystalformer | Conformal Crystal Graph Transformer with Robust Encoding of Periodic Invariance (AAAI2024) [Paper] |
| E(3)NN | Direct prediction of phonon density of states with Euclidean neural networks (Advanced Science, 2021) [Paper] [Code](https://github.com/zhantaochen/phonondos_e3nn) |
| DOSTransformer | Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer (NeurIPS2023) [Paper] [Code](https://github.com/HeewoongNoh/DOSTransformer) |
| Matformer | Periodic Graph Transformers for Crystal Material Property Prediction (NeurIPS2022) [Paper] [Code](https://github.com/YKQ98/Matformer) |
| CrysDiff | A Diffusion-Based Pre-training Framework for Crystal Property Prediction (AAAI2024) [Paper] |
| MOFTransformer | A multi-modal pre-training transformer for universal transfer learning in metal-organic frameworks (Nature Machine Intelligence, 2023) [Paper] [Code](https://github.com/hspark1212/MOFTransformer) |
| - | Examining graph neural networks for crystal structures: Limitations and opportunities for capturing periodicity (Science Advances, 2023) [Paper] [Code](https://github.com/shenggong1996/examining-GNN-for-crystal-periodicity/tree/master) |
| Uni-MOF | A comprehensive transformer-based approach for high-accuracy gas adsorption predictions in metal-organic frameworks (Nature Communications, 2024) [Paper] [Code](https://github.com/dptech-corp/Uni-MOF) |
| SODNet | Learning Superconductivity from Ordered and Disordered Material Structures (NeurIPS2024) [Paper] [Code](https://github.com/pincher-chen/SODNet) |
| ChargE3Net | Higher-order equivariant neural networks for charge density prediction in materials (npj Computational Materials, 2024) [Paper] [Code](https://github.com/AIforGreatGood/charge3net) |
| MD-HIT | MD-HIT: Machine learning for material property prediction with dataset redundancy control (npj Computational Materials, 2024) [Paper] [Code](https://github.com/usccolumbia/MD-HIT) |
| ECSG | Predicting thermodynamic stability of inorganic compounds using ensemble machine learning based on electron configuration (Nature Communications, 2025) [Paper] [Code](https://github.com/Haozou-csu/ECSG) |
| CrystalFramer | Rethinking the role of frames for SE(3)-invariant crystal structure modeling (ICLR2025) [Paper] [Code] |
| ct-UAE | Transformer-generated atomic embeddings to enhance prediction accuracy of crystal properties with machine learning (Nature Communications, 2025) [Paper] [Code] |
| - | Cross-scale covariance for material property prediction (npj Computational Materials, 2025) [Paper] [Code] |
| AdsMT | A multi-modal transformer for predicting global minimum adsorption energy (Nature Communications, 2025) [Paper] [Code] |
| DPF | A Denoising Pre-training Framework for Accelerating Novel Material Discovery (AAAI2025) [Paper] |
| CrysCo | Accelerating materials property prediction via a hybrid Transformer Graph framework that leverages four body interactions (npj Computational Materials, 2025) [Paper] [Code] |
| E2T | Advancing extrapolative predictions of material properties through learning to learn using extrapolative episodic training (Communications Materials, 2025) [Paper] [Code] |
| - | Probing out-of-distribution generalization in machine learning for materials (Communications Materials, 2025) [Paper] [Code] |
| - | A machine learning model with minimize feature parameters for multi-type hydrogen evolution catalyst prediction (npj Computational Materials, 2025) [Paper] [Code] |
| - | Automatic identification of slip pathways in ductile inorganic materials by combining the active learning strategy and NEB method (npj Computational Materials, 2025) [Paper] |
| BETE-NET | Accelerating superconductor discovery through tempered deep learning of the electron-phonon spectral function (npj Computational Materials, 2025) [Paper] [Code] |
| Rep-CodeGen | Code-Generated Graph Representations Using Multiple LLM Agents for Material Properties Prediction (ICML2025) [Paper] |
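Many of the graph-based property predictors above, such as CGCNN and MEGNET, begin by turning a periodic crystal into a graph. Its edges connect atoms that lie within a cutoff radius, including bonds that cross the cell boundary into periodic images. Below is a minimal, dependency-free sketch of that neighbor search. It is not taken from any of the listed codebases, and it assumes fractional coordinates already wrapped into [0, 1). The brute-force loop over image shifts is fine for small cells but too slow for large ones.

```python
import itertools
import math


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def frac_to_cart(frac, lattice):
    """Fractional -> Cartesian; rows of `lattice` are the lattice vectors a, b, c (Angstrom)."""
    return tuple(sum(frac[k] * lattice[k][d] for k in range(3)) for d in range(3))


def periodic_radius_graph(lattice, frac_coords, cutoff):
    """Edges (i, j, image_shift, distance) for every pair closer than `cutoff`,
    including neighbors in periodic images, so the result is a multigraph."""
    volume = abs(_dot(lattice[0], _cross(lattice[1], lattice[2])))
    reps = []
    for axis in range(3):
        p, q = [ax for ax in range(3) if ax != axis]
        normal = _cross(lattice[p], lattice[q])
        height = volume / math.sqrt(_dot(normal, normal))  # spacing of lattice planes
        # Enough image shifts along this axis to reach every neighbor within the cutoff,
        # assuming fractional coordinates lie in [0, 1).
        reps.append(math.ceil(cutoff / height))
    cart = [frac_to_cart(f, lattice) for f in frac_coords]
    edges = []
    for shift in itertools.product(*(range(-n, n + 1) for n in reps)):
        offset = frac_to_cart(shift, lattice)
        for i, ri in enumerate(cart):
            for j, rj in enumerate(cart):
                if i == j and shift == (0, 0, 0):
                    continue  # an atom is not its own neighbor in the home cell
                rj_image = tuple(rj[d] + offset[d] for d in range(3))
                dist = math.dist(ri, rj_image)
                if dist < cutoff:
                    edges.append((i, j, shift, dist))
    return edges
```

For a simple cubic cell with a = 3 Å and one atom, a 3.1 Å cutoff yields the 6 face neighbors. Raising it to 4.3 Å also picks up the 12 edge-diagonal neighbors at about 4.24 Å.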

Crystalline Material Synthesis

| Method | Paper |
| --- | --- |
| G-SchNet | Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules (NeurIPS2019) [Paper] [Code](https://github.com/atomistic-machine-learning/G-SchNet) |
| CubicGAN | High-throughput discovery of novel cubic crystal materials using deep generative neural networks (Advanced Science, 2021) [Paper] [Code](https://github.com/MilesZhao/CubicGAN) |
| CDVAE | Crystal Diffusion Variational Autoencoder for Periodic Material Generation (ICLR2022) [Paper] [Code](https://github.com/txie-93/cdvae) |
| LCOMs | Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction (NeurIPS2023 Workshop) [Paper] |
| DiffCSP | Crystal structure prediction by joint equivariant diffusion on lattices and fractional coordinates (NeurIPS2023) [Paper] [Code](https://github.com/jiaor17/DiffCSP) |
| SyMat | Towards symmetry-aware generation of periodic materials (NeurIPS2023) [Paper] [Code](https://github.com/divelab/AIRS/tree/main/OpenMat/SyMat) |
| EMPNN | Equivariant Message Passing Neural Network for Crystal Material Discovery (AAAI2023) [Paper] [Code](https://github.com/aklipf/pegnn) |
| PGCGM | Physics guided deep learning for generative design of crystal materials with symmetry constraints (npj Computational Materials, 2023) [Paper] [Code](https://github.com/MilesZhao/PGCGM) |
| PCVAE | PCVAE: A Physics-informed Neural Network for Determining the Symmetry and Geometry of Crystals (IJCNN2023) [Paper] [Code](https://github.com/zjuKeLiu/PCVAE) |
| Govindarajan | Behavioral Cloning for Crystal Design (ICLR2023 Workshop) [Paper] |
| CHGFlowNet | Hierarchical GFlownet for Crystal Structure Generation (NeurIPS2023 Workshop) [Paper] |
| LM-CM, LM-AC | Language models can generate molecules, materials, and protein binding sites directly in three dimensions as xyz, cif, and pdb files (arXiv, 2023) [Paper] [Code](https://github.com/danielflamshep/xyztransformer) |
| SLI2Cry | An invertible, invariant crystal representation for inverse design of solid-state materials using generative deep learning (Nature Communications, 2023) [Paper] [Code](https://github.com/xiaohang007/SLICES/tree/main) |
| GNoME | Scaling deep learning for materials discovery (Nature, 2023) [Paper] [Code](https://github.com/google-deepmind/materials_discovery) |
| ipcsp | Optimality guarantees for crystal structure prediction (Nature, 2023) [Paper] [Code](https://github.com/lrcfmd/ipcsp) |
| DiffCSP-SC | Learning Superconductivity from Ordered and Disordered Material Structures (NeurIPS2024) [Paper] [Code](https://github.com/pincher-chen/DiffCSP-SC) |
| EquiCSP | Equivariant Diffusion for Crystal Structure Prediction (ICML2024) [Paper] [Code](https://github.com/EmperorJia/EquiCSP) |
| GemsDiff | Vector Field Oriented Diffusion Model for Crystal Material Generation (AAAI2024) [Paper] [Code](https://github.com/aklipf/gemsdiff) |
| UniMat | Scalable Diffusion for Materials Generation (ICLR2024) [Paper] [Project page](https://unified-materials.github.io/unimat/) |
| DiffCSP++ | Space Group Constrained Crystal Generation (ICLR2024) [Paper] [Code](https://github.com/jiaor17/DiffCSP-PP) |
| FlowMM | FlowMM: Generating Materials with Riemannian Flow Matching (ICML2024) [Paper] [Code](https://github.com/facebookresearch/flowmm) |
| CrystaLLM | Crystal structure generation with autoregressive large language modeling (Nature Communications, 2024) [Paper] [Code](https://github.com/lantunes/CrystaLLM) |
| Con-CDVAE | Con-CDVAE: A method for the conditional generation of crystal structures (Computational Materials Today, 2024) [Paper] [Code](https://github.com/cyye001/Con-CDVAE) |
| Cond-CDVAE | Deep learning generative model for crystal structure prediction (npj Computational Materials, 2024) [Paper] [Code](https://github.com/ixsluo/cond-cdvae) |
| CrystalFormer | Space Group Informed Transformer for Crystalline Materials Generation (arXiv, 2024) [Paper] [Code](https://github.com/deepmodeling/CrystalFormer) |
| Gruver | Fine-Tuned Language Models Generate Stable Inorganic Materials as Text (ICLR2024) [Paper] [Code](https://github.com/facebookresearch/crystal-text-llm) |
| FlowLLM | FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions (NeurIPS2024) [Paper] [Code](https://github.com/facebookresearch/flowmm) |
| Mat2Seq | Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation (NeurIPS2024) [Paper] |
| FlowDPO | 3D Structure Prediction of Atomic Systems with Flow-Based Direct Preference Optimization (NeurIPS2024) [Paper] |
| GenMS | Generative Hierarchical Materials Search (NeurIPS2024) [Paper] |
| ChemReasoner | CHEMREASONER: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback (ICML2024) [Paper] [Code] |
| a²c | Predicting emergence of crystals from amorphous precursors with deep learning potentials (Nature Computational Science, 2024) [Paper] [Code](https://github.com/jax-md/jax-md/tree/main/jax_md/a2c) |
| - | Rapid prediction of molecular crystal structures using simple topological and physical descriptors (Nature Communications, 2024) [Paper] |
| ShotgunCSP | Shotgun crystal structure prediction using machine-learned formation energies (npj Computational Materials, 2024) [Paper] [Code] |
| MatterGen | A generative model for inorganic materials design (Nature, 2025) [Paper] [Code](https://github.com/microsoft/mattergen) |
| SymmCD | SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models (ICLR2025) [Paper] [Code](https://github.com/sibasmarak/SymmCD) |
| MatExpert | MatExpert: Decomposing Materials Discovery By Mimicking Human Experts (ICLR2025) [Paper] |
| - | Designing Mechanical Meta-Materials by Learning Equivariant Flows (ICLR2025) [Paper] |
| MOFFlow | MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks (ICLR2025) [Paper] |
| TGDMat | Periodic Materials Generation using Text-Guided Joint Diffusion Model (ICLR2025) [Paper] |
| CrysBFN | A Periodic Bayesian Flow for Material Generation (ICLR2025) [Paper] [Code] |
| OSDAs | OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents (ICLR2025) [Paper] |
| MAGUS | Efficient crystal structure prediction based on the symmetry principle (Nature Computational Science, 2025) [Paper] |
| Target XXXI | A robust crystal structure prediction method to support small molecule drug development with large scale validation and blind study (Nature Communications, 2025) [Paper] |
| active-csp | Accelerating crystal structure search through active learning with neural networks for rapid relaxations (npj Computational Materials, 2025) [Paper] [Code](https://github.com/stefaanhessmann/active-csp) |
| Chemeleon | Exploration of crystal chemical space using text-guided generative artificial intelligence (Nature Communications, 2025) [Paper] [Code](https://github.com/hspark1212/chemeleon/) |
| MAGECS | Inverse design of promising electrocatalysts for CO2 reduction via generative models and bird swarm algorithm (Nature Communications, 2025) [Paper] [Code](https://github.com/szl666/CO2RR-inverse-design) |
| PGH-VAEs | Inverse design of catalytic active sites via interpretable topology-based deep generative models (npj Computational Materials, 2025) [Paper] [Code] |
| WyFormer | Wyckoff Transformer: Generation of Symmetric Crystals (ICML2025) [Paper] |
| KLDM | Kinetic Langevin Diffusion for Crystalline Materials Generation (ICML2025) [Paper] |
| WyckoffDiff | WyckoffDiff -- A Generative Diffusion Model for Crystal Symmetry (ICML2025) [Paper] |
| OMG | Open Materials Generation with Stochastic Interpolants (ICML2025) [Paper] |
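Several generators in this table diffuse directly on fractional coordinates. DiffCSP, for example, does so according to its title. Fractional coordinates live on a torus, so the forward noising step has to wrap perturbed coordinates back into the unit cell, and displacements have to use the minimum-image convention. Below is a minimal sketch of those two pieces. It assumes a single fixed noise level `sigma`, whereas real models use a schedule over time steps and also noise the lattice. The function names are illustrative and not taken from any listed codebase.

```python
import random


def wrap(frac):
    """Map fractional coordinates into the unit cell [0, 1)."""
    out = []
    for x in frac:
        y = x % 1.0
        # x % 1.0 can round up to exactly 1.0 for tiny negative x; fold it back to 0.0
        out.append(0.0 if y == 1.0 else y)
    return out


def noise_fractional_coords(frac_coords, sigma, rng):
    """One forward-noising step: add isotropic Gaussian noise, then wrap (a wrapped normal)."""
    return [wrap([x + rng.gauss(0.0, sigma) for x in f]) for f in frac_coords]


def periodic_frac_difference(f1, f2):
    """Minimum-image difference f2 - f1 in fractional coordinates, each component in [-0.5, 0.5)."""
    return [((b - a + 0.5) % 1.0) - 0.5 for a, b in zip(f1, f2)]
```

As `sigma` grows, the wrapped normal approaches the uniform distribution over the cell. That is why periodic diffusion models can start sampling from uniformly random fractional coordinates.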

Aiding Characterization

| Method | Paper |
| --- | --- |
| - | Insightful classification of crystal structures using deep learning (Nature Communications, 2018) [Paper] |
| - | Advanced steel microstructural classification by deep learning methods (Scientific Reports, 2018) [Paper] |
| - | Neural network for nanoscience scanning electron microscope image recognition (Scientific Reports, 2017) [Paper] |
| - | Deep Learning-Assisted Quantification of Atomic Dopants and Defects in 2D Materials (Advanced Science, 2021) [Paper] |
| - | Classification of crystal structure using a convolutional neural network (IUCrJ, 2017) [Paper] |
| - | Synthesis, optical imaging, and absorption spectroscopy data for 179072 metal oxides (Scientific Data, 2019) [Paper] |
| - | Adaptively driven X-ray diffraction guided by machine learning for autonomous phase identification (npj Computational Materials, 2023) [Paper] [Code](https://github.com/njszym/AdaptiveXRD) |
| - | Automated classification of big X-ray diffraction data using deep learning models (npj Computational Materials, 2023) [Paper] [Code](https://github.com/AGI-init/XRDs) |
| XRD-AutoAnalyzer | Integrated analysis of X-ray diffraction patterns and pair distribution functions for machine-learned phase identification (npj Computational Materials, 2024) [Paper] [Code](https://github.com/njszym/XRD-AutoAnalyzer) |
| CrystalNet | Towards end-to-end structure determination from x-ray diffraction data using deep learning (npj Computational Materials, 2024) [Paper] [Code](https://github.com/gabeguo/deep-crystallography-public) |
| - | Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model (NeurIPS2024) [Paper] [Code](https://github.com/MasterAI-EAM/Material-Knowledge-Graph) |
| MatDuck | Zero-Shot Learning for Materials Science Texts: Leveraging Duck Typing Principles (AAAI2025) [Paper] [Code](https://github.com/xinzcode/MatDuck) |
| - | Unsupervised identification of crystal defects from atomistic potential descriptors (npj Computational Materials, 2025) [Paper] |
| PAGL | Learning to predict rare events: the case of abnormal grain growth (npj Computational Materials, 2025) [Paper] |
| PXRDnet | Ab initio structure solutions from nanocrystalline powder diffraction data via diffusion models (Nature Materials, 2025) [Paper] [Code](https://github.com/gabeguo/cdvae_xrd) |
| SBC | Automated identification of bulk structures, two-dimensional materials, and interfaces using symmetry-based clustering (npj Computational Materials, 2025) [Paper] [Code] |
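Several of the characterization tools above, such as XRD-AutoAnalyzer, CrystalNet, and PXRDnet, take powder X-ray diffraction patterns as input. As a reminder of what the peak positions in such a pattern encode, here is a minimal sketch of Bragg's law for a cubic cell. It assumes the Cu Kα1 wavelength (1.5406 Å) and face-centered-cubic extinction rules (h, k, l all even or all odd). It ignores peak intensities entirely, since those would require structure factors.

```python
import itertools
import math

CU_K_ALPHA1 = 1.5406  # Angstrom; a common laboratory X-ray wavelength (assumed here)


def d_spacing_cubic(a, hkl):
    """Interplanar spacing d = a / sqrt(h^2 + k^2 + l^2) for a cubic cell of edge a."""
    h, k, l = hkl
    return a / math.sqrt(h * h + k * k + l * l)


def two_theta_deg(a, hkl, wavelength=CU_K_ALPHA1):
    """Bragg's law, lambda = 2 d sin(theta); returns 2-theta in degrees, or None if unreachable."""
    s = wavelength / (2.0 * d_spacing_cubic(a, hkl))
    if s > 1.0:
        return None  # this reflection cannot be observed at this wavelength
    return 2.0 * math.degrees(math.asin(s))


def fcc_reflections(max_index):
    """Allowed (hkl) for an FCC lattice (unmixed parity), one per h^2 + k^2 + l^2 value.
    In cubic cells, families with equal sums (e.g. 333 and 511) overlap in 2-theta."""
    by_sum = {}
    for hkl in itertools.product(range(max_index + 1), repeat=3):
        if hkl == (0, 0, 0):
            continue
        if len({x % 2 for x in hkl}) == 1:
            by_sum.setdefault(sum(x * x for x in hkl), tuple(sorted(hkl, reverse=True)))
    return [by_sum[key] for key in sorted(by_sum)]
```

For example, `[two_theta_deg(5.64, hkl) for hkl in fcc_reflections(2)]` lists the first few peak positions of a rock-salt cell with a = 5.64 Å. The (200) peak falls near 2θ ≈ 31.7°.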

Accelerating Theoretical Computation

| Method | Paper |
| --- | --- |
| BPNN | Generalized neural-network representation of high-dimensional potential-energy surfaces (Physical Review Letters, 2007) [Paper] |
| - | Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons (Physical Review Letters, 2010) [Paper] |
| NequIP | E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials (Nature Communications, 2022) [Paper] [Code](https://github.com/mir-group/nequip) |
| Cormorant | Cormorant: Covariant molecular neural networks (NeurIPS2019) [Paper] [Code](https://github.com/risilab/cormorant) |
| MACE | MACE: Higher order equivariant message passing neural networks for fast and accurate force fields (NeurIPS2022) [Paper] [Code](https://github.com/ACEsuit/mace) |
| DimeNet | Directional Message Passing for Molecular Graphs (ICLR2020) [Paper] [Code](https://github.com/gasteigerjo/dimenet) |
| M3GNet | A universal graph deep learning interatomic potential for the periodic table (Nature Computational Science, 2022) [Paper] [Code](https://github.com/materialsvirtuallab/m3gnet) |
| - | Injecting domain knowledge from empirical interatomic potentials to neural networks for predicting material properties (NeurIPS2022) [Paper] [Code](https://github.com/shuix007/EIP4NNPotentials) |
| CHGNet | CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling (Nature Machine Intelligence, 2023) [Paper] [Code](https://github.com/CederGroupHub/chgnet) |
| - | Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations (Transactions on Machine Learning Research, 2023) [Paper] |
| DeepH-E3 | General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian (Nature Communications, 2023) [Paper] [Code] |
| AdsorbDiff | AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion (ICML2024) [Paper] [Code] |
| DeepRelax | Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification (Nature Communications, 2024) [Paper] [Code] |
| AssembleFlow | AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly (ICLR2025) [Paper] |
| - | Machine learning Hubbard parameters with equivariant neural networks (npj Computational Materials, 2025) [Paper] [Code] |
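BPNN (first row above) describes each atom's environment with atom-centered symmetry functions, which are invariant to translation, rotation, and permutation of neighbors, and feeds them to a neural network. Below is a minimal sketch of the radial symmetry function with the cosine cutoff used in that line of work. The eta values, the shift R_s = 0, and the 6 Å cutoff are illustrative assumptions. The angular functions and the network itself are omitted.

```python
import math


def cosine_cutoff(r, r_cut):
    """Smooth cutoff f_c(r) = 0.5 * (cos(pi * r / r_cut) + 1) for r <= r_cut, else 0."""
    if r > r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_symmetry_function(distances, eta, r_s, r_cut):
    """Radial descriptor of one central atom: sum over neighbors j of
    exp(-eta * (R_ij - R_s)^2) * f_c(R_ij)."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut) for r in distances)


def descriptor_vector(distances, etas, r_s=0.0, r_cut=6.0):
    """Stack several eta values into a fixed-length feature vector.
    The sum over neighbors makes it independent of neighbor ordering and count layout."""
    return [radial_symmetry_function(distances, eta, r_s, r_cut) for eta in etas]
```

The cutoff function and its derivative both go to zero at `r_cut`. Neighbors entering or leaving the sphere therefore do not cause jumps in energies or forces, which is the reason for using a smooth cutoff rather than a hard one.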

Benchmark

| Method | Paper |
| --- | --- |
| MatBench | Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm (npj Computational Materials, 2020) [Paper] [Code](https://github.com/materialsproject/matbench) |
| M² Hub | M²Hub: Unlocking the Potential of Machine Learning for Materials Discovery (NeurIPS2023) [Paper] [Code](https://github.com/yuanqidu/M2Hub) |
| JARVIS-Leaderboard | JARVIS-Leaderboard: a large scale benchmark of materials design methods (npj Computational Materials, 2024) [Paper] [Code](https://github.com/usnistgov/jarvis_leaderboard) |
| - | Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study (npj Computational Materials, 2024) [Paper] [Code](https://github.com/sadmanomee/OOD_Materials_Benchmark) |
| SimXRD | SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystalline Symmetry Classification Benchmark (ICLR2025) [Paper] [Code] |
| ECD | ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials (ICLR2025) [Paper] |
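Leaderboards like Matbench score a model with k-fold cross-validation and report the mean error across folds. Below is a minimal sketch of that protocol. It uses mean absolute error (MAE) and a dummy baseline that always predicts the training-set mean. The `fit_predict(train_idx, test_idx)` callback and the k = 5 default are assumptions of this sketch, not the Matbench API. A purely random split like this one is exactly what the redundancy-control (MD-HIT) and out-of-distribution benchmark papers listed above argue can overestimate real-world accuracy.

```python
import random
import statistics


def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validate(targets, fit_predict, k=5, seed=0):
    """Train on k-1 folds, score MAE on the held-out fold; return (mean MAE, per-fold MAEs)."""
    scores = []
    for test_idx in kfold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        preds = fit_predict(train_idx, test_idx)
        scores.append(mean_absolute_error([targets[i] for i in test_idx], preds))
    return statistics.mean(scores), scores


def make_dummy_mean_model(targets):
    """Baseline that ignores structure entirely and predicts the training-set mean."""
    def fit_predict(train_idx, test_idx):
        mean = statistics.mean(targets[i] for i in train_idx)
        return [mean] * len(test_idx)
    return fit_predict
```

Reporting a dummy baseline next to a real model is a cheap sanity check. A model that barely beats the training-set mean has learned little from the structures.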

Common Dataset

| Dataset | Description | URL |
| --- | --- | --- |
| Materials Project | Materials Project encompasses over 120,000 materials, each accompanied by a comprehensive specification of its crystal structure and important physical properties. | Materials Project |
| JARVIS-DFT | JARVIS-DFT encompasses data for approximately 40,000 materials and includes around one million calculated properties. | JARVIS-DFT |
| OQMD | OQMD is a repository of thermodynamic and structural properties of inorganic materials, derived from high-throughput DFT calculations. | OQMD |
| Perov-5 | Perov-5 is a specialized dataset of perovskite crystal materials, containing 18,928 different perovskite materials. | Perov-5 |
| Carbon-24 | Carbon-24 is a specialized dataset of carbon materials, containing over 10,000 different carbon structures. | Carbon-24 |
| Crystallography Open Database | Crystallography Open Database is a crystallography database that specializes in collecting and storing crystal structure information for inorganic compounds, small organic molecules, metal-organic compounds, and minerals. | Crystallography Open Database |
| Raman Open Database | Raman Open Database is an open database that specializes in collecting and storing Raman spectroscopy data. | Raman Open Database |
| Inorganic Crystal Structure Database | Inorganic Crystal Structure Database is the world's largest database for completely identified inorganic crystal structures. | Inorganic Crystal Structure Database |
| Open Catalyst Project | The Open Catalyst Project aims to use artificial intelligence to simulate and discover new catalysts for renewable energy storage. | Open Catalyst Project |
| Python Materials Genomics | Python Materials Genomics is a robust, open-source Python library for materials analysis, offering a range of modules for handling crystal structures, band structures, phase diagrams, and material properties. | Python Materials Genomics |
| Phonon DOS Dataset | Phonon DOS Dataset contains approximately 1,500 crystalline materials whose phonon DOS is calculated from DFPT. | Phonon DOS Dataset |
| Carolina Materials Database | The Carolina Materials Database (CMD) primarily consists of ternary and quaternary materials generated by AI methods. | Carolina Materials Database |
| Alexandria Database | Alexandria Database includes a large quantity of hypothetical crystal structures generated by ML methods or other algorithmic methodologies. | Alexandria Database |
| Materials Project Trajectory Dataset | MPtrj contains 1,580,395 atomic configurations with corresponding energies, 7,944,833 magnetic moments, 49,295,660 forces, and 14,223,555 stress values. | Materials Project Trajectory Dataset |
| Quantum MOF | QMOF is a dataset of over 20K metal-organic frameworks and coordination polymers with properties derived from DFT. | Quantum MOF |
| Open Materials 2024 | OMat24 contains over 110 million DFT calculations focused on structural and compositional diversity. | Open Materials 2024 |
| SuperCon3D | SuperCon3D contains 1,578 superconductor materials (covering 83 distinct elements), each with both Tc and crystal structure data. | SuperCon3D |
| Atomly | The Atomly database provides an extensive collection of material data generated through high-throughput first-principles calculations. This includes 320,000 inorganic crystal structures, 310,000 bandgap and density of states profiles, 12,000 dielectric constant tensors, and 16,000 mechanical tensors. | Atomly |
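Experimental databases such as COD and ICSD distribute structures as CIF files. A CIF stores the unit cell as six parameters (a, b, c, α, β, γ), whereas most of the models listed earlier consume a 3×3 matrix of lattice vectors. Below is a minimal sketch of the standard conversion. It assumes the common convention that a lies along x and b lies in the xy-plane. Other codes may orient the cell differently, which changes the matrix but not the crystal.

```python
import math


def lattice_from_parameters(a, b, c, alpha, beta, gamma):
    """Cell lengths (Angstrom) and angles (degrees) -> three lattice row vectors.
    Convention (an assumption): a along x, b in the xy-plane, right-handed cell."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    a_vec = (a, 0.0, 0.0)
    b_vec = (b * cg, b * sg, 0.0)
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz_sq = c * c - cx * cx - cy * cy
    if cz_sq <= 0.0:
        raise ValueError("these angles do not describe a valid unit cell")
    return (a_vec, b_vec, (cx, cy, math.sqrt(cz_sq)))


def cell_volume(a, b, c, alpha, beta, gamma):
    """Closed-form cell volume from the six cell parameters."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def triple_product(m):
    """Determinant of the lattice matrix, i.e. the signed cell volume."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    return a1 * (b2 * c3 - b3 * c2) - a2 * (b1 * c3 - b3 * c1) + a3 * (b1 * c2 - b2 * c1)
```

A quick consistency check is that `triple_product(lattice_from_parameters(...))` should equal `cell_volume(...)` for the same six parameters. For a hexagonal cell with a = b = 3 Å, c = 5 Å, and γ = 120°, both give about 38.97 Å³.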


Materials & Chemistry Datasets

# Awesome Materials & Chemistry Datasets

A curated list of the most useful datasets in materials science and chemistry for training machine learning and AI foundation models. This includes experimental, computational, and literature-mined datasets—prioritizing open-access resources and community contributions.

This project aims to:

  • Catalog the best datasets by domain, type, quality, and size
  • Support reproducible research in AI for chemistry and materials
  • Provide a community-driven resource with contributions from researchers and developers




How to Use

  • Explore datasets by domain or data type using the tables below
  • Click the access links to explore or download the data
  • Sort/filter by quality, size, and suitability for ML models
  • Fork the repo and submit a pull request to add new datasets
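Once a few rows of the tables below are transcribed into records, the filtering step can also be done in code. Below is a minimal sketch using a hand-copied subset of rows. The record keys and the `open_and_commercial_ok` rule are illustrative choices, not part of this repository. Treat the license check as a crude first pass only: the actual license text of each source always governs.

```python
# A few rows hand-copied from the dataset tables below (illustrative subset).
CATALOG = [
    {"name": "OMat24 (Meta)", "type": "Computational", "license": "CC BY 4.0", "access": "Open"},
    {"name": "QCML", "type": "Computational", "license": "CC BY-NC 4.0", "access": "Open"},
    {"name": "CSD (Cambridge)", "type": "Experimental", "license": "Proprietary", "access": "Restricted"},
    {"name": "FreeSolv", "type": "Experimental", "license": "CC BY 4.0", "access": "Open"},
    {"name": "ChemBench 4K", "type": "LLM Training", "license": "CC BY-NC 4.0", "access": "Open"},
]


def open_and_commercial_ok(entry):
    """Crude first-pass rule: openly accessible and not marked NonCommercial (-NC)."""
    return entry["access"] == "Open" and "-NC" not in entry["license"]


def select(catalog, dataset_type=None):
    """Filter by the rule above (and optionally by Type), sorted by name."""
    rows = [e for e in catalog if open_and_commercial_ok(e)]
    if dataset_type is not None:
        rows = [e for e in rows if e["type"] == dataset_type]
    return sorted(rows, key=lambda e: e["name"].lower())
```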

Contributing

Want to add a new dataset or improve metadata?

  1. Fork the repository
  2. Edit the appropriate dataset list or add a new entry
  3. Submit a pull request with a brief description and source
  4. Use the following fields:
      • Dataset Name
      • Domain
      • Type (Computational, Experimental, Literature-mined)
      • Size
      • Access (Open/Restricted/Proprietary)
      • Format (JSON, CSV, CIF, HDF5, SMILES, etc.)
      • License
      • Access Link
      • Notes or Use Cases
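Before opening a pull request, a contributor could check an entry against the fields listed above. Below is a minimal sketch of such a check, written as a hypothetical helper that is not part of the repository. The checklist names three Types, but the existing tables also use "Mixed" and "LLM Training", so the sketch accepts those too. It treats "Notes or Use Cases" as optional.

```python
# Field names copied from the contribution checklist above.
REQUIRED_FIELDS = (
    "Dataset Name", "Domain", "Type", "Size", "Access", "Format", "License", "Access Link",
)
OPTIONAL_FIELDS = ("Notes or Use Cases",)
# The checklist names three types; the existing tables also use "Mixed" and "LLM Training".
ALLOWED_TYPES = {"Computational", "Experimental", "Literature-mined", "Mixed", "LLM Training"}
ALLOWED_ACCESS = {"Open", "Restricted", "Proprietary"}


def validate_entry(entry):
    """Return human-readable problems; an empty list means the entry looks complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not str(entry.get(name, "")).strip()]
    if entry.get("Type") and entry["Type"] not in ALLOWED_TYPES:
        problems.append(f"unknown Type: {entry['Type']!r}")
    if entry.get("Access") and entry["Access"] not in ALLOWED_ACCESS:
        problems.append(f"unknown Access: {entry['Access']!r}")
    unexpected = set(entry) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS)
    problems.extend(f"unexpected field: {name}" for name in sorted(unexpected))
    return problems
```

Returning a list of messages rather than raising on the first error lets a contributor fix every issue in one pass.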

Datasets

Computational Datasets

| Dataset | Domain | Size | Type | Format | License | Access |
| --- | --- | --- | --- | --- | --- | --- |
| OMat24 (Meta) | Inorganic crystals | 110M DFT entries | Computational | JSON/HDF5 | CC BY 4.0 | Open |
| OMol25 (Meta) | Molecular chemistry | 100M+ DFT calculations | Computational | LMDB | CC BY 4.0 | Open |
| Materials Project (LBL) | Inorganic crystals | 500k+ compounds | Computational | JSON/API | CC BY 4.0 | Open |
| Open Catalyst 2020 (OC20) | Catalysis (surfaces) | 1.2M relaxations | Computational | JSON/HDF5 | CC BY 4.0 | Open |
| AFLOW | Inorganic materials | 3.5M materials | Computational | REST API | Open | Open |
| OQMD | Inorganic solids | 1M+ compounds | Computational | SQL/CSV | Open | Open |
| JARVIS-DFT (NIST) | 3D/2D materials | 40k+ entries | Computational | JSON/API | Open | Open |
| Carolina Materials DB | Hypothetical crystals | 214k structures | Computational | JSON | CC BY 4.0 | Open |
| NOMAD | Various DFT/MD | >19M calculations | Computational | JSON | CC BY 4.0 | Open |
| MatPES | DFT potential energy surfaces | ~400,000 structures from 300 K MD simulations | Computational | JSON | | Open |
| Vector-QM24 | Small organic and inorganic molecules | 836k conformational isomers | Computational | JSON | Placeholder | Open |
| AIMNet2 Dataset | Non-metallic compounds | 20M hybrid DFT calculations | Computational | JSON | Open | Open |
| RDB7 | Barrier height and enthalpy for small organic reactions | 12k CCSD(T)-F12 calculations | Computational | CSV | Open | Open |
| RDB19-Rad | ΔG of activation and of reaction for organic reactions in 40 common solvents | 5.6k DFT + COSMO-RS calculations | Computational | CSV | Open | Open |
| QCML | Small molecules with up to 8 heavy atoms | 14.7B semi-empirical + 33.5M DFT calculations | Computational | TFDS | CC BY-NC 4.0 | Open |
| QM9 | Small organic molecules | 134k molecules with quantum properties | Computational | SDF/CSV | CC BY 4.0 | Open |
| QM7/QM7b | Small molecules | 7k molecules with atomization energies | Computational | SDF/CSV | CC BY 4.0 | Open |

Experimental Datasets

| Dataset | Domain | Size | Type | Format | License | Access |
| --- | --- | --- | --- | --- | --- | --- |
| Crystallography Open Database (COD) | Crystal structures | ~525k entries | Experimental | CIF/SMILES | CC0 1.0 | Open |
| NIST ICSD (subset) | Inorganic structures | ~290k structures | Experimental | CIF | Proprietary | Restricted |
| CSD (Cambridge) | Organic crystals | ~1.3M structures | Experimental | CIF | Proprietary | Restricted |
| opXRD | Crystal structures | 92,552 (2,179 labeled) | Experimental | JSON | CC BY 4.0 | Open |
| MDR SuperCon | Superconductivity | Legacy superconductor database with material composition, structure, properties, and processes | Mixed | | CC BY 4.0 | Open |
| ChEMBL | Bioactive molecules | 2.3M+ compounds with bioactivity data | Experimental | JSON/SDF | CC BY-SA 3.0 | Open |
| MoleculeNet | Molecular properties | 700k+ compounds across 17 datasets | Mixed | CSV/SDF | Various | Open |
| ESOL | Aqueous solubility | 1,128 compounds with solubility data | Experimental | CSV | Open | Open |
| FreeSolv | Hydration free energy | 643 molecules with experimental data | Experimental | CSV | CC BY 4.0 | Open |
| Lipophilicity | Octanol/water distribution | 4,200 compounds with logD values | Experimental | CSV | Open | Open |
| PCBA | Bioassay screening | 400k+ compounds, 128 bioassays | Experimental | CSV | Open | Open |
| HIV | Antiviral screening | 41k compounds with HIV inhibition data | Experimental | CSV | Open | Open |
| BACE | Beta-secretase inhibitors | 1,522 compounds with IC50 data | Experimental | CSV | Open | Open |
| BBBP | Blood-brain barrier permeability | 2,053 compounds with permeability data | Experimental | CSV | Open | Open |
| Tox21 | Toxicity screening | 8k compounds, 12 toxicity targets | Experimental | CSV | Open | Open |
| ToxCast | High-throughput toxicity | 8k compounds, 600+ assays | Experimental | CSV | Open | Open |
| SIDER | Drug side effects | 1,427 drugs with adverse reactions | Experimental | CSV | Open | Open |
| ClinTox | Clinical trial toxicity | 1,491 compounds with FDA approval status | Experimental | CSV | Open | Open |
| PDBbind | Protein-ligand binding | 19k complexes with binding affinities | Experimental | PDB/SDF | Open | Open |
| BindingDB | Protein-ligand binding | 2.8M+ binding data points | Experimental | CSV/SDF | CC BY 4.0 | Open |
| ProtBENCH | Drug-target interactions | Protein family-specific datasets | Experimental | CSV | GPL-3.0 | Open |
| PDBench | Protein sequence design | 595 protein structures, 40 architectures | Experimental | PDB | MIT | Open |
| PDB-Struct | Structure-based protein design | Comprehensive protein design benchmark | Experimental | PDB | Open | Open |

LLM Training Datasets

| Dataset | Domain | Size | Type | Format | License | Access |
| --- | --- | --- | --- | --- | --- | --- |
| ChemPile | Chemistry | 75B+ tokens | LLM Training | Mixed | Open | Open |
| SmolInstruct | Small molecules | 3.3M samples | LLM Training | JSON | CC BY 4.0 | Open |
| CAMEL | Chemistry | 20K problem-solution pairs | LLM Training | JSON | Open | Open |
| ChemNLP | Chemistry | Extensive, many combined datasets | LLM Training | JSON | Open | Open |
| ChemQA | Chemistry | Multimodal QA dataset | LLM Training | JSON | Open | Open |
| ChemLLMBench | Chemistry | Benchmark of 8 chemistry tasks | LLM Training | JSON | Open | Open |
| ChemistryQA | Chemistry | 4,500 questions across 200 topics | LLM Training | JSON | Open | Open |
| MaScQA | Materials science | 640 QA pairs | LLM Training | XLSX | Open | Open |
| SciCode | Research coding in physics, math, materials science, biology, and chemistry | 338 subproblems | LLM Training | JSON | Open | Open |
| ChemData 700K | Chemistry (9 core tasks) | 730K Q-A instruction pairs | LLM Training | JSON | CC BY-NC 4.0 | Open |
| MatSci-Instruct (HoneyBee) | Materials science | ≈55K verified instructions | LLM Training | JSON | CC BY 4.0 | Open |
| MoleculeQA | Molecular properties & safety | 62K multiple-choice QA pairs | LLM Training | JSON | MIT | Open |
| BioInstruct 25K | Biomedical / biochemistry | 25K GPT-4 generated instructions | LLM Training | JSON | MIT | Open |
| Lab-Bench | Biology | 2,400+ questions for biology agents | LLM Training | JSON | Open | Open |
| ChemBench 4K | Chemistry competency benchmark | 4,100 single-choice questions | LLM Training | JSON | CC BY-NC 4.0 | Open |
| GPQA Diamond | Biology, physics, chemistry | 198 multiple-choice questions | LLM Training | JSON | Open | Open |
| SciAssess | Scientific literature analysis | Benchmark for LLMs in science | LLM Training | JSON | Open | Open |
| ZINC20-ML | Drug-like molecules (SMILES) | ≈1B molecules | LLM Training | SMILES | ZINC License | Open |
| PMC Open Access Subset | Biomedical full text | 3.4M+ articles | LLM Training | XML | Various CC | Open |
| MatScholar Task-Schema QA (MatSci-NLP) | Materials science (7 NLP tasks) | Tens of thousands of examples | LLM Training | JSON | CC BY 4.0 | Open |
| Mol-Instructions | Chemistry | Molecular, protein, and biochemical instructions | LLM Training | HuggingFace Dataset | Open | Open |
| USPTO-LLM | Chemical reactions | 247K reactions | LLM Training | JSON/Graph | CC BY 4.0 | Open |

Literature-mined & Text Datasets

| Dataset | Domain | Size | Type | Format | License | Access |
| --- | --- | --- | --- | --- | --- | --- |
| PubChem | Molecules & data | 119M compounds | Literature | SMILES/SDF | Public Domain | Open |
| Open Reaction Database (ORD) | Synthetic reactions | ~1M reactions | Experimental/Lit | JSON | CC BY 4.0 | Open |
| PatCID (IBM) | Chemical image data | 81M images / 13M mols | Literature | PNG/SMILES | Open | Open |
| MatScholar | NLP corpus (materials) | 5M+ abstracts | Literature | JSON/Graph | Open | Open |

Proprietary Datasets (for reference)

| Dataset | Domain | Size | Access | Use Case / Notes |
| --- | --- | --- | --- | --- |
| CAS Registry | Chemical substances | 250M+ substances | Proprietary | Industry standard for molecule indexing |
| Reaxys (Elsevier) | Reactions & properties | Millions of reactions | Proprietary | Rich curated literature reaction data |
| Citrine Informatics DB | Experimental materials | Private | Proprietary | Materials ML platform with industry data |
| CSD (Cambridge) | Organic crystals | 1.3M+ | Proprietary | Gold-standard X-ray structures |
| PoLyInfo | Polymers & properties | 500k+ data points / Experimental | Proprietary | Polymer properties from literature sources |

Dataset Resources

  • The Materials Data Facility: over 100 TB of open materials data (TODO: list some of these in the tables above)
  • Foundry-ML: 61 structured datasets ready for download through a Python client (TODO: list some of these in the tables above)

TODO

  • Classify and add CRIPT for polymer data
  • Classify and add Polymer Genome and other datasets from Khazana
  • A dataset on the solubilities of gases in polymers: 15,000 experimental measurements of the uptake (0.01–50 wt%) of 79 gases in 102 different polymers, at pressures from 1 × 10⁻³ to 7 × 10² bar and temperatures from 233 to 508 K, covering nearly 500 solvent–polymer systems. Optimized structures of the various repeating units are included. Available here: Data
  • Add Materials Cloud Datasets
  • Classify Atomly (a bit challenging, as parts of it are not in English)
  • Look into adding NOMAD for experimental data as well
  • Review Alexandria Materials
  • Add A Quantum-Chemical Bonding Database for Solid-State Materials Part 1: https://zenodo.org/records/8091844 Part 2: https://zenodo.org/records/8092187
  • Add QM datasets. http://quantum-machine.org/datasets/
  • Find link for | ChemRxivQuest | Chemistry literature QA | 970 curated QA pairs | LLM Training | JSON | CC BY 4.0 | Open | ChemRxivQuest |
  • Find new link for USPTO-Reactions | USPTO Reactions | Organic reactions | 1.8M reactions | Literature | RXN/SMILES | Open | Open |


License

This project is licensed under the MIT License. Each dataset listed has its own license, noted in the table. Always check the source's license before using the data in your project.


Acknowledgements

Thanks to the open data and research communities, including:

  • Meta AI FAIR
  • The Materials Data Facility / Foundry-ML
  • NIST JARVIS and Materials Project
  • LBL, MIT, CCDC, FIZ Karlsruhe
  • Contributors to Open Catalyst, PubChem, ORD, and AFLOW
  • Developers of open chemistry toolkits (RDKit, Open Babel)


Citation

If this repository was helpful in your work, feel free to cite or star the repo. You can also reference the underlying dataset publications linked above.

Source

Best of Atomistic Machine Learning

Best of Atomistic Machine Learning ⚛️🧬💎

🏆  A ranked list of awesome atomistic machine learning (AML) projects. Updated regularly.


This curated list contains 510 awesome open-source projects with a total of 220K stars grouped into 23 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit projects.yaml.

The current focus of this list is more on simulation data than on experimental data, and more on materials than on drug design. Nevertheless, contributions from other fields are warmly welcome!

How to cite: see the "Cite this repository" button in the right sidebar.

🧙‍♂️ Discover other best-of lists or create your own.


Explanation

  • 🥇🥈🥉  Combined project-quality score
  • ⭐️  Star count from GitHub
  • 🐣  New project (less than 6 months old)
  • 💤  Inactive project (6 months no activity)
  • 💀  Dead project (12 months no activity)
  • 📈📉  Project is trending up or down
  • ➕  Project was recently added
  • 👨‍💻  Contributors count from GitHub
  • 🔀  Fork count from GitHub
  • 📋  Issue count from GitHub
  • ⏱️  Last update timestamp on package manager
  • 📥  Download count from package manager
  • 📦  Number of dependent projects


Active learning


Projects that focus on enabling active learning, iterative learning schemes for atomistic ML.

DP-GEN (🥇23 · ⭐ 340) - The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field. LGPL-3.0 ML-IAP MD workflows - [GitHub](https://github.com/deepmodeling/dpgen) (👨‍💻 69 · 🔀 180 · 📥 1.9K · 📦 8 · 📋 330 - 16% open · ⏱️ 21.02.2025):
git clone https://github.com/deepmodeling/dpgen
- [PyPi](https://pypi.org/project/dpgen) (📥 640 / month · 📦 2 · ⏱️ 21.02.2025):
pip install dpgen
- [Conda](https://anaconda.org/deepmodeling/dpgen) (📥 230 · ⏱️ 25.03.2025):
conda install -c deepmodeling dpgen
FLARE (🥈18 · ⭐ 320) - An open-source Python package for creating fast and accurate interatomic potentials. MIT C++ ML-IAP - [GitHub](https://github.com/mir-group/flare) (👨‍💻 43 · 🔀 70 · 📥 9 · 📦 12 · 📋 220 - 16% open · ⏱️ 22.03.2025):
git clone https://github.com/mir-group/flare
IPSuite (🥈18 · ⭐ 23) - A Python toolkit for FAIR development and deployment of machine-learned interatomic potentials. EPL-2.0 ML-IAP MD workflows HTC FAIR - [GitHub](https://github.com/zincware/IPSuite) (👨‍💻 8 · 🔀 11 · 📦 8 · 📋 180 - 53% open · ⏱️ 19.05.2025):
git clone https://github.com/zincware/IPSuite
- [PyPi](https://pypi.org/project/ipsuite) (📥 380 / month · 📦 4 · ⏱️ 15.05.2025):
pip install ipsuite
DP-GEN2 (🥈14 · ⭐ 39) - 2nd generation of the Deep Potential GENerator. LGPL-3.0 ML-IAP MD workflows - [GitHub](https://github.com/deepmodeling/dpgen2) (👨‍💻 15 · 🔀 31 · 📦 6 · 📋 35 - 34% open · ⏱️ 29.04.2025):
git clone https://github.com/deepmodeling/dpgen2
Bgolearn (🥉13 · ⭐ 91) - [Materials & Design 2024 | NPJ com mat 2024] A Bayesian global optimization package for material design Adaptive.. MIT materials-discovery probabilistic - [GitHub](https://github.com/Bin-Cao/Bgolearn) (👨‍💻 3 · 🔀 15 · 📥 51 · 📋 3 - 33% open · ⏱️ 10.03.2025):
git clone https://github.com/Bin-Cao/Bgolearn
- [PyPi](https://pypi.org/project/Bgolearn) (📥 450 / month · ⏱️ 23.02.2025):
pip install Bgolearn
Finetuna (🥉9 · ⭐ 55 · 💤) - Active Learning for Machine Learning Potentials. MIT - [GitHub](https://github.com/ulissigroup/finetuna) (👨‍💻 11 · 🔀 12 · 📦 1 · 📋 20 - 25% open · ⏱️ 15.05.2024):
git clone https://github.com/ulissigroup/finetuna
Hidden projects (3):
  • flare++ (🥉13 · ⭐ 37 · 💀) - A many-body extension of the FLARE code. MIT C++ ML-IAP
  • ACEHAL (🥉5 · ⭐ 12 · 💀) - Hyperactive Learning (HAL) Python interface for building Atomic Cluster Expansion potentials. Unlicensed Julia
  • ALEBREW (🥉4 · ⭐ 21 · 💤) - Official repository for the paper Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic.. Custom ML-IAP MD
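Several of the active-learning tools above, DP-GEN in particular, decide which new configurations to label by measuring how much an ensemble of models disagrees on them. The snippet below is a minimal, standard-library sketch of that idea, not DP-GEN's implementation. It assumes one scalar prediction per model per configuration (real ML-IAP workflows typically use the maximum per-atom force deviation), and `trust_lo` / `trust_hi` are illustrative thresholds that mirror the accurate / candidate / failed split DP-GEN describes.

```python
from statistics import pstdev


def classify_by_deviation(ensemble_predictions, trust_lo, trust_hi):
    """Split configurations by the spread of an ensemble's predictions.

    ensemble_predictions: one list per configuration, holding one scalar
    prediction per ensemble member (a simplification of per-atom force
    deviations used in practice).
    """
    groups = {"accurate": [], "candidate": [], "failed": []}
    for index, predictions in enumerate(ensemble_predictions):
        deviation = pstdev(predictions)  # population std. dev. across models
        if deviation < trust_lo:
            groups["accurate"].append(index)   # models agree: no new label needed
        elif deviation < trust_hi:
            groups["candidate"].append(index)  # informative: send for DFT labeling
        else:
            groups["failed"].append(index)     # likely unphysical: discard
    return groups


# Three configurations, each predicted by a hypothetical 3-model ensemble.
predictions = [
    [1.0, 1.0, 1.0],
    [1.0, 1.2, 0.8],
    [0.0, 1.0, 2.0],
]
print(classify_by_deviation(predictions, trust_lo=0.05, trust_hi=0.3))
# {'accurate': [0], 'candidate': [1], 'failed': [2]}
```

In an iterative scheme, only the "candidate" configurations would be labeled and added to the training set before the ensemble is retrained and the loop repeats.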


Community resources


Projects that collect atomistic ML resources or foster communication within the community.

🔗 ACE / GRACE support - Support forum for the Atomic Cluster Expansion (ACE) and extensions.

🔗 AI for Science Map - Interactive mindmap of the AI4Science research field, including atomistic machine learning, including papers,..

🔗 ASE ecosystem - This is a list of software packages related to ASE or using ASE. md, ml-iap

🔗 Atomic Cluster Expansion - Atomic Cluster Expansion (ACE) community homepage.

🔗 CrystaLLM - Generate a crystal structure from a composition. language-models generative pretrained transformer

🔗 GAP-ML.org community homepage ML-IAP

🔗 matsci.org - A community forum for the discussion of anything materials science, with a focus on computational materials science..

🔗 Matter Modeling Stack Exchange - Machine Learning - Forum StackExchange, site Matter Modeling, ML-tagged questions.

Best-of Machine Learning with Python (🥇22 · ⭐ 20K) - A ranked list of awesome machine learning Python libraries. Updated weekly. CC-BY-4.0 general-ml Python - [GitHub](https://github.com/ml-tooling/best-of-ml-python) (👨‍💻 53 · 🔀 2.7K · 📋 61 - 44% open · ⏱️ 22.05.2025):
git clone https://github.com/ml-tooling/best-of-ml-python
MatBench Discovery (🥇21 · ⭐ 160) - An evaluation framework for machine learning models simulating high-throughput materials discovery. MIT datasets benchmarking model-repository - [GitHub](https://github.com/janosh/matbench-discovery) (👨‍💻 18 · 🔀 37 · 📦 4 · 📋 59 - 6% open · ⏱️ 21.05.2025):
git clone https://github.com/janosh/matbench-discovery
- [PyPi](https://pypi.org/project/matbench-discovery) (📥 1.5K / month · ⏱️ 11.09.2024):
pip install matbench-discovery
Garden (🥇19 · ⭐ 29) - FAIR AI/ML Model Publishing Framework. MIT model-repository - [GitHub](https://github.com/Garden-AI/garden) (👨‍💻 13 · 🔀 4 · 📦 6 · 📋 340 - 0% open · ⏱️ 01.05.2025):
git clone https://github.com/Garden-AI/garden
- [PyPi](https://pypi.org/project/garden-ai) (📥 1.2K / month · ⏱️ 01.05.2025):
pip install garden-ai
Graph-based Deep Learning Literature (🥈18 · ⭐ 4.9K) - links to conference publications in graph-based deep learning. MIT general-ml rep-learn - [GitHub](https://github.com/naganandy/graph-based-deep-learning-literature) (👨‍💻 12 · 🔀 770 · ⏱️ 22.05.2025):
git clone https://github.com/naganandy/graph-based-deep-learning-literature
OpenML (🥈18 · ⭐ 690) - Open Machine Learning. BSD-3 datasets - [GitHub](https://github.com/openml/OpenML) (👨‍💻 35 · 🔀 95 · 📋 930 - 39% open · ⏱️ 07.12.2024):
git clone https://github.com/openml/OpenML
AI for Science Resources (🥈14 · ⭐ 630) - List of resources for AI4Science research, including learning resources. GPL-3.0 license - [GitHub](https://github.com/divelab/AIRS) (👨‍💻 31 · 🔀 72 · 📋 25 - 4% open · ⏱️ 01.05.2025):
git clone https://github.com/divelab/AIRS
GT4SD - Generative Toolkit for Scientific Discovery (🥈14 · ⭐ 350) - Gradio apps of generative models in GT4SD. MIT generative pretrained drug-discovery model-repository - [GitHub](https://github.com/GT4SD/gt4sd-core) (👨‍💻 20 · 🔀 74 · 📋 120 - 11% open · ⏱️ 19.02.2025):
git clone https://github.com/GT4SD/gt4sd-core
Awesome Materials Informatics (🥈11 · ⭐ 440) - Curated list of known efforts in materials informatics, i.e. in modern materials science. Custom - [GitHub](https://github.com/tilde-lab/awesome-materials-informatics) (👨‍💻 21 · 🔀 93 · ⏱️ 13.05.2025):
git clone https://github.com/tilde-lab/awesome-materials-informatics
Neural-Network-Models-for-Chemistry (🥈11 · ⭐ 130) - A collection of Neural Network Models for chemistry. MIT rep-learn - [GitHub](https://github.com/Eipgen/Neural-Network-Models-for-Chemistry) (👨‍💻 3 · 🔀 18 · 📋 2 - 50% open · ⏱️ 15.05.2025):
git clone https://github.com/Eipgen/Neural-Network-Models-for-Chemistry
GNoME Explorer (🥈10 · ⭐ 990) - Graph Networks for Materials Exploration Database. Apache-2 datasets materials-discovery - [GitHub](https://github.com/google-deepmind/materials_discovery) (👨‍💻 2 · 🔀 160 · 📋 25 - 84% open · ⏱️ 03.03.2025):
git clone https://github.com/google-deepmind/materials_discovery
DeepModeling Projects (🥈10 · ⭐ 7) - DeepModeling projects. CC-BY-4.0 - [GitHub](https://github.com/deepmodeling/deepmodeling-projects) (👨‍💻 4 · 🔀 2 · ⏱️ 02.05.2025):
git clone https://github.com/deepmodeling/deepmodeling-projects
Awesome-Scientific-Language-Models (🥉9 · ⭐ 580) - A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP24). MIT language-models general-ml pretrained multimodal - [GitHub](https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models) (👨‍💻 9 · 🔀 32 · ⏱️ 26.02.2025):
git clone https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models
Awesome Materials & Chemistry Datasets (🥉9 · ⭐ 130 · 🐣) - A curated list of the most useful datasets in materials science and chemistry for training machine learning and AI.. MIT datasets experimental-data literature-data proprietary - [GitHub](https://github.com/blaiszik/awesome-matchem-datasets) (👨‍💻 5 · 🔀 15 · ⏱️ 21.05.2025):
git clone https://github.com/blaiszik/awesome-matchem-datasets
Awesome Neural Geometry (🥉8 · ⭐ 980) - A curated collection of resources and research related to the geometry of representations in the brain, deep networks,.. Unlicensed educational rep-learn - [GitHub](https://github.com/neurreps/awesome-neural-geometry) (👨‍💻 13 · 🔀 63 · ⏱️ 18.02.2025):
git clone https://github.com/neurreps/awesome-neural-geometry
Awesome-Graph-Generation (🥉8 · ⭐ 340) - A curated list of up-to-date graph generation papers and resources. Unlicensed rep-learn - [GitHub](https://github.com/yuanqidu/awesome-graph-generation) (👨‍💻 4 · 🔀 22 · ⏱️ 04.01.2025):
git clone https://github.com/yuanqidu/awesome-graph-generation
Awesome Neural SBI (🥉8 · ⭐ 120) - Community-sourced list of papers and resources on neural simulation-based inference. MIT active-learning - [GitHub](https://github.com/smsharma/awesome-neural-sbi) (👨‍💻 6 · 🔀 9 · 📋 2 - 50% open · ⏱️ 17.05.2025):
git clone https://github.com/smsharma/awesome-neural-sbi
Charting ML Publications in Science (🥉8 · ⭐ 42) - Literature analysis of ML applications in materials science, chemistry, physics. MIT literature-data general-ml - [GitHub](https://github.com/blaiszik/ml_publication_charts) (👨‍💻 2 · ⏱️ 22.03.2025):
git clone https://github.com/blaiszik/ml_publication_charts
optimade.science (🥉8 · ⭐ 8) - A sky-scanner Optimade browser-only GUI. MIT datasets - [GitHub](https://github.com/tilde-lab/optimade.science) (👨‍💻 8 · 🔀 2 · 📋 26 - 26% open · ⏱️ 17.05.2025):
git clone https://github.com/tilde-lab/optimade.science
The Collection of Database and Dataset Resources in Materials Science (🥉7 · ⭐ 330) - A list of databases, datasets and books/handbooks where you can find materials properties for machine learning.. Unlicensed datasets - [GitHub](https://github.com/sedaoturak/data-resources-for-materials-science) (👨‍💻 2 · 🔀 53 · ⏱️ 21.05.2025):
git clone https://github.com/sedaoturak/data-resources-for-materials-science
AI for Science paper collection (🥉7 · ⭐ 110 · 💤) - List the AI for Science papers accepted by top conferences. Apache-2 - [GitHub](https://github.com/sherrylixuecheng/AI_for_Science_paper_collection) (👨‍💻 5 · 🔀 12 · ⏱️ 14.09.2024):
git clone https://github.com/sherrylixuecheng/AI_for_Science_paper_collection
Awesome-Crystal-GNNs (🥉6 · ⭐ 94) - This repository contains a collection of resources and papers on GNN Models on Crystal Solid State Materials. MIT - [GitHub](https://github.com/kdmsit/Awesome-Crystal-GNNs) (👨‍💻 2 · 🔀 11 · ⏱️ 26.02.2025):
git clone https://github.com/kdmsit/Awesome-Crystal-GNNs
Hidden projects (9):
  • MatBench (🥈18 · ⭐ 160 · 💀) - Matbench: Benchmarks for materials science property prediction. MIT datasets benchmarking model-repository
  • MoLFormers UI (🥈10 · ⭐ 310 · 💀) - A family of foundation models trained on chemicals. Apache-2 transformer language-models pretrained drug-discovery
  • MADICES Awesome Interoperability (🥉8 · ⭐ 1) - Linked data interoperability resources of the Machine-actionable data interoperability for the chemical sciences.. MIT datasets
  • A Highly Opinionated List of Open-Source Materials Informatics Resources (🥉7 · ⭐ 130 · 💀) - A Highly Opinionated List of Open Source Materials Informatics Resources. MIT
  • Geometric-GNNs (🥉4 · ⭐ 100 · 💀) - List of Geometric GNNs for 3D atomic systems. Unlicensed datasets educational rep-learn
  • Does this material exist? (🥉4 · ⭐ 18 · 💀) - Vote on whether you think predicted crystal structures could be synthesised. MIT for-fun materials-discovery
  • LAM Crystal Philately competition 2024 (🥉4 · ⭐ 17) - OpenLAM Challenge crystal structure prediction https://arxiv.org/abs/2501.16358. LGPL-2.1 single-paper datasets structure-prediction materials-discovery ML-IAP UIP
  • GitHub topic materials-informatics (🥉1) - GitHub topic materials-informatics. Unlicensed
  • MateriApps (🥉1) - A Portal Site of Materials Science Simulation. Unlicensed
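MatBench Discovery (listed above) evaluates models partly on whether they correctly classify candidate structures as stable or unstable relative to the convex hull. As a rough illustration only, and not the package's own evaluation code (which reports further metrics), the standard-library sketch below computes precision, recall, and F1 for that classification. It assumes a structure counts as stable when its energy above the hull is at most 0 eV/atom; the energies are made-up values.

```python
def stability_classification_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Precision, recall and F1 where 'stable' means energy above hull <= threshold (eV/atom)."""
    if len(e_hull_true) != len(e_hull_pred):
        raise ValueError("true and predicted lists must have the same length")
    tp = fp = fn = 0
    for true_e, pred_e in zip(e_hull_true, e_hull_pred):
        truly_stable = true_e <= threshold
        predicted_stable = pred_e <= threshold
        if predicted_stable and truly_stable:
            tp += 1  # correctly flagged as stable
        elif predicted_stable:
            fp += 1  # flagged stable but actually unstable: a wasted follow-up calculation
        elif truly_stable:
            fn += 1  # a stable material the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical energies above hull in eV/atom: reference DFT vs. model prediction.
e_true = [-0.05, 0.02, 0.10, -0.01]
e_pred = [-0.02, -0.03, 0.15, 0.04]
print(stability_classification_metrics(e_true, e_pred))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Precision matters most in a screening setting, since every false positive would be passed on to more expensive DFT or synthesis steps.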


Datasets


Datasets, databases and trained models for atomistic ML.

🔗 Alexandria Materials Database - A database of millions of theoretical crystal structures (3D, 2D and 1D) discovered by machine learning accelerated..

🔗 Catalysis Hub - A web-platform for sharing data and software for computational catalysis research!.

🔗 Citrination Datasets - AI-Powered Materials Data Platform. Open Citrination has been decommissioned.

🔗 crystals.ai - Curated datasets for reproducible AI in materials science.

🔗 DeepChem Models - DeepChem models on HuggingFace. model-repository pretrained language-models

🔗 Graphs of Materials Project 20190401 - The dataset used to train the MEGNet interatomic potential. ML-IAP

🔗 HME21 Dataset - High-temperature multi-element 2021 dataset for the PreFerred Potential (PFP).. UIP

🔗 JARVIS-Leaderboard (⭐ 68) - A large-scale benchmark of materials design methods: https://www.nature.com/articles/s41524-024-01259-w. model-repository benchmarking community-resource educational

🔗 Materials Project - Charge Densities - Materials Project offers charge density data for download via its public API.

🔗 Materials Project Trajectory (MPtrj) Dataset - The dataset used to train the CHGNet universal potential. UIP

🔗 matterverse.ai - Database of yet-to-be-synthesized materials predicted using state-of-the-art machine learning algorithms.

🔗 MPF.2021.2.8 - The dataset used to train the M3GNet universal potential. UIP

🔗 NRELMatDB - Computational materials database with the specific focus on materials for renewable energy applications including, but..

🔗 QM9 Charge Densities and Energies - QM9 molecules calculated with VASP using Atomic Simulation Environment. ML-DFT

🔗 QM40 Dataset - A More Realistic QM Dataset for Machine Learning in Molecular Science https://doi.org/10.1038/s41597-024-04206-y. drug-discovery

🔗 QMugs dataset - Quantum Mechanical Properties of Drug-like Molecules https://doi.org/10.1038/s41597-022-01390-7. drug-discovery

🔗 Quantum-Machine.org Datasets - Collection of datasets, including QM7, QM9, etc. MD, DFT. Small organic molecules, mostly.

🔗 sGDML Datasets - MD17, MD22, DFT datasets.

🔗 MoleculeNet - A Benchmark for Molecular Machine Learning. benchmarking

🔗 ZINC15 - A free database of commercially-available compounds for virtual screening. ZINC contains over 230 million purchasable.. graph biomolecules

🔗 ZINC20 - A free database of commercially-available compounds for virtual screening. ZINC contains over 230 million purchasable.. graph biomolecules

FAIR Chemistry datasets (🥇28 · ⭐ 1.4K · 📈) - Datasets OC20, OC22, etc. Formerly known as Open Catalyst Project. MIT catalysis - [GitHub](https://github.com/facebookresearch/fairchem) (👨‍💻 50 · 🔀 320 · 📋 340 - 4% open · ⏱️ 21.05.2025):
git clone https://github.com/FAIR-Chem/fairchem
- [PyPi](https://pypi.org/project/fairchem-core) (📥 5.9K / month · 📦 10 · ⏱️ 21.05.2025):
pip install fairchem-core
Meta Open Materials 2024 (OMat24) Dataset (🥇27 · ⭐ 1.4K · 📈) - Contains over 100 million Density Functional Theory calculations focused on structural and compositional diversity. CC-BY-4.0 - [GitHub](https://github.com/facebookresearch/fairchem) (👨‍💻 50 · 🔀 320 · 📋 340 - 4% open · ⏱️ 21.05.2025):
git clone https://github.com/FAIR-Chem/fairchem
- [PyPi](https://pypi.org/project/fairchem-core) (📥 5.9K / month · 📦 10 · ⏱️ 21.05.2025):
pip install fairchem-core
OPTIMADE Python tools (🥇26 · ⭐ 77) - Tools for implementing and consuming OPTIMADE APIs in Python. MIT - [GitHub](https://github.com/Materials-Consortia/optimade-python-tools) (👨‍💻 31 · 🔀 46 · 📦 64 · 📋 470 - 21% open · ⏱️ 15.05.2025):
git clone https://github.com/Materials-Consortia/optimade-python-tools
- [PyPi](https://pypi.org/project/optimade) (📥 12K / month · 📦 4 · ⏱️ 21.03.2025):
pip install optimade
- [Conda](https://anaconda.org/conda-forge/optimade) (📥 130K · ⏱️ 22.04.2025):
conda install -c conda-forge optimade
MPContribs (🥇25 · ⭐ 38) - Platform for materials scientists to contribute and disseminate their materials data through Materials Project. MIT - [GitHub](https://github.com/materialsproject/MPContribs) (👨‍💻 27 · 🔀 24 · 📦 51 · 📋 110 - 26% open · ⏱️ 19.05.2025):
git clone https://github.com/materialsproject/MPContribs
- [PyPi](https://pypi.org/project/mpcontribs-client) (📥 3.7K / month · 📦 3 · ⏱️ 28.02.2025):
pip install mpcontribs-client
Open Databases Integration for Materials Design (OPTIMADE) (🥈17 · ⭐ 88) - Specification of a common REST API for access to materials databases. CC-BY-4.0 - [GitHub](https://github.com/Materials-Consortia/OPTIMADE) (👨‍💻 21 · 🔀 37 · 📋 250 - 31% open · ⏱️ 24.04.2025):
git clone https://github.com/Materials-Consortia/OPTIMADE
load-atoms (🥈17 · ⭐ 44) - download and manipulate atomistic datasets. MIT data-structures - [GitHub](https://github.com/jla-gardner/load-atoms) (👨‍💻 4 · 🔀 4 · 📦 8 · 📋 32 - 6% open · ⏱️ 16.12.2024):
git clone https://github.com/jla-gardner/load-atoms
- [PyPi](https://pypi.org/project/load-atoms) (📥 2.8K / month · 📦 2 · ⏱️ 13.12.2024):
pip install load-atoms
OpenQDC (🥈15 · ⭐ 48) - Repository of Quantum Datasets Publicly Available. CC-BY-4.0 - [GitHub](https://github.com/valence-labs/OpenQDC) (👨‍💻 10 · 🔀 3 · 📦 4 · 📋 48 - 18% open · ⏱️ 24.01.2025):
git clone https://github.com/valence-labs/openQDC
- [PyPi](https://pypi.org/project/openqdc) (📥 85 / month · ⏱️ 09.08.2024):
pip install openqdc
- [Conda](https://anaconda.org/conda-forge/openqdc) (📥 1K · ⏱️ 22.04.2025):
conda install -c conda-forge openqdc
MatPES (🥈15 · ⭐ 35 · 🐣) - A foundational potential energy dataset for materials. BSD-3 UIP ML-IAP - [GitHub](https://github.com/materialsvirtuallab/matpes) (👨‍💻 3 · 🔀 4 · ⏱️ 15.05.2025):
git clone https://github.com/materialsvirtuallab/matpes
- [PyPi](https://pypi.org/project/matpes) (📥 120 / month · ⏱️ 10.03.2025):
pip install matpes
QH9 (🥈14 · ⭐ 630) - A Quantum Hamiltonian Prediction Benchmark. CC-BY-NC-SA-4.0 ML-DFT - [GitHub](https://github.com/divelab/AIRS) (👨‍💻 31 · 🔀 72 · 📋 25 - 4% open · ⏱️ 01.05.2025):
git clone https://github.com/divelab/AIRS
OpenKIM (🥈13 · ⭐ 32) - The Open Knowledgebase of Interatomic Models (OpenKIM) aims to be an online resource for standardized testing, long-.. LGPL-2.1 model-repository knowledge-base pretrained - [GitHub](https://github.com/openkim/kim-api) (👨‍💻 27 · 🔀 20 · 📋 37 - 40% open · ⏱️ 29.04.2025):
git clone https://github.com/openkim/kim-api
nablaDFT (🥈12 · ⭐ 210) - nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset. MIT ML-DFT ML-WFT drug-discovery ML-IAP benchmarking - [GitHub](https://github.com/AIRI-Institute/nablaDFT) (👨‍💻 9 · 🔀 23 · 📋 24 - 25% open · ⏱️ 11.02.2025):
git clone https://github.com/AIRI-Institute/nablaDFT
SPICE (🥈11 · ⭐ 170) - A collection of QM data for training potential functions. MIT ML-IAP MD - [GitHub](https://github.com/openmm/spice-dataset) (👨‍💻 1 · 🔀 9 · 📥 280 · 📋 72 - 26% open · ⏱️ 18.02.2025):
git clone https://github.com/openmm/spice-dataset
MPDS API (🥈11 · ⭐ 27) - Tutorials, notebooks, issue tracker, and website on the MPDS API: the data retrieval interface for the Materials.. CC-BY-4.0 phase-transition - [GitHub](https://github.com/mpds-io/mpds-api) (👨‍💻 5 · 🔀 5 · 📋 26 - 34% open · ⏱️ 16.01.2025):
git clone https://github.com/mpds-io/mpds-api
- [PyPi](https://pypi.org/project/mpds_client) (📥 200 / month · ⏱️ 14.09.2020):
pip install mpds_client
OBELiX (🥉10 · ⭐ 19 · 🐣) - A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State.. CC-BY-4.0 experimental-data transport-phenomena - [GitHub](https://github.com/NRC-Mila/OBELiX) (👨‍💻 5 · 🔀 4 · ⏱️ 16.05.2025):
git clone https://github.com/NRC-Mila/OBELiX
- [PyPi](https://pypi.org/project/obelix-data) (📥 170 / month · ⏱️ 16.05.2025):
pip install obelix-data
AIS Square (🥉9 · ⭐ 13) - A collaborative and open-source platform for sharing AI for Science datasets, models, and workflows. Home of the.. LGPL-3.0 community-resource model-repository - [GitHub](https://github.com/deepmodeling/AIS-Square) (👨‍💻 8 · 🔀 8 · 📋 6 - 83% open · ⏱️ 19.05.2025):
git clone https://github.com/deepmodeling/AIS-Square
polyVERSE (🥉8 · ⭐ 19) - polyVERSE is a comprehensive repository of informatics-ready datasets curated by the Ramprasad Group. Custom soft-matter - [GitHub](https://github.com/Ramprasad-Group/polyVERSE) (👨‍💻 7 · 🔀 3 · ⏱️ 14.05.2025):
git clone https://github.com/Ramprasad-Group/polyVERSE
GDB-9-Ex9 and ORNL_AISD-Ex (🥉7 · ⭐ 8) - Distributed computing workflow for generation and analysis of large scale molecular datasets obtained running multi-.. Unlicensed - [GitHub](https://github.com/ORNL/Analysis-of-Large-Scale-Molecular-Datasets-with-Python) (👨‍💻 7 · 🔀 6 · ⏱️ 12.03.2025):
git clone https://github.com/ORNL/Analysis-of-Large-Scale-Molecular-Datasets-with-Python
Hidden projects (17):
  • ATOM3D (🥈19 · ⭐ 310 · 💀) - ATOM3D: tasks on molecules in three dimensions. MIT biomolecules benchmarking
  • MoleculeNet Leaderboard (🥉9 · ⭐ 98 · 💀) - MIT benchmarking
  • Materials Data Facility (MDF) (🥉9 · ⭐ 10 · 💀) - A simple way to publish, discover, and access materials datasets. Publication of very large datasets supported (e.g.,.. Apache-2
  • 2DMD dataset (🥉9 · ⭐ 7 · 💀) - Code for Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of.. Apache-2 material-defect
  • ANI-1 Dataset (🥉8 · ⭐ 97 · 💀) - A data set of 20 million calculated off-equilibrium conformations for organic molecules. MIT
  • GEOM (🥉7 · ⭐ 220 · 💀) - GEOM: Energy-annotated molecular conformations. Unlicensed drug-discovery
  • ANI-1x Datasets (🥉6 · ⭐ 63 · 💀) - The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for organic molecules. MIT
  • COMP6 Benchmark dataset (🥉6 · ⭐ 39 · 💀) - COMP6 Benchmark dataset for ML potentials. MIT
  • SciGlass (🥉6 · ⭐ 14 · 💀) - The database contains a vast set of data on the properties of glass materials. MIT
  • The Perovskite Database Project (🥉5 · ⭐ 65 · 💀) - Perovskite Database Project aims at making all perovskite device data, both past and future, available in a form.. Unlicensed community-resource
  • 3DSC Database (🥉4 · ⭐ 20) - Repo for the paper publishing the superconductor database with 3D crystal structures. Custom superconductors materials-discovery
  • paper-data-redundancy (🥉4 · ⭐ 11 · 💤) - Repo for the paper Exploiting redundancy in large materials datasets for efficient machine learning with less data. BSD-3 small-data single-paper
  • Visual Graph Datasets (🥉4 · ⭐ 3) - Datasets for the training of graph neural networks (GNNs) and subsequent visualization of attributional explanations.. MIT XAI rep-learn
  • OPTIMADE providers dashboard (🥉4 · ⭐ 2) - A dashboard of known providers. Unlicensed
  • linear-regression-benchmarks (🥉4 · ⭐ 1 · 💀) - Data sets used for linear regression benchmarks. MIT benchmarking single-paper
  • nep-data (🥉2 · ⭐ 17 · 💀) - Data related to the NEP machine-learned potential of GPUMD. Unlicensed ML-IAP MD transport-phenomena
  • tmQM_wB97MV Dataset (🥉1 · ⭐ 7 · 💀) - Code for Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV.. Unlicensed catalysis rep-learn
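The OPTIMADE specification listed in this section defines a common REST API in which clients select structures by passing a filter expression, such as `elements HAS ALL "Si","O"`, as the `filter` URL parameter of the `structures` endpoint. The sketch below only builds such a query URL with the standard library and sends no request. The base URL is a placeholder to be replaced with a real provider's base URL; for actual querying, the optimade-python-tools package listed above is the more complete option.

```python
from urllib.parse import urlencode


def build_structures_query(base_url, elements, nelements=None, page_limit=10):
    """Return an OPTIMADE structures-query URL for structures containing all `elements`."""
    # String values in OPTIMADE filters are double-quoted; HAS ALL matches list properties.
    quoted = ",".join(f'"{symbol}"' for symbol in elements)
    clauses = [f"elements HAS ALL {quoted}"]
    if nelements is not None:
        clauses.append(f"nelements={nelements}")  # exact number of distinct elements
    params = {"filter": " AND ".join(clauses), "page_limit": page_limit}
    # urlencode percent-encodes quotes, commas and '=' inside the filter string.
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(params)}"


# Placeholder provider URL (hypothetical): replace with a real OPTIMADE base URL.
url = build_structures_query("https://optimade.example.org/", ["Si", "O"], nelements=2)
print(url)
```

Keeping the filter as a plain string makes it easy to copy the same expression into any provider's web interface for a quick check before scripting larger queries.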


Data Structures


Projects that focus on providing data structures used in atomistic machine learning.

dpdata (🥇23 · ⭐ 210) - A Python package for manipulating atomistic data of software in computational science. LGPL-3.0 - [GitHub](https://github.com/deepmodeling/dpdata) (👨‍💻 63 · 🔀 140 · 📦 140 · 📋 120 - 28% open · ⏱️ 20.03.2025):
git clone https://github.com/deepmodeling/dpdata
- [PyPi](https://pypi.org/project/dpdata) (📥 10K / month · 📦 40 · ⏱️ 20.03.2025):
pip install dpdata
- [Conda](https://anaconda.org/deepmodeling/dpdata) (📥 290 · ⏱️ 25.03.2025):
conda install -c deepmodeling dpdata
Metatensor (🥇23 · ⭐ 73) - Self-describing sparse tensor data format for atomistic machine learning and beyond. BSD-3 ML-IAP MD Rust C-lang C++ Python - [GitHub](https://github.com/metatensor/metatensor) (👨‍💻 30 · 🔀 22 · 📥 45K · 📦 14 · 📋 250 - 27% open · ⏱️ 21.05.2025):
git clone https://github.com/metatensor/metatensor
- [PyPi](https://pypi.org/project/metatensor) (📥 1.1K / month · ⏱️ 26.01.2024):
pip install metatensor
mp-pyrho (🥉17 · ⭐ 40 · 💤) - Tools for re-griding volumetric quantum chemistry data for machine-learning purposes. Custom ML-DFT - [GitHub](https://github.com/materialsproject/pyrho) (👨‍💻 10 · 🔀 9 · 📦 32 · 📋 5 - 40% open · ⏱️ 22.10.2024):
git clone https://github.com/materialsproject/pyrho
- [PyPi](https://pypi.org/project/mp-pyrho) (📥 8.7K / month · 📦 5 · ⏱️ 22.10.2024):
pip install mp-pyrho
dlpack (🥉16 · ⭐ 990) - common in-memory tensor structure. Apache-2 C++ - [GitHub](https://github.com/dmlc/dlpack) (👨‍💻 32 · 🔀 140 · 📋 76 - 36% open · ⏱️ 12.05.2025):
git clone https://github.com/dmlc/dlpack


Density functional theory (ML-DFT)


Projects and models that focus on quantities of DFT, such as density functional approximations (ML-DFA), the charge density, density of states, the Hamiltonian, etc.

🔗 IKS-PIML - Code and generated data for the paper Inverting the Kohn-Sham equations with physics-informed machine learning.. neural-operator pinn datasets single-paper

🔗 M-OFDFT - Overcoming the Barrier of Orbital-Free Density Functional Theory in Molecular Systems Using Deep Learning.. transformer single-paper

JAX-DFT (🥇25 · ⭐ 36K) - This library provides basic building blocks that can construct DFT calculations as a differentiable program. Apache-2 - [GitHub](https://github.com/google-research/google-research) (👨‍💻 830 · 🔀 8K · 📋 1.8K - 81% open · ⏱️ 13.05.2025):
git clone https://github.com/google-research/google-research
MALA (🥇20 · ⭐ 90) - Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data. BSD-3 - [GitHub](https://github.com/mala-project/mala) (👨‍💻 47 · 🔀 26 · 📦 2 · 📋 300 - 10% open · ⏱️ 16.05.2025):
git clone https://github.com/mala-project/mala
QHNet (🥇14 · ⭐ 630) - Artificial Intelligence Research for Science (AIRS). GPL-3.0 rep-learn - [GitHub](https://github.com/divelab/AIRS) (👨‍💻 31 · 🔀 72 · 📋 25 - 4% open · ⏱️ 01.05.2025):
git clone https://github.com/divelab/AIRS
SALTED (🥈13 · ⭐ 36) - Symmetry-Adapted Learning of Three-dimensional Electron Densities (and their electrostatic response). GPL-3.0 - [GitHub](https://github.com/andreagrisafi/SALTED) (👨‍💻 25 · 🔀 5 · 📋 7 - 14% open · ⏱️ 21.05.2025):
git clone https://github.com/andreagrisafi/SALTED
DeepH-pack (🥈12 · ⭐ 280 · 💤) - Deep neural networks for density functional theory Hamiltonian. LGPL-3.0 Julia - [GitHub](https://github.com/mzjb/DeepH-pack) (👨‍💻 8 · 🔀 48 · 📋 63 - 34% open · ⏱️ 07.10.2024):
git clone https://github.com/mzjb/DeepH-pack
CiderPress (🥈11 · ⭐ 12) - A high-performance software package for training and evaluating machine-learned XC functionals using the CIDER.. GPL-3.0 ml-functional C-lang - [GitHub](https://github.com/mir-group/CiderPress) (👨‍💻 2 · 🔀 2 · ⏱️ 09.04.2025):
git clone https://github.com/mir-group/CiderPress
- [PyPi](https://pypi.org/project/ciderpress) (📥 55 / month · ⏱️ 13.03.2025):
pip install ciderpress
DeePKS-kit (🥈9 · ⭐ 110) - a package for developing machine learning-based chemically accurate energy and density functional models. LGPL-3.0 ml-functional - [GitHub](https://github.com/deepmodeling/deepks-kit) (👨‍💻 7 · 🔀 36 · 📋 29 - 41% open · ⏱️ 28.04.2025):
git clone https://github.com/deepmodeling/deepks-kit
Q-stack (🥈9 · ⭐ 18) - Stack of codes for dedicated pre- and post-processing tasks for Quantum Machine Learning (QML). MIT excited-states general-tool - [GitHub](https://github.com/lcmd-epfl/Q-stack) (👨‍💻 7 · 🔀 5 · 📋 34 - 29% open · ⏱️ 17.04.2025):
git clone https://github.com/lcmd-epfl/Q-stack
HamGNN (🥈8 · ⭐ 100) - An E(3) equivariant Graph Neural Network for predicting electronic Hamiltonian matrix. GPL-3.0 rep-learn magnetism C-lang - [GitHub](https://github.com/QuantumLab-ZY/HamGNN) (👨‍💻 2 · 🔀 20 · 📋 46 - 84% open · ⏱️ 14.05.2025):
git clone https://github.com/QuantumLab-ZY/HamGNN
dftio (🥈8 · ⭐ 9) - dftio is to assist machine learning communities to transcript DFT output into a format that is easy to read or used by.. LGPL-3.0 data-structures workflows - [GitHub](https://github.com/deepmodeling/dftio) (👨‍💻 4 · 🔀 4 · 📋 3 - 33% open · ⏱️ 22.04.2025):
git clone https://github.com/deepmodeling/dftio
ChargE3Net (🥉7 · ⭐ 57) - Higher-order equivariant neural networks for charge density prediction in materials. MIT rep-learn - [GitHub](https://github.com/AIforGreatGood/charge3net) (👨‍💻 3 · 🔀 15 · 📋 9 - 44% open · ⏱️ 21.02.2025):
git clone https://github.com/AIforGreatGood/charge3net
scdp (scalable charge density prediction) (🥉7 · ⭐ 33) - [NeurIPS 2024] source code for A Recipe for Charge Density Prediction. MIT rep-learn single-paper - [GitHub](https://github.com/kyonofx/scdp) (🔀 10 · 📋 5 - 20% open · ⏱️ 17.12.2024):
git clone https://github.com/kyonofx/scdp
Hidden projects (25):
  • DM21 (🥇20 · ⭐ 14K · 💀) - This package provides a PySCF interface to the DM21 (DeepMind 21) family of exchange-correlation functionals described.. Apache-2
  • Grad DFT (🥈10 · ⭐ 100 · 💀) - GradDFT is a JAX-based library enabling the differentiable design and experimentation of exchange-correlation.. Apache-2
  • NeuralXC (🥈10 · ⭐ 36 · 💀) - Implementation of a machine learned density functional. BSD-3
  • PROPhet (🥈9 · ⭐ 65 · 💀) - PROPhet is a code to integrate machine learning techniques with first-principles quantum chemistry approaches. GPL-3.0 ML-IAP MD single-paper C++
  • ACEhamiltonians (🥈9 · ⭐ 16 · 💀) - Provides tools for constructing, fitting, and predicting self-consistent Hamiltonian and overlap matrices in solid-.. MIT Julia
  • DeepH-E3 (🥉7 · ⭐ 93 · 💀) - General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian. MIT magnetism
  • Mat2Spec (🥉7 · ⭐ 28 · 💀) - Density of States Prediction for Materials Discovery via Contrastive Learning from Probabilistic Embeddings. MIT spectroscopy
  • Libnxc (🥉7 · ⭐ 19 · 💀) - A library for using machine-learned exchange-correlation functionals for density-functional theory. MPL-2.0 C++ Fortran
  • DeepDFT (🥉6 · ⭐ 76 · 💀) - Official implementation of DeepDFT model. MIT
  • charge-density-models (🥉6 · ⭐ 13 · 💀) - Tools to build charge density models using [fairchem](https://github.com/FAIR-Chem/fairchem). MIT rep-learn
  • KSR-DFT (🥉6 · ⭐ 4 · 💀) - Kohn-Sham regularizer for machine-learned DFT functionals. Apache-2
  • xDeepH (🥉5 · ⭐ 37 · 💀) - Extended DeepH (xDeepH) method for magnetic materials. LGPL-3.0 magnetism Julia
  • ML-DFT (🥉5 · ⭐ 26 · 💀) - A package for density functional approximation using machine learning. MIT
  • InfGCN for Electron Density Estimation (🥉5 · ⭐ 15 · 💀) - Official implementation of the NeurIPS 23 spotlight paper of InfGCN. MIT rep-learn neural-operator
  • rho_learn (🥉5 · ⭐ 4 · 💀) - A proof-of-concept workflow for torch-based electron density learning. MIT ML-DFT rep-eng
  • rholearn (🥉5 · ⭐ 3) - Learning and predicting electronic densities decomposed on a basis and global electronic densities of states at DFT.. MIT ML-DFT rep-eng density-of-states
  • DeepCDP (🥉4 · ⭐ 6 · 💀) - DeepCDP: Deep learning Charge Density Prediction. Unlicensed
  • MALADA (🥉4 · ⭐ 1) - MALA Data Acquisition: Helpful tools to build data for MALA. BSD-3
  • gprep (🥉4 · 💀) - Fitting DFTB repulsive potentials with GPR. MIT single-paper
  • APET (🥉3 · ⭐ 5 · 💀) - Atomic Positional Embedding-based Transformer. GPL-3.0 density-of-states transformer
  • CSNN (🥉3 · ⭐ 2 · 💀) - Primary codebase of CSNN - Concentric Spherical Neural Network for 3D Representation Learning. BSD-3
  • ofdft_nflows (🥉2 · ⭐ 10 · 💤) - Normalizing flows for orbital-free DFT. Unlicensed generative
  • A3MD (🥉2 · ⭐ 8 · 💀) - MPNN-like + Analytic Density Model = Accurate electron densities. Unlicensed rep-learn single-paper
  • MLDensity (🥉1 · ⭐ 4 · 💀) - Linear Jacobi-Legendre expansion of the charge density for machine learning-accelerated electronic structure.. Unlicensed
  • kdft (🥉1 · ⭐ 2 · 💀) - The Kernel Density Functional (KDF) code allows generating ML based DFT functionals. Unlicensed


Educational Resources


Tutorials, guides, cookbooks, recipes, etc.

🔗 AI for Science 101 community-resource rep-learn

🔗 AL4MS 2023 workshop tutorials active-learning

🔗 Quantum Chemistry in the Age of Machine Learning - Book, 2022.

AI4Chemistry course (🥇13 · ⭐ 180) - EPFL AI for chemistry course, Spring 2023. https://schwallergroup.github.io/ai4chem_course. MIT chemistry - [GitHub](https://github.com/schwallergroup/ai4chem_course) (👨‍💻 7 · 🔀 43 · 📋 4 - 25% open · ⏱️ 30.04.2025):
git clone https://github.com/schwallergroup/ai4chem_course
jarvis-tools-notebooks (🥇11 · ⭐ 82) - A Google-Colab Notebook Collection for Materials Design: https://jarvis.nist.gov/. NIST - [GitHub](https://github.com/JARVIS-Materials-Design/jarvis-tools-notebooks) (👨‍💻 6 · 🔀 33 · ⏱️ 05.05.2025):
git clone https://github.com/JARVIS-Materials-Design/jarvis-tools-notebooks
iam-notebooks (🥈10 · ⭐ 28) - Jupyter notebooks for the lectures of the Introduction to Atomistic Modeling. Apache-2 - [GitHub](https://github.com/ceriottm/iam-notebooks) (👨‍💻 6 · 🔀 5 · ⏱️ 07.01.2025):
git clone https://github.com/ceriottm/iam-notebooks
DSECOP (🥈9 · ⭐ 49) - This repository contains data science educational materials developed by DSECOP Fellows. CC0-1.0 - [GitHub](https://github.com/GDS-Education-Community-of-Practice/DSECOP) (👨‍💻 14 · 🔀 26 · 📋 8 - 12% open · ⏱️ 29.04.2025):
git clone https://github.com/GDS-Education-Community-of-Practice/DSECOP
COSMO Software Cookbook (🥈9 · ⭐ 23) - A collection of simulation recipes for the atomic-scale modeling of materials and molecules. BSD-3 - [GitHub](https://github.com/lab-cosmo/atomistic-cookbook) (👨‍💻 14 · 🔀 2 · 📋 19 - 21% open · ⏱️ 13.05.2025):
git clone https://github.com/lab-cosmo/atomistic-cookbook
DeepModeling Tutorials (🥉7 · ⭐ 15) - Tutorials for DeepModeling projects. Unlicensed - [GitHub](https://github.com/deepmodeling/tutorials) (👨‍💻 11 · 🔀 23 · ⏱️ 03.04.2025):
git clone https://github.com/deepmodeling/tutorials
MACE-tutorials (🥉6 · ⭐ 46 · 💤) - Another set of tutorials for the MACE interatomic potential by one of the authors. MIT ML-IAP rep-learn MD - [GitHub](https://github.com/ilyes319/mace-tutorials) (👨‍💻 2 · 🔀 12 · ⏱️ 16.07.2024):
git clone https://github.com/ilyes319/mace-tutorials
DSM-CORE (🥉6 · ⭐ 15) - Data Science for Materials - Collection of Open Educational Resources. Unlicensed - [GitHub](https://github.com/MatSciEdu/DSM-CORE) (👨‍💻 5 · 🔀 7 · 📋 2 - 50% open · ⏱️ 03.04.2025):
git clone https://github.com/MatSciEdu/DSM-CORE
MLforMaterials (🥉5 · ⭐ 83) - Online resource for a practical course in machine learning for materials research at Imperial College London.. MIT community-resource general-ml rep-eng materials-discovery - [GitHub](https://github.com/aronwalsh/MLforMaterials) (👨‍💻 2 · 🔀 12 · ⏱️ 17.02.2025):
git clone https://github.com/aronwalsh/MLforMaterials
Show 20 hidden projects...
- DeepLearningLifeSciences (🥇12 · ⭐ 370 · 💀) - Example code from the book Deep Learning for the Life Sciences. MIT
- Deep Learning for Molecules and Materials Book (🥇11 · ⭐ 650 · 💀) - Deep learning for molecules and materials book. Custom
- Geometric GNN Dojo (🥇11 · ⭐ 500 · 💀) - New to geometric GNNs: try our practical notebook, prepared for MPhil students at the University of Cambridge. MIT rep-learn
- Introduction to AI-driven Science on Supercomputers: A Student Training Series (🥇11 · ⭐ 220) - Unlicensed general-ml rep-learn language-models
- OPTIMADE Tutorial Exercises (🥈9 · ⭐ 15 · 💀) - Tutorial exercises for the OPTIMADE API. MIT datasets
- RDKit Tutorials (🥈8 · ⭐ 280 · 💀) - Tutorials to learn how to work with the RDKit. Custom
- BestPractices (🥈8 · ⭐ 190 · 💀) - Things that you should (and should not) do in your Materials Informatics research. MIT
- MAChINE (🥉7 · ⭐ 1 · 💀) - Client-Server Web App to introduce usage of ML in materials science to beginners. MIT
- Applied AI for Materials (🥉6 · ⭐ 64 · 💀) - Course materials for Applied AI for Materials Science and Engineering. Unlicensed
- ML for catalysis tutorials (🥉6 · ⭐ 9 · 💀) - A Jupyter Book repo with a tutorial on how to use OCP ML models for catalysis. MIT
- Data Handling, DoE and Statistical Analysis for Material Chemists (🥉6 · ⭐ 4 · 💀) - Notebooks for workshops of a DoE course, hosted by the Computational Materials Chemistry group at Uppsala University. GPL-3.0
- AI4Science101 (🥉5 · ⭐ 90 · 💀) - AI for Science. Unlicensed
- Machine Learning for Materials Hard and Soft (🥉5 · ⭐ 37 · 💀) - ESI-DCAFM-TACO-VDSP Summer School on Machine Learning for Materials Hard and Soft. Unlicensed
- ML-in-chemistry-101 (🥉4 · ⭐ 77 · 💀) - The course materials for Machine Learning in Chemistry 101. Unlicensed
- chemrev-gpr (🥉4 · ⭐ 11 · 💀) - Notebooks accompanying the paper on GPR in materials and molecules in Chemical Reviews 2020. Unlicensed
- AI4ChemMat Hands-On Series (🥉4 · ⭐ 1 · 💀) - Hands-On Series organized by Chemistry and Materials working group at Argonne Nat Lab. MPL-2.0
- PiNN Lab (🥉3 · ⭐ 3 · 💀) - Material for running a lab session on atomic neural networks. GPL-3.0
- MLDensity_tutorial (🥉2 · ⭐ 10 · 💀) - Tutorial files to work with ML for the charge density in molecules and solids. Unlicensed
- LAMMPS-style pair potentials with GAP (🥉2 · ⭐ 4 · 💀) - A tutorial on how to create LAMMPS-style pair potentials and use them in combination with GAP potentials to run MD.. Unlicensed ML-IAP MD rep-eng
- MALA Tutorial (🥉2 · ⭐ 2 · 💀) - A full MALA hands-on tutorial. Unlicensed


Explainable Artificial Intelligence (XAI)


Projects that focus on explainability and model interpretability in atomistic ML.
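A common first step in interpreting a black-box property model is a perturbation test. You replace one input feature at a time with a baseline value and record how much the prediction moves. The sketch below is a minimal occlusion (feature-ablation) example. It is not the counterfactual approach of exmol or the method of any other project listed here. `toy_model`, its three features, and the baseline values are hypothetical stand-ins for a trained model and real descriptors.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]


def occlusion_attribution(model: Model, x: Sequence[float],
                          baseline: Sequence[float]) -> Dict[int, float]:
    """Score feature k by how much the prediction changes when only x[k]
    is replaced by baseline[k]; a positive score means the feature pushed
    the prediction up relative to the baseline."""
    reference = model(x)
    scores: Dict[int, float] = {}
    for k in range(len(x)):
        perturbed = list(x)
        perturbed[k] = baseline[k]
        scores[k] = reference - model(perturbed)
    return scores


# Hypothetical black-box model: a fixed formula standing in for a trained network
# that maps (ring count, polar group count, molecular mass) to some property.
def toy_model(features: Sequence[float]) -> float:
    rings, polar_groups, mass = features
    return 2.0 * rings - 0.5 * polar_groups + 0.01 * mass


scores = occlusion_attribution(toy_model, [3.0, 2.0, 180.0], [0.0, 0.0, 180.0])
print(scores)
```

Occlusion only measures one-feature-at-a-time effects. It misses interactions between features, which counterfactual or Shapley-style methods try to account for.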

exmol (🥇22 · ⭐ 330) - Explainer for black box models that predict molecule properties. MIT - [GitHub](https://github.com/ur-whitelab/exmol) (👨‍💻 9 · 🔀 44 · 📦 25 · 📋 72 - 8% open · ⏱️ 08.05.2025):
git clone https://github.com/ur-whitelab/exmol
- [PyPi](https://pypi.org/project/exmol) (📥 4.2K / month · 📦 3 · ⏱️ 08.05.2025):
pip install exmol
Show 3 hidden projects...
- MEGAN: Multi Explanation Graph Attention Student (🥈3 · ⭐ 10) - Minimal implementation of graph attention student model architecture. MIT rep-learn
- Linear vs blackbox (🥈3 · ⭐ 2 · 💀) - Code and data related to the publication: Interpretable models for extrapolation in scientific machine learning. MIT XAI single-paper rep-eng
- XElemNet (🥉2 · 💤) - Using explainable artificial intelligence (XAI) techniques to analyze ElemNet... Unlicensed rep-eng single-paper


Electronic structure methods (ML-ESM)


Projects and models that focus on quantities of electronic structure methods, which do not fit into either of the categories ML-WFT or ML-DFT.

DeePTB (🥇17 · ⭐ 75) - DeePTB: A deep learning package for tight-binding Hamiltonian with ab initio accuracy. LGPL-3.0 ML-DFT - [GitHub](https://github.com/deepmodeling/DeePTB) (👨‍💻 11 · 🔀 19 · 📦 3 · 📋 49 - 38% open · ⏱️ 08.05.2025):
git clone https://github.com/deepmodeling/DeePTB
- [PyPi](https://pypi.org/project/dptb) (📥 410 / month · 📦 2 · ⏱️ 07.05.2025):
pip install dptb
Show 5 hidden projects...
- QDF for molecule (🥈8 · ⭐ 220 · 💀) - Quantum deep field: data-driven wave function, electron density generation, and energy prediction and extrapolation.. MIT
- QMLearn (🥈5 · ⭐ 12 · 💀) - Quantum Machine Learning by learning one-body reduced density matrices in the AO basis... MIT
- q-pac (🥈5 · ⭐ 5 · 💀) - Kernel charge equilibration method. MIT electrostatics
- halex (🥈5 · ⭐ 3 · 💀) - Hamiltonian Learning for Excited States https://doi.org/10.48550/arXiv.2311.00844. Unlicensed excited-states
- e3psi (🥉3 · ⭐ 7 · 💀) - Equivariant machine learning library for learning from electronic structures. LGPL-3.0


General Tools


General tools for atomistic machine learning.
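Several of the general tools below, such as Matminer, MAML, and Magpie, turn a structure or composition into a fixed-length descriptor vector that a standard ML model can use. The sketch below computes a few composition descriptors similar in spirit to Magpie-style features: the fraction-weighted mean and the range of Pauling electronegativity. It assumes a flat chemical formula without parentheses. The function names and the small electronegativity table are illustrative only and do not reflect the API of any listed package.

```python
import re
from collections import defaultdict
from typing import Dict

# Pauling electronegativities for a handful of elements (illustrative subset only).
PAULING_EN: Dict[str, float] = {
    "H": 2.20, "Li": 0.98, "C": 2.55, "N": 3.04, "O": 3.44,
    "Na": 0.93, "Si": 1.90, "Cl": 3.16, "Fe": 1.83, "Co": 1.88,
}


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a flat formula such as 'LiCoO2' into element counts (no parentheses)."""
    counts: Dict[str, float] = defaultdict(float)
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] += float(amount) if amount else 1.0
    return dict(counts)


def composition_features(formula: str) -> Dict[str, float]:
    """Fraction-weighted mean and range of electronegativity, plus element count."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    values = [PAULING_EN[el] for el in fractions]
    mean_en = sum(frac * PAULING_EN[el] for el, frac in fractions.items())
    return {
        "mean_electronegativity": mean_en,
        "range_electronegativity": max(values) - min(values),
        "n_elements": float(len(fractions)),
    }


print(composition_features("LiCoO2"))
```

Real featurizers use many elemental properties and more statistics, such as minimum, maximum, and average deviation. The structure stays the same, though: one fixed-length vector per composition, whatever the number of elements.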

RDKit (🥇38 · ⭐ 2.9K) - BSD-3 C++ cheminformatics - [GitHub](https://github.com/rdkit/rdkit) (👨‍💻 250 · 🔀 890 · 📦 3 · 📋 4K - 16% open · ⏱️ 22.05.2025):
git clone https://github.com/rdkit/rdkit
- [PyPi](https://pypi.org/project/rdkit) (📥 1.2M / month · 📦 1K · ⏱️ 13.05.2025):
pip install rdkit
- [Conda](https://anaconda.org/rdkit/rdkit) (📥 2.6M · ⏱️ 25.03.2025):
conda install -c rdkit rdkit
DeepChem (🥇34 · ⭐ 6K) - Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology. MIT - [GitHub](https://github.com/deepchem/deepchem) (👨‍💻 260 · 🔀 1.9K · 📦 610 · 📋 2K - 38% open · ⏱️ 21.05.2025):
git clone https://github.com/deepchem/deepchem
- [PyPi](https://pypi.org/project/deepchem) (📥 39K / month · 📦 17 · ⏱️ 21.05.2025):
pip install deepchem
- [Conda](https://anaconda.org/conda-forge/deepchem) (📥 120K · ⏱️ 22.04.2025):
conda install -c conda-forge deepchem
- [Docker Hub](https://hub.docker.com/r/deepchemio/deepchem) (📥 8.6K · ⭐ 5 · ⏱️ 21.05.2025):
docker pull deepchemio/deepchem
Matminer (🥇27 · ⭐ 520 · 💤) - Data mining for materials science. Custom - [GitHub](https://github.com/hackingmaterials/matminer) (👨‍💻 56 · 🔀 200 · 📦 410 · 📋 230 - 13% open · ⏱️ 11.10.2024):
git clone https://github.com/hackingmaterials/matminer
- [PyPi](https://pypi.org/project/matminer) (📥 22K / month · 📦 60 · ⏱️ 06.10.2024):
pip install matminer
- [Conda](https://anaconda.org/conda-forge/matminer) (📥 89K · ⏱️ 22.04.2025):
conda install -c conda-forge matminer
MAML (🥈24 · ⭐ 410) - Python for Materials Machine Learning, Materials Descriptors, Machine Learning Force Fields, Deep Learning, etc. BSD-3 - [GitHub](https://github.com/materialsvirtuallab/maml) (👨‍💻 39 · 🔀 86 · 📦 16 · 📋 76 - 15% open · ⏱️ 05.05.2025):
git clone https://github.com/materialsvirtuallab/maml
- [PyPi](https://pypi.org/project/maml) (📥 550 / month · 📦 3 · ⏱️ 02.04.2025):
pip install maml
QUIP (🥈24 · ⭐ 370) - libAtoms/QUIP molecular dynamics framework: https://libatoms.github.io. GPL-2.0 MD ML-IAP rep-eng Fortran - [GitHub](https://github.com/libAtoms/QUIP) (👨‍💻 86 · 🔀 120 · 📥 750 · 📦 46 · 📋 480 - 23% open · ⏱️ 22.04.2025):
git clone https://github.com/libAtoms/QUIP
- [PyPi](https://pypi.org/project/quippy-ase) (📥 2.9K / month · 📦 4 · ⏱️ 15.01.2023):
pip install quippy-ase
- [Docker Hub](https://hub.docker.com/r/libatomsquip/quip) (📥 10K · ⭐ 4 · ⏱️ 24.04.2023):
docker pull libatomsquip/quip
JARVIS-Tools (🥈23 · ⭐ 340) - JARVIS-Tools: an open-source software package for data-driven atomistic materials design. Publications:.. Custom - [GitHub](https://github.com/usnistgov/jarvis) (👨‍💻 15 · 🔀 130 · 📦 120 · 📋 93 - 51% open · ⏱️ 20.11.2024):
git clone https://github.com/usnistgov/jarvis
- [PyPi](https://pypi.org/project/jarvis-tools) (📥 23K / month · 📦 31 · ⏱️ 20.11.2024):
pip install jarvis-tools
- [Conda](https://anaconda.org/conda-forge/jarvis-tools) (📥 100K · ⏱️ 22.04.2025):
conda install -c conda-forge jarvis-tools
Molfeat (🥈21 · ⭐ 210) - molfeat - the hub for all your molecular featurizers. Apache-2 cheminformatics rep-eng rep-learn generative language-models pretrained - [GitHub](https://github.com/datamol-io/molfeat) (👨‍💻 19 · 🔀 23 · 📦 69 · 📋 56 - 23% open · ⏱️ 27.11.2024):
git clone https://github.com/datamol-io/molfeat
- [PyPi](https://pypi.org/project/molfeat) (📥 3.3K / month · 📦 9 · ⏱️ 14.08.2024):
pip install molfeat
- [Conda](https://anaconda.org/conda-forge/molfeat) (📥 29K · ⏱️ 22.04.2025):
conda install -c conda-forge molfeat
AtomAI (🥈19 · ⭐ 210) - Deep and Machine Learning for Microscopy. MIT computer-vision USL experimental-data - [GitHub](https://github.com/pycroscopy/atomai) (👨‍💻 6 · 🔀 42 · 📦 10 · 📋 20 - 55% open · ⏱️ 21.05.2025):
git clone https://github.com/pycroscopy/atomai
- [PyPi](https://pypi.org/project/atomai) (📥 720 / month · 📦 1 · ⏱️ 30.09.2023):
pip install atomai
Scikit-Matter (🥈19 · ⭐ 82) - A collection of scikit-learn compatible utilities that implement methods born out of the materials science and.. BSD-3 scikit-learn - [GitHub](https://github.com/scikit-learn-contrib/scikit-matter) (👨‍💻 16 · 🔀 22 · 📦 11 · 📋 73 - 20% open · ⏱️ 05.05.2025):
git clone https://github.com/scikit-learn-contrib/scikit-matter
- [PyPi](https://pypi.org/project/skmatter) (📥 2.5K / month · ⏱️ 24.08.2023):
pip install skmatter
- [Conda](https://anaconda.org/conda-forge/skmatter) (📥 3.2K · ⏱️ 22.04.2025):
conda install -c conda-forge skmatter
MAST-ML (🥈18 · ⭐ 120) - MAterials Simulation Toolkit for Machine Learning (MAST-ML). MIT - [GitHub](https://github.com/uw-cmg/MAST-ML) (👨‍💻 19 · 🔀 61 · 📥 140 · 📦 47 · 📋 220 - 14% open · ⏱️ 15.04.2025):
git clone https://github.com/uw-cmg/MAST-ML
QML (🥉17 · ⭐ 200) - QML: Quantum Machine Learning. MIT - [GitHub](https://github.com/qmlcode/qml) (👨‍💻 10 · 🔀 84 · 📋 59 - 64% open · ⏱️ 08.12.2024):
git clone https://github.com/qmlcode/qml
- [PyPi](https://pypi.org/project/qml) (📥 290 / month · ⏱️ 13.08.2018):
pip install qml
MLatom (🥉15 · ⭐ 81) - AI-enhanced computational chemistry. MIT UIP ML-IAP MD ML-DFT ML-ESM transfer-learning active-learning spectroscopy structure-optimization - [GitHub](https://github.com/dralgroup/mlatom) (👨‍💻 5 · 🔀 13 · 📋 7 - 28% open · ⏱️ 21.05.2025):
git clone https://github.com/dralgroup/mlatom
- [PyPi](https://pypi.org/project/mlatom) (📥 2.1K / month · ⏱️ 21.05.2025):
pip install mlatom
Artificial Intelligence for Science (AIRS) (🥉14 · ⭐ 630) - Artificial Intelligence Research for Science (AIRS). GPL-3.0 rep-learn generative ML-IAP MD ML-DFT ML-WFT biomolecules - [GitHub](https://github.com/divelab/AIRS) (👨‍💻 31 · 🔀 72 · 📋 25 - 4% open · ⏱️ 01.05.2025):
git clone https://github.com/divelab/AIRS
Show 11 hidden projects...
- Automatminer (🥉17 · ⭐ 150 · 💀) - An automatic engine for predicting materials properties. Custom autoML
- XenonPy (🥉15 · ⭐ 140 · 💀) - XenonPy is a Python software package for Materials Informatics. BSD-3
- AMPtorch (🥉11 · ⭐ 60 · 💀) - AMPtorch: Atomistic Machine Learning Package (AMP) - PyTorch. GPL-3.0
- OpenChem (🥉10 · ⭐ 710 · 💀) - OpenChem: Deep Learning toolkit for Computational Chemistry and Drug Design Research. MIT
- JAXChem (🥉7 · ⭐ 79 · 💀) - JAXChem is a JAX-based deep learning library for complex and versatile chemical modeling. MIT
- uncertainty_benchmarking (🥉7 · ⭐ 41 · 💀) - Various code/notebooks to benchmark different ways we could estimate uncertainty in ML predictions. Unlicensed benchmarking probabilistic
- torchchem (🥉7 · ⭐ 35 · 💀) - An experimental repo for experimenting with PyTorch models. MIT
- Equisolve (🥉6 · ⭐ 5 · 💀) - An ML toolkit package utilizing the metatensor data format to build models for the prediction of equivariant properties.. BSD-3 ML-IAP
- quantum-structure-ml (🥉3 · ⭐ 3 · 💀) - Multi-class classification model for predicting the magnetic order of magnetic structures and a binary classification.. Unlicensed magnetism benchmarking
- ACEatoms (🥉3 · ⭐ 2 · 💀) - Generic code for modelling atomic properties using ACE. Custom Julia
- Magpie (🥉3) - Materials Agnostic Platform for Informatics and Exploration (Magpie). MIT Java


Generative Models


Projects that implement generative models for atomistic ML.

GT4SD (🥇17 · ⭐ 350) - GT4SD, an open-source library to accelerate hypothesis generation in the scientific discovery process. MIT pretrained drug-discovery rep-learn - [GitHub](https://github.com/GT4SD/gt4sd-core) (👨‍💻 20 · 🔀 74 · 📋 120 - 11% open · ⏱️ 19.02.2025):
git clone https://github.com/GT4SD/gt4sd-core
- [PyPi](https://pypi.org/project/gt4sd) (📥 1.2K / month · ⏱️ 19.02.2025):
pip install gt4sd
synspace (🥇15 · ⭐ 41) - Synthesis generative model. MIT - [GitHub](https://github.com/whitead/synspace) (👨‍💻 2 · 🔀 4 · 📦 34 · 📋 4 - 50% open · ⏱️ 24.04.2025):
git clone https://github.com/whitead/synspace
- [PyPi](https://pypi.org/project/synspace) (📥 3.8K / month · 📦 4 · ⏱️ 24.04.2025):
pip install synspace
SLICES and MatterGPT (🥈14 · ⭐ 110) - SLICES: An Invertible, Invariant, and String-based Crystal Representation [2023, Nature Communications] MatterGPT,.. LGPL-2.1 rep-eng language-models transformer materials-discovery structure-prediction - [GitHub](https://github.com/xiaohang007/SLICES) (👨‍💻 1 · 🔀 42 · 📦 5 · 📋 17 - 23% open · ⏱️ 26.03.2025):
git clone https://github.com/xiaohang007/SLICES
- [PyPi](https://pypi.org/project/slices) (📥 290 / month · 📦 1 · ⏱️ 01.03.2025):
pip install slices
- [Docker Hub](https://hub.docker.com/r/xiaohang07/slices) (📥 570 · ⭐ 1 · ⏱️ 01.03.2025):
docker pull xiaohang07/slices
PMTransformer (🥈14 · ⭐ 100 · 💤) - Universal Transfer Learning in Porous Materials, including MOFs. MIT transfer-learning pretrained transformer - [GitHub](https://github.com/hspark1212/MOFTransformer) (👨‍💻 2 · 🔀 15 · 📦 8 · ⏱️ 20.06.2024):
git clone https://github.com/hspark1212/MOFTransformer
- [PyPi](https://pypi.org/project/moftransformer) (📥 390 / month · 📦 1 · ⏱️ 20.06.2024):
pip install moftransformer
SchNetPack G-SchNet (🥈11 · ⭐ 58) - G-SchNet extension for SchNetPack. MIT - [GitHub](https://github.com/atomistic-machine-learning/schnetpack-gschnet) (👨‍💻 3 · 🔀 10 · ⏱️ 07.11.2024):
git clone https://github.com/atomistic-machine-learning/schnetpack-gschnet
SiMGen (🥈11 · ⭐ 20) - Zero Shot Molecular Generation via Similarity Kernels. MIT viz - [GitHub](https://github.com/RokasEl/simgen) (👨‍💻 4 · 🔀 3 · 📦 2 · 📋 4 - 25% open · ⏱️ 27.04.2025):
git clone https://github.com/RokasEl/simgen
- [PyPi](https://pypi.org/project/simgen) (📥 52 / month · ⏱️ 13.12.2024):
pip install simgen
Show 11 hidden projects...
- MoLeR (🥇15 · ⭐ 300 · 💀) - Implementation of MoLeR: a generative model of molecular graphs which supports scaffold-constrained generation. MIT
- EDM (🥉9 · ⭐ 500 · 💀) - E(3) Equivariant Diffusion Model for Molecule Generation in 3D. MIT
- G-SchNet (🥉8 · ⭐ 140 · 💀) - G-SchNet - a generative model for 3D molecular structures. MIT
- bVAE-IM (🥉8 · ⭐ 12 · 💀) - Implementation of Chemical Design with GPU-based Ising Machine. MIT QML single-paper
- molecular-vae (🥉7 · ⭐ 65 · 💀) - PyTorch implementation of the paper Automatic Chemical Design Using a Data-Driven Continuous Representation of.. MIT rep-learn cheminformatics single-paper
- cG-SchNet (🥉7 · ⭐ 61 · 💀) - cG-SchNet - a conditional generative neural network for 3D molecular structures. MIT
- rxngenerator (🥉6 · ⭐ 12 · 💀) - A generative model for molecular generation via multi-step chemical reactions. MIT
- COATI (🥉5 · ⭐ 110 · 💀) - COATI: multi-modal contrastive pre-training for representing and traversing chemical space. Apache-2 drug-discovery multimodal pretrained rep-learn
- MolSLEPA (🥉5 · ⭐ 5 · 💀) - Interpretable Fragment-based Molecule Design with Self-learning Entropic Population Annealing. MIT XAI
- Mapping out phase diagrams with generative classifiers (🥉4 · ⭐ 8 · 💀) - Repository for our "Mapping out phase diagrams with generative models" paper. MIT phase-transition
- descriptors-inversion (🥉4 · ⭐ 6 · 💀) - Local inversion of the chemical environment representations. MIT rep-eng single-paper


Interatomic Potentials (ML-IAP)


Machine learning interatomic potentials (aka ML-IAP, MLIAP, MLIP, MLP) and force fields (ML-FF) for molecular dynamics.
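Many ML-IAPs write the total energy as a sum of per-atom contributions. Each contribution is a learned function of the atom's local environment within a cutoff radius, and the forces that molecular dynamics needs are the negative gradient of that energy.

The sketch below illustrates this with a Behler-Parrinello-style radial (G2) symmetry function and a cosine cutoff. Two simplifications apply:
- A small quadratic polynomial stands in for the per-atom neural network.
- Forces are taken by central finite differences, not the analytic or automatic differentiation that real packages use.

The parameters `eta`, `rs`, `rc`, the weights, and all function names are hypothetical and unrelated to the API of any package listed below.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def cosine_cutoff(r: float, rc: float) -> float:
    """Smoothly damps neighbour contributions to zero at the cutoff radius rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0


def radial_g2(positions: Sequence[Vec], i: int, eta: float, rs: float, rc: float) -> float:
    """Behler-Parrinello-style G2 radial symmetry function for atom i."""
    g = 0.0
    for j, pos_j in enumerate(positions):
        if j != i:
            r = math.dist(positions[i], pos_j)
            g += math.exp(-eta * (r - rs) ** 2) * cosine_cutoff(r, rc)
    return g


def total_energy(positions: Sequence[Vec], weights: Tuple[float, float], bias: float,
                 eta: float = 0.5, rs: float = 1.0, rc: float = 4.0) -> float:
    """E = sum_i E_i(G_i); a quadratic polynomial stands in for a per-atom network."""
    energy = 0.0
    for i in range(len(positions)):
        g = radial_g2(positions, i, eta, rs, rc)
        energy += weights[0] * g + weights[1] * g * g + bias
    return energy


def forces(positions: Sequence[Vec], weights: Tuple[float, float], bias: float,
           h: float = 1e-5) -> List[Vec]:
    """F_a = -dE/dR_a by central finite differences (illustration only)."""
    result: List[Vec] = []
    for a in range(len(positions)):
        f = [0.0, 0.0, 0.0]
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[a][k] += h
            minus[a][k] -= h
            e_plus = total_energy([tuple(p) for p in plus], weights, bias)
            e_minus = total_energy([tuple(p) for p in minus], weights, bias)
            f[k] = -(e_plus - e_minus) / (2.0 * h)
        result.append((f[0], f[1], f[2]))
    return result


atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.3, 0.2)]
w = (-1.0, 0.2)
print("energy:", total_energy(atoms, w, bias=0.0))
print("forces:", forces(atoms, w, bias=0.0))
```

The energy depends only on interatomic distances, so it is invariant to translations and to relabelling of atoms. The finite-difference forces should therefore sum to approximately zero.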

fairchem (🥇28 · ⭐ 1.4K · 📈) - FAIR Chemistry's library of machine learning methods for chemistry. Formerly known as Open Catalyst Project. MIT pretrained UIP rep-learn catalysis - [GitHub](https://github.com/facebookresearch/fairchem) (👨‍💻 50 · 🔀 320 · 📋 340 - 4% open · ⏱️ 21.05.2025):
git clone https://github.com/facebookresearch/fairchem
- [PyPi](https://pypi.org/project/fairchem-core) (📥 5.9K / month · 📦 10 · ⏱️ 21.05.2025):
pip install fairchem-core
NequIP (🥇27 · ⭐ 730) - NequIP is a code for building E(3)-equivariant interatomic potentials. MIT - [GitHub](https://github.com/mir-group/nequip) (👨‍💻 18 · 🔀 160 · 📦 38 · 📋 100 - 7% open · ⏱️ 07.05.2025):
git clone https://github.com/mir-group/nequip
- [PyPi](https://pypi.org/project/nequip) (📥 5.6K / month · 📦 13 · ⏱️ 07.05.2025):
pip install nequip
- [Conda](https://anaconda.org/conda-forge/nequip) (📥 9.7K · ⏱️ 07.05.2025):
conda install -c conda-forge nequip
DeePMD-kit (🥇26 · ⭐ 1.7K) - A deep learning package for many-body potential energy representation and molecular dynamics. LGPL-3.0 MD workflows C++ - [GitHub](https://github.com/deepmodeling/deepmd-kit) (👨‍💻 75 · 🔀 540 · 📥 52K · 📦 33 · 📋 900 - 10% open · ⏱️ 02.03.2025):
git clone https://github.com/deepmodeling/deepmd-kit
- [PyPi](https://pypi.org/project/deepmd-kit) (📥 3.9K / month · 📦 9 · ⏱️ 30.03.2025):
pip install deepmd-kit
- [Conda](https://anaconda.org/deepmodeling/deepmd-kit) (📥 2.4K · ⏱️ 25.03.2025):
conda install -c deepmodeling deepmd-kit
- [Docker Hub](https://hub.docker.com/r/deepmodeling/deepmd-kit) (📥 3.6K · ⭐ 1 · ⏱️ 05.03.2025):
docker pull deepmodeling/deepmd-kit
MACE (🥇23 · ⭐ 720) - MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing. MIT - [GitHub](https://github.com/ACEsuit/mace) (👨‍💻 56 · 🔀 270 · 📋 430 - 22% open · ⏱️ 30.04.2025):
git clone https://github.com/ACEsuit/mace
TorchMD-NET (🥇22 · ⭐ 400) - Training neural network potentials. MIT MD rep-learn transformer pretrained - [GitHub](https://github.com/torchmd/torchmd-net) (👨‍💻 17 · 🔀 84 · 📥 110 · 📋 130 - 33% open · ⏱️ 05.05.2025):
git clone https://github.com/torchmd/torchmd-net
- [Conda](https://anaconda.org/conda-forge/torchmd-net) (📥 460K · ⏱️ 06.05.2025):
conda install -c conda-forge torchmd-net
MatCalc (🥇22 · ⭐ 88) - A Python library for calculating materials properties from the PES. BSD-3 workflows benchmarking UIP pretrained model-repository - [GitHub](https://github.com/materialsvirtuallab/matcalc) (👨‍💻 17 · 🔀 21 · 📦 10 · 📋 11 - 9% open · ⏱️ 19.05.2025):
git clone https://github.com/materialsvirtuallab/matcalc
- [PyPi](https://pypi.org/project/matcalc) (📥 6K / month · 📦 6 · ⏱️ 10.05.2025):
pip install matcalc
KLIFF (🥈21 · ⭐ 36) - KIM-based Learning-Integrated Fitting Framework for interatomic potentials. LGPL-2.1 probabilistic workflows - [GitHub](https://github.com/openkim/kliff) (👨‍💻 14 · 🔀 22 · 📦 4 · 📋 57 - 42% open · ⏱️ 22.05.2025):
git clone https://github.com/openkim/kliff
- [PyPi](https://pypi.org/project/kliff) (⏱️ 11.04.2025):
pip install kliff
- [Conda](https://anaconda.org/conda-forge/kliff) (📥 160K · ⏱️ 22.04.2025):
conda install -c conda-forge kliff
janus-core (🥈21 · ⭐ 28) - Tools for machine learnt interatomic potentials. BSD-3 benchmarking workflows structure-optimization MD transport-phenomena - [GitHub](https://github.com/stfc/janus-core) (👨‍💻 8 · 🔀 12 · 📥 140 · 📦 10 · 📋 250 - 16% open · ⏱️ 20.05.2025):
git clone https://github.com/stfc/janus-core
- [PyPi](https://pypi.org/project/janus-core) (📥 650 / month · 📦 3 · ⏱️ 20.05.2025):
pip install janus-core
Allegro (🥈20 · ⭐ 400) - Allegro is an open-source code for building highly scalable and accurate equivariant deep learning interatomic.. MIT - [GitHub](https://github.com/mir-group/allegro) (👨‍💻 7 · 🔀 55 · 📋 43 - 9% open · ⏱️ 16.05.2025):
git clone https://github.com/mir-group/allegro
Metatrain (🥈19 · ⭐ 32) - Training and evaluating machine learning models for atomistic systems. BSD-3 workflows benchmarking rep-eng rep-learn - [GitHub](https://github.com/metatensor/metatrain) (👨‍💻 16 · 🔀 8 · 📥 5 · 📦 6 · 📋 180 - 30% open · ⏱️ 21.05.2025):
git clone https://github.com/metatensor/metatrain
- [PyPi](https://pypi.org/project/metatrain) (📥 4.3K / month · ⏱️ 28.04.2025):
pip install metatrain
apax (🥈19 · ⭐ 19) - A flexible and performant framework for training machine learning potentials. MIT - [GitHub](https://github.com/apax-hub/apax) (👨‍💻 9 · 🔀 3 · 📦 4 · 📋 140 - 11% open · ⏱️ 21.05.2025):
git clone https://github.com/apax-hub/apax
- [PyPi](https://pypi.org/project/apax) (📥 260 / month · ⏱️ 21.01.2025):
pip install apax
Graph-PES (🥈18 · ⭐ 91) - Train and use graph-based ML models of potential energy surfaces. MIT rep-learn UIP MD pretrained - [GitHub](https://github.com/jla-gardner/graph-pes) (👨‍💻 3 · 🔀 4 · 📦 2 · 📋 15 - 13% open · ⏱️ 01.05.2025):
git clone https://github.com/jla-gardner/graph-pes
- [PyPi](https://pypi.org/project/graph-pes) (📥 3.5K / month · 📦 1 · ⏱️ 01.05.2025):
pip install graph-pes
Autoplex (🥈17 · ⭐ 76) - Code for automated fitting of machine learned interatomic potentials. GPL-3.0 benchmarking workflows - [GitHub](https://github.com/autoatml/autoplex) (👨‍💻 12 · 🔀 13 · 📦 2 · 📋 120 - 28% open · ⏱️ 13.05.2025):
git clone https://github.com/autoatml/autoplex
- [PyPi](https://pypi.org/project/autoplex) (📥 360 / month · ⏱️ 28.04.2025):
pip install autoplex
Neural Force Field (🥈16 · ⭐ 270) - Neural Network Force Field based on PyTorch. MIT pretrained - [GitHub](https://github.com/learningmatter-mit/NeuralForceField) (👨‍💻 45 · 🔀 55 · 📋 22 - 18% open · ⏱️ 01.05.2025):
git clone https://github.com/learningmatter-mit/NeuralForceField
n2p2 (🥈14 · ⭐ 240) - n2p2 - A Neural Network Potential Package. GPL-3.0 C++ - [GitHub](https://github.com/CompPhysVienna/n2p2) (👨‍💻 13 · 🔀 81 · 📋 150 - 44% open · ⏱️ 17.03.2025):
git clone https://github.com/CompPhysVienna/n2p2
NNPOps (🥈14 · ⭐ 93) - High-performance operations for neural network potentials. MIT MD C++ - [GitHub](https://github.com/openmm/NNPOps) (👨‍💻 10 · 🔀 20 · 📋 57 - 38% open · ⏱️ 28.02.2025):
git clone https://github.com/openmm/NNPOps
- [Conda](https://anaconda.org/conda-forge/nnpops) (📥 470K · ⏱️ 22.04.2025):
conda install -c conda-forge nnpops
Ultra-Fast Force Fields (UF3) (🥈14 · ⭐ 66 · 💤) - UF3: a python library for generating ultra-fast interatomic potentials. Apache-2 - [GitHub](https://github.com/uf3/uf3) (👨‍💻 10 · 🔀 25 · 📦 2 · 📋 51 - 37% open · ⏱️ 04.10.2024):
git clone https://github.com/uf3/uf3
- [PyPi](https://pypi.org/project/uf3) (📥 45 / month · ⏱️ 27.10.2023):
pip install uf3
So3krates (MLFF) (🥈13 · ⭐ 120 · 💤) - Build neural networks for machine learning force fields with JAX. MIT - [GitHub](https://github.com/thorben-frank/mlff) (👨‍💻 4 · 🔀 28 · 📋 13 - 46% open · ⏱️ 23.08.2024):
git clone https://github.com/thorben-frank/mlff
MLIPX - Machine-Learned Interatomic Potential eXploration (🥈13 · ⭐ 81 · 📉) - Machine-Learned Interatomic Potential eXploration (mlipx) is designed at BASF for evaluating machine-learned.. MIT benchmarking viz workflows - [GitHub](https://github.com/basf/mlipx) (👨‍💻 4 · 🔀 7 · 📦 2 · 📋 9 - 33% open · ⏱️ 22.05.2025):
git clone https://github.com/basf/mlipx
- [PyPi](https://pypi.org/project/mlipx) (📥 92 / month · ⏱️ 11.04.2025):
pip install mlipx
PiNN (🥈12 · ⭐ 110) - A Python library for building atomic neural networks. BSD-3 - [GitHub](https://github.com/Teoroo-CMC/PiNN) (👨‍💻 6 · 🔀 35 · 📋 7 - 14% open · ⏱️ 17.02.2025):
git clone https://github.com/Teoroo-CMC/PiNN
- [Docker Hub](https://hub.docker.com/r/teoroo/pinn) (📥 450 · ⏱️ 17.02.2025):
docker pull teoroo/pinn
Pacemaker (🥈12 · ⭐ 83) - Python package for fitting atomic cluster expansion (ACE) potentials. Custom - [GitHub](https://github.com/ICAMS/python-ace) (👨‍💻 7 · 🔀 21 · 📋 59 - 33% open · ⏱️ 20.11.2024):
git clone https://github.com/ICAMS/python-ace
- [PyPi](https://pypi.org/project/python-ace) (📥 12 / month · ⏱️ 24.10.2022):
pip install python-ace
wfl (🥈12 · ⭐ 39) - Workflow is a Python toolkit for building interatomic potential creation and atomistic simulation workflows. GPL-2.0 workflows HTC - [GitHub](https://github.com/libAtoms/workflow) (👨‍💻 19 · 🔀 19 · 📦 2 · 📋 160 - 41% open · ⏱️ 21.02.2025):
git clone https://github.com/libAtoms/workflow
calorine (🥈11 · ⭐ 14 · 💤) - A Python package for constructing and sampling neuroevolution potential models. https://doi.org/10.21105/joss.06264. Custom - [PyPi](https://pypi.org/project/calorine) (📥 1.6K / month · 📦 4 · ⏱️ 25.10.2024):
pip install calorine
- [GitLab](https://gitlab.com/materials-modeling/calorine) (🔀 4 · 📋 95 - 6% open · ⏱️ 25.10.2024):
git clone https://gitlab.com/materials-modeling/calorine
DeepMD-GNN (🥉10 · ⭐ 45) - DeePMD-kit plugin for various graph neural network models. LGPL-3.0 rep-learn MD UIP C++ - [GitHub](https://github.com/deepmodeling/deepmd-gnn) (👨‍💻 5 · 🔀 7 · 📋 6 - 83% open · ⏱️ 29.04.2025):
git clone https://github.com/deepmodeling/deepmd-gnn
Point Edge Transformer (PET) (🥉10 · ⭐ 27) - Point Edge Transformer. MIT rep-learn transformer - [GitHub](https://github.com/spozdn/pet) (👨‍💻 9 · 🔀 6 · ⏱️ 18.03.2025):
git clone https://github.com/spozdn/pet
ACE1.jl (🥉10 · ⭐ 22) - Atomic Cluster Expansion for Modelling Invariant Atomic Properties. Custom Julia - [GitHub](https://github.com/ACEsuit/ACE1.jl) (👨‍💻 9 · 🔀 7 · 📋 46 - 47% open · ⏱️ 15.04.2025):
git clone https://github.com/ACEsuit/ACE1.jl
tinker-hp (🥉9 · ⭐ 87 · 💤) - Tinker-HP: High-Performance Massively Parallel Evolution of Tinker on CPUs & GPUs. Custom - [GitHub](https://github.com/TinkerTools/tinker-hp) (👨‍💻 12 · 🔀 22 · 📋 25 - 20% open · ⏱️ 26.10.2024):
git clone https://github.com/TinkerTools/tinker-hp
ACE.jl (🥉9 · ⭐ 65) - Parameterisation of Equivariant Properties of Particle Systems. Custom Julia - [GitHub](https://github.com/ACEsuit/ACE.jl) (👨‍💻 12 · 🔀 15 · 📋 82 - 29% open · ⏱️ 17.12.2024):
git clone https://github.com/ACEsuit/ACE.jl
ALF (🥉9 · ⭐ 35) - A framework for performing active learning for training machine-learned interatomic potentials. Custom active-learning - [GitHub](https://github.com/lanl/ALF) (👨‍💻 5 · 🔀 12 · ⏱️ 28.03.2025):
git clone https://github.com/lanl/alf
ACEfit (🥉9 · ⭐ 7 · 💤) - MIT Julia - [GitHub](https://github.com/ACEsuit/ACEfit.jl) (👨‍💻 8 · 🔀 8 · 📋 57 - 38% open · ⏱️ 14.09.2024):
git clone https://github.com/ACEsuit/ACEfit.jl
EquiformerV2 (🥉8 · ⭐ 270) - [ICLR 2024] EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations. MIT pretrained UIP rep-learn - [GitHub](https://github.com/atomicarchitects/equiformer_v2) (👨‍💻 2 · 🔀 35 · 📋 24 - 66% open · ⏱️ 11.02.2025):
git clone https://github.com/atomicarchitects/equiformer_v2
PyNEP (🥉8 · ⭐ 54) - A Python interface to the NEP machine learning potential used in GPUMD. MIT - [GitHub](https://github.com/bigd4/PyNEP) (👨‍💻 9 · 🔀 17 · 📋 13 - 38% open · ⏱️ 15.12.2024):
git clone https://github.com/bigd4/PyNEP
GAP (🥉8 · ⭐ 41) - Gaussian Approximation Potential (GAP). Custom - [GitHub](https://github.com/libAtoms/GAP) (👨‍💻 13 · 🔀 20 · ⏱️ 22.04.2025):
git clone https://github.com/libAtoms/GAP
TurboGAP (🥉8 · ⭐ 17) - The TurboGAP code. Custom Fortran - [GitHub](https://github.com/mcaroba/turbogap) (👨‍💻 8 · 🔀 11 · 📋 11 - 63% open · ⏱️ 17.12.2024):
git clone https://github.com/mcaroba/turbogap
Asparagus (🥉8 · ⭐ 11) - Program Package for Sampling, Training and Applying ML-based Potential models https://doi.org/10.48550/arXiv.2407.15175. MIT workflows sampling MD - [GitHub](https://github.com/MMunibas/Asparagus) (👨‍💻 9 · 🔀 4 · ⏱️ 09.04.2025):
git clone https://github.com/MMunibas/Asparagus
MLXDM (🥉6 · ⭐ 8) - A Neural Network Potential with Rigorous Treatment of Long-Range Dispersion https://doi.org/10.1039/D2DD00150K. MIT long-range - [GitHub](https://github.com/RowleyGroup/MLXDM) (👨‍💻 7 · 🔀 2 · ⏱️ 12.03.2025):
git clone https://github.com/RowleyGroup/MLXDM
MatML (🥉6 · ⭐ 6 · 🐣) - Full MatML Docker image, including MatGL, MatCalc, MatPES and LAMMPS with ML-GNNP and ML-SNAP. BSD-3 MD UIP rep-learn pretrained - [GitHub](https://github.com/materialsvirtuallab/matml) (👨‍💻 2 · ⏱️ 05.05.2025):
git clone https://github.com/materialsvirtuallab/matml
- [Docker Hub](https://hub.docker.com/r/materialsvirtuallab/matml) (📥 94 · ⭐ 1 · ⏱️ 08.04.2025):
docker pull materialsvirtuallab/matml
TensorPotential (🥉5 · ⭐ 10 · 💤) - Tensorpotential is a TensorFlow based tool for development, fitting ML interatomic potentials from electronic.. Custom - [GitHub](https://github.com/ICAMS/TensorPotential) (👨‍💻 4 · 🔀 5 · ⏱️ 12.09.2024):
git clone https://github.com/ICAMS/TensorPotential
Hidden projects (40):
- TorchANI (🥇24 · ⭐ 500 · 💀) - Accurate Neural Network Potential on PyTorch. MIT
- MEGNet (🥇22 · ⭐ 530 · 💀) - Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. BSD-3 multifidelity
- sGDML (🥈16 · ⭐ 150 · 💀) - sGDML - Reference implementation of the Symmetric Gradient Domain Machine Learning model. MIT
- PyXtalFF (🥈15 · ⭐ 90 · 💀) - Machine Learning Interatomic Potential Predictions. MIT
- TensorMol (🥈12 · ⭐ 270 · 💀) - Tensorflow + Molecules = TensorMol. GPL-3.0 single-paper
- ANI-1 (🥈12 · ⭐ 220 · 💀) - ANI-1 neural net potential with python interface (ASE). MIT
- SIMPLE-NN (🥈11 · ⭐ 47 · 💀) - SIMPLE-NN(SNU Interatomic Machine-learning PotentiaL packagE version Neural Network). GPL-3.0
- CCS_fit (🥈11 · ⭐ 9 · 💀) - Curvature Constrained Splines. GPL-3.0
- DimeNet (🥉9 · ⭐ 320 · 💀) - DimeNet and DimeNet++ models, as proposed in Directional Message Passing for Molecular Graphs (ICLR 2020) and Fast and.. Custom
- SchNet (🥉9 · ⭐ 250 · 💀) - SchNet - a deep learning architecture for quantum chemistry. MIT
- GemNet (🥉9 · ⭐ 200 · 💀) - GemNet model in PyTorch, as proposed in GemNet: Universal Directional Graph Neural Networks for Molecules (NeurIPS.. Custom
- MACE-Jax (🥉9 · ⭐ 72 · 💀) - Equivariant machine learning interatomic potentials in JAX. MIT
- NNsforMD (🥉9 · ⭐ 10 · 💀) - Neural network class for molecular dynamics to predict potential energy, forces and non-adiabatic couplings. MIT
- aiida-mlip (🥉9 · ⭐ 1) - machine learning interatomic potentials aiida plugin. BSD-3 workflows structure-optimization MD
- AIMNet (🥉8 · ⭐ 100 · 💀) - Atoms In Molecules Neural Network Potential. MIT single-paper
- SIMPLE-NN v2 (🥉8 · ⭐ 42 · 💀) - SIMPLE-NN is an open package that constructs Behler-Parrinello-type neural-network interatomic potentials from ab.. GPL-3.0
- Atomistic Adversarial Attacks (🥉8 · ⭐ 37 · 💀) - Code for performing adversarial attacks on atomistic systems using NN potentials. MIT probabilistic
- SNAP (🥉8 · ⭐ 36 · 💀) - Repository for spectral neighbor analysis potential (SNAP) model development. BSD-3
- MEGNetSparse (🥉8 · ⭐ 4 · 💤) - A library implementing a graph neural network with sparse representation from Code for Kazeev, N., Al-Maeeni, A.R.,.. MIT material-defect
- PhysNet (🥉7 · ⭐ 100 · 💀) - Code for training PhysNet models. MIT electrostatics
- BPNET (🥉7 · ⭐ 3 · 🐣) - Fast Behler-Parrinello type neural networks in Fortran2008. MIT rep-eng Fortran
- MLIP-3 (🥉6 · ⭐ 23 · 💀) - MLIP-3: Active learning on atomic environments with Moment Tensor Potentials (MTP). BSD-2 C++
- testing-framework (🥉6 · ⭐ 11 · 💀) - The purpose of this repository is to aid the testing of a large number of interatomic potentials for a variety of.. Unlicensed benchmarking
- PANNA (🥉6 · ⭐ 10 · 💀) - A package to train and validate all-to-all connected network models for BP[1] and modified-BP[2] type local atomic.. MIT benchmarking
- NequIP-JAX (🥉5 · ⭐ 23 · 💀) - JAX implementation of the NequIP interatomic potential. Unlicensed
- GN-MM (🥉5 · ⭐ 11 · 💀) - The Gaussian Moment Neural Network (GM-NN) package developed for large-scale atomistic simulations employing atomistic.. MIT active-learning MD rep-eng magnetism
- Alchemical learning (🥉5 · ⭐ 2 · 💀) - Code for the Modeling high-entropy transition metal alloys with alchemical compression article. BSD-3 rep-eng Defects & Disorder
- ACE1Pack.jl (🥉5 · ⭐ 1 · 💀) - Provides convenience functionality for the usage of ACE1.jl, ACEfit.jl, JuLIP.jl for fitting interatomic potentials.. MIT Julia
- glp (🥉4 · ⭐ 24 · 💀) - tools for graph-based machine-learning potentials in jax. MIT
- Allegro-Legato (🥉4 · ⭐ 20 · 💀) - An extension of Allegro with enhanced robustness and time-to-failure. MIT MD
- ACE Workflows (🥉4 · 💀) - Workflow Examples for ACE Models. Unlicensed Julia workflows
- PeriodicPotentials (🥉4 · 💀) - A Periodic table app that displays potentials based on the selected elements. MIT community-resource viz JavaScript
- Allegro-JAX (🥉3 · ⭐ 21) - JAX implementation of the Allegro interatomic potential. MIT
- PyFLAME (🥉3 · 💀) - An automated approach for developing neural network interatomic potentials with FLAME.. Unlicensed active-learning structure-prediction structure-optimization rep-eng Fortran
- SingleNN (🥉2 · ⭐ 9 · 💀) - An efficient package for training and executing neural-network interatomic potentials. Unlicensed C++
- mag-ace (🥉2 · ⭐ 3) - Magnetic ACE potential. FORTRAN interface for LAMMPS SPIN package. Unlicensed magnetism MD Fortran
- AisNet (🥉2 · ⭐ 3 · 💀) - A Universal Interatomic Potential Neural Network with Encoded Local Environment Features.. MIT
- RuNNer (🥉2) - The RuNNer Neural Network Energy Representation is a Fortran-based framework for the construction of Behler-.. GPL-3.0 Fortran
- nnp-pre-training (🥉1 · ⭐ 6 · 💀) - Synthetic pre-training for neural-network interatomic potentials. Unlicensed pretrained MD
- mlp (🥉1 · ⭐ 1 · 💀) - Proper orthogonal descriptors for efficient and accurate interatomic potentials... Unlicensed Julia


Language Models


Projects that use (large) language models (LMs, LLMs) or natural language processing (NLP) techniques for atomistic ML.

🔗 MaCBench Leaderboard - Leaderboard for multimodal language models for chemistry & materials research. community-resource benchmarking datasets

ChemBench (🥇20 · ⭐ 79) - How good are LLMs at chemistry? MIT benchmarking multimodal - [GitHub](https://github.com/lamalab-org/chembench) (👨‍💻 13 · 🔀 9 · 📦 2 · 📋 320 - 14% open · ⏱️ 26.03.2025):
git clone https://github.com/lamalab-org/chembench
- [PyPi](https://pypi.org/project/chembench) (📥 5.1K / month · ⏱️ 27.02.2025):
pip install chembench
OpenBioML ChemNLP (🥇17 · ⭐ 160 · 💤) - ChemNLP project. MIT datasets - [GitHub](https://github.com/OpenBioML/chemnlp) (👨‍💻 27 · 🔀 45 · 📋 250 - 44% open · ⏱️ 19.08.2024):
git clone https://github.com/OpenBioML/chemnlp
- [PyPi](https://pypi.org/project/chemnlp) (📥 130 / month · 📦 1 · ⏱️ 07.08.2023):
pip install chemnlp
paper-qa (🥈16 · ⭐ 7.1K) - LLM Chain for answering questions from docs. Unlicensed ai-agent - [GitHub](https://github.com/whitead/paper-qa) (🔀 700):
git clone https://github.com/whitead/paper-qa
- [PyPi](https://pypi.org/project/paper-qa) (📥 7.1K / month · 📦 12 · ⏱️ 27.03.2025):
pip install paper-qa
ChemCrow (🥈16 · ⭐ 750) - Open source package for the accurate solution of reasoning-intensive chemical tasks. MIT ai-agent - [GitHub](https://github.com/ur-whitelab/chemcrow-public) (👨‍💻 3 · 🔀 110 · 📦 10 · 📋 23 - 39% open · ⏱️ 19.12.2024):
git clone https://github.com/ur-whitelab/chemcrow-public
- [PyPi](https://pypi.org/project/chemcrow) (📥 510 / month · ⏱️ 27.03.2024):
pip install chemcrow
AtomGPT (🥈15 · ⭐ 58) - AtomGPT & DiffractGPT : Generative Pretrained Transformer Models for Forward and Inverse Materials Design.. Custom generative pretrained transformer - [GitHub](https://github.com/usnistgov/atomgpt) (👨‍💻 6 · 🔀 13 · 📦 3 · ⏱️ 09.04.2025):
git clone https://github.com/usnistgov/atomgpt
- [PyPi](https://pypi.org/project/atomgpt) (📥 200 / month · 📦 1 · ⏱️ 22.03.2025):
pip install atomgpt
ChatMOF (🥈12 · ⭐ 81) - Predict and Inverse design for metal-organic framework with large-language models (llms). MIT generative - [GitHub](https://github.com/Yeonghun1675/ChatMOF) (👨‍💻 2 · 🔀 17 · 📦 3 · 📋 8 - 12% open · ⏱️ 15.05.2025):
git clone https://github.com/Yeonghun1675/ChatMOF
- [PyPi](https://pypi.org/project/chatmof) (📥 370 / month · ⏱️ 01.07.2024):
pip install chatmof
NIST ChemNLP (🥉11 · ⭐ 75 · 💤) - ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data. MIT literature-data - [GitHub](https://github.com/usnistgov/chemnlp) (👨‍💻 2 · 🔀 20 · 📦 4 · ⏱️ 19.08.2024):
git clone https://github.com/usnistgov/chemnlp
- [PyPi](https://pypi.org/project/chemnlp) (📥 130 / month · 📦 1 · ⏱️ 07.08.2023):
pip install chemnlp
LLaMP (🥉8 · ⭐ 82 · 💤) - A web app and Python API for multi-modal RAG framework to ground LLMs on high-fidelity materials informatics. An.. BSD-3 multimodal RAG materials-discovery pretrained JavaScript Python - [GitHub](https://github.com/chiang-yuan/llamp) (👨‍💻 6 · 🔀 13 · 📋 25 - 32% open · ⏱️ 14.10.2024):
git clone https://github.com/chiang-yuan/llamp
crystal-text-llm (🥉6 · ⭐ 100 · 💤) - Large language models to generate stable crystals. CC-BY-NC-4.0 materials-discovery - [GitHub](https://github.com/facebookresearch/crystal-text-llm) (👨‍💻 3 · 🔀 21 · 📋 14 - 85% open · ⏱️ 18.06.2024):
git clone https://github.com/facebookresearch/crystal-text-llm
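Language-model approaches to crystal generation serialize a structure as plain text so that a model can be fine-tuned on it and can generate new structures token by token. Below is a minimal sketch of one such encoding, written in the spirit of crystal-text-llm. It emits lattice lengths, lattice angles, and then each element symbol followed by its fractional coordinates. The exact layout, the rounding, and the helper names `encode_crystal`/`decode_crystal` are assumptions for illustration, not that repository's code.

```python
def encode_crystal(lengths, angles, sites):
    """Serialize a crystal as text (hypothetical format).

    lengths: (a, b, c) in Angstrom; angles: (alpha, beta, gamma) in degrees;
    sites: list of (element, (fx, fy, fz)) with fractional coordinates.
    """
    lines = [" ".join(f"{x:.1f}" for x in lengths),
             " ".join(f"{x:.0f}" for x in angles)]
    for element, frac in sites:
        lines.append(element)
        lines.append(" ".join(f"{x:.2f}" for x in frac))
    return "\n".join(lines)


def decode_crystal(text):
    """Parse text produced by encode_crystal back into numbers."""
    lines = text.strip().split("\n")
    lengths = tuple(float(x) for x in lines[0].split())
    angles = tuple(float(x) for x in lines[1].split())
    sites = [(element, tuple(float(x) for x in coords.split()))
             for element, coords in zip(lines[2::2], lines[3::2])]
    return lengths, angles, sites


# Two of the eight sites of the conventional rock-salt NaCl cell (a = 5.64 Angstrom).
text = encode_crystal((5.64, 5.64, 5.64), (90, 90, 90),
                      [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))])
print(text)
```

Rounding coordinates to a fixed number of decimals keeps the token count per structure small. The trade-off is a loss of precision, which generated structures would typically recover through a later relaxation step.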
LLM4Chem (🥉6 · ⭐ 87 · 💤) - Official code repo for the paper LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale,.. MIT cheminformatics datasets - [GitHub](https://github.com/OSU-NLP-Group/LLM4Chem) (👨‍💻 1 · 🔀 12 · ⏱️ 06.05.2024):
git clone https://github.com/OSU-NLP-Group/LLM4Chem
SciBot (🥉5 · ⭐ 31 · 💤) - SciBot is a simple demo of building a domain-specific chatbot for science. Unlicensed ai-agent - [GitHub](https://github.com/CFN-softbio/SciBot) (👨‍💻 1 · 🔀 10 · 📦 2 · ⏱️ 03.09.2024):
git clone https://github.com/CFN-softbio/SciBot
Cephalo (🥉5 · ⭐ 10 · 💤) - Multimodal Vision-Language Models for Bio-Inspired Materials Analysis and Design. Apache-2 generative multimodal pretrained - [GitHub](https://github.com/lamm-mit/Cephalo) (🔀 1 · ⏱️ 23.07.2024):
git clone https://github.com/lamm-mit/Cephalo
Hidden projects (12):
- ChemDataExtractor (🥇17 · ⭐ 320 · 💀) - Automatically extract chemical information from scientific documents. MIT literature-data
- mat2vec (🥈12 · ⭐ 630 · 💀) - Supplementary Materials for Tshitoyan et al. Unsupervised word embeddings capture latent knowledge from materials.. MIT rep-learn
- gptchem (🥈12 · ⭐ 250 · 💀) - Use GPT-3 to solve chemistry problems. MIT
- nlcc (🥈12 · ⭐ 45 · 💀) - Natural language computational chemistry command line interface. MIT single-paper
- MoLFormer (🥉10 · ⭐ 310 · 💀) - Repository for MolFormer. Apache-2 transformer pretrained drug-discovery
- MolSkill (🥉10 · ⭐ 110 · 💀) - Extracting medicinal chemistry intuition via preference machine learning. MIT drug-discovery recommender
- chemlift (🥉7 · ⭐ 41 · 💀) - Language-interfaced fine-tuning for chemistry. MIT
- LLM-Prop (🥉7 · ⭐ 36 · 💀) - A repository for the LLM-Prop implementation. MIT
- BERT-PSIE-TC (🥉6 · ⭐ 15 · 💀) - A dataset of Curie temperatures automatically extracted from scientific literature with the use of the BERT-PSIE.. MIT magnetism
- MAPI_LLM (🥉5 · ⭐ 9 · 💀) - An LLM application developed during the LLM March MADNESS Hackathon https://doi.org/10.1039/D3DD00113J. MIT ai-agent dataset
- CatBERTa (🥉4 · ⭐ 24 · 💀) - Large Language Model for Catalyst Property Prediction. Unlicensed transformer catalysis
- ChemDataWriter (🥉3 · ⭐ 14 · 💀) - ChemDataWriter is a transformer-based library for automatically generating research books in the chemistry area. MIT literature-data


Materials Discovery


Projects that implement materials discovery methods using atomistic ML.

SMACT (🥇26 · ⭐ 110) - Python package to aid materials design and informatics. MIT HTC structure-prediction electrostatics - [GitHub](https://github.com/WMD-group/SMACT) (👨‍💻 46 · 🔀 26 · 📦 54 · 📋 62 - 9% open · ⏱️ 07.04.2025):
git clone https://github.com/WMD-group/SMACT
- [PyPi](https://pypi.org/project/smact) (📥 4.6K / month · 📦 5 · ⏱️ 02.04.2025):
pip install smact
- [Conda](https://anaconda.org/conda-forge/smact) (📥 5K · ⏱️ 22.04.2025):
conda install -c conda-forge smact
MatterGen (🥇18 · ⭐ 1.4K · 🐣) - Official implementation of MatterGen -- a generative model for inorganic materials design across the periodic table.. MIT generative structure-prediction pretrained - [GitHub](https://github.com/microsoft/mattergen) (👨‍💻 8 · 🔀 210 · 📋 100 - 4% open · ⏱️ 17.04.2025):
git clone https://github.com/microsoft/mattergen
aviary (🥈14 · ⭐ 56) - The Wren sits on its Roost in the Aviary. MIT - [GitHub](https://github.com/CompRhys/aviary) (👨‍💻 6 · 🔀 13 · 📋 33 - 12% open · ⏱️ 19.04.2025):
git clone https://github.com/CompRhys/aviary
BOSS (🥈11 · ⭐ 22) - Bayesian Optimization Structure Search (BOSS). Apache-2 probabilistic - [PyPi](https://pypi.org/project/aalto-boss) (📥 690 / month · ⏱️ 13.11.2024):
pip install aalto-boss
- [GitLab](https://gitlab.com/cest-group/boss) (🔀 11 · 📋 32 - 6% open · ⏱️ 13.11.2024):
git clone https://gitlab.com/cest-group/boss
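Bayesian optimization tools such as BOSS fit a surrogate model (typically a Gaussian process) to expensive energy evaluations. They then use an acquisition function to choose which structure to compute next. Below is a minimal sketch of one widely used acquisition function, expected improvement for minimization. It assumes the surrogate's posterior mean and standard deviation at each candidate are already known, and the candidate values are hypothetical. BOSS's own default acquisition function may differ.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement below the current best value (minimization).

    mu, sigma: surrogate posterior mean and standard deviation at a candidate.
    best: lowest objective value observed so far.
    xi: optional exploration margin (0 gives plain expected improvement).
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


# Hypothetical surrogate predictions (mean, std) for three candidate structures.
candidates = {"A": (-1.0, 0.05), "B": (-0.8, 0.50), "C": (-0.2, 0.01)}
best_so_far = -0.9
scores = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in candidates.items()}
print(max(scores, key=scores.get))
```

Candidate B is selected (EI ≈ 0.153) over A (EI ≈ 0.100) even though A has the lower predicted energy. B's larger uncertainty gives it a greater expected gain, which is how the acquisition function balances exploration against exploitation.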
Materials Discovery: GNoME (🥈10 · ⭐ 990) - Graph Networks for Materials Science (GNoME) and dataset of 381,000 novel stable materials. Apache-2 UIP datasets rep-learn proprietary - [GitHub](https://github.com/google-deepmind/materials_discovery) (👨‍💻 2 · 🔀 160 · 📋 25 - 84% open · ⏱️ 03.03.2025):
git clone https://github.com/google-deepmind/materials_discovery
AGOX (🥉7 · ⭐ 14 · 💤) - AGOX is a package for global optimization of atomic systems using e.g. the energy calculated from density functional.. GPL-3.0 structure-optimization - [PyPi](https://pypi.org/project/agox) (📥 94 / month · ⏱️ 23.10.2024):
pip install agox
- [GitLab](https://gitlab.com/agox/agox) (🔀 7 · 📋 28 - 35% open · ⏱️ 23.10.2024):
git clone https://gitlab.com/agox/agox
CSPML (crystal structure prediction with machine learning-based element substitution) (🥉5 · ⭐ 24) - Original implementation of CSPML. MIT structure-prediction - [GitHub](https://github.com/Minoru938/CSPML) (👨‍💻 1 · 🔀 8 · 📋 3 - 66% open · ⏱️ 22.12.2024):
git clone https://github.com/minoru938/cspml
Hidden projects (6):
- Computational Autonomy for Materials Discovery (CAMD) (🥉7 · ⭐ 1 · 💀) - Agent-based sequential learning software for materials discovery. Apache-2
- MAGUS (🥉4 · ⭐ 73 · 💀) - Machine learning And Graph theory assisted Universal structure Searcher. Unlicensed structure-prediction active-learning
- ML-atomate (🥉4 · ⭐ 6 · 💀) - Machine learning-assisted Atomate code for autonomous computational materials screening. GPL-3.0 active-learning workflows
- closed-loop-acceleration-benchmarks (🥉4 · 💀) - Data and scripts in support of the publication By how much can closed-loop frameworks accelerate computational.. MIT materials-discovery active-learning single-paper
- SPINNER (🥉3 · ⭐ 13 · 💀) - SPINNER (Structure Prediction of Inorganic crystals using Neural Network potentials with Evolutionary and Random.. GPL-3.0 C++ structure-prediction
- sl_discovery (🥉3 · ⭐ 5 · 💀) - Data processing and models related to Quantifying the performance of machine learning models in materials discovery. Apache-2 materials-discovery single-paper


Mathematical tools


Projects that implement mathematical objects used in atomistic machine learning.

KFAC-JAX (🥇20 · ⭐ 270) - Second Order Optimization and Curvature Estimation with K-FAC in JAX. Apache-2 - [GitHub](https://github.com/google-deepmind/kfac-jax) (👨‍💻 19 · 🔀 27 · 📦 11 · 📋 30 - 63% open · ⏱️ 20.05.2025):
git clone https://github.com/google-deepmind/kfac-jax
- [PyPi](https://pypi.org/project/kfac-jax) (📥 680 / month · 📦 2 · ⏱️ 20.05.2025):
pip install kfac-jax
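K-FAC approximates the Fisher information block of a dense layer as a Kronecker product of two small matrices. The first, A, is the second moment of the layer inputs. The second, G, is the second moment of the back-propagated output gradients. Because (A ⊗ G)⁻¹ = A⁻¹ ⊗ G⁻¹, preconditioning the gradient only requires inverting A and G. Below is a minimal NumPy sketch that computes the preconditioned gradient G⁻¹ ∇W A⁻¹ and checks it against an explicit solve with the full Kronecker matrix. It is not the kfac-jax API. The random data, layer sizes, and the simple damping added to each factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, batch = 4, 3, 64
damping = 1e-3  # simple Tikhonov damping added to each factor (an assumption)

# Per-example layer inputs a and output gradients g for a dense layer W (n_out x n_in).
a = rng.normal(size=(batch, n_in))
g = rng.normal(size=(batch, n_out))

# Kronecker factors: F ~ A (x) G when vec() stacks the columns of W.
A = a.T @ a / batch + damping * np.eye(n_in)
G = g.T @ g / batch + damping * np.eye(n_out)

# Mini-batch weight gradient: average of the per-example outer products g a^T.
grad_W = g.T @ a / batch

# K-FAC preconditioned gradient G^{-1} grad_W A^{-1}, using only small inverses.
nat_grad = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# Reference: solve with the full (n_in*n_out) x (n_in*n_out) Kronecker matrix.
F = np.kron(A, G)
ref = np.linalg.solve(F, grad_W.flatten(order="F")).reshape((n_out, n_in), order="F")
print(np.allclose(nat_grad, ref))
```

The saving grows with layer width. The full Fisher block has (n_in·n_out)² entries, whereas the two factors have only n_in² + n_out².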
cuEquivariance (🥇18 · ⭐ 220) - cuEquivariance is a math library that is a collective of low-level primitives and tensor ops to accelerate widely-used.. Apache-2 rep-learn - [GitHub](https://github.com/NVIDIA/cuEquivariance) (👨‍💻 4 · 🔀 12 · 📋 24 - 29% open · ⏱️ 21.05.2025):
git clone https://github.com/NVIDIA/cuEquivariance
- [PyPi](https://pypi.org/project/cuequivariance) (📥 9.8K / month · 📦 2 · ⏱️ 25.04.2025):
pip install cuequivariance
- [Conda](https://anaconda.org/conda-forge/cuequivariance) (📥 3.9K · ⏱️ 25.04.2025):
conda install -c conda-forge cuequivariance
SpheriCart (🥇18 · ⭐ 84) - Multi-language library for the calculation of spherical harmonics in Cartesian coordinates. MIT - [GitHub](https://github.com/lab-cosmo/sphericart) (👨‍💻 12 · 🔀 15 · 📥 320 · 📦 7 · 📋 48 - 47% open · ⏱️ 20.05.2025):
git clone https://github.com/lab-cosmo/sphericart
- [PyPi](https://pypi.org/project/sphericart) (📥 2.8K / month · ⏱️ 28.04.2025):
pip install sphericart
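Computing spherical harmonics directly from Cartesian components avoids evaluating angles and the coordinate singularities at the poles. Below is a minimal sketch of real spherical harmonics up to l = 2, written as explicit polynomials in the unit-vector components. SpheriCart supports arbitrary l through its own recursions, and its ordering and normalization conventions may differ. The (y, z, x) ordering used here for l = 1 follows a common convention and is an assumption.

```python
import math


def real_sph_harm_l2(x, y, z):
    """Real spherical harmonics Y_lm for l = 0, 1, 2 from Cartesian input.

    Returns {l: [Y_l,-l, ..., Y_l,l]} for the unit vector along (x, y, z).
    No angles are computed.
    """
    r = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / r, y / r, z / r
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    c20 = 0.25 * math.sqrt(5.0 / math.pi)
    return {
        0: [c0],
        1: [c1 * y, c1 * z, c1 * x],
        2: [c2 * x * y, c2 * y * z, c20 * (3.0 * z * z - 1.0),
            c2 * x * z, 0.5 * c2 * (x * x - y * y)],
    }


Y = real_sph_harm_l2(1.0, -2.0, 0.5)
for l, values in Y.items():
    # Addition theorem: sum over m of Y_lm^2 equals (2l + 1) / (4 pi) in any direction.
    print(l, sum(v * v for v in values), (2 * l + 1) / (4 * math.pi))
```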
gpax (🥈17 · ⭐ 220 · 💤) - Gaussian Processes for Experimental Sciences. MIT probabilistic active-learning - [GitHub](https://github.com/ziatdinovmax/gpax) (👨‍💻 6 · 🔀 27 · 📦 5 · 📋 41 - 21% open · ⏱️ 21.05.2024):
git clone https://github.com/ziatdinovmax/gpax
- [PyPi](https://pypi.org/project/gpax) (📥 630 / month · ⏱️ 20.03.2024):
pip install gpax
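Gaussian-process regression returns both a prediction and an uncertainty. That uncertainty is what makes GPs useful for active learning on expensive experiments or simulations. Below is a minimal NumPy sketch of the exact GP posterior with a squared-exponential kernel. The kernel, length scale, noise level, and toy data are assumptions. gpax itself provides its own JAX-based models and API, which are not reproduced here.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D input arrays."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Exact GP posterior mean and standard deviation at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                          # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))      # clip tiny negative round-off


x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = np.sin(x_train)
x_test = np.array([1.0, 3.2, 8.0])
mean, std = gp_posterior(x_train, y_train, x_test, length_scale=1.0)
print(mean, std)
```

The posterior reproduces the data at a training input with near-zero uncertainty. Far from the data (x = 8), the mean relaxes toward the prior mean of zero and the standard deviation approaches the prior value of one. An active-learning loop would query such high-uncertainty points next.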
Polynomials4ML.jl (🥈15 · ⭐ 13) - Polynomials for ML: fast evaluation, batching, differentiation. MIT Julia - [GitHub](https://github.com/ACEsuit/Polynomials4ML.jl) (👨‍💻 12 · 🔀 6 · 📋 56 - 16% open · ⏱️ 13.05.2025):
git clone https://github.com/ACEsuit/Polynomials4ML.jl
OpenEquivariance (🥈11 · ⭐ 57) - OpenEquivariance: a fast, open-source GPU JIT kernel generator for the Clebsch-Gordan Tensor Product. BSD-3 rep-learn - [GitHub](https://github.com/PASSIONLab/OpenEquivariance) (👨‍💻 3 · 🔀 4 · 📋 10 - 10% open · ⏱️ 15.05.2025):
git clone https://github.com/PASSIONLab/OpenEquivariance
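The Clebsch-Gordan tensor product combines two equivariant features and decomposes the result into irreducible parts. For two l = 1 inputs (ordinary 3-vectors), the decomposition 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2 has a familiar Cartesian form. The l = 0 part is the dot product, the l = 1 part is the cross product, and the l = 2 part is the symmetric traceless outer product. The sketch below covers only this special case. It uses unnormalized Cartesian components rather than the normalized spherical basis that libraries such as OpenEquivariance or e3nn work in, and the input vectors are arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def tensor_product_1x1(u, v):
    """Split the outer product of two vectors into l = 0, 1, 2 parts (1 + 3 + 5 components)."""
    s = dot(u, v)                                   # l = 0: rotation invariant
    c = cross(u, v)                                 # l = 1: rotates like a vector
    t = [[0.5 * (u[i] * v[j] + u[j] * v[i]) - (s / 3.0 if i == j else 0.0)
          for j in range(3)] for i in range(3)]     # l = 2: symmetric and traceless
    return s, c, t


def rotate_z(vec, angle):
    """Rotate a 3-vector about the z axis."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [ca * vec[0] - sa * vec[1], sa * vec[0] + ca * vec[1], vec[2]]


u, v = [1.0, 2.0, 0.5], [-0.3, 0.4, 2.0]
s, c, t = tensor_product_1x1(u, v)
s_rot, c_rot, _ = tensor_product_1x1(rotate_z(u, 0.7), rotate_z(v, 0.7))
print(abs(s - s_rot) < 1e-12)                                          # invariance
print(all(abs(a - b) < 1e-12 for a, b in zip(rotate_z(c, 0.7), c_rot)))  # equivariance
```

Rotating the inputs leaves the l = 0 part unchanged and rotates the l = 1 part with them. This is the equivariance property that these kernels must preserve at every layer.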
GElib (🥈11 · ⭐ 23) - C++/CUDA library for SO(3) equivariant operations. MPL-2.0 C++ - [GitHub](https://github.com/risi-kondor/GElib) (👨‍💻 4 · 🔀 3 · 📋 8 - 50% open · ⏱️ 24.04.2025):
git clone https://github.com/risi-kondor/GElib
cnine (🥉6 · ⭐ 5) - Cnine tensor library. Unlicensed C++ - [GitHub](https://github.com/risi-kondor/cnine) (👨‍💻 6 · 🔀 4 · 📋 2 - 50% open · ⏱️ 15.05.2025):
git clone https://github.com/risi-kondor/cnine
Hidden projects (6):
- lie-nn (🥉9 · ⭐ 32 · 💀) - Tools for building equivariant polynomials on reductive Lie groups. MIT rep-learn
- LapJAX (🥉8 · ⭐ 72 · 💀) - A JAX based package designed for efficient second order operators (e.g., laplacian) computation. MIT
- EquivariantOperators.jl (🥉6 · ⭐ 19 · 💀) - This package is deprecated. Functionalities are migrating to Porcupine.jl. MIT Julia
- COSMO Toolbox (🥉6 · ⭐ 7 · 💀) - Assorted libraries and utilities for atomistic simulation analysis. Unlicensed C++
- torch_spex (🥉3 · ⭐ 2 · 💀) - Spherical expansions in PyTorch. Unlicensed
- Wigner Kernels (🥉1 · ⭐ 2 · 💀) - Collection of programs to benchmark Wigner kernels. Unlicensed benchmarking


Molecular Dynamics


Projects that simplify the integration of molecular dynamics and atomistic machine learning.

JAX-MD (🥇23 · ⭐ 1.3K) - Differentiable, Hardware Accelerated, Molecular Dynamics. Apache-2 - [GitHub](https://github.com/jax-md/jax-md) (👨‍💻 39 · 🔀 210 · 📦 72 · 📋 170 - 51% open · ⏱️ 26.11.2024):
git clone https://github.com/jax-md/jax-md
- [PyPi](https://pypi.org/project/jax-md) (📥 3.5K / month · 📦 3 · ⏱️ 09.08.2023):
pip install jax-md
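At the core of any MD engine, differentiable or not, is a time integrator that advances positions and velocities using forces from a potential. Below is a minimal pure-Python sketch of velocity Verlet for a Lennard-Jones dimer on a line, in reduced units. The potential, time step, and starting geometry are assumptions. JAX-MD provides its own integrators and obtains forces by automatic differentiation of the energy, rather than from the hand-written derivative used here.

```python
def lj_energy_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy and radial force -dE/dr at separation r."""
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 * sr6 - sr6)
    force = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return energy, force


def pair_forces(x):
    """Energy and per-particle forces for two particles on a line (x[1] > x[0])."""
    energy, f = lj_energy_force(x[1] - x[0])
    return energy, [-f, f]


def velocity_verlet(x, v, dt, n_steps, mass=1.0):
    """Velocity Verlet integration; returns final state and total energy per step."""
    _, f = pair_forces(x)
    energies = []
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]   # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]                # drift
        e_pot, f = pair_forces(x)                                 # forces at new positions
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]   # half kick
        e_kin = 0.5 * mass * sum(vi * vi for vi in v)
        energies.append(e_pot + e_kin)
    return x, v, energies


# Start stretched beyond the LJ minimum at r = 2**(1/6), at rest.
x, v, energies = velocity_verlet([0.0, 1.3], [0.0, 0.0], dt=0.005, n_steps=2000)
print(max(energies) - min(energies))  # stays small: the integrator is symplectic
```

Velocity Verlet is time-reversible and symplectic. The total energy therefore oscillates within a narrow band instead of drifting, and this is why it is the default choice for NVE dynamics.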
GPUMD (🥇21 · ⭐ 570) - GPUMD is a highly efficient general-purpose molecular dynamics (MD) package and enables machine-learned potentials.. GPL-3.0 ML-IAP C++ electrostatics - [GitHub](https://github.com/brucefan1983/GPUMD) (👨‍💻 49 · 🔀 140 · 📋 230 - 7% open · ⏱️ 22.05.2025):
git clone https://github.com/brucefan1983/GPUMD
mlcolvar (🥈20 · ⭐ 110) - A unified framework for machine learning collective variables for enhanced sampling simulations. MIT sampling - [GitHub](https://github.com/luigibonati/mlcolvar) (👨‍💻 8 · 🔀 27 · 📦 7 · 📋 80 - 20% open · ⏱️ 21.05.2025):
git clone https://github.com/luigibonati/mlcolvar
- [PyPi](https://pypi.org/project/mlcolvar) (📥 300 / month · ⏱️ 19.02.2025):
pip install mlcolvar
FitSNAP (🥈19 · ⭐ 170) - Software for generating machine-learning interatomic potentials for LAMMPS. GPL-2.0 - [GitHub](https://github.com/FitSNAP/FitSNAP) (👨‍💻 25 · 🔀 57 · 📥 13 · 📋 77 - 20% open · ⏱️ 14.04.2025):
git clone https://github.com/FitSNAP/FitSNAP
- [Conda](https://anaconda.org/conda-forge/fitsnap3) (📥 12K · ⏱️ 22.04.2025):
conda install -c conda-forge fitsnap3
TorchSim (🥈18 · ⭐ 220 · 🐣) - Torch-native, batchable, atomistic simulation. MIT HTC UIP ML-IAP structure-optimization - [GitHub](https://github.com/Radical-AI/torch-sim) (👨‍💻 13 · 🔀 26 · 📋 64 - 34% open · ⏱️ 16.05.2025):
git clone https://github.com/Radical-AI/torch-sim
- [PyPi](https://pypi.org/project/torch-sim-atomistic) (📥 330 / month · ⏱️ 02.05.2025):
pip install torch-sim-atomistic
openmm-torch (🥈18 · ⭐ 200) - OpenMM plugin to define forces with neural networks. Custom ML-IAP C++ - [GitHub](https://github.com/openmm/openmm-torch) (👨‍💻 9 · 🔀 29 · 📋 97 - 29% open · ⏱️ 20.02.2025):
git clone https://github.com/openmm/openmm-torch
- [Conda](https://anaconda.org/conda-forge/openmm-torch) (📥 830K · ⏱️ 22.04.2025):
conda install -c conda-forge openmm-torch
Psiflow (🥉15 · ⭐ 140) - scalable molecular simulation. MIT ML-IAP active-learning sampling - [GitHub](https://github.com/molmod/psiflow) (👨‍💻 5 · 🔀 12 · 📋 54 - 20% open · ⏱️ 08.05.2025):
git clone https://github.com/molmod/psiflow
OpenMM-ML (🥉15 · ⭐ 110) - High level API for using machine learning models in OpenMM simulations. MIT ML-IAP - [GitHub](https://github.com/openmm/openmm-ml) (👨‍💻 5 · 🔀 24 · 📋 62 - 40% open · ⏱️ 12.03.2025):
git clone https://github.com/openmm/openmm-ml
- [Conda](https://anaconda.org/conda-forge/openmm-ml) (📥 26K · ⏱️ 22.04.2025):
conda install -c conda-forge openmm-ml
pair_allegro (🥉14 · ⭐ 41) - LAMMPS pair styles for NequIP and Allegro deep learning interatomic potentials. MIT ML-IAP rep-learn - [GitHub](https://github.com/mir-group/pair_nequip_allegro) (👨‍💻 4 · 🔀 8 · 📋 40 - 15% open · ⏱️ 16.05.2025):
git clone https://github.com/mir-group/pair_allegro
DMFF (🥉13 · ⭐ 170) - DMFF (Differentiable Molecular Force Field) is a Jax-based python package that provides a full differentiable.. LGPL-3.0 C++ - [GitHub](https://github.com/deepmodeling/DMFF) (👨‍💻 14 · 🔀 46 · 📋 28 - 39% open · ⏱️ 10.04.2025):
git clone https://github.com/deepmodeling/DMFF
pair_nequip (🥉11 · ⭐ 43) - LAMMPS pair style for NequIP. MIT ML-IAP rep-learn - [GitHub](https://github.com/mir-group/pair_nequip) (👨‍💻 3 · 🔀 13 · 📋 33 - 39% open · ⏱️ 25.04.2025):
git clone https://github.com/mir-group/pair_nequip
PACE (🥉9 · ⭐ 29) - The LAMMPS ML-IAP `pair_style pace`, aka Atomic Cluster Expansion (ACE), aka ML-PACE,.. Custom - [GitHub](https://github.com/ICAMS/lammps-user-pace) (👨‍💻 8 · 🔀 12 · 📋 8 - 25% open · ⏱️ 17.12.2024):
git clone https://github.com/ICAMS/lammps-user-pace
Hidden projects (3):
- MUSE (🥉5 · ⭐ 4) - A python package for fast building amorphous solids and liquid mixtures from @materialsproject computed structures and.. MIT ML-IAP Defects & Disorder
- SOMD (🥉4 · ⭐ 14) - Molecular dynamics package designed for the SIESTA DFT code. AGPL-3.0 ML-IAP active-learning
- interface-lammps-mlip-3 (🥉3 · ⭐ 4 · 💀) - An interface between LAMMPS and MLIP (version 3). GPL-2.0


Probabilistic ML


Projects that focus on probabilistic, Bayesian, Gaussian process and adversarial methods for atomistic ML, for optimization, uncertainty quantification (UQ), etc.

thermo (🥇5 · ⭐ 16) - Data-driven risk-conscious thermoelectric materials discovery. MIT materials-discovery experimental-data active-learning transport-phenomena - [GitHub](https://github.com/janosh/thermo) (👨‍💻 2 · 🔀 4 · ⏱️ 12.05.2025):
git clone https://github.com/janosh/thermo


Reinforcement Learning


Projects that focus on reinforcement learning for atomistic ML.

Hidden projects (2):
- ReLeaSE (🥇11 · ⭐ 360 · 💀) - Deep Reinforcement Learning for de-novo Drug Design. MIT drug-discovery
- CatGym (🥉6 · ⭐ 12 · 💀) - Surface segregation using Deep Reinforcement Learning. GPL


Representation Engineering


Projects that implement representations (aka descriptors or fingerprints) of atomistic systems and models built on them, i.e. feature engineering.

cdk (🥇27 · ⭐ 530) - The Chemistry Development Kit. LGPL-2.1 cheminformatics Java - [GitHub](https://github.com/cdk/cdk) (👨‍💻 170 · 🔀 170 · 📥 26K · 📋 310 - 9% open · ⏱️ 22.04.2025):
git clone https://github.com/cdk/cdk
- [Maven](https://search.maven.org/artifact/org.openscience.cdk/cdk-bundle) (📦 18 · ⏱️ 29.03.2025):
<dependency>
    <groupId>org.openscience.cdk</groupId>
    <artifactId>cdk-bundle</artifactId>
    <version>[VERSION]</version>
</dependency>
DScribe (🥇24 · ⭐ 430 · 💤) - DScribe is a python package for creating machine learning descriptors for atomistic systems. Apache-2 - [GitHub](https://github.com/SINGROUP/dscribe) (👨‍💻 18 · 🔀 92 · 📦 250 · 📋 100 - 11% open · ⏱️ 28.05.2024):
git clone https://github.com/SINGROUP/dscribe
- [PyPi](https://pypi.org/project/dscribe) (📥 12K / month · 📦 35 · ⏱️ 28.05.2024):
pip install dscribe
- [Conda](https://anaconda.org/conda-forge/dscribe) (📥 210K · ⏱️ 22.04.2025):
conda install -c conda-forge dscribe
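The Coulomb matrix is one of the simplest global descriptors implemented by packages such as DScribe. Off-diagonal entries are the nuclear repulsion terms Z_i Z_j / |R_i − R_j|, and diagonal entries are 0.5 Z_i^2.4. Below is a minimal NumPy sketch that computes it for a water molecule and sorts rows and columns by row norm, a common way to make the descriptor independent of atom ordering. The approximate geometry and the use of Ångström units are assumptions. DScribe's own class, options, and output layout are not reproduced here.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5*Z_i**2.4 on the diagonal, Z_i*Z_j/|R_i - R_j| elsewhere."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)          # placeholder to avoid division by zero
    cm = np.outer(Z, Z) / dist
    np.fill_diagonal(cm, 0.5 * Z ** 2.4)
    return cm


def sorted_coulomb_matrix(Z, R):
    """Permute rows and columns by descending row norm for permutation invariance."""
    cm = coulomb_matrix(Z, R)
    order = np.argsort(-np.linalg.norm(cm, axis=1))
    return cm[np.ix_(order, order)]


# Water: O at the origin, two H atoms (approximate coordinates in Angstrom).
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]
print(sorted_coulomb_matrix(Z, R).round(3))
```

Listing the atoms in a different order yields the same sorted matrix. The price is that sorting makes the descriptor non-smooth whenever two row norms cross.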
ChemML (🥇18 · ⭐ 170) - ChemML is a machine learning and informatics program suite for the chemical and materials sciences. BSD-3 cheminformatics active-learning workflows - [GitHub](https://github.com/hachmannlab/chemml) (👨‍💻 15 · 🔀 32 · 📥 14 · 📦 8 · 📋 13 - 53% open · ⏱️ 05.05.2025):
git clone https://github.com/hachmannlab/chemml
- [PyPi](https://pypi.org/project/chemml) (📥 600 / month · 📦 2 · ⏱️ 08.10.2023):
pip install chemml
MODNet (🥇17 · ⭐ 92) - MODNet: a framework for machine learning materials properties. MIT pretrained small-data transfer-learning - [GitHub](https://github.com/ppdebreuck/modnet) (👨‍💻 11 · 🔀 34 · 📦 11 · 📋 63 - 50% open · ⏱️ 02.05.2025):
git clone https://github.com/ppdebreuck/modnet
ElementEmbeddings (🥈16 · ⭐ 43) - Python package to interact with high-dimensional representations of the chemical elements. MIT XAI USL viz - [GitHub](https://github.com/WMD-group/ElementEmbeddings) (👨‍💻 6 · 🔀 4 · 📦 6 · 📋 22 - 22% open · ⏱️ 09.01.2025):
git clone https://github.com/WMD-group/ElementEmbeddings
- [PyPi](https://pypi.org/project/ElementEmbeddings) (📥 1.4K / month · ⏱️ 18.09.2024):
pip install ElementEmbeddings
- [Conda](https://anaconda.org/conda-forge/elementembeddings) (📥 1.7K · ⏱️ 22.04.2025):
conda install -c conda-forge elementembeddings
Featomic (🥈15 · ⭐ 71) - Computing representations for atomistic machine learning. BSD-3 Rust C++ - [GitHub](https://github.com/metatensor/featomic) (👨‍💻 16 · 🔀 15 · 📥 140 · 📋 83 - 50% open · ⏱️ 22.05.2025):
git clone https://github.com/metatensor/featomic
pySIPFENN (🥈15 · ⭐ 24) - Python toolset for Structure-Informed Property and Feature Engineering with Neural Networks. It offers unique.. LGPL-3.0 material-defect Defects & Disorder pretrained transfer-learning - [GitHub](https://github.com/PhasesResearchLab/pySIPFENN) (👨‍💻 4 · 🔀 5 · 📥 110 · 📦 7 · 📋 6 - 66% open · ⏱️ 25.04.2025):
git clone https://github.com/PhasesResearchLab/pySIPFENN
- [PyPi](https://pypi.org/project/pysipfenn) (📥 280 / month · ⏱️ 06.03.2025):
pip install pysipfenn
- [Conda](https://anaconda.org/conda-forge/pysipfenn) (📥 15K · ⏱️ 22.04.2025):
conda install -c conda-forge pysipfenn
SISSO (🥈12 · ⭐ 280) - A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models. Apache-2 Fortran - [GitHub](https://github.com/rouyang2017/SISSO) (👨‍💻 3 · 🔀 85 · 📋 77 - 23% open · ⏱️ 21.03.2025):
git clone https://github.com/rouyang2017/SISSO
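SISSO builds a large space of candidate features by recursively combining primary features with algebraic operators. It screens that space with sure independence screening (SIS), which ranks features by correlation with the target. It then searches exhaustively over small subsets of the surviving features for the best least-squares model (the ℓ0-regularized "SO" step). The NumPy sketch below compresses that pipeline to one level of operators and a two-feature model. The operator set, subspace size, and synthetic data are assumptions. The real Fortran code goes further: it iterates SIS on the residual of lower-dimensional models and handles far larger feature spaces.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n = 200
primary = {name: rng.uniform(1, 3, n) for name in ("a", "b", "c")}
# Synthetic target built from two hidden descriptors (unknown to the search).
y = 3.0 * primary["a"] * primary["b"] - 2.0 * primary["c"] ** 2 + 0.01 * rng.normal(size=n)

# 1) Feature construction: primary features plus one level of operators.
features = dict(primary)
for (na, xa), (nb, xb) in itertools.combinations(primary.items(), 2):
    features[f"{na}*{nb}"] = xa * xb
    features[f"{na}/{nb}"] = xa / xb
    features[f"{na}-{nb}"] = xa - xb
for name, x in primary.items():
    features[f"{name}^2"] = x ** 2
    features[f"exp({name})"] = np.exp(x)


# 2) SIS: keep the features most correlated (in absolute value) with y.
def abs_corr(x, t):
    return abs(np.corrcoef(x, t)[0, 1])


sis_size = 10
ranked = sorted(features, key=lambda k: abs_corr(features[k], y), reverse=True)
subspace = ranked[:sis_size]

# 3) SO (l0): exhaustive least squares over all 2-feature subsets of the subspace.
best = None
for pair in itertools.combinations(subspace, 2):
    X = np.column_stack([features[p] for p in pair] + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((X @ coef - y) ** 2))
    if best is None or rmse < best[0]:
        best = (rmse, pair, coef)

print(best[1], best[2].round(2), round(best[0], 4))
```

SIS keeps the exhaustive ℓ0 search tractable. Only C(10, 2) = 45 pairs are fitted here instead of every pair in the full feature space, which grows combinatorially with each additional operator level.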
GlassPy (🥈12 · ⭐ 31 · 💤) - Python module for scientists working with glass materials. GPL-3.0 - [GitHub](https://github.com/drcassar/glasspy) (👨‍💻 2 · 🔀 7 · 📦 7 · 📋 15 - 46% open · ⏱️ 13.10.2024):
git clone https://github.com/drcassar/glasspy
- [PyPi](https://pypi.org/project/glasspy) (📥 610 / month · ⏱️ 05.09.2024):
pip install glasspy
PDynA (🥉11 · ⭐ 41 · 💤) - Python package to analyse the structural dynamics of perovskites. MIT MD - [GitHub](https://github.com/WMD-group/PDynA) (👨‍💻 4 · 🔀 3 · 📦 2 · ⏱️ 11.10.2024):
git clone https://github.com/WMD-group/PDynA
- [PyPi](https://pypi.org/project/pdyna) (📥 49 / month · ⏱️ 23.09.2024):
pip install pdyna
MOLPIPx (🥉8 · ⭐ 37) - Differentiable version of Permutationally Invariant Polynomial (PIP) models in JAX and Rust. Apache-2 Python Rust - [GitHub](https://github.com/ChemAI-Lab/molpipx) (👨‍💻 10 · 🔀 1 · ⏱️ 14.04.2025):
git clone https://github.com/ChemAI-Lab/molpipx
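Permutationally invariant polynomials (PIPs) represent a potential energy surface as a polynomial in functions of the interatomic distances. The basis is symmetrized so that swapping identical atoms leaves every basis function unchanged. Below is a minimal sketch for an A2B molecule such as water, using Morse variables y = exp(−r/a) and a basis up to degree 2. The range parameter a, the degree, the coefficients, and the helper names are assumptions. MOLPIPx generates such bases automatically, with differentiable evaluation in JAX.

```python
import math


def morse_variables(coords, a=2.0):
    """Morse variables y_ij = exp(-r_ij / a) for atoms ordered (H1, H2, O)."""
    def dist(i, j):
        return math.dist(coords[i], coords[j])
    return {
        "HH": math.exp(-dist(0, 1) / a),
        "H1O": math.exp(-dist(0, 2) / a),
        "H2O": math.exp(-dist(1, 2) / a),
    }


def pip_basis_a2b(coords, a=2.0):
    """Degree <= 2 polynomial basis invariant to swapping the two H atoms."""
    y = morse_variables(coords, a)
    s1 = y["H1O"] + y["H2O"]                    # symmetric sum
    s2 = y["H1O"] * y["H2O"]                    # symmetric product
    q = y["H1O"] ** 2 + y["H2O"] ** 2           # symmetric sum of squares
    return [1.0, y["HH"], s1, y["HH"] ** 2, y["HH"] * s1, q, s2]


def pip_energy(coords, coefficients, a=2.0):
    """Linear model on the PIP basis; real coefficients would come from a fit."""
    return sum(c * b for c, b in zip(coefficients, pip_basis_a2b(coords, a)))


distorted = [(0.80, 0.60, 0.05), (-0.70, 0.55, 0.0), (0.0, 0.0, 0.0)]
swapped = [distorted[1], distorted[0], distorted[2]]   # exchange the two H atoms
coeffs = [0.1, -0.5, 1.2, 0.3, -0.7, 0.4, 0.9]         # hypothetical, not fitted
print(pip_energy(distorted, coeffs), pip_energy(swapped, coeffs))
```

Both energies are identical because every basis function is symmetric in the two O-H variables. The model is therefore permutationally invariant by construction rather than through training.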
fplib (🥉7 · ⭐ 7) - libfp is a library for calculating crystalline fingerprints and measuring similarities of materials. MIT C-lang single-paper - [GitHub](https://github.com/Rutgers-ZRG/libfp) (👨‍💻 2 · 🔀 1 · 📦 2 · ⏱️ 16.04.2025):
git clone https://github.com/zhuligs/fplib
milad (🥉6 · ⭐ 32 · 💤) - Moment Invariants Local Atomic Descriptor. GPL-3.0 generative - [GitHub](https://github.com/muhrin/milad) (👨‍💻 1 · 🔀 2 · 📦 3 · ⏱️ 20.08.2024):
git clone https://github.com/muhrin/milad
SA-GPR (🥉5 · ⭐ 20) - Public repository for symmetry-adapted Gaussian Process Regression (SA-GPR). LGPL-3.0 C-lang - [GitHub](https://github.com/dilkins/TENSOAP) (👨‍💻 6 · 🔀 17 · 📥 2 · 📋 8 - 37% open · ⏱️ 03.02.2025):
git clone https://github.com/dilkins/TENSOAP
Hidden projects (17):
- CatLearn (🥈16 · ⭐ 110 · 💀) - GPL-3.0 surface-science
- Librascal (🥈13 · ⭐ 80 · 💀) - A scalable and versatile library to generate representations for atomic-scale learning. LGPL-2.1
- CBFV (🥈12 · ⭐ 27 · 💀) - Tool to quickly create a composition-based feature vector. Unlicensed
- BenchML (🥈12 · ⭐ 15 · 💀) - ML benchmarking and pipelining framework. Apache-2 benchmarking
- cmlkit (🥉11 · ⭐ 34 · 💀) - tools for machine learning in condensed matter physics and quantum chemistry. MIT benchmarking
- SkipAtom (🥉11 · ⭐ 26 · 💀) - Distributed representations of atoms, inspired by the Skip-gram model. MIT
- ElemNet (🥉7 · ⭐ 95 · 💀) - Deep Learning the Chemistry of Materials From Only Elemental Composition for Enhancing Materials Property Prediction. Unlicensed single-paper
- NICE (🥉7 · ⭐ 12 · 💀) - NICE (N-body Iteratively Contracted Equivariants) is a set of tools designed for the calculation of invariant and.. MIT
- SOAPxx (🥉6 · ⭐ 7 · 💀) - A SOAP implementation. GPL-2.0 C++
- pyLODE (🥉6 · ⭐ 3 · 💀) - Pythonic implementation of LOng Distance Equivariants. Apache-2 electrostatics
- SISSO++ (🥉6 · ⭐ 3 · 💀) - C++ Implementation of SISSO with python bindings. Apache-2 C++
- AMP (🥉6 · 💀) - Amp is an open-source package designed to easily bring machine-learning to atomistic calculations. Unlicensed
- soap_turbo (🥉5 · ⭐ 7 · 💀) - soap_turbo comprises a series of libraries to be used in combination with QUIP/GAP and TurboGAP. Custom Fortran
- MXenes4HER (🥉5 · ⭐ 6 · 💀) - Predicting hydrogen evolution (HER) activity over 4500 MXene materials https://doi.org/10.1039/D3TA00344B. GPL-3.0 materials-discovery catalysis scikit-learn single-paper
- automl-materials (🥉4 · ⭐ 5 · 💀) - AutoML for Regression Tasks on Small Tabular Data in Materials Design. MIT autoML benchmarking single-paper
- magnetism-prediction (🥉4 · ⭐ 1) - DFT-aided Machine Learning Search for Magnetism in Fe-based Bimetallic Chalcogenides. Apache-2 magnetism single-paper
- ML-for-CurieTemp-Predictions (🥉3 · ⭐ 2 · 💀) - Machine Learning Predictions of High-Curie-Temperature Materials. MIT single-paper magnetism


Representation Learning


General models that learn representations, aka embeddings, of atomistic systems, such as message-passing neural networks (MPNN).
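A single message-passing step is the building block of the MPNN-style models in this section. It updates each atom's feature vector with a sum of messages from neighbours within a distance cutoff. Below is a minimal NumPy sketch in the style of a continuous-filter convolution, as used by SchNet. The Gaussian distance expansion, cutoff, and fixed random weights (standing in for learned ones) are assumptions, not the API of any library listed here. The check at the end shows that relabelling the atoms simply permutes the output.

```python
import numpy as np


def gaussian_expansion(d, centers, width=0.5):
    """Expand distances into a smooth basis of Gaussians (last axis = basis)."""
    return np.exp(-((d[..., None] - centers) ** 2) / (2 * width ** 2))


def message_passing_step(h, pos, W_msg, cutoff=3.0, centers=np.linspace(0.5, 3.0, 6)):
    """One update: h_i <- h_i + sum over neighbours j of h_j * filter(r_ij)."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)   # (n, n)
    mask = (d < cutoff) & (d > 0)                       # neighbours, excluding self
    filt = gaussian_expansion(d, centers) @ W_msg       # (n, n, F) distance filters
    messages = filt * h[None, :, :] * mask[..., None]   # message from atom j to atom i
    return h + messages.sum(axis=1)


rng = np.random.default_rng(0)
n_atoms, n_feat = 5, 4
h = rng.normal(size=(n_atoms, n_feat))            # initial atom features
pos = rng.uniform(0, 4, size=(n_atoms, 3))        # atom positions
W_msg = rng.normal(size=(6, n_feat))              # stand-in for learned filter weights

out = message_passing_step(h, pos, W_msg)
perm = rng.permutation(n_atoms)
out_perm = message_passing_step(h[perm], pos[perm], W_msg)
print(np.allclose(out[perm], out_perm))  # permutation equivariance
```

Because messages depend only on interatomic distances, the update is also invariant to translations and rotations of the whole structure. Equivariant models such as those built with e3nn extend this by passing direction-dependent features as well.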

Deep Graph Library (DGL) (🥇37 · ⭐ 14K) - Python package built to ease deep learning on graphs, on top of existing DL frameworks. Apache-2 - [GitHub](https://github.com/dmlc/dgl) (👨‍💻 300 · 🔀 3K · 📦 4K · 📋 2.9K - 18% open · ⏱️ 11.02.2025):
git clone https://github.com/dmlc/dgl
- [PyPi](https://pypi.org/project/dgl) (📥 100K / month · 📦 150 · ⏱️ 13.05.2024):
pip install dgl
- [Conda](https://anaconda.org/dglteam/dgl) (📥 430K · ⏱️ 25.03.2025):
conda install -c dglteam dgl
PyG Models (🥇34 · ⭐ 22K) - Representation learning models implemented in PyTorch Geometric. MIT general-ml - [GitHub](https://github.com/pyg-team/pytorch_geometric) (👨‍💻 540 · 🔀 3.8K · 📦 9.5K · 📋 3.9K - 30% open · ⏱️ 20.05.2025):
git clone https://github.com/pyg-team/pytorch_geometric
e3nn (🥇28 · ⭐ 1.1K) - A modular framework for neural networks with Euclidean symmetry. MIT - [GitHub](https://github.com/e3nn/e3nn) (👨‍💻 36 · 🔀 160 · 📦 500 · 📋 170 - 18% open · ⏱️ 01.05.2025):
git clone https://github.com/e3nn/e3nn
- [PyPi](https://pypi.org/project/e3nn) (📥 190K / month · 📦 46 · ⏱️ 22.03.2025):
pip install e3nn
- [Conda](https://anaconda.org/conda-forge/e3nn) (📥 38K · ⏱️ 22.04.2025):
conda install -c conda-forge e3nn
MatGL (Materials Graph Library) (🥇28 · ⭐ 350) - Graph deep learning library for materials. BSD-3 ML-IAP pretrained multifidelity - [GitHub](https://github.com/materialsvirtuallab/matgl) (👨‍💻 21 · 🔀 73 · 📦 76 · 📋 120 - 5% open · ⏱️ 20.05.2025):
git clone https://github.com/materialsvirtuallab/matgl
- [PyPi](https://pypi.org/project/matgl) (📥 19K / month · 📦 28 · ⏱️ 19.05.2025):
pip install matgl
- [Docker Hub](https://hub.docker.com/r/materialsvirtuallab/matgl) (📥 70 · ⭐ 1 · ⏱️ 08.04.2025):
docker pull materialsvirtuallab/matgl
SchNetPack (🥇27 · ⭐ 840) - SchNetPack - Deep Neural Networks for Atomistic Systems. MIT - [GitHub](https://github.com/atomistic-machine-learning/schnetpack) (👨‍💻 40 · 🔀 220 · 📦 100 · 📋 270 - 2% open · ⏱️ 11.04.2025):
git clone https://github.com/atomistic-machine-learning/schnetpack
- [PyPi](https://pypi.org/project/schnetpack) (📥 1.2K / month · 📦 4 · ⏱️ 05.09.2024):
pip install schnetpack
ALIGNN (🥇20 · ⭐ 270) - Atomistic Line Graph Neural Network https://scholar.google.com/citations?user=9Q-tNnwAAAAJ.. Custom - [GitHub](https://github.com/usnistgov/alignn) (👨‍💻 7 · 🔀 91 · 📦 21 · 📋 75 - 64% open · ⏱️ 02.04.2025):
git clone https://github.com/usnistgov/alignn
- [PyPi](https://pypi.org/project/alignn) (📥 6.4K / month · 📦 11 · ⏱️ 02.04.2025):
pip install alignn
e3nn-jax (🥈19 · ⭐ 200 · 📉) - jax library for E3 Equivariant Neural Networks. Apache-2 - [GitHub](https://github.com/e3nn/e3nn-jax) (👨‍💻 8 · 🔀 20 · 📦 66 · 📋 25 - 12% open · ⏱️ 23.01.2025):
git clone https://github.com/e3nn/e3nn-jax
- [PyPi](https://pypi.org/project/e3nn-jax) (📥 6.5K / month · 📦 13 · ⏱️ 14.08.2024):
pip install e3nn-jax
kgcnn (🥈19 · ⭐ 120) - Graph convolutions in Keras with TensorFlow, PyTorch or Jax. MIT - [GitHub](https://github.com/aimat-lab/gcnn_keras) (👨‍💻 7 · 🔀 31 · 📦 20 · 📋 87 - 14% open · ⏱️ 05.01.2025):
git clone https://github.com/aimat-lab/gcnn_keras
- [PyPi](https://pypi.org/project/kgcnn) (📥 470 / month · 📦 3 · ⏱️ 08.01.2025):
pip install kgcnn
Uni-Mol (🥈17 · ⭐ 860) - Official Repository for the Uni-Mol Series Methods. MIT pretrained - [GitHub](https://github.com/deepmodeling/Uni-Mol) (👨‍💻 20 · 🔀 140 · 📥 18K · 📋 210 - 47% open · ⏱️ 07.05.2025):
git clone https://github.com/deepmodeling/Uni-Mol
escnn (🥈17 · ⭐ 430 · 💤) - Equivariant Steerable CNNs Library for Pytorch https://quva-lab.github.io/escnn/. Custom - [GitHub](https://github.com/QUVA-Lab/escnn) (👨‍💻 10 · 🔀 52 · 📋 77 - 49% open · ⏱️ 31.10.2024):
git clone https://github.com/QUVA-Lab/escnn
- [PyPi](https://pypi.org/project/escnn) (📥 5.5K / month · 📦 6 · ⏱️ 01.04.2022):
pip install escnn
matsciml (🥈17 · ⭐ 170) - Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery.. MIT workflows benchmarking - [GitHub](https://github.com/IntelLabs/matsciml) (👨‍💻 12 · 🔀 26 · 📋 67 - 35% open · ⏱️ 24.03.2025):
git clone https://github.com/IntelLabs/matsciml
Graphormer (🥈15 · ⭐ 2.3K · 💤) - Graphormer is a general-purpose deep learning backbone for molecular modeling. MIT transformer pretrained - [GitHub](https://github.com/microsoft/Graphormer) (👨‍💻 14 · 🔀 340 · 📋 160 - 57% open · ⏱️ 28.05.2024):
git clone https://github.com/microsoft/Graphormer
HydraGNN (🥈14 · ⭐ 80) - Distributed PyTorch implementation of multi-headed graph convolutional neural networks. BSD-3 - [GitHub](https://github.com/ORNL/HydraGNN) (👨‍💻 16 · 🔀 29 · 📦 3 · 📋 54 - 31% open · ⏱️ 09.05.2025):
git clone https://github.com/ORNL/HydraGNN
Compositionally-Restricted Attention-Based Network (CrabNet) (🥈14 · ⭐ 16 · 💤) - Predict materials properties using only the composition information!. MIT - [GitHub](https://github.com/sparks-baird/CrabNet) (👨‍💻 6 · 🔀 5 · 📦 15 · 📋 19 - 84% open · ⏱️ 09.09.2024):
git clone https://github.com/sparks-baird/CrabNet
- [PyPi](https://pypi.org/project/crabnet) (📥 1.5K / month · 📦 2 · ⏱️ 10.01.2023):
pip install crabnet
hippynn (🥈13 · ⭐ 78) - python library for atomistic machine learning. Custom workflows - [GitHub](https://github.com/lanl/hippynn) (👨‍💻 17 · 🔀 25 · 📦 2 · 📋 27 - 29% open · ⏱️ 12.05.2025):
git clone https://github.com/lanl/hippynn
GATGNN: Global Attention Graph Neural Network (🥉9 · ⭐ 80) - Pytorch Repository for our work: Graph convolutional neural networks with global attention for improved materials.. MIT - [GitHub](https://github.com/superlouis/GATGNN) (👨‍💻 4 · 🔀 17 · 📋 7 - 57% open · ⏱️ 17.12.2024):
git clone https://github.com/superlouis/GATGNN
Equiformer (🥉8 · ⭐ 240) - [ICLR 2023 Spotlight] Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs. MIT transformer - [GitHub](https://github.com/atomicarchitects/equiformer) (👨‍💻 2 · 🔀 44 · 📋 20 - 45% open · ⏱️ 11.02.2025):
git clone https://github.com/atomicarchitects/equiformer
graphite (🥉8 · ⭐ 80 · 💤) - A repository for implementing graph network models based on atomic structures. MIT - [GitHub](https://github.com/LLNL/graphite) (👨‍💻 2 · 🔀 11 · 📦 15 · 📋 4 - 75% open · ⏱️ 08.08.2024):
git clone https://github.com/llnl/graphite
GNNOpt (🥉8 · ⭐ 28) - Universal Ensemble-Embedding Graph Neural Network for Direct Prediction of Optical Spectra from Crystal Structures. MIT optical-properties single-paper - [GitHub](https://github.com/nguyen-group/GNNOpt) (🔀 8 · ⏱️ 19.12.2024):
git clone https://github.com/nguyen-group/GNNOpt
T-e3nn (🥉8 · ⭐ 14 · 💤) - Time-reversal Euclidean neural networks based on e3nn. MIT magnetism - [GitHub](https://github.com/Hongyu-yu/T-e3nn) (👨‍💻 26 · 🔀 1 · ⏱️ 29.09.2024):
git clone https://github.com/Hongyu-yu/T-e3nn
AdsorbML (🥉7 · ⭐ 40) - MIT surface-science single-paper - [GitHub](https://github.com/Open-Catalyst-Project/AdsorbML) (👨‍💻 7 · 🔀 6 · 📋 4 - 75% open · ⏱️ 05.02.2025):
git clone https://github.com/Open-Catalyst-Project/AdsorbML
PolyGNN (🥉7 · ⭐ 39) - polyGNN is a Python library to automate ML model training for polymer informatics. MIT soft-matter multitask single-paper - [GitHub](https://github.com/Ramprasad-Group/polygnn) (👨‍💻 4 · 🔀 9 · ⏱️ 05.02.2025):
git clone https://github.com/Ramprasad-Group/polygnn
Graph-Aware-Transformers (🥉6 · ⭐ 59 · 🐣) - Graph-Aware Attention for Adaptive Dynamics in Transformers. Apache-2 transformer graph-data pretrained single-paper - [GitHub](https://github.com/lamm-mit/Graph-Aware-Transformers) (👨‍💻 3 · 🔀 6 · ⏱️ 08.01.2025):
git clone https://github.com/lamm-mit/Graph-Aware-Transformers
Crystalframer (🥉6 · ⭐ 6 · 🐣) - The official code repository for Rethinking the role of frames for SE(3)-invariant crystal structure modeling (ICLR.. MIT transformer single-paper - [GitHub](https://github.com/omron-sinicx/crystalframer) (👨‍💻 2 · ⏱️ 03.05.2025):
git clone https://github.com/omron-sinicx/crystalframer
40 hidden projects:
- dgl-lifesci (🥇24 · ⭐ 750 · 💀) - Python package for graph neural networks in chemistry and biology. Apache-2
- NVIDIA Deep Learning Examples for Tensor Cores (🥇20 · ⭐ 14K · 💀) - State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and.. Custom educational drug-discovery
- DIG: Dive into Graphs (🥇20 · ⭐ 2K · 💀) - A library for graph deep learning research. GPL-3.0
- benchmarking-gnns (🥈14 · ⭐ 2.6K · 💀) - Repository for benchmarking graph neural networks (JMLR 2023). MIT single-paper benchmarking
- xtal2png (🥈14 · ⭐ 37 · 💀) - Encode/decode a crystal structure to/from a grayscale PNG image for direct use with image-based machine learning.. MIT computer-vision
- Crystal Graph Convolutional Neural Networks (CGCNN) (🥈13 · ⭐ 740 · 💀) - Crystal graph convolutional neural networks for predicting material properties. MIT
- Neural fingerprint (nfp) (🥈12 · ⭐ 60 · 💀) - Keras layers for end-to-end learning with rdkit and pymatgen. Custom
- FAENet (🥈11 · ⭐ 34 · 💀) - Frame Averaging Equivariant GNN for materials modeling. MIT
- pretrained-gnns (🥈10 · ⭐ 1K · 💀) - Strategies for Pre-training Graph Neural Networks. MIT pretrained
- GDC (🥈10 · ⭐ 270 · 💀) - Graph Diffusion Convolution, as proposed in Diffusion Improves Graph Learning (NeurIPS 2019). MIT generative
- Atom2Vec (🥈10 · ⭐ 37 · 💀) - Atom2Vec: a simple way to describe atoms for machine learning. MIT
- SE(3)-Transformers (🥉9 · ⭐ 520 · 💀) - code for the SE3 Transformers paper: https://arxiv.org/abs/2006.10503. MIT single-paper transformer
- ai4material_design (🥉9 · ⭐ 7 · 💀) - Code for Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of.. Apache-2 pretrained material-defect
- molecularGNN_smiles (🥉8 · ⭐ 320 · 💀) - The code of a graph neural network (GNN) for molecules, which is based on learning representations of r-radius.. Apache-2
- UVVisML (🥉8 · ⭐ 31 · 💀) - Predict optical properties of molecules with machine learning. MIT optical-properties single-paper probabilistic
- tensorfieldnetworks (🥉7 · ⭐ 160 · 💀) - Rotation- and translation-equivariant neural networks for 3D point clouds. MIT
- DTNN (🥉7 · ⭐ 77 · 💀) - Deep Tensor Neural Network. MIT
- DeeperGATGNN (🥉7 · ⭐ 62 · 💀) - Scalable graph neural networks for materials property prediction. MIT
- Cormorant (🥉7 · ⭐ 60 · 💀) - Codebase for Cormorant Neural Networks. Custom
- escnn_jax (🥉7 · ⭐ 30 · 💀) - Equivariant Steerable CNNs Library for Pytorch https://quva-lab.github.io/escnn/. Custom
- CGAT (🥉7 · ⭐ 27 · 💀) - Crystal graph attention neural networks for materials prediction. MIT
- Geom3D (🥉6 · ⭐ 120 · 💀) - Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023. MIT benchmarking single-paper
- MACE-Layer (🥉6 · ⭐ 37 · 💀) - Higher order equivariant graph neural networks for 3D point clouds. MIT
- charge_transfer_nnp (🥉6 · ⭐ 35 · 💀) - Graph neural network potential with charge transfer. MIT electrostatics
- GLAMOUR (🥉6 · ⭐ 21 · 💀) - Graph Learning over Macromolecule Representations. MIT single-paper
- Autobahn (🥉5 · ⭐ 29 · 💀) - Repository for Autobahn: Automorphism Based Graph Neural Networks. MIT
- FieldSchNet (🥉5 · ⭐ 19 · 💀) - Deep neural network for molecules in external fields. MIT
- SCFNN (🥉5 · ⭐ 15 · 💀) - Self-consistent determination of long-range electrostatics in neural network potentials. MIT C++ electrostatics single-paper
- CraTENet (🥉5 · ⭐ 14 · 💀) - An attention-based deep neural network for thermoelectric transport properties. MIT transport-phenomena
- EGraFFBench (🥉5 · ⭐ 11 · 💀) - Unlicensed single-paper benchmarking ML-IAP
- Per-site PAiNN (🥉5 · ⭐ 2 · 💀) - Fork of PaiNN for PerovskiteOrderingGCNNs. MIT probabilistic pretrained single-paper
- ML4pXRDs (🥉5 · ⭐ 1 · 💀) - Contains code to train neural networks based on simulated powder XRDs from synthetic crystals. MIT XRD single-paper
- Per-Site CGCNN (🥉5 · ⭐ 1 · 💀) - Crystal graph convolutional neural networks for predicting material properties. MIT pretrained single-paper
- Crystalformer (🥉4 · ⭐ 19) - The official code repository for Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding (ICLR.. MIT transformer single-paper
- Graph Transport Network (🥉4 · ⭐ 15 · 💀) - Graph transport network (GTN), as proposed in Scalable Optimal Transport in High Dimensions for Graph Distances,.. Custom transport-phenomena
- gkx: Green-Kubo Method in JAX (🥉4 · ⭐ 7 · 💀) - Green-Kubo + JAX + MLPs = Anharmonic Thermal Conductivities Done Fast. MIT transport-phenomena
- atom_by_atom (🥉3 · ⭐ 10 · 💀) - Atom-by-atom design of metal oxide catalysts for the oxygen evolution reaction with Machine Learning. Unlicensed surface-science single-paper
- Element encoder (🥉3 · ⭐ 6 · 💀) - Autoencoder neural network to compress properties of atomic species into a vector representation. GPL-3.0 single-paper
- Point Edge Transformer (🥉2) - Smooth, exact rotational symmetrization for deep learning on point clouds. CC-BY-4.0
- SphericalNet (🥉1 · ⭐ 3 · 💀) - Implementation of Clebsch-Gordan Networks (CGnet: https://arxiv.org/pdf/1806.09231.pdf) by GElib & cnine libraries in.. Unlicensed


Universal Potentials

Machine-learned interatomic potentials (ML-IAP) that have been trained on large, chemically and structurally diverse datasets. For materials, this typically means datasets that cover most of the periodic table.

🔗 TeaNet - Universal neural network interatomic potential inspired by iterative electronic relaxations.. ML-IAP

🔗 PreFerred Potential (PFP) - Universal neural network potential for material discovery https://doi.org/10.1038/s41467-022-30687-9. ML-IAP proprietary

FAIRChem EquiformerV2 models (🥇28 · ⭐ 1.4K · 📈) - FAIRChem implementation of Equiformer V2 (eqV2) models. MIT pretrained UIP rep-learn catalysis - [GitHub](https://github.com/facebookresearch/fairchem) (👨‍💻 50 · 🔀 320 · 📋 340 - 4% open · ⏱️ 21.05.2025):
git clone https://github.com/FAIR-Chem/fairchem
- [PyPi](https://pypi.org/project/fairchem-core) (📥 5.9K / month · 📦 10 · ⏱️ 21.05.2025):
pip install fairchem-core
FAIRChem eSEN models (🥇28 · ⭐ 1.4K · 📈) - FAIRChem implementation of Smooth Energy Network (eSEN) models arXiv:2502.12147. MIT pretrained UIP rep-learn catalysis - [GitHub](https://github.com/facebookresearch/fairchem) (👨‍💻 50 · 🔀 320 · 📋 340 - 4% open · ⏱️ 21.05.2025):
git clone https://github.com/FAIR-Chem/fairchem
- [PyPi](https://pypi.org/project/fairchem-core) (📥 5.9K / month · 📦 10 · ⏱️ 21.05.2025):
pip install fairchem-core
DPA-2 (🥈27 · ⭐ 1.7K) - A large atomic model as a multi-task learner https://arxiv.org/abs/2312.15492. LGPL-3.0 ML-IAP pretrained workflows datasets - [GitHub](https://github.com/deepmodeling/deepmd-kit) (👨‍💻 75 · 🔀 540 · 📥 52K · 📦 33 · 📋 900 - 10% open · ⏱️ 02.03.2025):
git clone https://github.com/deepmodeling/deepmd-kit
- [PyPi](https://pypi.org/project/deepmd-kit) (📥 3.9K / month · 📦 9 · ⏱️ 30.03.2025):
pip install deepmd-kit
- [Conda](https://anaconda.org/conda-forge/deepmd-kit) (📥 1.9M · ⏱️ 22.04.2025):
conda install -c conda-forge deepmd-kit
- [Docker Hub](https://hub.docker.com/r/deepmodeling/deepmd-kit) (📥 3.6K · ⭐ 1 · ⏱️ 05.03.2025):
docker pull deepmodeling/deepmd-kit
DeePMD-DPA3 (🥈27 · ⭐ 1.7K) - Successor of DPA-2. LGPL-3.0 ML-IAP pretrained workflows datasets - [GitHub](https://github.com/deepmodeling/deepmd-kit) (👨‍💻 75 · 🔀 540 · 📥 52K · 📦 33 · 📋 900 - 10% open · ⏱️ 02.03.2025):
git clone https://github.com/deepmodeling/deepmd-kit
- [PyPi](https://pypi.org/project/deepmd-kit) (📥 3.9K / month · 📦 9 · ⏱️ 30.03.2025):
pip install deepmd-kit
- [Conda](https://anaconda.org/conda-forge/deepmd-kit) (📥 1.9M · ⏱️ 22.04.2025):
conda install -c conda-forge deepmd-kit
- [Docker Hub](https://hub.docker.com/r/deepmodeling/deepmd-kit) (📥 3.6K · ⭐ 1 · ⏱️ 05.03.2025):
docker pull deepmodeling/deepmd-kit
SevenNet (🥈23 · ⭐ 180) - SevenNet - a graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular.. GPL-3.0 ML-IAP MD pretrained - [GitHub](https://github.com/MDIL-SNU/SevenNet) (👨‍💻 16 · 🔀 30 · 📥 2.6K · 📦 16 · 📋 61 - 26% open · ⏱️ 21.05.2025):
git clone https://github.com/MDIL-SNU/SevenNet
- [PyPi](https://pypi.org/project/sevenn) (📥 13K / month · 📦 14 · ⏱️ 20.05.2025):
pip install sevenn
Orb Models (🥈21 · ⭐ 430) - ORB forcefield models from Orbital Materials. Custom ML-IAP pretrained - [GitHub](https://github.com/orbital-materials/orb-models) (👨‍💻 11 · 🔀 57 · 📦 15 · 📋 43 - 6% open · ⏱️ 30.04.2025):
git clone https://github.com/orbital-materials/orb-models
- [PyPi](https://pypi.org/project/orb-models) (📥 12K / month · 📦 12 · ⏱️ 30.04.2025):
pip install orb-models
CHGNet (🥈21 · ⭐ 300) - Pretrained universal neural network potential for charge-informed atomistic modeling https://chgnet.lbl.gov. Custom ML-IAP MD pretrained electrostatics magnetism structure-relaxation - [GitHub](https://github.com/CederGroupHub/chgnet) (👨‍💻 11 · 🔀 79 · 📦 59 · 📋 72 - 4% open · ⏱️ 14.04.2025):
git clone https://github.com/CederGroupHub/chgnet
- [PyPi](https://pypi.org/project/chgnet) (📥 16K / month · 📦 21 · ⏱️ 16.09.2024):
pip install chgnet
MACE-FOUNDATION models (🥉19 · ⭐ 700) - MACE foundation models (MP, OMAT, Matpes). MIT ML-IAP pretrained rep-learn MD - [GitHub](https://github.com/ACEsuit/mace-foundations) (👨‍💻 2 · 🔀 270 · 📥 130K · 📋 17 - 23% open · ⏱️ 28.03.2025):
git clone https://github.com/ACEsuit/mace-foundations
- [PyPi](https://pypi.org/project/mace-torch) (📥 36K / month · 📦 36 · ⏱️ 01.05.2025):
pip install mace-torch
MatterSim (🥉19 · ⭐ 400 · 📉) - MatterSim: A deep learning atomistic model across elements, temperatures and pressures. MIT ML-IAP active-learning multimodal phase-transition pretrained - [GitHub](https://github.com/microsoft/mattersim) (👨‍💻 17 · 🔀 50 · 📥 19 · 📋 28 - 39% open · ⏱️ 19.05.2025):
git clone https://github.com/microsoft/mattersim
- [PyPi](https://pypi.org/project/mattersim) (📥 79K / month · 📦 2 · ⏱️ 21.02.2025):
pip install mattersim
M3GNet (🥉16 · ⭐ 280) - Materials graph network with 3-body interactions featuring a DFT surrogate crystal relaxer and a state-of-the-art.. BSD-3 ML-IAP pretrained - [GitHub](https://github.com/materialsvirtuallab/m3gnet) (👨‍💻 16 · 🔀 68 · 📋 35 - 42% open · ⏱️ 07.04.2025):
git clone https://github.com/materialsvirtuallab/m3gnet
- [PyPi](https://pypi.org/project/m3gnet) (📥 980 / month · 📦 5 · ⏱️ 17.11.2022):
pip install m3gnet
MLIP Arena Leaderboard (🥉14 · ⭐ 56) - Fair and transparent benchmark of machine learning interatomic potentials (MLIPs), beyond basic error metrics. Apache-2 ML-IAP benchmarking - [GitHub](https://github.com/atomind-ai/mlip-arena) (👨‍💻 3 · 🔀 4 · 📦 2 · 📋 16 - 68% open · ⏱️ 22.05.2025):
git clone https://github.com/atomind-ai/mlip-arena
PET-MAD (🥉12 · ⭐ 63 · 🐣) - PET-MAD, a universal interatomic potential for advanced materials modeling. BSD-3 ML-IAP MD rep-learn transformer - [GitHub](https://github.com/lab-cosmo/pet-mad) (👨‍💻 7 · 🔀 3 · 📦 1 · 📋 2 - 50% open · ⏱️ 22.05.2025):
git clone https://github.com/lab-cosmo/pet-mad
- [PyPi](https://pypi.org/project/pet-mad) (📥 760 / month · ⏱️ 29.04.2025):
pip install pet-mad
- [Conda](https://anaconda.org/conda-forge/pet-mad):
conda install -c conda-forge pet-mad
GRACE (🥉11 · ⭐ 56) - GRACE models and gracemaker (as implemented in TensorPotential package). Custom ML-IAP pretrained MD rep-learn rep-eng - [GitHub](https://github.com/ICAMS/grace-tensorpotential) (👨‍💻 3 · 🔀 3 · 📦 5 · 📋 5 - 60% open · ⏱️ 02.04.2025):
git clone https://github.com/ICAMS/grace-tensorpotential
CHIPS-FF (🥉9 · ⭐ 36) - Evaluation of universal machine learning force-fields https://arxiv.org/abs/2412.10516. Custom benchmarking structure-optimization MD materials-discovery transport-phenomena - [GitHub](https://github.com/usnistgov/chipsff) (👨‍💻 3 · 🔀 4 · ⏱️ 06.02.2025):
git clone https://github.com/usnistgov/chipsff
ffonons (🥉7 · ⭐ 20) - Phonons from ML force fields. MIT benchmarking density-of-states - [GitHub](https://github.com/janosh/ffonons) (👨‍💻 2 · 🔀 2 · 📦 2 · ⏱️ 08.12.2024):
git clone https://github.com/janosh/ffonons
- [PyPi](https://pypi.org/project/ffonons) (📥 32 / month · ⏱️ 10.01.2024):
pip install ffonons
EScAIP (🥉6 · ⭐ 50) - [NeurIPS 2024] Official implementation of the Efficiently Scaled Attention Interatomic Potential. MIT ML-IAP rep-learn transformer single-paper - [GitHub](https://github.com/ASK-Berkeley/EScAIP) (👨‍💻 2 · 🔀 5 · 📥 5 · 📋 6 - 66% open · ⏱️ 06.03.2025):
git clone https://github.com/ASK-Berkeley/EScAIP
Joint Multidomain Pre-Training (JMP) (🥉5 · ⭐ 56 · 💤) - Code for From Molecules to Materials Pre-training Large Generalizable Models for Atomic Property Prediction. CC-BY-NC-4.0 pretrained ML-IAP general-tool - [GitHub](https://github.com/facebookresearch/JMP) (👨‍💻 2 · 🔀 7 · 📋 5 - 40% open · ⏱️ 22.10.2024):
git clone https://github.com/facebookresearch/JMP


Unsupervised Learning

Projects that focus on unsupervised, semi- or self-supervised learning for atomistic ML, such as dimensionality reduction, clustering, contrastive learning, etc.
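As a concrete instance of the dimensionality-reduction case, the NumPy sketch below runs principal component analysis (PCA) on a matrix of per-structure descriptors. The two synthetic "families" of 10-dimensional descriptors are invented for illustration; in practice the rows would be, for example, SOAP or other fingerprints.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)                          # centre each descriptor dimension
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                  # variance fraction per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Hypothetical descriptors: two families of local environments in a 10-D space
rng = np.random.default_rng(1)
family_a = rng.normal(loc=0.0, scale=0.1, size=(50, 10))
family_b = rng.normal(loc=1.0, scale=0.1, size=(50, 10))
X = np.vstack([family_a, family_b])
Z, ratio = pca(X, n_components=2)
```

Here the first component captures the direction separating the two families, so a 2-D scatter plot of `Z` already shows two clusters. Nonlinear methods such as sketch-map follow the same idea of embedding high-dimensional descriptors in two or three dimensions for inspection and clustering.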

DADApy (🥇21 · ⭐ 130) - Distance-based Analysis of DAta-manifolds in python. Apache-2 - [GitHub](https://github.com/sissa-data-science/DADApy) (👨‍💻 21 · 🔀 21 · 📦 13 · 📋 38 - 28% open · ⏱️ 14.04.2025):
git clone https://github.com/sissa-data-science/DADApy
- [PyPi](https://pypi.org/project/dadapy) (📥 260 / month · ⏱️ 11.04.2025):
pip install dadapy
mat_discover (🥈13 · ⭐ 41 · 💤) - A materials discovery algorithm geared towards exploring high-performance candidates in new chemical spaces. MIT materials-discovery rep-eng HTC - [GitHub](https://github.com/sparks-baird/mat_discover) (👨‍💻 5 · 🔀 9 · 📋 40 - 72% open · ⏱️ 20.08.2024):
git clone https://github.com/sparks-baird/mat_discover
- [PyPi](https://pypi.org/project/mat_discover) (📥 520 / month · ⏱️ 23.06.2023):
pip install mat_discover
ASAP (🥈11 · ⭐ 140 · 💤) - ASAP is a package that can quickly analyze and visualize datasets of crystal or molecular structures. MIT - [GitHub](https://github.com/BingqingCheng/ASAP) (👨‍💻 6 · 🔀 28 · 📦 8 · 📋 26 - 26% open · ⏱️ 27.06.2024):
git clone https://github.com/BingqingCheng/ASAP
7 hidden projects:
- pumml (🥈11 · ⭐ 37 · 💀) - Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised machine learning to.. MIT materials-discovery
- Sketchmap (🥉8 · ⭐ 46 · 💀) - Suite of programs to perform non-linear dimensionality reduction -- sketch-map in particular. GPL-3.0 C++
- paper-ml-robustness-material-property (🥉5 · ⭐ 4 · 💀) - A critical examination of robustness and generalizability of machine learning prediction of materials properties. BSD-3 datasets single-paper
- 3D-EMGP (🥉4 · ⭐ 34 · 💤) - [AAAI 2023] The implementation for the paper Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs. MIT pretrained rep-learn single-paper
- Coarse-Graining-Auto-encoders (🥉4 · ⭐ 21 · 💀) - Implementation of coarse-graining Autoencoders. Unlicensed single-paper
- KmdPlus (🥉4 · ⭐ 7 · 💤) - This module contains a class for treating kernel mean descriptor (KMD), and a function for generating descriptors with.. MIT
- Descriptor Embedding and Clustering for Atomisitic-environment Framework (DECAF) (⭐ 2) - Provides a workflow to obtain clustering of local environments in dataset of structures. Unlicensed


Visualization

Projects that focus on visualization (viz.) for atomistic ML.

pymatviz (🥇23 · ⭐ 220) - A toolkit for visualizations in materials informatics. MIT general-tool probabilistic - [GitHub](https://github.com/janosh/pymatviz) (👨‍💻 11 · 🔀 27 · 📦 21 · 📋 56 - 12% open · ⏱️ 22.05.2025):
git clone https://github.com/janosh/pymatviz
- [PyPi](https://pypi.org/project/pymatviz) (📥 16K / month · 📦 6 · ⏱️ 02.05.2025):
pip install pymatviz
Crystal Toolkit (🥈22 · ⭐ 170) - Crystal Toolkit is a framework for building web apps for materials science and is currently powering the new Materials.. MIT - [GitHub](https://github.com/materialsproject/crystaltoolkit) (👨‍💻 31 · 🔀 60 · 📦 43 · 📋 130 - 50% open · ⏱️ 05.03.2025):
git clone https://github.com/materialsproject/crystaltoolkit
- [PyPi](https://pypi.org/project/crystal-toolkit) (📥 2K / month · 📦 10 · ⏱️ 25.01.2025):
pip install crystal-toolkit
Chemiscope (🥈19 · ⭐ 140) - An interactive structure/property explorer for materials and molecules. BSD-3 JavaScript - [GitHub](https://github.com/lab-cosmo/chemiscope) (👨‍💻 25 · 🔀 39 · 📥 460 · 📦 6 · 📋 140 - 27% open · ⏱️ 13.05.2025):
git clone https://github.com/lab-cosmo/chemiscope
- [npm](https://www.npmjs.com/package/chemiscope) (📥 63 / month · 📦 3 · ⏱️ 15.03.2023):
npm install chemiscope
ZnDraw (🥈19 · ⭐ 42) - A powerful tool for visualizing, modifying, and analysing atomistic systems. EPL-2.0 MD generative JavaScript - [GitHub](https://github.com/zincware/ZnDraw) (👨‍💻 7 · 🔀 4 · 📦 11 · 📋 360 - 27% open · ⏱️ 18.02.2025):
git clone https://github.com/zincware/ZnDraw
- [PyPi](https://pypi.org/project/zndraw) (📥 1.6K / month · 📦 5 · ⏱️ 19.02.2025):
pip install zndraw
Elementari (🥉17 · ⭐ 150) - Interactive browser visualizations for materials science: periodic tables, 3d crystal structures, Bohr atoms, nuclei,.. MIT JavaScript - [GitHub](https://github.com/janosh/elementari) (👨‍💻 2 · 🔀 16 · 📦 4 · 📋 7 - 28% open · ⏱️ 10.05.2025):
git clone https://github.com/janosh/elementari
- [npm](https://www.npmjs.com/package/elementari) (📥 900 / month · 📦 2 · ⏱️ 10.05.2025):
npm install elementari
1 hidden project:
- Atomvision (🥉12 · ⭐ 34 · 💀) - Deep learning framework for atomistic image data. Custom computer-vision experimental-data rep-learn


Wavefunction methods (ML-WFT)

Projects and models that focus on wavefunction theory (WFT) methods, such as Monte Carlo techniques like deep-learning variational Monte Carlo (DL-VMC), quantum chemistry methods, etc.
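Neural-network VMC codes such as FermiNet or DeepQMC use a neural network as the trial wavefunction. The underlying VMC estimate is the same as in the classical method: sample electron positions from |ψ|² and average the local energy E_L = Hψ/ψ. The sketch below shows that loop for the hydrogen atom with a hand-picked one-parameter trial function ψ = exp(-αr) in atomic units. It is a textbook toy, not code from any listed package; the step size, sample count, burn-in fraction, and seed are arbitrary choices.

```python
import numpy as np

def local_energy(r, alpha):
    """E_L = (H psi)/psi for psi = exp(-alpha * r): hydrogen atom, atomic units."""
    return -0.5 * alpha**2 + (alpha - 1.0) / r

def vmc_energy(alpha, n_steps=50_000, step=0.5, seed=0):
    """Metropolis sampling of |psi|^2, returning mean and std of the local energy."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.0, 0.0])                # electron position in bohr
    r = 1.0
    samples = []
    for i in range(n_steps):
        trial = x + rng.uniform(-step, step, size=3)
        r_trial = np.linalg.norm(trial)
        # acceptance ratio |psi(trial)|^2 / |psi(x)|^2 = exp(-2 alpha (r_trial - r))
        if rng.random() < np.exp(-2.0 * alpha * (r_trial - r)):
            x, r = trial, r_trial
        if i >= n_steps // 10:                   # discard the first 10% as burn-in
            samples.append(local_energy(r, alpha))
    return float(np.mean(samples)), float(np.std(samples))

energy, spread = vmc_energy(alpha=0.8)
```

For α = 1 the trial function is the exact ground state. Every local energy then equals -0.5 hartree, and the spread vanishes. For other values of α, the estimate lies above the exact value and approaches the analytic expectation α²/2 - α. DL-VMC methods minimise this same energy, but over the parameters of a neural network rather than a single α.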

DeepQMC (🥇18 · ⭐ 380 · 💤) - Deep learning quantum Monte Carlo for electrons in real space. MIT - [GitHub](https://github.com/deepqmc/deepqmc) (👨‍💻 13 · 🔀 63 · 📦 3 · 📋 52 - 5% open · ⏱️ 23.10.2024):
git clone https://github.com/deepqmc/deepqmc
- [PyPi](https://pypi.org/project/deepqmc) (📥 210 / month · ⏱️ 24.09.2024):
pip install deepqmc
FermiNet (🥈16 · ⭐ 770) - An implementation of the Fermionic Neural Network for ab-initio electronic structure calculations. Apache-2 transformer - [GitHub](https://github.com/google-deepmind/ferminet) (👨‍💻 21 · 🔀 150 · 📋 67 - 4% open · ⏱️ 17.03.2025):
git clone https://github.com/google-deepmind/ferminet
DeepErwin (🥈8 · ⭐ 54) - DeepErwin is a python 3.8+ package that implements and optimizes JAX 2.x wave function models for numerical solutions.. Custom - [GitHub](https://github.com/mdsunivie/deeperwin) (👨‍💻 9 · 🔀 8 · 📥 15 · 📦 2 · ⏱️ 18.04.2025):
git clone https://github.com/mdsunivie/deeperwin
- [PyPi](https://pypi.org/project/deeperwin) (📥 69 / month · ⏱️ 14.12.2021):
pip install deeperwin
JaQMC (🥉6 · ⭐ 75) - JAX accelerated Quantum Monte Carlo. Apache-2 - [GitHub](https://github.com/bytedance/jaqmc) (👨‍💻 2 · 🔀 8 · ⏱️ 10.03.2025):
git clone https://github.com/bytedance/jaqmc
LapNet (🥉5 · ⭐ 62) - Efficient and Accurate Neural-Network Ansatz for Quantum Monte Carlo. Apache-2 - [GitHub](https://github.com/bytedance/LapNet) (👨‍💻 4 · 🔀 12 · ⏱️ 04.12.2024):
git clone https://github.com/bytedance/LapNet
2 hidden projects:
- SchNOrb (🥉6 · ⭐ 63 · 💀) - Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. MIT
- ACEpsi.jl (🥉6 · ⭐ 2 · 💀) - ACE wave function parameterizations. MIT rep-eng Julia


Others

Show 1 hidden projects...

Contribution

Contributions are encouraged and always welcome! If you would like to add or update projects, choose one of the following ways:

  • Open an issue by selecting one of the provided categories from the issue page and fill in the requested information.
  • Modify the projects.yaml with your additions or changes, and submit a pull request. This can also be done directly via the GitHub UI.

If you would like to contribute to or share suggestions regarding the project metadata collection or markdown generation, please refer to the best-of-generator repository. If you would like to create your own best-of list, we recommend following this guide.

For more information on how to add or update projects, please read the contribution guidelines. By participating in this project, you agree to abide by its Code of Conduct.

License

CC0

Source

BibTeX Generator

Have you ever found yourself weary and uninspired from the tedious task of manually creating BibTeX entries for your paper?

There are, indeed, support tools and plugins that are bundled with reference managers such as Zotero, Mendeley, etc. These tools can automate the generation of a .bib file. To use them, you need to install a reference manager, its associated plugins, and a library of papers on your computer. However, these tools are not flawless. The BibTeX entries they generate often contain incomplete information, are poorly formatted, and include numerous unnecessary fields. You then still need to manually check and correct the entries.

There are times when you just need to cite a paper or two and don't want to go through the hassle of the complex process described above. In such situations, a simple tool that lets you quickly copy and paste a BibTeX entry into your .bib file would be ideal. With such a tool in mind, I looked around the Chrome Web Store for an extension that can pick up the BibTeX entry while you are browsing a paper. I found some, but they did not really work.

Therefore, I decided to create my own tool to address this dilemma. I developed a Chrome extension, named 1click BibTeX, that generates the BibTeX entry for the URL you are browsing with just one click. It does exactly what it is expected to do and has proven to be quite helpful. Together with LaTeX tools, this extension helps ensure that a manuscript's citations are properly formatted before it is submitted to the journal.

Usage

Install the 1click BibTeX extension in your Chrome browser. Then, whenever you are browsing a paper or any other URL, click the extension icon, and the BibTeX entry is instantly generated and copied to your clipboard. All that remains is to paste it into your .bib file.
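The extension runs in the browser, and its source is not reproduced in this post. As a rough illustration of the formatting step only, the Python sketch below turns a metadata dictionary into an entry shaped like the examples later in this post. The key rule (first-author surname, then year, then first title word, all lowercase) is inferred from those examples. The function names `citation_key` and `to_bibtex` are hypothetical, and the sketch does no metadata scraping.

```python
import re

def citation_key(authors, year, title):
    """Hypothetical key rule inferred from the examples: <surname><year><first title word>."""
    surname = authors[0].split()[-1]
    first_word = re.findall(r"[A-Za-z]+", title)[0]
    return f"{surname}{year}{first_word}".lower()

def to_bibtex(entry_type, meta):
    """Render a metadata dict as a BibTeX entry, skipping empty fields."""
    key = citation_key(meta["author"], meta["year"], meta["title"])
    fields = dict(meta, author=" and ".join(meta["author"]))
    body = ",\n".join(f"    {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@{entry_type}{{{key},\n{body}\n}}"

meta = {
    "title": "Pattern transformation induced by elastic instability of metallic porous structures",
    "author": ["Cao Thang Nguyen", "Duc Tam Ho"],
    "year": 2019,
    "journal": "Computational Materials Science",
    "volume": "157",
    "pages": "17-24",
    "note": "",                                  # empty fields are dropped
}
entry = to_bibtex("article", meta)
```

Printing `entry` gives an `@article{nguyen2019pattern, ...}` block in the same layout as the examples. A real tool would also need to read this metadata from the page and escape special characters such as `&`.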

BibTeX generator

I've tested the extension on numerous publishers and websites with varying structures, and it consistently works as designed. The tested publishers include Elsevier, Wiley, ACS, IOP, AIP, APS, arXiv, and others.

Below are some examples of BibTeX entries generated by the extension 1click BibTeX:

@article{nguyen2019pattern,
    title = {Pattern transformation induced by elastic instability of metallic porous structures},
    author = {Cao Thang Nguyen and Duc Tam Ho and Seung Tae Choi and Doo-Man Chun and Sung Youb Kim},
    year = {2019},
    month = {2},
    journal = {Computational Materials Science},
    publisher = {Elsevier},
    volume = {157},
    pages = {17-24},
    doi = {10.1016/j.commatsci.2018.10.023},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S0927025618306955?via%3Dihub},
    accessDate = {Jan 25, 2024}
}
@article{nguyen2024an,
    title = {An Enhanced Sampling Approach for Computing the Free Energy of Solid Surface and Solid–Liquid Interface},
    author = {Cao Thang Nguyen and Duc Tam Ho and Sung Youb Kim},
    year = {2024},
    month = {1},
    journal = {Advanced Theory and Simulations},
    publisher = {John Wiley & Sons, Ltd},
    volume = {7},
    number = {1},
    pages = {2300538},
    doi = {10.1002/adts.202300538},
    url = {https://onlinelibrary.wiley.com/doi/10.1002/adts.202300538},
    accessDate = {Jan 25, 2024}
}
@book{daum2003america,
    title = {America, the Vietnam War, and the World},
    author = {Andreas W. Daum and Lloyd C. Gardner and Wilfried Mausbach},
    year = {2003},
    month = {7},
    publisher = {Cambridge University Press},
    isbn = {052100876X},
    url = {https://www.google.co.kr/books/edition/America_the_Vietnam_War_and_the_World/9kn6qYwsGs4C?hl=en&gbpv=0},
    accessDate = {Jan 25, 2024}
}
@book{rickards2011currency,
    title = {Currency Wars},
    author = {James Rickards},
    year = {2011},
    month = {11},
    publisher = {Penguin},
    isbn = {110155889X},
    url = {https://books.google.co.kr/books?id=-GDwL2s5sJoC&source=gbs_book_other_versions},
    accessDate = {Jan 25, 2024}
}
@misc{deci2024introducing,
    title = {Introducing DeciCoder-6B: The Best Multi-Language Code LLM in Its Class},
    author = {Deci},
    year = {2024},
    month = {1},
    publisher = {Deci},
    url = {https://deci.ai/blog/decicoder-6b-the-best-multi-language-code-generation-llm-in-its-class/},
    accessDate = {Jan 25, 2024}
}
@misc{kai2023forcefield,
    title = {Force-field files for "Noble gas (He, Ne and Ar) solubilities in high-pressure silicate melts calculated based on deep potential modeling"},
    author = {Wang, Kai and Lu, Xiancai and Liu, Xiandong and Yin, Kun},
    year = {2023},
    month = {3},
    publisher = {Zenodo},
    doi = {10.5281/zenodo.7751762},
    url = {https://zenodo.org/records/7751762},
    accessDate = {Jan 25, 2024}
}
@misc{nguyen2024bibtex,
    title = {BibTeX Generator},
    author = {Cao Thang Nguyen},
    year = {2024},
    month = {1},
    url = {https://thangckt.github.io/blog/2024/01/25/bibtex_generator},
    accessDate = {Jan 25, 2024}
}

In summary, the new 1click BibTeX extension works well for most websites, even though they structure their data in different ways.

Accelerated Molecular Simulation Using Deep Potential Workflow with NGC

Credit: NVIDIA's blog

Molecular simulation communities have faced the accuracy-versus-efficiency dilemma in modeling the potential energy surface and interatomic forces for decades. Deep Potential, an artificial neural network force field, addresses this problem by combining the speed of classical molecular dynamics (MD) simulation with the accuracy of density functional theory (DFT) calculations [1]. This is achieved with the GPU-optimized package DeePMD-kit, a deep learning package for many-body potential energy representation and MD simulation [2].

This post provides an end-to-end demonstration of training a neural network potential for the 2D material graphene and using it to drive MD simulation in the open-source Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) [3]. Training data can be obtained from either the Vienna Ab initio Simulation Package (VASP) [4] or Quantum ESPRESSO (QE) [5].

This post demonstrates a seamless integration of molecular modeling, machine learning, and high-performance computing (HPC). The workflow combines the efficiency of molecular dynamics with ab initio accuracy, and it is driven entirely through containers. By using AI techniques to fit the interatomic forces generated by DFT, the accessible time and size scales can be extended by several orders of magnitude, with computational cost that scales linearly with system size.

Deep potential is essentially a combination of machine learning and physical principles, which together start a new computing paradigm, as shown in Figure 1.

Figure 1. A new computing paradigm composed of molecular modeling, AI, and HPC. (Figure courtesy: Dr. Linfeng Zhang, DP Technology)

The entire workflow is shown in Figure 2. The data generation step is done with either VASP or QE. The data preparation, model training, testing, and compression steps are done with DeePMD-kit, and the model is deployed in LAMMPS.

Figure 2. Diagram of the DeePMD workflow: data generation, data preparation, model training, model testing, model compression, and model deployment.

Why Containers?

A container is a portable unit of software that combines the application, and all its dependencies, into a single package that is agnostic to the underlying host OS.

The workflow in this post involves ab initio molecular dynamics (AIMD), DP training, and LAMMPS MD simulation. Installing each software package from source, with the correct setup of the compiler, MPI, GPU library, and optimization flags, is nontrivial and time-consuming.

Containers solve this problem by providing a highly optimized, GPU-enabled computing environment for each step, eliminating the time needed to install and test software.

The NGC catalog, a hub of GPU-optimized HPC and AI software, carries a wide range of HPC and AI containers that can be readily deployed on any GPU system. The HPC and AI containers from the NGC catalog are updated frequently and tested for reliability and performance, which helps shorten the time to solution.

These containers are also scanned for Common Vulnerabilities and Exposures (CVEs), ensuring that they are free of open ports and malware. Additionally, the HPC containers support both Docker and Singularity runtimes and can be deployed on multi-GPU and multinode systems running in the cloud or on-premises.

Training data generation

The first step in the workflow is data generation. We show how to use VASP and Quantum ESPRESSO to run AIMD simulations and generate training datasets for DeePMD. All input files can be downloaded from the GitHub repository with the following command:

git clone https://github.com/deepmodeling/SC21_DP_Tutorial.git

VASP

A two-dimensional graphene system with 98 atoms is used, as shown in Figure 3 [6]. To generate the training datasets, a 0.5 ps NVT AIMD simulation at 300 K is performed with a time step of 0.5 fs. The DP model is created from the resulting 1,000 time steps of this 0.5 ps trajectory at fixed temperature.

Because of the short simulation time, the training dataset contains consecutive system snapshots, which are highly correlated. In general, the training dataset should be sampled from uncorrelated snapshots that cover various system conditions and configurations. For this example, we used a simplified training data scheme. For production DP training, DP-GEN is recommended because its concurrent learning scheme explores more combinations of conditions efficiently [7].
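The following sketch makes the point about correlated snapshots concrete. It is not part of the tutorial. It estimates how many steps apart two frames must be before a scalar observable (for example, the per-frame energy) stops being strongly correlated, then keeps only every stride-th frame. The function names, the 0.1 threshold, and the sine-wave stand-in data are illustrative assumptions. For production data, DP-GEN remains the recommended route.

```python
import math
from typing import Sequence


def decorrelation_stride(series: Sequence[float], threshold: float = 0.1) -> int:
    """First lag (>= 1) at which the normalized autocorrelation drops below threshold."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("series has zero variance")
    for lag in range(1, n):
        # Normalized autocorrelation C(lag), computed on demand so we can stop early
        c = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if c < threshold:
            return lag
    return n  # never decorrelates within this trajectory


def subsample(frames: Sequence, stride: int) -> list:
    """Keep every `stride`-th frame, starting from the first."""
    return list(frames[::stride])


# Toy stand-in for per-frame energies from an AIMD run (hypothetical data).
energies = [math.sin(0.1 * i) for i in range(1000)]
stride = decorrelation_stride(energies)
kept = subsample(list(range(len(energies))), stride)
print(stride, len(kept))
```

The threshold is a judgment call. A stricter value yields fewer, less correlated frames, which is one reason short single-temperature trajectories make poor production datasets.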

Projector-augmented wave (PAW) pseudopotentials are employed to describe the interactions between the valence electrons and the frozen cores. The generalized gradient approximation (GGA) exchange–correlation functional of Perdew–Burke–Ernzerhof (PBE) is used. Only the Γ-point is used for k-space sampling in all systems.

Figure 3. Top view of the single-layer graphene system of 98 carbon atoms used in the AIMD simulation.

Quantum ESPRESSO

The AIMD simulation can also be carried out using Quantum ESPRESSO, available as a container from the NGC Catalog. Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale based on density-functional theory, plane waves, and pseudopotentials. The same graphene structure is used in the QE calculations. The following command can be used to start the AIMD simulation:

$ singularity exec --nv docker://nvcr.io/hpc/quantum_espresso:qe-6.8 cp.x < c.md98.cp.in

Training data preparation

Once the training data is obtained from AIMD simulation, we want to convert its format using dpdata so that it can be used as input to the deep neural network. The dpdata package is a format conversion toolkit between AIMD, classical MD, and DeePMD-kit.

dpdata can convert data directly from the output of first-principles packages to the DeePMD-kit format. For deep potential training, the following information about a physical system must be provided: atom types, box boundaries, coordinates, forces, virials, and system energies.

A snapshot, or frame, of the system contains all these data for all atoms at one time step. Snapshots can be stored in two formats: raw and npy.

In the raw format, the data are stored as plain text, and each line of a file represents one snapshot. Different kinds of system information are stored in different files named box.raw, coord.raw, force.raw, energy.raw, and virial.raw. We recommend following these naming conventions when preparing the training files.

An example of force.raw:

$ cat force.raw
-0.724  2.039 -0.951  0.841 -0.464  0.363
 6.737  1.554 -5.587 -2.803  0.062  2.222
-1.968 -0.163  1.020 -0.225 -0.789  0.343

This force.raw contains three frames, with each frame having the forces of two atoms, resulting in three lines and six columns. Each line provides all three force components of two atoms in one frame. The first three numbers are the three force components of the first atom, while the next three numbers are the force components of the second atom.
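The raw layout described above can be read back with a few lines of Python. This is a minimal sketch built around a hypothetical helper, parse_raw_vectors, written for this post rather than taken from DeePMD-kit or dpdata. It returns one list of (x, y, z) tuples per frame and rejects lines whose length is not a multiple of three.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def parse_raw_vectors(text: str) -> List[List[Vec3]]:
    """Parse per-atom 3-vectors from a raw file such as force.raw or coord.raw.

    Each line is one frame; each consecutive group of three numbers is the
    x, y, z components for one atom.
    """
    frames: List[List[Vec3]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        values = [float(tok) for tok in line.split()]
        if not values:
            continue  # skip blank lines
        if len(values) % 3 != 0:
            raise ValueError(f"line {lineno}: {len(values)} values is not a multiple of 3")
        atoms = [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]
        if frames and len(atoms) != len(frames[0]):
            raise ValueError(f"line {lineno}: atom count differs from the first frame")
        frames.append(atoms)
    return frames


force_raw = """\
-0.724  2.039 -0.951  0.841 -0.464  0.363
 6.737  1.554 -5.587 -2.803  0.062  2.222
-1.968 -0.163  1.020 -0.225 -0.789  0.343
"""
frames = parse_raw_vectors(force_raw)
print(len(frames), len(frames[0]))  # 3 2
print(frames[1][0])                 # (6.737, 1.554, -5.587)
```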

The coordinate file coord.raw is organized similarly. In box.raw, each line provides the nine components of the box vectors. In virial.raw, each line provides the nine components of the virial tensor in the order XX XY XZ YX YY YZ ZX ZY ZZ. All raw files must have the same number of lines. The atom types are assumed not to change across frames. They are given in type.raw, which has a single line listing the type of each atom in order.

The atom types should be integers. For example, type.raw for a system with two atoms of types 0 and 1 is:

$ cat type.raw
0 1
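Every raw file must have the same number of lines, and the column counts follow from the number of atoms in type.raw, so a quick consistency check is easy to write. The sketch below uses a hypothetical function, check_raw_system, which is not a DeePMD-kit tool. It assumes that coord.raw is the only required data file and checks the others only if they are present. The toy directory at the end contains made-up data.

```python
import os
import tempfile
from typing import Dict


def check_raw_system(path: str) -> Dict[str, int]:
    """Check a raw system directory; return the number of frames per raw file."""
    with open(os.path.join(path, "type.raw")) as fh:
        types = [int(tok) for tok in fh.read().split()]  # integer atom types
    natoms = len(types)
    expected_cols = {
        "coord.raw": 3 * natoms,   # x y z per atom
        "force.raw": 3 * natoms,   # fx fy fz per atom
        "box.raw": 9,              # three box vectors
        "virial.raw": 9,           # XX XY XZ YX YY YZ ZX ZY ZZ
        "energy.raw": 1,           # one energy per frame
    }
    nframes: Dict[str, int] = {}
    for name, ncols in expected_cols.items():
        fname = os.path.join(path, name)
        if not os.path.exists(fname):
            if name == "coord.raw":  # assumption of this sketch: coordinates are required
                raise FileNotFoundError(fname)
            continue  # e.g. box.raw is absent for nopbc systems
        with open(fname) as fh:
            rows = [line.split() for line in fh if line.strip()]
        for i, row in enumerate(rows, start=1):
            if len(row) != ncols:
                raise ValueError(f"{name} line {i}: expected {ncols} values, got {len(row)}")
        nframes[name] = len(rows)
    if len(set(nframes.values())) > 1:
        raise ValueError(f"raw files disagree on the number of frames: {nframes}")
    return nframes


# Toy two-atom system with three frames (made-up numbers).
with tempfile.TemporaryDirectory() as system_dir:
    demo = {
        "type.raw": "0 1\n",
        "coord.raw": "0.0 0.0 0.0 1.42 0.0 0.0\n" * 3,
        "energy.raw": "-18.1\n-18.2\n-18.0\n",
    }
    for name, text in demo.items():
        with open(os.path.join(system_dir, name), "w") as fh:
            fh.write(text)
    result = check_raw_system(system_dir)
print(result)  # {'coord.raw': 3, 'energy.raw': 3}
```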

Converting the data to the raw format is not required, but the process gives a sense of the types of data DeePMD-kit uses as training inputs.

The easiest way to convert the first-principles results into training data is to save them as NumPy binary (npy) data.

For VASP output, we have prepared an outcartodata.py script that processes the VASP OUTCAR file. Run the following commands:

$ cd SC21_DP_Tutorial/AIMD/VASP/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python outcartodata.py
$ mv deepmd_data ../../DP/

For QE output:

$ cd SC21_DP_Tutorial/AIMD/QE/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python logtodata.py
$ mv deepmd_data ../../DP/

A folder called deepmd_data is generated and moved to the training directory. It contains five sets, 0/set.000, 1/set.000, 2/set.000, 3/set.000, and 4/set.000, each with 200 frames. You do not need to handle the binary data files inside the set.* directories directly. The path containing the set.* folder and the type.raw file is called a system. To train a nonperiodic system, place an empty nopbc file in the system directory. In that case, box.raw is not necessary.

We use three of the five sets for training, one for validation, and the remaining one for testing.
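The numbers above fit together: 0.5 ps at a 0.5 fs time step gives 1,000 frames, or 200 frames in each of five sets. The sketch below spells out that arithmetic and one possible split. The helper names are hypothetical. The choice of deepmd_data/0 to 2 for training and deepmd_data/3 for validation is an assumption; the tutorial only fixes deepmd_data/4 as the test system, through the dp test command shown later.

```python
from typing import List, Tuple


def frames_per_system(total_ps: float, timestep_fs: float, n_systems: int) -> int:
    """Frames per system when a trajectory (one frame per step) is split evenly."""
    n_frames = round(total_ps * 1000.0 / timestep_fs)  # 1 ps = 1000 fs
    if n_frames % n_systems != 0:
        raise ValueError(f"{n_frames} frames do not split evenly into {n_systems} systems")
    return n_frames // n_systems


def split_systems(root: str, n_systems: int, n_train: int, n_valid: int
                  ) -> Tuple[List[str], List[str], List[str]]:
    """Assign numbered system directories to training, validation, and test."""
    systems = [f"{root}/{i}" for i in range(n_systems)]
    return (systems[:n_train],
            systems[n_train:n_train + n_valid],
            systems[n_train + n_valid:])


print(frames_per_system(0.5, 0.5, 5))  # 200
train, valid, test_set = split_systems("deepmd_data", 5, 3, 1)
print(train, valid, test_set)
# ['deepmd_data/0', 'deepmd_data/1', 'deepmd_data/2'] ['deepmd_data/3'] ['deepmd_data/4']
```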

Deep Potential model training

The input of the deep potential model is a descriptor vector built from the system information mentioned previously. The neural network contains several hidden layers composed of linear and nonlinear transformations. In this post, a three-layer neural network with 25, 50, and 100 neurons in its respective layers is used. The network outputs atomic energies, which are summed to give the total energy of the system. The model is fitted to the DFT energies and forces (and, optionally, virials) in the training data. The training process optimizes the weights and bias vectors by minimizing a loss function built from these errors.
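The relationship between atomic energies, total energy, and the loss can be sketched in a few lines. This is a simplified illustration, not DeePMD-kit's implementation. It sums atomic energies into a total energy and forms a weighted sum of the squared energy error (per atom) and the squared force error (per force component), in the spirit of the loss described in the DeePMD-kit paper [2]. The prefactor interpolation mirrors the idea behind the start_pref_* and limit_pref_* keys of the loss section. The normalization conventions, function names, and numbers used here are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def total_energy(atomic_energies: Sequence[float]) -> float:
    """The DP total energy is the sum of per-atom energy contributions."""
    return sum(atomic_energies)


def prefactor(start: float, limit: float, lr: float, start_lr: float) -> float:
    """Move a loss weight from `start` toward `limit` as the learning rate decays."""
    return limit + (start - limit) * lr / start_lr


def energy_force_loss(e_pred: float, e_ref: float,
                      f_pred: Sequence[Vec3], f_ref: Sequence[Vec3],
                      pref_e: float, pref_f: float) -> float:
    """Weighted squared errors: energy term per atom, force term per component."""
    natoms = len(f_ref)
    de = e_pred - e_ref
    energy_term = pref_e * de * de / natoms
    force_sq = sum((p - r) ** 2 for fp, fr in zip(f_pred, f_ref) for p, r in zip(fp, fr))
    force_term = pref_f * force_sq / (3 * natoms)
    return energy_term + force_term


# Illustrative weights: forces dominate early, energies gain weight late.
start_lr = 1.0e-3
for lr in (1.0e-3, 1.0e-8):
    pe = prefactor(0.02, 1.0, lr, start_lr)
    pf = prefactor(1000.0, 1.0, lr, start_lr)
    print(f"lr={lr:.0e}: pref_e={pe:.3f}, pref_f={pf:.3f}")
```

Shifting the weights with the learning rate lets the many force labels (three per atom) shape the fit early. The single energy label per frame then gains influence as training converges.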

Training is started with the following command, where input.json contains the training parameters:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp train input.json

DeePMD-kit prints detailed information about the training and validation datasets, which are defined by training_data and validation_data in the training section of the input script. The training dataset is composed of three data systems, and the validation dataset of one. The number of atoms, batch size, number of batches, and probability of using each system are shown in Figure 4. The last column indicates whether periodic boundary conditions are assumed for the system.

Figure 4. Screenshot of the DP training output, summarizing the training and validation datasets (number of atoms, batch size, number of batches, and probability of using each system).
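For orientation, the data-related keys mentioned here can be assembled into a JSON fragment. This partial sketch is not the tutorial's input.json, which lives in the repository cloned earlier. A complete file also needs model, learning_rate, and loss sections. The batch sizes, numb_steps, and disp_freq values below are illustrative assumptions. So is the assignment of deepmd_data/0 to 2 for training and deepmd_data/3 for validation.

```python
import json
from typing import Dict, List


def training_section(train: List[str], valid: List[str],
                     numb_steps: int = 1_000_000, disp_freq: int = 100) -> Dict:
    """Build the data-related part of the "training" section of input.json."""
    return {
        "training": {
            "training_data": {"systems": train, "batch_size": "auto"},
            "validation_data": {"systems": valid, "batch_size": 1, "numb_btch": 1},
            "numb_steps": numb_steps,
            "disp_file": "lcurve.out",   # where training/validation errors are written
            "disp_freq": disp_freq,      # evaluate errors every disp_freq steps
        }
    }


section = training_section(
    ["deepmd_data/0", "deepmd_data/1", "deepmd_data/2"],
    ["deepmd_data/3"],
)
print(json.dumps(section, indent=2))
```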

During training, the model error is evaluated every disp_freq training steps. It is measured on the batch used to train the model and on numb_btch batches from the validation data. The training and validation errors are written to the file disp_file (default: lcurve.out). The batch size is set with the key batch_size in the corresponding sections for the training and validation datasets.

An example of the output:

#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      3.33e+01    3.41e+01      1.03e+01    1.03e+01      8.39e-01    8.72e-01    1.0e-03
    100      2.57e+01    2.56e+01      1.87e+00    1.88e+00      8.03e-01    8.02e-01    1.0e-03
    200      2.45e+01    2.56e+01      2.26e-01    2.21e-01      7.73e-01    8.10e-01    1.0e-03
    300      1.62e+01    1.66e+01      5.01e-02    4.46e-02      5.11e-01    5.26e-01    1.0e-03
    400      1.36e+01    1.32e+01      1.07e-02    2.07e-03      4.29e-01    4.19e-01    1.0e-03
    500      1.07e+01    1.05e+01      2.45e-03    4.11e-03      3.38e-01    3.31e-01    1.0e-03
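lcurve.out is whitespace-separated text with a commented header, so it is easy to post-process. The sketch below uses hypothetical helper names. It parses the sample above into columns and reports the step with the lowest validation force error. The exact set of columns depends on the loss settings (virial columns may appear, for example), so the parser reads column names from the header instead of hard-coding positions.

```python
from typing import Dict, List


def read_lcurve(text: str) -> Dict[str, List[float]]:
    """Parse lcurve.out text into {column_name: values}, using the '#' header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    columns: Dict[str, List[float]] = {name: [] for name in header}
    for ln in lines[1:]:
        if ln.lstrip().startswith("#"):
            continue  # skip any repeated header lines
        for name, tok in zip(header, ln.split()):
            columns[name].append(float(tok))
    return columns


def best_step(columns: Dict[str, List[float]], key: str = "rmse_f_val") -> int:
    """Training step with the lowest value of `key`."""
    values = columns[key]
    i = min(range(len(values)), key=values.__getitem__)
    return int(columns["step"][i])


lcurve = """\
#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      3.33e+01    3.41e+01      1.03e+01    1.03e+01      8.39e-01    8.72e-01    1.0e-03
    100      2.57e+01    2.56e+01      1.87e+00    1.88e+00      8.03e-01    8.02e-01    1.0e-03
    200      2.45e+01    2.56e+01      2.26e-01    2.21e-01      7.73e-01    8.10e-01    1.0e-03
    300      1.62e+01    1.66e+01      5.01e-02    4.46e-02      5.11e-01    5.26e-01    1.0e-03
    400      1.36e+01    1.32e+01      1.07e-02    2.07e-03      4.29e-01    4.19e-01    1.0e-03
    500      1.07e+01    1.05e+01      2.45e-03    4.11e-03      3.38e-01    3.31e-01    1.0e-03
"""
cols = read_lcurve(lcurve)
print(best_step(cols))  # 500
```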

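The training error decreases steadily with training steps, as shown in Figure 5. The trained model is then evaluated on the test dataset and compared with the AIMD simulation results. Testing requires a frozen model. frozen_model.pb is the default output name of dp freeze, which is described under Model export and compression below. The test command is: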

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp test -m frozen_model.pb -s deepmd_data/4/ -n 200 -d detail.out

Figure 5. Training loss with steps: total loss, energy loss, force loss, and learning-rate decay over training steps 0 to 1,000,000. Both the training and validation losses decrease monotonically.

The results are shown in Figure 6.

Figure 6. Test of the prediction accuracy of the trained DP model against AIMD energies and forces. Predicted values (y-axis) are plotted against the AIMD ground truth (x-axis). The data lie along the diagonal.
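dp test reports summary errors itself. The sketch below only shows what those metrics compute. Given paired reference (AIMD) and predicted (DP) values, RMSE and MAE follow directly, and energies are often compared per atom. This sketch makes no assumption about the layout of the files written with -d. Force components are passed as one flat list of all x, y, and z components. The helper names are hypothetical, and the numbers in the usage lines are toy values.

```python
import math
from typing import List, Sequence


def rmse(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between reference and predicted values."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(ref, pred)) / len(ref))


def mae(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between reference and predicted values."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for r, p in zip(ref, pred)) / len(ref)


def per_atom(values: Sequence[float], natoms: int) -> List[float]:
    """Normalize total energies by the number of atoms (e.g. eV -> eV/atom)."""
    return [v / natoms for v in values]


print(rmse([0.0, 0.0], [3.0, 4.0]))
print(mae([0.0, 0.0], [3.0, 4.0]))  # 3.5
```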

Model export and compression

After the model has been trained, a frozen model is generated for inference in MD simulation. The process of saving the neural network from a checkpoint is called “freezing” the model:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp freeze -o graphene.pb

After the frozen model is generated, it can be compressed without sacrificing accuracy, which greatly speeds up inference in MD. Depending on the simulation and training setup, model compression can boost performance by 10X and reduce memory consumption by 20X when running on GPUs.

The frozen model can be compressed using the following command where -i refers to the frozen model and -o points to the output name of the compressed model:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp compress -i graphene.pb -o graphene-compress.pb

Model deployment in LAMMPS

A new pair style, deepmd, has been implemented in LAMMPS to deploy the neural network trained in the previous steps. For users familiar with the LAMMPS workflow, only minimal changes are needed to switch to deep potential. For instance, a traditional LAMMPS input with the Tersoff potential has the following potential setup:

pair_style      tersoff
pair_coeff      * * BNC.tersoff C

To use deep potential, replace the previous lines with:

pair_style      deepmd graphene-compress.pb
pair_coeff      * *

The pair_style command in the input file uses the DeePMD model to describe the atomic interactions in the graphene system.

The graphene-compress.pb file is the frozen and compressed model used for inference. The graphene system in the MD simulation contains 1,560 atoms. Periodic boundary conditions are applied in the lateral x- and y-directions, and a free boundary is applied in the z-direction. The time step is set to 1 fs. The system is relaxed in the NVT ensemble at 300 K, consistent with the AIMD setup. The system configuration after NVT relaxation is shown in Figure 7. The deep potential captures the atomic structure, including small ripples in the cross-plane direction. After 10 ps of NVT relaxation, the system is switched to the NVE ensemble to check its stability.

Figure 7. Side view of the atomic configuration of the graphene system after thermal relaxation with deep potential in LAMMPS.
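The MD protocol described above can be written out as a LAMMPS input: a 1 fs time step, periodic x and y, free z, and 10 ps of NVT at 300 K followed by NVE (10 ps, per Figure 8). The Python sketch below generates one such input; it is not the tutorial's input file. The data file name graphene.data, the velocity seed, the 0.1 ps thermostat damping, the thermo interval, and the explicit carbon mass are all assumptions. Only the pair_style and pair_coeff lines are taken from this post.

```python
def lammps_input(model: str = "graphene-compress.pb",
                 data_file: str = "graphene.data",   # hypothetical data file
                 temperature: float = 300.0,
                 timestep_fs: float = 1.0,
                 nvt_ps: float = 10.0,
                 nve_ps: float = 10.0,
                 seed: int = 12345) -> str:
    """Write a LAMMPS input: NVT relaxation followed by an NVE stability check."""
    nvt_steps = round(nvt_ps * 1000.0 / timestep_fs)  # 1 ps = 1000 fs
    nve_steps = round(nve_ps * 1000.0 / timestep_fs)
    lines = [
        "units           metal",     # time in ps, energy in eV, distance in Angstrom
        "boundary        p p f",     # periodic in x and y, non-periodic fixed in z
        "atom_style      atomic",
        f"read_data       {data_file}",
        "mass            1 12.011",  # carbon
        f"pair_style      deepmd {model}",
        "pair_coeff      * *",
        f"timestep        {timestep_fs / 1000.0}",  # metal units use ps
        f"velocity        all create {temperature} {seed}",
        "thermo          100",
        f"fix             1 all nvt temp {temperature} {temperature} 0.1",
        f"run             {nvt_steps}",
        "unfix           1",
        "fix             2 all nve",
        f"run             {nve_steps}",
    ]
    return "\n".join(lines) + "\n"


print(lammps_input())
```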

The system temperature is shown in Figure 8.

Figure 8. System temperature under the NVT (0 to 10 ps) and NVE (10 to 20 ps) ensembles. The MD system driven by deep potential is very stable after relaxation.
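System temperature in MD is usually the instantaneous kinetic temperature from equipartition, T = 2 KE / (N_dof k_B). The sketch below computes it in LAMMPS metal units. It uses N_dof = 3N - 3 to exclude center-of-mass motion, which mirrors LAMMPS's default temperature compute. The function names are illustrative. The mass-velocity conversion constant is the metal-units value LAMMPS uses, and the toy velocities are made up.

```python
from typing import Sequence, Tuple

K_B = 8.617333262e-5          # Boltzmann constant in eV/K
MVV2E_METAL = 1.0364269e-4    # (g/mol)(Angstrom/ps)^2 -> eV in LAMMPS metal units


def kinetic_energy(masses: Sequence[float],
                   velocities: Sequence[Tuple[float, float, float]]) -> float:
    """Total kinetic energy in eV from masses (g/mol) and velocities (Angstrom/ps)."""
    return 0.5 * MVV2E_METAL * sum(
        m * (vx * vx + vy * vy + vz * vz) for m, (vx, vy, vz) in zip(masses, velocities))


def temperature(masses: Sequence[float],
                velocities: Sequence[Tuple[float, float, float]],
                remove_com_dof: bool = True) -> float:
    """Equipartition temperature T = 2 KE / (N_dof k_B)."""
    dof = 3 * len(masses) - (3 if remove_com_dof else 0)
    return 2.0 * kinetic_energy(masses, velocities) / (dof * K_B)


# Toy example: three carbon atoms with made-up velocities (Angstrom/ps).
masses = [12.011] * 3
velocities = [(5.0, -2.0, 0.5), (-3.0, 1.0, -0.5), (-2.0, 1.0, 0.0)]
print(f"T = {temperature(masses, velocities):.1f} K")
```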

To validate the accuracy of the trained DP model, the radial distribution functions (RDFs) calculated with AIMD, DP, and Tersoff are plotted in Figure 9. The RDF from the DP model is very close to that of AIMD, which indicates that the DP model represents the crystalline structure of graphene well.

Figure 9. Radial distribution functions calculated with DP (black), the Tersoff potential (red), and AIMD (blue). The RDF calculated with DP is very close to that of AIMD.
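The post does not describe how its RDFs were computed. As one possible approach, the sketch below histograms pair distances using the minimum-image convention in x and y only, since z is free. It normalizes by an ideal two-dimensional gas at the sheet's areal density, an assumption that treats the slightly rippled sheet as flat. The function name is hypothetical. The pure-Python O(N^2) loop is adequate for a few thousand atoms, and the square-lattice check at the end is a toy input, not graphene.

```python
import math
from typing import List, Sequence, Tuple


def rdf_sheet(positions: Sequence[Tuple[float, float, float]],
              lx: float, ly: float, r_max: float, n_bins: int
              ) -> Tuple[List[float], List[float]]:
    """g(r) for a sheet periodic in x and y (minimum image) and free in z."""
    if r_max > min(lx, ly) / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            dx = xj - xi
            dx -= lx * round(dx / lx)   # minimum image in x
            dy = yj - yi
            dy -= ly * round(dy / ly)   # minimum image in y
            dz = zj - zi                # no wrapping: z is not periodic
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 2  # count the pair for both atoms
    density = n / (lx * ly)  # atoms per unit area of the sheet
    centers, g = [], []
    for k in range(n_bins):
        r_in, r_out = k * dr, (k + 1) * dr
        ideal = n * density * math.pi * (r_out ** 2 - r_in ** 2)  # ideal 2D gas count
        centers.append(0.5 * (r_in + r_out))
        g.append(counts[k] / ideal)
    return centers, g


# Toy check on a 10 x 10 square lattice with unit spacing (not graphene).
lattice = [(float(i), float(j), 0.0) for i in range(10) for j in range(10)]
centers, g = rdf_sheet(lattice, 10.0, 10.0, 3.0, 30)
peak = centers[g.index(max(g))]
print(f"first peak near r = {peak:.2f}")
```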

Conclusion

This post demonstrates a simple case study of graphene under given conditions. The DeePMD-kit package streamlines the workflow from AIMD to classical MD with deep potential, providing the following key advantages:

  • Highly automatic and efficient workflow implemented in the TensorFlow framework.
  • APIs for popular DFT and MD packages such as VASP, QE, and LAMMPS.
  • Broad applications in organic molecules, metals, semiconductors, insulators, and more.
  • Highly efficient code for HPC with MPI and GPU support.
  • Modularization for easy adoption by other deep learning potential models.

Furthermore, the GPU-optimized containers from the NGC catalog simplify and accelerate the overall workflow by eliminating the steps to install and configure software. To train a comprehensive model for other applications, download the DeePMD-kit container from the NGC catalog.

References

[1] Jia W, Wang H, Chen M, Lu D, Lin L, Car R, E W and Zhang L 2020 Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning IEEE Press 5 1-14

[2] Wang H, Zhang L, Han J and E W 2018 DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics Computer Physics Communications 228 178-84

[3] Plimpton S 1995 Fast Parallel Algorithms for Short-Range Molecular Dynamics Journal of Computational Physics 117 1-19

[4] Kresse G and Hafner J 1993 Ab initio molecular dynamics for liquid metals Physical Review B 47 558-61

[5] Giannozzi P, Baroni S, Bonini N, Calandra M, Car R, Cavazzoni C, Ceresoli D, Chiarotti G L, Cococcioni M, Dabo I, Dal Corso A, de Gironcoli S, Fabris S, Fratesi G, Gebauer R, Gerstmann U, Gougoussis C, Kokalj A, Lazzeri M, Martin-Samos L, Marzari N, Mauri F, Mazzarello R, Paolini S, Pasquarello A, Paulatto L, Sbraccia C, Scandolo S, Sclauzero G, Seitsonen A P, Smogunov A, Umari P and Wentzcovitch R M 2009 QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials Journal of Physics: Condensed Matter 21 395502

[6] Humphrey W, Dalke A and Schulten K 1996 VMD: Visual molecular dynamics Journal of Molecular Graphics 14 33-8

[7] Zhang Y, Wang H, Chen W, Zeng J, Zhang L, Wang H and E W 2020 DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models Computer Physics Communications 253 107206