`clff` Documentation


CLFF: Automated Framework for Concurrent Learning Force Fields and Material Property Calculations.

Concurrent Learning Force Field (CLFF) Framework

Automating the Lifecycle of Machine Learning Interatomic Potentials across Heterogeneous HPC Infrastructures.

CLFF is an open-source, end-to-end MLOps framework designed to remove the data-curation bottleneck in computational materials science. It replaces static, human-guided sampling with an automated, dynamic concurrent learning loop and orchestrates every stage of that loop: Model Training → MD Exploration → Uncertainty Quantification → DFT Labeling → Dataset Augmentation. Once launched, the loop requires no user intervention.

🚀 Key Features

🧠 Automated Concurrent Learning

Ensemble-Based UQ: Automatically tracks model discrepancies across multiple initializations to isolate regions of high configurational uncertainty.
Multi-Property Filtering: Classifies configurations into Accurate, Candidate, or Inaccurate/Unphysical states using strict, configurable thresholds for Energy, Forces, and Stress tensors.
Fault-Tolerant Loops: Features automated task filtering to skip redundant configurations, early MD termination for anomalous box behavior, and seamless workflow resumption following hardware disruptions.
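The ensemble-based filtering described above can be illustrated with a minimal sketch. It assumes the maximum per-atom force deviation across the ensemble as the uncertainty measure, a criterion commonly used in concurrent learning workflows; it is not taken from the CLFF source. The function names `max_force_deviation` and `classify`, and the thresholds `lower` and `upper`, are hypothetical.

```python
import math
from statistics import mean


def max_force_deviation(ensemble_forces):
    """Return the largest per-atom force deviation across an ensemble.

    ensemble_forces[m][i] is the (fx, fy, fz) force on atom i predicted
    by model m. For each atom, the deviation is the square root of the
    mean squared distance between each model's force and the ensemble mean.
    """
    n_models = len(ensemble_forces)
    n_atoms = len(ensemble_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        # Ensemble-averaged force on atom i, component by component.
        mean_f = [
            mean(ensemble_forces[m][i][k] for m in range(n_models))
            for k in range(3)
        ]
        # Mean squared deviation of the models from that average.
        variance = mean(
            sum((ensemble_forces[m][i][k] - mean_f[k]) ** 2 for k in range(3))
            for m in range(n_models)
        )
        worst = max(worst, math.sqrt(variance))
    return worst


def classify(deviation, lower, upper):
    """Map a deviation onto the three states (thresholds are illustrative)."""
    if deviation < lower:
        return "accurate"    # models agree; nothing new to learn
    if deviation < upper:
        return "candidate"   # uncertain but plausible; send to DFT labeling
    return "inaccurate"      # models disagree strongly; likely unphysical


# Two models, one atom, forces in eV/Å (made-up numbers).
forces = [[(1.0, 0.0, 0.0)], [(1.2, 0.0, 0.0)]]
label = classify(max_force_deviation(forces), lower=0.05, upper=0.20)
```

Only configurations labeled as candidates would be passed on for labeling in this sketch. The same pattern extends to energy and stress by computing an analogous deviation for each property and combining the resulting labels.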

🌐 Asynchronous Multi-Cluster Orchestration (The CLFF Superpower)

Heterogeneous Pool Aggregation: Through its orchestration backend, CLFF lets entirely different computing assets (commercial cloud, national supercomputing centers, campus clusters, and local workstations) work together within a single unified environment.
Proportional Workload Distribution: Specify an exact ratio of tasks (e.g., 60% to Cluster A, 40% to Cluster B) right in your configuration file.
Universal Protocol Compatibility: Native support for diverse connectivity protocols (Local, SSH, Cloud APIs, etc.) and major job schedulers (SLURM, SGE, PBS, TORQUE, etc.).

🔬 Deep Physics & Calculator Ecosystem

Modern Architectures: Native, extensible support for cutting-edge graph-based MLIP architectures, including SevenNet, MACE, NequIP, and TensorNet, among others.
Flexible Interfaces: Out-of-the-box integration with LAMMPS, ASE, and GPAW, alongside built-in support for any DFT/MD calculator that provides an ASE interface.
Enhanced Chemical Sampling: Full compatibility with PLUMED for enhanced sampling of complex chemical landscapes (Metadynamics, Umbrella Sampling) and support for Grimme-D3 van der Waals corrections in both MD and DFT tasks.

📊 Batteries-Included Downstream Workflows

Beyond dataset curation, CLFF features modular pipelines to automatically calculate, post-process, and plot downstream material properties using your trained models:

🗺️ Potential Energy Surface (PES) Scans
🎼 Phonon Dispersion Curves (integrated with Phonopy)
📐 Elastic Constant Tensors

💡 The CLFF Advantage: Real-World Scaling

Scenario 1: Aggregating Fragmented Assets

Imagine combining entirely disconnected computing environments (a local workstation, a university cluster, and a cloud instance) and having them dynamically execute a complex active learning loop without manual script editing or cross-server file transfers. CLFF acts as a centralized orchestrator, managing data routing and job tracking on the fly.

Scenario 2: Splitting CPU/GPU Computations Simultaneously

Consider a practical scenario where you must label thousands of atomic structures via DFT across two heterogeneous clusters:

Cluster 1: Equipped with GPU + CPU nodes under a SLURM scheduler.
Cluster 2: CPU-only nodes under an SGE scheduler.

With a simple adjustment to a YAML file, CLFF can simultaneously route 50% of your DFT calculations to Cluster 1's GPUs, 25% to Cluster 1's CPUs, and the remaining 25% to Cluster 2's CPUs. It generates the native submission scripts, monitors real-time completion status, handles scheduling bottlenecks, and unifies the resulting data back into your training pool automatically.
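To make the 50/25/25 split concrete, the sketch below shows one way a fixed number of tasks could be divided by ratio so that the counts are whole numbers and always sum to the total. It uses the largest-remainder method as an assumption; CLFF's actual allocation logic may differ. The function name `split_by_ratio` and the pool names are hypothetical.

```python
def split_by_ratio(n_tasks, ratios):
    """Split n_tasks among pools in proportion to their weights.

    ratios maps a pool name to a positive weight (e.g. a percentage).
    Uses the largest-remainder method: floor every exact share, then give
    the leftover tasks to the pools with the largest fractional parts.
    """
    total = sum(ratios.values())
    if n_tasks < 0 or total <= 0:
        raise ValueError("n_tasks must be >= 0 and weights must sum to > 0")

    exact = {name: n_tasks * weight / total for name, weight in ratios.items()}
    counts = {name: int(share) for name, share in exact.items()}  # floor

    # Tasks lost to rounding down, reassigned by largest fractional part.
    leftover = n_tasks - sum(counts.values())
    by_remainder = sorted(
        ratios, key=lambda name: exact[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical pool names mirroring the scenario above.
plan = split_by_ratio(
    1000, {"cluster1_gpu": 50, "cluster1_cpu": 25, "cluster2_cpu": 25}
)
# plan == {"cluster1_gpu": 500, "cluster1_cpu": 250, "cluster2_cpu": 250}
```

When the task count does not divide evenly, for example 7 tasks at a 60/40 ratio, this approach still assigns every task exactly once (4 and 3 in that case).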

🛠️ How it Works

The design principle behind CLFF is simple: Maximum automation, minimum overhead. All you need to provide is two human-readable YAML configuration files: one defining your physical parameters, the other mapping your machine credentials.

clff_cl parameters.yml machines.yml

Once the command is running, fasten your seatbelt, track real-time progress through the console logger, and enjoy the ride while CLFF builds your production-grade force field.
