alff.al¤

Active Learning package.

Modules:

active_learning ¤

Active Learning module.

Classes:

Functions:

  • stage_train

    Stage function for ML training tasks.

  • stage_md

    Stage function for MD exploration tasks.

  • stage_dft

    Stage function for DFT labeling tasks.

WorkflowActiveLearning(params_file: str, machines_file: str) ¤

Bases: Workflow

Workflow for active learning.

Notes
  • The .run() method is redefined here, since the active learning workflow differs from the one in the base class.

Methods:

Attributes:

stage_map = {'ml_train': stage_train, 'md_explore': stage_md, 'dft_label': stage_dft} instance-attribute ¤
wf_name = 'ACTIVE LEARNING' instance-attribute ¤
params_file = params_file instance-attribute ¤
machines_file = machines_file instance-attribute ¤
schema_file = schema_file instance-attribute ¤
multi_mdicts = config_machine.multi_mdicts instance-attribute ¤
pdict = Config.loadconfig(self.params_file) instance-attribute ¤
stage_list = self._load_stage_list() instance-attribute ¤
run() ¤
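
A minimal sketch of how a stage_map like the one listed above could drive an iterative workflow. This is not the package's run() implementation: the stage functions are stand-ins, and the loop structure, the num_iters argument, and the run_iterations helper are assumptions made only for illustration.

```python
# Stand-in stage functions with the documented (iter_idx, pdict, mdict) signature.
def stage_train(iter_idx, pdict, mdict):
    return f"iter {iter_idx}: train"


def stage_md(iter_idx, pdict, mdict):
    return f"iter {iter_idx}: md"


def stage_dft(iter_idx, pdict, mdict):
    return f"iter {iter_idx}: dft"


stage_map = {"ml_train": stage_train, "md_explore": stage_md, "dft_label": stage_dft}


def run_iterations(stage_list, num_iters, pdict, mdict):
    """Call every requested stage, in order, once per iteration (hypothetical loop)."""
    log = []
    for iter_idx in range(num_iters):
        for stage_name in stage_list:
            stage_fn = stage_map[stage_name]  # raises KeyError on an unknown stage name
            log.append(stage_fn(iter_idx, pdict, mdict))
    return log


log = run_iterations(["ml_train", "md_explore", "dft_label"], num_iters=2, pdict={}, mdict={})
```

Looking stages up by name keeps the order of work in configuration (stage_list) rather than in code.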

stage_train(iter_idx, pdict, mdict) ¤

Stage function for ML training tasks.

This function includes preparing training data and args, running training, and postprocessing:

  • Collect data files
  • Prepare training args based on the MLP engine

stage_md(iter_idx, pdict, mdict) ¤

Stage function for MD exploration tasks.

Includes the pre, run, and post MD steps:

  • Collect initial configurations
  • Prepare MD args
  • Submit MD jobs to remote machines
  • Postprocess MD results

stage_dft(iter_idx, pdict, mdict) ¤

Stage function for DFT labeling tasks. Includes the pre, run, and post DFT steps.

finetune ¤

Fine-tuning module.

Classes:

  • WorkflowFinetune

    Workflow for fine-tuning existing ML models or training a new ML model.

Functions:

  • stage_train

    Stage function for ML training tasks.

WorkflowFinetune(params_file: str, machines_file: str) ¤

Bases: Workflow

Workflow for fine-tuning existing ML models or training a new ML model. It overrides self.stage_list from the base class, because the stages are fixed here.

Methods:

  • run

    The main function to run the workflow. This default implementation works for simple workflows.

Attributes:

stage_map = {'ml_train': stage_train} instance-attribute ¤
wf_name = 'FINE-TUNING' instance-attribute ¤
stage_list = ['ml_train'] instance-attribute ¤
params_file = params_file instance-attribute ¤
machines_file = machines_file instance-attribute ¤
schema_file = schema_file instance-attribute ¤
multi_mdicts = config_machine.multi_mdicts instance-attribute ¤
pdict = Config.loadconfig(self.params_file) instance-attribute ¤
run() ¤

The main function to run the workflow. This default implementation works for simple workflows. More complex workflows (e.g. iterative ones such as active learning) need to reimplement this .run() function.

Notes
  • Garbage collection is forced before running the workflow to release unreachable objects (this may reduce memory retained from previous stages such as prepare()).
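
The behaviour described above can be sketched as follows. This is an illustration under assumptions, not the base-class code: run_simple is a hypothetical helper and stage_train is a stand-in for the real stage function.

```python
import gc


def stage_train(pdict, mdict):
    # Stand-in for the real training stage.
    return "trained"


def run_simple(stage_list, stage_map, pdict, mdict):
    """Run a fixed list of stages once, in order (hypothetical helper)."""
    # Release unreachable objects left over from earlier setup before the heavy work starts.
    gc.collect()
    return [stage_map[name](pdict, mdict) for name in stage_list]


results = run_simple(["ml_train"], {"ml_train": stage_train}, pdict={}, mdict={})
```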

stage_train(pdict, mdict) ¤

Stage function for ML training tasks.

libal_md_ase ¤

Library for ASE MD with SevenNet model.

Classes:

Functions:

OperAlmdAseSevennet(work_dir, pdict, multi_mdict, mdict_prefix='md') ¤

Bases: RemoteOperation

This class runs ASE MD for a list of structures in task_dirs.

Methods:

Attributes:

op_name = 'ASE MD with SevenNet' instance-attribute ¤
task_filter = {'has_files': [K.FILE_FRAME_UNLABEL], 'no_files': ['committee_error.txt']} instance-attribute ¤
work_dir = work_dir instance-attribute ¤
pdict = pdict instance-attribute ¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix) instance-attribute ¤
task_dirs = self._load_task_dirs() instance-attribute ¤
commandlist_list: list[list[str]] instance-attribute ¤
forward_files: list[str] instance-attribute ¤
backward_files: list[str] instance-attribute ¤
forward_common_files: list[str] instance-attribute ¤
backward_common_files: list[str] = [] instance-attribute ¤
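
The task_filter attribute above suggests that task directories are selected by which files they contain. The sketch below shows one plausible reading: keep a directory only if it holds every has_files entry and none of the no_files entries. The filter_task_dirs helper and the conf.extxyz file name are hypothetical, and the real selection logic may differ.

```python
import tempfile
from pathlib import Path


def filter_task_dirs(dirs, task_filter):
    """Keep dirs that hold every 'has_files' entry and none of the 'no_files' entries."""
    selected = []
    for d in map(Path, dirs):
        has_all = all((d / name).is_file() for name in task_filter.get("has_files", []))
        has_none = not any((d / name).is_file() for name in task_filter.get("no_files", []))
        if has_all and has_none:
            selected.append(str(d))
    return selected


with tempfile.TemporaryDirectory() as root:
    pending, done = Path(root, "task_0"), Path(root, "task_1")
    for d in (pending, done):
        d.mkdir()
        (d / "conf.extxyz").touch()  # hypothetical input file name
    (done / "committee_error.txt").touch()  # marks this task as already processed
    task_filter = {"has_files": ["conf.extxyz"], "no_files": ["committee_error.txt"]}
    selected = filter_task_dirs([pending, done], task_filter)
    selected_names = [Path(p).name for p in selected]
```

A filter of this kind makes reruns idempotent: tasks that already produced their output file are skipped.
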
prepare() ¤

Prepare MD tasks.

Includes:

  • Prepare the task_list
  • Prepare forward & backward files
  • Prepare commandlist_list for multi-remote submission

postprocess() ¤
run() ¤

Function to submit jobs to remote machines.

Notes
  • The original task_dirs are relative to run_dir and should not be changed. However, the submission function needs task_dirs relative to work_dir, so a temporary change is made here.
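
One way to make such a temporary change safely is a context manager that restores the original paths even when submission fails. The sketch below is an assumption about how this could be done, not the class's actual code; Operation and task_dirs_relative_to_work_dir are hypothetical names.

```python
import os
from contextlib import contextmanager


class Operation:
    """Tiny stand-in holding only the two attributes used here."""

    def __init__(self, work_dir, task_dirs):
        self.work_dir = work_dir
        self.task_dirs = task_dirs  # relative to the run directory


@contextmanager
def task_dirs_relative_to_work_dir(op):
    """Temporarily express op.task_dirs relative to op.work_dir, then restore them."""
    original = op.task_dirs
    op.task_dirs = [os.path.relpath(d, op.work_dir) for d in original]
    try:
        yield op
    finally:
        op.task_dirs = original  # restored even if the submission raises


op = Operation("iter_00/md", ["iter_00/md/task_0", "iter_00/md/task_1"])
with task_dirs_relative_to_work_dir(op):
    during = list(op.task_dirs)  # what a submission call would see
after = op.task_dirs
```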

premd_ase_sevenn(work_dir, pdict, mdict) ¤

Prepare MD args.

Includes:

  • Copy ML models to work_dir
  • Collect initial configurations
  • Prepare ASE args
  • Generate task_dirs for ranges of temperature and pressure

temperature_press_mdarg_ase(struct_dirs: list, temperature_list: list = [], press_list: list = [], ase_argdict: dict = {}) -> list ¤

Generate the task_dirs for ranges of temperatures and stresses.

Parameters:

  • struct_dirs (list) –

    List of directories containing configuration files.

  • temperature_list (list, default: [] ) –

    List of temperatures.

  • press_list (list, default: [] ) –

    List of stresses.

  • ase_argdict (dict, default: {} ) –
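
A minimal sketch of generating one task per combination of structure, temperature, and pressure. The temperature_press_grid helper, its return shape, and the directory naming scheme are all hypothetical; the real function also writes ASE arguments, which is omitted here. The sketch uses None defaults rather than the mutable [] and {} defaults shown in the signature above, which is a common Python precaution.

```python
from itertools import product


def temperature_press_grid(struct_dirs, temperature_list=None, press_list=None):
    """Return one (struct_dir, temperature, press, task_name) tuple per combination."""
    # An empty or missing list means "no sweep" for that quantity.
    temperatures = temperature_list or [None]
    pressures = press_list or [None]
    tasks = []
    for struct_dir, temp, press in product(struct_dirs, temperatures, pressures):
        parts = [struct_dir]
        if temp is not None:
            parts.append(f"T{temp}")
        if press is not None:
            parts.append(f"P{press}")
        tasks.append((struct_dir, temp, press, "_".join(parts)))  # naming is hypothetical
    return tasks


tasks = temperature_press_grid(["struct_0"], [300, 600], [0.0, 1.0])
task_names = [t[3] for t in tasks]
```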

libal_md_lammps ¤

Library for LAMMPS MD with SevenNet model.

Classes:

Functions:

OperAlmdLammpsSevennet(work_dir, pdict, multi_mdict, mdict_prefix='md') ¤

Bases: RemoteOperation

This class runs LAMMPS MD for a list of structures in task_dirs.

Methods:

Attributes:

op_name = 'LAMMPS MD with SevenNet' instance-attribute ¤
task_filter = {'has_files': ['conf.lmpdata'], 'no_files': ['committee_error.txt']} instance-attribute ¤
work_dir = work_dir instance-attribute ¤
pdict = pdict instance-attribute ¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix) instance-attribute ¤
task_dirs = self._load_task_dirs() instance-attribute ¤
commandlist_list: list[list[str]] instance-attribute ¤
forward_files: list[str] instance-attribute ¤
backward_files: list[str] instance-attribute ¤
forward_common_files: list[str] instance-attribute ¤
backward_common_files: list[str] = [] instance-attribute ¤
prepare() ¤

Prepare MD tasks.

Includes:

  • Prepare the task_list
  • Prepare forward & backward files
  • Prepare commandlist_list for multi-remote submission

postprocess() ¤
run() ¤

Function to submit jobs to remote machines.

Notes
  • The original task_dirs are relative to run_dir and should not be changed. However, the submission function needs task_dirs relative to work_dir, so a temporary change is made here.

premd_lammps_sevenn(work_dir, pdict, mdict) ¤

Prepare MD args.

Includes:

  • Copy ML models to work_dir
  • Collect initial configurations
  • Prepare LAMMPS args
  • Generate task_dirs for ranges of temperature and pressure

temperature_press_mdarg_lammps(struct_dirs: list, temperature_list: list = [], press_list: list = [], lammps_argdict: dict = {}) -> list ¤

Generate the task_dirs for ranges of temperatures and stresses.

Parameters:

  • struct_dirs (list) –

    List of directories containing configuration files.

  • temperature_list (list, default: [] ) –

    List of temperatures.

  • press_list (list, default: [] ) –

    List of stresses.

  • lammps_argdict (dict, default: {} ) –

mlp ¤

MLP engines package.

Modules:

mlp_graphpes ¤

Library for training GraphPES MLP models.

mlp_mace ¤

Library for training MACE MLP models.

Functions:

pre_train_mace(iter_idx, pdict, mdict) ¤
run_train_mace(iter_idx, pdict, mdict) ¤
post_train_mace(iter_idx, pdict, mdict) ¤

mlp_sevenn ¤

Library for training SevenNet MLP models.

Classes:

Functions:

OperAltrainSevennet(work_dir, pdict, multi_mdict, mdict_prefix='train') ¤

Bases: RemoteOperation

Methods:

  • prepare

    Prepare for remote training operation.

  • postprocess

    Collect the best checkpoint files and save them in FILE_CHECKPOINTS.

  • run

    Function to submit jobs to remote machines.

Attributes:

op_name = 'Training' instance-attribute ¤
task_filter = {'has_files': [K.FILE_ARG_TRAIN], 'no_files': ['checkpoint_best.pth']} instance-attribute ¤
work_dir = work_dir instance-attribute ¤
pdict = pdict instance-attribute ¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix) instance-attribute ¤
task_dirs = self._load_task_dirs() instance-attribute ¤
commandlist_list: list[list[str]] instance-attribute ¤
forward_files: list[str] instance-attribute ¤
backward_files: list[str] instance-attribute ¤
forward_common_files: list[str] instance-attribute ¤
backward_common_files: list[str] = [] instance-attribute ¤
prepare() ¤

Prepare for remote training operation.

Includes:

  • Prepare the task_list
  • Prepare forward & backward files
  • Prepare commandlist_list for multi-remote submission

postprocess() ¤

Collect the best checkpoint files and save them in FILE_CHECKPOINTS.

run() ¤

Function to submit jobs to remote machines.

Notes
  • The original task_dirs are relative to run_dir and should not be changed. However, the submission function needs task_dirs relative to work_dir, so a temporary change is made here.
pretrain_sevenn(work_dir, pdict, mdict) ¤

Prepare arguments and data for ML training.

Includes:

  • Split the dataset into train/valid sets
  • Build graph_data using SEVENN graph_build
  • Prepare SEVENN args
  • Establish train tasks (one folder for each training model)
  • Save all common_files in DIR_FWDATA for convenience in transferring files

Notes
  • DIR_COLLECTDATA: a temporary directory containing the collected extxyz data
  • DIR_FWDATA: directory containing common_files to forward to remote machines
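
As an illustration of the first step listed above, the sketch below splits a list of frames into train and valid sets reproducibly. It is a generic technique, not the package's code: split_train_valid, valid_fraction, and seed are hypothetical names, and the actual split ratio used by pretrain_sevenn is not stated in this page.

```python
import random


def split_train_valid(items, valid_fraction=0.1, seed=0):
    """Shuffle a copy of items and split it into (train, valid) lists."""
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # a local RNG keeps the split reproducible
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


frames = [f"frame_{i}" for i in range(20)]
train, valid = split_train_valid(frames, valid_fraction=0.1)
```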

util_mlp ¤

Utilities for MLP training.

Classes:

  • Xyz2GraphData

    Convert XYZ file to graph data format used in MLP training.

Functions:

Xyz2GraphData ¤

Convert XYZ file to graph data format used in MLP training.

Methods:

build_graph_sevenn(files: list[str], outfile: str = 'graph_atoms.pt', outdir: str = '.', num_cores: int = 1, cutoff: float = 5.0, **ase_kwargs) staticmethod ¤

Build SevenNet graph dataset from source files.

Parameters:

  • files (list[str]) –

    List of input data files. Supported formats: extxyz, and other formats defined in function SevenNetGraphDataset.file_to_graph_list().

  • outfile (str, default: 'graph_atoms.pt' ) –

    Name of the output file. Defaults to "graph_atoms.pt".

  • outdir (str, default: '.' ) –

    Output directory. Defaults to ".".

  • num_cores (int, default: 1 ) –

    Number of CPU cores for parallel processing. Defaults to 1.

  • cutoff (float, default: 5.0 ) –

    Cutoff distance for neighbor search. Defaults to 5.0.

  • **ase_kwargs

    Additional keyword arguments for ASE's read() function

build_graph_mace() staticmethod ¤
suggest_num_epochs(dataset_size: int, batch_size: int, num_grad_updates: int = 300000) -> int ¤

Suggest number of epochs for training. Based on MACE's setting.

Parameters:

  • dataset_size (int) –

    Number of samples in the dataset.

  • batch_size (int) –

    Batch size.

  • num_grad_updates (int, default: 300000 ) –

    Maximum number of updates of model weights & biases. Defaults to 300000.
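
One plausible reading of this suggestion is to pick the number of epochs so that the total number of gradient updates reaches num_grad_updates. The sketch below implements that reading; it is an assumption, and the package's actual formula may differ. The function name is deliberately different from the documented one.

```python
import math


def suggest_num_epochs_sketch(dataset_size: int, batch_size: int, num_grad_updates: int = 300000) -> int:
    """Epochs needed so that epochs * updates_per_epoch >= num_grad_updates."""
    if dataset_size <= 0 or batch_size <= 0:
        raise ValueError("dataset_size and batch_size must be positive")
    updates_per_epoch = math.ceil(dataset_size / batch_size)  # one update per batch
    return math.ceil(num_grad_updates / updates_per_epoch)


epochs = suggest_num_epochs_sketch(dataset_size=10000, batch_size=10)
```

Under this reading, smaller datasets get more epochs, since each epoch contributes fewer weight updates.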

utilal ¤

Utilities for Active Learning workflow.

Classes:

  • D3ParamMD

    Different packages use different names for D3 parameters.

  • MLP2Lammps

    Convert MLP model to be used in LAMMPS.

D3ParamMD(d3package: str = 'sevenn') ¤

Different packages use different names for D3 parameters. This class returns the conventional names of D3 parameters for the different packages used for MD.

Notes
  • The default cutoff values are 95 Bohr (50.2718 Angstrom) for two-body dispersion calculations and 40 Bohr (21.1671 Angstrom) for coordination number and three-body calculations, as in ASE-DFTD3 package. Other packages may use different default values.
  • Some dftd3 parameters for DFT calculations support three-body interactions, but most MD packages only support pairwise interactions. So the three-body cutoff parameter is not included in this class.

Methods:

Attributes:

d3package: str = d3package instance-attribute ¤
default_twobody_cutoff: float = 50.2718 instance-attribute ¤
default_cn_cutoff: float = 21.1671 instance-attribute ¤
param_names = params['params'] instance-attribute ¤
damping_map = params['damping_map'] instance-attribute ¤
get_params() -> dict ¤

Return D3 parameter names according to different packages.

check_supported_damping(damping: str) ¤

Check if the damping method is supported in the selected package.

angstrom2bohr(angstrom_value: float) -> float staticmethod ¤

Convert Angstrom to Bohr.

Notes

In simple-dftd3, 60*Bohr converts 60 Bohr to Angstrom.

angstrom2bohr2(angstrom_value: float) -> float staticmethod ¤

Convert Angstrom to Bohr^2. Used in the sevenn package.

bohr2angstrom(bohr_value: float) -> float staticmethod ¤

Convert Bohr to Angstrom.
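
The two plain length conversions can be sketched as below, assuming the CODATA 2018 value of the Bohr radius; the constant name is hypothetical and the package may take its value from elsewhere. The example reproduces the default cutoffs quoted in the class notes (95 Bohr and 40 Bohr). The Bohr^2 variant is left out because its exact definition is not given in this page.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius, in Angstrom


def bohr2angstrom(bohr_value: float) -> float:
    """Convert a length from Bohr to Angstrom."""
    return bohr_value * BOHR_IN_ANGSTROM


def angstrom2bohr(angstrom_value: float) -> float:
    """Convert a length from Angstrom to Bohr."""
    return angstrom_value / BOHR_IN_ANGSTROM


twobody_cutoff = round(bohr2angstrom(95.0), 4)  # two-body dispersion cutoff
cn_cutoff = round(bohr2angstrom(40.0), 4)       # coordination-number cutoff
```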

MLP2Lammps(mlp_model: str = 'sevenn') ¤

Convert MLP model to be used in LAMMPS.

Methods:

Attributes:

mlp_model: str = mlp_model instance-attribute ¤
convert(checkpoint: str, outfile: str = 'deployed.pt', **kwargs) ¤

Convert MLP model to LAMMPS format.

Parameters:

  • checkpoint (str) –

    Path to checkpoint file of MLP model.

  • outfile (str, default: 'deployed.pt' ) –

    Path to output LAMMPS potential file.

  • **kwargs

    Additional arguments for specific conversion methods.

convert_sevenn(checkpoint: str, outfile: str = 'deploy_sevenn', modal: str | None = None, enable_flash: bool = False, parallel_type=False, **kwargs) staticmethod ¤

Convert sevenn model to be used in LAMMPS.

Parameters:

  • checkpoint (str) –

    Path to checkpoint file of sevenn model.

  • outfile (str, default: 'deploy_sevenn' ) –

    Path to output LAMMPS potential file.

  • modal (str, default: None ) –

    Channel of multi-task model.

  • parallel_type (bool, default: False ) –

    Convert to a potential for running parallel simulations.

  • enable_flash (bool, default: False ) –

    Use flashTP.

  • **kwargs

    Additional arguments to avoid breaking the function signature when future additional arguments are added.

Notes

  • Single mode: generates the file "outfile.pt".
  • Parallel mode: generates the files "outfile/deployed_parallel_0.pt", "outfile/deployed_parallel_1.pt", ...

convert_sevenn_mliap(checkpoint: str, outfile: str = 'deploy_sevenn_mliap.pt', modal: str | None = None, enable_cueq: bool = False, enable_flash: bool = False, enable_oeq: bool = False, **kwargs) staticmethod ¤

Convert sevenn model to be used in LAMMPS MLIAP.

Parameters:

  • checkpoint (str) –

    Path to checkpoint file of sevenn model.

  • outfile (str, default: 'deploy_sevenn_mliap.pt' ) –

    Path to output LAMMPS potential file.

  • modal (str, default: None ) –

    Channel of multi-task model.

  • enable_cueq (bool, default: False ) –

    Use cueq. cuEquivariance is only supported in ML-IAP interface.

  • enable_flash (bool, default: False ) –

    Use flashTP.

  • enable_oeq (bool, default: False ) –

    Use oeq.

  • **kwargs

    Additional arguments to avoid breaking the function signature when future additional arguments are added.

utilal_uncertainty ¤

Utilities for uncertainty estimation using a committee of models.

  • DO NOT import any alff libs in this file, since this file will be used remotely.

Classes:

  • ModelCommittee

    A class to manage a committee of models for uncertainty estimation.

Functions:

  • simple_lmpdump2extxyz

    Convert a LAMMPS dump file to an extended xyz file. This is a very simple version that only converts atomic positions, not the stress tensor.

  • chunk_list

    Yield successive n-sized chunks from input_list.

ModelCommittee(mlp_model: str, model_files: list[str], calc_kwargs: dict | None = None, compute_stress: bool = False, rel_force: float | None = None, rel_stress: float | None = None, e_std_lo: float = 0.05, e_std_hi: float = 0.1, f_std_lo: float = 0.05, f_std_hi: float = 0.1, s_std_lo: float = 0.05, s_std_hi: float = 0.1, block_size: int = 1000) ¤

A class to manage a committee of models for uncertainty estimation.

Parameters:

  • mlp_model (str) –

    MLP model engine, e.g., 'sevenn'.

  • model_files (list[str]) –

    List of model files for the committee.

  • calc_kwargs (dict, default: None ) –

    Additional arguments for the MLP calculator. Defaults to None, which is treated as {}.

  • compute_stress (bool, default: False ) –

    Whether to compute stress. Defaults to False.

  • rel_force (float, default: None ) –

    Relative force to normalize force std. Defaults to None.

  • rel_stress (float, default: None ) –

    Relative stress to normalize stress std. Defaults to None.

  • e_std_lo (float, default: 0.05 ) –

    energy std low. Defaults to 0.05.

  • e_std_hi (float, default: 0.1 ) –

    energy std high. Defaults to 0.1.

  • f_std_lo (float, default: 0.05 ) –

    force std low. Defaults to 0.05.

  • f_std_hi (float, default: 0.1 ) –

    force std high. Defaults to 0.1.

  • s_std_lo (float, default: 0.05 ) –

    stress std low. Defaults to 0.05.

  • s_std_hi (float, default: 0.1 ) –

    stress std high. Defaults to 0.1.

  • block_size (int, default: 1000 ) –

    Number of configurations for which the 'committee error' is computed at once. Adjust this value to avoid running out of RAM. Defaults to 1000.

Notes
  • Consider using @staticmethod for some functions to avoid recursive messing.

Methods:

  • compute_committee_error_blockwise

    Compute committee error for energy, forces, and stress for multiple configurations in a block-wise manner.

  • committee_judge

    Decide whether a configuration is candidate, accurate, or inaccurate based on committee error.

  • select_candidate

    Select candidate configurations for DFT calculation.

  • remove_inaccurate

    Remove inaccurate configurations based on committee error. This is used to revise the dataset.

Attributes:

mlp_model = mlp_model instance-attribute ¤
model_files = model_files instance-attribute ¤
calc_kwargs = calc_kwargs or {} instance-attribute ¤
compute_stress = compute_stress instance-attribute ¤
rel_force = rel_force instance-attribute ¤
rel_stress = rel_stress instance-attribute ¤
block_size = block_size instance-attribute ¤
e_std_lo = e_std_lo instance-attribute ¤
e_std_hi = e_std_hi instance-attribute ¤
f_std_lo = f_std_lo instance-attribute ¤
f_std_hi = f_std_hi instance-attribute ¤
s_std_lo = s_std_lo instance-attribute ¤
s_std_hi = s_std_hi instance-attribute ¤
calc_list = self._get_calc_list() instance-attribute ¤
committee_error_file: str = 'committee_error.txt' instance-attribute ¤
committee_judge_file: str = 'committee_judge_summary.yml' instance-attribute ¤
compute_committee_error_blockwise(struct_list: list[Atoms]) ¤

Compute committee error for energy, forces, and stress for multiple configurations in a block-wise manner.

Parameters:

  • struct_list (list[Atoms]) –

    List of Atoms objects.

Notes
  • The output file is controlled by the class attribute self.committee_error_file.
  • This method can handle lists of structures with variable numbers of atoms, which allows mixed-size structures to be processed in a single block.
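
A minimal, standard-library sketch of a block-wise committee error. It assumes the error is the standard deviation of predictions across committee members, with the force error taken as the largest per-atom deviation; this definition, the input layout, and all function names are assumptions, and model evaluation is replaced by precomputed predictions. Stress is omitted for brevity.

```python
from statistics import pstdev


def energy_std(energies):
    """Standard deviation of one configuration's energy across models."""
    return pstdev(energies)


def max_force_std(forces):
    """forces[m][a] is the (fx, fy, fz) force on atom a predicted by model m.

    Each atom's deviation combines the per-component standard deviations in
    quadrature; the largest value over atoms is returned.
    """
    per_atom = []
    for a in range(len(forces[0])):
        comp_std = [pstdev([model[a][k] for model in forces]) for k in range(3)]
        per_atom.append(sum(s * s for s in comp_std) ** 0.5)
    return max(per_atom)


def committee_error_blockwise(configs, block_size=1000):
    """Process configs in blocks so only one block is handled at a time."""
    rows = []
    for start in range(0, len(configs), block_size):
        for energies, forces in configs[start:start + block_size]:
            # Each configuration is handled on its own, so atom counts may differ.
            rows.append((energy_std(energies), max_force_std(forces)))
    return rows


# Two models; the first configuration has 2 atoms, the second has 1 atom.
config_a = ([-10.0, -10.2], [[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
                             [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0)]])
config_b = ([-5.0, -5.0], [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]])
rows = committee_error_blockwise([config_a, config_b], block_size=1)
```
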
committee_judge() -> tuple[np.ndarray, np.ndarray, np.ndarray] ¤

Decide whether a configuration is candidate, accurate, or inaccurate based on committee error.

Returns:

  • committee_judge_file(s) –

    Files containing the candidate, accurate, and inaccurate configurations.

Notes
  • To select candidates based on energy only, set f_std_hi and s_std_hi to very large values. This way, the criterion for those terms is always met.
  • Similarly, to select candidates based on energy and force only, set s_std_hi to a very large value, e.g., s_std_hi=1e6.
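
The sketch below shows one possible three-way classification using low and high thresholds. The rule is an assumption, not taken from the package: a configuration is inaccurate if any error reaches its high threshold, accurate if every error is below its low threshold, and a candidate otherwise. The function name and the dict-based inputs are hypothetical.

```python
def committee_judge_sketch(errors, lo, hi):
    """Classify configurations by committee error.

    errors: list of dicts such as {"e": 0.02, "f": 0.07}, one per configuration.
    lo, hi: threshold dicts with the same keys.
    Returns index lists (candidate, accurate, inaccurate).
    """
    candidate, accurate, inaccurate = [], [], []
    for idx, err in enumerate(errors):
        if any(err[k] >= hi[k] for k in err):
            inaccurate.append(idx)
        elif all(err[k] < lo[k] for k in err):
            accurate.append(idx)
        else:
            candidate.append(idx)
    return candidate, accurate, inaccurate


errors = [{"e": 0.01, "f": 0.02}, {"e": 0.07, "f": 0.02}, {"e": 0.07, "f": 0.5}]
lo = {"e": 0.05, "f": 0.05}
default_result = committee_judge_sketch(errors, lo, hi={"e": 0.1, "f": 0.1})
# A very large force threshold means the force term can never mark a frame inaccurate.
relaxed_result = committee_judge_sketch(errors, lo, hi={"e": 0.1, "f": 1e6})
```

Under this assumed rule, raising a high threshold only removes that term from the inaccurate test; the matching low threshold still influences the split between accurate and candidate.
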
select_candidate(extxyz_file: str) ¤

Select candidate configurations for DFT calculation.

Returns:

  • extxyz_file ( str ) –

    candidate configurations

Notes

See parameters in functions committee_error and committee_judge.

remove_inaccurate(extxyz_file: str) ¤

Remove inaccurate configurations based on committee error. This is used to revise the dataset.

Returns:

  • extxyz_file ( str ) –

    revised configurations

simple_lmpdump2extxyz(lmpdump_file: str, extxyz_file: str) ¤

Convert a LAMMPS dump file to an extended xyz file. This is a very simple version that only converts atomic positions, not the stress tensor.

chunk_list(input_list: list, chunk_size: int) -> Generator[list, None, None] ¤

Yield successive n-sized chunks from input_list.

Parameters:

  • input_list (list) –

    Input list to be chunked.

  • chunk_size (int) –

    Chunk size (number of elements per chunk).
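
A minimal sketch of a generator matching the documented signature. The body is an assumption about how such a helper is typically written (slicing in steps of chunk_size), not a copy of the package's code; the validation of chunk_size is an addition made for the example.

```python
from collections.abc import Generator


def chunk_list(input_list: list, chunk_size: int) -> Generator[list, None, None]:
    """Yield successive chunk_size-sized slices; the last one may be shorter."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(input_list), chunk_size):
        yield input_list[start:start + chunk_size]


chunks = list(chunk_list([1, 2, 3, 4, 5], 2))
```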