alff.al
Active Learning package.
Modules:
-
active_learning–Active Learning module.
-
finetune–Fine-tuning module.
-
libal_md_ase–Library for ASE MD with SevenNet model.
-
libal_md_lammps–Library for LAMMPS MD with SevenNet model.
-
mlp–MLP engines package.
-
utilal–Utilities for Active Learning workflow.
-
utilal_uncertainty–Utilities for uncertainty estimation using models committee.
active_learning
¤
Active Learning module.
Classes:
-
WorkflowActiveLearning–Workflow for active learning.
Functions:
-
stage_train–Stage function for ML training tasks.
-
stage_md–Stage function for MD exploration tasks.
-
stage_dft–Stage function for DFT labeling tasks.
WorkflowActiveLearning(params_file: str, machines_file: str)
¤
Bases: Workflow
Workflow for active learning.
Notes:
The .run() method needs to be redefined, since the Active Learning workflow differs from that of the base class.
Methods:
-
run–
Attributes:
-
stage_map–
-
wf_name–
-
params_file–
-
machines_file–
-
schema_file–
-
multi_mdicts–
-
pdict–
-
stage_list–
stage_map = {'ml_train': stage_train, 'md_explore': stage_md, 'dft_label': stage_dft}
instance-attribute
¤
wf_name = 'ACTIVE LEARNING'
instance-attribute
¤
params_file = params_file
instance-attribute
¤
machines_file = machines_file
instance-attribute
¤
schema_file = schema_file
instance-attribute
¤
multi_mdicts = config_machine.multi_mdicts
instance-attribute
¤
pdict = Config.loadconfig(self.params_file)
instance-attribute
¤
stage_list = self._load_stage_list()
instance-attribute
¤
run()
¤
stage_train(iter_idx, pdict, mdict)
¤
Stage function for ML training tasks.
This function includes preparing training data and args, running training, and postprocessing:
- Collect data files
- Prepare training args based on MLP engine
stage_md(iter_idx, pdict, mdict)
¤
Stage function for MD exploration tasks.
Including: pre, run, post MD.
- Collect initial configurations
- Prepare MD args
- Submit MD jobs to remote machines
- Postprocess MD results
stage_dft(iter_idx, pdict, mdict)
¤
Stage function for DFT labeling tasks. Including: pre, run, post DFT.
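The relationship between stage_list, stage_map, and the three stage functions can be sketched as follows. This is a minimal, self-contained illustration and not the package's implementation: the stub stage functions, the run_active_learning helper, and its num_iters argument are hypothetical. Only the stage names and the (iter_idx, pdict, mdict) call signature come from the reference above.

```python
from typing import Callable

# Stub stages standing in for stage_train, stage_md, and stage_dft.
# Each one records its call so the dispatch order can be inspected.
calls: list[tuple[int, str]] = []


def stage_train(iter_idx, pdict, mdict):
    calls.append((iter_idx, "ml_train"))


def stage_md(iter_idx, pdict, mdict):
    calls.append((iter_idx, "md_explore"))


def stage_dft(iter_idx, pdict, mdict):
    calls.append((iter_idx, "dft_label"))


stage_map: dict[str, Callable] = {
    "ml_train": stage_train,
    "md_explore": stage_md,
    "dft_label": stage_dft,
}


def run_active_learning(stage_list, pdict, mdict, num_iters):
    """Hypothetical driver: run every listed stage, in order, for each iteration."""
    for iter_idx in range(num_iters):
        for stage_name in stage_list:
            stage_map[stage_name](iter_idx, pdict, mdict)


run_active_learning(["ml_train", "md_explore", "dft_label"], pdict={}, mdict={}, num_iters=2)
```

Keeping the stages in a name-to-function mapping lets a stage list taken from the parameter file select which stages run without touching the driver loop.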
finetune
¤
Fine-tuning module.
Classes:
-
WorkflowFinetune–Workflow for fine-tuning existing ML models or training a new ML model.
Functions:
-
stage_train–Stage function for ML training tasks.
WorkflowFinetune(params_file: str, machines_file: str)
¤
Bases: Workflow
Workflow for fine-tuning existing ML models or training a new ML model.
Needs to override self.stage_list in base class, because the stages are fixed here.
Methods:
-
run–The main function to run the workflow. This default implementation works for simple workflows.
Attributes:
-
stage_map–
-
wf_name–
-
stage_list–
-
params_file–
-
machines_file–
-
schema_file–
-
multi_mdicts–
-
pdict–
stage_map = {'ml_train': stage_train}
instance-attribute
¤
wf_name = 'FINE-TUNING'
instance-attribute
¤
stage_list = ['ml_train']
instance-attribute
¤
params_file = params_file
instance-attribute
¤
machines_file = machines_file
instance-attribute
¤
schema_file = schema_file
instance-attribute
¤
multi_mdicts = config_machine.multi_mdicts
instance-attribute
¤
pdict = Config.loadconfig(self.params_file)
instance-attribute
¤
run()
¤
The main function to run the workflow. This default implementation works for simple workflows; more complex workflows (e.g., with iterations, as in active learning) need to reimplement this .run() function.
Notes:
- Force garbage collection before running the workflow to release unreachable objects (this may reduce memory retained from previous stages such as prepare()).
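The default run behaviour described above (collect garbage, then execute a fixed list of stages) can be sketched as below. This is an illustration under stated assumptions: run_simple_workflow and the stub stage_train are hypothetical names, and only the (pdict, mdict) stage signature and the 'ml_train' stage name come from the reference.

```python
import gc


def run_simple_workflow(stage_list, stage_map, pdict, mdict):
    """Hypothetical runner for a workflow with a fixed list of stages."""
    # Release unreachable objects left over from earlier steps before starting.
    gc.collect()
    results = []
    for name in stage_list:
        results.append(stage_map[name](pdict, mdict))
    return results


def stage_train(pdict, mdict):
    # Stand-in for the real training stage; it only echoes the engine name.
    return f"trained with {pdict['engine']}"


out = run_simple_workflow(["ml_train"], {"ml_train": stage_train}, {"engine": "sevenn"}, {})
```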
stage_train(pdict, mdict)
¤
Stage function for ML training tasks.
libal_md_ase
¤
Library for ASE MD with SevenNet model.
Classes:
-
OperAlmdAseSevennet–This class runs ASE MD for a list of structures in task_dirs.
Functions:
-
premd_ase_sevenn–Prepare MD args.
-
temperature_press_mdarg_ase–Generate the task_dirs for ranges of temperatures and stresses.
OperAlmdAseSevennet(work_dir, pdict, multi_mdict, mdict_prefix='md')
¤
Bases: RemoteOperation
This class runs ASE MD for a list of structures in task_dirs.
Methods:
-
prepare–Prepare MD tasks.
-
postprocess–
-
run–Function to submit jobs to remote machines.
Attributes:
-
op_name–
-
task_filter–
-
work_dir–
-
pdict–
-
mdict_list–
-
task_dirs–
-
commandlist_list(list[list[str]]) –
-
forward_files(list[str]) –
-
backward_files(list[str]) –
-
forward_common_files(list[str]) –
-
backward_common_files(list[str]) –
op_name = 'ASE MD with SevenNet'
instance-attribute
¤
task_filter = {'has_files': [K.FILE_FRAME_UNLABEL], 'no_files': ['committee_error.txt']}
instance-attribute
¤
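One plausible reading of task_filter is that a task directory is selected when it contains every file in 'has_files' and none of the files in 'no_files', so that finished tasks are skipped. The sketch below illustrates that reading only. filter_task_dirs is a hypothetical helper, and the file name used in place of K.FILE_FRAME_UNLABEL is a placeholder, since its real value is not shown in this reference.

```python
import tempfile
from pathlib import Path

# Placeholder for K.FILE_FRAME_UNLABEL; the real constant value is not documented here.
FILE_FRAME_UNLABEL = "frame_unlabel.extxyz"


def filter_task_dirs(task_dirs, task_filter):
    """Keep dirs holding every 'has_files' entry and none of the 'no_files' entries."""
    selected = []
    for d in map(Path, task_dirs):
        has_all = all((d / f).is_file() for f in task_filter.get("has_files", []))
        has_none = not any((d / f).is_file() for f in task_filter.get("no_files", []))
        if has_all and has_none:
            selected.append(str(d))
    return selected


with tempfile.TemporaryDirectory() as root:
    pending = Path(root, "task_0")  # input present, no result yet
    done = Path(root, "task_1")     # input and result present
    empty = Path(root, "task_2")    # neither present
    for d in (pending, done, empty):
        d.mkdir()
    (pending / FILE_FRAME_UNLABEL).touch()
    (done / FILE_FRAME_UNLABEL).touch()
    (done / "committee_error.txt").touch()

    task_filter = {"has_files": [FILE_FRAME_UNLABEL], "no_files": ["committee_error.txt"]}
    selected_names = [Path(d).name for d in filter_task_dirs([pending, done, empty], task_filter)]
```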
work_dir = work_dir
instance-attribute
¤
pdict = pdict
instance-attribute
¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix)
instance-attribute
¤
task_dirs = self._load_task_dirs()
instance-attribute
¤
commandlist_list: list[list[str]]
instance-attribute
¤
forward_files: list[str]
instance-attribute
¤
backward_files: list[str]
instance-attribute
¤
forward_common_files: list[str]
instance-attribute
¤
backward_common_files: list[str] = []
instance-attribute
¤
prepare()
¤
Prepare MD tasks.
Includes:
- Prepare the task_list
- Prepare forward & backward files
- Prepare commandlist_list for multi-remote submission
postprocess()
¤
run()
¤
Function to submit jobs to remote machines.
Notes
- The original task_dirs is relative to run_dir and should not be changed. However, the submission function needs task_dirs as paths relative to work_dir, so a temporary change is made here.
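The temporary path change described in the note can be sketched with a context manager that rewrites the task directories relative to work_dir and restores the originals afterwards. This is an assumption-laden illustration: DemoOperation, task_dirs_relative_to_work_dir, and the example paths are hypothetical, and the actual class may perform the swap differently.

```python
import posixpath
from contextlib import contextmanager


class DemoOperation:
    """Hypothetical stand-in holding the three path attributes discussed above."""

    def __init__(self, run_dir, work_dir, task_dirs):
        self.run_dir = run_dir
        self.work_dir = work_dir
        self.task_dirs = task_dirs  # stored relative to run_dir


@contextmanager
def task_dirs_relative_to_work_dir(op):
    """Temporarily express op.task_dirs relative to op.work_dir."""
    original = op.task_dirs
    op.task_dirs = [
        posixpath.relpath(posixpath.join(op.run_dir, d), op.work_dir) for d in original
    ]
    try:
        yield op
    finally:
        # Always restore the run_dir-relative paths, even if submission fails.
        op.task_dirs = original


op = DemoOperation("/proj", "/proj/iter_0/md", ["iter_0/md/task_0", "iter_0/md/task_1"])
with task_dirs_relative_to_work_dir(op):
    inside = list(op.task_dirs)  # what a submission function would see
after = list(op.task_dirs)
```

Restoring in a finally clause keeps the stored paths unchanged even when the submission step raises.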
premd_ase_sevenn(work_dir, pdict, mdict)
¤
Prepare MD args.
Includes:
- Copy ML models to work_dir
- Collect initial configurations
- Prepare ASE args
- Generate task_dirs for ranges of temperature and press
temperature_press_mdarg_ase(struct_dirs: list, temperature_list: list = [], press_list: list = [], ase_argdict: dict = {}) -> list
¤
Generate the task_dirs for ranges of temperatures and stresses.
Parameters:
-
struct_dirs(list) –List of directories containing configuration files.
-
temperature_list(list, default:[]) –List of temperatures.
-
press_list(list, default:[]) –List of stresses.
-
ase_argdict(dict, default:{}) –See ase.md schema
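Generating one task directory per combination of structure, temperature, and pressure amounts to a Cartesian product. The sketch below shows that idea only: temperature_press_grid and the directory naming scheme (temp…_press…) are hypothetical, and the real function also writes MD arguments, which is omitted here.

```python
from itertools import product


def temperature_press_grid(struct_dirs, temperature_list=None, press_list=None):
    """Hypothetical helper: one task dir per (structure, temperature, pressure)."""
    # An empty or missing list means "not varied"; keep one placeholder entry
    # so the product still yields one task per structure.
    temps = temperature_list or [None]
    presses = press_list or [None]
    task_dirs = []
    for struct_dir, temp, press in product(struct_dirs, temps, presses):
        parts = []
        if temp is not None:
            parts.append(f"temp{temp}")
        if press is not None:
            parts.append(f"press{press}")
        name = "_".join(parts) or "md"
        task_dirs.append(f"{struct_dir}/{name}")
    return task_dirs


grid = temperature_press_grid(["s0", "s1"], [300, 600], [0])
no_ranges = temperature_press_grid(["s0"])
```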
libal_md_lammps
¤
Library for LAMMPS MD with SevenNet model.
Classes:
-
OperAlmdLammpsSevennet–This class runs LAMMPS MD for a list of structures in task_dirs.
Functions:
-
premd_lammps_sevenn–Prepare MD args.
-
temperature_press_mdarg_lammps–Generate the task_dirs for ranges of temperatures and stresses.
OperAlmdLammpsSevennet(work_dir, pdict, multi_mdict, mdict_prefix='md')
¤
Bases: RemoteOperation
This class runs LAMMPS MD for a list of structures in task_dirs.
Methods:
-
prepare–Prepare MD tasks.
-
postprocess–
-
run–Function to submit jobs to remote machines.
Attributes:
-
op_name–
-
task_filter–
-
work_dir–
-
pdict–
-
mdict_list–
-
task_dirs–
-
commandlist_list(list[list[str]]) –
-
forward_files(list[str]) –
-
backward_files(list[str]) –
-
forward_common_files(list[str]) –
-
backward_common_files(list[str]) –
op_name = 'LAMMPS MD with SevenNet'
instance-attribute
¤
task_filter = {'has_files': ['conf.lmpdata'], 'no_files': ['committee_error.txt']}
instance-attribute
¤
work_dir = work_dir
instance-attribute
¤
pdict = pdict
instance-attribute
¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix)
instance-attribute
¤
task_dirs = self._load_task_dirs()
instance-attribute
¤
commandlist_list: list[list[str]]
instance-attribute
¤
forward_files: list[str]
instance-attribute
¤
backward_files: list[str]
instance-attribute
¤
forward_common_files: list[str]
instance-attribute
¤
backward_common_files: list[str] = []
instance-attribute
¤
prepare()
¤
Prepare MD tasks.
Includes:
- Prepare the task_list
- Prepare forward & backward files
- Prepare commandlist_list for multi-remote submission
postprocess()
¤
run()
¤
Function to submit jobs to remote machines.
Notes
- The original task_dirs is relative to run_dir and should not be changed. However, the submission function needs task_dirs as paths relative to work_dir, so a temporary change is made here.
premd_lammps_sevenn(work_dir, pdict, mdict)
¤
Prepare MD args.
Includes:
- Copy ML models to work_dir
- Collect initial configurations
- Prepare LAMMPS args
- Generate task_dirs for ranges of temperature and press
temperature_press_mdarg_lammps(struct_dirs: list, temperature_list: list = [], press_list: list = [], lammps_argdict: dict = {}) -> list
¤
Generate the task_dirs for ranges of temperatures and stresses.
Parameters:
-
struct_dirs(list) –List of directories containing configuration files.
-
temperature_list(list, default:[]) –List of temperatures.
-
press_list(list, default:[]) –List of stresses.
-
lammps_argdict(dict, default:{}) –See lammps.md schema
mlp
¤
MLP engines package.
Modules:
-
mlp_graphpes–Library for MLP training models GraphPES.
-
mlp_mace–Library for MLP training models MACE.
-
mlp_sevenn–Library for MLP training models SevenNet.
-
util_mlp–Utilities for MLP training.
mlp_sevenn
¤
Library for MLP training models SevenNet.
Classes:
Functions:
-
pretrain_sevenn–Prepare arguments and data for ML training.
OperAltrainSevennet(work_dir, pdict, multi_mdict, mdict_prefix='train')
¤
Bases: RemoteOperation
Methods:
-
prepare–Prepare for remote training operation.
-
postprocess–Collect the best checkpoint files and save them in FILE_CHECKPOINTS.
-
run–Function to submit jobs to remote machines.
Attributes:
-
op_name–
-
task_filter–
-
work_dir–
-
pdict–
-
mdict_list–
-
task_dirs–
-
commandlist_list(list[list[str]]) –
-
forward_files(list[str]) –
-
backward_files(list[str]) –
-
forward_common_files(list[str]) –
-
backward_common_files(list[str]) –
op_name = 'Training'
instance-attribute
¤
task_filter = {'has_files': [K.FILE_ARG_TRAIN], 'no_files': ['checkpoint_best.pth']}
instance-attribute
¤
work_dir = work_dir
instance-attribute
¤
pdict = pdict
instance-attribute
¤
mdict_list = self._select_machines(multi_mdicts, mdict_prefix)
instance-attribute
¤
task_dirs = self._load_task_dirs()
instance-attribute
¤
commandlist_list: list[list[str]]
instance-attribute
¤
forward_files: list[str]
instance-attribute
¤
backward_files: list[str]
instance-attribute
¤
forward_common_files: list[str]
instance-attribute
¤
backward_common_files: list[str] = []
instance-attribute
¤
prepare()
¤
Prepare for remote training operation.
Includes:
- Prepare the task_list
- Prepare forward & backward files
- Prepare commandlist_list for multi-remote submission
postprocess()
¤
Collect the best checkpoint files and save them in FILE_CHECKPOINTS.
run()
¤
Function to submit jobs to remote machines.
Notes
- The original task_dirs is relative to run_dir and should not be changed. However, the submission function needs task_dirs as paths relative to work_dir, so a temporary change is made here.
pretrain_sevenn(work_dir, pdict, mdict)
¤
Prepare arguments and data for ML training.
Includes:
- Split dataset into train/valid sets
- Build graph_data using SEVENN graph_build
- Prepare SEVENN args
- Establish train tasks (one folder for each training model)
- Save all common_files in DIR_FWDATA for convenience in transferring files
Notes
- DIR_COLLECTDATA: a tmp directory containing collected extxyz data
- DIR_FWDATA: directory containing common_files to forward to remote machines
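The first step of the preparation, splitting the collected frames into training and validation sets, can be illustrated with a seeded random split. This is a generic sketch, not the package's routine: split_train_valid, valid_ratio, and seed are hypothetical names, and the frames are represented by plain integers rather than atomic structures.

```python
import random


def split_train_valid(frames, valid_ratio=0.1, seed=0):
    """Hypothetical helper: reproducible random split into (train, valid) lists."""
    indices = list(range(len(frames)))
    random.Random(seed).shuffle(indices)
    # Keep at least one validation frame whenever there is any data.
    num_valid = max(1, round(len(frames) * valid_ratio)) if frames else 0
    valid_idx = set(indices[:num_valid])
    train = [f for i, f in enumerate(frames) if i not in valid_idx]
    valid = [f for i, f in enumerate(frames) if i in valid_idx]
    return train, valid


train, valid = split_train_valid(list(range(20)), valid_ratio=0.1, seed=0)
```

Seeding a local random.Random instance keeps the split reproducible without changing the global random state.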
util_mlp
¤
Utilities for MLP training.
Classes:
-
Xyz2GraphData–Convert XYZ file to graph data format used in MLP training.
Functions:
-
suggest_num_epochs–Suggest number of epochs for training. Based on MACE's setting.
Xyz2GraphData
¤
Convert XYZ file to graph data format used in MLP training.
Methods:
-
build_graph_sevenn–Build SevenNet graph dataset from source files.
-
build_graph_mace–
build_graph_sevenn(files: list[str], outfile: str = 'graph_atoms.pt', outdir: str = '.', num_cores: int = 1, cutoff: float = 5.0, **ase_kwargs)
staticmethod
¤
Build SevenNet graph dataset from source files.
Parameters:
-
files(list[str]) –List of input data files. Supported formats: extxyz and other formats handled by the function SevenNetGraphDataset.file_to_graph_list().
-
outfile(str, default:'graph_atoms.pt') –Name of the output file. Defaults to "graph_atoms.pt".
-
outdir(str, default:'.') –Output directory. Defaults to ".".
-
num_cores(int, default:1) –Number of CPU cores for parallel processing. Defaults to 1.
-
cutoff(float, default:5.0) –Cutoff distance for neighbor search. Defaults to 5.0.
-
**ase_kwargs–Additional keyword arguments for ASE's read() function.
build_graph_mace()
staticmethod
¤
suggest_num_epochs(dataset_size: int, batch_size: int, num_grad_updates: int = 300000) -> int
¤
Suggest number of epochs for training. Based on MACE's setting.
Parameters:
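A natural way to turn a target number of gradient updates into an epoch count is to divide it by the number of optimizer steps per epoch. The sketch below assumes exactly that, with both divisions rounded up. The rounding choices and the function name suggest_num_epochs_sketch are assumptions; the actual function may compute the value differently.

```python
import math


def suggest_num_epochs_sketch(dataset_size: int, batch_size: int, num_grad_updates: int = 300000) -> int:
    """Assumed rule: epochs = ceil(num_grad_updates / steps_per_epoch)."""
    # One optimizer step per batch; a partial final batch still counts as a step.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return math.ceil(num_grad_updates / steps_per_epoch)


epochs = suggest_num_epochs_sketch(dataset_size=10000, batch_size=10)
```

With 10000 structures and a batch size of 10, each epoch has 1000 steps, so 300000 updates correspond to 300 epochs under this rule.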
utilal
¤
Utilities for Active Learning workflow.
Classes:
-
D3ParamMD–Different packages use different names for D3 parameters.
-
MLP2Lammps–Convert MLP model to be used in LAMMPS.
D3ParamMD(d3package: str = 'sevenn')
¤
Different packages use different names for D3 parameters. This class returns the conventional D3 parameter names for each package used for MD.
Notes
- The default cutoff values are 95 Bohr (50.2718 Angstrom) for two-body dispersion calculations and 40 Bohr (21.1671 Angstrom) for coordination number and three-body calculations, as in the ASE-DFTD3 package. Other packages may use different default values.
- Some dftd3 parameters for DFT calculations support three-body interactions, but most MD packages only support pairwise interactions, so the three-body cutoff parameter is not included in this class.
Methods:
-
get_params–Return D3 parameter names according to different packages.
-
check_supported_damping–Check if the damping method is supported in the selected package.
-
angstrom2bohr–Convert Angstrom to Bohr.
-
angstrom2bohr2–Convert Angstrom to Bohr^2. Used in the sevenn package.
-
bohr2angstrom–Convert Bohr to Angstrom.
Attributes:
-
d3package(str) –
-
default_twobody_cutoff(float) –
-
default_cn_cutoff(float) –
-
param_names–
-
damping_map–
d3package: str = d3package
instance-attribute
¤
default_twobody_cutoff: float = 50.2718
instance-attribute
¤
default_cn_cutoff: float = 21.1671
instance-attribute
¤
param_names = params['params']
instance-attribute
¤
damping_map = params['damping_map']
instance-attribute
¤
get_params() -> dict
¤
Return D3 parameter names according to different packages.
check_supported_damping(damping: str)
¤
Check if the damping method is supported in the selected package.
angstrom2bohr(angstrom_value: float) -> float
staticmethod
¤
Convert Angstrom to Bohr.
Notes
In simple-dftd3, 60*Bohr converts 60 Bohr to Angstrom.
angstrom2bohr2(angstrom_value: float) -> float
staticmethod
¤
Convert Angstrom to Bohr^2. Used in the sevenn package.
bohr2angstrom(bohr_value: float) -> float
staticmethod
¤
Convert Bohr to Angstrom.
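The two linear conversions can be checked against the default cutoffs quoted in the class notes (95 Bohr and 40 Bohr). The functions below are module-level stand-ins for the static methods, written as a sketch: the constant name BOHR_IN_ANGSTROM is an assumption, and the value used is the CODATA 2018 Bohr radius.

```python
# Bohr radius in Angstrom (CODATA 2018 value).
BOHR_IN_ANGSTROM = 0.529177210903


def bohr2angstrom(bohr_value: float) -> float:
    """Convert a length from Bohr to Angstrom."""
    return bohr_value * BOHR_IN_ANGSTROM


def angstrom2bohr(angstrom_value: float) -> float:
    """Convert a length from Angstrom to Bohr."""
    return angstrom_value / BOHR_IN_ANGSTROM


twobody_cutoff = bohr2angstrom(95.0)  # two-body dispersion cutoff
cn_cutoff = bohr2angstrom(40.0)       # coordination-number cutoff
```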
MLP2Lammps(mlp_model: str = 'sevenn')
¤
Convert MLP model to be used in LAMMPS.
Methods:
-
convert–Convert MLP model to LAMMPS format.
-
convert_sevenn–Convert sevenn model to be used in LAMMPS.
-
convert_sevenn_mliap–Convert sevenn model to be used in LAMMPS MLIAP.
Attributes:
mlp_model: str = mlp_model
instance-attribute
¤
convert(checkpoint: str, outfile: str = 'deployed.pt', **kwargs)
¤
convert_sevenn(checkpoint: str, outfile: str = 'deploy_sevenn', modal: str | None = None, enable_flash: bool = False, parallel_type=False, **kwargs)
staticmethod
¤
Convert sevenn model to be used in LAMMPS.
Parameters:
-
checkpoint(str) –Path to checkpoint file of sevenn model.
-
outfile(str, default:'deploy_sevenn') –Path to output LAMMPS potential file.
-
modal(str, default:None) –Channel of multi-task model.
-
parallel_type(bool, default:False) –Convert to a potential for running parallel simulations.
-
enable_flash(bool, default:False) –Use flashTP.
-
**kwargs–Additional arguments to avoid breaking the function signature when future additional arguments are added.
Notes
- Single mode: generates the file "outfile.pt".
- Parallel mode: generates the files "outfile/deployed_parallel_0.pt", "outfile/deployed_parallel_1.pt", ...
convert_sevenn_mliap(checkpoint: str, outfile: str = 'deploy_sevenn_mliap.pt', modal: str | None = None, enable_cueq: bool = False, enable_flash: bool = False, enable_oeq: bool = False, **kwargs)
staticmethod
¤
Convert sevenn model to be used in LAMMPS MLIAP.
Parameters:
-
checkpoint(str) –Path to checkpoint file of sevenn model.
-
outfile(str, default:'deploy_sevenn_mliap.pt') –Path to output LAMMPS potential file.
-
modal(str, default:None) –Channel of multi-task model.
-
enable_cueq(bool, default:False) –Use cueq. cuEquivariance is only supported in ML-IAP interface.
-
enable_flash(bool, default:False) –Use flashTP.
-
enable_oeq(bool, default:False) –Use oeq.
-
**kwargs–Additional arguments to avoid breaking the function signature when future additional arguments are added.
utilal_uncertainty
¤
Utilities for uncertainty estimation using models committee.
- DO NOT import any alff libs in this file, since this file will be used remotely.
Classes:
-
ModelCommittee–A class to manage a committee of models for uncertainty estimation.
Functions:
-
simple_lmpdump2extxyz–Convert a LAMMPS dump file to an extended xyz file. This is a very simple version that converts only atomic positions, not the stress tensor.
-
chunk_list–Yield successive n-sized chunks from input_list.
ModelCommittee(mlp_model: str, model_files: list[str], calc_kwargs: dict | None = None, compute_stress: bool = False, rel_force: float | None = None, rel_stress: float | None = None, e_std_lo: float = 0.05, e_std_hi: float = 0.1, f_std_lo: float = 0.05, f_std_hi: float = 0.1, s_std_lo: float = 0.05, s_std_hi: float = 0.1, block_size: int = 1000)
¤
A class to manage a committee of models for uncertainty estimation.
Parameters:
-
mlp_model(str) –MLP model engine, e.g., 'sevenn'.
-
model_files(list[str]) –List of model files for the committee.
-
calc_kwargs(dict, default:None) –Additional arguments for the MLP calculator. Defaults to {}.
-
compute_stress(bool, default:False) –Whether to compute stress. Defaults to False.
-
rel_force(float, default:None) –Relative force to normalize force std. Defaults to None.
-
rel_stress(float, default:None) –Relative stress to normalize stress std. Defaults to None.
-
e_std_lo(float, default:0.05) –energy std low. Defaults to 0.05.
-
e_std_hi(float, default:0.1) –energy std high. Defaults to 0.1.
-
f_std_lo(float, default:0.05) –force std low. Defaults to 0.05.
-
f_std_hi(float, default:0.1) –force std high. Defaults to 0.1.
-
s_std_lo(float, default:0.05) –stress std low. Defaults to 0.05.
-
s_std_hi(float, default:0.1) –stress std high. Defaults to 0.1.
-
block_size(int, default:1000) –Number of configurations for which the 'committee error' is computed at once. Adjust this value to avoid exhausting RAM. Defaults to 1000.
Notes
- Consider using @staticmethod for some functions to avoid recursive messing.
Methods:
-
compute_committee_error_blockwise–Compute committee error for energy, forces, and stress for multiple configurations in a block-wise manner.
-
committee_judge–Decide whether a configuration is candidate, accurate, or inaccurate based on committee error.
-
select_candidate–Select candidate configurations for DFT calculation.
-
remove_inaccurate–Remove inaccurate configurations based on committee error. This is used to revise the dataset.
Attributes:
-
mlp_model–
-
model_files–
-
calc_kwargs–
-
compute_stress–
-
rel_force–
-
rel_stress–
-
block_size–
-
e_std_lo–
-
e_std_hi–
-
f_std_lo–
-
f_std_hi–
-
s_std_lo–
-
s_std_hi–
-
calc_list–
-
committee_error_file(str) –
-
committee_judge_file(str) –
mlp_model = mlp_model
instance-attribute
¤
model_files = model_files
instance-attribute
¤
calc_kwargs = calc_kwargs or {}
instance-attribute
¤
compute_stress = compute_stress
instance-attribute
¤
rel_force = rel_force
instance-attribute
¤
rel_stress = rel_stress
instance-attribute
¤
block_size = block_size
instance-attribute
¤
e_std_lo = e_std_lo
instance-attribute
¤
e_std_hi = e_std_hi
instance-attribute
¤
f_std_lo = f_std_lo
instance-attribute
¤
f_std_hi = f_std_hi
instance-attribute
¤
s_std_lo = s_std_lo
instance-attribute
¤
s_std_hi = s_std_hi
instance-attribute
¤
calc_list = self._get_calc_list()
instance-attribute
¤
committee_error_file: str = 'committee_error.txt'
instance-attribute
¤
committee_judge_file: str = 'committee_judge_summary.yml'
instance-attribute
¤
compute_committee_error_blockwise(struct_list: list[Atoms])
¤
Compute committee error for energy, forces, and stress for multiple configurations in a block-wise manner.
Parameters:
-
struct_list(list[Atoms]) –List of Atoms objects.
Notes
- The output file is controlled by the class attribute self.committee_error_file.
- This method can handle a list of structures with varying numbers of atoms, which allows mixed-size structures to be processed in a single block.
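Block-wise evaluation can be sketched by chunking the structure list and computing, for each structure, the spread of the committee's predictions. The definitions below are assumptions, since this reference does not state them: the energy error is taken as the population standard deviation over models, and the force error as the largest per-atom deviation from the committee mean. All function names and the nested-list input layout are hypothetical, and stress is omitted.

```python
from math import sqrt
from statistics import pstdev


def chunk_list(input_list, n):
    """Yield successive n-sized chunks from input_list."""
    for i in range(0, len(input_list), n):
        yield input_list[i:i + n]


def committee_std(energies, forces):
    """energies: one value per model; forces: [model][atom][xyz]. Returns (e_std, f_std_max)."""
    n_models, n_atoms = len(forces), len(forces[0])
    f_std_max = 0.0
    for a in range(n_atoms):
        mean = [sum(forces[m][a][k] for m in range(n_models)) / n_models for k in range(3)]
        var = sum(
            sum((forces[m][a][k] - mean[k]) ** 2 for k in range(3)) for m in range(n_models)
        ) / n_models
        f_std_max = max(f_std_max, sqrt(var))
    return pstdev(energies), f_std_max


def committee_error_blockwise(predictions, block_size):
    """predictions: list of (energies, forces) per structure; atom counts may differ."""
    rows = []
    for block in chunk_list(predictions, block_size):
        rows.extend(committee_std(e, f) for e, f in block)
    return rows


one_atom = ([1.0, 3.0], [[[0.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]])
two_atoms = ([2.0, 2.0], [[[0.0] * 3, [0.0] * 3], [[0.0] * 3, [0.0] * 3]])
rows = committee_error_blockwise([one_atom, two_atoms], block_size=1)
```

Because each structure is reduced to scalar errors independently, structures with different atom counts can share a block.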
committee_judge() -> tuple[np.ndarray, np.ndarray, np.ndarray]
¤
Decide whether a configuration is candidate, accurate, or inaccurate based on committee error.
Returns:
-
committee_judge_file(s) –Files containing the candidate, accurate, and inaccurate configurations.
Notes
- To select candidates based on energy only, set f_std_hi and s_std_hi to very large values, so that the criteria for those terms are always met.
- Similarly, to select candidates based on energy and force only, set s_std_hi to a very large value, e.g., s_std_hi=1e6.
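A common way to apply low and high thresholds in committee-based selection is sketched below. The rule itself is an assumption and is not confirmed by this reference: a configuration is 'inaccurate' if any term reaches its high bound, 'candidate' if any term reaches its low bound, and 'accurate' otherwise. The actual method may combine the criteria differently. The judge function and the dictionary layout are hypothetical.

```python
def judge(e_std, f_std, s_std, lo, hi):
    """Assumed rule; lo and hi are dicts keyed by 'e', 'f', 's'."""
    values = {"e": e_std, "f": f_std, "s": s_std}
    if any(values[k] >= hi[k] for k in values):
        return "inaccurate"
    if any(values[k] >= lo[k] for k in values):
        return "candidate"
    return "accurate"


lo = {"e": 0.05, "f": 0.05, "s": 0.05}
hi = {"e": 0.1, "f": 0.1, "s": 0.1}

labels = [
    judge(0.01, 0.01, 0.01, lo, hi),  # every term below its low bound
    judge(0.07, 0.01, 0.01, lo, hi),  # energy between its low and high bounds
    judge(0.07, 0.50, 0.01, lo, hi),  # force above its high bound
]

# A very large f_std_hi, as suggested in the notes, stops the force term
# from ever marking a configuration as inaccurate.
relaxed_hi = {"e": 0.1, "f": 1e6, "s": 1e6}
relaxed_label = judge(0.07, 0.50, 0.01, lo, relaxed_hi)
```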
simple_lmpdump2extxyz(lmpdump_file: str, extxyz_file: str)
¤
Convert a LAMMPS dump file to an extended xyz file. This is a very simple version that converts only atomic positions, not the stress tensor.
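A positions-only conversion of this kind can be sketched as follows. This is not the package's implementation: lmpdump_to_extxyz_text is a hypothetical helper that works on text rather than file paths, requires a type-to-element mapping supplied by the caller, and assumes an orthogonal box and a dump with 'type x y z' columns.

```python
def lmpdump_to_extxyz_text(dump_text, type_to_symbol):
    """Convert LAMMPS dump text (orthogonal box, 'type x y z' columns) to extxyz text."""
    lines = dump_text.strip().splitlines()
    out = []
    natoms, bounds = 0, []
    i = 0
    while i < len(lines):
        if lines[i].startswith("ITEM: NUMBER OF ATOMS"):
            natoms = int(lines[i + 1])
            i += 2
        elif lines[i].startswith("ITEM: BOX BOUNDS"):
            # Three lines of "lo hi"; tilt factors of triclinic boxes are not handled.
            bounds = [tuple(map(float, lines[i + k].split()[:2])) for k in (1, 2, 3)]
            i += 4
        elif lines[i].startswith("ITEM: ATOMS"):
            cols = lines[i].split()[2:]  # column names after "ITEM: ATOMS"
            lx, ly, lz = (hi - lo for lo, hi in bounds)
            out.append(str(natoms))
            out.append(
                f'Lattice="{lx} 0.0 0.0 0.0 {ly} 0.0 0.0 0.0 {lz}" '
                "Properties=species:S:1:pos:R:3"
            )
            for row in lines[i + 1:i + 1 + natoms]:
                rec = dict(zip(cols, row.split()))
                symbol = type_to_symbol[int(rec["type"])]
                # Positions are copied as-is, without shifting by the box origin.
                out.append(f"{symbol} {rec['x']} {rec['y']} {rec['z']}")
            i += 1 + natoms
        else:
            i += 1
    return "\n".join(out) + "\n"


dump = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 0.0 0.0 0.0
2 2 1.5 1.5 1.5
"""
extxyz = lmpdump_to_extxyz_text(dump, {1: "Si", 2: "O"})
```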