General¶
ignore task_dirs¶
If for some reason you want to exclude some task_dirs from submission to remote machines (e.g., they are leftover jobs from a previous launch of clff and must be ignored on a new launch, or some tasks exceed the memory of your computing resources), you can manually create a file named ignore_task_dirs.yml in the work_dir with the following content:
yml
- path/to/ignore/task_dir1
- path/to/ignore/task_dir2
- This trick applies to all processes of clff.
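The file can also be generated from a script. The following is a minimal sketch, not part of clff: the helper name write_ignore_file is hypothetical, and it assumes the file is a plain YAML list of paths exactly as shown above.

```python
from pathlib import Path


def write_ignore_file(work_dir, task_dirs):
    """Write ignore_task_dirs.yml in work_dir as a plain YAML list.

    Hypothetical helper: it only reproduces the file layout shown above,
    one "- path" entry per line.
    """
    ignore_file = Path(work_dir) / "ignore_task_dirs.yml"
    lines = [f"- {path}\n" for path in task_dirs]
    ignore_file.write_text("".join(lines))
    return ignore_file
```

For example, write_ignore_file(".", ["path/to/ignore/task_dir1", "path/to/ignore/task_dir2"]) writes the two entries shown above into the current directory, which is assumed here to be the work_dir.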
Active Learning¶
Fake MD candidates¶
The DFT stage collects configurations from the dir 01_md/md_candidate/*.extxyz. These are the candidates selected by the active learning algorithm.
- You can completely "fake" these candidates by adding some configurations, removing some configurations, or even replacing all candidates with your own configurations, simply by placing your .extxyz files in the dir 01_md/md_candidate/. The active learning algorithm just reads the configurations in this dir and uses them for the next DFT labeling stage. This is useful when you want more custom configurations to be labeled by DFT.
Note: to avoid adding duplicated custom configurations, do as follows:
- Run the cell below to collect the extxyz files in the DFT label directories. Then, add the collected extxyz file back to the dataset for training.
- Delete all DFT task directories.
- Add custom MD candidate structures to 01_md/md_candidate/*.extxyz
- Modify _clip.iter to be at the "n-1 2" step.
- Relaunch the clip_train step to recollect the iteration data for checking duplicates (remember to quit clff before graph building).
- Then, modify _clip.iter to be at the "n 1" step, and relaunch clff to DFT label the new candidate structures.
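The collection step in the first item can be sketched as follows. This is an illustration under stated assumptions, not the clff implementation: the function name collect_extxyz is hypothetical, every DFT task directory is assumed to contain its labeled frames as .extxyz files, and the root directory of the DFT tasks has to be supplied by you. It relies on the fact that extxyz frames can simply be concatenated into one multi-frame file.

```python
from pathlib import Path


def collect_extxyz(dft_root, output_file):
    """Concatenate every .extxyz file found under dft_root into output_file.

    Hypothetical helper: dft_root is whatever directory holds your DFT
    task directories. Returns the number of source files found.
    """
    output_file = Path(output_file)
    # List the sources before opening the output, and skip the output itself
    # in case it lives under dft_root.
    sources = sorted(
        p for p in Path(dft_root).rglob("*.extxyz")
        if p.resolve() != output_file.resolve()
    )
    with output_file.open("w") as out:
        for src in sources:
            text = src.read_text()
            if not text:
                continue  # skip empty files
            # Make sure each file ends with a newline so frames do not merge.
            out.write(text if text.endswith("\n") else text + "\n")
    return len(sources)
```

Writing the collected file outside dft_root keeps it safe when the DFT task directories are deleted in the next item.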
- You can also skip the MD runs completely and just place your own configurations in the dir 01_md/md_candidate/ for DFT labeling. Then, modify the CLFF iterlog _clip.iter to skip the MD stage and go directly to the DFT stage (for example, to skip the MD stage at iter 12, change the corresponding line to "12 1"). This is useful when you have specific configurations that you want labeled by DFT but do not want to run the MD simulations.
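Placing custom files in the candidate directory can be scripted. The sketch below is only an illustration: add_custom_candidates is a hypothetical helper, and it assumes the script runs from the directory that contains 01_md/. It refuses to overwrite an existing candidate file, which avoids silently replacing candidates selected by the active learning algorithm.

```python
import shutil
from pathlib import Path


def add_custom_candidates(custom_files, candidate_dir="01_md/md_candidate"):
    """Copy custom .extxyz files into the MD candidate directory.

    Hypothetical helper; returns the list of destination paths.
    """
    candidate_dir = Path(candidate_dir)
    candidate_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, custom_files):
        if src.suffix != ".extxyz":
            raise ValueError(f"expected an .extxyz file, got {src}")
        dest = candidate_dir / src.name
        if dest.exists():
            # Never overwrite an existing candidate file.
            raise FileExistsError(f"{dest} already exists; rename the custom file")
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Editing _clip.iter is left as a manual step here.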
Effective sampling¶
Rules of thumb:
- Explicitly list init_struct_paths for each structure for easier control of sampling (you can get them from 01_md/init_paths.yml), i.e., avoid using wildcards like init_structs/*.
- Exclude enough sampled structures from the sampling space to save computational resources (MD run).
- Add empty sampling spaces if you want to skip some iterations of sampling:
md:
  sampling_spaces:
    - {}  # ignore some iterations
    - {}
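If 01_md/init_paths.yml is not available yet, the explicit list from the first rule of thumb can be produced by expanding the wildcard once and pasting the result into the config. A minimal sketch, assuming the structures are files directly inside one directory; explicit_paths_yaml is a hypothetical helper, and only the key name init_struct_paths comes from the text above.

```python
from pathlib import Path


def explicit_paths_yaml(struct_dir, key="init_struct_paths"):
    """Render the files in struct_dir as an explicit, sorted YAML list.

    Hypothetical helper: the output is meant to replace a wildcard such as
    init_structs/* so that single structures can be removed by hand.
    """
    paths = sorted(p.as_posix() for p in Path(struct_dir).iterdir() if p.is_file())
    lines = [f"{key}:"] + [f"  - {p}" for p in paths]
    return "\n".join(lines)
```

Calling print(explicit_paths_yaml("init_structs")) gives one entry per structure, so a structure that has already been sampled enough can be excluded by deleting its line.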