ALFF: Active Learning
Run the main ALFF active learning process. The active learning workflow in ALFF is designed to perform the following tasks automatically and iteratively, without any user intervention:
- Train ML models
    - Collect data files
    - Split the dataset
    - Prepare training args
    - Train ML models on remote machines
    - Monitor the training job status
    - Retrieve the trained ML models whenever they are finished
- Run MD simulations for sampling exploration
    - Configure the sampling exploration spaces
    - Prepare MD args
    - Submit and run MD simulations on remote machines to explore atomic configurations
    - Monitor the MD job status and retrieve the results whenever they are finished
    - Select candidate atomic configurations for DFT calculations (a sketch of a typical selection criterion follows this list)
- Run DFT calculations to label the selected atomic configurations
    - Prepare DFT args and DFT tasks
    - Submit and run DFT jobs on remote machines
    - Monitor the job status and retrieve the DFT calculation results
    - Collect the data from the DFT calculations
    - Convert the labeled data to readable formats for training ML models in the next iteration
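The candidate-selection step is the heart of the loop: only configurations the current models are uncertain about are sent to DFT. A widely used criterion in active learning for force fields keeps frames whose ensemble force deviation falls inside a trust window; frames below the lower bound are already well described, while frames above the upper bound are usually discarded as unphysical. The sketch below illustrates that idea with NumPy and assumed threshold values; it is not necessarily ALFF's exact rule.

```python
# Illustrative ensemble-deviation criterion for candidate selection.
# Thresholds and the exact rule used by ALFF are assumptions for this example.
import numpy as np

def max_force_deviation(forces_ensemble: np.ndarray) -> np.ndarray:
    """forces_ensemble has shape (n_models, n_frames, n_atoms, 3)."""
    mean_forces = forces_ensemble.mean(axis=0)                    # (n_frames, n_atoms, 3)
    dev = np.linalg.norm(forces_ensemble - mean_forces, axis=-1)  # (n_models, n_frames, n_atoms)
    return dev.mean(axis=0).max(axis=-1)                          # per-frame max over atoms

def select_candidates(forces_ensemble: np.ndarray,
                      lo: float = 0.10, hi: float = 0.30) -> np.ndarray:
    """Return indices of frames whose deviation lies in [lo, hi) (units assumed eV/Å)."""
    d = max_force_deviation(forces_ensemble)
    return np.flatnonzero((d >= lo) & (d < hi))
```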
PARAM.yaml
: The parameters of the generator.

MACHINE.yaml
: The settings of the machines running the generator's subprocesses.
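Both files are plain YAML. The snippet below only shows how such files could be read and sanity-checked with PyYAML before a run; the key names it checks are assumptions made for the example, not ALFF's documented schema.

```python
# Illustrative only: load the two input files of a run and do a minimal check.
# The key names below ("train", "md", "dft", "machines") are assumed, not ALFF's schema.
import yaml  # PyYAML

with open("PARAM.yaml") as f:
    param = yaml.safe_load(f)
with open("MACHINE.yaml") as f:
    machine = yaml.safe_load(f)

for section in ("train", "md", "dft"):      # hypothetical section names
    if section not in param:
        raise KeyError(f"PARAM.yaml is missing the '{section}' section")

print(f"{len(machine.get('machines', []))} machine(s) configured")  # hypothetical key
```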
An example run (the log excerpts below span the 0th through the Nth iteration):
-------------------------------- ALFF --------------------------------
Version: 0.1.dev409+g177de1d
Path: D:/work/code_hub/alff
---------------------------- Dependencies ----------------------------
numpy 1.26.4 C:/conda/envs/py13/Lib/site-packages/numpy
scipy 1.14.1 C:/conda/envs/py13/Lib/site-packages/scipy
ase 3.23.1b1 C:/conda/envs/py13/Lib/site-packages/ase
thutil 0.1.dev122 D:/work/code_hub/thutil
phonopy 2.29.1 C:/conda/envs/py13/Lib/site-packages/phonopy
----------------------- Author: C.Thang Nguyen -----------------------
----------------- Contact: http://thangckt.github.io/email -----------------
    ___    __    ____________
   /   |  / /   / ____/ ____/
  / /| | / /   / /_  / /_
 / ___ |/ /___/ __/ / __/
/_/  |_/_____/_/   /_/
INFO: START ACTIVE LEARNING
INFO: ======================== iter_00000 ========================
INFO: ------------- iter_00000 stage_00: pre_train ---------------
INFO: Collecting data files
INFO: Split dataset
INFO: Prepare training args
INFO: Prepare train tasks
INFO: ------------- iter_00000 stage_01: run_train ---------------
INFO: Training ML models... be patient
Remote host: server_1
Remote path: /uwork/user01/work/w24_alff_job
Job info: log/2025Jan09_035212_dispatch.log
INFO: Running chunk 1 / 1 (4 of 4 jobs).
INFO: --------------- iter_00000 stage_02: post_train ------------
INFO: --------------- iter_00000 stage_03: pre_md ----------------
The directory "iter_00000/01_md" already exists. Select an action: [yes/no/backup]?
Yes: overwrite the existing directory and continue/update unfinished tasks.
No: interrupt and exit the process.
Backup: back up the existing directory and start fresh tasks.
Please input your choice (y/n/b): y
Overwrite the existing directory
INFO: Prepare MD args
INFO: --------------- iter_00000 stage_04: run_md ----------------
INFO: ======================== iter_00004 ========================
INFO: --------------- iter_00004 stage_04: run_md ----------------
INFO: Running jobs on the remote machines
INFO: Machine 0, assigned 43 jobs
Remote host: 10.0.23.191
Remote path: /home/tha/work/w24_CAN_md
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 1, assigned 29 jobs
Remote host: 10.0.177.80
Remote path: /mnt/d/_tmp_job/w24_WSL_md
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 0, running 4 of 43 jobs (chunk 1/11).
INFO: Machine 1, running 2 of 29 jobs (chunk 1/15).
INFO: Machine 1, running 2 of 27 jobs (chunk 2/15). Time left 0:07
INFO: Machine 1, running 2 of 25 jobs (chunk 3/15). Time left 0:07
INFO: Machine 1, running 2 of 23 jobs (chunk 4/15). Time left 0:06
INFO: Machine 0, running 4 of 39 jobs (chunk 2/11). Time left 0:21
INFO: Machine 1, running 2 of 21 jobs (chunk 5/15). Time left 0:06
...
INFO: --------------- iter_00000 stage_05: post_md ---------------
INFO: --------------- iter_00000 stage_06: pre_dft ---------------
INFO: Prepare DFT tasks
INFO: --------------- iter_00000 stage_07: run_dft ---------------
INFO: Running jobs on the remote machines
INFO: Machine 0, assigned 144 jobs
Remote host: 10.0.23.191
Remote path: /home/tha/work/w24_CAN_dft
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 1, assigned 96 jobs
Remote host: 10.0.177.80
Remote path: /mnt/d/_tmp_job/w24_WSL_dft
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 0, running 4 of 144 jobs (chunk 1/36).
INFO: Machine 1, running 2 of 96 jobs (chunk 1/48).
INFO: Machine 0, running 4 of 140 jobs (chunk 2/36). Time left 6:03
INFO: Machine 1, running 2 of 94 jobs (chunk 2/48). Time left 10:06
INFO: Machine 0, running 4 of 136 jobs (chunk 3/36). Time left 7:27
INFO: Machine 1, running 2 of 92 jobs (chunk 3/48). Time left 9:30
INFO: Machine 1, running 2 of 90 jobs (chunk 4/48). Time left 7:58
INFO: Machine 0, running 4 of 132 jobs (chunk 4/36). Time left 10:57
INFO: Machine 1, running 2 of 88 jobs (chunk 5/48). Time left 7:36
INFO: Machine 1, running 2 of 86 jobs (chunk 6/48). Time left 8:31
INFO: Machine 0, running 4 of 128 jobs (chunk 5/36). Time left 9:30
INFO: Machine 1, running 2 of 84 jobs (chunk 7/48). Time left 9:34
INFO: Machine 0, running 4 of 124 jobs (chunk 6/36). Time left 7:38
INFO: Machine 1, running 2 of 82 jobs (chunk 8/48). Time left 8:59
INFO: Machine 0, running 4 of 120 jobs (chunk 7/36). Time left 7:38
...
INFO: -------------- iter_00000 stage_08: post_dft ---------------
INFO: Collect data on the path: iter_00000/03_data
INFO: ======================== iter_00001 ========================
...
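The log illustrates how each stage's jobs are distributed to remote machines and processed in fixed-size chunks, with a remaining-job count and a rough time estimate printed per chunk. A generic sketch of that pattern is shown below; `run_job` is a hypothetical callable standing in for "submit to the remote host and wait", and this is not ALFF's internal scheduler.

```python
# Generic sketch of chunked job dispatch, mimicking the progress lines above.
import time
from typing import Callable, Sequence

def run_in_chunks(jobs: Sequence, chunk_size: int,
                  run_job: Callable, machine_id: int = 0) -> None:
    n_chunks = -(-len(jobs) // chunk_size)          # ceiling division
    remaining = list(jobs)
    start = time.time()
    for i in range(n_chunks):
        chunk, remaining = remaining[:chunk_size], remaining[chunk_size:]
        print(f"INFO: Machine {machine_id}, running {len(chunk)} of "
              f"{len(chunk) + len(remaining)} jobs (chunk {i + 1}/{n_chunks}).")
        for job in chunk:
            run_job(job)                            # e.g. submit, then poll until finished
        done = len(jobs) - len(remaining)
        if remaining:                               # crude ETA from average time per job so far
            eta = (time.time() - start) / done * len(remaining)
            mins, secs = divmod(int(eta), 60)
            print(f"      estimated time left: {mins}:{secs:02d}")
```

In the actual run, each machine (Machine 0 and Machine 1 in the log) works through its own chunk stream concurrently, and the time estimates are refined as chunks complete.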