ALFF: Active Learning
Run the main ALFF active learning process. The active learning workflow in ALFF is designed to perform the following tasks automatically and iteratively, without any user intervention:
- Train ML models
    - Collect data files
    - Split the dataset
    - Prepare training args
    - Train ML models on remote machines
    - Monitor the training job status
    - Retrieve the trained ML models whenever they are finished
- Run MD simulations for sampling exploration
    - Configure the sampling exploration spaces
    - Prepare MD args
    - Submit and run MD simulations on remote machines to explore atomic configurations
    - Monitor the MD job status and retrieve the results whenever they are finished
    - Select candidate atomic configurations for DFT calculations (a sketch of a typical selection criterion follows this list)
- Run DFT calculations to label the selected atomic configurations
    - Prepare DFT args and DFT tasks
    - Submit and run DFT jobs on remote machines
    - Monitor the job status and retrieve the DFT calculation results
    - Collect the data from the DFT calculations
    - Convert the labeled data to readable formats for training ML models in the next iteration
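The candidate-selection step is the heart of the loop: only configurations the current models are uncertain about are sent to DFT. A widely used criterion in active learning for force fields keeps frames whose ensemble force deviation falls inside a trust window; frames below the lower bound are already well described, while frames above the upper bound are usually discarded as unphysical. The sketch below illustrates that idea with NumPy and assumed threshold values; it is not necessarily ALFF's exact rule.

```python
# Illustrative ensemble-deviation criterion for candidate selection.
# Thresholds and the exact rule used by ALFF are assumptions for this example.
import numpy as np

def max_force_deviation(forces_ensemble: np.ndarray) -> np.ndarray:
    """forces_ensemble has shape (n_models, n_frames, n_atoms, 3)."""
    mean_forces = forces_ensemble.mean(axis=0)                    # (n_frames, n_atoms, 3)
    dev = np.linalg.norm(forces_ensemble - mean_forces, axis=-1)  # (n_models, n_frames, n_atoms)
    return dev.mean(axis=0).max(axis=-1)                          # per-frame max over atoms

def select_candidates(forces_ensemble: np.ndarray,
                      lo: float = 0.10, hi: float = 0.30) -> np.ndarray:
    """Return indices of frames whose deviation lies in [lo, hi) (units assumed eV/Å)."""
    d = max_force_deviation(forces_ensemble)
    return np.flatnonzero((d >= lo) & (d < hi))
```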
PARAM.yaml
: The parameters of the generator.

MACHINE.yaml
: The settings of the machines running the generator's subprocesses.
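Both files are plain YAML. The snippet below only shows how such files could be read and sanity-checked with PyYAML before a run; the key names it checks are assumptions made for the example, not ALFF's documented schema.

```python
# Illustrative only: load the two input files of a run and do a minimal check.
# The key names below ("train", "md", "dft", "machines") are assumed, not ALFF's schema.
import yaml  # PyYAML

with open("PARAM.yaml") as f:
    param = yaml.safe_load(f)
with open("MACHINE.yaml") as f:
    machine = yaml.safe_load(f)

for section in ("train", "md", "dft"):      # hypothetical section names
    if section not in param:
        raise KeyError(f"PARAM.yaml is missing the '{section}' section")

print(f"{len(machine.get('machines', []))} machine(s) configured")  # hypothetical key
```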
An example run (the log excerpts below span the 0th through the Nth iteration):
-------------------------------- ALFF --------------------------------
Version: 0.1.dev409+g177de1d
Path: D:/work/code_hub/alff
---------------------------- Dependencies ----------------------------
numpy 1.26.4 C:/conda/envs/py13/Lib/site-packages/numpy
scipy 1.14.1 C:/conda/envs/py13/Lib/site-packages/scipy
ase 3.23.1b1 C:/conda/envs/py13/Lib/site-packages/ase
thutil 0.1.dev122 D:/work/code_hub/thutil
phonopy 2.29.1 C:/conda/envs/py13/Lib/site-packages/phonopy
----------------------- Author: C.Thang Nguyen -----------------------
----------------- Contact: http://thangckt.github.io/email -----------------
    ___    __    ____________
   /   |  / /   / ____/ ____/
  / /| | / /   / /_  / /_
 / ___ |/ /___/ __/ / __/
/_/  |_/_____/_/   /_/
INFO: START ACTIVE LEARNING
INFO: ======================== iter_00000 ========================
INFO: ------------- iter_00000 stage_00: pre_train ---------------
INFO: Collecting data files
INFO: Split dataset
INFO: Prepare training args
INFO: Prepare train tasks
INFO: ------------- iter_00000 stage_01: run_train ---------------
INFO: Training ML models... be patient
Remote host: server_1
Remote path: /uwork/user01/work/w24_alff_job
Job info: log/2025Jan09_035212_dispatch.log
INFO: Running chunk 1 / 1 (4 of 4 jobs).
INFO: --------------- iter_00000 stage_02: post_train ------------
INFO: --------------- iter_00000 stage_03: pre_md ----------------
The directory "iter_00000/01_md" already exists. Select an action: [yes/no/backup]?
Yes: overwrite the existing directory and continue/update unfinished tasks.
No: interrupt and exit the process.
Backup: back up the existing directory and start fresh tasks.
Please input your choice (y/n/b): y
Overwrite the existing directory
INFO: Prepare MD args
INFO: --------------- iter_00000 stage_04: run_md ----------------
INFO: ======================== iter_00004 ========================
INFO: --------------- iter_00004 stage_04: run_md ----------------
INFO: Running jobs on the remote machines
INFO: Machine 0, assigned 43 jobs
Remote host: 10.0.23.191
Remote path: /home/tha/work/w24_CAN_md
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 1, assigned 29 jobs
Remote host: 10.0.177.80
Remote path: /mnt/d/_tmp_job/w24_WSL_md
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 0, running 4 of 43 jobs (chunk 1/11).
INFO: Machine 1, running 2 of 29 jobs (chunk 1/15).
INFO: Machine 1, running 2 of 27 jobs (chunk 2/15). Time left 0:07
INFO: Machine 1, running 2 of 25 jobs (chunk 3/15). Time left 0:07
INFO: Machine 1, running 2 of 23 jobs (chunk 4/15). Time left 0:06
INFO: Machine 0, running 4 of 39 jobs (chunk 2/11). Time left 0:21
INFO: Machine 1, running 2 of 21 jobs (chunk 5/15). Time left 0:06
...
INFO: --------------- iter_00000 stage_05: post_md ---------------
INFO: --------------- iter_00000 stage_06: pre_dft ---------------
INFO: Prepare DFT tasks
INFO: --------------- iter_00000 stage_07: run_dft ---------------
INFO: Running jobs on the remote machines
INFO: Machine 0, assigned 144 jobs
Remote host: 10.0.23.191
Remote path: /home/tha/work/w24_CAN_dft
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 1, assigned 96 jobs
Remote host: 10.0.177.80
Remote path: /mnt/d/_tmp_job/w24_WSL_dft
Job info: log/2025Feb28_183617_alff.log
INFO: Machine 0, running 4 of 144 jobs (chunk 1/36).
INFO: Machine 1, running 2 of 96 jobs (chunk 1/48).
INFO: Machine 0, running 4 of 140 jobs (chunk 2/36). Time left 6:03
INFO: Machine 1, running 2 of 94 jobs (chunk 2/48). Time left 10:06
INFO: Machine 0, running 4 of 136 jobs (chunk 3/36). Time left 7:27
INFO: Machine 1, running 2 of 92 jobs (chunk 3/48). Time left 9:30
INFO: Machine 1, running 2 of 90 jobs (chunk 4/48). Time left 7:58
INFO: Machine 0, running 4 of 132 jobs (chunk 4/36). Time left 10:57
INFO: Machine 1, running 2 of 88 jobs (chunk 5/48). Time left 7:36
INFO: Machine 1, running 2 of 86 jobs (chunk 6/48). Time left 8:31
INFO: Machine 0, running 4 of 128 jobs (chunk 5/36). Time left 9:30
INFO: Machine 1, running 2 of 84 jobs (chunk 7/48). Time left 9:34
INFO: Machine 0, running 4 of 124 jobs (chunk 6/36). Time left 7:38
INFO: Machine 1, running 2 of 82 jobs (chunk 8/48). Time left 8:59
INFO: Machine 0, running 4 of 120 jobs (chunk 7/36). Time left 7:38
...
INFO: -------------- iter_00000 stage_08: post_dft ---------------
INFO: Collect data on the path: iter_00000/03_data
INFO: ======================== iter_00001 ========================
...
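The log illustrates how each stage's jobs are distributed to remote machines and processed in fixed-size chunks, with a remaining-job count and a rough time estimate printed per chunk. A generic sketch of that pattern is shown below; `run_job` is a hypothetical callable standing in for "submit to the remote host and wait", and this is not ALFF's internal scheduler.

```python
# Generic sketch of chunked job dispatch, mimicking the progress lines above.
import time
from typing import Callable, Sequence

def run_in_chunks(jobs: Sequence, chunk_size: int,
                  run_job: Callable, machine_id: int = 0) -> None:
    n_chunks = -(-len(jobs) // chunk_size)          # ceiling division
    remaining = list(jobs)
    start = time.time()
    for i in range(n_chunks):
        chunk, remaining = remaining[:chunk_size], remaining[chunk_size:]
        print(f"INFO: Machine {machine_id}, running {len(chunk)} of "
              f"{len(chunk) + len(remaining)} jobs (chunk {i + 1}/{n_chunks}).")
        for job in chunk:
            run_job(job)                            # e.g. submit, then poll until finished
        done = len(jobs) - len(remaining)
        if remaining:                               # crude ETA from average time per job so far
            eta = (time.time() - start) / done * len(remaining)
            mins, secs = divmod(int(eta), 60)
            print(f"      estimated time left: {mins}:{secs:02d}")
```

In the actual run, each machine (Machine 0 and Machine 1 in the log) works through its own chunk stream concurrently, and the time estimates are refined as chunks complete.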