API¶
thkit
¶

The Python package of general utilities.
Developed and maintained by C.Thang Nguyen
Modules:
Attributes:
- THKIT_ROOT
- __author__
- __contact__
THKIT_ROOT = Path(__file__).parent
module-attribute
¶
__author__ = 'C.Thang Nguyen'
module-attribute
¶
__contact__ = 'http://thangckt.github.io/email'
module-attribute
¶
_version
¶
Attributes:
- version (str)
- __version__ (str)
- __version_tuple__ (VERSION_TUPLE)
- version_tuple (VERSION_TUPLE)
- commit_id (COMMIT_ID)
- __commit_id__ (COMMIT_ID)
__all__ = ['__version__', '__version_tuple__', 'version', 'version_tuple', '__commit_id__', 'commit_id']
module-attribute
¶
TYPE_CHECKING = False
module-attribute
¶
VERSION_TUPLE = Tuple[Union[int, str], ...]
module-attribute
¶
COMMIT_ID = Union[str, None]
module-attribute
¶
version: str = '0.1.1.dev1'
module-attribute
¶
__version__: str = '0.1.1.dev1'
module-attribute
¶
__version_tuple__: VERSION_TUPLE = (0, 1, 1, 'dev1')
module-attribute
¶
version_tuple: VERSION_TUPLE = (0, 1, 1, 'dev1')
module-attribute
¶
commit_id: COMMIT_ID = 'g858759d9c'
module-attribute
¶
__commit_id__: COMMIT_ID = 'g858759d9c'
module-attribute
¶
config
¶
Functions:
- validate_config – Validate the config file against the schema file.
- loadconfig – Load data from a JSON or YAML file.
- load_jsonc – Load data from a JSON file that allows comments.
- get_default_args – Get a dict of the default argument values of a function.
- argdict_to_schemadict – Convert a function's type-annotated arguments into a Cerberus schema dict.
validate_config(config_dict=None, config_file=None, schema_dict=None, schema_file=None, allow_unknown=False, require_all=False)
¶
Validate the config file against the schema file.
Parameters:
- config_dict (dict, default: None) – config dictionary. Defaults to None.
- config_file (str, default: None) – path to the YAML config file; overrides config_dict. Defaults to None.
- schema_dict (dict, default: None) – schema dictionary. Defaults to None.
- schema_file (str, default: None) – path to the YAML schema file; overrides schema_dict. Defaults to None.
- allow_unknown (bool, default: False) – whether to allow unknown fields in the config file. Defaults to False.
- require_all (bool, default: False) – whether to require all fields in the schema file to be present in the config file. Defaults to False.
Raises:
- ValueError – if the config file does not match the schema.
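A minimal usage sketch, assuming the import path thkit.config (the config and schema values here are hypothetical):

from thkit.config import validate_config  # assumed import path

config = {"num_workers": 4, "log_level": "INFO"}
schema = {
    "num_workers": {"type": "integer", "min": 1},
    "log_level": {"type": "string", "allowed": ["INFO", "DEBUG"]},
}
validate_config(config_dict=config, schema_dict=schema)  # raises ValueError on mismatch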
loadconfig(filename: str | Path) -> dict
¶
Load data from a JSON or YAML file.
Parameters:
- filename (str | Path) – The filename to load data from; its suffix should be .json, .jsonc, .yml, or .yaml.
Returns:
- jdata (dict) – The data loaded from the file.
Notes
- The YAML file can contain variable interpolation, which will be processed by OmegaConf. Example input YAML file:
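For instance (keys and values here are hypothetical):

# config.yml
root_dir: /data/project
out_dir: ${root_dir}/results   # interpolated by OmegaConf to /data/project/results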
load_jsonc(filename: str) -> dict
¶
Load data from a JSON file that allows comments.
get_default_args(func: callable) -> dict
¶
Get a dict of the default argument values of a function.
Parameters:
- func (callable) – function to inspect
argdict_to_schemadict(func: callable) -> dict
¶
Convert a function's type-annotated arguments into a Cerberus schema dict.
Handles
- Single types
- Union types (as a list of types)
- Nullable types (None in Union)
- Only checks top-level types (no recursion into list[int], dict[str, float], etc.)
- Supports multiple types in Cerberus (e.g. {"type": ["integer", "string"]}) when a Union is given.
Parameters:
- func (callable) – function to inspect
Returns:
- schemadict (dict) – Cerberus schema dictionary (see the sketch below)
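A hedged sketch of how this mapping might look for a hypothetical annotated function, following the rules listed above (the import path thkit.config is assumed):

from typing import Optional, Union

from thkit.config import argdict_to_schemadict  # assumed import path

def run(a: int, b: Union[int, str] = 1, c: Optional[float] = None):  # hypothetical function
    pass

schema = argdict_to_schemadict(run)
# Expected shape, per the rules above (exact output may differ):
# {"a": {"type": "integer"},
#  "b": {"type": ["integer", "string"]},
#  "c": {"type": "float", "nullable": True}}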
io
¶
Classes:
- DotDict – Dictionary supporting dot notation (attribute access) as well as standard dictionary access.
Functions:
- write_yaml – Write data to a YAML file.
- read_yaml – Read data from a YAML file.
- combine_text_files – Combine text files into a single file in a memory-efficient way. Reads and writes in chunks to avoid loading large files into memory.
- unpack_dict – Unpack one level of a nested dictionary.
- download_rawtext – Download raw text from a URL.
- txt2str – Convert a text file to a string.
- str2txt – Write a string to a text file.
- txt2list – Convert a text file to a list of lines (without newline characters).
- list2txt – Write a list of lines to a text file.
- float2str – Convert a float number to a string.
DotDict(dct=None)
¶
Bases: dict
Dictionary supporting dot notation (attribute access) as well as standard dictionary access. Nested dicts and sequences (list/tuple/set) are converted recursively.
Parameters:
- dct (dict, default: None) – Initial dictionary to populate the DotDict. Defaults to empty dict.
Usage
d = DotDict({'a': 1, 'b': {'c': 2, 'd': [3, {'e': 4}]}})
print(d.b.c)        # 2
print(d['b']['c'])  # 2
d.b.d[1].e = 42
print(d.b.d[1].e)   # 42
print(d.to_dict())  # plain dict
Methods:
- __setitem__ – Set item using dot notation or standard dict syntax.
- __setattr__
- to_dict – Recursively convert DotDict back to a plain dict.
Attributes:
__getattr__ = dict.__getitem__
class-attribute
instance-attribute
¶
__delattr__ = dict.__delitem__
class-attribute
instance-attribute
¶
__setitem__(key, value)
¶
Set item using dot notation or standard dict syntax.
__setattr__(key, value)
¶
_wrap(value)
¶
to_dict()
¶
Recursively convert DotDict back to plain dict.
write_yaml(jdata: dict, filename: str | Path)
¶
Write data to a YAML file.
read_yaml(filename: str | Path) -> dict
¶
Read data from a YAML file.
combine_text_files(files: list[str], output_file: str, chunk_size: int = 1024)
¶
Combine text files into a single file in a memory-efficient way. Reads and writes in chunks to avoid loading large files into memory.
Parameters:
- files (list[str]) – List of file paths to combine.
- output_file (str) – Path to the output file.
- chunk_size (int, default: 1024) – Size of each chunk in KB to read/write. Defaults to 1024 KB.
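A minimal sketch of the chunked read/write technique described above (illustrative, not the actual implementation):

def combine_text_files_sketch(files, output_file, chunk_size=1024):
    size = chunk_size * 1024                   # chunk_size is given in KB
    with open(output_file, "w") as out:
        for path in files:
            with open(path) as f:
                while chunk := f.read(size):   # stream in chunks, never the whole file
                    out.write(chunk)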
unpack_dict(nested_dict: dict) -> dict
¶
Unpack one level of a nested dictionary.
download_rawtext(url: str, outfile: str = None) -> str
¶
Download raw text from a URL.
txt2str(file_path: str | Path) -> str
¶
Convert a text file to a string
str2txt(text: str, file_path: str | Path) -> None
¶
Convert a string to a text file
txt2list(file_path: str | Path) -> list[str]
¶
Convert a text file to a list of lines (without newline characters)
list2txt(lines: list[str], file_path: str | Path) -> None
¶
Convert a list of lines to a text file
float2str(number: float, decimals=6)
¶
Convert a float number to a string without trailing zeros. REF: https://stackoverflow.com/questions/2440692/formatting-floats-without-trailing-zeros
Parameters:
- number (float) – float number
- decimals (int, default: 6) – number of decimal places in the output string
Returns:
- s (str) – string of the float number
jobman
¶
Modules:
helper
¶
Functions:
- change_logpath_dispatcher – Change the logfile of dpdispatcher.
- validate_machine_config – Validate a YAML file containing multiple machine configs. This function is used to validate machine configs at the very beginning of a program to avoid later errors.
- loadconfig_multi_machines – Load and validate a YAML file containing multiple machine configs. This function loads machine configs for general-purpose usage.
_COLOR_MAP = {0: 'blue', 1: 'green', 2: 'yellow', 3: 'magenta', 4: 'cyan', 5: 'red', 6: 'white', 7: 'white', 8: 'white', 9: 'white', 10: 'white'}
module-attribute
¶
change_logpath_dispatcher(newlogfile: str)
¶
Change the logfile of dpdispatcher.
_init_jobman_logger(logfile: str | None = None)
¶
Initialize the default logger under log/ if no logfile is provided.
_info_current_dispatch(num_tasks: int, num_tasks_current_chunk: int, job_limit, chunk_index, old_time=None, new_time=None, machine_index=0) -> str
¶
Return the information of the current chunk of tasks.
_remote_info(machine_dict) -> str
¶
Return the remote machine information.
Parameters:
- machine_dict (dict) – the machine dictionary
validate_machine_config(machine_file: str)
¶
Validate a YAML file containing multiple machine configs. This function is used to validate machine configs at the very beginning of a program to avoid later errors.
Notes
- To specify multiple remote machines for the same purpose, the top-level keys in the machine config file should start with the same prefix (see the example below):
  - train_1, train_2, ... for training jobs
  - lammps_1, lammps_2, ... for lammps jobs
  - gpaw_1, gpaw_2, ... for gpaw jobs
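For instance, a machine config file with two training machines and one lammps machine might look like this (the inner fields are placeholders; real keys must follow the remote machine schema):

# machines.yml (hypothetical)
train_1:
  hostname: gpu-cluster-1
train_2:
  hostname: gpu-cluster-2
lammps_1:
  hostname: cpu-cluster-1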
_parse_multi_mdict(multi_mdict: dict, mdict_prefix: str = '') -> list[dict]
¶
Parse multiple machine dicts from a multi-machine dict based on the prefix.
Parameters:
- multi_mdict (dict) – the big dict containing multiple machine configs
- mdict_prefix (str, default: '') – the prefix to select remote machines for the same purpose. Example: 'dft', 'md', 'train'.
Returns:
- list[dict] – list of machine dicts
loadconfig_multi_machines(machine_file: str, mdict_prefix: str = '') -> list[dict]
¶
Load and validate a YAML file containing multiple machine configs. This function loads machine configs for general-purpose usage.
Parameters:
- machine_file (str) – the path of the machine config file
- mdict_prefix (str, default: '') – the prefix to select remote machines for the same purpose.
Returns:
- list[dict] – the list of machine dicts
submit
¶
Functions:
- submit_job_chunk – Submit a chunk of jobs to the remote machine.
- async_submit_job_chunk – Convert submit_job_chunk() into an async function, so the caller only needs to await the completion of the entire for loop (without worrying about the specifics of each operation inside the loop).
- alff_submit_job_multi_remotes – Submit jobs to multiple machines asynchronously.
- alff_submit_job_multi_remotes_new – Submit jobs to multiple machines asynchronously.
_machine_locks = {}
module-attribute
¶
_prepare_submission(mdict: dict, work_dir: str, task_list: list[Task], forward_common_files: list[str] = [], backward_common_files: list[str] = [], create_remote_path: bool = True) -> Submission
¶
Function to simplify the preparation of the Submission object for dispatching jobs.
Parameters:
- mdict (dict) – a dictionary containing settings of the remote machine. The parameters are described in the remote machine schema.
- create_remote_path (bool, default: True) – whether to create the remote path if it does not exist.
submit_job_chunk(mdict: dict, work_dir: str, task_list: list[Task], forward_common_files: list[str] = [], backward_common_files: list[str] = [], machine_index: int = 0, logger: object = None)
¶
Submit a chunk of jobs to the remote machine. The function will:
- Prepare the task list
- Make the submission of jobs to remote machines
- Wait for the jobs to finish and download the results to the local machine
Parameters:
- mdict (dict) – a dictionary containing settings of the remote machine, as described in the remote machine schema. This dictionary defines the login information, resources, execution command, etc. on the remote machine.
- task_list (list[Task]) – a list of Task objects. Each Task object contains the command to be executed on the remote machine, and the files to be copied to and from the remote machine. The dirs of each task must be relative to the work_dir.
- forward_common_files (list[str], default: []) – common files used for all tasks. These files are in the work_dir.
- backward_common_files (list[str], default: []) – common files to download from the remote machine when the jobs are finished.
- machine_index (int, default: 0) – index of the machine in the list of machines.
- logger (object, default: None) – the logger object to be used for logging.
Note
- Split the task_list into chunks to control the number of jobs submitted at once.
- Do not use the Local context: it will interfere with the current shell environment, which leads to unexpected behavior on the local machine. Instead, use another account to connect to the local machine with the SSH context.
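A hedged usage sketch, assuming dpdispatcher's Task class and the import path thkit.jobman.submit (the mdict contents are hypothetical):

from dpdispatcher import Task

from thkit.jobman.submit import submit_job_chunk  # assumed import path

mdict = {}  # hypothetical machine dict; must follow the remote machine schema
tasks = [
    Task(
        command="python run.py",
        task_work_path=f"task.{i:03d}/",  # relative to work_dir
        forward_files=["run.py"],
        backward_files=["result.json"],
    )
    for i in range(4)
]
submit_job_chunk(mdict, work_dir="work/", task_list=tasks)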
_get_machine_lock(machine_index)
¶
_run_submission_wrapper(submission, logger, check_interval=30, machine_index=0)
async
¶
Ensure only one instance of 'submission.run_submission' runs at a time.
- If one global lock were used for all machines, it would prevent concurrent execution of submissions on different machines. Therefore, each machine must have its own lock, so different machines can process jobs in parallel.
async_submit_job_chunk(mdict: dict, work_dir: str, task_list: list[Task], forward_common_files: list[str] = [], backward_common_files: list[str] = [], machine_index: int = 0, logger: object = None)
async
¶
Convert submit_job_chunk() into an async function, so the caller only needs to await the completion of the entire for loop (without worrying about the specifics of each operation inside the loop).
Note
- An async function normally contains an await ... statement to be awaited (yielding control to the event loop).
- If the event loop is blocked by a synchronous function (one that does not yield control to the event loop), the async function will wait for the completion of that synchronous function, and so will not execute asynchronously. Use await asyncio.to_thread() to run the synchronous function in a separate thread, so that the event loop is not blocked (see the sketch below).
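A minimal sketch of the asyncio.to_thread() pattern described in the note above:

import asyncio
import time

def blocking_submission():
    time.sleep(5)  # stands in for a long synchronous call such as run_submission

async def submit_async():
    # run the blocking call in a worker thread so the event loop stays free
    await asyncio.to_thread(blocking_submission)

async def main():
    await asyncio.gather(submit_async(), submit_async())  # both run concurrently

asyncio.run(main())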
_alff_prepare_task_list(command_list: list[str], task_dirs: list[str], forward_files: list[str], backward_files: list[str], outlog: str, errlog: str) -> list[Task]
¶
Prepare the task list for the alff package.
Jobs in the alff package share the same command_list, forward_files, and backward_files, so this function prepares the list of Task objects for the alff package. For general usage, the task list should be prepared manually.
Parameters:
- command_list (list[str]) – the list of commands to be executed on the remote machine.
- task_dirs (list[str]) – the list of directories for each task. They must be relative to the work_dir in function _prepare_submission.
- forward_files (list[str]) – the list of files to be copied to the remote machine. These files must exist in each task_dir.
- backward_files (list[str]) – the list of files to be copied back from the remote machine.
- outlog (str) – the name of the output log file.
- errlog (str) – the name of the error log file.
- # delay_fail_report (bool) – whether to delay the failure report until all tasks are done. This is useful when there are many tasks and we want to wait until all tasks have finished instead of having the controller interrupt if one task fails.
Returns:
- list[Task] – a list of Task objects.
_divide_workload(mdict_list: list[dict]) -> list[float]
¶
Divide the workload among multiple machines based on their workload ratios.
_divide_task_dirs(mdict_list: list[dict], task_dirs: list[str]) -> list[list[str]]
¶
Distribute task_dirs among multiple machines based on their workload ratios.
alff_submit_job_multi_remotes(multi_mdict: dict, prepare_command_list: callable, work_dir: str, task_dirs: list[str], forward_files: list[str], backward_files: list[str], forward_common_files: list[str] = [], backward_common_files: list[str] = [], mdict_prefix: str = 'dft', logger: object = None)
async
¶
Submit jobs to multiple machines asynchronously.
Update 2025Sep20: Use alff_submit_job_multi_remotes_new() instead, which is more flexible. This function is kept for backward compatibility.
Parameters:
- multi_mdict (dict) – the big dict containing multiple mdicts. Each mdict contains the parameters of one remote machine, as in the remote machine schema.
- prepare_command_list (callable) – a function to prepare the command list based on each remote machine.
- mdict_prefix (str, default: 'dft') – the prefix to select remote machines for the same purpose. Example: 'dft', 'md', 'train'.
alff_submit_job_multi_remotes_new(mdict_list: list[dict], commandlist_list: list[list[str]], work_dir: str, task_dirs: list[str], forward_files: list[str], backward_files: list[str], forward_common_files: list[str] = [], backward_common_files: list[str] = [], logger: object = None)
async
¶
Submit jobs to multiple machines asynchronously.
Parameters:
- mdict_list (list[dict]) – list of multiple mdicts. Each mdict contains the parameters of one remote machine, as in the remote machine schema.
- commandlist_list (list[list[str]]) – list of command_lists, one for each remote machine. These need to be prepared outside this function.
path
¶
Functions:
- make_dir – Create a directory with a backup option.
- make_dir_ask_backup – Make a directory and ask for backup if the directory already exists.
- ask_user_action – Ask the user for one of: yes / no / backup, and return the normalized choice.
- list_paths – List all files/folders in given directories and their subdirectories that match the given patterns.
- collect_files – Collect files from a list of paths (files/folders). Will search for files in folders and their subdirectories.
- change_pathname – Change path names.
- remove_files – Remove files from a given list of file paths.
- remove_dirs – Remove a list of directories.
- remove_files_in_paths – Remove files in the files list in the paths list.
- remove_dirs_in_paths – Remove directories in the dirs list in the paths list.
- copy_file – Copy a file/folder from the source path to the destination path. It will create the destination directory if it does not exist.
- move_file – Move a file/folder from the source path to the destination path.
- filter_dirs – Return directories containing has_files and none of no_files.
make_dir(path: str, backup: bool = True)
¶
Create a directory with a backup option.
make_dir_ask_backup(dir_path: str)
¶
Make a directory and ask for backup if the directory already exists.
ask_user_action(question: str) -> str
¶
Ask the user for one of: yes / no / backup, and return the normalized choice.
list_paths(paths: list[str], patterns: list[str], recursive=True) -> list[str]
¶
List all files/folders in given directories and their subdirectories that match the given patterns.
Parameters:
- paths (list[str]) – The list of paths to search for files/folders.
- patterns (list[str]) – The list of patterns to apply to the files. Each filter can be a file extension or a pattern.
Returns:
- List[str] – A list of matching paths.
Example:
folders = ["path1", "path2", "path3"]
patterns = ["*.ext1", "*.ext2", "something*.ext3", "*folder/"]
files = list_paths(folders, patterns)
Note:
- glob() does not list hidden files by default. To include hidden files, use glob(".*", recursive=True).
- When using recursive=True, you must include ** in the pattern to search subdirectories.
- glob("*", recursive=True) will search all FILES & FOLDERS in the CURRENT directory.
- glob("*/", recursive=True) will search all FOLDERS in the CURRENT directory.
- glob("**", recursive=True) will search all FILES & FOLDERS in the CURRENT directory & SUBdirectories.
- glob("**/", recursive=True) will search all FOLDERS in the CURRENT directory & SUBdirectories.
- "**/*" is equivalent to "**".
- "**/*/" is equivalent to "**/".
- IMPORTANT: "**/**" will replicate the behavior of "**" and can give unexpected results.
These behaviors are demonstrated in the snippet below.
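The behaviors above can be checked directly with Python's glob module:

from glob import glob

glob("*", recursive=True)    # files & folders in the current directory only
glob("*/", recursive=True)   # folders in the current directory only
glob("**", recursive=True)   # files & folders in the current directory and subdirectories
glob("**/", recursive=True)  # folders in the current directory and subdirectories
glob(".*")                   # hidden entries must be matched explicitly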
collect_files(paths: list[str], patterns: list[str]) -> list[str]
¶
Collect files from a list of paths (files/folders). Will search for files in folders and their subdirectories.
Parameters:
- paths (list[str]) – The list of paths to collect files from.
- patterns (list[str]) – The list of patterns to apply to the files. Each filter can be a file extension or a pattern.
Returns:
- List[str] – A list of matching file paths.
change_pathname(paths: list[str], old_string: str, new_string: str, replace: bool = False) -> None
¶
Change path names.
Parameters:
- paths (list[str]) – paths to the files/dirs
- old_string (str) – old string in the path name
- new_string (str) – new string in the path name
- replace (bool, default: False) – replace the old path name if the new one exists. Defaults to False.
remove_files(files: list[str]) -> None
¶
Remove files from a given list of file paths.
Parameters:
- files (list[str]) – list of file paths
remove_dirs(dirs: list[str]) -> None
¶
Remove a list of directories.
Parameters:
- dirs (list[str]) – list of directories to remove.
remove_files_in_paths(files: list, paths: list) -> None
¶
Remove files in the files list in the paths list.
remove_dirs_in_paths(dirs: list, paths: list) -> None
¶
Remove directories in the dirs list in the paths list.
copy_file(src_path: str, dest_path: str)
¶
Copy a file/folder from the source path to the destination path. It will create the destination directory if it does not exist.
move_file(src_path: str, dest_path: str)
¶
Move a file/folder from the source path to the destination path.
filter_dirs(dirs: list[str], has_files: list[str] = None, no_files: list[str] = None) -> list[str]
¶
Return directories containing has_files and none of no_files.
Parameters:
- dirs (list[str]) – List of directory paths to scan.
- has_files (list[str], default: None) – Files that must exist in the directory. Defaults to [].
- no_files (list[str], default: None) – Files that must not exist in the directory. Defaults to [].
Returns:
- list[str] – List of directory paths meeting the conditions.
pkg
¶
Functions:
- strip_ansi_codes – Strip ANSI color-formatting codes from a string.
- create_logger – Create and configure a logger with console and optional file handlers.
- check_package – Check if the required packages are installed.
- install_package – Install the required package.
- dependency_info – Get the dependency information.
strip_ansi_codes(msg: str) -> str
¶
Strip ANSI color-formatting codes from a string.
create_logger(logger_name: str = None, log_file: str = None, level: str = 'INFO', level_logfile: str = None) -> logging.Logger
¶
Create and configure a logger with console and optional file handlers.
check_package(package_name: str, auto_install: bool = False, git_repo: str = None, conda_channel: str = None)
¶
Check if the required packages are installed
install_package(package_name: str, git_repo: str | None = None, conda_channel: str | None = None) -> None
¶
Install the required package:
- By default: pip install -U {package_name}
- If git_repo is provided: pip install -U git+{git_repo}
- If conda_channel is provided: conda install -c {conda_channel} {package_name}
Parameters:
- package_name (str) – package name
- git_repo (str, default: None) – git URL for the package. Default: None. E.g., http://something.git
- conda_channel (str, default: None) – conda channel for the package. Default: None. E.g., conda-forge
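A hedged usage example of the two functions above, assuming the import path thkit.pkg (package names are illustrative):

from thkit.pkg import check_package, install_package  # assumed import path

check_package("polars", auto_install=True)           # pip-install if missing
install_package("ase", conda_channel="conda-forge")  # conda install -c conda-forge ase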
dependency_info(packages=['numpy', 'polars', 'thkit', 'ase']) -> str
¶
Get the dependency information
Note
Use importlib instead of __import__ for clarity.
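A minimal sketch of the importlib approach the note refers to:

import importlib

def get_version(package_name: str) -> str:
    # importlib.import_module is the clearer equivalent of __import__(package_name)
    mod = importlib.import_module(package_name)
    return getattr(mod, "__version__", "unknown")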
range
¶
Functions:
- range_inclusive – Generate evenly spaced points including the endpoint (within tolerance).
- composite_range – A custom parser that allows defining composite ranges. This is needed for defining input parameters in YAML files.
- composite_index – Allow defining composite index ranges.
- composite_strain_points – Generate composite spacing points from multiple ranges with tolerance-based uniqueness.
range_inclusive(start: float, end: float, step: float, tol: float = 1e-06) -> list[float]
¶
Generate evenly spaced points including the endpoint (within tolerance).
composite_range(list_inputs: list[int | float | str], tol=1e-06) -> list[float]
¶
A custom parser that allows defining composite ranges. This is needed for defining input parameters in YAML files.
Parameters:
- list_inputs (list[int | float | str]) – Accepts numbers or strings with the special form 'start:end[:step]' (inclusive).
- tol (float, default: 1e-06) – Tolerance for including the endpoint.
Examples: ["-3.1:-1", 0.1, 2, "3.1:5.2", "6.0:10.1:0.5"]
composite_index(list_inputs: list[int | str]) -> list[int]
¶
Allow defining composite index ranges.
Parameters:
- list_inputs (list[int | str]) – Accepts ints or strings with the special form 'start-end[:step]' (inclusive), as in the examples below.
Examples: [1, 2, "3-5", "7-10:2"] -> [1, 2, 3, 4, 5, 7, 9, 10]
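A minimal re-implementation sketch that reproduces the documented example (it assumes non-negative indices and that the endpoint is always included; the actual parser may differ):

def composite_index_sketch(list_inputs):
    out = []
    for item in list_inputs:
        if isinstance(item, int):
            out.append(item)
        else:                                  # string of form 'start-end[:step]'
            rng, _, step = item.partition(":")
            start, end = map(int, rng.split("-"))
            step = int(step) if step else 1
            vals = list(range(start, end + 1, step))
            if vals[-1] != end:
                vals.append(end)               # force-include the endpoint
            out.extend(vals)
    return out

composite_index_sketch([1, 2, "3-5", "7-10:2"])  # [1, 2, 3, 4, 5, 7, 9, 10]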
composite_strain_points(list_inputs: list[int | float | str], tol=1e-06) -> list[float]
¶
Generate composite spacing points from multiple ranges with tolerance-based uniqueness.
Notes:
- np.round(np.array(all_points) / tol).astype(int) is a trick to avoid floating-point issues when comparing points with a certain tolerance (demonstrated below).
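A small demonstration of this trick (values are illustrative):

import numpy as np

all_points = [0.1, 0.1 + 1e-9, 0.2]  # first two are equal within tol
tol = 1e-6
keys = np.round(np.array(all_points) / tol).astype(int)  # [100000, 100000, 200000]
_, idx = np.unique(keys, return_index=True)
unique_points = [all_points[i] for i in sorted(idx)]     # [0.1, 0.2]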
stuff
¶
Functions:
- chunk_list – Yield successive n-sized chunks from input_list.
- text_fill_center – Create a line with centered text.
- text_fill_left – Create a line with left-aligned text.
- text_fill_box – Return a string centered in a box with side delimiters.
- text_repeat – Repeat the input string to a specified length.
- text_color – ANSI escape codes to color the text.
- time_uuid
- simple_uuid – Generate a simple random UUID of 4 digits.
chunk_list(input_list: list, n: int) -> Generator
¶
Yield successive n-sized chunks from input_list.
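A minimal sketch of such a chunking generator (the actual implementation may differ):

from typing import Generator

def chunk_list_sketch(input_list: list, n: int) -> Generator:
    for i in range(0, len(input_list), n):
        yield input_list[i:i + n]  # the last chunk may be shorter than n

print(list(chunk_list_sketch([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]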
text_fill_center(input_text='example', fill='-', length=60)
¶
Create a line with centered text.
text_fill_left(input_text='example', margin=15, fill_left='-', fill_right=' ', length=60)
¶
Create a line with left-aligned text.
text_fill_box(input_text: str = '', fill: str = ' ', sp: str = 'ǁ', length: int = 60) -> str
¶
Return a string centered in a box with side delimiters.
Example
text_fill_box("hello", fill="-", sp="|", length=20)
'|-------hello-------|'
Notes:
- To input unicode characters, use the unicode escape sequence (e.g., "ǁ" for a specific character). See https://symbl.cc/en/unicode-table/ for more details.
- ║ (double vertical bar, u2551)
- ‖ (double vertical line, u2016)
- ǁ (Latin letter lateral click, u01C1)
text_repeat(input_str: str, length: int) -> str
¶
Repeat the input string to a specified length.
text_color(text: str, color: str = 'blue') -> str
¶
ANSI escape codes to color the text. Follow this link for more details.
time_uuid() -> str
¶
simple_uuid()
¶
Generate a simple random UUID of 4 digits.