API¶
thutil
¶
The package for general utilities.
Developed and maintained by C.Thang Nguyen
Modules:
Attributes:
__description__ = 'Python package'
module-attribute
¶
__long_description__ = 'ML based applications '
module-attribute
¶
__author__ = 'thangckt'
module-attribute
¶
config
¶
Functions:
-
validate_config
–Validate the config file with the schema file.
-
load_config
–Load data from a JSON or YAML file. A YAML file may contain variable interpolation, which is processed by OmegaConf.
-
load_jsonc
–Load data from a JSON file that allows comments.
-
unpack_dict
–Unpack one level of nested dictionary.
-
write_yaml
–Write data to a YAML file.
-
read_yaml
–Read data from a YAML file.
validate_config(config_dict=None, config_file=None, schema_dict=None, schema_file=None, allow_unknown=False, require_all=False)
¶
Validate the config file with the schema file.
Parameters:
-
config_dict
(dict
, default:None
) –config dictionary. Defaults to None.
-
config_file
(str
, default:None
) –path to the YAML config file; overrides
config_dict
. Defaults to None.
-
schema_dict
(dict
, default:None
) –schema dictionary. Defaults to None.
-
schema_file
(str
, default:None
) –path to the YAML schema file; overrides
schema_dict
. Defaults to None.
-
allow_unknown
(bool
, default:False
) –whether to allow unknown fields in the config file. Defaults to False.
-
require_all
(bool
, default:False
) –whether to require all fields in the schema file to be present in the config file. Defaults to False.
Raises:
-
ValueError
–if the config file does not match the schema
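The effect of the allow_unknown and require_all flags can be illustrated with a small hand-written checker. This is only a sketch under assumptions: the function name check_config and the schema layout (a "type" and an optional "required" entry per field) are hypothetical, and the package's own validation may work differently.

```python
def check_config(config: dict, schema: dict,
                 allow_unknown: bool = False, require_all: bool = False) -> None:
    """Hypothetical stand-in for schema validation (not thutil's implementation)."""
    errors = []
    # Fields present in the config but absent from the schema.
    for key in config:
        if key not in schema and not allow_unknown:
            errors.append(f"unknown field: {key}")
    # Fields described by the schema.
    for key, rule in schema.items():
        if key not in config:
            if require_all or rule.get("required", False):
                errors.append(f"missing field: {key}")
            continue
        expected = rule.get("type")
        if expected is not None and not isinstance(config[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    if errors:
        raise ValueError("; ".join(errors))


schema = {"name": {"type": str}, "steps": {"type": int}}
check_config({"name": "run"}, schema)                 # passes: nothing is required
check_config({"name": "run", "extra": 1}, schema, allow_unknown=True)  # passes
```

With require_all=True the first call would raise ValueError because "steps" is missing.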
load_config(filename: Union[str, Path]) -> dict
¶
Load data from a JSON or YAML file. A YAML file may contain variable interpolation, which is processed by OmegaConf.
Parameters:
-
filename
(Union[str, Path]
) –The file to load data from; its suffix should be .json, .jsonc, .yaml, or .yml.
Returns:
-
jdata
(dict
) –The data loaded from the file.
load_jsonc(filename: str) -> dict
¶
Load data from a JSON file that allows comments.
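One way to read JSON with comments using only the standard library is to strip the comments before calling json.loads. The sketch below assumes // line comments and /* */ block comments; the helper name strip_json_comments is hypothetical, and how load_jsonc itself handles comments is not stated here.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside of string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """{
  // line comment
  "url": "http://example.com", /* block comment */
  "n": 1
}"""
data = json.loads(strip_json_comments(sample))
```

The "//" inside the URL string is preserved because the scanner tracks whether it is inside a string literal. Trailing commas are not handled by this sketch.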
unpack_dict(nested_dict: dict) -> dict
¶
Unpack one level of nested dictionary.
write_yaml(jdata: dict, filename: Union[str, Path])
¶
Write data to a YAML file.
read_yaml(filename: Union[str, Path]) -> dict
¶
Read data from a YAML file.
io
¶
Functions:
-
combine_text_files
–Combine text files into a single file in a memory-efficient way. Files are read and written in chunks to avoid loading large files into memory.
-
download_rawtext
–Download raw text from a URL.
combine_text_files(files: list[str], output_file: str, chunk_size: int = 1024)
¶
Combine text files into a single file in a memory-efficient way. Files are read and written in chunks to avoid loading large files into memory.
Parameters:
-
files
(list[str]
) –List of file paths to combine.
-
output_file
(str
) –Path to the output file.
-
chunk_size
(int
, default:1024
) –Size of each chunk in KB to read/write. Defaults to 1024 KB.
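The chunked read/write idea can be sketched with plain file objects. The function name combine_in_chunks is hypothetical and this is not necessarily how combine_text_files is implemented; note also that in text mode read() counts characters rather than bytes, so the KB figure is approximate.

```python
import tempfile
from pathlib import Path


def combine_in_chunks(files, output_file, chunk_size=1024):
    """Append each input file to output_file, holding one chunk in memory at a time."""
    chunk_chars = chunk_size * 1024  # chunk_size is given in KB
    with open(output_file, "w", encoding="utf-8") as out:
        for path in files:
            with open(path, "r", encoding="utf-8") as src:
                while True:
                    chunk = src.read(chunk_chars)
                    if not chunk:  # empty string means end of file
                        break
                    out.write(chunk)


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.txt")
    second = Path(tmp, "second.txt")
    first.write_text("first\n", encoding="utf-8")
    second.write_text("second\n", encoding="utf-8")
    target = Path(tmp, "combined.txt")
    combine_in_chunks([first, second], target, chunk_size=1)
    combined = target.read_text(encoding="utf-8")
```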
download_rawtext(url: str, outfile: str = None) -> str
¶
Download raw text from a URL.
path
¶
Functions:
-
make_dir
–Create a directory with a backup option.
-
make_dir_ask_backup
–Make a directory and ask for backup if the directory already exists.
-
ask_yes_no
–Asks a yes/no/backup question and returns the response.
-
list_paths
–List all files/folders in given directories and their subdirectories that match the given patterns.
-
collect_files
–Collect files from a list of paths (files/folders). Will search files in folders and their subdirectories.
-
change_pathname
–Change path names.
-
remove_files
–Remove files from a given list of file paths.
-
remove_dirs
–Remove a list of directories.
-
remove_files_in_paths
–Remove files in the
files
list in the
paths
list.
-
remove_dirs_in_paths
–Remove directories in the
dirs
list in the
paths
list.
-
copy_file
–Copy a file/folder from the source path to the destination path.
-
move_file
–Move a file/folder from the source path to the destination path.
-
scan_dirs
–Check whether folders contain certain files and do not contain others.
make_dir(path: str, backup: bool = True)
¶
Create a directory with a backup option.
make_dir_ask_backup(dir_path: str)
¶
Make a directory and ask for backup if the directory already exists.
ask_yes_no(question: str) -> str
¶
Asks a yes/no/backup question and returns the response.
list_paths(paths: list[str], patterns: list[str], recursive=True) -> list[str]
¶
List all files/folders in given directories and their subdirectories that match the given patterns.
Parameters:
-
paths
(list[str]
) –The list of paths to search for files/folders.
-
patterns
(list[str]
) –The list of patterns to apply to the files. Each pattern can be a file extension or a glob pattern.
Returns:
-
list[str]
–A list of matching paths.
Example:¶
folders = ["path1", "path2", "path3"]
patterns = ["*.ext1", "*.ext2", "something*.ext3", "*folder/"]
files = list_paths(folders, patterns)
Note:¶
- glob() does not list hidden files by default. To include hidden files, use glob(".*", recursive=True).
- When using recursive=True, the pattern must include ** to search subdirectories.
- glob("*", recursive=True) will search all FILES & FOLDERS in the CURRENT directory.
- glob("*/", recursive=True) will search all FOLDERS in the CURRENT directory.
- glob("**", recursive=True) will search all FILES & FOLDERS in the CURRENT directory & its SUBdirectories.
- glob("**/", recursive=True) will search all FOLDERS in the CURRENT directory & its SUBdirectories.
- "**/*" is equivalent to "**".
- "**/*/" is equivalent to "**/".
- IMPORTANT: "**/**" will replicate the behavior of "**", then give unexpected results.
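The pattern behaviors listed in the notes can be checked against the standard-library glob module on a throwaway directory tree. The helper name matches and the file names are made up for this illustration; list_paths itself is not called.

```python
import glob
import os
import tempfile


def matches(root: str, pattern: str) -> list[str]:
    """Return glob matches relative to root, sorted, using '/' separators."""
    found = glob.glob(os.path.join(root, pattern), recursive=True)
    rel = {os.path.relpath(p, root).replace(os.sep, "/") for p in found}
    rel.discard(".")  # "**" can also match the root directory itself
    return sorted(rel)


with tempfile.TemporaryDirectory() as root:
    # Layout: b.txt, .hidden, a/c.txt, a/d/
    os.makedirs(os.path.join(root, "a", "d"))
    for name in ("b.txt", ".hidden", os.path.join("a", "c.txt")):
        open(os.path.join(root, name), "w").close()

    top_level = matches(root, "*")     # files and folders in the current directory
    top_dirs = matches(root, "*/")     # folders in the current directory
    everything = matches(root, "**")   # files and folders at every level
    all_dirs = matches(root, "**/")    # folders at every level
```

The hidden file .hidden does not appear in any of the four results, which matches the first note.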
collect_files(paths: list[str], patterns: list[str]) -> list[str]
¶
Collect files from a list of paths (files/folders). Will search files in folders and their subdirectories.
Parameters:
-
paths
(list[str]
) –The list of paths to collect files from.
-
patterns
(list[str]
) –The list of patterns to apply to the files. Each pattern can be a file extension or a glob pattern.
Returns:
-
list[str]
–A list of matching file paths.
change_pathname(paths: list[str], old_string: str, new_string: str, replace: bool = False) -> None
¶
Change path names.
Parameters:
-
paths
(list[str]
) –paths to the files/dirs
-
old_string
(str
) –old string in path name
-
new_string
(str
) –new string in path name
-
replace
(bool
, default:False
) –replace the old path name if the new one exists. Defaults to False.
remove_files(files: list[str]) -> None
¶
Remove files from a given list of file paths.
Parameters:
-
files
(list[str]
) –list of file paths
remove_dirs(dirs: list[str]) -> None
¶
Remove a list of directories.
Parameters:
-
dirs
(list[str]
) –list of directories to remove.
remove_files_in_paths(files: list, paths: list) -> None
¶
Remove files in the files
list in the paths
list.
remove_dirs_in_paths(dirs: list, paths: list) -> None
¶
Remove directories in the dirs
list in the paths
list.
copy_file(src_path: str, dest_path: str)
¶
Copy a file/folder from the source path to the destination path.
move_file(src_path: str, dest_path: str)
¶
Move a file/folder from the source path to the destination path.
scan_dirs(dirs: list[str], with_files: list[str], without_files: list[str] = []) -> list[str]
¶
Check whether folders contain certain files and do not contain others.
Parameters:
-
dirs
(list[str]
) –The paths of dirs to scan.
-
with_files
(list[str]
) –The files that should exist in the path.
-
without_files
(list[str]
, default:[]
) –The files that should not exist in the path. Defaults to [].
Returns:
-
list[str]
–The paths that meet the conditions.
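The selection rule described above can be sketched as a filter over directories. This assumes with_files and without_files are literal file names rather than patterns, which the description does not confirm; the function name dirs_matching is hypothetical.

```python
import os
import tempfile


def dirs_matching(dirs, with_files, without_files=()):
    """Keep directories that contain every name in with_files and none in without_files."""
    selected = []
    for d in dirs:
        has_all = all(os.path.exists(os.path.join(d, f)) for f in with_files)
        has_none = not any(os.path.exists(os.path.join(d, f)) for f in without_files)
        if has_all and has_none:
            selected.append(d)
    return selected


with tempfile.TemporaryDirectory() as tmp:
    pending = os.path.join(tmp, "pending")
    finished = os.path.join(tmp, "finished")
    for d in (pending, finished):
        os.makedirs(d)
        open(os.path.join(d, "input.txt"), "w").close()
    open(os.path.join(finished, "DONE"), "w").close()  # marker file

    result = dirs_matching([pending, finished], ["input.txt"], ["DONE"])
    names = [os.path.basename(p) for p in result]
```

A typical use is picking out job folders that have an input file but no completion marker yet.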
pkg
¶
Functions:
-
create_logger
–Create and configure a logger with console and optional file handlers.
-
check_package
–Check if the required packages are installed
-
get_func_args
–Get the arguments of a function
-
dependency_info
–Get the dependency information
create_logger(logger_name: str = None, log_file: str = None, level: str = 'INFO', level_logfile: str = None, format_: str = 'info') -> logging.Logger
¶
Create and configure a logger with console and optional file handlers.
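A logger with a console handler and an optional file handler can be assembled from the standard logging module roughly as follows. This is a sketch, not the package's implementation: the name make_logger and the format string are assumptions, and the level_logfile and format_ options of create_logger are not modeled.

```python
import logging


def make_logger(name="thutil_sketch", log_file=None, level="INFO"):
    """Build a logger with a console handler and, optionally, a file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


log = make_logger()
log.info("logger is ready")
```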
check_package(package_name: str, auto_install: bool = False, git_repo: str = None, conda_channel: str = None)
¶
Check if the required packages are installed
_install_package(package_name: str, git_repo: str = None, conda_channel: str = None)
¶
Install the required package
- Default:
pip install -U {package_name}
- If
git_repo
is provided: pip install -U git+{git_repo}
- If
conda_channel
is provided: conda install -c {conda_channel} {package_name}
Parameters:
-
package_name
(str
) –package name
-
git_repo
(str
, default:None
) –git path for the package. Defaults to None. E.g., http://somthing.git
-
conda_channel
(str
, default:None
) –conda channel for the package. Defaults to None. E.g., conda-forge
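The three install variants can be expressed as a function that only builds the command without running it. The name install_command is hypothetical, and the precedence when both git_repo and conda_channel are given (git first here) is an assumption.

```python
def install_command(package_name, git_repo=None, conda_channel=None):
    """Return the install command as an argument list; nothing is executed."""
    if git_repo:
        return ["pip", "install", "-U", f"git+{git_repo}"]
    if conda_channel:
        return ["conda", "install", "-c", conda_channel, package_name]
    return ["pip", "install", "-U", package_name]


default_cmd = install_command("numpy")
conda_cmd = install_command("ase", conda_channel="conda-forge")
```

Returning a list rather than a string makes it easy to pass the result to subprocess.run without shell quoting concerns.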
get_func_args(func)
¶
Get the arguments of a function
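Argument names can be read from a callable with inspect.signature. The sketch below returns a list of names; the actual return type of get_func_args is not documented here, so treat the shape of the result as an assumption.

```python
import inspect


def func_args(func) -> list[str]:
    """Return the parameter names of a callable, in declaration order."""
    return list(inspect.signature(func).parameters)


def example(a, b=2, *args, key=None, **kwargs):
    """Dummy function used only to demonstrate func_args."""


arg_names = func_args(example)
```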
dependency_info(modules=['numpy', 'polars', 'thutil', 'ase']) -> str
¶
Get the dependency information
sth2sth
¶
Functions:
file2str(file_path: Union[str, Path]) -> str
¶
str2file(text: str, file_path: Union[str, Path]) -> None
¶
file2list(file_path: Union[str, Path]) -> list[str]
¶
list2file(text_list: list, file_path: Union[str, Path]) -> None
¶
float2str(floatnum, decimals=6)
¶
Convert a float number to a string without trailing zeros. REF: https://stackoverflow.com/questions/2440692/formatting-floats-without-trailing-zeros
Parameters:
-
floatnum
(float
) –float number
-
fmt
(str
) –format of the output string
Returns:
-
s
(str
) –string of the float number
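A common way to format a float without trailing zeros is fixed-point formatting followed by stripping. This sketch follows the decimals argument from the signature; the name format_float is hypothetical and the package's exact rounding behavior may differ.

```python
def format_float(value: float, decimals: int = 6) -> str:
    """Format with a fixed number of decimals, then drop trailing zeros and a bare dot."""
    text = f"{value:.{decimals}f}"
    return text.rstrip("0").rstrip(".")


short = format_float(1.5)
whole = format_float(2.0)
rounded = format_float(0.123456789)
```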
stuff
¶
Functions:
-
chunk_list
–Yield successive n-sized chunks from
input_list
.
-
unpack_indices
–Expand the input list of indices to a list of integers.
-
text_fill_center
–Create a line with centered text.
-
text_fill_left
–Create a line with left-aligned text.
-
text_fill_box
–Put the string at the center of | |.
-
text_color
–Apply ANSI escape codes to color the text.
-
time_uuid
–
-
simple_uuid
–Generate a simple random UUID of 4 digits.
chunk_list(input_list: list, n: int) -> Generator
¶
Yield successive n-sized chunks from input_list
.
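Chunking a list with a generator is usually done by slicing at a fixed stride. The name chunks below is hypothetical; it is a minimal sketch of the behavior described, where the last chunk may be shorter than n.

```python
from typing import Generator


def chunks(input_list: list, n: int) -> Generator[list, None, None]:
    """Yield consecutive slices of length n; the final slice may be shorter."""
    for start in range(0, len(input_list), n):
        yield input_list[start:start + n]


parts = list(chunks([1, 2, 3, 4, 5], 2))
```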
unpack_indices(list_inputs: list[int | str]) -> list[int]
¶
Expand the input list of indices to a list of integers. E.g., list_inputs = [1, 2, "3-5:2", "6-10"]
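The string forms in the example can be parsed as "start-stop" with an optional ":step" suffix. The sketch below assumes the stop value is inclusive and that indices are non-negative, neither of which is stated above; the name expand_indices is hypothetical.

```python
def expand_indices(items):
    """Expand ints and 'start-stop[:step]' strings into a flat list of ints."""
    result = []
    for item in items:
        if isinstance(item, int):
            result.append(item)
            continue
        span, _, step_text = item.partition(":")
        start_text, _, stop_text = span.partition("-")
        start = int(start_text)
        stop = int(stop_text) if stop_text else start
        step = int(step_text) if step_text else 1
        result.extend(range(start, stop + 1, step))  # stop is treated as inclusive
    return result


expanded = expand_indices([1, 2, "3-5:2", "6-10"])
```

Under these assumptions, "3-5:2" contributes 3 and 5, and "6-10" contributes 6 through 10.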
text_fill_center(input_text='example', fill='-', max_length=60)
¶
Create a line with centered text.
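Centering text inside a filled line maps directly onto str.center. The sketch below pads the text with one space on each side, which is an assumption about the layout; the name fill_center is hypothetical.

```python
def fill_center(input_text="example", fill="-", max_length=60):
    """Center the text in a line of max_length characters padded with fill."""
    return f" {input_text} ".center(max_length, fill)


line = fill_center("Results")
```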
text_fill_left(input_text='example', left_margin=15, fill='-', max_length=60)
¶
Create a line with left-aligned text.
text_fill_box(input_text='', fill=' ', sp='|', max_length=60)
¶
Put the string at the center of | |.
text_color(text: str, color: str = 'blue') -> str
¶
Apply ANSI escape codes to color the text.
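Coloring terminal text with ANSI escape codes means wrapping it in a color sequence and a reset sequence. The color table and the name colorize below are illustrative; which color names text_color accepts is not listed here.

```python
# Standard ANSI foreground color codes (a small illustrative subset).
ANSI_CODES = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
RESET = "\033[0m"


def colorize(text: str, color: str = "blue") -> str:
    """Wrap text in an ANSI color sequence followed by a reset."""
    if color not in ANSI_CODES:
        raise ValueError(f"unsupported color: {color}")
    return f"\033[{ANSI_CODES[color]}m{text}{RESET}"


message = colorize("done", "green")
```

Printing message in a terminal that supports ANSI sequences shows the word in green; other environments may display the raw escape characters.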
time_uuid() -> str
¶
simple_uuid()
¶
Generate a simple random UUID of 4 digits.