Skip to content

Blog

Awesome Python

# Awesome Python Awesome

An opinionated list of awesome Python frameworks, libraries, software and resources.

Inspired by awesome-php.


Admin Panels

Libraries for administrative interfaces.

  • ajenti - The admin panel your servers deserve.
  • django-grappelli - A jazzy skin for the Django Admin-Interface.
  • flask-admin - Simple and extensible administrative interface framework for Flask.
  • flower - Real-time monitor and web admin for Celery.
  • jet-bridge - Admin panel framework for any application with nice UI (ex Jet Django).
  • wooey - A Django app which creates automatic web UIs for Python scripts.
  • streamlit - A framework which lets you build dashboards, generate reports, or create chat apps in minutes.

Algorithms and Design Patterns

Python implementation of data structures, algorithms and design patterns. Also see awesome-algorithms.

  • Algorithms
    • algorithms - Minimal examples of data structures and algorithms.
    • python-ds - A collection of data structure and algorithms for coding interviews.
    • sortedcontainers - Fast and pure-Python implementation of sorted collections.
    • thealgorithms - All Algorithms implemented in Python.
  • Design Patterns
    • pypattyrn - A simple yet effective library for implementing common design patterns.
    • python-patterns - A collection of design patterns in Python.
    • transitions - A lightweight, object-oriented finite state machine implementation.

ASGI Servers

ASGI-compatible web servers.

  • daphne - A HTTP, HTTP2 and WebSocket protocol server for ASGI and ASGI-HTTP.
  • uvicorn - A lightning-fast ASGI server implementation, using uvloop and httptools.
  • hypercorn - An ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn.

Asynchronous Programming

Libraries for asynchronous, concurrent and parallel execution. Also see awesome-asyncio.

  • asyncio - (Python standard library) Asynchronous I/O, event loop, coroutines and tasks.
  • concurrent.futures - (Python standard library) A high-level interface for asynchronously executing callables.
  • multiprocessing - (Python standard library) Process-based parallelism.
  • trio - A friendly library for async concurrency and I/O.
  • twisted - An event-driven networking engine.
  • uvloop - Ultra fast asyncio event loop.
  • eventlet - Asynchronous framework with WSGI support.
  • gevent - A coroutine-based Python networking library that uses greenlet.

Audio

Libraries for manipulating audio and its metadata.

  • Audio
    • audioread - Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.
    • audioFlux - A library for audio and music analysis, feature extraction.
    • dejavu - Audio fingerprinting and recognition.
    • kapre - Keras Audio Preprocessors.
    • librosa - Python library for audio and music analysis.
    • matchering - A library for automated reference audio mastering.
    • mingus - An advanced music theory and notation package with MIDI file and playback support.
    • pyaudioanalysis - Audio feature extraction, classification, segmentation and applications.
    • pydub - Manipulate audio with a simple and easy high level interface.
    • timeside - Open web audio processing framework.
  • Metadata
    • beets - A music library manager and MusicBrainz tagger.
    • eyed3 - A tool for working with audio files, specifically MP3 files containing ID3 metadata.
    • mutagen - A Python module to handle audio metadata.
    • tinytag - A library for reading music meta data of MP3, OGG, FLAC and Wave files.

Authentication

Libraries for implementing authentications schemes.

  • OAuth
    • authlib - JavaScript Object Signing and Encryption draft implementation.
    • django-allauth - Authentication app for Django that "just works."
    • django-oauth-toolkit - OAuth 2 goodies for Django.
    • oauthlib - A generic and thorough implementation of the OAuth request-signing logic.
  • JWT
    • pyjwt - JSON Web Token implementation in Python.
    • python-jose - A JOSE implementation in Python.

Build Tools

Compile software from source code.

  • bitbake - A make-like build tool for embedded Linux.
  • buildout - A build system for creating, assembling and deploying applications from multiple parts.
  • platformio - A console tool to build code with different development platforms.
  • pybuilder - A continuous build tool written in pure Python.
  • scons - A software construction tool.

Built-in Classes Enhancement

Libraries for enhancing Python built-in classes.

  • attrs - Replacement for __init__, __eq__, __repr__, etc. boilerplate in class definitions.
  • bidict - Efficient, Pythonic bidirectional map data structures and related functionality..
  • box - Python dictionaries with advanced dot notation access.
  • dataclasses - (Python standard library) Data classes.
  • dotteddict - A library that provides a method of accessing lists and dicts with a dotted path notation.

CMS

Content Management Systems.

  • feincms - One of the most advanced Content Management Systems built on Django.
  • indico - A feature-rich event management system, made @ CERN.
  • wagtail - A Django content management system.

Caching

Libraries for caching data.

  • beaker - A WSGI middleware for sessions and caching.
  • django-cache-machine - Automatic caching and invalidation for Django models.
  • django-cacheops - A slick ORM cache with automatic granular event-driven invalidation.
  • dogpile.cache - dogpile.cache is a next generation replacement for Beaker made by the same authors.
  • hermescache - Python caching library with tag-based invalidation and dogpile effect prevention.
  • pylibmc - A Python wrapper around the libmemcached interface.
  • python-diskcache - SQLite and file backed cache backend with faster lookups than memcached and redis.

ChatOps Tools

Libraries for chatbot development.

  • errbot - The easiest and most popular chatbot to implement ChatOps.

Code Analysis

Tools of static analysis, linters and code quality checkers. Also see awesome-static-analysis.

  • Code Analysis
    • code2flow - Turn your Python and JavaScript code into DOT flowcharts.
    • prospector - A tool to analyse Python code.
    • vulture - A tool for finding and analysing dead Python code.
  • Code Linters
  • Code Formatters
    • black - The uncompromising Python code formatter.
    • isort - A Python utility / library to sort imports.
    • yapf - Yet another Python code formatter from Google.
  • Static Type Checkers, also see awesome-python-typing
    • mypy - Check variable types during compile time.
    • pyre-check - Performant type checking.
    • typeshed - Collection of library stubs for Python, with static types.
  • Static Type Annotations Generators
    • monkeytype - A system for Python that generates static type annotations by collecting runtime types.
    • pytype - Pytype checks and infers types for Python code - without requiring type annotations.

Command-line Interface Development

Libraries for building command-line applications.

  • Command-line Application Development
    • cement - CLI Application Framework for Python.
    • click - A package for creating beautiful command line interfaces in a composable way.
    • cliff - A framework for creating command-line programs with multi-level commands.
    • python-fire - A library for creating command line interfaces from absolutely any Python object.
    • python-prompt-toolkit - A library for building powerful interactive command lines.
  • Terminal Rendering
    • alive-progress - A new kind of Progress Bar, with real-time throughput, eta and very cool animations.
    • asciimatics - A package to create full-screen text UIs (from interactive forms to ASCII animations).
    • bashplotlib - Making basic plots in the terminal.
    • colorama - Cross-platform colored terminal text.
    • rich - Python library for rich text and beautiful formatting in the terminal. Also provides a great RichHandler log handler.
    • tqdm - Fast, extensible progress bar for loops and CLI.

Command-line Tools

Useful CLI-based tools for productivity.

  • Productivity Tools
    • copier - A library and command-line utility for rendering projects templates.
    • cookiecutter - A command-line utility that creates projects from cookiecutters (project templates).
    • doitlive - A tool for live presentations in the terminal.
    • howdoi - Instant coding answers via the command line.
    • invoke - A tool for managing shell-oriented subprocesses and organizing executable Python code into CLI-invokable tasks.
    • pathpicker - Select files out of bash output.
    • thefuck - Correcting your previous console command.
    • tmuxp - A tmux session manager.
    • try - A dead simple CLI to try out python packages - it's never been easier.
  • CLI Enhancements
    • httpie - A command line HTTP client, a user-friendly cURL replacement.
    • iredis - Redis CLI with autocompletion and syntax highlighting.
    • litecli - SQLite CLI with autocompletion and syntax highlighting.
    • mycli - MySQL CLI with autocompletion and syntax highlighting.
    • pgcli - PostgreSQL CLI with autocompletion and syntax highlighting.

Computer Vision

Libraries for Computer Vision.

  • easyocr - Ready-to-use OCR with 40+ languages supported.
  • kornia - Open Source Differentiable Computer Vision Library for PyTorch.
  • opencv - Open Source Computer Vision Library.
  • pytesseract - A wrapper for Google Tesseract OCR.
  • tesserocr - Another simple, Pillow-friendly, wrapper around the tesseract-ocr API for OCR.

Configuration Files

Libraries for storing and parsing configuration options.

  • configparser - (Python standard library) INI file parser.
  • configobj - INI file parser with validation.
  • hydra - Hydra is a framework for elegantly configuring complex applications.
  • python-decouple - Strict separation of settings from code.

Cryptography

  • cryptography - A package designed to expose cryptographic primitives and recipes to Python developers.
  • paramiko - The leading native Python SSHv2 protocol library.
  • pynacl - Python binding to the Networking and Cryptography (NaCl) library.

Data Analysis

Libraries for data analyzing.

  • pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
  • aws-sdk-pandas - Pandas on AWS.
  • datasette - An open source multi-tool for exploring and publishing data.
  • optimus - Agile Data Science Workflows made easy with PySpark.

Data Validation

Libraries for validating data. Used for forms in many cases.

  • cerberus - A lightweight and extensible data validation library.
  • colander - Validating and deserializing data obtained via XML, JSON, an HTML form post.
  • jsonschema - An implementation of JSON Schema for Python.
  • schema - A library for validating Python data structures.
  • schematics - Data Structure Validation.
  • voluptuous - A Python data validation library.
  • pydantic - Data validation using Python type hints.

Data Visualization

Libraries for visualizing data. Also see awesome-javascript.

  • altair - Declarative statistical visualization library for Python.
  • bokeh - Interactive Web Plotting for Python.
  • bqplot - Interactive Plotting Library for the Jupyter Notebook.
  • cartopy - A cartographic python library with matplotlib support.
  • diagrams - Diagram as Code.
  • matplotlib - A Python 2D plotting library.
  • plotnine - A grammar of graphics for Python based on ggplot2.
  • pygal - A Python SVG Charts Creator.
  • pygraphviz - Python interface to Graphviz.
  • pyqtgraph - Interactive and realtime 2D/3D/Image plotting and science/engineering widgets.
  • seaborn - Statistical data visualization using Matplotlib.
  • vispy - High-performance scientific visualization based on OpenGL.

Database

Databases implemented in Python.

  • pickleDB - A simple and lightweight key-value store for Python.
  • tinydb - A tiny, document-oriented database.
  • zodb - A native object database for Python. A key-value and object graph database.

Database Drivers

Libraries for connecting and operating databases.

  • MySQL - awesome-mysql
  • PostgreSQL - awesome-postgres
    • psycopg - The most popular PostgreSQL adapter for Python.
  • SQlite - awesome-sqlite
    • sqlite3 - (Python standard library) SQlite interface compliant with DB-API 2.0.
    • sqlite-utils - Python CLI utility and library for manipulating SQLite databases.
  • Other Relational Databases
    • pymssql - A simple database interface to Microsoft SQL Server.
    • clickhouse-driver - Python driver with native interface for ClickHouse.
  • NoSQL Databases
    • cassandra-driver - The Python Driver for Apache Cassandra.
    • happybase - A developer-friendly library for Apache HBase.
    • kafka-python - The Python client for Apache Kafka.
    • pymongo - The official Python client for MongoDB.
    • motor - The async Python driver for MongoDB.
    • redis-py - The Python client for Redis.

Date and Time

Libraries for working with dates and times.

  • arrow - A Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.
  • dateutil - Extensions to the standard Python datetime module.
  • pendulum - Python datetimes made easy.
  • pytz - World timezone definitions, modern and historical. Brings the tz database into Python.

Debugging Tools

Libraries for debugging code.

  • pdb-like Debugger
    • ipdb - IPython-enabled pdb.
    • pudb - A full-screen, console-based Python debugger.
  • Tracing
    • manhole - Debugging UNIX socket connections and present the stacktraces for all threads and an interactive prompt.
    • python-hunter - A flexible code tracing toolkit.
  • Profiler
    • py-spy - A sampling profiler for Python programs. Written in Rust.
    • vprof - Visual Python profiler.
  • Others
    • django-debug-toolbar - Display various debug information for Django.
    • flask-debugtoolbar - A port of the django-debug-toolbar to flask.
    • icecream - Inspect variables, expressions, and program execution with a single, simple function call.
    • pyelftools - Parsing and analyzing ELF files and DWARF debugging information.

Deep Learning

Frameworks for Neural Networks and Deep Learning. Also see awesome-deep-learning.

  • keras - A high-level neural networks library and capable of running on top of either TensorFlow or Theano.
  • pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
  • pytorch-lightning - Deep learning framework to train, deploy, and ship AI products Lightning fast.
  • stable-baselines3 - PyTorch implementations of Stable Baselines (deep) reinforcement learning algorithms.
  • tensorflow - The most popular Deep Learning framework created by Google.
  • theano - A library for fast numerical computation.

DevOps Tools

Software and libraries for DevOps.

  • Configuration Management
    • ansible - A radically simple IT automation platform.
    • cloudinit - A multi-distribution package that handles early initialization of a cloud instance.
    • openstack - Open source software for building private and public clouds.
    • pyinfra - A versatile CLI tools and python libraries to automate infrastructure.
    • saltstack - Infrastructure automation and management system.
  • SSH-style Deployment
    • cuisine - Chef-like functionality for Fabric.
    • fabric - A simple, Pythonic tool for remote execution and deployment.
  • Process Management
    • supervisor - Supervisor process control system for UNIX.
  • Monitoring
    • psutil - A cross-platform process and system utilities module.
  • Backup
    • borg - A deduplicating archiver with compression and encryption.

Distributed Computing

Frameworks and libraries for Distributed Computing.

  • Batch Processing
    • dask - A flexible parallel computing library for analytic computing.
    • luigi - A module that helps you build complex pipelines of batch jobs.
    • PySpark - Apache Spark Python API.
    • Ray - A system for parallel and distributed Python that unifies the machine learning ecosystem.
  • Stream Processing

Distribution

Libraries to create packaged executables for release distribution.

  • py2app - Freezes Python scripts (Mac OS X).
  • py2exe - Freezes Python scripts (Windows).
  • pyarmor - A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.
  • pyinstaller - Converts Python programs into stand-alone executables (cross-platform).
  • shiv - A command line utility for building fully self-contained zipapps (PEP 441), but with all their dependencies included.

Documentation

Libraries for generating project documentation.

  • sphinx - Python Documentation generator.
  • pdoc - Epydoc replacement to auto generate API documentation for Python libraries.

Downloader

Libraries for downloading.

  • akshare - A financial data interface library, built for human beings!
  • s3cmd - A command line tool for managing Amazon S3 and CloudFront.
  • youtube-dl - A command-line program to download videos from YouTube and other video sites.

Editor Plugins and IDEs

  • Emacs
    • elpy - Emacs Python Development Environment.
  • Vim
    • jedi-vim - Vim bindings for the Jedi auto-completion library for Python.
    • python-mode - An all in one plugin for turning Vim into a Python IDE.
    • YouCompleteMe - Includes Jedi-based completion engine for Python.
  • Visual Studio
    • PTVS - Python Tools for Visual Studio.
  • Visual Studio Code
    • Python - The official VSCode extension with rich support for Python.
  • IDE
    • PyCharm - Commercial Python IDE by JetBrains. Has free community edition available.
    • spyder - Open Source Python IDE.

Email

Libraries for sending and parsing email.

  • Mail Servers
    • modoboa - A mail hosting and management platform including a modern Web UI.
    • salmon - A Python Mail Server.
  • Clients
    • imbox - Python IMAP for Humans.
    • yagmail - Yet another Gmail/SMTP client.
  • Others
    • flanker - An email address and Mime parsing library.
    • mailer - High-performance extensible mail delivery framework.

Environment Management

Libraries for Python version and virtual environment management.

  • pyenv - Simple Python version management.
  • virtualenv - A tool to create isolated Python environments.

File Manipulation

Libraries for file manipulation.

  • mimetypes - (Python standard library) Map filenames to MIME types.
  • pathlib - (Python standard library) An cross-platform, object-oriented path library.
  • path.py - A module wrapper for os.path.
  • python-magic - A Python interface to the libmagic file type identification library.
  • watchdog - API and shell utilities to monitor file system events.

Functional Programming

Functional Programming with Python.

  • coconut - A variant of Python built for simple, elegant, Pythonic functional programming.
  • funcy - A fancy and practical functional tools.
  • more-itertools - More routines for operating on iterables, beyond itertools.
  • returns - A set of type-safe monads, transformers, and composition utilities.
  • cytoolz - Cython implementation of Toolz: High performance functional utilities.
  • toolz - A collection of functional utilities for iterators, functions, and dictionaries.

GUI Development

Libraries for working with graphical user interface applications.

  • curses - Built-in wrapper for ncurses used to create terminal GUI applications.
  • Eel - A library for making simple Electron-like offline HTML/JS GUI apps.
  • enaml - Creating beautiful user-interfaces with Declarative Syntax like QML.
  • Flexx - Flexx is a pure Python toolkit for creating GUI's, that uses web technology for its rendering.
  • Gooey - Turn command line programs into a full GUI application with one line.
  • kivy - A library for creating NUI applications, running on Windows, Linux, Mac OS X, Android and iOS.
  • pyglet - A cross-platform windowing and multimedia library for Python.
  • PyGObject - Python Bindings for GLib/GObject/GIO/GTK+ (GTK+3).
  • PyQt - Python bindings for the Qt cross-platform application and UI framework.
  • PySimpleGUI - Wrapper for tkinter, Qt, WxPython and Remi.
  • pywebview - A lightweight cross-platform native wrapper around a webview component.
  • Tkinter - Tkinter is Python's de-facto standard GUI package.
  • Toga - A Python native, OS native GUI toolkit.
  • urwid - A library for creating terminal GUI applications with strong support for widgets, events, rich colors, etc.
  • wxPython - A blending of the wxWidgets C++ class library with the Python.
  • DearPyGui - A Simple GPU accelerated Python GUI framework

GraphQL

Libraries for working with GraphQL.

  • graphene - GraphQL framework for Python.

Game Development

Awesome game development libraries.

  • Arcade - Arcade is a modern Python framework for crafting games with compelling graphics and sound.
  • Cocos2d - cocos2d is a framework for building 2D games, demos, and other graphical/interactive applications.
  • Harfang3D - Python framework for 3D, VR and game development.
  • Panda3D - 3D game engine developed by Disney.
  • Pygame - Pygame is a set of Python modules designed for writing games.
  • PyOgre - Python bindings for the Ogre 3D render engine, can be used for games, simulations, anything 3D.
  • PyOpenGL - Python ctypes bindings for OpenGL and it's related APIs.
  • PySDL2 - A ctypes based wrapper for the SDL2 library.
  • RenPy - A Visual Novel engine.

Geolocation

Libraries for geocoding addresses and working with latitudes and longitudes.

  • django-countries - A Django app that provides a country field for models and forms.
  • geodjango - A world-class geographic web framework.
  • geojson - Python bindings and utilities for GeoJSON.
  • geopy - Python Geocoding Toolbox.

HTML Manipulation

Libraries for working with HTML and XML.

  • beautifulsoup - Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.
  • bleach - A whitelist-based HTML sanitization and text linkification library.
  • cssutils - A CSS library for Python.
  • html5lib - A standards-compliant library for parsing and serializing HTML documents and fragments.
  • lxml - A very fast, easy-to-use and versatile library for handling HTML and XML.
  • markupsafe - Implements a XML/HTML/XHTML Markup safe string for Python.
  • pyquery - A jQuery-like library for parsing HTML.
  • untangle - Converts XML documents to Python objects for easy access.
  • WeasyPrint - A visual rendering engine for HTML and CSS that can export to PDF.
  • xmldataset - Simple XML Parsing.
  • xmltodict - Working with XML feel like you are working with JSON.

HTTP Clients

Libraries for working with HTTP.

  • httpx - A next generation HTTP client for Python.
  • requests - HTTP Requests for Humans.
  • treq - Python requests like API built on top of Twisted's HTTP client.
  • urllib3 - A HTTP library with thread-safe connection pooling, file post support, sanity friendly.

Hardware

Libraries for programming with hardware.

  • keyboard - Hook and simulate global keyboard events on Windows and Linux.
  • mouse - Hook and simulate global mouse events on Windows and Linux.
  • pynput - A library to control and monitor input devices.
  • scapy - A brilliant packet manipulation library.

Image Processing

Libraries for manipulating images.

  • pillow - Pillow is the friendly PIL fork.
  • python-barcode - Create barcodes in Python with no extra dependencies.
  • pymatting - A library for alpha matting.
  • python-qrcode - A pure Python QR Code generator.
  • pywal - A tool that generates color schemes from images.
  • pyvips - A fast image processing library with low memory needs.
  • quads - Computer art based on quadtrees.
  • scikit-image - A Python library for (scientific) image processing.
  • thumbor - A smart imaging service. It enables on-demand crop, re-sizing and flipping of images.
  • wand - Python bindings for MagickWand, C API for ImageMagick.

Implementations

Implementations of Python.

  • cpython - Default, most widely used implementation of the Python programming language written in C.
  • cython - Optimizing Static Compiler for Python.
  • clpython - Implementation of the Python programming language written in Common Lisp.
  • ironpython - Implementation of the Python programming language written in C#.
  • micropython - A lean and efficient Python programming language implementation.
  • numba - Python JIT compiler to LLVM aimed at scientific Python.
  • peachpy - x86-64 assembler embedded in Python.
  • pypy - A very fast and compliant implementation of the Python language.
  • pyston - A Python implementation using JIT techniques.

Interactive Interpreter

Interactive Python interpreters (REPL).

Internationalization

Libraries for working with i18n.

  • Babel - An internationalization library for Python.
  • PyICU - A wrapper of International Components for Unicode C++ library (ICU).

Job Scheduler

Libraries for scheduling jobs.

  • Airflow - Airflow is a platform to programmatically author, schedule and monitor workflows.
  • APScheduler - A light but powerful in-process task scheduler that lets you schedule functions.
  • django-schedule - A calendaring app for Django.
  • doit - A task runner and build tool.
  • gunnery - Multipurpose task execution tool for distributed systems with web-based interface.
  • Joblib - A set of tools to provide lightweight pipelining in Python.
  • Plan - Writing crontab file in Python like a charm.
  • Prefect - A modern workflow orchestration framework that makes it easy to build, schedule and monitor robust data pipelines.
  • schedule - Python job scheduling for humans.
  • Spiff - A powerful workflow engine implemented in pure Python.
  • TaskFlow - A Python library that helps to make task execution easy, consistent and reliable.

Logging

Libraries for generating and working with logs.

  • logbook - Logging replacement for Python.
  • logging - (Python standard library) Logging facility for Python.
  • loguru - Library which aims to bring enjoyable logging in Python.
  • sentry-python - Sentry SDK for Python.
  • structlog - Structured logging made easy.

Machine Learning

Libraries for Machine Learning. Also see awesome-machine-learning.

  • gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • H2O - Open Source Fast Scalable Machine Learning Platform.
  • Metrics - Machine learning evaluation metrics.
  • NuPIC - Numenta Platform for Intelligent Computing.
  • scikit-learn - The most popular Python library for Machine Learning.
  • Spark ML - Apache Spark's scalable Machine Learning library.
  • vowpal_porpoise - A lightweight Python wrapper for Vowpal Wabbit.
  • xgboost - A scalable, portable, and distributed gradient boosting library.
  • MindsDB - MindsDB is an open source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using standard queries.

Microsoft Windows

Python programming on Microsoft Windows.

  • Python(x,y) - Scientific-applications-oriented Python Distribution based on Qt and Spyder.
  • pythonlibs - Unofficial Windows binaries for Python extension packages.
  • PythonNet - Python Integration with the .NET Common Language Runtime (CLR).
  • PyWin32 - Python Extensions for Windows.
  • WinPython - Portable development environment for Windows ⅞.

Miscellaneous

Useful libraries or tools that don't fit in the categories above.

  • blinker - A fast Python in-process signal/event dispatching system.
  • boltons - A set of pure-Python utilities.
  • itsdangerous - Various helpers to pass trusted data to untrusted environments.
  • magenta - A tool to generate music and art using artificial intelligence.
  • pluginbase - A simple but flexible plugin system for Python.
  • tryton - A general purpose business framework.

Natural Language Processing

Libraries for working with human languages.

  • General
    • gensim - Topic Modeling for Humans.
    • langid.py - Stand-alone language identification system.
    • nltk - A leading platform for building Python programs to work with human language data.
    • pattern - A web mining module.
    • polyglot - Natural language pipeline supporting hundreds of languages.
    • pytext - A natural language modeling framework based on PyTorch.
    • PyTorch-NLP - A toolkit enabling rapid deep learning NLP prototyping for research.
    • spacy - A library for industrial-strength natural language processing in Python and Cython.
    • Stanza - The Stanford NLP Group's official Python library, supporting 60+ languages.
  • Chinese
    • funNLP - A collection of tools and datasets for Chinese NLP.
    • jieba - The most popular Chinese text segmentation library.
    • pkuseg-python - A toolkit for Chinese word segmentation in various domains.
    • snownlp - A library for processing Chinese text.

Network Virtualization

Tools and libraries for Virtual Networking and SDN (Software Defined Networking).

  • mininet - A popular network emulator and API written in Python.
  • napalm - Cross-vendor API to manipulate network devices.
  • pox - A Python-based SDN control applications, such as OpenFlow SDN controllers.

News Feed

Libraries for building user's activities.

ORM

Libraries that implement Object-Relational Mapping or data mapping techniques.

  • Relational Databases
    • Django Models - The Django ORM.
    • SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper.
    • dataset - Store Python dicts in a database - works with SQLite, MySQL, and PostgreSQL.
    • orator - The Orator ORM provides a simple yet beautiful ActiveRecord implementation.
    • orm - An async ORM.
    • peewee - A small, expressive ORM.
    • pony - ORM that provides a generator-oriented interface to SQL.
    • pydal - A pure Python Database Abstraction Layer.
  • NoSQL Databases
    • hot-redis - Rich Python data types for Redis.
    • mongoengine - A Python Object-Document-Mapper for working with MongoDB.
    • PynamoDB - A Pythonic interface for Amazon DynamoDB.
    • redisco - A Python Library for Simple Models and Containers Persisted in Redis.

Package Management

Libraries for package and dependency management.

  • pip - The package installer for Python.
    • pip-tools - A set of tools to keep your pinned Python dependencies fresh.
    • PyPI
  • conda - Cross-platform, Python-agnostic binary package manager.
  • poetry - Python dependency management and packaging made easy.
  • uv - An extremely fast Python package and project manager, written in Rust.

Package Repositories

Local PyPI repository server and proxies.

  • bandersnatch - PyPI mirroring tool provided by Python Packaging Authority (PyPA).
  • devpi - PyPI server and packaging/testing/release tool.
  • localshop - Local PyPI server (custom packages and auto-mirroring of pypi).
  • warehouse - Next generation Python Package Repository (PyPI).

Penetration Testing

Frameworks and tools for penetration testing.

  • fsociety - A Penetration testing framework.
  • setoolkit - A toolkit for social engineering.
  • sqlmap - Automatic SQL injection and database takeover tool.

Permissions

Libraries that allow or deny users access to data or functionality.

  • django-guardian - Implementation of per object permissions for Django 1.2+
  • django-rules - A tiny but powerful app providing object-level permissions to Django, without requiring a database.

Processes

Libraries for starting and communicating with OS processes.

Recommender Systems

Libraries for building recommender systems.

  • annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage.
  • fastFM - A library for Factorization Machines.
  • implicit - A fast Python implementation of collaborative filtering for implicit datasets.
  • libffm - A library for Field-aware Factorization Machine (FFM).
  • lightfm - A Python implementation of a number of popular recommendation algorithms.
  • spotlight - Deep recommender models using PyTorch.
  • Surprise - A scikit for building and analyzing recommender systems.
  • tensorrec - A Recommendation Engine Framework in TensorFlow.

Refactoring

Refactoring tools and libraries for Python

  • Bicycle Repair Man - Bicycle Repair Man, a refactoring tool for Python.
  • Bowler - Safe code refactoring for modern Python.
  • Rope - Rope is a python refactoring library.

RESTful API

Libraries for building RESTful APIs.

  • Django
  • Flask
    • eve - REST API framework powered by Flask, MongoDB and good intentions.
    • flask-api - Browsable Web APIs for Flask.
    • flask-restful - Quickly building REST APIs for Flask.
  • Pyramid
    • cornice - A RESTful framework for Pyramid.
  • Framework agnostic
    • falcon - A high-performance framework for building cloud APIs and web app backends.
    • fastapi - A modern, fast, web framework for building APIs with Python 3.6+ based on standard Python type hints.
    • hug - A Python 3 framework for cleanly exposing APIs.
    • sandman2 - Automated REST APIs for existing database-driven systems.
    • sanic - A Python 3.6+ web server and web framework that's written to go fast.

Robotics

Libraries for robotics.

  • PythonRobotics - This is a compilation of various robotics algorithms with visualizations.
  • rospy - This is a library for ROS (Robot Operating System).

RPC Servers

RPC-compatible servers.

  • RPyC (Remote Python Call) - A transparent and symmetric RPC library for Python
  • zeroRPC - zerorpc is a flexible RPC implementation based on ZeroMQ and MessagePack.

Science

Libraries for scientific computing. Also see Python-for-Scientists.

  • astropy - A community Python library for Astronomy.
  • bcbio-nextgen - Providing best-practice pipelines for fully automated high throughput sequencing analysis.
  • bccb - Collection of useful code related to biological analysis.
  • Biopython - Biopython is a set of freely available tools for biological computation.
  • cclib - A library for parsing and interpreting the results of computational chemistry packages.
  • Colour - Implementing a comprehensive number of colour theory transformations and algorithms.
  • Karate Club - Unsupervised machine learning toolbox for graph structured data.
  • NetworkX - A high-productivity software for complex networks.
  • NIPY - A collection of neuroimaging toolkits.
  • NumPy - A fundamental package for scientific computing with Python.
  • ObsPy - A Python toolbox for seismology.
  • Open Babel - A chemical toolbox designed to speak the many languages of chemical data.
  • PyDy - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion.
  • PyMC - Markov Chain Monte Carlo sampling toolkit.
  • QuTiP - Quantum Toolbox in Python.
  • RDKit - Cheminformatics and Machine Learning Software.
  • SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • SimPy - A process-based discrete-event simulation framework.
  • statsmodels - Statistical modeling and econometrics in Python.
  • SymPy - A Python library for symbolic mathematics.
  • Zipline - A Pythonic algorithmic trading library.

Libraries and software for indexing and performing search queries on data.

Serialization

Libraries for serializing complex data types

Serverless Frameworks

Frameworks for developing serverless Python code.

  • python-lambda - A toolkit for developing and deploying Python code in AWS Lambda.
  • Zappa - A tool for deploying WSGI applications on AWS Lambda and API Gateway.

Shell

Shells based on Python.

  • xonsh - A Python-powered, cross-platform, Unix-gazing shell language and command prompt.

Specific Formats Processing

Libraries for parsing and manipulating specific text formats.

  • General
    • tablib - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
  • Office
    • docxtpl - Editing a docx document by jinja2 template
    • openpyxl - A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
    • pyexcel - Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
    • python-docx - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
    • python-pptx - Python library for creating and updating PowerPoint (.pptx) files.
    • unoconv - Convert between any document format supported by LibreOffice/OpenOffice.
    • XlsxWriter - A Python module for creating Excel .xlsx files.
    • xlwings - A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
    • xlwt / xlrd - Writing and reading data and formatting information from Excel files.
  • PDF
    • pdfminer.six - Pdfminer.six is a community maintained fork of the original PDFMiner.
    • PyPDF2 - A library capable of splitting, merging and transforming PDF pages.
    • ReportLab - Allowing Rapid creation of rich PDF documents.
  • Markdown
    • Mistune - Fastest and full featured pure Python parsers of Markdown.
    • Python-Markdown - A Python implementation of John Gruber’s Markdown.
  • YAML
    • PyYAML - YAML implementations for Python.
  • CSV
    • csvkit - Utilities for converting to and working with CSV.
  • Archive
    • unp - A command line tool that can unpack archives easily.

Static Site Generator

Static site generator is a software that takes some text + templates as input and produces HTML files on the output.

  • lektor - An easy to use static CMS and blog engine.
  • mkdocs - Markdown friendly documentation generator.
  • makesite - Simple, lightweight, and magic-free static site/blog generator (< 130 lines).
  • nikola - A static website and blog generator.
  • pelican - Static site generator that supports Markdown and reST syntax.

Tagging

Libraries for tagging items.

Task Queues

Libraries for working with task queues.

  • celery - An asynchronous task queue/job queue based on distributed message passing.
  • dramatiq - A fast and reliable background task processing library for Python 3.
  • huey - Little multi-threaded task queue.
  • mrq - A distributed worker task queue in Python using Redis & gevent.
  • rq - Simple job queues for Python.

Template Engine

Libraries and tools for templating and lexing.

  • Genshi - Python templating toolkit for generation of web-aware output.
  • Jinja2 - A modern and designer friendly templating language.
  • Mako - Hyperfast and lightweight templating for the Python platform.

Testing

Libraries for testing codebases and generating test data.

  • Testing Frameworks
    • hypothesis - Hypothesis is an advanced Quickcheck style property based testing library.
    • nose2 - The successor to nose, based on `unittest2.
    • pytest - A mature full-featured Python testing tool.
    • Robot Framework - A generic test automation framework.
    • unittest - (Python standard library) Unit testing framework.
  • Test Runners
    • green - A clean, colorful test runner.
    • mamba - The definitive testing tool for Python. Born under the banner of BDD.
    • tox - Auto builds and tests distributions in multiple Python versions
  • GUI / Web Testing
    • locust - Scalable user load testing tool written in Python.
    • PyAutoGUI - PyAutoGUI is a cross-platform GUI automation Python module for human beings.
    • Schemathesis - A tool for automatic property-based testing of web applications built with Open API / Swagger specifications.
    • Selenium - Python bindings for Selenium WebDriver.
    • sixpack - A language-agnostic A/B Testing framework.
    • splinter - Open source tool for testing web applications.
  • Mock
    • doublex - Powerful test doubles framework for Python.
    • freezegun - Travel through time by mocking the datetime module.
    • httmock - A mocking library for requests for Python 2.6+ and 3.2+.
    • httpretty - HTTP request mock tool for Python.
    • mock - (Python standard library) A mocking and patching library.
    • mocket - A socket mock framework with gevent/asyncio/SSL support.
    • responses - A utility library for mocking out the requests Python library.
    • VCR.py - Record and replay HTTP interactions on your tests.
  • Object Factories
    • factory_boy - A test fixtures replacement for Python.
    • mixer - Another fixtures replacement. Supports Django, Flask, SQLAlchemy, Peewee and etc.
    • model_mommy - Creating random fixtures for testing in Django.
  • Code Coverage
  • Fake Data
    • fake2db - Fake database generator.
    • faker - A Python package that generates fake data.
    • mimesis - is a Python library that help you generate fake data.
    • radar - Generate random datetime / time.

Text Processing

Libraries for parsing and manipulating plain texts.

  • General
    • chardet - Python ⅔ compatible character encoding detector.
    • difflib - (Python standard library) Helpers for computing deltas.
    • ftfy - Makes Unicode text less broken and more consistent automagically.
    • fuzzywuzzy - Fuzzy String Matching.
    • Levenshtein - Fast computation of Levenshtein distance and string similarity.
    • pangu.py - Paranoid text spacing.
    • pyfiglet - An implementation of figlet written in Python.
    • pypinyin - Convert Chinese hanzi (漢字) to pinyin (拼音).
    • textdistance - Compute distance between sequences with 30+ algorithms.
    • unidecode - ASCII transliterations of Unicode text.
  • Slugify
    • awesome-slugify - A Python slugify library that can preserve unicode.
    • python-slugify - A Python slugify library that translates unicode to ASCII.
    • unicode-slugify - A slugifier that generates unicode slugs with Django as a dependency.
  • Unique identifiers
    • hashids - Implementation of hashids in Python.
    • shortuuid - A generator library for concise, unambiguous and URL-safe UUIDs.
  • Parser
    • ply - Implementation of lex and yacc parsing tools for Python.
    • pygments - A generic syntax highlighter.
    • pyparsing - A general purpose framework for generating parsers.
    • python-nameparser - Parsing human names into their individual components.
    • python-phonenumbers - Parsing, formatting, storing and validating international phone numbers.
    • python-user-agents - Browser user agent parser.
    • sqlparse - A non-validating SQL parser.

Third-party APIs

Libraries for accessing third party services APIs. Also see List of Python API Wrappers and Libraries.

URL Manipulation

Libraries for parsing URLs.

  • furl - A small Python library that makes parsing and manipulating URLs easy.
  • purl - A simple, immutable URL class with a clean API for interrogation and manipulation.
  • pyshorteners - A pure Python URL shortening lib.
  • webargs - A friendly library for parsing HTTP request arguments with built-in support for popular web frameworks.

Video

Libraries for manipulating video and GIFs.

  • moviepy - A module for script-based movie editing with many formats, including animated GIFs.
  • scikit-video - Video processing routines for SciPy.
  • vidgear - Most Powerful multi-threaded Video Processing framework.

Web Asset Management

Tools for managing, compressing and minifying website assets.

  • django-compressor - Compresses linked and inline JavaScript or CSS into a single cached file.
  • django-pipeline - An asset packaging library for Django.
  • django-storages - A collection of custom storage back ends for Django.
  • fanstatic - Packages, optimizes, and serves static file dependencies as Python packages.
  • fileconveyor - A daemon to detect and sync files to CDNs, S3 and FTP.
  • flask-assets - Helps you integrate webassets into your Flask app.
  • webassets - Bundles, optimizes, and manages unique cache-busting URLs for static resources.

Web Content Extracting

Libraries for extracting web contents.

  • html2text - Convert HTML to Markdown-formatted text.
  • lassie - Web Content Retrieval for Humans.
  • micawber - A small library for extracting rich content from URLs.
  • newspaper - News extraction, article extraction and content curation in Python.
  • python-readability - Fast Python port of arc90's readability tool.
  • requests-html - Pythonic HTML Parsing for Humans.
  • sumy - A module for automatic summarization of text documents and HTML pages.
  • textract - Extract text from any document, Word, PowerPoint, PDFs, etc.
  • toapi - Every web site provides APIs.

Web Crawling

Libraries to automate web scraping.

  • feedparser - Universal feed parser.
  • grab - Site scraping framework.
  • mechanicalsoup - A Python library for automating interaction with websites.
  • scrapy - A fast high-level screen scraping and web crawling framework.

Web Frameworks

Traditional full stack web frameworks. Also see RESTful API.

WebSocket

Libraries for working with WebSocket.

  • autobahn-python - WebSocket & WAMP for Python on Twisted and asyncio.
  • channels - Developer-friendly asynchrony for Django.
  • websockets - A library for building WebSocket servers and clients with a focus on correctness and simplicity.

WSGI Servers

WSGI-compatible web servers.

  • gunicorn - Pre-forked, ported from Ruby's Unicorn project.
  • uwsgi - A project aims at developing a full stack for building hosting services, written in C.
  • waitress - Multi-threaded, powers Pyramid.
  • werkzeug - A WSGI utility library for Python that powers Flask and can easily be embedded into your own projects.

Resources

Where to discover learning resources or new Python libraries.

Newsletters

Podcasts

Contributing

Your contributions are always welcome! Please take a look at the contribution guidelines first.


If you have any question about this opinionated list, do not hesitate to contact me @VintaChen on Twitter or open an issue on GitHub.

Source

BibTeX Generator

Have you ever found yourself weary and uninspired from the tedious task of manually creating BibTeX entries for your paper?

There are, indeed, support tools and plugins that are bundled with reference managers such as Zotero, Mendeley, etc. These tools can automate the generation of a .bib file. To use them, you need to install a reference manager, its associated plugins, and a library of papers on your computer. However, these tools are not flawless. The BibTeX entries they generate often contain incomplete information, are poorly formatted, and include numerous unnecessary fields. You then still need to manually check and correct the entries.

There are the times you just need to cite a paper or two, and you don't want to go through the hassle of the aforementioned complex process. In such situations, a simple tool that allows you to quickly copy and paste a BibTeX entry into your .bib file would be ideal. Think of such a simple tool, I have looked around the Chrome extension store to see if there is any that can pick up the Bibtex while you are browsing the paper. I found some, but they do not really work.

Therefore, I decided to create my own tool to address this dilemma. I developed a Chrome extension that can generate the BibTeX entry for any browsing URL with just one click. I named it the 1click BibTeX. It delivers exactly what it is expected and has proven to be quite helpful. This extension, along with the Latex tools, will ensure that the manuscript's citations are properly formatted before they are delivered to the journal.

Usage

Install the 1click BibTeX extension on your Chrome browser. Then, whenever you're browsing a paper or any URL, just click on the extension icon, and the BibTeX entry will be instantly generated and copied to your clipboard. The remaining thing is just paste it to your .bib file.

BibTeX generator

I've tested the extension on numerous publishers and websites with varying structures and it works consistently as it was designed. The tested publishers include Elsevier, Wiley, ACS, IOP, AIP, APS, arXiv,...

Below are some examples of BibTeX entries generated by the extension 1click BibTeX:

@article{nguyen2019pattern,
    title = {Pattern transformation induced by elastic instability of metallic porous structures},
    author = {Cao Thang Nguyen and Duc Tam Ho and Seung Tae Choi and Doo-Man Chun and Sung Youb Kim },
    year = {2019},
    month = {2},
    journal = {Computational Materials Science},
    publisher = {Elsevier},
    volume = {157},
    pages = {17-24},
    doi = {10.1016/j.commatsci.2018.10.023},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S0927025618306955?via%3Dihub},
    accessDate = {Jan 25, 2024}
}
@article{nguyen2024an,
    title = {An Enhanced Sampling Approach for Computing the Free Energy of Solid Surface and Solid–Liquid Interface},
    author = {Cao Thang Nguyen and Duc Tam Ho and Sung Youb Kim},
    year = {2024},
    month = {1},
    journal = {Advanced Theory and Simulations},
    publisher = {John Wiley & Sons, Ltd},
    volume = {7},
    number = {1},
    pages = {2300538},
    doi = {10.1002/adts.202300538},
    url = {https://onlinelibrary.wiley.com/doi/10.1002/adts.202300538},
    accessDate = {Jan 25, 2024}
}
@book{daum2003america,,
    title = {America, the Vietnam War, and the World},
    author = {Andreas W. Daum and Lloyd C. Gardner and Wilfried Mausbach},
    year = {2003},
    month = {7},
    publisher = {Cambridge University Press},
    isbn = {052100876X},
    url = {https://www.google.co.kr/books/edition/America_the_Vietnam_War_and_the_World/9kn6qYwsGs4C?hl=en&gbpv=0},
    accessDate = {Jan 25, 2024}
}
@book{rickards2011currency,
    title = {Currency Wars},
    author = {James Rickards},
    year = {2011},
    month = {11},
    publisher = {Penguin},
    isbn = {110155889X},
    url = {https://books.google.co.kr/books?id=-GDwL2s5sJoC&source=gbs_book_other_versions},
    accessDate = {Jan 25, 2024}
}
@misc{deci2024introducing,
    title = {Introducing DeciCoder-6B: The Best Multi-Language Code LLM in Its Class},
    author = {Deci},
    year = {2024},
    month = {1},
    publisher = {Deci},
    url = {https://deci.ai/blog/decicoder-6b-the-best-multi-language-code-generation-llm-in-its-class/},
    accessDate = {Jan 25, 2024}
}
@misc{kai2023forcefield,
    title = {Force-field files for "Noble gas (He, Ne and Ar) solubilities in high-pressure silicate melts calculated based on deep potential modeling"},
    author = {Wang, Kai and Lu, Xiancai and Liu, Xiandong and Yin, Kun},
    year = {2023},
    month = {3},
    publisher = {Zenodo},
    doi = {10.5281/zenodo.7751762},
    url = {https://zenodo.org/records/7751762},
    accessDate = {Jan 25, 2024}
}
  • Bibtex this page
@misc{nguyen2024bibtex,
    title = {BibTeX Generator},
    author = {Cao Thang Nguyen},
    year = {2024},
    month = {1},
    url = {https://thangckt.github.io/blog/2024/01/25/bibtex_generator},
    accessDate = {Jan 25, 2024}
}

In summary, the new extension 1click BibTeX works well for most websites with varying data structures.

Accelerated Molecular Simulation Using Deep Potential Workflow with NGC

Credit: NVIDIA's blog

Molecular simulation communities have faced the accuracy-versus-efficiency dilemma in modeling the potential energy surface and interatomic forces for decades. Deep Potential, the artificial neural network force field, solves this problem by combining the speed of classical molecular dynamics (MD) simulation with the accuracy of density functional theory (DFT) calculation.1 This is achieved by using the GPU-optimized package DeePMD-kit, which is a deep learning package for many-body potential energy representation and MD simulation.2

This post provides an end-to-end demonstration of training a neural network potential for the 2D material graphene and using it to drive MD simulation in the open-source platform Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS).3 Training data can be obtained either from the Vienna Ab initio Simulation Package (VASP)4, or Quantum ESPRESSO (QE).5

A seamless integration of molecular modeling, machine learning, and high-performance computing (HPC) is demonstrated with the combined efficiency of molecular dynamics with ab initio accuracy — that is entirely driven through a container-based workflow. Using AI techniques to fit the interatomic forces generated by DFT, the accessible time and size scales can be boosted several orders of magnitude with linear scaling.

Deep potential is essentially a combination of machine learning and physical principles, which start a new computing paradigm as shown in Figure 1.

The image shows the new computing paradigm that combines molecular modeling, machine learning and high-performance computing to understand the interatomic forces of molecules compared to the traditional methods.


Figure 1. A new computing paradigm composed of molecular modeling, AI, and HPC. (Figure courtesy: Dr. Linfeng Zhang, DP Technology)

The entire workflow is shown in Figure 2. The data generation step is done with VASP and QE. The data preparation, model training, testing, and compression steps are done using DeePMD-kit. The model deployment is in LAMMPS.

This figure displays the workflow of training and deploying a deep potential model. The workflow includes data generation, data preparation, model training, model testing, model compression, and model deployment.


Figure 2. Diagram of the DeePMD workflow.

Why Containers?

A container is a portable unit of software that combines the application, and all its dependencies, into a single package that is agnostic to the underlying host OS.

The workflow in this post involves AIMD, DP training, and LAMMPS MD simulation. It is nontrivial and time-consuming to install each software package from source with the correct setup of the compiler, MPI, GPU library, and optimization flags.

Containers solve this problem by providing a highly optimized GPU-enabled computing environment for each step, and eliminates the time to install and test software.

The NGC catalog, a hub of GPU-optimized HPC and AI software, carries a whole of HPC and AI containers that can be readily deployed on any GPU system. The HPC and AI containers from the NGC catalog are updated frequently and are tested for reliability and performance — necessary to speed up the time to solution.

These containers are also scanned for Common Vulnerabilities and Exposure (CVEs), ensuring that they are devoid of any open ports and malware. Additionally, the HPC containers support both Docker and Singularity runtimes, and can be deployed on multi-GPU and multinode systems running in the cloud or on-premises.

Training data generation

The first step in the simulation is data generation. We will show you how you can use VASP and Quantum ESPRESSO to run AIMD simulations and generate training datasets for DeePMD. All input files can be downloaded from the GitHub repository using the following command:

git clone https://github.com/deepmodeling/SC21_DP_Tutorial.git

VASP

A two-dimensional graphene system with 98-atoms is used as shown in Figure 3.6 To generate the training datasets, 0.5ps NVT AIMD simulation at 300 K is performed. The time step chosen is 0.5fs. The DP model is created using 1000 time steps from a 0.5ps MD trajectory at a fixed temperature.

Due to the short simulation time, the training dataset contains consecutive system snapshots, which are highly correlated. Generally, the training dataset should be sampled from uncorrelated snapshots with various system conditions and configurations. For this example, we used a simplified training data scheme. For production DP training, using DP-GEN is recommended to utilize the concurrent learning scheme to efficiently explore more combinations of conditions.7

The projector-augmented wave pseudopotentials are employed to describe the interactions between the valence electrons and frozen cores. The generalized gradient approximation exchange−correlation functional of Perdew−Burke−Ernzerhof. Only the Γ-point was used for k-space sampling in all systems.

This figure displays the top view of a single layer graphene system with 98 carbon atoms.


Figure 3. A graphene system composed of 98 carbon atoms is used in AIMD simulation.

Quantum Espresso

The AIMD simulation can also be carried out using Quantum ESPRESSO, available as a container from the NGC Catalog. Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale based on density-functional theory, plane waves, and pseudopotentials. The same graphene structure is used in the QE calculations. The following command can be used to start the AIMD simulation:

$ singularity exec --nv docker://nvcr.io/hpc/quantum_espresso:qe-6.8 cp.x
< c.md98.cp.in

Training data preparation

Once the training data is obtained from AIMD simulation, we want to convert its format using dpdata so that it can be used as input to the deep neural network. The dpdata package is a format conversion toolkit between AIMD, classical MD, and DeePMD-kit.

You can use the convenient tool dpdata to convert data directly from the output of first-principles packages to the DeePMD-kit format. For deep potential training, the following information of a physical system has to be provided: atom type, box boundary, coordinate, force, viral, and system energy.

A snapshot, or a frame of the system, contains all these data points for all atoms at one-time step, which can be stored in two formats, that is raw and npy.

The first format raw is plain text with all information in one file, and each line of the file represents a snapshot. Different system information is stored in different files named as box.raw, coord.raw, force.raw, energy.raw, and virial.raw. We recommended you follow these naming conventions when preparing the training files.

An example of force.raw:

$ cat force.raw
-0.724  2.039 -0.951  0.841 -0.464  0.363
 6.737  1.554 -5.587 -2.803  0.062  2.222
-1.968 -0.163  1.020 -0.225 -0.789  0.343

This force.raw contains three frames, with each frame having the forces of two atoms, resulting in three lines and six columns. Each line provides all three force components of two atoms in one frame. The first three numbers are the three force components of the first atom, while the next three numbers are the force components of the second atom.

The coordinate file coord.raw is organized similarly. In box.raw, the nine components of the box vectors should be provided on each line. In virial.raw, the nine components of the virial tensor should be provided on each line in the order XX XY XZ YX YY YZ ZX ZY ZZ. The number of lines of all raw files should be identical. We assume that the atom types do not change in all frames. It is provided by type.raw, which has one line with the types of atoms written one by one.

The atom types should be integers. For example, the type.raw of a system that has two atoms with zero and one:

$ cat type.raw
0 1

It is not a requirement to convert the data format to raw, but this process should give a sense on the types of data that can be used as inputs to DeePMD-kit for training.

The easiest way to convert the first-principles results to the training data is to save them as numpy binary data.

For VASP output, we have prepared an outcartodata.py script to process the VASP OUTCAR file. By running the commands:

$ cd SC21_DP_Tutorial/AIMD/VASP/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python outcartodata.py
$ mv deepmd_data ../../DP/

For QE output:

$ cd SC21_DP_Tutorial/AIMD/QE/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python logtodata.py
$ mv deepmd_data ../../DP/

A folder called deepmd_data is generated and moved to the training directory. It generates five sets 0/set.000, 1/set.000, 2/set.000, 3/set.000, 4/set.000, with each set containing 200 frames. It is not required to take care of the binary data files in each of the set.* directories. The path containing the set.* folder and type.raw file is called a system. If you want to train a nonperiodic system, an empty nopbc file should be placed under the system directory. box.raw is not necessary as it is a nonperiodic system.

We are going to use three of the five sets for training, one for validating, and the remaining one for testing.

Deep Potential model training

The input of the deep potential model is a descriptor vector containing the system information mentioned previously. The neural network contains several hidden layers with a composition of linear and nonlinear transformations. In this post, a three layer-neural network with 25, 50 and 100 neurons in each layer is used. The target value, or the label, for the neural network to learn is the atomic energies. The training process optimizes the weights and the bias vectors by minimizing the loss function.

The training is initiated by the command where input.json contains the training parameters:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp train input.json

The DeePMD-kit prints detailed information on the training and validation data sets. The data sets are determined by training_data and validation_data as defined in the training section of the input script. The training dataset is composed of three data systems, while the validation data set is composed of one data system. The number of atoms, batch size, number of batches in the system, and the probability of using the system are all shown in Figure 4. The last column presents if the periodic boundary condition is assumed for the system.

This image is a screenshot of the DP training output. Summaries of the training and validation dataset are shown with detailed information on the number of atoms, batch size, number of batches in the system and the probability of using the system.


Figure 4. Screenshot of the DP training output.

During the training, the error of the model is tested every disp_freq training step with the batch used to train the model and with numb_btch batches from the validating data. The training error and validation error are printed correspondingly in the file disp_file (default is lcurve.out). The batch size can be set in the input script by the key batch_size in the corresponding sections for training and validation data set.

An example of the output:

#  step      rmse_val    rmse_trn    rmse_e_val  rmse_e_trn    rmse_f_val  rmse_f_trn         lr
      0      3.33e+01    3.41e+01      1.03e+01    1.03e+01      8.39e-01    8.72e-01    1.0e-03
    100      2.57e+01    2.56e+01      1.87e+00    1.88e+00      8.03e-01    8.02e-01    1.0e-03
    200      2.45e+01    2.56e+01      2.26e-01    2.21e-01      7.73e-01    8.10e-01    1.0e-03
    300      1.62e+01    1.66e+01      5.01e-02    4.46e-02      5.11e-01    5.26e-01    1.0e-03
    400      1.36e+01    1.32e+01      1.07e-02    2.07e-03      4.29e-01    4.19e-01    1.0e-03
    500      1.07e+01    1.05e+01      2.45e-03    4.11e-03      3.38e-01    3.31e-01    1.0e-03

The training error reduces monotonically with training steps as shown in Figure 5. The trained model is tested on the test dataset and compared with the AIMD simulation results. The test command is:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp test -m frozen_model.pb -s deepmd_data/4/ -n 200 -d detail.out

This image shows the total training loss, energy loss, force loss and learning rate decay with training steps from 0 to 1,000,000. Both the training and validation loss decrease monotonically with training steps.


Figure 5. Training loss with steps

The results are shown in Figure 6.

This image displays the inferenced energy and force in the y-axis, and the ground true on the x-axis. The inferenced values soundly coincide with the ground truth with all data distributed in the diagonal direction.


Figure 6. Test of the prediction accuracy of trained DP model with AIMD energies and forces.

Model export and compression

After the model has been trained, a frozen model is generated for inference in MD simulation. The process of saving neural network from a checkpoint is called “freezing” a model:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp freeze -o graphene.pb

After the frozen model is generated, the model can be compressed without sacrificing its accuracy; while greatly speeding up the inference performance in MD. Depending on simulation and training setup, model compression can boost performance by 10X, and reduce memory consumption by 20X when running on GPUs.

The frozen model can be compressed using the following command where -i refers to the frozen model and -o points to the output name of the compressed model:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp compress -i graphene.pb -o graphene-compress.pb

Model deployment in LAMMPS

A new pair-style has been implemented in LAMMPS to deploy the trained neural network in prior steps. For users familiar with the LAMMPS workflow, only minimal changes are needed to switch to deep potential. For instance, a traditional LAMMPS input with Tersoff potential has the following setting for potential setup:

pair_style      tersoff
pair_coeff      * * BNC.tersoff C

To use deep potential, replace previous lines with:

pair_style      deepmd graphene-compress.pb
pair_coeff      * *

The pair_style command in the input file uses the DeePMD model to describe the atomic interactions in the graphene system.

The graphene-compress.pb file represents the frozen and compressed model for inference. The graphene system in MD simulation contains 1,560 atoms. Periodic boundary conditions are applied in the lateral x– and y-directions, and free boundary is applied to the z-direction. The time step is set as 1 fs. The system is placed under NVT ensemble at temperature 300 K for relaxation, which is consistent with the AIMD setup. The system configuration after NVT relaxation is shown in Figure 7. It can be observed that the deep potential can describe the atomic structures with small ripples in the cross-plane direction. After 10ps NVT relaxation, the system is placed under NVE ensemble to check system stability.

The image displays the side view of the single layer graphene system after thermal relaxation in LAMMPS.


Figure 7. Atomic configuration of the graphene system after relaxation with deep potential.

The system temperature is shown in Figure 8.

The image displays the temperature profiles of the graphene system under NVT and NVE ensembles from 0 to 20 picoseconds. The first 10 picosecond is NVT and the second 10 picosecond is NVE.


Figure 8. System temperature under NVT and NVE ensembles. The MD system driven by deep potential is very stable after relaxation.

To validate the accuracy of the trained DP model, the calculated radial distribution function (RDF) from AIMD, DP and Tersoff, are plotted in Figure 9. The DP model-generated RDF is very close to that of AIMD, which indicates that the crystalline structure of graphene can be well presented by the DP model.

This image displays the plotted radial distribution function from three different methods, including DP, Tersoff and AIMD, which are denoted in black, red and blue solid lines respectively.


Figure 9. Radial distribution function calculated by AIMD, DP and Tersoff potential, respectively. It can be observed that the RDF calculated by DP is very close to that of AIMD.

Conclusion

This post demonstrates a simple case study of graphene under given conditions. The DeePMD-kit package streamlines the workflow from AIMD to classical MD with deep potential, providing the following key advantages:

Highly automatic and efficient workflow implemented in the TensorFlow framework. APIs with popular DFT and MD packages such as VASP, QE, and LAMMPS. Broad applications in organic molecules, metals, semiconductors, insulators, and more. Highly efficient code for HPC with MPI and GPU support. Modularization for easy adoption by other deep learning potential models. Furthermore, the use of GPU-optimized containers from the NGC catalog simplifies and accelerates the overall workflow by eliminating the steps to install and configure software. To train a comprehensive model for other applications, download the DeepMD Kit Container from the NGC catalog.

References

[1] Jia W, Wang H, Chen M, Lu D, Lin L, Car R, E W and Zhang L 2020 Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning IEEE Press 5 1-14

[2] Wang H, Zhang L, Han J and E W 2018 DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics Computer Physics Communications 228 178-84

[3] Plimpton S 1995 Fast Parallel Algorithms for Short-Range Molecular Dynamics Journal of Computational Physics 117 1-19

[4] Kresse G and Hafner J 1993 Ab initio molecular dynamics for liquid metals Physical Review B 47 558-61

[5] Giannozzi P, Baroni S, Bonini N, Calandra M, Car R, Cavazzoni C, Ceresoli D, Chiarotti G L, Cococcioni M, Dabo I, Dal Corso A, de Gironcoli S, Fabris S, Fratesi G, Gebauer R, Gerstmann U, Gougoussis C, Kokalj A, Lazzeri M, Martin-Samos L, Marzari N, Mauri F, Mazzarello R, Paolini S, Pasquarello A, Paulatto L, Sbraccia C, Scandolo S, Sclauzero G, Seitsonen A P, Smogunov A, Umari P and Wentzcovitch R M 2009 QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials Journal of Physics: Condensed Matter 21 395502

[6] Humphrey W, Dalke A and Schulten K 1996 VMD: Visual molecular dynamics Journal of Molecular Graphics 14 33-8

[7] Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.