Author: Taylor Sparks
Date Uploaded: 2024-10-04
Content Type(s): Full Course, PDF course notes, example Jupyter notebooks, Homework Assignments, and Final Project
Content Length: Semester
Content Audience: Undergraduate
Content Topics: Materials Informatics, Structure-property relationships, Data-driven discovery, Chemical space exploration, Feature engineering, Small datasets, Uncertainty quantification, Ensemble methods, Active learning, Transfer learning, Self-supervised learning, Composition-based feature vector (CBFV), Structure-based features, Crystal structure representations, Graph Neural Networks (GNNs), Message passing, Generative adversarial networks (GANs), Data augmentation, Inverse design, Diffusion models, Periodic lattices, Sparse graphs, Microstructure segmentation, Two-point statistics, Crystal graph neural networks (CGNNs), Machine learning tasks, Reinforcement learning, Pymatgen, and Materials databases (ICSD, MP, OQMD)
Overview
- This semester-length course covers the Materials Genome Initiative, the history of materials discovery, the differences between traditional machine learning and materials informatics, materials data and repositories, machine learning tasks and types, featurization via composition and structure, a range of algorithms (linear models, ensemble methods, support vector machines, Gaussian processes, deep learning, convolutional neural networks, variational autoencoders, generative adversarial networks, transformers and attention-based learning, etc.), overfitting, regularization, clustering, generative approaches to machine learning, best practices for model creation, training, and validation, metrics, and case studies of successful applications of materials informatics. Students will learn through lectures and practice exercises that use experimental and computational data from a variety of sources, working with modern programming languages, code repositories, and APIs.
Learning Objectives
- (1) Students will access and work with materials data in a variety of formats and modalities. (2) Students will identify unique characteristics of materials informatics compared to traditional machine learning. (3) Students will featurize materials data including composition, crystal structure, microstructure, and text. (4) Students will utilize data science best practices to construct and deploy machine learning models with diverse algorithms, scoring metrics, featurization, and tasks. (5) Students will use machine learning models to predict new materials.
- Full course on Github and nanoHUB (semester long)
- Course notes (PDFs of the PowerPoint slides)
- Worked example notebooks covering a wide range of informatics topics, from data access via API, featurization, Bayesian optimization, hyperparameter tuning, segmentation, and deep learning to language models and more (downloadable, or runnable in the browser with no download or installation required)
- Homework assignments and a final project that combines Bayesian optimization with an easy Jello optimization experiment
- Best practices articles and notebooks for getting started
Links