General Machine Learning Resources
==================================

###### tags: `chapter`, `ml`, `by-wasmer`, `shared`, `from-2020-12`, `evolving`, `awesome-list`

[up <i class="fa fa-arrow-up"></i> Resources](https://iffmd.fz-juelich.de/WxOZ75GTTHu2MNmnw2u5pA#Resources)

## Description

A list of general machine learning (ML) tools & resources. For connected resources lists, go up one level.

This list is in the tradition of [best-of lists](https://github.com/best-of-lists/best-of) and [awesome lists](https://www.google.com/search?&q=awesome+list). Since many such ML lists exist already, this list serves merely as an extension to the [best-of-atomistic-machine-learning](https://iffmd.fz-juelich.de/68Vn-YcuSWCun3bL_rJHXA) list.

## Changelog

## Table of contents

[TOC]

## Tags

[tags: software-related](https://iffmd.fz-juelich.de/gxsOkPXATd-dR6r48ZFEiQ#Tags)

[tags: atomistic simulation](https://iffmd.fz-juelich.de/0-NbOz2yRqa1mvHD3RtTTQ#Tags)

tags: general machine learning

| tag     | meaning                          | connected |
| ------- | -------------------------------- | --------- |
| #AL     | active learning                  |           |
| #BI     | Bayesian inference               |           |
| #BO     | Bayesian optimization            |           |
| #clust  | clustering                       | #USL      |
| #dimred | dimension reduction              | #USL      |
| #DiffM  | diffusion model                  | #GenM     |
| #DL     | deep learning                    |           |
| #FV     | feature vector / representation  | #rep      |
| #FSn    | feature selection                | #dimred   |
| #GCN    | graph convolutional network      | #GNN      |
| #GDL    | geometric deep learning          | #DL       |
| #GenM   | generative models                |           |
| #GNN    | graph neural network             | #DL       |
| #GP     | Gaussian process                 | #BI       |
| #GPR    | #GP regression                   | #GP       |
| #hypopt | hyperparameter optimization      |           |
| #KRR    | kernel ridge regression          | #SL       |
| #KPCovR | kernel #PCovR                    | #PCovR    |
| #ML     | machine learning                 |           |
| #MLOps  | MLOps                            | #ML       |
| #PCovR  | principal covariates regression  | #SL       |
| #PGM    | (probabilistic) graphical models | #BI       |
| #rep    | representation                   | #FV       |
| #repL   | #rep learning                    | #rep      |
| #RL     | reinforcement learning           |           |
| #SL     | supervised learning              |           |
| #SSn    | sample selection                 | #dimred   |
| #SML    | shallow #ML                      | not #DL   |
| #SSL    | self-supervised learning         | #SL       |
| #USL    | unsupervised learning            |           |

[tags: atomistic machine learning](https://iffmd.fz-juelich.de/68Vn-YcuSWCun3bL_rJHXA#Tags)

## Other collections

General machine learning

- https://github.com/topics/machine-learning
- https://github.com/topics/deep-learning
- [deepai.org definitions](https://deepai.org/definitions): ML glossary with search function.

General machine learning, theory

- https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap

Graph learning #GNN #GDL

- [naganandy/graph-based-deep-learning-literature](https://github.com/naganandy/graph-based-deep-learning-literature). Last update 2022.
- [LirongWu/awesome-graph-self-supervised-learning](https://github.com/LirongWu/awesome-graph-self-supervised-learning). Last update 2022.
- [GitHub > Topics > tensor-networks](https://github.com/topics/tensor-networks)
- [hibayesian/awesome-graph-learning-papers](https://github.com/hibayesian/awesome-graph-learning-papers). Last update 2020. #stale

Diffusion models #DiffM

- [heejkoo/Awesome-Diffusion-Models](https://github.com/heejkoo/Awesome-Diffusion-Models)

## Tools

### Tools for general ML

#### Collections

- [best-of-ml-python](https://github.com/ml-tooling/best-of-ml-python)
- [awesome-datascience](https://github.com/academic/awesome-datascience)
- [awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning)
- [awesome-deep-learning](https://github.com/ChristosChristofidis/awesome-deep-learning)

#### scikit-learn

[scikit-learn](https://scikit-learn.org/).

- [scikit-learn ML map](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html): flowchart to find the correct scikit-learn model for your problem.


Scikit-learn collections

- https://github.com/scikit-learn-contrib
- https://github.com/scikit-learn-contrib/scikit-learn-contrib
- https://github.com/topics/scikit-learn
- https://github.com/ml-tooling/best-of-ml-python#sklearn-utilities

Scikit-learn-contrib tools with more than 1,000 stars, as of 2022-10.

- https://github.com/scikit-learn-contrib/imbalanced-learn
- https://github.com/scikit-learn-contrib/sklearn-pandas
- https://github.com/scikit-learn-contrib/hdbscan
- https://github.com/scikit-learn-contrib/category_encoders
- https://github.com/scikit-learn-contrib/lightning
- https://github.com/scikit-learn-contrib/metric-learn
- https://github.com/scikit-learn-contrib/boruta_py

Other scikit-learn tools

- **scikit-learn-intelex**. "Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application." [Repository](https://github.com/intel/scikit-learn-intelex), last update 2022.
  - For CPU. Also offers offloading to GPU, but see [system requirements](https://intel.github.io/scikit-learn-intelex/system-requirements.html) first.
- **scikit-image**. "Image processing in Python". [Repository](https://github.com/scikit-image/scikit-image), last update 2022.
- **mlxtend**. "A library of extension and helper modules for Python's data analysis and machine learning libraries." [Repository](https://github.com/rasbt/mlxtend).

### Tools for supervised learning #SL

#### Tools for boosting

- [LightGBM](https://github.com/microsoft/LightGBM): LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient.
  - LightGBM compared to other gradient boosting libraries: [neptune.ai > XGBoost vs LightGBM](https://neptune.ai/blog/xgboost-vs-lightgbm), dated 2022-07.

### Tools for unsupervised learning #USL

#### Tools for dimensionality reduction #dimred

[Wikipedia: dimensionality reduction](https://en.wikipedia.org/wiki/Dimensionality_reduction).

- t-SNE: [Laurens van der Maaten's t-SNE page](https://lvdmaaten.github.io/tsne/) (by the method's inventor). An implementation is also available in scikit-learn: [scikit-learn t-SNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html).
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. [docs](https://umap-learn.readthedocs.io). [code](https://github.com/lmcinnes/umap).
- [den-SNE / densMAP](http://cb.csail.mit.edu/cb/densvis/): density-preserving data visualization tools den-SNE and densMAP, augmenting the tools t-SNE and UMAP respectively. [code](https://github.com/hhcho/densvis). [paper](https://www.nature.com/articles/s41587-020-00801-7).

### Tools for probabilistic machine learning #BI, #PGM

- **pgmpy**. "pgmpy is a python library for working with Probabilistic Graphical Models." "Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks." [Original publication](https://conference.scipy.org/proceedings/scipy2015/pdfs/ankur_ankan.pdf), 2015. [Code](https://github.com/pgmpy/pgmpy), last update 2022. #dep-PyTorch

### Tools for ML optimization #hypopt, #AutoML

Hyperparameter optimization, AutoML tools.

- [GitHub > Topics > hyperparameter-optimization](https://github.com/topics/hyperparameter-optimization)
- [Ray](https://github.com/ray-project/ray). Framework, runtime & libraries for distributed, scalable ML workloads. Last update 2022. #distributed
  - [Ray Tune](https://docs.ray.io/en/latest/tune/index.html). Hyperparameter optimization library for Ray. Last update 2022. #hypopt
- [FLAML](https://github.com/microsoft/FLAML). FLAML is a lightweight Python AutoML library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. Last update 2022.
- [optuna](https://github.com/optuna/optuna): Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning.
- [hyperopt](https://github.com/hyperopt/hyperopt): Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions. [documentation](https://hyperopt.github.io/hyperopt/). Last update 2021. #stale
  - [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn): hyperopt extension for scikit-learn. [documentation](https://hyperopt.github.io/hyperopt-sklearn/). Last update 2022.
    - Tutorial: [machinelearningmastery.com > hyperopt-sklearn tutorial](https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/).
    - For hyperparameter searches that are not high-dimensional, [scikit-learn's own search tools](https://scikit-learn.org/stable/modules/grid_search.html) can be used directly instead.

### Tools for MLOps #MLOps

Tools for data version control, model versioning, model registry, experiment tracking, MLOps.

Collections:

- https://github.com/topics/mlops
- [kelvins/awesome-mlops](https://github.com/kelvins/awesome-mlops): A curated list of awesome MLOps tools.
- [ml-ops.org](https://ml-ops.org/): Curated list of MLOps articles.
- [awesomeopensource.com > MLOps](https://awesomeopensource.com/projects/mlops): The Top ~100 Mlops Open Source Projects.

Tools:

- [mlflow](https://github.com/mlflow/mlflow): MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
- [dvc](https://dvc.org/). Data Version Control (DVC) is an "Open-source Version Control System for Machine Learning Projects". #git-based
- [cml](https://cml.dev/). Continuous Machine Learning (CML) is "CI/CD for Machine Learning Projects". Use GitLab or GitHub to manage ML experiments. #git-based
- [mlem](https://mlem.ai/). "Open-source tool to simplify ML model deployment". Capture ML model metadata automatically, use Git as model registry. #git-based

### Tools for explainable ML #XAI

Tools for interpretable ML, explainable AI (XAI).

- [shap](https://github.com/slundberg/shap): SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
- [dtreeviz](https://github.com/parrt/dtreeviz): A python library for decision tree visualization and model interpretation. Currently supports scikit-learn, XGBoost, Spark MLlib, and LightGBM trees.

### Tools for data processing

- **ease.ml/datascope**. "Measuring data importance over ML pipelines using the Shapley value." [Code](https://github.com/easeml/datascope), last update 2022. By [ds3lab@ETHZ](https://ds3lab.inf.ethz.ch/).

### Unsorted

- 230410:
  - https://github.com/scikit-activeml/scikit-activeml
  - https://github.com/modAL-python/modAL

## Learning ML

Basic ML theory and practical knowledge useful for getting started with atomistic ML.

### ML Basics

- **learnpytorch.io**. 2021, last update 2022. Instructors: [Daniel Bourke](https://www.mrdbourke.com/). Free stuff: Full course materials, first 25h of recordings, programming exercises. #DL #PyTorch #Jupyter #Colab #YouTube
  - [Course homepage](https://www.learnpytorch.io/).
- **NYU Deep Learning Course**. 2020, last update 2022. Instructors: Yann LeCun, Alfredo Canziani. Free stuff: Full course materials, recordings, programming exercises. #DL #PyTorch #Jupyter #Colab #YouTube
  - [Course homepage](https://atcold.github.io/pytorch-Deep-Learning/).
  - [ACanziani's YouTube archive](https://atcold.github.io/youtube.html).
- [Machine Learning for Scientists course 2021](https://ml-lectures.org). Compact course with exercises (notebooks), free public access. Developed by UZH Condensed matter theory / TUDelft Quantum matter and AI group, 2021. [arXiv publication](https://arxiv.org/abs/2102.04883).
- [A high-bias, low-variance introduction to Machine Learning for physicists](https://github.com/drckf/mlreview_notebooks). Notebooks, plus a [theoretical introduction](https://www.sciencedirect.com/science/article/pii/S0370157319300766) on which the notebooks build. By Mehta et al. 2018/2019. Older homepages for the project [here](https://mgbukov.github.io/ml/2018/03/20/ML_review.html), [here](https://physics.bu.edu/~pankajm/MLnotebooks.html).
- [mclguide.readthedocs.io](https://mclguide.readthedocs.io). Useful practical guide and quick reference for basic ML with `scikit-learn`. Last update 2019. [docs](https://mclguide.readthedocs.io). [code](https://bitbucket.org/pythondsp/machine-learning-guide).
- ML lecture courses 2020 by the [University of Tübingen Cluster of Excellence Machine Learning for Science](https://uni-tuebingen.de/en/research/core-research/cluster-of-excellence-machine-learning/home/): comprehensive set of [video lectures covering ML and the mathematics for it](https://www.youtube.com/channel/UCupmCsCA5CFXmm31PkUhEbA/playlists?view=50&sort=dd&shelf_id=1).
  - [Mathematics for Machine Learning](https://www.youtube.com/playlist?list=PL05umP7R6ij1a6KdEy8PVE9zoCv6SlHRS)
  - [Essential Statistics](https://www.youtube.com/playlist?list=PL05umP7R6ij0Gw5SLIrOA1dMYScCx4oXT)
  - [Introduction to Machine Learning](https://www.youtube.com/playlist?list=PL05umP7R6ij35ShKLDqccJSDntugY4FQT)
  - [Statistical Machine Learning](https://www.youtube.com/playlist?list=PL05umP7R6ij2XCvrRzLokX6EoHWaGA2cC)
  - [Probabilistic Machine Learning](https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m5uX3J21a6yNd)
  - [Deep Learning](https://www.youtube.com/watch?v=OCHbm88xUGU&list=PL05umP7R6ij3NTWIdtMbfvX7Z-4WEXRqD)