Related Projects
Below is a list of sister projects, extensions, and domain-specific packages.
Interoperability and framework enhancements
These tools adapt scikit-learn for use with other technologies or otherwise enhance the functionality of scikit-learn’s estimators.
- ML Frontend provides dataset management and SVM fitting/prediction through web-based and programmatic interfaces.
 - sklearn_pandas Bridge between scikit-learn pipelines and pandas data frames, with dedicated transformers.
 - Scikit-Learn Laboratory A command-line wrapper around scikit-learn that makes it easy to run machine learning experiments with multiple learners and large feature sets.
 - auto-sklearn An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
 - TPOT An automated machine learning toolkit that optimizes a series of scikit-learn operators to design a machine learning pipeline, including data and feature preprocessors as well as the estimators. Works as a drop-in replacement for a scikit-learn estimator.
 - sklearn-pmml Serialization of (some) scikit-learn estimators into PMML.
 - sklearn2pmml Serialization of a wide variety of scikit-learn estimators and transformers into PMML with the help of JPMML-SkLearn library.
 
Other estimators and tasks
Not everything belongs in, or is mature enough for, the central scikit-learn project. The following projects provide interfaces similar to scikit-learn for additional learning algorithms, infrastructures and tasks.
- pylearn2 A deep learning and neural network library built on Theano with a scikit-learn-like interface.
 - sklearn_theano scikit-learn compatible estimators, transformers, and datasets which use Theano internally.
 - lightning Fast state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc.).
 - Seqlearn Sequence classification using HMMs or structured perceptron.
 - HMMLearn Implementation of hidden Markov models that was previously part of scikit-learn.
 - PyStruct General conditional random fields and structured prediction.
 - pomegranate Probabilistic modelling for Python, with an emphasis on hidden Markov models.
 - py-earth Multivariate adaptive regression splines.
 - sklearn-compiledtrees Generate a C++ implementation of the predict function for decision trees (and ensembles) trained by sklearn. Useful for latency-sensitive production environments.
 - lda Fast implementation of Latent Dirichlet Allocation in Cython.
 - Sparse Filtering Unsupervised feature learning based on sparse filtering.
 - Kernel Regression Implementation of Nadaraya-Watson kernel regression with automatic bandwidth selection.
 - gplearn Genetic Programming for symbolic regression tasks.
 - nolearn A number of wrappers and abstractions around existing neural network libraries.
 - sparkit-learn Scikit-learn functionality and API on PySpark.
 - keras Theano-based Deep Learning library.
 - mlxtend Includes a number of additional estimators as well as model visualization utilities.
 - kmodes k-modes clustering algorithm for categorical data, and several of its variations.
 - hdbscan HDBSCAN and Robust Single Linkage clustering algorithms for robust variable density clustering.
 - lasagne A lightweight library to build and train neural networks in Theano.
 - multiisotonic Isotonic regression on multidimensional features.
 - spherecluster Spherical K-means and mixture of von Mises-Fisher clustering routines for data on the unit hypersphere.
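
To illustrate what an interface "similar to scikit-learn" usually means for the projects listed above, here is a minimal sketch in plain Python with no dependencies. The class `MajorityClassifier` and its `fallback` parameter are hypothetical and come from none of the listed packages. The sketch only shows the common conventions: hyperparameters set in `__init__`, `fit` returning `self`, learned attributes ending in an underscore, and `get_params`/`set_params` for introspection.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical estimator that always predicts the most common training label."""

    def __init__(self, fallback=None):
        # Hyperparameters are stored unchanged in __init__.
        self.fallback = fallback

    def get_params(self, deep=True):
        # Expose hyperparameters so tools such as grid search could inspect them.
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Attributes learned from data carry a trailing underscore by convention.
        counts = Counter(y)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.fallback
        return self  # returning self allows chaining: est.fit(X, y).predict(X)

    def predict(self, X):
        # One prediction per input row.
        return [self.majority_ for _ in X]


X = [[0.0], [1.0], [2.0], [3.0]]
y = ["a", "b", "b", "b"]
predictions = MajorityClassifier().fit(X, y).predict([[4.0], [5.0]])
print(predictions)
```

A real compatible estimator would typically also subclass scikit-learn's base classes and validate its inputs; this sketch leaves that out to keep the convention itself visible.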
 
Statistical learning with Python
Other packages useful for data analysis and machine learning.
- Pandas Tools for working with heterogeneous and columnar data, relational queries, time series and basic statistics.
 - theano A CPU/GPU array processing framework geared towards deep learning research.
 - statsmodels Estimating and analysing statistical models. More focused on statistical tests and less on prediction than scikit-learn.
 - PyMC Bayesian statistical models and fitting algorithms.
 - REP Environment for conducting data-driven research in a consistent and reproducible way.
 - Sacred Tool to help you configure, organize, log and reproduce experiments.
 - gensim A library for topic modelling, document indexing and similarity retrieval.
 - Seaborn Visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
 - Deep Learning A curated list of deep learning software libraries.
 
Domain-specific packages
- scikit-image Image processing and computer vision in Python.
 - Natural language toolkit (nltk) Natural language processing and some machine learning.
 - NiLearn Machine learning for neuro-imaging.
 - AstroML Machine learning for astronomy.
 - MSMBuilder Machine learning for protein conformational dynamics time series.