Large Scale Machine Learning (Big Data)

<!--T:16-->
Another option that reduces memory usage even more is to use [https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html SGDRegressor] instead of <code>Ridge</code>. This class implements many types of generalized linear models for regression, using vanilla stochastic gradient descent as a solver. One caveat of using <code>SGDRegressor</code> is that it only works if the output is one-dimensional (a scalar).
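
What follows is a minimal sketch of this swap, not an example from this page: the synthetic data and the hyperparameter values are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import SGDRegressor

# Illustrative synthetic data; in practice X and y come from your own dataset.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50)).astype(np.float32)
y = X @ rng.standard_normal(50)  # y must be one-dimensional: one scalar per row

# With the default squared-error loss and penalty="l2", SGDRegressor minimizes
# a Ridge-style objective, but optimizes it with stochastic gradient descent
# instead of a direct solver, avoiding large intermediate matrices in memory.
model = SGDRegressor(loss="squared_error", penalty="l2", alpha=1e-4)
model.fit(X, y)
print(model.score(X, y))
</syntaxhighlight>

Since stochastic gradient descent is sensitive to feature scales, it is usually worth standardizing the inputs (for example with <code>StandardScaler</code>) before fitting.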


<!--T:17-->


==Batch learning== <!--T:21-->


<!--T:22-->
In cases where your dataset is too large to fit in memory, or just large enough that it does not leave enough memory free for training, it is possible to leave your data on disk and load it in batches during training, similar to how deep learning packages work. Scikit-learn refers to this as [https://scikit-learn.org/stable/computing/scaling_strategies.html <i>out-of-core learning</i>] and it is a viable option whenever an estimator has the <code>partial_fit</code> [https://scikit-learn.org/stable/computing/scaling_strategies.html?highlight=partial_fit#incremental-learning method available]. In the examples below, we perform out-of-core learning by iterating over datasets stored on disk.
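
As a hedged sketch of what such a loop can look like (the <code>batch_*.npy</code> file names, the number of files, and the assumption that the target is the last column are hypothetical placeholders rather than this page's actual examples):

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(loss="squared_error", penalty="l2", alpha=1e-4)

n_batches = 10          # hypothetical number of batch files on disk
for epoch in range(5):  # partial_fit does one pass, so loop over several epochs
    for i in range(n_batches):
        batch = np.load(f"batch_{i}.npy")   # load one batch at a time
        X, y = batch[:, :-1], batch[:, -1]  # assume the target is the last column
        # Update the model with just this batch; only one batch ever
        # resides in memory at a time.
        model.partial_fit(X, y)
</syntaxhighlight>

Each call to <code>partial_fit</code> performs a single pass over the given batch, so the outer epoch loop compensates for the fact that <code>partial_fit</code> does not iterate on its own.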


<!--T:23-->