Large Scale Machine Learning (Big Data)



<!--T:23-->
In this first example, we use [https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html SGDClassifier] to fit a linear SVM classifier with batches of data coming from a pair of '''NumPy''' arrays. These arrays are stored on disk as [https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#npy-format npy files] and we will keep them there by [https://numpy.org/doc/stable/reference/generated/numpy.memmap.html memory-mapping] these files. Since <code>SGDClassifier</code> has a <code>partial_fit</code> method, we can iterate through our large memory-mapped files, loading only a small batch of rows into memory at a time. Each call to <code>partial_fit</code> then runs one epoch of stochastic gradient descent over that batch of data.
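
A minimal sketch of this pattern is shown below. The file names <code>x_train.npy</code> and <code>y_train.npy</code> and the batch size are illustrative assumptions, not fixed by this page:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import SGDClassifier

# Memory-map the arrays stored on disk; rows are only read when sliced.
X = np.load("x_train.npy", mmap_mode="r")   # shape (n_samples, n_features)
y = np.load("y_train.npy", mmap_mode="r")   # shape (n_samples,)

batch_size = 10_000
classes = np.unique(y)  # partial_fit requires the full set of classes on the first call

# loss="hinge" makes SGDClassifier a linear SVM trained with stochastic gradient descent.
clf = SGDClassifier(loss="hinge")

for start in range(0, X.shape[0], batch_size):
    stop = start + batch_size
    # Slicing the memory-mapped arrays loads only this batch of rows into memory.
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)
</syntaxhighlight>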

