{{Draft}}
= Python =
[[Python]] is very popular in the field of machine learning. If you use it, or plan to use it, on our clusters, please refer to [[Python|our documentation about Python]] for important information about Python versions, virtual environments on login or compute nodes, multiprocessing, Anaconda, Jupyter, etc.
= Useful information about software packages =
Please refer to the page for your machine learning package of choice for useful information on installation, common pitfalls, etc.:
* [[XGBoost]]
= Datasets containing lots of small files (e.g. image datasets) =
In machine learning, it is common to manage very large collections of files, i.e. hundreds of thousands of files or more. The individual files may be fairly small, e.g. a few hundred kilobytes or less. In these cases, problems arise: