<translate>
<!--T:1-->
[https://arrow.apache.org/ Apache Arrow] is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
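To give a rough idea of what "columnar" means, the following sketch contrasts a row-oriented layout with a column-oriented one using only the Python standard library. It does not reproduce Arrow's actual memory format, which also defines validity bitmaps and offset buffers; the field names <tt>sensor_id</tt> and <tt>temperature</tt> are made up for illustration.

```python
from array import array

# Row-oriented layout: one Python tuple per record.
rows = [(1, 20.0), (2, 21.0), (3, 19.0)]

# Column-oriented layout: one contiguous, typed buffer per field.
# The field names here are hypothetical.
columns = {
    "sensor_id": array("q", (r[0] for r in rows)),    # 64-bit signed integers
    "temperature": array("d", (r[1] for r in rows)),  # 64-bit floats
}

# An analytic operation only needs to scan the column it uses.
temperatures = columns["temperature"]
mean_temperature = sum(temperatures) / len(temperatures)
print(mean_temperature)
```

Storing each field contiguously is what lets analytic code scan one column without touching the others.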
== CUDA == <!--T:2-->
Arrow is also available with CUDA.
{{Command|module load gcc/8.3.0 arrow/0.16.0 cuda/10.1}}
== Python bindings == <!--T:3-->
The module contains bindings for multiple Python versions.
To find out which Python versions are compatible:
{{Command|module spider arrow/0.16.0}}
=== PyArrow === <!--T:4-->
The Arrow Python bindings (also named ''PyArrow'') have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
<!--T:5-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 python/3.7 scipy-stack}}
<!--T:6-->
2. Import PyArrow:
{{Command|python -c "import pyarrow"}}
<!--T:7-->
If the command displays nothing, PyArrow was imported successfully.
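If you prefer a check that prints something, the short script below is one possible approach. It uses only the standard library; the helper name <tt>describe_module</tt> is hypothetical, and the script assumes the module exposes a <tt>__version__</tt> attribute (it falls back to "unknown" otherwise).

```python
import importlib
import importlib.util


def describe_module(name):
    """Return a one-line status for the named top-level module."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(name) is None:
        return f"{name}: not found"
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    return f"{name}: found, version {version}"


print(describe_module("pyarrow"))
```

Run it with the same modules loaded as in step 1; a "not found" result usually means the loaded Python version does not match the bindings.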
<!--T:8-->
For more information, see the [https://arrow.apache.org/docs/python/ Arrow Python] documentation.
==== Apache Parquet Format ==== <!--T:9-->
The [http://parquet.apache.org/ Parquet] file format is available.
<!--T:10-->
To import it, follow the previous steps for <tt>pyarrow</tt>, then:
{{Command|python -c "import pyarrow.parquet"}}
<!--T:11-->
If the command displays nothing, the Parquet module was imported successfully.
== R bindings == <!--T:12-->
The arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets ([https://arrow.apache.org/docs/r/reference/open_dataset.html open_dataset()]), working with individual Parquet ([https://arrow.apache.org/docs/r/reference/read_parquet.html read_parquet()], [https://arrow.apache.org/docs/r/reference/write_parquet.html write_parquet()]) and Feather ([https://arrow.apache.org/docs/r/reference/read_feather.html read_feather()], [https://arrow.apache.org/docs/r/reference/write_feather.html write_feather()]) files, as well as lower-level access to Arrow memory and messages.
=== Installation === <!--T:13-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6 boost/1.68.0}}
<!--T:14-->
2. Specify the local installation directory:
{{Commands
}}
<!--T:15-->
3. Export the required variables to ensure we are using the system installation:
{{Commands
}}
<!--T:16-->
4. Install the bindings:
{{Command|R -e 'install.packages("arrow", repos{{=}}"https://cloud.r-project.org/")'}}
=== Usage === <!--T:17-->
Once installed, you can load the bindings.
<!--T:18-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6}}
<!--T:19-->
2. Load the library:
{{Command
}}
<!--T:20-->
For more information on its usage, see the [https://arrow.apache.org/docs/r/index.html Arrow R documentation].
</translate>