<translate>
<translate>
<!--T:1-->
[https://arrow.apache.org/ Apache Arrow] is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.


== CUDA == <!--T:2-->
Arrow is also available with CUDA support. To use it, load the CUDA module together with Arrow:
{{Command|module load gcc/8.3.0 arrow/0.16.0 cuda/10.1}}


== Python bindings == <!--T:3-->
The module contains bindings for multiple Python versions.
To discover which Python versions are compatible, run:
{{Command|module spider arrow/0.16.0}}


=== PyArrow === <!--T:4-->
The Arrow Python bindings (also known as ''PyArrow'') have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.


<!--T:5-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 python/3.7 scipy-stack}}


<!--T:6-->
2. Import PyArrow:
{{Command|python -c "import pyarrow"}}


<!--T:7-->
If the command displays nothing, PyArrow was imported successfully.


<!--T:8-->
For more information, see the [https://arrow.apache.org/docs/python/ Arrow Python] documentation.


==== Apache Parquet Format ==== <!--T:9-->
The [http://parquet.apache.org/ Parquet] file format is available.  


<!--T:10-->
To import it, first follow the steps above for <tt>pyarrow</tt>, then run:
{{Command|python -c "import pyarrow.parquet"}}


<!--T:11-->
If the command displays nothing, the Parquet module was imported successfully.


== R bindings == <!--T:12-->
The arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets ([https://arrow.apache.org/docs/r/reference/open_dataset.html open_dataset()]), working with individual Parquet ([https://arrow.apache.org/docs/r/reference/read_parquet.html read_parquet()], [https://arrow.apache.org/docs/r/reference/write_parquet.html write_parquet()]) and Feather ([https://arrow.apache.org/docs/r/reference/read_feather.html read_feather()], [https://arrow.apache.org/docs/r/reference/write_feather.html write_feather()]) files, as well as lower-level access to Arrow memory and messages.


=== Installation === <!--T:13-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6 boost/1.68.0}}


<!--T:14-->
2. Specify the local installation directory:
{{Commands
}}


<!--T:15-->
3. Export the required variables so that the system installation of Arrow is used:
{{Commands
}}


<!--T:16-->
4. Install the bindings:
{{Command|R -e 'install.packages("arrow", repos{{=}}"https://cloud.r-project.org/")'}}


=== Usage === <!--T:17-->
Once installed, you can load the bindings.


<!--T:18-->
1. Load the required modules:
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6}}


<!--T:19-->
2. Load the library:
{{Command
}}


<!--T:20-->
For more information on its usage, see the [https://arrow.apache.org/docs/r/index.html Arrow R documentation].
</translate>
</translate>