Arrow: Difference between revisions

9 bytes removed, 4 years ago
no edit summary
Line 2: Line 2:
<translate>
<translate>
<!--T:1-->
<!--T:1-->
[https://arrow.apache.org/ Apache Arrow] is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
[https://arrow.apache.org/ Apache Arrow] is a cross-language development platform for in-memory data. It uses a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.


== CUDA == <!--T:2-->
== CUDA == <!--T:2-->
Line 9: Line 9:


== Python bindings == <!--T:3-->
== Python bindings == <!--T:3-->
The module contains bindings for multiple python versions.  
The module contains bindings for multiple Python versions.  
To discover which are the compatible Python versions:
To find out which Python versions are compatible, run
{{Command|module spider arrow/0.16.0}}
{{Command|module spider arrow/0.16.0}}


=== PyArrow === <!--T:4-->
=== PyArrow === <!--T:4-->
The Arrow Python bindings (also named ''PyArrow'') have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
The Arrow Python bindings (also named ''PyArrow'') have first-class integration with NumPy, Pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.


<!--T:5-->
<!--T:5-->
1. Load the required modules:
1. Load the required modules.
{{Command|module load gcc/8.3.0 arrow/0.16.0 python/3.7 scipy-stack}}
{{Command|module load gcc/8.3.0 arrow/0.16.0 python/3.7 scipy-stack}}


<!--T:6-->
<!--T:6-->
2. Import PyArrow:
2. Import PyArrow.
{{Command|python -c "import pyarrow"}}
{{Command|python -c "import pyarrow"}}


<!--T:7-->
<!--T:7-->
The command display nothing. You have successfully imported PyArrow.
If the command displays nothing, the import was successful.
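
A silent command can be hard to tell apart from one that never ran. As a minimal sketch, the check above can be made explicit with the Python standard library. The helper name <tt>is_importable</tt> is hypothetical and not part of PyArrow; the script assumes the modules from step 1 are already loaded.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate module_name (hypothetical helper).

    For dotted names such as "pyarrow.parquet", find_spec imports the
    parent package first, so a missing parent is reported as False.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    # Report the status of the bindings discussed on this page.
    for name in ("pyarrow", "pyarrow.parquet"):
        status = "available" if is_importable(name) else "NOT available"
        print(f"{name}: {status}")
```

If a line reports <tt>NOT available</tt>, the most likely cause is that the modules were not loaded in the current shell, or that a different Python version is active.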


<!--T:8-->
<!--T:8-->
For more information, see the [https://arrow.apache.org/docs/python/ Arrow Python] documentation.
For more information, see the [https://arrow.apache.org/docs/python/ Arrow Python] documentation.


==== Apache Parquet Format ==== <!--T:9-->
==== Apache Parquet format ==== <!--T:9-->
The [http://parquet.apache.org/ Parquet] file format is available.  
The [http://parquet.apache.org/ Parquet] file format is available.  


<!--T:10-->
<!--T:10-->
To import it, execute previous steps for <tt>pyarrow</tt>, then:
To import the Parquet module, execute the previous steps for <tt>pyarrow</tt>, then run
{{Command|python -c "import pyarrow.parquet"}}
{{Command|python -c "import pyarrow.parquet"}}


<!--T:11-->
<!--T:11-->
The command display nothing. You have successfully imported the Parquet module.
If the command displays nothing, the import was successful.


== R bindings == <!--T:12-->
== R bindings == <!--T:12-->
The arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets ([https://arrow.apache.org/docs/r/reference/open_dataset.html open_dataset()]), working with individual Parquet ([https://arrow.apache.org/docs/r/reference/read_parquet.html read_parquet()], [https://arrow.apache.org/docs/r/reference/write_parquet.html write_parquet()]) and Feather ([https://arrow.apache.org/docs/r/reference/read_feather.html read_feather()], [https://arrow.apache.org/docs/r/reference/write_feather.html write_feather()]) files, as well as lower-level access to Arrow memory and messages.
The arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets ([https://arrow.apache.org/docs/r/reference/open_dataset.html open_dataset()]), working with individual Parquet files ([https://arrow.apache.org/docs/r/reference/read_parquet.html read_parquet()], [https://arrow.apache.org/docs/r/reference/write_parquet.html write_parquet()]) and Feather files ([https://arrow.apache.org/docs/r/reference/read_feather.html read_feather()], [https://arrow.apache.org/docs/r/reference/write_feather.html write_feather()]), as well as lower-level access to Arrow memory and messages.


=== Installation === <!--T:13-->
=== Installation === <!--T:13-->
1. Load the required modules:
1. Load the required modules.
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6 boost/1.68.0}}
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6 boost/1.68.0}}


<!--T:14-->
<!--T:14-->
2. Specify the local installation directory:
2. Specify the local installation directory.
{{Commands
{{Commands
|mkdir -p ~/.local/R/$EBVERSIONR/
|mkdir -p ~/.local/R/$EBVERSIONR/
Line 55: Line 55:


<!--T:15-->
<!--T:15-->
3. Export the required variables to ensure we are using the system installation:
3. Export the required variables to ensure you are using the system installation.
{{Commands
{{Commands
|export PKG_CONFIG_PATH{{=}}$EBROOTARROW/lib/pkgconfig
|export PKG_CONFIG_PATH{{=}}$EBROOTARROW/lib/pkgconfig
Line 63: Line 63:


<!--T:16-->
<!--T:16-->
4. Install the bindings:
4. Install the bindings.
{{Command|R -e 'install.packages("arrow", repos{{=}}"https://cloud.r-project.org/")'}}
{{Command|R -e 'install.packages("arrow", repos{{=}}"https://cloud.r-project.org/")'}}


=== Usage === <!--T:17-->
=== Usage === <!--T:17-->
Once installed, you can load the bindings.
After the bindings are installed, they have to be loaded.


<!--T:18-->
<!--T:18-->
1. Load the required modules:
1. Load the required modules.
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6}}
{{Command|module load gcc/8.3.0 arrow/0.16.0 r/3.6}}


<!--T:19-->
<!--T:19-->
2. Load the library:
2. Load the library.
{{Command
{{Command
|R -e "library(arrow)"
|R -e "library(arrow)"
Line 83: Line 83:


<!--T:20-->
<!--T:20-->
For more information on its usage, see [https://arrow.apache.org/docs/r/index.html Arrow R documentation]
For more information, see the [https://arrow.apache.org/docs/r/index.html Arrow R documentation].
</translate>
</translate>