pygmt.clib.Session.virtualfile_to_dataset

Session.virtualfile_to_dataset(vfname, output_type='pandas', header=None, column_names=None, dtype=None, index_col=None)

Output a tabular dataset stored in a virtual file to a different format.

The format of the dataset is determined by the output_type parameter.

Parameters:

vfname (str) – The virtual file name that stores the result data.
output_type (Literal['pandas', 'numpy', 'file', 'strings'], default: 'pandas') –
Desired output type of the result data.
- "pandas" will return a pandas.DataFrame object.
- "numpy" will return a numpy.ndarray object.
- "file" means the result is saved to a file, and the method returns None.
- "strings" will return the trailing text only as an array of strings.
header (int | None, default: None) – Row number containing column names for the pandas.DataFrame output. header=None means the column names are not parsed from the table header. Ignored if the row number is larger than the number of headers in the table.
column_names (list[str] | None, default: None) – The column names for the pandas.DataFrame output.
dtype (type | dict[str, type] | None, default: None) – Data type for the columns of the pandas.DataFrame output. Can be a single type for all columns or a dictionary mapping column names to types.
index_col (str | int | None, default: None) – Column to set as the index of the pandas.DataFrame output.

Return type:

DataFrame | ndarray | None

Returns:

result – The result dataset. Returns None if output_type="file".
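
The return types above can be related to one another with plain pandas and NumPy. The sketch below does not call PyGMT: it hand-builds a small table shaped like the one in the example on this page (the name `table` is hypothetical) and assumes, without confirmation from this page, that the "numpy" result resembles `DataFrame.to_numpy()` and that the "strings" result resembles the trailing-text column as a string array.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a "pandas" result: three numeric columns plus
# one trailing-text column. This is illustrative data, not PyGMT output.
table = pd.DataFrame(
    {
        0: [1.0, 4.0],
        1: [2.0, 5.0],
        2: [3.0, 6.0],
        3: ["TEXT1 TEXT23", "TEXT4 TEXT567"],
    }
)

# Assumed analogue of output_type="numpy": mixing float and text columns
# yields a single array with dtype=object.
as_numpy = table.to_numpy()

# Assumed analogue of output_type="strings": only the trailing text,
# returned as an array of strings.
as_strings = table[3].to_numpy(dtype=str)

# The numeric part of an object array can be recovered as floats by
# slicing off the text column and casting.
numeric = as_numpy[:, :3].astype(float)
```

If only the numbers are needed, slicing and casting the object array as in the last step may be more convenient than working with `dtype=object` values directly.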

Examples

>>> from pathlib import Path
>>> import numpy as np
>>> import pandas as pd
>>>
>>> from pygmt.helpers import GMTTempFile
>>> from pygmt.clib import Session
>>>
>>> with GMTTempFile(suffix=".txt") as tmpfile:
...     # prepare the sample data file
...     with Path(tmpfile.name).open(mode="w") as fp:
...         print(">", file=fp)
...         print("1.0 2.0 3.0 TEXT1 TEXT23", file=fp)
...         print("4.0 5.0 6.0 TEXT4 TEXT567", file=fp)
...         print(">", file=fp)
...         print("7.0 8.0 9.0 TEXT8 TEXT90", file=fp)
...         print("10.0 11.0 12.0 TEXT123 TEXT456789", file=fp)
...
...     # file output
...     with Session() as lib:
...         with GMTTempFile(suffix=".txt") as outtmp:
...             with lib.virtualfile_out(
...                 kind="dataset", fname=outtmp.name
...             ) as vouttbl:
...                 lib.call_module("read", [tmpfile.name, vouttbl, "-Td"])
...                 result = lib.virtualfile_to_dataset(
...                     vfname=vouttbl, output_type="file"
...                 )
...                 assert result is None
...                 assert Path(outtmp.name).stat().st_size > 0
...
...     # strings, numpy and pandas outputs
...     with Session() as lib:
...         with lib.virtualfile_out(kind="dataset") as vouttbl:
...             lib.call_module("read", [tmpfile.name, vouttbl, "-Td"])
...
...             # strings output
...             outstr = lib.virtualfile_to_dataset(
...                 vfname=vouttbl, output_type="strings"
...             )
...             assert isinstance(outstr, np.ndarray)
...             assert outstr.dtype.kind in ("S", "U")
...
...             # numpy output
...             outnp = lib.virtualfile_to_dataset(
...                 vfname=vouttbl, output_type="numpy"
...             )
...             assert isinstance(outnp, np.ndarray)
...
...             # pandas output
...             outpd = lib.virtualfile_to_dataset(
...                 vfname=vouttbl, output_type="pandas"
...             )
...             assert isinstance(outpd, pd.DataFrame)
...
...             # pandas output with specified column names
...             outpd2 = lib.virtualfile_to_dataset(
...                 vfname=vouttbl,
...                 output_type="pandas",
...                 column_names=["col1", "col2", "col3", "coltext"],
...             )
...             assert isinstance(outpd2, pd.DataFrame)
>>> outstr
array(['TEXT1 TEXT23', 'TEXT4 TEXT567', 'TEXT8 TEXT90',
       'TEXT123 TEXT456789'], dtype='<U18')
>>> outnp
array([[1.0, 2.0, 3.0, 'TEXT1 TEXT23'],
       [4.0, 5.0, 6.0, 'TEXT4 TEXT567'],
       [7.0, 8.0, 9.0, 'TEXT8 TEXT90'],
       [10.0, 11.0, 12.0, 'TEXT123 TEXT456789']], dtype=object)
>>> outpd
      0     1     2                   3
0   1.0   2.0   3.0        TEXT1 TEXT23
1   4.0   5.0   6.0       TEXT4 TEXT567
2   7.0   8.0   9.0        TEXT8 TEXT90
3  10.0  11.0  12.0  TEXT123 TEXT456789
>>> outpd2
   col1  col2  col3             coltext
0   1.0   2.0   3.0        TEXT1 TEXT23
1   4.0   5.0   6.0       TEXT4 TEXT567
2   7.0   8.0   9.0        TEXT8 TEXT90
3  10.0  11.0  12.0  TEXT123 TEXT456789
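
The examples above do not exercise dtype or index_col. As a rough illustration of what those parameters mean for the resulting DataFrame, the sketch below applies the analogous operations with plain pandas to a hand-built table. It does not call PyGMT; the name `table` is hypothetical, and both the order of the steps and the exact equivalence to PyGMT's behavior are assumptions.

```python
import pandas as pd

# Hand-built stand-in for the default "pandas" result (integer column labels).
table = pd.DataFrame(
    {
        0: [1.0, 4.0],
        1: [2.0, 5.0],
        2: [3.0, 6.0],
        3: ["TEXT1 TEXT23", "TEXT4 TEXT567"],
    }
)

# Analogue of column_names=["col1", "col2", "col3", "coltext"].
table.columns = ["col1", "col2", "col3", "coltext"]

# Analogue of dtype={"col1": int}: a dictionary converts only the named
# columns; a single type would be applied to every column instead.
table = table.astype({"col1": int})

# Analogue of index_col="col1": the chosen column becomes the index and
# no longer appears among the regular columns.
table = table.set_index("col1")
```

Note that a single dtype such as `float` cannot be applied to a table that still contains a trailing-text column, so a dictionary is the safer choice when text is present.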