pyobsmod.Dataset

class pyobsmod.Dataset(obs: ndarray | None = None, mod: ndarray | None = None, df: DataFrame | None = None, time: ndarray | Index | DatetimeIndex | None = None)

Dataset object.

Dataset is a class built on pandas DataFrames for quickly plotting and analyzing observed versus modelled data.

It inherits its functionality from the BaseDataset class.

Parameters:
  • obs (numpy.ndarray | None) – Observation data. Preferably a 1-dimensional np.ndarray, but lists and tuples are also accepted.

  • mod (numpy.ndarray | None) – Modelled data. Preferably a 1-dimensional np.ndarray, but lists and tuples are also accepted.

  • df (pandas.DataFrame | None) – Alternatively, a DataFrame can be passed directly. It must contain at least the columns labelled with the strings "obs" and "mod". If df is None, both obs and mod must be provided.

  • time (numpy.ndarray | pandas.Index | pandas.DatetimeIndex | None) – A 1-dimensional array with the time steps of the observation and model data. It is used automatically as axis ticks in certain plots, such as Dataset.time_series_plot.

Examples

Create a dataset from some data.

import numpy as np
import pyobsmod as pom

obs = np.sin(np.arange(100))
mod = obs + np.random.normal(size=100)
ds = pom.Dataset(obs, mod)
print(ds)

pyobsmod.Dataset(
         obs       mod
0   0.000000 -1.068269
1   0.841471  1.518350
2   0.909297  0.386428
3   0.141120 -0.093398
4  -0.756802 -0.878878
..       ...       ...
95  0.683262 -0.840737
96  0.983588 -0.575326
97  0.379608 -0.525374
98 -0.573382 -0.351681
99 -0.999207 -1.411664

[100 rows x 2 columns]
)
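
The input rule stated for df (pass either a DataFrame containing obs and mod columns, or both obs and mod) can be sketched in plain Python. This is not pyobsmod's actual code: the helper name resolve_inputs is hypothetical, and both the precedence of df over obs/mod and the equal-length check are assumptions made for illustration. The sketch works with any table-like object that supports `in` and item access by column name, such as a dict or a pandas DataFrame.

```python
def resolve_inputs(obs=None, mod=None, df=None):
    """Return (obs, mod) as lists, following the documented input rule.

    Hypothetical helper for illustration; not part of pyobsmod.
    """
    if df is not None:
        # A passed table must contain at least the columns "obs" and "mod".
        missing = [name for name in ("obs", "mod") if name not in df]
        if missing:
            raise ValueError(f"df is missing required column(s): {missing}")
        # Assumption: when df is given, it takes precedence over obs/mod.
        obs, mod = df["obs"], df["mod"]
    elif obs is None or mod is None:
        # Without df, both series are required.
        raise ValueError("if df is None, both obs and mod must be provided")

    obs, mod = list(obs), list(mod)
    # Assumption: the two series must be aligned element by element.
    if len(obs) != len(mod):
        raise ValueError("obs and mod must have the same length")
    return obs, mod


# Lists and tuples are accepted, as the parameter description allows.
print(resolve_inputs(obs=[0.0, 1.0], mod=(0.1, 0.9)))
# A dict stands in for a DataFrame here.
print(resolve_inputs(df={"obs": [0.0, 1.0], "mod": [0.1, 0.9]}))
```

Calling resolve_inputs(obs=[1.0]) without mod or df raises ValueError, which mirrors the requirement that obs and mod be supplied together.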
__init__(obs: ndarray | None = None, mod: ndarray | None = None, df: DataFrame | None = None, time: ndarray | Index | DatetimeIndex | None = None) -> None

Initialize the Dataset object.

Parameters:
  • obs (numpy.ndarray | None) – Observation data. Preferably a 1-dimensional np.ndarray, but lists and tuples are also accepted.

  • mod (numpy.ndarray | None) – Modelled data. Preferably a 1-dimensional np.ndarray, but lists and tuples are also accepted.

  • df (pandas.DataFrame | None) – Alternatively, a DataFrame can be passed directly. It must contain at least the columns labelled with the strings "obs" and "mod". If df is None, both obs and mod must be provided.

  • time (numpy.ndarray | pandas.Index | pandas.DatetimeIndex | None) – A 1-dimensional array with the time steps of the observation and model data. It is used automatically as axis ticks in certain plots, such as Dataset.time_series_plot.

Methods

__init__([obs, mod, df, time])

Initialize the Dataset object.

bias()

Bias.

compute_stats(which_stats[, names])

Compute a list of statistical parameters.

describe_dataset()

Compute the bias, rmse, nrmse, and r2.

lr()

Perform linear regression (y=ax+b).

nrmse([norm])

Normalized root mean squared error.

r([method])

Correlation coefficient.

r2()

Coefficient of determination.

rmse()

Root mean squared error.

save(path)

Save this object as a pickle file.

scatter_plot([which_stats, names, fmt, ax, ...])

Scatter plot of observed data against modelled data.

scatter_plot_joint([which_stats, names, ...])

Joint scatter plot (seaborn) of observed data against modelled data.

scatter_plot_sns([which_stats, names, fmt, ...])

Scatter plot (seaborn) of observed data against modelled data.

stats_plot(which_stats[, names, decimals, ...])

Textbox plot summarizing specified statistics.

time_series_plot([which_stats, names, fmt, ...])

Time series plot of observed data against modelled data.
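
The statistics listed above (bias, rmse, nrmse, r, r2, lr) are only named on this page, not defined. As a rough guide, the textbook definitions can be sketched in dependency-free Python. Everything here is an assumption rather than pyobsmod's implementation: the function names are illustrative, the bias sign convention (mod minus obs), the range-based normalization for nrmse, the 1 - SS_res/SS_tot form of r2, and the choice of obs as x and mod as y in the linear fit may all differ from what the library does.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def bias(obs, mod):
    # Mean error. The sign convention (mod minus obs) is an assumption.
    return _mean([m - o for o, m in zip(obs, mod)])


def rmse(obs, mod):
    # Root mean squared error between the two series.
    return math.sqrt(_mean([(m - o) ** 2 for o, m in zip(obs, mod)]))


def nrmse(obs, mod):
    # Assumed normalization: the range of the observations.
    return rmse(obs, mod) / (max(obs) - min(obs))


def pearson_r(obs, mod):
    # Pearson correlation coefficient.
    mean_o, mean_m = _mean(obs), _mean(mod)
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_m = sum((m - mean_m) ** 2 for m in mod)
    return cov / math.sqrt(var_o * var_m)


def r2(obs, mod):
    # Coefficient of determination, assumed as 1 - SS_res / SS_tot.
    mean_o = _mean(obs)
    ss_res = sum((o - m) ** 2 for o, m in zip(obs, mod))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def linear_fit(obs, mod):
    # Ordinary least squares for mod = a * obs + b (x = obs is an assumption).
    mean_o, mean_m = _mean(obs), _mean(mod)
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    a = cov / var_o
    return a, mean_m - a * mean_o


# A model that overestimates every observation by a constant 0.5.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.5, 3.5, 4.5]

print("bias ", bias(obs, mod))
print("rmse ", rmse(obs, mod))
print("nrmse", nrmse(obs, mod))
print("r    ", pearson_r(obs, mod))
print("r2   ", r2(obs, mod))
print("lr   ", linear_fit(obs, mod))
```

With a constant offset the correlation stays perfect while the bias, the RMSE and this form of r2 all register the error, which is why these statistics are usually reported together.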

Attributes

data

Alias for self.values.

df

Retrieves the DataFrame stored in the object.

values

Get the dataset as a numpy array.
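
The save(path) method listed under Methods is described as writing the object to a pickle file, and no matching loader appears on this page. Assuming save relies on Python's standard pickle module, a round trip would look roughly like the sketch below. The helper names save_object and load_object are hypothetical, and a plain dict stands in for a Dataset so that the example runs without pyobsmod installed.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    # Serialize any picklable object to a binary file.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_object(path):
    # Only unpickle files from trusted sources: loading can execute code.
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a Dataset: two aligned series.
payload = {"obs": [0.0, 1.0, 2.0], "mod": [0.1, 0.9, 2.2]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.pkl"
    save_object(payload, path)
    restored = load_object(path)

print(restored == payload)
```

If a file was written by Dataset.save, unpickling it would also require pyobsmod to be importable in the loading environment, since pickle stores a reference to the class rather than its code.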