Files
2026-02-08 11:20:43 -10:00

74 lines
3.3 KiB
Markdown
Vendored
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
[//]: # (title: std)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Computes the [standard deviation (std, σ)](https://en.wikipedia.org/wiki/Standard_deviation) of values.
`null` values are ignored.
All primitive numeric types are supported: `Byte`, `Short`, `Int`, `Long`, `Float`, and `Double`.
`std` also supports the "mixed" `Number` type, as long as the column consists only of the aforementioned
primitive numbers.
The numbers are automatically converted to a [common type](numberUnification.md) for the operation.
The return type is always `Double`; `Double.NaN` for empty columns.
All operations on `Double`/`Float`/`Number` have the `skipNaN` option, which is
set to `false` by default. This means that if a `NaN` is present in the input, it will be propagated to the result.
When it's set to `true`, `NaN` values are ignored.
### Delta Degrees of Freedom: DDoF
All `std` operations also have the `ddof`
([Delta Degrees of Freedom](https://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29)) argument.
The default is set to `1`, meaning DataFrame uses
[Bessels correction](https://en.wikipedia.org/wiki/Bessel%27s_correction)
to calculate the "unbiased sample standard deviation" by default.
This is also the standard in languages like [R](https://www.r-project.org/).
However, it is different from the "population standard deviation" (where `ddof = 0`),
which is used in libraries like [Numpy](https://numpy.org/doc/stable/reference/generated/numpy.std.html).
<!---FUN stdModes-->
```kotlin
df.std() // std of values per every numeric column
df.std { age and weight } // std of all values in `age` and `weight`
df.stdFor(skipNaN = true) { age and weight } // std of values per `age` and `weight` separately, skips NA
df.stdOf { (weight ?: 0) / age } // std of expression evaluated for every row
```
<!---END-->
<!---FUN stdAggregations-->
```kotlin
df.std()
df.age.std()
df.groupBy { city }.std()
df.pivot { city }.std()
df.pivot { city }.groupBy { name.lastName }.std()
```
<!---END-->
See [statistics](summaryStatistics.md#groupby-statistics) for details on complex data aggregations.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
### Type Conversion
The following automatic type conversions are performed for the `mean` operation:
| Conversion | Result for Empty Input |
|----------------------------------------------------------------------------|------------------------|
| Int -> Double | Double.NaN |
| Byte -> Double | Double.NaN |
| Short -> Double | Double.NaN |
| Long -> Double | Double.NaN |
| Double -> Double | Double.NaN |
| Float -> Double | Double.NaN |
| Number -> Conversion([Common number type](numberUnification.md)) -> Double | Double.NaN |
| Nothing -> Double | Double.NaN |