df-research/dataframe/docs/StardustDocs/topics/parse.md

[//]: # (title: parse)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

Returns a [`DataFrame`](DataFrame.md) in which the given `String` and `Char` columns are parsed into other types.

This is a special case of the [](convert.md) operation.

This parsing operation is sometimes executed implicitly, for example, when [reading from CSV](read.md) or
[type converting from `String`/`Char` columns](convert.md).
You can recognize this by the `locale` or `parserOptions` arguments in these functions.

Related operations: [](updateConvert.md)

<!---FUN parseAll-->

```kotlin
df.parse()
```

<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseAll.html" width="100%"/>
<!---END-->

When no columns are specified, all `String` and `Char` columns are parsed,
even those inside [column groups](DataColumn.md#columngroup) and inside [frame columns](DataColumn.md#framecolumn).

To parse only particular columns, use a [column selector](ColumnSelectors.md):

<!---FUN parseSome-->

```kotlin
df.parse { age and weight }
```

<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseSome.html" width="100%"/>
<!---END-->

### Parsing Order

`parse` tries to parse every `String`/`Char` column into one of the supported types in the following order:
* `Int`
* `Long`
* `Instant` (`kotlin.time`) (requires `parseExperimentalInstant = true`, enabled by default in DataFrame 1.0.0-Beta5)
* `Instant` (`kotlinx.datetime` and `java.time`) (requires `parseExperimentalInstant = false`)
* `LocalDateTime` (`kotlinx.datetime` and `java.time`)
* `LocalDate` (`kotlinx.datetime` and `java.time`)
* `Duration` (`kotlin.time` and `java.time`)
* `LocalTime` (`java.time`)
* `URL` (`java.net`)
* [`Double` (with optional locale settings)](#parsing-doubles)
* `Boolean`
* `Uuid` ([`kotlin.uuid.Uuid`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.uuid/-uuid/)) (requires `parseExperimentalUuid = true`)
* `BigDecimal`
* `JSON` (arrays and objects) (requires the `org.jetbrains.kotlinx:dataframe-json` dependency)
* `Char`
* `String`

When `.parse()` is called on a single column and the input (`String`/`Char`) type is the same as the output type,
(a.k.a., it cannot be parsed further) an `IllegalStateException` is thrown.
To avoid this, use `col.tryParse()` instead.

### Parser Options

DataFrame supports multiple parser options that can be used to customize the parsing behavior.
These can be supplied to the `parse` function (or any other function that can implicitly parse `Strings`)
as an argument.

For each option you don't supply (or supply `null`) DataFrame will take the value from the
[Global Parser Options](#global-parser-options).

Available parser options:
* `locale: Locale` is used to [parse doubles](#parsing-doubles)
  * Global default locale is `Locale.getDefault()`
* `dateTimePattern: String` is used to parse date and time
  * Global default supports ISO (local) date-time
* `dateTimeFormatter: DateTimeFormatter` is used to parse date and time
  * Is derived from `dateTimePattern` and/or `locale` if `null`
* `nullStrings: List<String>` is used to treat particular strings as `null` value
  * Global default null strings are **"null"** and **"NULL"**
  * When [reading from CSV](read.md), we include even more defaults, like **""**, and **"NA"**.
  See the KDocs there for the exact details
* `skipTypes: Set<KType>` types that should be skipped during parsing
  * Empty set by global default; parsing can result in any supported type
* `useFastDoubleParser: Boolean` is used to enable or disable the [new fast double parser](#parsing-doubles)
  * Enabled by global default
* `parseExperimentalUuid: Boolean` is used to enable or disable parsing to the experimental [`kotlin.uuid.Uuid` class](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.uuid/-uuid/).
  * Disabled by global default
* `parseExperimentalInstant: Boolean` is used to enable or disable parsing to the
  [`kotlin.time.Instant` class](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.time/-instant/), available from Kotlin 2.1+. Will parse to `kotlinx.datetime.Instant` if `false`.
  * Disabled by global default, enabled in DataFrame 1.0.0-Beta5.

<!---FUN parseWithOptions-->

```kotlin
df.parse(options = ParserOptions(locale = Locale.CHINA, dateTimeFormatter = DateTimeFormatter.ISO_WEEK_DATE))
```

<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseWithOptions.html" width="100%"/>
<!---END-->

### Global Parser Options

As mentioned before, you can change the default global parser options that will be used by [`read`](read.md),
[`convert`](convert.md), and other `parse` operations.
Whenever you don't explicitly provide [parser options](#parser-options) to a function call,
DataFrame will use these global options instead.

For example, to change the locale to French and add a custom date-time pattern for all following DataFrame calls, do:

<!---FUN globalParserOptions-->

```kotlin
DataFrame.parser.locale = Locale.FRANCE
DataFrame.parser.addDateTimePattern("dd.MM.uuuu HH:mm:ss")
```

<!---END-->

For `locale`, this means that the one being used by the parser is defined as:

↪ The locale given as function argument directly, or in `parserOptions`, if it is not `null`, else

&nbsp;&nbsp;&nbsp;&nbsp;↪ The locale set by `DataFrame.parser.locale = ...`, if it is not `null`, else

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;↪ `Locale.getDefault()`, which is the system's default locale that can be changed with `Locale.setDefault()`.

### Parsing Doubles

DataFrame has a new fast and powerful double parser enabled by default.
It is based on [the FastDoubleParser library](https://github.com/wrandelshofer/FastDoubleParser) for its
high performance and configurability
(in the future, we might expand this support to `Float`, `BigDecimal`, and `BigInteger` as well).

The parser is locale-aware; it will use the locale set by the
[(global)](#global-parser-options) [parser options](#parser-options) to parse the doubles.
It also has a fallback mechanism built in, meaning it can recognize characters from
all other locales (and some from [Wikipedia](https://en.wikipedia.org/wiki/Decimal_separator))
and parse them correctly as long as they don't conflict with the current locale.

For example, if your locale uses ',' as decimal separator, it will not recognize ',' as thousands separator, but it will
recognize ''', ' ', '٬', '_', ' ', etc. as such.
The same holds for characters like "e", "inf", "×10^", "NaN", etc. (ignoring case).

This means you can safely parse `"123'456 789,012.345×10^6"` with a US locale but not `"1.234,5"`.

Aside from this, DataFrame also explicitly recognizes "∞", "inf", "infinity", and "infty" as `Double.POSITIVE_INFINITY`
(as well as their negative counterparts), "nan", "na", and "n/a" as `Double.NaN`,
and all forms of whitespace are treated equally.

If `FastDoubleParser` fails to parse a `String` as `Double`, DataFrame will try
to parse it using the standard `NumberFormat.parse()` function as a last resort.

If you experience any issues with the new parser, you can turn it off by setting
`useFastDoubleParser = false`, which will use the old `NumberFormat.parse()` function instead.

Please [report](https://github.com/Kotlin/dataframe/issues) any issues you encounter.