Column API
A column in tablecloth is a named sequence of typed data. This special type is defined in tech.ml.dataset. It is roughly comparable to an R vector.
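The examples in this document use the aliases tcc and tc without showing where they come from. A minimal setup sketch, assuming tcc refers to the tablecloth.column.api namespace and tc to tablecloth.api:

```clojure
;; Aliases assumed by the examples in this document.
(require '[tablecloth.api :as tc]
         '[tablecloth.column.api :as tcc])

;; Quick check that the aliases resolve: build a column and test it.
(tcc/column? (tcc/column [1 2 3]))
```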
Column Creation
Empty column
(tcc/column)
#tech.v3.dataset.column<boolean>[0]
null
[]
Column from a vector or a sequence
(tcc/column [1 2 3 4 5])
#tech.v3.dataset.column<int64>[5]
null
[1, 2, 3, 4, 5]
(tcc/column '(1 2 3 4 5))
#tech.v3.dataset.column<int64>[5]
null
[1, 2, 3, 4, 5]
Ones & Zeros
You can also quickly create columns of ones or zeros:
(tcc/ones 10)
#tech.v3.dataset.column<int64>[10]
null
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
(tcc/zeros 10)
#tech.v3.dataset.column<int64>[10]
null
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Column?
Finally, you can use the column? function to check if an item is a column:
(tcc/column? [1 2 3 4 5])
false
(tcc/column? (tcc/column))
true
Tablecloth’s datasets are, of course, made up of columns:
(tcc/column? (-> (tc/dataset {:a [1 2 3 4 5]})
                 :a))
true
Types and Type detection
The default set of column types is defined in the underlying “tech ml” system. We can see the set here:
(tech.v3.datatype.casting/all-datatypes)
(:int32
 :int16
 :float32
 :float64
 :int64
 :uint64
 :string
 :uint16
 :int8
 :uint32
 :keyword
 :decimal
 :uuid
 :boolean
 :object
 :char
 :uint8)
Typeof & Typeof?
When you create a column, the underlying system will try to autodetect its type. We can see that here using the tcc/typeof function to check the type of a column:
(-> (tcc/column [1 2 3 4 5])
    (tcc/typeof))
:int64
(-> (tcc/column [:a :b :c :d :e])
    (tcc/typeof))
:keyword
Columns containing heterogeneous data will receive type :object:
(-> (tcc/column [1 :b 3 :c 5])
    (tcc/typeof))
:object
You can also use the tcc/typeof? function to test whether a column has a given type:
(-> (tcc/column [1 2 3 4 6])
    (tcc/typeof? :boolean))
false
(-> (tcc/column [1 2 3 4 6])
    (tcc/typeof? :int64))
true
Tablecloth has a concept of “concrete” and “general” types. A general type is a broad category of types, and a concrete type is the actual type in memory. For example, the 64-bit integer type :int64 is a concrete type that belongs to the general type :integer. The typeof? function supports checking both:
(-> (tcc/column [1 2 3 4 6])
    (tcc/typeof? :int64))
true
(-> (tcc/column [1 2 3 4 6])
    (tcc/typeof? :integer))
true
Column Access & Manipulation
Column Access
The method for accessing a particular index position in a column is the same as for Clojure vectors:
(-> (tcc/column [1 2 3 4 5])
    (get 3))
4
(-> (tcc/column [1 2 3 4 5])
    (nth 3))
4
Slice
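You can also slice a column. With a single argument, the slice runs from that position to the end of the column. With two arguments, both the start and end positions are inclusive, and an optional third argument sets the step: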
(-> (tcc/column (range 10))
    (tcc/slice 5))
#tech.v3.dataset.column<int64>[5]
null
[5, 6, 7, 8, 9]
(-> (tcc/column (range 10))
    (tcc/slice 1 4))
#tech.v3.dataset.column<int64>[4]
null
[1, 2, 3, 4]
(-> (tcc/column (range 10))
    (tcc/slice 0 9 2))
#tech.v3.dataset.column<int64>[5]
null
[0, 2, 4, 6, 8]
For clarity, the slice function supports the :start and :end keywords:
(-> (tcc/column (range 10))
    (tcc/slice :start :end 2))
#tech.v3.dataset.column<int64>[5]
null
[0, 2, 4, 6, 8]
Select
If you need to create a discontinuous subset of the column, you can use the select function. It accepts either a sequence of index positions or a sequence of booleans. With a boolean selection, the values at the positions marked true are kept.
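The boolean form of select pairs naturally with the comparison operations described under Column Operations. The sketch below assumes that tcc is an alias for tablecloth.column.api and that the boolean column returned by tcc/< can be passed straight to tcc/select; the names scores and below-35 are illustrative only.

```clojure
(require '[tablecloth.column.api :as tcc])

;; Illustrative data; `scores` is a hypothetical name.
(def scores (tcc/column [20 30 40 50]))

;; tcc/< compares every element with 35 and yields a boolean column.
(def below-35 (tcc/< scores 35))

;; Using that boolean column as the selection should keep only the
;; elements whose mask entry is true.
(tcc/select scores below-35)
```

Building the mask from a comparison avoids writing index positions by hand when the subset is defined by a condition on the values.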
Select the values at index positions 1 and 9:
(-> (tcc/column (range 10))
    (tcc/select [1 9]))
#tech.v3.dataset.column<int64>[2]
null
[1, 9]
Select the values at index positions 0 and 2 using boolean select:
(-> (tcc/column (range 10))
    (tcc/select (tcc/column [true false true])))
#tech.v3.dataset.column<int64>[2]
null
[0, 2]
Sort
Use sort-column to sort a column. The default sort order is ascending:
(-> (tcc/column [:c :z :a :f])
    (tcc/sort-column))
#tech.v3.dataset.column<keyword>[4]
null
[:a, :c, :f, :z]
You can provide the :desc and :asc keywords to change the default behavior:
(-> (tcc/column [:c :z :a :f])
    (tcc/sort-column :desc))
#tech.v3.dataset.column<keyword>[4]
null
[:z, :f, :c, :a]
You can also provide a comparator fn:
(-> (tcc/column [{:position 2
                  :text "and then stopped"}
                 {:position 1
                  :text "I ran fast"}])
    (tcc/sort-column (fn [a b] (< (:position a) (:position b)))))
#tech.v3.dataset.column<persistent-map>[2]
null
[{:position 1, :text "I ran fast"}, {:position 2, :text "and then stopped"}]
Column Operations
The Column API contains a large number of operations. Each takes one or more columns as arguments and returns either a scalar value or a new column, depending on the operation. As with all functions in Tablecloth, the column is the first argument, so these operations are easy to use with the thread-first macro ->.
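To illustrate the difference between scalar and column results, here is a small sketch. It assumes that tcc is an alias for tablecloth.column.api and that the reductions tcc/sum and tcc/mean are exposed by that namespace; the name xs is illustrative only.

```clojure
(require '[tablecloth.column.api :as tcc])

;; Illustrative data; `xs` is a hypothetical name.
(def xs (tcc/column [20 30 40 50]))

;; Reductions collapse the whole column into a single number.
(tcc/sum xs)
(tcc/mean xs)

;; Element-wise operations return a new column of the same length.
(tcc/+ xs 1)
```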
(def a (tcc/column [20 30 40 50]))
(def b (tcc/column (range 4)))
(tcc/- a b)
#tech.v3.dataset.column<int64>[4]
null
[20, 29, 38, 47]
(tcc/pow a 2)
#tech.v3.dataset.column<float64>[4]
null
[400.0, 900.0, 1600, 2500]
(tcc/* 10 (tcc/sin a))
#tech.v3.dataset.column<float64>[4]
null
[9.129, -9.880, 7.451, -2.624]
(tcc/< a 35)
#tech.v3.dataset.column<boolean>[4]
null
[true, true, false, false]
Because the operations shown here take a column as their first argument and return a column, they can be chained easily:
(-> a
    (tcc/* b)
    (tcc/< 70))
#tech.v3.dataset.column<boolean>[4]
null
[true, true, false, false]
source: notebooks/column_api.clj