using Pipe
missing, typeof(missing)
Arrays automatically create an appropriate union type
x = [1,2,missing,3]
1 |> ismissing, missing |> ismissing, x |> ismissing, x .|> ismissing
x |> eltype, x |> eltype |> nonmissingtype
Comparisons involving missing with ==, != or < produce missing, while === returns a Bool.
missing === missing, missing == missing, missing != missing, missing < missing
@pipe missing |> isequal(_,missing)
isequal(missing,missing)
1 == missing, 1 != missing, 1 < missing
Under isless, missing is considered greater than any numeric value.
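The difference between the propagating comparison operators and the Bool-returning predicates can be sketched with Base functions only. This is a minimal illustration; the variable names are chosen here for the example and do not come from the notebook.

```julia
# `==` propagates missing, so the result is not a Bool.
eq = (1 == missing)            # missing

# `isequal` and `isless` always return a Bool.
same = isequal(1, missing)     # false
lt = isless(1, missing)        # true: missing sorts after every number
lt_inf = isless(Inf, missing)  # true, even for Inf

# Because `sort` relies on `isless`, missing ends up last.
sorted = sort([3, missing, 1, 2])
```

Since sorting is defined through isless, this ordering is what places missing at the end of a sorted vector.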
missing
map(x->x(missing),[sin,cos,zero,sqrt]) # part 1
map(x->x(missing,1),[+,-,*,/,div]) # part 2
using Statistics # needed for mean
map(x->x([1,2,missing]),[minimum,maximum,extrema,mean,float]) # part 3
[1,missing,2,missing] |> skipmissing |> collect
@time @pipe [1.0,missing,2.0,missing] |> replace(_,missing=>NaN)
Another way to do this is with coalesce:
# coalesce returns its first argument if that argument is not missing;
# otherwise it returns the second argument.
@time @pipe [1.0, missing, 2.0, missing] .|> coalesce(_,NaN)
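As a small side-by-side sketch of the two approaches above, using only Base functions: replace swaps one value for another across the whole collection, while coalesce is applied element by element through broadcasting. The variable names are illustrative.

```julia
v = [1.0, missing, 2.0, missing]

# Replace every missing with NaN in a copy of the vector.
via_replace = replace(v, missing => NaN)

# Broadcast coalesce: each element is kept unless it is missing.
via_coalesce = coalesce.(v, NaN)

# coalesce accepts any number of arguments and returns the first non-missing one.
first_valid = coalesce(missing, missing, 3)
```

Both vectors hold the same values, and the broadcast result has element type Float64 because no missing remains.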
You can also use recode from CategoricalArrays.jl if you have a default output value.
using CategoricalArrays
@pipe [1.0,missing,2.0,missing] |> recode(_,0,missing=>1)
using DataFrames
df = DataFrame(a=[1,2,missing],b=["a","b",missing])
replace!(df.a,missing=>100)
df.b = @pipe df.b .|> coalesce(_,100)
df
You can use unique or levels to get unique values with or without missings, respectively.
[1,missing,2,missing] |> unique
[1,missing,2,missing] |> levels
x = [1,2,3]
y = allowmissing(x)
push!(y,missing)
x = [1,2,3]
y = allowmissing(x)
z = disallowmissing(y)
push!(z,missing) # throws a MethodError: z no longer accepts missing
disallowmissing has an error keyword argument that decides how it behaves when it encounters a column that actually contains a missing value.
@time df = allowmissing(DataFrame(ones(2,3),:auto))
@time df = @pipe ones(2,3) |> DataFrame(_,:auto) |> allowmissing
@time df = (allowmissing ∘ DataFrame)(ones(2,3),:auto)
df[1,1] = missing
disallowmissing(df) # an error is thrown
# column :x1 is left untouched as it contains missing
disallowmissing(df, error=false)
In this next example, we show that the type of each column in x is initially Int64. After using allowmissing! to accept missing values in columns 1 and 3, the types of those columns become Union{Int64, Missing}.
x = DataFrame(rand(Int,2,3),:auto)
@pipe x |> eachcol .|> eltype |> println("Before : ",_)
@pipe x |> allowmissing!(_,1) # make first column accept missings
@pipe x |> allowmissing!(_,:x3) # make :x3 column accept missings
@pipe x |> eachcol .|> eltype |> println("After : ",_)
In this next example, we'll use completecases to find all the rows of a DataFrame that have complete data.
x = DataFrame(A=[1,missing,3,4], B=["A","B",missing,"C"])
@pipe x |> completecases |> println("Complete cases:\n",_)
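The Bool vector returned by completecases can also be used directly as a row selector. The following is a minimal sketch assuming DataFrames.jl is installed; the names keep, complete and keep_A are illustrative.

```julia
using DataFrames

x = DataFrame(A=[1, missing, 3, 4], B=["A", "B", missing, "C"])

# One Bool per row: true when no column in that row is missing.
keep = completecases(x)

# Index with the mask to keep only the complete rows.
complete = x[keep, :]

# Restrict the check to a single column.
keep_A = completecases(x, :A)
```

Indexing with the mask keeps the original column element types, so complete.A still accepts missing.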
We can use dropmissing or dropmissing! to remove the rows with incomplete data from a DataFrame and either create a new DataFrame or mutate the original in place.
y = x |> dropmissing
x |> dropmissing!;
x
y
x |> describe
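By default, dropmissing and dropmissing! also convert the affected columns to element types that no longer accept missing. Alternatively, you can pass the disallowmissing=false keyword argument to keep the columns able to hold missing values.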
x = DataFrame(A=[1,missing,3,4],B=["A","B",missing,"C"])
@pipe x |> dropmissing!(_,disallowmissing=false)
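The effect of the disallowmissing keyword is easiest to see by comparing column element types. This is a short sketch assuming DataFrames.jl is installed; strict and loose are names chosen for this example.

```julia
using DataFrames

x = DataFrame(A=[1, missing, 3, 4], B=["A", "B", missing, "C"])

# Default behaviour: rows with missing are dropped and columns are narrowed.
strict = dropmissing(x)

# Same rows are dropped, but columns still accept missing afterwards.
loose = dropmissing(x, disallowmissing=false)

strict_type = eltype(strict.A)  # Int64
loose_type = eltype(loose.A)    # Union{Missing, Int64}
```

Keeping the Union type can be convenient when missing values will be written back into the column later.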
Making functions missing-aware
If we have a function that does not handle missing values, we can wrap it using the passmissing function so that if any of its positional arguments is missing we get a missing value in return. For example, string does not treat missing specially, as shown below, so we wrap it with passmissing to change how it behaves:
string(missing)
@time string(missing," ", missing)
@time @pipe (missing, " ", missing)... |> string
@time @pipe (1,2,3)...|>string
@time string(1,2,3)
lift_string = passmissing(string)
missing |> lift_string
@pipe (missing," ",missing)... |> lift_string
lift_string(1,2,3)
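The same wrapping can be tried outside a DataFrames session with the Missings.jl package, which provides passmissing. This is a minimal sketch; safe_string is a hypothetical name used only for this illustration.

```julia
using Missings

# Without wrapping, string turns missing into text.
plain = string(missing)               # "missing"

# Wrapped version: any missing positional argument short-circuits to missing.
safe_string = passmissing(string)

r1 = safe_string(missing)             # missing
r2 = safe_string("a", missing)        # missing
r3 = safe_string(1, 2, 3)             # "123"
```

When no argument is missing, the wrapped function simply forwards to the original one.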
df = DataFrame(a=[1,missing,missing], b=[1,2,missing])
If we just sum on the rows we get two missing entries:
@pipe df |> eachrow .|> sum
One can apply skipmissing on the rows to avoid this problem:
@pipe df |> eachrow .|> skipmissing .|> sum
However, we get an error. The last row of df contains only missing values, and because eachrow is type unstable, the eltype of the result of skipmissing is unknown (it is reported as Any), so sum cannot determine a zero value for the empty collection.
@pipe df |> eachrow(_)[end] |> skipmissing
@pipe df |> eachrow(_)[end] |> skipmissing |> collect
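The role of the element type can be reproduced with plain vectors, without any DataFrame. In this sketch (variable names are illustrative), the two vectors hold the same values and differ only in their declared eltype.

```julia
typed = Union{Missing, Int}[missing, missing]
untyped = Any[missing, missing]

# skipmissing strips Missing from the element type when it can.
typed_eltype = eltype(skipmissing(typed))      # Int64
untyped_eltype = eltype(skipmissing(untyped))  # Any

# With a known eltype, the empty sum falls back to zero(Int).
typed_sum = sum(skipmissing(typed))

# With eltype Any there is no zero to fall back to, so sum throws.
untyped_threw = try
    sum(skipmissing(untyped))
    false
catch
    true
end
```

This is why a row-wise sum succeeds as soon as the row iterator carries concrete column types.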
If we exclude the last row, we can confirm that the sum works as expected.
@pipe df[1:2,:] |> eachrow .|> skipmissing .|> sum
In such a case it is useful to switch to Tables.namedtupleiterator, which is type stable, as discussed in the 01_constructors.ipynb notebook.
@pipe df |> Tables.namedtupleiterator |> collect
@pipe df |> Tables.namedtupleiterator .|> skipmissing .|> sum