using Pipe
missing, typeof(missing)
Arrays automatically create an appropriate union type
x = [1,2,missing,3]
1 |> ismissing, missing |> ismissing, x |> ismissing, x .|> ismissing
x |> eltype, x |> eltype |> nonmissingtype
Comparisons with ==, != and < propagate missing, while === and isequal always return a Bool
missing === missing, missing == missing, missing != missing, missing < missing
@pipe missing |> isequal(_,missing)
isequal(missing,missing)
1 == missing, 1 != missing, 1 < missing
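Because == can return missing, using it directly as an `if` condition throws a TypeError when a value is missing. A minimal sketch of three ways to get a plain Bool instead, using only Base functions (the variable names are illustrative):

```julia
x = missing

# x == 1 evaluates to missing, so `if x == 1` would throw a TypeError.
r1 = isequal(x, 1)             # isequal never returns missing
r2 = coalesce(x == 1, false)   # treat an unknown comparison as false
r3 = ismissing(x) ? "no value" : "value $x"
```

Which one fits depends on whether a missing value should count as "not equal" or needs its own branch.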
Under isless (and therefore when sorting), missing is considered greater than any numeric value
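This ordering applies to isless, not to <, which still returns missing. A short sketch using only Base functions:

```julia
# isless defines a total order in which missing comes after every number
a = isless(1, missing)       # true
b = isless(missing, Inf)     # false
c = sort([3, missing, 1])    # [1, 3, missing]
```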
map(x->x(missing),[sin,cos,zero,sqrt]) # part 1
map(x->x(missing,1),[+,-,*,/,div]) # part 2
using Statistics # needed for mean
map(x->x([1,2,missing]),[minimum,maximum,extrema,mean,float]) # part 3
[1,missing,2,missing] |> skipmissing |> collect
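skipmissing returns a lazy iterator, so it can be passed straight to a reduction without calling collect first. A small sketch contrasting it with the propagating behaviour of the same reductions:

```julia
using Statistics  # for mean

v = [1, 2, missing]

m1 = mean(v)                # missing: the reduction propagates missing
m2 = mean(skipmissing(v))   # 1.5
s  = sum(skipmissing(v))    # 3
```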
@time @pipe [1.0,missing,2.0,missing] |> replace(_,missing=>NaN)
Another way to do this is to broadcast coalesce over the vector:
# coalesce returns its first argument if it is not missing;
# otherwise it returns the second argument.
@time @pipe [1.0, missing, 2.0, missing] .|> coalesce(_,NaN)
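coalesce also accepts more than two arguments and returns the first one that is not missing. A brief sketch of the scalar and broadcast forms, without the Pipe macro:

```julia
c1 = coalesce(missing, 1)            # 1
c2 = coalesce(2, 1)                  # 2
c3 = coalesce(missing, missing, 3)   # 3: the first non-missing argument wins

# Broadcasting applies the same rule element by element
c4 = coalesce.([1.0, missing, 2.0], 0.0)   # [1.0, 0.0, 2.0]
```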
You can also use recode from CategoricalArrays.jl, which additionally lets you give a default output value for every element not matched by one of the pairs.
using CategoricalArrays
@pipe [1.0,missing,2.0,missing] |> recode(_,0,missing=>1)
using DataFrames
df = DataFrame(a=[1,2,missing],b=["a","b",missing])
replace!(df.a,missing=>100)
df.b = @pipe df.b .|> coalesce(_,100)
df
You can use unique or levels to get the unique values with or without missing, respectively.
[1,missing,2,missing] |> unique
[1,missing,2,missing] |> levels
x = [1,2,3]
y = allowmissing(x)
push!(y,missing)
x = [1,2,3]
y = allowmissing(x)
z = disallowmissing(y)
push!(z,missing) # an error is thrown: z no longer accepts missing
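The effect of allowmissing and disallowmissing is easiest to see in the element types. The sketch below assumes Missings.jl is loaded directly (DataFrames re-exports the same functions) and catches the failure instead of letting it stop the session:

```julia
using Missings  # allowmissing / disallowmissing; also available via DataFrames

x = [1, 2, 3]
y = allowmissing(x)
types = (eltype(x), eltype(y))   # (Int, Union{Missing, Int})

# disallowmissing succeeds only while no missing is actually stored
z = disallowmissing(y)
push!(y, missing)
failed = try
    disallowmissing(y)
    false
catch
    true    # the conversion fails because y now holds a missing
end
```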
disallowmissing has an error keyword argument that decides
how it behaves when it encounters a column that actually contains a missing value.
@time df = allowmissing(DataFrame(ones(2,3),:auto))
@time df = @pipe ones(2,3) |> DataFrame(_,:auto) |> allowmissing
@time df = (allowmissing ∘ DataFrame)(ones(2,3),:auto)
df[1,1] = missing
disallowmissing(df) # an error is thrown
# column :x1 is left untouched as it contains missing
disallowmissing(df, error=false)
In this next example, we show that the type of each column in x is initially Int64. After using allowmissing! to accept missing values in columns 1 and 3, the types of those columns become Union{Int64, Missing}.
x = DataFrame(rand(Int,2,3),:auto)
@pipe x |> eachcol .|> eltype |> println("Before : ",_)
@pipe x |> allowmissing!(_,1) # make first column accept missings
@pipe x |> allowmissing!(_,:x3) # make :x3 column accept missings
@pipe x |> eachcol .|> eltype |> println("After : ",_)
In this next example, we'll use completecases to find all the rows of a DataFrame that have complete data.
x = DataFrame(A=[1,missing,3,4], B=["A","B",missing,"C"])
@pipe x |> completecases |> println("Complete cases:\n",_)
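completecases returns a Bool mask rather than a filtered table, so it can be used for row indexing or restricted to selected columns. A minimal sketch on the same data (the names mask and mask_A are illustrative):

```julia
using DataFrames

x = DataFrame(A=[1, missing, 3, 4], B=["A", "B", missing, "C"])

mask = completecases(x)        # [true, false, false, true]
complete = x[mask, :]          # keeps the complete rows; columns still allow missing
mask_A = completecases(x, :A)  # check column A only: [true, false, true, true]
```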
We can use dropmissing or dropmissing! to remove the rows with incomplete data from a DataFrame, either creating a new DataFrame or mutating the original in place.
y = x |> dropmissing
x |> dropmissing!;
x
y
x |> describe
By default, dropmissing and dropmissing! also convert the remaining columns so that they no longer accept missing. Pass disallowmissing=false to keep the Union{T, Missing} column types:
x = DataFrame(A=[1,missing,3,4],B=["A","B",missing,"C"])
@pipe x |> dropmissing!(_,disallowmissing=false)
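The difference made by the disallowmissing keyword shows up in the column element types, not in the rows that are kept. A short sketch using the non-mutating dropmissing (the names strict and loose are illustrative):

```julia
using DataFrames

x = DataFrame(A=[1, missing, 3, 4], B=["A", "B", missing, "C"])

strict = dropmissing(x)                         # default: disallowmissing=true
loose  = dropmissing(x, disallowmissing=false)  # same rows, column types unchanged

eltype(strict.A)   # Int
eltype(loose.A)    # Union{Missing, Int}
```

Keeping the Union type is convenient when missing values will be written back into the column later.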
If a function does not handle missing values, we can wrap it with passmissing so that it returns missing whenever any of its positional arguments is missing. For example, string does not treat missing specially, as shown below (it simply prints it as the text "missing"), so we wrap string with passmissing to make it propagate missing instead:
string(missing)
@time string(missing," ", missing)
@time @pipe (missing, " ", missing)... |> string
@time @pipe (1,2,3)...|>string
@time string(1,2,3)
lift_string = passmissing(string)
missing |> lift_string
@pipe (missing," ",missing)... |> lift_string
lift_string(1,2,3)
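A wrapped function can also be broadcast over a vector that contains missing values. The sketch below applies the same idea to uppercase, which has no method for missing; safe_upper is an illustrative name, and Missings.jl is loaded directly here (DataFrames re-exports passmissing):

```julia
using Missings

safe_upper = passmissing(uppercase)

# uppercase(missing) would throw a MethodError; the wrapper returns missing instead
result = safe_upper.(["a", missing, "c"])   # ["A", missing, "C"]
```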
df = DataFrame(a=[1,missing,missing], b=[1,2,missing])
If we just sum on the rows we get two missing entries:
@pipe df |> eachrow .|> sum
One can apply skipmissing on the rows to avoid this problem:
@pipe df |> eachrow .|> skipmissing .|> sum
However, we get an error. The problem is that the last row of df contains only missing values, and since eachrow is type unstable, the eltype of the result of skipmissing is unknown (it is reported as Any), so sum cannot work out what the sum of an empty collection should be.
@pipe df |> eachrow(_)[end] |> skipmissing
@pipe df |> eachrow(_)[end] |> skipmissing |> collect
If we exclude the last row, the sum works as expected:
@pipe df[1:2,:] |> eachrow .|> skipmissing .|> sum
In such a case it is useful to switch to Tables.namedtupleiterator, which is type stable, as discussed in the 01_constructors.ipynb notebook.
@pipe df |> Tables.namedtupleiterator |> collect
@pipe df |> Tables.namedtupleiterator .|> skipmissing .|> sum
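A different way around the empty-row problem, not the one used above, is to tell sum what an empty sum should be through its init keyword. This sketch assumes Julia 1.6 or later, where sum accepts init, and that 0 is an acceptable result for a row with no values:

```julia
using DataFrames

df = DataFrame(a=[1, missing, missing], b=[1, 2, missing])

# init=0 supplies the value for rows where everything was skipped
row_sums = [sum(skipmissing(row); init=0) for row in eachrow(df)]   # [2, 2, 0]
```

Unlike the namedtupleiterator approach, this still iterates type-unstable rows, so it is better suited to small tables.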