Introduction to DataFrames

DataFrames.jl v1.2, Julia 1.6.1

Reshaping DataFrames

Wide to long

using DataFrames,Pipe

x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1 = [11,12,13,14], M2 = [111,112,113,114])

Stack a data frame `x`, i.e. convert it from wide to long format: `stack(df, [columns to stack into rows], [id columns to keep])`.

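In the example below, the `M1` and `M2` columns are stacked into a `variable` column (holding the original column names) and a `value` column (holding their values), and `id` is additionally specified as the id column to keep.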
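A minimal sketch, assuming DataFrames.jl 1.x, that checks the shape of the long result for the `x` defined above: 4 rows times 2 measure columns give 8 rows, the id column comes first, and `id2` is dropped because it is not listed. The name `long` is only for illustration.

```julia
using DataFrames

x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1=[11,12,13,14], M2=[111,112,113,114])

# stack M1 and M2 under each other, keeping only id as the identifier column
long = stack(x, [:M1, :M2], :id)

println(names(long))   # ["id", "variable", "value"]
println(long.variable) # all "M1" entries first, then all "M2" entries
println(long.value)    # [11, 12, 13, 14, 111, 112, 113, 114]
```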

stack(x,[:M1,:M2],:id) # first pass measure variables and then id-variable

Add the `view=true` keyword argument to get a view; in that case the columns of the resulting data frame share memory with the columns of the source data frame, so the operation is potentially unsafe.
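To make the memory sharing concrete, here is a small sketch (assuming DataFrames.jl 1.x; the names `copied` and `viewed` are illustrative) that mutates the source in place after stacking. Only the result built with `view=true` sees the change.

```julia
using DataFrames

x = DataFrame(id=[1,2,3,4], M1=[11,12,13,14], M2=[111,112,113,114])

copied = stack(x, [:M1, :M2], :id)             # freshly allocated vectors
viewed = stack(x, [:M1, :M2], :id, view=true)  # wraps the columns of x lazily

# x.M1 returns the stored column itself (no copy), so this mutates x in place
x.M1[1] = 999

println(copied.value[1])  # 11: the copy is unaffected
println(viewed.value[1])  # 999: the view reflects the change in x
```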

# optionally you can rename columns
stack(x,["M1","M2"], "id", variable_name="key", value_name="observed", 
  view=true)

If the id-variable argument is omitted in `stack`, all columns not selected as measure variables are treated as id variables. The following three calls are equivalent:

stack(x,[:M1,:M2])

stack(x,Not([:id,:id2]))

stack(x,Not([1,2]))
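A quick check, assuming DataFrames.jl 1.x, that the three selections above produce identical results: naming the measures, taking the complement of the id columns with `Not`, and selecting by position all keep `id` and `id2` as id variables.

```julia
using DataFrames

x = DataFrame(id=[1,2,3,4], id2=[1,1,2,2], M1=[11,12,13,14], M2=[111,112,113,114])

a = stack(x, [:M1, :M2])       # measures named explicitly
b = stack(x, Not([:id, :id2])) # measures as the complement of the id columns
c = stack(x, Not([1, 2]))      # same selection by column position

println(a == b == c)  # true
println(names(a))     # ["id", "id2", "variable", "value"]
```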

x = DataFrame(id=[1,1,1],id2=['a','b','c'],a1=rand(3),a2=rand()) # rand() is a scalar, so it is repeated in every row of a2

x = DataFrame(id=[1,1,1],id2=['a','b','c'],a1=rand(3),a2=rand(3))

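If `stack` is not passed any measure variables, columns whose element type is floating point (a subtype of `Union{AbstractFloat, Missing}`) are selected as measures by default. Here `a1` and `a2` are stacked, while the `Int64` column `id` and the `Char` column `id2` become id variables: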

stack(x)
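A reproducible sketch of the default selection, assuming DataFrames.jl 1.x and replacing `rand(3)` with fixed values so the result can be checked: only the `Float64` columns are stacked, even though `id` is numeric too.

```julia
using DataFrames

x = DataFrame(id=[1,1,1], id2=['a','b','c'], a1=[0.1,0.2,0.3], a2=[1.5,2.5,3.5])

long = stack(x)  # no measure variables passed

# a1 and a2 (Float64) are stacked; id (Int64) and id2 (Char) stay as id columns
println(names(long))           # ["id", "id2", "variable", "value"]
println(unique(long.variable)) # ["a1", "a2"]
```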

Here both columns are floating point, so all columns are treated as measures:

stack(DataFrame(rand(3,2),:auto))
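The consequence, sketched with a fixed matrix instead of `rand(3,2)` and assuming DataFrames.jl 1.x: when every column is a measure, the long result has no id column at all, only `variable` and `value`.

```julia
using DataFrames

wide = DataFrame([1.0 4.0; 2.0 5.0; 3.0 6.0], :auto)  # columns x1 and x2
long = stack(wide)

# both columns are Float64, so both are measures and no id column remains
println(names(long))  # ["variable", "value"]
println(long.value)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```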

df = DataFrame(rand(3,2),:auto)

df.key = [1,1,1]

3-element Vector{Int64}:
 1
 1
 1

df

mdf = stack(df) # duplicate values in key are silently accepted
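Duplicates are harmless for `stack` but matter when going back to wide format. A sketch assuming the DataFrames.jl 1.2 `unstack` keyword `allowduplicates` (fixed values replace `rand` so the result is checkable): the default call throws because `(key, variable)` pairs repeat, and `allowduplicates=true` keeps the last value seen for each pair.

```julia
using DataFrames

df = DataFrame(x1=[1.0, 2.0, 3.0], x2=[4.0, 5.0, 6.0], key=[1, 1, 1])
mdf = stack(df)  # succeeds although key does not identify rows uniquely

# unstacking needs unique (key, variable) pairs, so the default call throws
err = try
    unstack(mdf)
    nothing
catch e
    e
end
println(err isa ArgumentError)  # true

# keep the last value encountered for each (key, variable) pair
u = unstack(mdf, allowduplicates=true)
println(u)  # one row: key = 1, x1 = 3.0, x2 = 6.0
```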

Long to wide

x = DataFrame(id=[1,1,1],id2='a':'c',a1=rand(3),a2 = rand(3))

y = stack(x)

unstack(y,:id2,:variable,:value)
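A checkable version of the call above, assuming DataFrames.jl 1.x and fixed values in place of `rand(3)`: with `:id2` as the only row key, the values of `:variable` become the new column names and `id` is dropped because it is not listed.

```julia
using DataFrames

x = DataFrame(id=[1,1,1], id2='a':'c', a1=[0.1,0.2,0.3], a2=[1.5,2.5,3.5])
y = stack(x)

# id2 is the row key; values of :variable become column names, :value fills them
u = unstack(y, :id2, :variable, :value)
println(names(u))  # ["id2", "a1", "a2"]
```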

unstack(y,:variable,:value) # all other columns are treated as keys

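When `unstack` is called without column arguments, the columns named `:variable` and `:value` are used as the column key and the values, and all other columns are treated as row keys: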

unstack(y)
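Because the defaults of `stack` and `unstack` match, a round trip recovers the original frame. A sketch assuming DataFrames.jl 1.x, with fixed values instead of `rand(3)`; the unstacked columns may allow `missing`, but `==` compares names and values only.

```julia
using DataFrames

x = DataFrame(id=[1,1,1], id2='a':'c', a1=[0.1,0.2,0.3], a2=[1.5,2.5,3.5])

# defaults: column key :variable, values :value, all other columns are row keys
back = unstack(stack(x))

println(names(back))  # ["id", "id2", "a1", "a2"]
println(back == x)    # true: same names and values
```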

# you can rename the unstacked columns
unstack(y,renamecols=n->string("unstacked_",n))
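A sketch of what `renamecols` receives, assuming DataFrames.jl 1.x: the function is applied to each value of the column key (`"a1"`, `"a2"`), and its return value becomes the new column name.

```julia
using DataFrames

x = DataFrame(id=[1,1,1], id2='a':'c', a1=[0.1,0.2,0.3], a2=[1.5,2.5,3.5])
y = stack(x)

# the function maps each value of :variable to a new column name
w = unstack(y, renamecols=n -> string("unstacked_", n))
println(names(w))  # ["id", "id2", "unstacked_a1", "unstacked_a2"]
```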

df = stack(DataFrame(rand(3,2),:auto))

`unstack` fails when there is no row key column, i.e. when the data frame contains only the `variable` and `value` columns:

unstack(df)

ArgumentError: No key column found

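One possible workaround, sketched under assumptions (DataFrames.jl 1.x; the `row` column is a hypothetical name introduced here): add an explicit row key before unstacking. Since `stack` places all `x1` values first and then all `x2` values, the original row numbers are `1:3` repeated twice.

```julia
using DataFrames

long = stack(DataFrame([1.0 4.0; 2.0 5.0; 3.0 6.0], :auto))  # only variable and value

# hypothetical key column: recover each value's original row number
long.row = repeat(1:3, outer=2)

wide = unstack(long, :row, :variable, :value)
println(names(wide))  # ["row", "x1", "x2"]
```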

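Output of `x = DataFrame(id=[1,1,1],id2=['a','b','c'],a1=rand(3),a2=rand())` (the scalar returned by `rand()` is repeated in every row of `a2`):

	id	id2	a1	a2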
	Int64	Char	Float64	Float64
1	1	a	0.561598	0.736461
2	1	b	0.133463	0.736461
3	1	c	0.441498	0.736461

Output of `x = DataFrame(id=[1,1,1],id2=['a','b','c'],a1=rand(3),a2=rand(3))`:

	id	id2	a1	a2
	Int64	Char	Float64	Float64
1	1	a	0.489998	0.824482
2	1	b	0.706125	0.221454
3	1	c	0.285326	0.564364

Output of `stack(DataFrame(rand(3,2),:auto))`:

	variable	value
	String	Float64
1	x1	0.738212
2	x1	0.936584
3	x1	0.374822
4	x2	0.178436
5	x2	0.13522
6	x2	0.93944

Output of `df = DataFrame(rand(3,2),:auto)`:

	x1	x2
	Float64	Float64
1	0.896134	0.946699
2	0.920383	0.776507
3	0.731829	0.0190809

Output of `df` after adding the `key` column:

	x1	x2	key
	Float64	Float64	Int64
1	0.896134	0.946699	1
2	0.920383	0.776507	1
3	0.731829	0.0190809	1

Output of `unstack(y,:id2,:variable,:value)` (`Float64?` means the column also allows `missing`):

	id2	a1	a2
	Char	Float64?	Float64?
1	a	0.373428	0.566582
2	b	0.255027	0.706737
3	c	0.432603	0.0557234

Output of `df = stack(DataFrame(rand(3,2),:auto))`:

	variable	value
	String	Float64
1	x1	0.257075
2	x1	0.489404
3	x1	0.957308
4	x2	0.496959
5	x2	0.939764
6	x2	0.483036

Output of `stack(x,["M1","M2"], "id", variable_name="key", value_name="observed", view=true)`:

	id	key	observed
	Int64	String	Int64
1	1	M1	11
2	2	M1	12
3	3	M1	13
4	4	M1	14
5	1	M2	111
6	2	M2	112
7	3	M2	113
8	4	M2	114