using DataFrames
x = DataFrame(A=[1,2],B=[1.0,missing], C= ["a","b"])
size(x,1) is the number of rows and size(x,2) is the number of columns
size(x), size(x,1), size(x,2)
nrow(x), ncol(x)
describe(x)
describe(x,cols=1:2)
names(x)
names(x,String)
names(x,Number)
propertynames(x)
eltype.(eachcol(x))
y = DataFrame(rand(1:10,1000,10),:auto)
first(y,2)
df = DataFrame(rand(100,100),:auto)
We can see that 92 of its columns were not printed. Also, we get only its first 30 rows. You can easily change this behavior by changing the values of ENV["LINES"] and ENV["COLUMNS"].
ENV["LINES"] = 10
ENV["COLUMNS"] = 4000
df
x
# all get the vector stored in our DataFrame without copying it
x.A, x[!,1],x[!,:A]
# the same using string indexing
x."A",x[!,"A"]
# note that this creates a copy
x[:,1]
x[!,2] === x[!,2]
x[!,1] === x[:,1],x[!,1] == x[:,1]
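The difference between `!` and `:` matters as soon as the returned vector is mutated. The following is a minimal sketch on a small, hypothetical data frame named `demo` (not part of the session above) so that `x` is left untouched:

```julia
using DataFrames

# hypothetical data frame used only for this illustration
demo = DataFrame(A = [1, 2], C = ["a", "b"])

nocopy = demo[!, :A]   # the vector stored in the data frame itself
copied = demo[:, :A]   # a freshly allocated copy of that vector

nocopy[1] = 100        # writes through to demo
copied[2] = -1         # changes only the copy

demo.A                 # the first element is now 100, the second is unchanged
```

Use `:` when the result will be modified independently of the data frame, and `!` when avoiding the allocation is more important.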
x[1:1,:]
# this produces a DataFrameRow, which is treated as a 1-dimensional object similar to a NamedTuple
x[1,:]
x[1,1]
x[1:2,2]
# You can also use a Regex to select columns, and
# Not from InvertedIndices.jl to select both rows and columns
# select every row except the first, and the columns whose names start with "A"
# the r prefix marks a regular expression
x[Not(1),r"^A"]
# ! indicates that the underlying columns are not copied
# show every column except the first
x[!,Not(1)]
x[:,Not(1)] # means that the columns will get copied
Assignment of a scalar to a data frame can be done in ranges using broadcasting:
x[1:2,1:2]
x[1:2,1:2] .= 2
x
Assignment of a vector of length equal to the number of assigned rows using broadcasting:
x[1:2,1:2] .= [1,2]
x
Assignment of another data frame of matching size and column names, again using broadcasting:
x[1:2,1:2] .= DataFrame([5 6; 7 8], [:A, :B])
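The three broadcasting assignments above can be checked on a fresh data frame. This is a small sketch using a hypothetical data frame `t`; it also shows that the assignment happens in place, so the element types of the columns are preserved:

```julia
using DataFrames

# hypothetical data frame used only for this illustration
t = DataFrame(A = [1, 2], B = [1.0, 2.0])

t[1:2, 1:2] .= 0                 # scalar: every selected cell becomes 0
after_scalar = copy(t)

t[1:2, 1:2] .= [1, 2]            # vector: each selected column receives [1, 2]
after_vector = copy(t)

t[1:2, 1:2] .= DataFrame([5 6; 7 8], [:A, :B])   # data frame of matching shape and names
after_df = copy(t)

eltype.(eachcol(t))              # still Int and Float64, the columns were not replaced
```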
x = DataFrame(rand(4,5),:auto)
select(x,Between(:x2,:x4);copycols=false)
x[!,Between(:x2,:x4)]
x[:,Cols("x1",Between("x2","x4"))]
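A convenient way to see what a column selector matches, without printing any data, is to pass it to `names`. The sketch below uses a hypothetical data frame `t` with the automatically generated names `x1` to `x5`:

```julia
using DataFrames

# hypothetical data frame used only for this illustration
t = DataFrame(rand(2, 5), :auto)          # columns x1, x2, x3, x4, x5

between_cols = names(t, Between(:x2, :x4))   # a contiguous range of columns
not_first    = names(t, Not(1))              # everything except the first column
regex_cols   = names(t, r"^x[12]$")          # names matched by a regular expression
cols_union   = names(t, Cols("x5", Between("x2", "x3")))  # union, in the order given
```

The same selectors can be used in the column position of `t[!, ...]`, `t[:, ...]` and `select`.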
You can simply create a view of a DataFrame (it is more efficient than creating a materialized selection). Here are the possible return value options.
vdf01 = @view x[1:2,1]
vdf01 === view(x,1:2,1)
vdf02 = @view x[1,1:2]
vdf02 === x[1,1:2]
vdf03 = @view x[1:2,1:2]
vdf03 === view(x,1:2,1:2)
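Because a view does not copy data, writing through it changes the parent data frame. The following sketch, on a hypothetical data frame `t`, shows the type returned by each kind of view and one write through a row view and a sub data frame:

```julia
using DataFrames

# hypothetical data frame used only for this illustration
t = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

col_view = @view t[1:2, 1]     # rows of a single column: a SubArray
row_view = @view t[1, 1:2]     # a single row: a DataFrameRow
sub_view = @view t[1:2, 1:2]   # rows and columns: a SubDataFrame

sub_view[1, :a] = 100          # modifies t.a[1]
row_view.b = 40                # modifies t.b[1]

t
```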
df = DataFrame()
Adding a column using setproperty!:
x = [1,2,3]
df.a = x
df
df.a === x # no copy is performed
x[2] = 10
df
Adding columns using setindex!:
df[!,:b] = x
df[:,:c] = x
df
df.b === x # no copy
df.c === x # copy
df[!,:d] .= x
df[:,:e] .= x
df
Both copy, so in this case ! and : have the same effect:
df.d === x, df.e === x
Note that in our data frame the columns :a and :b store the vector x (not a copy):
df.a === df.b === x
This can lead to silent errors. For example, this code leads to a bug (note that calling pairs on eachcol(df) creates an iterator of (column name, column) pairs):
for (n,c) in pairs(eachcol(df))
    @show n,c
end
for (n,c) in pairs(eachcol(df))
    println("$n:", pop!(c))
end
Note that for column :b we printed 10, because 3 had already been removed from it when pop! was called on column :a.
Such mistakes sometimes happen. Because of this, DataFrames.jl performs consistency checks before doing an expensive operation (most notably before showing a data frame).
df
We can investigate the columns to find out what happened:
collect(pairs(eachcol(df)))
The output confirms that the data frame df got corrupted.
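One way to avoid this kind of corruption is to make sure that no two columns share the same vector. The sketch below uses hypothetical names `src` and `safe`; it stores an explicit copy with `copy`, and relies on the copying behavior of `df[:, col] = v` for a new column noted earlier:

```julia
using DataFrames

src = [1, 2, 3]            # hypothetical source vector
safe = DataFrame()

safe.a = copy(src)         # explicit copy, so :a does not alias src
safe[:, :b] = src          # : copies when creating a new column

# the same loop as before now shortens each column exactly once
for (n, c) in pairs(eachcol(safe))
    pop!(c)
end

safe                       # still a consistent data frame, now with 2 rows
```

Here `src` itself keeps all three elements, because neither column refers to it.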
DataFrames.jl supports a complete set of getindex, getproperty, setindex!, setproperty!, view, broadcasting, and broadcasting assignment operations. The details are explained here: http://juliadata.github.io/DataFrames.jl/latest/lib/indexing/
df = DataFrame(rand(2,3),:auto)
df2 = copy(df)
df === df2, df==df2
Create a minimally different data frame and use isapprox for comparison:
df3 = df .+ eps()
df == df3
isapprox(df,df3)
df ≈ df3
isapprox(df,df3,atol = eps()/2)
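The effect of the tolerance is easier to follow with fixed numbers than with random ones. This is a sketch on two hypothetical data frames `a` and `b` that differ by about 1e-10 in one cell. It assumes that, as in Base `isapprox`, the default relative tolerance is only used when no positive `atol` is given:

```julia
using DataFrames

# hypothetical data frames used only for this illustration
a = DataFrame(x = [1.0, 2.0])
b = DataFrame(x = [1.0, 2.0 + 1e-10])

exact       = a == b                        # exact comparison
approx      = a ≈ b                         # default relative tolerance
tight_atol  = isapprox(a, b, atol = 1e-12)  # absolute tolerance smaller than the difference
loose_atol  = isapprox(a, b, atol = 1e-8)   # absolute tolerance larger than the difference
```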
Missing values are handled as in Julia Base:
df = DataFrame(a=missing)
df == df
df === df
isequal(df,df)
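The three comparisons behave differently once a missing value is involved. The sketch below uses hypothetical data frames `m` and `other` to show three-valued logic in `==` next to `isequal`:

```julia
using DataFrames

# hypothetical data frames used only for this illustration
m = DataFrame(a = [1, missing])
other = DataFrame(a = [2, missing])

eq_self   = m == m              # missing: the comparison of the missing cells is unknown
eq_other  = m == other          # false: 1 == 2 already decides the result
same_data = isequal(m, copy(m)) # true: isequal treats missing as equal to missing
same_obj  = m === copy(m)       # false: a copy is a different object
```

When a plain `Bool` is needed, for example in an `if` condition, `isequal` is the safer choice.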