Introduction to DataFrames

02_basicinfo

In [2]:
using DataFrames

Getting basic information about a data frame

In [3]:
x = DataFrame(A=[1,2],B=[1.0,missing], C= ["a","b"])
Out[3]:

2 rows × 3 columns

   A      B         C
   Int64  Float64?  String
1  1      1.0       a
2  2      missing   b
  • size(x, 1): the number of rows
  • size(x, 2): the number of columns
In [11]:
size(x), size(x,1),size(x,2)
Out[11]:
((2, 3), 2, 3)
In [12]:
nrow(x), ncol(x)
Out[12]:
(2, 3)
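Because size returns a (rows, columns) tuple, it can be unpacked in one assignment, and nrow/ncol are shorthands for size(df, 1) and size(df, 2). A small sketch that rebuilds a data frame like x:

```julia
using DataFrames

df = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

# size returns (number of rows, number of columns)
nrows, ncols = size(df)

# nrow and ncol agree with size along each dimension
@assert nrows == nrow(df) == size(df, 1)
@assert ncols == ncol(df) == size(df, 2)

println("df has $nrows rows and $ncols columns")
```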
In [13]:
describe(x)
Out[13]:

3 rows × 7 columns

   variable  mean    min  median  max  nmissing  eltype
   Symbol    Union…  Any  Union…  Any  Int64     Type
1  A         1.5     1    1.5     2    0         Int64
2  B         1.0     1.0  1.0     1.0  1         Union{Missing, Float64}
3  C                 a            b    0         String
In [14]:
describe(x,cols=1:2)
Out[14]:

2 rows × 7 columns

   variable  mean     min   median   max   nmissing  eltype
   Symbol    Float64  Real  Float64  Real  Int64     Type
1  A         1.5      1     1.5      2     0         Int64
2  B         1.0      1.0   1.0      1.0   1         Union{Missing, Float64}
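describe can also compute only selected statistics, passed as symbols after the data frame. A sketch assuming DataFrames.jl 1.x, where :mean, :min, :max and :nmissing are among the accepted statistic names (see ?describe for the full list):

```julia
using DataFrames

df = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

# Only the requested statistics are computed; :variable always comes first
d = describe(df, :mean, :nmissing)
println(d)

# Selected statistics combine with the cols keyword used above
d_num = describe(df, :min, :max; cols=1:2)
println(d_num)
```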
In [16]:
names(x)
Out[16]:
3-element Vector{String}:
 "A"
 "B"
 "C"
In [25]:
names(x,String)
Out[25]:
1-element Vector{String}:
 "C"
In [36]:
names(x,Number)
Out[36]:
1-element Vector{String}:
 "A"
In [26]:
propertynames(x)
Out[26]:
3-element Vector{Symbol}:
 :A
 :B
 :C
In [37]:
eltype.(eachcol(x))
Out[37]:
3-element Vector{Type}:
 Int64
 Union{Missing, Float64}
 String
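Note that names(x, Number) above returned only "A": the type filter keeps columns whose element type is a subtype of the given type, and Union{Missing, Float64} is not a subtype of Number. A sketch of two ways to also pick up numeric columns that allow missing values:

```julia
using DataFrames

df = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

names(df, Number)    # only "A": the eltype of B is Union{Missing, Float64}

# Option 1: widen the type filter so that Missing is allowed as well
numeric_or_missing = names(df, Union{Number, Missing})

# Option 2: strip Missing from each column's eltype before testing it
numeric = [n for n in names(df) if nonmissingtype(eltype(df[!, n])) <: Number]
```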
In [38]:
y = DataFrame(rand(1:10,1000,10),:auto)
Out[38]:

1,000 rows × 10 columns

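    x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
    Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
 1  8      8      7      6      3      5      4      4      5      6
 2  3      2      9      5      10     5      3      7      9      10
 3  7      5      3      4      5      1      6      1      2      7
 4  6      3      5      8      10     10     10     6      4      10
 5  6      7      5      6      8      8      9      7      1      2
 6  10     1      3      5      9      6      9      7      4      2
 7  6      10     4      1      2      8      10     8      6      10
 8  6      5      5      9      6      2      8      10     7      1
 9  1      5      3      4      7      6      4      5      10     1
10  4      8      4      1      2      7      8      7      2      6
11  6      3      1      6      9      3      3      10     7      1
12  3      6      3      6      4      8      8      2      3      10
13  8      4      9      10     1      4      1      8      4      6
14  9      10     10     6      5      10     8      9      9      7
15  4      9      8      7      1      9      4      5      4      10
16  8      3      5      8      4      10     2      8      4      4
17  10     2      9      3      8      2      4      10     8      3
18  3      7      2      2      9      2      5      7      9      10
19  2      9      1      8      3      8      9      9      3      3
20  6      7      6      8      8      3      6      9      8      5
21  10     9      1      9      5      1      5      3      6      2
22  10     1      5      1      9      10     4      5      3      2
23  6      4      4      7      6      3      5      2      5      6
24  9      6      9      1      3      4      1      9      4      6
25  6      4      6      5      9      8      1      1      4      6
26  8      6      10     1      1      5      5      10     1      7
27  10     1      6      9      1      9      6      4      8      4
28  7      6      7      4      3      2      4      7      5      2
29  8      3      6      6      5      2      2      1      10     5
30  1      6      2      1      2      6      1      1      2      7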
In [39]:
first(y,2)
Out[39]:

2 rows × 10 columns

    x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
    Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64  Int64
 1  8      8      7      6      3      5      4      4      5      6
 2  3      2      9      5      10     5      3      7      9      10
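first has a counterpart, last, for the bottom rows; both return a new DataFrame. A sketch on a small deterministic data frame (the reshape input is only illustrative):

```julia
using DataFrames

# 10×3 data frame: x1 holds 1:10, x2 holds 11:20, x3 holds 21:30
y = DataFrame(collect(reshape(1:30, 10, 3)), :auto)

top    = first(y, 2)   # first two rows, as a new DataFrame
bottom = last(y, 3)    # last three rows

# first(y, n) matches indexing the first n rows with :
@assert top == y[1:2, :]
```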

Displaying large data frames

In [40]:
df = DataFrame(rand(100,100),:auto)
Out[40]:

100 rows × 100 columns (omitted printing of 92 columns)

    x1          x2          x3          x4          x5          x6          x7          x8
    Float64     Float64     Float64     Float64     Float64     Float64     Float64     Float64
 1  0.957263    0.455224    0.945115    0.705433    0.445905    0.456118    0.462058    0.148022
 2  0.944934    0.484389    0.123886    0.627174    0.59616     0.555442    0.476439    0.592309
 3  0.0458919   0.724265    0.31605     0.892137    0.328344    0.404316    0.597467    0.417723
 4  0.52304     0.298606    0.636131    0.722754    0.271011    0.0963864   0.787156    0.297574
 5  0.980064    0.972211    0.0428785   0.945472    0.615399    0.424744    0.525845    0.41345
 6  0.298137    0.292772    0.141131    0.495431    0.25879     0.776557    0.516255    0.524106
 7  0.242664    0.784154    0.892305    0.69172     0.329477    0.160272    0.844056    0.230043
 8  0.185934    0.992842    0.524244    0.643048    0.0942926   0.145844    0.353055    0.257157
 9  0.0415858   0.151559    0.349359    0.844866    0.00728319  0.217536    0.793505    0.0120244
10  0.253271    0.567079    0.190226    0.0552381   0.860586    0.903541    0.201893    0.326234
11  0.253605    0.230154    0.256434    0.211113    0.323495    0.167041    0.128525    0.786461
12  0.764035    0.987242    0.966622    0.457437    0.860413    0.640488    0.505692    0.436351
13  0.767915    0.147405    0.125966    0.871892    0.481322    0.840501    0.939539    0.93401
14  0.0586435   0.925044    0.81044     0.566968    0.0531859   0.321582    0.182718    0.097922
15  0.543839    0.808472    0.475041    0.85789     0.720157    0.0391443   0.937956    0.327706
16  0.911789    0.714404    0.681615    0.609858    0.635594    0.890826    0.414408    0.475618
17  0.901766    0.592233    0.152274    0.303717    0.267828    0.188824    0.555483    0.71353
18  0.990639    0.534113    0.200518    0.586544    0.638337    0.812431    0.780058    0.374515
19  0.0639983   0.859026    0.422       0.617354    0.519488    0.697055    0.209879    0.00593683
20  0.749532    0.444841    0.888396    0.747334    0.208264    0.589441    0.375946    0.0368572
21  0.588227    0.810014    0.186475    0.67816     0.563631    0.31384     0.964004    0.332566
22  0.102543    0.948078    0.383698    0.518322    0.306641    0.74019     0.303641    0.285749
23  0.27635     0.104122    0.506155    0.753103    0.134352    0.539788    0.292153    0.673093
24  0.895374    0.623001    0.306989    0.0279047   0.378922    0.188322    0.0929644   0.354991
25  0.650187    0.599534    0.646581    0.298568    0.676695    0.822798    0.925678    0.408842
26  0.26348     0.623811    0.651066    0.102144    0.103628    0.657803    0.733433    0.447902
27  0.523092    0.0878209   0.670993    0.406441    0.822214    0.688295    0.964438    0.30442
28  0.0693489   0.49671     0.339273    0.0827452   0.661421    0.822503    0.337165    0.668738
29  0.187084    0.726002    0.997198    0.306942    0.0482072   0.458063    0.578182    0.959805
30  0.299591    0.551633    0.874324    0.796736    0.130852    0.971225    0.188007    0.835433

We can see that 92 of its columns were not printed and that only its first 30 rows are shown. You can change this behavior by setting ENV["LINES"] and ENV["COLUMNS"], which control the display height (in lines) and width (in characters) that Julia assumes when printing.
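Setting ENV changes the display size for every later output in the session. To control a single call instead, show accepts allrows and allcols keywords, and an IOContext can carry its own :displaysize. A minimal sketch, assuming DataFrames.jl 1.x plain-text output; the 10×80 size is arbitrary:

```julia
using DataFrames

df = DataFrame(rand(100, 100), :auto)

# Render into a buffer that pretends to be a 10-line, 80-character terminal
buf = IOBuffer()
show(IOContext(buf, :limit => true, :displaysize => (10, 80)), df)
small = String(take!(buf))

# Same limited context, but explicitly ask for every row and column
show(IOContext(buf, :limit => true, :displaysize => (10, 80)), df;
     allrows=true, allcols=true)
full = String(take!(buf))

println(small)
```

Because the context is passed per call, the global ENV settings stay untouched.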

In [41]:
ENV["LINES"] = 10
Out[41]:
10
In [46]:
ENV["COLUMNS"] = 4000
Out[46]:
4000
In [47]:
df
Out[47]:

100 rows × 100 columns

(All 100 columns, x1 through x100, each of type Float64, are now printed, and only the first 10 rows are displayed. The values are the same as in the earlier output of df; for example, row 1 begins 0.957263, 0.455224, 0.945115.)

Most elementary get and set operations

In [48]:
x
Out[48]:

2 rows × 3 columns

   A      B         C
   Int64  Float64?  String
1  1      1.0       a
2  2      missing   b
In [50]:
# all of these return the vector stored in the data frame, without copying it
x.A, x[!,1],x[!,:A]
Out[50]:
([1, 2], [1, 2], [1, 2])
In [52]:
# the same using string indexing
x."A",x[!,"A"]
Out[52]:
([1, 2], [1, 2])
In [53]:
# note that this creates a copy
x[:,1]
Out[53]:
2-element Vector{Int64}:
 1
 2
In [56]:
x[!,2] === x[!,2]
Out[56]:
true
In [59]:
x[!,1] === x[:,1],x[!,1] == x[:,1]
Out[59]:
(false, true)
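The difference matters as soon as you mutate the result: a vector obtained with ! is the column itself, so writes go through to the data frame, while a vector obtained with : is independent. A minimal sketch:

```julia
using DataFrames

df = DataFrame(A=[1, 2], B=[3, 4])

alias  = df[!, :A]   # the vector stored inside df (no copy)
copied = df[:, :A]   # a fresh copy of that vector

alias[1]  = 100      # writes through to df
copied[2] = -1       # leaves df untouched
```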
In [64]:
x[1:1,:]
Out[64]:

1 rows × 3 columns

   A      B         C
   Int64  Float64?  String
1  1      1.0       a
In [65]:
# this produces a DataFrameRow, which is treated as a 1-dimensional object similar to a NamedTuple
x[1,:]
Out[65]:

DataFrameRow (3 columns)

   A      B         C
   Int64  Float64?  String
1  1      1.0       a
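Because a DataFrameRow is a view into its parent data frame, assigning through it modifies the data frame, while copy turns it into an independent NamedTuple. A short sketch:

```julia
using DataFrames

df = DataFrame(A=[1, 2], B=[1.0, missing], C=["a", "b"])

row = df[1, :]       # DataFrameRow viewing row 1 of df
nt  = copy(row)      # NamedTuple snapshot: (A = 1, B = 1.0, C = "a")

row.A = 10           # writes into df itself; nt keeps the old value
```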
In [66]:
x[1,1]
Out[66]:
1
In [67]:
x[1:2,2]
Out[67]:
2-element Vector{Union{Missing, Float64}}:
 1.0
  missing
In [85]:
# You can also use a Regex to select columns, and
# Not from InvertedIndices.jl to select both rows and columns.
# Here: every row except row 1, and the columns whose names start with "A"
# (the r prefix turns the string into a regular expression)
x[Not(1),r"^A"]
Out[85]:

1 rows × 1 columns

   A
   Int64
1  2
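These selectors combine freely. A short sketch on a toy data frame (the column names x1, x2 and y are illustrative), assuming DataFrames.jl 1.x, where Not can also wrap ranges and regular expressions:

```julia
using DataFrames

df = DataFrame(x1=[1, 2, 3], x2=[4, 5, 6], y=[7, 8, 9])

# A Regex selects every column whose name matches the pattern
xs = df[:, r"^x"]

# Not can wrap a range (here for rows) or a Regex (here for columns)
rest = df[Not(1:2), Not(r"^x")]   # row 3 only, column y only

# names accepts the same selectors and returns the matching names
xnames = names(df, r"^x")
```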
In [87]:
# ! indicates that the underlying columns are not copied
# select every column except column 1
x[!,Not(1)]
Out[87]:

2 rows × 2 columns

   B         C
   Float64?  String
1  1.0       a
2  missing   b
In [88]:
x[:,Not(1)] # : means that the columns get copied
Out[88]:

2 rows × 2 columns

   B         C
   Float64?  String
1  1.0       a
2  missing   b

Assignment of a scalar to a data frame can be done in ranges using broadcasting:

In [89]:
x[1:2,1:2]
Out[89]:

2 rows × 2 columns

   A      B
   Int64  Float64?
1  1      1.0
2  2      missing
In [91]:
x[1:2,1:2] .= 2
Out[91]:

2 rows × 2 columns

   A      B
   Int64  Float64?
1  2      2.0
2  2      2.0
In [92]:
x
Out[92]:

2 rows × 3 columns

   A      B         C
   Int64  Float64?  String
1  2      2.0       a
2  2      2.0       b

Assignment of a vector whose length equals the number of assigned rows, using broadcasting:

In [94]:
x[1:2,1:2] .= [1,2]
x
Out[94]:

2 rows × 3 columns

   A      B         C
   Int64  Float64?  String
1  1      1.0       a
2  2      2.0       b
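In other words, the vector is broadcast down each selected column, so the assignment is equivalent to filling the columns one at a time. A sketch comparing the two forms (the column names are illustrative):

```julia
using DataFrames

v = [10, 20, 30]

df1 = DataFrame(a=zeros(Int, 4), b=zeros(Int, 4), c=zeros(Int, 4))
df1[1:3, [:a, :b]] .= v        # v goes into rows 1:3 of both :a and :b

df2 = DataFrame(a=zeros(Int, 4), b=zeros(Int, 4), c=zeros(Int, 4))
for col in [:a, :b]
    df2[1:3, col] .= v         # the same assignment, one column at a time
end
```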

Assignment of another data frame of matching size and column names, again using broadcasting:

In [98]:
x[1:2,1:2] .= DataFrame([5 6; 7 8], [:A, :B])
Out[98]:

2 rows × 2 columns

   A      B
   Int64  Float64?
1  5      6.0
2  7      8.0
In [99]:
x = DataFrame(rand(4,5),:auto)
Out[99]:

4 rows × 5 columns

   x1          x2          x3          x4          x5
   Float64     Float64     Float64     Float64     Float64
1  0.666697    0.970575    0.444333    0.476191    0.951002
2  0.648025    0.596267    0.409228    0.931182    0.0815248
3  0.0585679   0.073578    0.17117     0.356649    0.724832
4  0.905592    0.532494    0.695047    0.304077    0.610859
In [102]:
select(df,Between(:x2,:x4);copycols=false)
Out[102]:

100 rows × 3 columns

    x2          x3          x4
    Float64     Float64     Float64
 1  0.455224    0.945115    0.705433
 2  0.484389    0.123886    0.627174
 3  0.724265    0.31605     0.892137
 4  0.298606    0.636131    0.722754
 5  0.972211    0.0428785   0.945472
 6  0.292772    0.141131    0.495431
 7  0.784154    0.892305    0.69172
 8  0.992842    0.524244    0.643048
 9  0.151559    0.349359    0.844866
10  0.567079    0.190226    0.0552381
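With copycols=false, select returns a new data frame that reuses the selected column vectors of the source, much like the ! indexing above; the default copycols=true copies them. A quick check on a toy data frame:

```julia
using DataFrames

df = DataFrame(a=[1, 2], b=[3, 4], c=[5, 6])

shared = select(df, Between(:a, :b); copycols=false)  # reuses df's column vectors
copied = select(df, Between(:a, :b))                  # default copycols=true: fresh copies
```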
In [112]:
x[!,Between(:x2,:x4)]
Out[112]:

4 rows × 3 columns

   x2          x3          x4
   Float64     Float64     Float64
1  0.970575    0.444333    0.476191
2  0.596267    0.409228    0.931182
3  0.073578    0.17117     0.356649
4  0.532494    0.695047    0.304077
In [114]:
x[:,Cols("x1",Between("x2","x4"))]
Out[114]:

4 rows × 4 columns

   x1          x2          x3          x4
   Float64     Float64     Float64     Float64
1  0.666697    0.970575    0.444333    0.476191
2  0.648025    0.596267    0.409228    0.931182
3  0.0585679   0.073578    0.17117     0.356649
4  0.905592    0.532494    0.695047    0.304077
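Cols combines the selectors passed to it, keeping them in the order they are listed, and Between accepts either symbols or strings and includes both endpoints. A sketch on a toy data frame:

```julia
using DataFrames

df = DataFrame(a=[1, 2], b=[3, 4], c=[5, 6], d=[7, 8], e=[9, 10])

# Cols combines selectors; columns come out in the order the selectors list them
picked = df[:, Cols(:e, Between(:b, :c))]

# Between works with strings as well as symbols, and includes both endpoints
middle = df[:, Between("b", "d")]
```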

Views

You can create a view of a DataFrame; this is more efficient than creating a materialized selection because no data is copied. Depending on the indices, the result is a view of a single column vector, a DataFrameRow, or a SubDataFrame:

In [125]:
vdf01 = @view x[1:2,1]
Out[125]:
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.6666972907738189
 0.6480247676604174
In [126]:
vdf01 === view(x,1:2,1)
Out[126]:
true
In [127]:
vdf02 = @view x[1,1:2]
Out[127]:

DataFrameRow (2 columns)

   x1          x2
   Float64     Float64
1  0.666697    0.970575
In [128]:
vdf02 === x[1,1:2]
Out[128]:
true
In [131]:
vdf03 = @view x[1:2,1:2]
Out[131]:

2 rows × 2 columns

   x1          x2
   Float64     Float64
1  0.666697    0.970575
2  0.648025    0.596267
In [132]:
vdf03 === view(x,1:2,1:2)
Out[132]:
true
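The three kinds of view can be told apart by their types, and because none of them copies data, writing through any of them changes the parent. A sketch on a toy data frame:

```julia
using DataFrames

df = DataFrame(a=[1, 2, 3], b=[4, 5, 6])

col_view = view(df, 1:2, 1)      # SubArray over column :a
row_view = view(df, 1, 1:2)      # DataFrameRow
sub_view = view(df, 1:2, 1:2)    # SubDataFrame

# Writes through the views land in df
sub_view[1, :b] = 40
col_view[2] = 20
```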

Adding new columns to a data frame

In [133]:
df = DataFrame()
Out[133]:

0 rows × 0 columns

Using setproperty! (the df.a = x syntax):

In [134]:
x = [1,2,3]
df.a = x
df
Out[134]:

3 rows × 1 columns

   a
   Int64
1  1
2  2
3  3
In [138]:
df.a === x # no copy is performed
Out[138]:
true
In [136]:
x[2] = 10
Out[136]:
10
In [137]:
df
Out[137]:

3 rows × 1 columns

   a
   Int64
1  1
2  10
3  3
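If the data frame should not share memory with x, store a copy explicitly; the keyword-argument constructor DataFrame(a=x) also copies by default. A sketch:

```julia
using DataFrames

x = [1, 2, 3]

df = DataFrame()
df.a = copy(x)        # store an independent copy rather than x itself

df2 = DataFrame(a=x)  # the keyword constructor copies by default (copycols=true)

x[2] = 10             # affects neither df nor df2
```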

Using setindex! (the df[!, :b] = x and df[:, :c] = x syntax):

In [139]:
df[!,:b] = x
df[:,:c] = x
df
Out[139]:

3 rows × 3 columns

   a      b      c
   Int64  Int64  Int64
1  1      1      1
2  10     10     10
3  3      3      3
In [140]:
df.b === x # no copy
Out[140]:
true
In [141]:
df.c === x # copy
Out[141]:
false
In [143]:
df[!,:d] .= x
df[:,:e] .= x
df
Out[143]:

3 rows × 5 columns

   a      b      c      d      e
   Int64  Int64  Int64  Int64  Int64
1  1      1      1      1      1
2  10     10     10     10     10
3  3      3      3      3      3

Both copy, so in this case ! and : have the same effect.

In [144]:
df.d === x, df.e === x
Out[144]:
(false, false)
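Broadcasting assignment with ! is also a convenient way to create a new column from a scalar, which is expanded to the number of rows, or from an expression over existing columns. A sketch (the column names are illustrative):

```julia
using DataFrames

df = DataFrame(a=[1, 10, 3])

df[!, :flag] .= false      # new column, the scalar is repeated for every row
df[!, :a2] .= df.a .^ 2    # new column computed from an existing one
```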

Note that in our data frame, columns :a and :b store the vector x itself (not a copy):

In [145]:
df.a === df.b === x
Out[145]:
true

This can lead to silent errors. For example, the following code leads to a bug (note that calling pairs on eachcol(df) creates an iterator of (column name, column) pairs):

In [146]:
for (n,c) in pairs(eachcol(df))
  @show n,c
end
(n, c) = (:a, [1, 10, 3])
(n, c) = (:b, [1, 10, 3])
(n, c) = (:c, [1, 10, 3])
(n, c) = (:d, [1, 10, 3])
(n, c) = (:e, [1, 10, 3])
In [147]:
for (n,c) in pairs(eachcol(df))
  println("$n:", pop!(c))
end
a:3
b:10
c:3
d:3
e:3

Note that for column :b we printed 10, because 3 had already been removed from it when we called pop! on column :a (both columns are the same vector x).

Such mistakes sometimes happen. Because of this, DataFrames.jl performs a consistency check before doing an expensive operation (most notably before showing a data frame):

In [148]:
df
AssertionError: Data frame is corrupt: length of column :c (2) does not match length of column 1 (1). The column vector has likely been resized unintentionally (either directly or because it is shared with another data frame).

Stacktrace:
  [1] _check_consistency(df::DataFrame)
    @ DataFrames ~/.julia/packages/DataFrames/pVFzb/src/dataframe/dataframe.jl:447
  [2] _show(io::IOContext{IOBuffer}, df::DataFrame; allrows::Bool, allcols::Bool, rowlabel::Symbol, summary::Bool, eltypes::Bool, rowid::Nothing, truncate::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DataFrames ~/.julia/packages/DataFrames/pVFzb/src/abstractdataframe/show.jl:162
  [3] #show#692
    @ ~/.julia/packages/DataFrames/pVFzb/src/abstractdataframe/show.jl:348 [inlined]
  [4] show(io::IOContext{IOBuffer}, df::DataFrame)
    @ DataFrames ~/.julia/packages/DataFrames/pVFzb/src/abstractdataframe/show.jl:348
  [5] #show#707
    @ ~/.julia/packages/DataFrames/pVFzb/src/abstractdataframe/io.jl:138 [inlined]
  [6] show
    @ ~/.julia/packages/DataFrames/pVFzb/src/abstractdataframe/io.jl:138 [inlined]
  [7] limitstringmime(mime::MIME{Symbol("text/plain")}, x::DataFrame)
    @ IJulia ~/.julia/packages/IJulia/e8kqU/src/inline.jl:43
  [8] display_mimestring
    @ ~/.julia/packages/IJulia/e8kqU/src/display.jl:71 [inlined]
  [9] display_dict(x::DataFrame)
    @ IJulia ~/.julia/packages/IJulia/e8kqU/src/display.jl:102
 [10] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [11] invokelatest
    @ ./essentials.jl:706 [inlined]
 [12] execute_request(socket::ZMQ.Socket, msg::IJulia.Msg)
    @ IJulia ~/.julia/packages/IJulia/e8kqU/src/execute_request.jl:112
 [13] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [14] invokelatest
    @ ./essentials.jl:706 [inlined]
 [15] eventloop(socket::ZMQ.Socket)
    @ IJulia ~/.julia/packages/IJulia/e8kqU/src/eventloop.jl:8
 [16] (::IJulia.var"#15#18")()
    @ IJulia ./task.jl:411

We can investigate the columns to find out what happened:

In [149]:
collect(pairs(eachcol(df)))
Out[149]:
5-element Vector{Pair{Symbol, AbstractVector{T} where T}}:
 :a => [1]
 :b => [1]
 :c => [1, 10]
 :d => [1, 10]
 :e => [1, 10]

The output confirms that the data frame df got corrupted.
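Since such aliasing is only caught later, it can help to look for it up front. The helper aliased_columns below is hypothetical (it is not part of DataFrames.jl); it simply compares column vectors with ===, the same identity test used above:

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl: list pairs of columns
# that store the very same vector object (i.e. columns that alias each other)
function aliased_columns(df::AbstractDataFrame)
    cols = collect(pairs(eachcol(df)))
    found = Tuple{Symbol, Symbol}[]
    for i in eachindex(cols), j in (i + 1):lastindex(cols)
        if cols[i].second === cols[j].second
            push!(found, (cols[i].first, cols[j].first))
        end
    end
    return found
end

v = [1, 2, 3]
df = DataFrame()
df.a = v             # setproperty! stores v itself
df.b = v             # so :a and :b now alias each other
df.c = copy(v)       # an independent copy

before = aliased_columns(df)

df.b = copy(df.b)    # break the alias by replacing the column with a copy
after = aliased_columns(df)
```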

DataFrames.jl supports a complete set of getindex, getproperty, setindex!, setproperty!, view, broadcasting, and broadcasting assignment operations.

The details are explained here: http://juliadata.github.io/DataFrames.jl/latest/lib/indexing/

Comparisons

In [150]:
df = DataFrame(rand(2,3),:auto)
Out[150]:

2 rows × 3 columns

   x1          x2          x3
   Float64     Float64     Float64
1  0.772811    0.1909      0.0316249
2  0.239666    0.978527    0.320424
In [151]:
df2 = copy(df)
Out[151]:

2 rows × 3 columns

   x1          x2          x3
   Float64     Float64     Float64
1  0.772811    0.1909      0.0316249
2  0.239666    0.978527    0.320424
In [152]:
df === df2, df==df2
Out[152]:
(false, true)
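copy(df) copies the column vectors too, so df2 shares no data with df; passing copycols=false instead creates a new DataFrame object that reuses the original vectors. A sketch:

```julia
using DataFrames

df = DataFrame(rand(2, 3), :auto)

deep    = copy(df)                  # new vectors holding the same values
shallow = copy(df; copycols=false)  # new DataFrame object, same vectors
```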

Create a minimally different data frame and use isapprox for comparison:

In [153]:
df3 = df .+ eps()
Out[153]:

2 rows × 3 columns

   x1          x2          x3
   Float64     Float64     Float64
1  0.772811    0.1909      0.0316249
2  0.239666    0.978527    0.320424
In [154]:
df == df3
Out[154]:
false
In [155]:
isapprox(df,df3)
Out[155]:
true
In [156]:
df ≈ df3
Out[156]:
true
In [157]:
isapprox(df,df3,atol = eps()/2)
Out[157]:
false
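The tolerances follow Base's isapprox: with the default rtol of sqrt(eps()) the tiny shift passes, while an explicit positive atol makes the default rtol zero, so only the absolute tolerance counts. A sketch with one-row data frames, using values for which adding eps() is exact, so every entry moves by exactly eps():

```julia
using DataFrames

a = DataFrame(x=[0.5], y=[0.25])
b = a .+ eps()      # each entry moves by exactly eps() for these values

@show a == b                          # false: exact comparison
@show isapprox(a, b)                  # default rtol = sqrt(eps())
@show isapprox(a, b; atol=eps())      # a difference of eps() is within atol
@show isapprox(a, b; atol=eps() / 2)  # tolerance too small
```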

Missing values are handled as in Julia Base:

In [159]:
df = DataFrame(a=missing)
Out[159]:

1 rows × 1 columns

   a
   Missing
1  missing
In [160]:
df == df
Out[160]:
missing
In [161]:
df === df
Out[161]:
true
In [162]:
isequal(df,df)
Out[162]:
true
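== on data frames follows the three-valued logic of missing: the result is missing when no column is known to differ but some comparison involves missing, and false as soon as a difference is certain, while isequal treats missing as equal to missing. A sketch:

```julia
using DataFrames

df = DataFrame(a=[1, missing])

r1 = df == copy(df)          # missing: the missing entries make the result unknown
r2 = isequal(df, copy(df))   # true: isequal treats missing as equal to missing

# A definite difference in a non-missing entry gives false, not missing
r3 = DataFrame(a=[1, missing]) == DataFrame(a=[2, missing])
```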