<?xml version="1.0" encoding="UTF-8"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<book version="5.0" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:m="http://www.w3.org/1998/Math/MathML"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:db="http://docbook.org/ns/docbook" xml:id="book">
<info>
<title><link xlink:href="http://www.hbase.org">
The Apache HBase™ Reference Guide
</link></title>
<subtitle><link xlink:href="http://www.hbase.org">
<inlinemediaobject>
<imageobject>
<imagedata align="middle" valign="middle" fileref="hbase_logo.png" />
</imageobject>
</inlinemediaobject>
</link>
</subtitle>
<copyright><year>2012</year><holder>Apache Software Foundation.
All Rights Reserved. Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation.
</holder>
</copyright>
<abstract>
<para>This is the official reference guide of
<link xlink:href="http://www.hbase.org">Apache HBase (TM)</link>,
a distributed, versioned, column-oriented database built on top of
<link xlink:href="http://hadoop.apache.org/">Apache Hadoop</link> and
<link xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper</link>.
</para>
</abstract>
<revhistory>
<revision>
<revnumber>
<?eval ${project.version}?>
</revnumber>
<date>
<?eval ${buildDate}?>
</date>
</revision>
</revhistory>
</info>
<!--XInclude some chapters-->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="preface.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="getting_started.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="configuration.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" />
<chapter xml:id="datamodel">
<title>Data Model</title>
<para>In short, applications store data into an HBase table.
Tables are made of rows and columns.
All columns in HBase belong to a particular column family.
Table cells -- the intersection of row and column
coordinates -- are versioned.
A cell’s content is an uninterpreted array of bytes.
</para>
<para>Table row keys are also byte arrays so almost anything can
serve as a row key from strings to binary representations of longs or
even serialized data structures. Rows in HBase tables
are sorted by row key. The sort is byte-ordered. All table accesses are
via the table row key -- its primary key.
</para>
<section xml:id="conceptual.view"><title>Conceptual View</title>
<para>
The following example is a slightly modified form of the one on page
2 of the <link xlink:href="http://research.google.com/archive/bigtable.html">BigTable</link> paper.
There is a table called <varname>webtable</varname> that contains two column families named
<varname>contents</varname> and <varname>anchor</varname>.
In this example, <varname>anchor</varname> contains two
columns (<varname>anchor:cnnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
<note>
<title>Column Names</title>
<para>
By convention, a column name is made of its column family prefix and a
<emphasis>qualifier</emphasis>. For example, the
column
<emphasis>contents:html</emphasis> is of the column family <varname>contents</varname>.
The colon character (<literal
moreinfo="none">:</literal>) delimits the column family from the
column family <emphasis>qualifier</emphasis>.
</para>
</note>
<table frame='all'><title>Table <varname>webtable</varname></title>
<tgroup cols='4' align='left' colsep='1' rowsep='1'>
<colspec colname='c1'/>
<colspec colname='c2'/>
<colspec colname='c3'/>
<colspec colname='c4'/>
<thead>
<row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
</thead>
<tbody>
<row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
<row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
<row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
<row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
<row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry><entry></entry></row>
</tbody>
</tgroup>
</table>
</para>
</section>
<section xml:id="physical.view"><title>Physical View</title>
<para>
Although at a conceptual level tables may be viewed as a sparse set of rows,
physically they are stored on a per-column-family basis. New columns
(i.e., <varname>columnfamily:column</varname>) can be added to any
column family without pre-announcing them.
<table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
<tgroup cols='3' align='left' colsep='1' rowsep='1'>
<colspec colname='c1'/>
<colspec colname='c2'/>
<colspec colname='c3'/>
<thead>
<row><entry>Row Key</entry><entry>Time Stamp</entry><entry>Column Family <varname>anchor</varname></entry></row>
</thead>
<tbody>
<row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
<row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
</tbody>
</tgroup>
</table>
<table frame='all'><title>ColumnFamily <varname>contents</varname></title>
<tgroup cols='3' align='left' colsep='1' rowsep='1'>
<colspec colname='c1'/>
<colspec colname='c2'/>
<colspec colname='c3'/>
<thead>
<row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily "contents:"</entry></row>
</thead>
<tbody>
<row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
<row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
<row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "<html>..."</entry></row>
</tbody>
</tgroup>
</table>
It is important to note in the diagram above that the empty cells shown in the
conceptual view are not stored since they need not be in a column-oriented
storage format. Thus a request for the value of the <varname>contents:html</varname>
column at time stamp <literal>t8</literal> would return no value. Similarly, a
request for an <varname>anchor:my.look.ca</varname> value at time stamp
<literal>t9</literal> would return no value. However, if no timestamp is
supplied, the most recent value for a particular column would be returned
and would also be the first one found since timestamps are stored in
descending order. Thus a request for the values of all columns in the row
<varname>com.cnn.www</varname> if no timestamp is specified would be:
the value of <varname>contents:html</varname> from time stamp
<literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
from time stamp <literal>t9</literal>, the value of
<varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
</para>
<para>For more information about the internals of how Apache HBase stores data, see <xref linkend="regions.arch" />.
</para>
</section>
<section xml:id="table">
<title>Table</title>
<para>
Tables are declared up front at schema definition time.
</para>
</section>
<section xml:id="row">
<title>Row</title>
<para>Row keys are uninterpreted bytes. Rows are
lexicographically sorted with the lowest order appearing first
in a table. The empty byte array is used to denote both the
start and end of a table's namespace.</para>
</section>
<section xml:id="columnfamily">
<title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
<para>
Columns in Apache HBase are grouped into <emphasis>column families</emphasis>.
All column members of a column family have the same prefix. For example, the
columns <emphasis>courses:history</emphasis> and
<emphasis>courses:math</emphasis> are both members of the
<emphasis>courses</emphasis> column family.
The colon character (<literal
moreinfo="none">:</literal>) delimits the column family from the
<indexterm>column family <emphasis>qualifier</emphasis><primary>Column Family Qualifier</primary></indexterm>.
The column family prefix must be composed of
<emphasis>printable</emphasis> characters. The qualifying tail, the
column family <emphasis>qualifier</emphasis>, can be made of any
arbitrary bytes. Column families must be declared up front
at schema definition time whereas columns do not need to be
defined at schema time but can be conjured on the fly while
the table is up and running.</para>
<para>Physically, all column family members are stored together on the
filesystem. Because tunings and
storage specifications are done at the column family level, it is
advised that all column family members have the same general access
pattern and size characteristics.</para>
<para></para>
</section>
<section xml:id="cells">
<title>Cells<indexterm><primary>Cells</primary></indexterm></title>
<para>A <emphasis>{row, column, version} </emphasis>tuple exactly
specifies a <literal>cell</literal> in HBase.
Cell content is uninterpreted bytes.</para>
</section>
<section xml:id="data_model_operations">
<title>Data Model Operations</title>
<para>The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
</para>
<section xml:id="get">
<title>Get</title>
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
attributes for a specified row. Gets are executed via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
HTable.get</link>.
</para>
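<para>For illustration, a minimal sketch following the conventions of the later examples in this chapter (an existing <code>htable</code> instance and a hypothetical column family "cf" with qualifier "attr"):
<programlisting>
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // optionally restrict the Get to one column
Result r = htable.get(get);
byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
</programlisting>
</para>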
</section>
<section xml:id="put">
<title>Put</title>
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> either
adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
HTable.batch</link> (non-writeBuffer).
</para>
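<para>A minimal sketch, again assuming an existing <code>htable</code> instance and a hypothetical column family "cf":
<programlisting>
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("some value"));
htable.put(put);
</programlisting>
</para>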
</section>
<section xml:id="scan">
<title>Scans</title>
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allow
iteration over multiple rows for specified attributes.
</para>
<para>The following is an example of a Scan
on an HTable instance. Assume that a table is populated with rows with keys "row1", "row2", "row3",
and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how startRow and stopRow
can be applied to a Scan instance to return the rows beginning with "row".
<programlisting>
HTable htable = ... // instantiate HTable
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"),Bytes.toBytes("attr"));
scan.setStartRow( Bytes.toBytes("row")); // start key is inclusive
scan.setStopRow( Bytes.toBytes("row" + (char)0)); // stop key is exclusive
ResultScanner rs = htable.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result...
  }
} finally {
  rs.close(); // always close the ResultScanner!
}
</programlisting>
</para>
</section>
<section xml:id="delete">
<title>Delete</title>
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
a row from a table. Deletes are executed via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
HTable.delete</link>.
</para>
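<para>A minimal sketch, assuming an existing <code>htable</code> instance:
<programlisting>
Delete delete = new Delete(Bytes.toBytes("row1"));
htable.delete(delete);  // writes tombstone markers for the row
</programlisting>
</para>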
<para>HBase does not modify data in place, and so deletes are handled by creating new markers called <emphasis>tombstones</emphasis>.
These tombstones, along with the dead values, are cleaned up on major compactions.
</para>
<para>See <xref linkend="version.delete"/> for more information on deleting versions of columns, and see
<xref linkend="compaction"/> for more information on compactions.
</para>
</section>
</section>
<section xml:id="versions">
<title>Versions<indexterm><primary>Versions</primary></indexterm></title>
<para>A <emphasis>{row, column, version} </emphasis>tuple exactly
specifies a <literal>cell</literal> in HBase. It's possible to have an
unbounded number of cells where the row and column are the same but the
cell address differs only in its version dimension.</para>
<para>While rows and column keys are expressed as bytes, the version is
specified using a long integer. Typically this long contains time
instances such as those returned by
<code>java.util.Date.getTime()</code> or
<code>System.currentTimeMillis()</code>, that is: <quote>the difference,
measured in milliseconds, between the current time and midnight, January
1, 1970 UTC</quote>.</para>
<para>The HBase version dimension is stored in decreasing order, so that
when reading from a store file, the most recent values are found
first.</para>
<para>There is a lot of confusion over the semantics of
<literal>cell</literal> versions in HBase. In particular, a couple of
questions that often come up are:<itemizedlist>
<listitem>
<para>If multiple writes to a cell have the same version, are all
versions maintained or just the last?<footnote>
<para>Currently, only the last written is fetchable.</para>
</footnote></para>
</listitem>
<listitem>
<para>Is it OK to write cells in a non-increasing version
order?<footnote>
<para>Yes</para>
</footnote></para>
</listitem>
</itemizedlist></para>
<para>Below we describe how the version dimension in HBase currently
works<footnote>
<para>See <link
xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
for discussion of HBase versions. <link
xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
in HBase</link> makes for a good read on the version, or time,
dimension in HBase. It has more detail on versioning than is
provided here. As of this writing, the limitation
<emphasis>Overwriting values at existing timestamps</emphasis>
mentioned in the article no longer holds in HBase. This section is
basically a synopsis of this article by Bruno Dumon.</para>
</footnote>.</para>
<section xml:id="versions.ops">
<title>Versions and HBase Operations</title>
<para>In this section we look at the behavior of the version dimension
for each of the core HBase operations.</para>
<section>
<title>Get/Scan</title>
<para>Gets are implemented on top of Scans. The below discussion of
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scans</link>.</para>
<para>By default, i.e. if you specify no explicit version, when
doing a <literal>get</literal>, the cell whose version has the
largest value is returned (which may or may not be the latest one
written, see later). The default behavior can be modified in the
following ways:</para>
<itemizedlist>
<listitem>
<para>to return more than one version, see <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()">Get.setMaxVersions()</link></para>
</listitem>
<listitem>
<para>to return versions other than the latest, see <link
xlink:href="???">Get.setTimeRange()</link></para>
<para>To retrieve the latest version that is less than or equal
to a given value, thus giving the 'latest' state of the record
at a certain point in time, just use a range from 0 to the
desired version and set the max versions to 1 (see the sketch below).</para>
</listitem>
</itemizedlist>
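<para>A sketch of that last point, assuming an existing <code>htable</code> instance and a hypothetical cut-off timestamp <code>atTime</code>:
<programlisting>
long atTime = 1234567890L;        // hypothetical point in time
Get get = new Get(Bytes.toBytes("row1"));
get.setTimeRange(0, atTime + 1);  // the end of the range is exclusive
get.setMaxVersions(1);            // latest version at or before atTime
Result r = htable.get(get);
</programlisting>
</para>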
</section>
<section xml:id="default_get_example">
<title>Default Get Example</title>
<para>The following Get will only retrieve the current version of the row
<programlisting>
Get get = new Get(Bytes.toBytes("row1"));
Result r = htable.get(get);
byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value
</programlisting>
</para>
</section>
<section xml:id="versioned_get_example">
<title>Versioned Get Example</title>
<para>The following Get will return the last 3 versions of the row.
<programlisting>
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3); // will return last 3 versions of row
Result r = htable.get(get);
byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value
List<KeyValue> kv = r.getColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns all versions of this column
</programlisting>
</para>
</section>
<section>
<title>Put</title>
<para>Doing a put always creates a new version of a
<literal>cell</literal>, at a certain timestamp. By default the
system uses the server's <literal>currentTimeMillis</literal>, but
you can specify the version (= the long integer) yourself, on a
per-column level. This means you could assign a time in the past or
the future, or use the long value for non-time purposes.</para>
<para>To overwrite an existing value, do a put at exactly the same
row, column, and version as that of the cell you would
overshadow.</para>
<section xml:id="implicit_version_example">
<title>Implicit Version Example</title>
<para>The following Put will be implicitly versioned by HBase with the current time.
<programlisting>
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), Bytes.toBytes( data));
htable.put(put);
</programlisting>
</para>
</section>
<section xml:id="explicit_version_example">
<title>Explicit Version Example</title>
<para>The following Put has the version timestamp explicitly set.
<programlisting>
Put put = new Put( Bytes.toBytes(row));
long explicitTimeInMs = 555; // just an example
put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), explicitTimeInMs, Bytes.toBytes(data));
htable.put(put);
</programlisting>
Caution: the version timestamp is used internally by HBase for things like time-to-live calculations.
It's usually best to avoid setting this timestamp yourself. Prefer using a separate
timestamp attribute of the row, or have the timestamp a part of the rowkey, or both.
</para>
</section>
</section>
<section xml:id="version.delete">
<title>Delete</title>
<para>There are three different types of internal delete markers
<footnote><para>See Lars Hofhansl's blog for discussion of his attempt
adding another, <link xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning in HBase: Prefix Delete Marker</link></para></footnote>:
<itemizedlist>
<listitem><para>Delete: for a specific version of a column.</para>
</listitem>
<listitem><para>Delete column: for all versions of a column.</para>
</listitem>
<listitem><para>Delete family: for all columns of a particular ColumnFamily</para>
</listitem>
</itemizedlist>
When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
</para>
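<para>A sketch of how each marker type can be requested through the Delete client API (assuming an existing <code>htable</code> instance; in practice you would normally use just one of these per Delete):
<programlisting>
Delete delete = new Delete(Bytes.toBytes("row1"));
delete.deleteColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"), 555L); // one specific version of a column
delete.deleteColumns(Bytes.toBytes("cf"), Bytes.toBytes("attr"));      // all versions of a column
delete.deleteFamily(Bytes.toBytes("cf"));                              // all columns of the family
htable.delete(delete);
</programlisting>
</para>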
<para>Deletes work by creating <emphasis>tombstone</emphasis>
markers. For example, let's suppose we want to delete a row. For
this you can specify a version, or else by default the
<literal>currentTimeMillis</literal> is used. What this means is
<quote>delete all cells where the version is less than or equal to
this version</quote>. HBase never modifies data in place, so for
example a delete will not immediately delete (or mark as deleted)
the entries in the storage file that correspond to the delete
condition. Rather, a so-called <emphasis>tombstone</emphasis> is
written, which will mask the deleted values<footnote>
<para>When HBase does a major compaction, the tombstones are
processed to actually remove the dead values, together with the
tombstones themselves.</para>
</footnote>. If the version you specified when deleting a row is
larger than the version of any value in the row, then you can
consider the complete row to be deleted.</para>
<para>For an informative discussion on how deletes and versioning interact, see
the thread <link xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/ timestamp -> Deleteall -> Put w/ timestamp fails</link>
up on the user mailing list.</para>
<para>Also see <xref linkend="keyvalue"/> for more information on the internal KeyValue format.
</para>
</section>
</section>
<section>
<title>Current Limitations</title>
<section>
<title>Deletes mask Puts</title>
<para>Deletes mask puts, even puts that happened after the delete
was entered<footnote>
<para><link
xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para>
</footnote>. Remember that a delete writes a tombstone, which only
disappears after the next major compaction has run. Suppose you do
a delete of everything <= T. After this you do a new put with a
timestamp <= T. This put, even if it happened after the delete,
will be masked by the delete tombstone. Performing the put will not
fail, but when you do a get you will notice the put had no
effect. It will start working again after the major compaction has
run. These issues should not be a problem if you use
always-increasing versions for new puts to a row. But they can occur
even if you do not care about time: just do delete and put
immediately after each other, and there is some chance they happen
within the same millisecond.</para>
</section>
<section>
<title>Major compactions change query results</title>
<para><quote>...create three cell versions at t1, t2 and t3, with a
maximum-versions setting of 2. So when getting all versions, only
the values at t2 and t3 will be returned. But if you delete the
version at t2 or t3, the one at t1 will appear again. Obviously,
once a major compaction has run, such behavior will not be the case
anymore...<footnote>
<para>See <emphasis>Garbage Collection</emphasis> in <link
xlink:href="http://outerthought.org/blog/417-ot.html">Bending
time in HBase</link> </para>
</footnote></quote></para>
</section>
</section>
</section>
<section xml:id="dm.sort">
<title>Sort Order</title>
<para>All data model operations in HBase return data in sorted order: first by row,
then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted
in reverse, so newest records are returned first).
</para>
</section>
<section xml:id="dm.column.metadata">
<title>Column Metadata</title>
<para>There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily.
Thus, while HBase can support not only a large number of columns per row but also a heterogeneous set of columns
between rows, it is your responsibility to keep track of the column names.
</para>
<para>The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
For more information about how HBase stores data internally, see <xref linkend="keyvalue" />.
</para>
</section>
<section xml:id="joins"><title>Joins</title>
<para>Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't,
at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated
in this chapter, the read data model operations in HBase are Get and Scan.
</para>
<para>However, that doesn't mean that equivalent join functionality can't be supported in your application, but
you have to do it yourself. The two primary strategies are either denormalizing the data upon writing to HBase,
or to have lookup tables and do the join between HBase tables in your application or MapReduce code (and as RDBMS'
demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs.
hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single
answer that works for every use case.
</para>
</section>
<section xml:id="acid"><title>ACID</title>
<para>See <link xlink:href="http://hbase.apache.org/acid-semantics.html">ACID Semantics</link>.
Lars Hofhansl has also written a note on
<link xlink:href="http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html">ACID in HBase</link>.</para>
</section>
</chapter> <!-- data model -->
<chapter xml:id="schema">
<title>HBase and Schema Design</title>
<para>A good general introduction to the strengths and weaknesses of modeling on
the various non-RDBMS datastores is Ian Varley's Master's thesis,
<link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
</para>
<section xml:id="schema.creation">
<title>
Schema Creation
</title>
<para>HBase schemas can be created or updated with <xref linkend="shell" />
or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
</para>
<para>Tables must be disabled when making ColumnFamily modifications, for example:
<programlisting>
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";
admin.disableTable(table);
HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1); // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2); // modifying existing ColumnFamily
admin.enableTable(table);
</programlisting>
</para>
<para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.</para>
<para>Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
to be disabled.
</para>
<section xml:id="schema.updates"><title>Schema Updates</title>
<para>When changes are made to either Tables or ColumnFamilies (e.g., region size, block size), these changes
take effect the next time there is a major compaction and the StoreFiles get re-written.
</para>
<para>See <xref linkend="store"/> for more information on StoreFiles.
</para>
</section>
</section>
<section xml:id="number.of.cfs">
<title>
On the number of column families
</title>
<para>
HBase currently does not do well with anything above two or three column families so keep the number
of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so
if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
will also be flushed even though the amount of data they carry is small. When many column families exist, the
flushing and compaction interaction can make for a bunch of needless i/o loading (to be addressed by
changing flushing and compaction to work on a per-column-family basis). For more information
on compactions, see <xref linkend="compaction"/>.
</para>
<para>Try to make do with one column family if you can in your schemas. Only introduce a
second and third column family in the case where data access is usually column scoped;
i.e. you query one column family or the other but usually not both at the same time.
</para>
<section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
<para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).
If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
</para>
</section>
</section>
<section xml:id="rowkey.design"><title>Rowkey Design</title>
<section xml:id="timeseries">
<title>
Monotonically Increasing Row Keys/Timeseries Data
</title>
<para>
In the HBase chapter of Tom White's book <link xlink:href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving on to the next region, and so on. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
<link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
</para>
<para>If you do need to upload time series data into HBase, you should
study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
successful example. It has a page describing the <link xlink:href="http://opentsdb.net/schema.html">schema</link> it uses in
HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
</para>
</section>
<section xml:id="keysize">
<title>Try to minimize row and column sizes</title>
<subtitle>Or why are my StoreFile indices large?</subtitle>
<para>In HBase, values are always freighted with their coordinates; as a
cell value passes through the system, it'll be accompanied by its
row, column name, and timestamp - always. If your rows and column names
are large, especially compared to the size of the cell value, then
you may run up against some interesting scenarios. One such is
the case described by Marc Limotte at the tail of
<link xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
(recommended!).
Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
to facilitate random access may end up occupying large chunks of the HBase
allotted RAM because the cell value coordinates are large.
Marc in the above-cited comment suggests upping the block size so
entries in the store file index happen at a larger interval or
modify the table schema so it makes for smaller rows and column
names.
Compression will also make for larger indices. See
the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
up on the user mailing list.
</para>
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
several billion times in your data. </para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
</para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
</section>
<section xml:id="keysize.atttributes"><title>Attributes</title>
<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
to store in HBase.
</para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally and why this is important.</para>
</section>
<section xml:id="keysize.row"><title>Rowkey Length</title>
<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
when designing rowkeys.
</para>
</section>
<section xml:id="keysize.patterns"><title>Byte Patterns</title>
<para>A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
If you stored this number as a String -- presuming a byte per character -- you need nearly 3x the bytes.
</para>
<para>Not convinced? Below is some sample code that you can run on your own.
<programlisting>
// long
//
long l = 1234567890L;
byte[] lb = Bytes.toBytes(l);
System.out.println("long bytes length: " + lb.length); // returns 8
String s = "" + l;
byte[] sb = Bytes.toBytes(s);
System.out.println("long as string length: " + sb.length); // returns 10
// hash
//
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] digest = md.digest(Bytes.toBytes(s));
System.out.println("md5 digest bytes length: " + digest.length); // returns 16
String sDigest = new String(digest);
byte[] sbDigest = Bytes.toBytes(sDigest);
System.out.println("md5 digest as string length: " + sbDigest.length); // returns 26
</programlisting>
</para>
</section>
</section>
<section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
<para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly),
the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
</para>
<para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys
are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
</para>
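<para>A sketch of the technique, using <code>Bytes.add</code> to assemble a hypothetical [key][reverse_timestamp] rowkey:
<programlisting>
long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
byte[] rowkey = Bytes.add(Bytes.toBytes("mykey"), Bytes.toBytes(reverseTs));
Put put = new Put(rowkey);
put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("value"));
htable.put(put);
// a Scan starting at the plain "mykey" prefix returns the newest entry first
Scan scan = new Scan(Bytes.toBytes("mykey"));
</programlisting>
</para>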
<para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
"forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
</para>
</section>
<section xml:id="rowkey.scope">
<title>Rowkeys and ColumnFamilies</title>
<para>Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
</para>
</section>
<section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
<para>Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've
inserted a lot of data).
</para>
</section>
<section xml:id="rowkey.regionsplits"><title>Relationship Between RowKeys and Region Splits</title>
<para>If you pre-split your table, it is <emphasis>critical</emphasis> to understand how your rowkey will be distributed across
the region boundaries. As an example of why this is important, consider the example of using displayable hex characters as the
lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff"). Running those key ranges through <code>Bytes.split</code>
(which is the split strategy used when creating regions in <code>HBaseAdmin.createTable(byte[] startKey, byte[] endKey, numRegions)</code>)
for 10 regions will generate the following splits...
</para>
<para>
<programlisting>
48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 // 0
54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 // 6
61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68 // =
68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126 // D
75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72 // K
82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14 // R
88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44 // X
95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102 // _
102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 // f
</programlisting>
... (note: the lead byte is listed to the right as a comment.) Given that the first split is a '0' and the last split is an 'f',
everything is great, right? Not so fast.
</para>
<para>The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and
possibly "hot") region problem. To understand why, refer to an <link xlink:href="http://www.asciitable.com">ASCII Table</link>.
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will <emphasis>never appear in this
keyspace</emphasis> because the only values are [0-9] and [a-f]. Thus, the middle regions will
never be used. To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., not relying on the
built-in split method) is required.
</para>
<para>Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the
regions are accessible in the keyspace. While this example demonstrated the problem with a hex-key keyspace, the same problem can happen
with <emphasis>any</emphasis> keyspace. Know your data.
</para>
<para>Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split
tables as long as all the created regions are accessible in the keyspace.
</para>
<para>To conclude this example, the following is an example of how appropriate splits can be pre-created for hex-keys:
</para>
<programlisting>public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
throws IOException {
try {
admin.createTable( table, splits );
return true;
} catch (TableExistsException e) {
logger.info("table " + table.getNameAsString() + " already exists");
// the table already exists...
return false;
}
}
public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
byte[][] splits = new byte[numRegions-1][];
BigInteger lowestKey = new BigInteger(startKey, 16);
BigInteger highestKey = new BigInteger(endKey, 16);
BigInteger range = highestKey.subtract(lowestKey);
BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
lowestKey = lowestKey.add(regionIncrement);
for(int i=0; i < numRegions-1;i++) {
BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
byte[] b = String.format("%016x", key).getBytes();
splits[i] = b;
}
return splits;
}</programlisting>
</section>
</section> <!-- rowkey design -->
<section xml:id="schema.versions">
<title>
Number of Versions
</title>
<section xml:id="schema.versions.max"><title>Maximum Number of Versions</title>
<para>The maximum number of row versions to store is configured per column
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
The default for max versions is 3.
This is an important parameter because as described in <xref linkend="datamodel" />
section HBase does <emphasis>not</emphasis> overwrite row values, but rather
stores different values per row by time (and qualifier). Excess versions are removed during major
compactions. The number of max versions may need to be increased or decreased depending on application needs.
</para>
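<para>For example, a sketch raising the maximum to five versions on a hypothetical ColumnFamily "cf" (the descriptor can then be applied with HBaseAdmin as shown in <xref linkend="schema.creation"/>):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));
cf.setMaxVersions(5);  // keep up to 5 versions instead of the default 3
</programlisting>
</para>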
<para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
very dear to you, because this will greatly increase StoreFile size.
</para>
</section>
<section xml:id="schema.minversions">
<title>
Minimum Number of Versions
</title>
<para>Like maximum number of row versions, the minimum number of row versions to keep is configured per column
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
The default for min versions is 0, which means the feature is disabled.
The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
number of row versions parameter to allow configurations such as
"keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
(where M is the value for minimum number of row versions, M<N).
This parameter should only be set when time-to-live is enabled for a column family and must be less than the
number of row versions.
</para>
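<para>A sketch of the "keep the last T minutes, at most N versions, but at least M versions" configuration on a hypothetical ColumnFamily "cf":
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));
cf.setTimeToLive(10 * 60);  // T: expire data older than roughly 10 minutes
cf.setMaxVersions(50);      // N: keep at most 50 versions
cf.setMinVersions(2);       // M: but always retain at least 2 versions
</programlisting>
</para>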
</section>
</section>
<section xml:id="supported.datatypes">
<title>
Supported Datatypes
</title>
<para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
</para>
<para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
</para>
<section xml:id="counters">
<title>Counters</title>
<para>
One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
</para>
<para>Synchronization on counters is done on the RegionServer, not in the client.
</para>
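<para>A minimal sketch, assuming an existing <code>htable</code> instance and a hypothetical counter column "cf:hits":
<programlisting>
Increment inc = new Increment(Bytes.toBytes("row1"));
inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1);  // add 1 atomically
Result r = htable.increment(inc);
</programlisting>
</para>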
</section>
</section>
<section xml:id="schema.joins"><title>Joins</title>
<para>If you have multiple tables, don't forget to factor the potential need for <xref linkend="joins"/> into the schema design.
</para>
</section>
<section xml:id="ttl">
<title>Time To Live (TTL)</title>
<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in HBase for the row is specified in UTC.
</para>
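<para>A sketch setting a one-week TTL on a hypothetical ColumnFamily "cf":
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));
cf.setTimeToLive(7 * 24 * 60 * 60);  // TTL is expressed in seconds
</programlisting>
</para>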
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
</section>
<section xml:id="cf.keep.deleted">
<title>
Keeping Deleted Cells
</title>
<para>ColumnFamilies can optionally keep deleted cells. That means deleted cells can still be retrieved with
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> or
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> operations,
as long as these operations have a time range specified that ends before the timestamp of any delete that would affect the cells.
This allows for point in time queries even in the presence of deletes.
</para>
<para>
Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells.
A new "raw" scan options returns all deleted rows and the delete markers.
</para>
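<para>A sketch enabling the feature on a hypothetical ColumnFamily "cf" and issuing a "raw" scan (assuming an existing <code>htable</code> instance):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor(Bytes.toBytes("cf"));
cf.setKeepDeletedCells(true);   // retain deleted cells until TTL / max-versions limits apply

Scan scan = new Scan();
scan.setRaw(true);              // also return delete markers and the cells they mask
scan.setMaxVersions();
ResultScanner rs = htable.getScanner(scan);
</programlisting>
</para>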
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
</section>
<section xml:id="secondary.indexes">
<title>
Secondary Indexes and Alternate Query Paths
</title>
<para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
time ranges. Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
</para>
<para>There is no single answer on the best way to handle this because it depends on...
<itemizedlist>
<listitem>Number of users</listitem>
<listitem>Data size and data arrival rate</listitem>
<listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>
<listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>
</itemizedlist>
... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
</para>
<para>It should not be a surprise that secondary indexes require additional cluster space and processing.
This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RDBMS products
are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off.
</para>
<para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
<para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
</para>
<section xml:id="secondary.indexes.filter">
<title>
Filter Query
</title>
<para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>. In this case, no secondary index is created.
However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
</para>
</section>
<section xml:id="secondary.indexes.periodic">
<title>
Periodic-Update Secondary Index
</title>
<para>A secondary index could be created in another table which is periodically updated via a MapReduce job. The job could be executed intra-day, but depending on
load-strategy it could still potentially be out of sync with the main data table.</para>
<para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
</section>
<section xml:id="secondary.indexes.dualwrite">
<title>
Dual-Write Secondary Index
</title>
<para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
</section>
<section xml:id="secondary.indexes.summary">
<title>
Summary Tables
</title>
<para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
These would be generated with MapReduce jobs into another table.</para>
<para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
</section>
<section xml:id="secondary.indexes.coproc">
<title>
Coprocessor Secondary Index
</title>
<para>Coprocessors act like RDBMS triggers. These were added in 0.92. For more information, see <xref linkend="coprocessors"/>.
</para>
</section>
</section>
<section xml:id="schema.smackdown"><title>Schema Design Smackdown</title>
<para>This section will describe common schema design questions that appear on the dist-list. These are
general guidelines and not laws - each application must consider its own needs.
</para>
<section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
<para>A common question is whether one should prefer rows or HBase's built-in-versioning. The context is typically where there are
"a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions). The
rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
</para>
<para>Preference: Rows (generally speaking).
</para>
</section>
<section xml:id="schema.smackdown.rowscols"><title>Rows vs. Columns</title>
<para>Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide
tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece.
</para>
<para>Preference: Rows (generally speaking). To be clear, this guideline applies to extremely wide cases, not to the
standard use-case where one needs to store a few dozen or hundred columns. But there is also a middle path between these two
options, and that is "Rows as Columns."
</para>
</section>
<section xml:id="schema.smackdown.rowsascols"><title>Rows as Columns</title>
<para>The middle path between Rows vs. Columns is packing data that would be a separate row into columns, for certain rows.
OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
advantage of being I/O efficient. For an overview of this approach, see
<link xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</link>
from HBaseCon2012.
</para>
</section>
</section>
<section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
<para>See the Performance section <xref linkend="perf.schema"/> for more information on operational and performance
schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
</para>
</section>
<section xml:id="constraints"><title>Constraints</title>
<para>HBase currently supports 'constraints' in traditional (SQL) database parlance. The advised usage for Constraints is in enforcing business rules for attributes in the table (e.g., make sure values are in the range 1-10).