๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๐Ÿ“š ์ „๊ณต ๊ณต๋ถ€/DB๊ธฐ์ดˆ

[DB] Indexing structures for files / ๋ฌผ๋ฆฌ์  DB์„ค๊ณ„

Index structures for files

  • index : ์ถ”๊ฐ€์ ์ธ ๋ณด์กฐ์ ‘๊ทผ๊ตฌ์กฐ, ๊ฒ€์ƒ‰์†๋„ ์ฆ๊ฐ€, ๋Œ€์ฒด๋ฐฉ๋ฒ•์„ ์ œ๊ณตํ•จ, ํšจ์œจ์  ๊ฒ€์ƒ‰

Single-level ordered indexes

  • primary, secondary, clustering
  • ISAM : Indexed Sequential Access Method

Multilevel indexes

  • B trees, B+ trees

<Single level ordered>

Types of Single-level ordered indexes

  • Primary index : ํ‚ค ํ•„๋“œ ์ •๋ ฌ ํŒŒ์ผ์—์„œ ์ •๋ ฌ ํ‚ค ํ•„๋“œ์— ๋Œ€ํ•ด ์ •์˜๋œ ์ธ๋ฑ์Šค
  • Clustering index : ํ‚ค๊ฐ€ ์•„๋‹Œ๊ฑธ๋กœ ์ •๋ ฌ๋œ ํŒŒ์ผ์—์„œ ์ •๋ ฌ ํ•„๋“œ์— ๋Œ€ํ•ด ์ •์˜๋œ ์ธ๋ฑ์Šค
  • Secondary index : any nonordering field

Primary Indexes

  • < K(i), P(i) >
  • K = ์ •๋ ฌ ํ‚ค ํ•„๋“œ
  • P = ๋””์Šคํฌ๋ธ”๋Ÿญ์˜ ํฌ์ธํ„ฐ
  • one index entry for each block
  • Anchor record of the block, or the block anchor : ๊ฐ ๋ธ”๋Ÿญ์˜ ์ฒซ ๋ ˆ์ฝ”๋“œ
  • Dense index : has an index entry for every search key value
  • Sparse index : non-dense. (primary index)

if) 300,000 ๋ ˆ์ฝ”๋“œ, B=4096 bytes, R=100 bytes

→ bfr = B/R = 4096/100 = 40

→ ํ•„์š”ํ•œ ๋ธ”๋Ÿญ ์ˆ˜ b = 300000/40 = 7500

→ ์ •๋ ฌ ํŒŒ์ผ์ด๋ฏ€๋กœ ์‹œ๊ฐ„์€ log 7500 = 13 block accesses

 

to insert a record

  1. move records to make space
  2. change some index entries

ํ˜„์‹ค์ ์œผ๋กœ ๋ถˆ๊ฐ€๋Šฅํ•œ ๋ฐฉ๋ฒ•๋“ค์ด๋ฏ€๋กœ, overflow ํŒŒ์ผ์„ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ์—ฐ๊ฒฐ๋ฆฌ์ŠคํŠธ ์‚ฌ์šฉ.

deletion ⇒ deletion markers

Clustering index

nonkey field

ordering nonkey field vs separate block cluster for each group →๋ธ”๋Ÿญ๋‹น ๊ฐ™์€ ๋ ˆ์ฝ”๋“œ๋งŒ ์ €์žฅ. ๊ณต๊ฐ„๋‚ญ๋น„ ๊ฐ€๋Šฅ์„ฑ

→ ๊ณต๊ฐ„๋‚ญ๋น„ ์—†์ง€๋งŒ ๋‹ค์Œ๋ธ”๋Ÿญ๊นŒ์ง€ ๊ฒ€์ƒ‰ ํ•„์š”

 

Secondary indexes : ๋ณด์กฐ์ธ๋ฑ์Šค

need more storage space and longer search time

dense index : ๋ ˆ์ฝ”๋“œ ๋‹น ํ•˜๋‚˜์˜ ์—”ํŠธ๋ฆฌ

์ž„์˜ ๋ ˆ์ฝ”๋“œ์˜ ๊ฒ€์ƒ‰์‹œ๊ฐ„ ๊ฐœ์„ 

on a nonkey field

can have the same value for the indexing field

  1. index entry ์—ฌ๋Ÿฌ๊ฐœ ์‚ฌ์šฉ
  2. ๊ฐ€๋ณ€๊ธธ์ด ๋ ˆ์ฝ”๋“œ ์‚ฌ์šฉ
  3. ๋ ˆ๋ฒจ์„ ํ•˜๋‚˜ ๋”

r=300,000 records fixed length, R=100bytes, B=4096bytes

secondary key, nonordering key field of the file that is V=9 bytes long

→ ๋ณด์กฐ์ธ๋ฑ์Šค ์—†๋‹ค๋ฉด : ์„ ํ˜•ํƒ์ƒ‰, b/2 = 7500/2 = 3750 block accesses

→ ๋ณด์กฐ์ธ๋ฑ์Šค ์žˆ์Œ

  • R(i) = V+P = 9+6 = 15 = ์ธ๋ฑ์Šค ๋ธ”๋Ÿญ์˜ ํฌ๊ธฐ
  • bfr(i) = 4096/15 = 273
  • b(i) = 300000/273 = 1099
  • ์ด์ง„ํƒ์ƒ‰ ์‹œ๊ฐ„ : log 1099 = 11
  • ์ด ์ ‘๊ทผ ๋ธ”๋Ÿญ ์ˆ˜ = 11 index blocks + 1 data block = 12

1. Indexing filed๊ฐ€ key ์ผ๋•Œ

: ์ธ๋ฑ์Šค ํ•„๋“œ๊ฐ€ ์ •๋ ฌ์— ์‚ฌ์šฉ๋˜๋ฉด Primary index, ์ •๋ ฌ์— ์‚ฌ์šฉ๋˜์ง€ ์•Š์€ ํ‚ค๋ผ๋ฉด secondary index(key)

 

2. Indexing filed๊ฐ€ nonkey ์ผ๋•Œ

: ์ธ๋ฑ์Šค ํ•„๋“œ๊ฐ€ ์ •๋ ฌ์— ์‚ฌ์šฉ๋˜๋ฉด Clustering index, ์ •๋ ฌ์— ์‚ฌ์šฉ๋˜์ง€ ์•Š์€ ํ‚ค๋ผ๋ฉด secondary index(nonkey)

<Multilevel indexes>

bfr(i) = the blocking factor for the index

= the fan-out = fo (์ธ๋ฑ์Šค ์ฐจ์ˆ˜)

  • searching a multilevel index = log (๋ฐ‘=fo) (index block ์ˆ˜) ๊ฐœ์˜ ๋ธ”๋Ÿญ ์ ‘๊ทผ ํ•„์š”

ISAM = Indexed Sequential Access Method

 

bfr(i) = 273 = fan-out = fo

the number of first level index blocks b1 = 1099

block accesses = log(273) 1099

 

< Dynamic Multilevel indexes B Trees and B+ Trees >

B trees and B+ trees are special cases of the tree data structure

root node = level 0

search trees : ๋ ˆ์ฝ”๋“œ ๊ฒ€์ƒ‰์„ ๊ฐ€์ด๋“œํ•ด์ฃผ๋Š” ํŠน๋ณ„ํ•œ ํ˜•ํƒœ์˜ ํŠธ๋ฆฌ

  • p=3 → ํฌ์ธํ„ฐ๊ฐ€ 3๊ฐœ์ธ 3์ฐจ ํŠธ๋ฆฌ

B-tree

: ํŠธ๋ฆฌ๊ฐ€ ํ•ญ์ƒ ๊ท ํ˜•์„ ์œ ์ง€ํ•˜๊ณ , ์‚ญ์ œ๋กœ ๋‚ญ๋น„๋˜๋Š” ๊ณต๊ฐ„์ด ์žˆ๋‹ค๋ฉด ์ ˆ๋Œ€ ๊ณผ๋„ํ•ด์ง€์ง€ ์•Š๋„๋ก ํ•˜๋Š” ์ถ”๊ฐ€์ ์ธ ์ œ์•ฝ ์กฐ๊ฑด์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ฐจ์ˆ˜ p์ธ BํŠธ๋ฆฌ์˜ ํŠน์„ฑ

  1. ๋ฃจํŠธ์™€ ๋ฆฌํ”„๋ฅผ ์ œ์™ธํ•œ ๋…ธ๋“œ์˜ ์„œ๋ธŒํŠธ๋ฆฌ ์ˆ˜ : p/2์ด์ƒ ≤ ๊ฐœ์ˆ˜ ≤ p → ์ ์–ด๋„ ๋…ธ๋“œ์˜ ๋ฐ˜์„ ์ฑ„์›Œ์•ผํ•จ
  2. ๋ฃจํŠธ(๋ฆฌํ”„๊ฐ€ ์•„๋‹Œ)์˜ ์„œ๋ธŒํŠธ๋ฆฌ ์ˆ˜ > 2 → ๋ฃจํŠธ ๋…ธ๋“œ๋ถ€ํ„ฐ ๋ถ„๊ธฐ
  3. ๋ชจ๋“  ๋ฆฌํ”„๋Š” ๊ฐ™์€ ๋ ˆ๋ฒจ → ๊ท ํ˜•ํŠธ๋ฆฌ
  4. ํ‚ค ๊ฐ’์˜ ์ˆ˜ : ๋ฆฌํ”„๋Š” (p/2์ด์ƒ - 1) ~ (p-1)
    1. ๋ฆฌํ”„๊ฐ€ ์•„๋‹Œ ๋…ธ๋“œ : ์„œ๋ธŒํŠธ๋ฆฌ - 1
  5. ํ•œ ๋…ธ๋“œ ๋‚ด์˜ ํ‚ค๊ฐ’์€ ์˜ค๋ฆ„์ฐจ์ˆœ ์ •๋ ฌ. → p-์› ํƒ์ƒ‰ํŠธ๋ฆฌ

key ๊ฐœ์ˆ˜ = ํฌ์ธํ„ฐ ๊ฐœ์ˆ˜ - 1

์—ฐ์‚ฐ : ์ง์ ‘ํƒ์ƒ‰ = ํ‚ค๊ฐ’์— ์˜์กดํ•œ ๋ถ„๊ธฐ, ํ•œ ๋…ธ๋“œ ๋ธ”๋Ÿญ ๋‚ด์—์„œ๋Š” ์ˆœ์ฐจํƒ์ƒ‰

์ˆœ์ฐจํƒ์ƒ‰ = ์ค‘์œ„ ์ˆœํšŒ

์‚ฝ์ž… = ํŠธ๋ฆฌ ๊ท ํ˜• ์œ ์ง€, ๋ถ„ํ•  ์‹œ ๋†’์ด ์ฆ๊ฐ€

์‚ญ์ œ = ๊ท ํ˜• ์œ ์ง€, ํ•ฉ๋ณ‘ ์‹œ ๋†’์ด ๊ฐ์†Œ

 

B+ tree : BํŠธ๋ฆฌ์˜ ๋ณ€ํ˜•

data pointers are stored only at the leaf nodes of the tree

leaf node๋“ค์€ ์—ฐ๊ฒฐ๋˜์–ด์žˆ๋‹ค

  • ์ธ๋ฑ์Šค ์„ธํŠธ : ๋‚ด๋ถ€ ๋…ธ๋“œ (์œ„์ชฝ์˜ ํŠธ๋ฆฌ)
  • ์ˆœ์ฐจ ์„ธํŠธ : ๋ฆฌํ”„ ๋…ธ๋“œ (๋ชจ๋“  ํ‚ค ๊ฐ’๋“ค์„ ํฌํ•จํ•˜๋ฉฐ, ์ˆœ์ฐจ์ ์œผ๋กœ ์—ฐ๊ฒฐ)

B+ ํŠธ๋ฆฌ์˜ ํŠน์„ฑ

  1. ๋ฃจํŠธ์˜ ์„œ๋ธŒํŠธ๋ฆฌ : 0, 2, p/2 ~ p
  2. ๋…ธ๋“œ์˜ ์„œ๋ธŒํŠธ๋ฆฌ (๋ฃจํŠธ, ๋ฆฌํ”„ ์ œ์™ธ) : p/2 ~ p
  3. ๋ชจ๋“  ๋ฆฌํ”„๋Š” ๊ฐ™์€ ๋ ˆ๋ฒจ
  4. ๋ฆฌํ”„๊ฐ€ ์•„๋‹Œ ๋…ธ๋“œ์˜ ํ‚ท๊ฐ’ ์ˆ˜ : ์„œ๋ธŒํŠธ๋ฆฌ - 1
  5. ๋ฆฌํ”„ ๋…ธ๋“œ : ๋ฐ์ดํ„ฐ ํŒŒ์ผ์˜ ์ˆœ์ฐจ์„ธํŠธ(๋ฆฌ์ŠคํŠธ๋กœ ์—ฐ๊ฒฐ)

์—ฐ์‚ฐ

ํƒ์ƒ‰ : ์ธ๋ฑ์Šค ์„ธํŠธ = p-์› ํƒ์ƒ‰ํŠธ๋ฆฌ, ๋ฆฌํ”„์—์„œ ๊ฒ€์ƒ‰

์‚ฝ์ž… : BํŠธ๋ฆฌ์™€ ์œ ์‚ฌ, ์˜ค๋ฒ„ํ”Œ๋กœ์šฐ→๋ถ„์—ด→ ๋ถ€๋ชจ๋…ธ๋“œ์™€ ๋ฆฌํ”„ ๋‘˜ ๋‹ค์— ํ‚ค๊ฐ’ ์กด์žฌ

์‚ญ์ œ : ๋ฆฌํ”„์—์„œ๋งŒ ์‚ญ์ œ or ํ•ฉ๋ณ‘ or ์žฌ๋ถ„๋ฐฐ

 

Index Creation (in sql)

 

CREATE [Unique] INDEX <index name>

ON <table name> (~~~)

[CLUSTER];

ex)

CREATE INDEX Dno_index

ON EMPLOYEE (Dno)

CLUSTER;