๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๐Ÿ“š ์ „๊ณต ๊ณต๋ถ€/DB๊ธฐ์ดˆ

[DB] Disk, ํŒŒ์ผ๊ตฌ์กฐ, Hashing, ์ €์žฅ์†Œ ์„ค๊ณ„

DBs are stored physicaly as files of records stored on magnetic disks.

 

storage hierarchy

  • primary storage : ์ฃผ๊ธฐ์–ต์žฅ์น˜
    • operated on directly by the CPU
  • Secondary storage : ๋ณด์กฐ๊ธฐ์–ต์žฅ์น˜, ์˜จ๋ผ์ธ
  • Tertiary storage : ์˜คํ”„๋ผ์ธ ์•„์นด์ด๋ธŒ (์˜จ๋ผ์ธ์œผ๋กœ ๋กœ๋“œ ํ•„์š”)

Memory Hierarchies and storage devices

์ƒ์œ„ ์ €์žฅ์†Œ : ๋น ๋ฆ„, ์ž‘์Œ, ๋น„์Œˆ

ํ•˜์œ„ ์ €์žฅ์†Œ : ๋Š๋ฆผ, ํผ, ์Œˆ

Storage of DBs

  • ๋ฌผ๋ฆฌ์  DB ์„ค๊ณ„
  • files of records : locate them efficiently

Primary FIle organizations

  1. heap file(unordered file) : ์ฒจ๊ฐ€ ์ˆœ์„œ๋Œ€๋กœ ์ €์žฅ (์‚ฝ์ž… ์‰ฌ์›€)
  2. sorted file (sequential file) : ํ‚ค๊ฐ’์— ๋”ฐ๋ผ ์ •๋ ฌ
  3. hashed file : ํ•ด์‹œ ํ•จ์ˆ˜๋กœ ์ €์žฅ์œ„์น˜ ๊ฒฐ์ •

Secondary File organizations

  • auxiliary access structure (๋ณด์กฐ ์ ‘๊ทผ ๊ตฌ์กฐ)
  • ๋ณด์กฐ ํ•„๋“œ์˜ ํšจ์œจ์ ์ธ ์ ‘๊ทผ. ์ฃผ ํŒŒ์ผ์—์„œ ์‚ฌ์šฉ๋œ.

<secondary storage devices>

hardware description of disk

  • fixed angle : ๊ณ ์ •๊ฐ, ๊ฐ„ํŽธํ•˜์—ฌ ์ž์ฃผ ์‚ฌ์šฉ๋จ
  • uniform recording density : ์ผ์ •ํ•œ density๋กœ ๋ถ„๋ฆฌ, ๊ฐ™์€ ๊ฐ๋„์ง€๋งŒ ๊ฐฏ์ˆ˜๊ฐ€ ๋‹ค๋ฆ„

์ž๊ธฐ๋””์ŠคํŠธ (magnetic disk) : ์ง์ ‘ ์ ‘๊ทผ ์ €์žฅ ์žฅ์น˜ (DASD : direct access storage device)

  • ํŠธ๋ž™ ์ˆ˜ ๋งŒํผ์˜ ์‹ค๋ฆฐ๋”๊ฐ€ ์กด์žฌ.
  • blocks / pages : ๋””์Šคํฌ ํฌ๋งท ์ค‘ OS์— ์˜ํ•ด ์„ค์ •๋œ ํŠธ๋ž™์˜ ๋™์ผํฌ๊ธฐ ๋ถ„ํ• 
  • ๋ธ”๋ก์€ interblock gaps(๋ธ”๋ก ๊ฐ„๊ฒฉ) ๋กœ ๊ตฌ๋ถ„๋จ

๋””์Šคํฌ ์ฃผ์†Œ๋ฒ• : addressing

  • random access addressable device
  • transfer unit between main memory and disk
  1. ์‹ค๋ฆฐ๋” ์ฃผ์†Œ๋ฒ• : ๊ฐ ํŠธ๋ž™์˜ ๊ธฐ์–ต์šฉ๋Ÿ‰์€ ์ผ์ •ํ•˜๊ณ  ๊ธฐ๋ก ๋ฐ€๋„๊ฐ€ ๋‹ค๋ฆ„
  2. ์„นํ„ฐ ๋ฐฉ๋ฒ• : ์„นํ„ฐ๋ฒˆํ˜ธ๋ฅผ ์ฃผ๊ณ  ์„นํ„ฐ ์ˆ˜์˜ ์ฆ๊ฐ€์— ๋”ฐ๋ผ ์—ฐ์†๋œ ์„นํ„ฐ(ํด๋Ÿฌ์Šคํ„ฐ)๋ฅผ ๋‹จ์œ„๋กœ ์ž…์ถœ๋ ฅ

buffer

cluster : ๋‹ค์ˆ˜์˜ ์—ฐ์† ์„นํ„ฐ

extent : ์—ฐ์†๋œ ํด๋Ÿฌ์Šคํ„ฐ, ํ•œ ๊ตฌ์—ญ. ๋ถ„์‚ฐ๋„์™€ ๋น„๋ก€

Disk controller : SCSI, SATA, SAS

data transfer operation

  • ์—ฐ์‚ฐ ์‹œ๊ฐ„ = seek time + (head ํ™œ๋™์‹œ๊ฐ„ = ๊ฑฐ์˜ 0์ด๋ผ์„œ ๊ณ ๋ ค์•ˆํ•จ) + rd + btt

ํšŒ์ „์ง€์—ฐ (rd) ⇒ ์ตœ๋Œ€:๋ฐฉ๊ธˆ ์ง€๋‚ฌ์„๋–„, ์ตœ์†Œ: ๋ฐ”๋กœ ๋‹ค์Œ์ผ๋•Œ

Bulk transfer : ๋Œ€๋Ÿ‰ ์ „์†ก = btr

  • ์—ฐ์†๋œ ๋ธ”๋Ÿญ ์ „์†ก ์‹œ ์ถ”๊ฐ€์ ์ธ btt๋งŒ ์†Œ์š”.

Making Data access more efficient on disk : ํšจ์œจ์„ฑ ํ–ฅ์ƒ

  1. ๋ฐ์ดํ„ฐ์˜ ๋ฒ„ํผ๋ง
  2. ๋””์Šคํฌ์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ์˜ ์ ์ ˆํ•œ ๊ตฌ์„ฑ
  3. ์š”์ฒญ์— ์•ž์„œ์„œ ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ
  4. I/O์š”์ฒญ์˜ ์ ์ ˆํ•œ ์Šค์ผ€์ค„๋ง
  5. ๋กœ๊ทธ ๋””์Šคํฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์“ฐ๊ธฐ๋ฅผ ์ผ์‹œ์ค‘์ง€
  6. ๋ณต๊ตฌ๋ฅผ ์œ„ํ•œ SSD๋‚˜ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ

Solid State Device (SSD) storage

  • use flash memories as an intermediate layer between main memory and HDDs

HDD : ๋ฐ์ดํ„ฐ ์—ฐ์†์„ฑ์ด ์ค‘์š”, ์—…๋ฐ์ดํŠธ ์‹œ ๋ฎ์–ด์“ฐ๊ธฐ ๋จ

SSD : ์–ด๋–ค ์ฃผ์†Œ์ด๋“  ์ ‘๊ทผ์‹œ๊ฐ„์ด ๋™์ผ, ์ฝ๊ณ ์“ฐ๋Š” ํšŸ์ˆ˜๊ฐ€ ์ •ํ•ด์ ธ์žˆ์œผ๋ฏ€๋กœ ์ˆ˜๋ช…์„ ์œ„ํ•ด ๊ณจ๊ณ ๋ฃจ ์—…๋ฐ์ดํŠธ ํ•„์š”. ๋ฎ์–ด์“ฐ๊ธฐ ์—†๊ณ  ์ƒˆ๋กœ์šด ์žฅ์†Œ์— writing

Magnetic Tape Storage devices

  • disks : random access
  • magnetic tape : sequential access, blocking

๊ฐ€๋กœ : tracks, ์„ธ๋กœ : frames

  • IBG = inter block gap = inter record gap, IBG๋ณด๋‹ค record์‹œ๊ฐ„์ด ์ž‘์•„์งˆ์ˆ˜๋ก ๋น„ํšจ์œจ์ .

CD-ROM : ๋ฐ˜์‚ฌ์œจ์— ๋”ฐ๋ผ ๊ฒฐ์ •.

  • ๋‹จ์ผ ๋‚˜์„ ํ˜• ํŠธ๋ž™, ์ €์žฅ๋ฐ€๋„ ์ผ์ •, ๊ท ์ผ ์„ ํ˜• ์†๋„

DVD : cd๋ณด๋‹ค ๋” ํฐ ์šฉ๋Ÿ‰ ์ €์žฅ (๋ธ”๋ฃจ๋ ˆ์ด)

< Buffering of blocks > : ๋ฏธ๋ฆฌ ์‚ฌ์šฉํ•  ๋ธ”๋Ÿญ๋“ค์„ ์•Œ ๋•Œ ๋ฏธ๋ฆฌ ๋ฒ„ํผ๋ฅผ ํ• ๋‹นํ•˜๋ฉด ๋นจ๋ฆฌ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ

  • ๋ฒ„ํผ๋ฅผ ๊ฐ–๋‹ค๋†“๊ณ  ๊ทธ ๋ ˆ์ฝ”๋“œ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ๋™์•ˆ ๋‹ค๋ฅธ ๋ฒ„ํผ๋„ ์‹คํ–‰๊ฐ€๋Šฅ(๋ณ‘๋ ฌ ์‹œํ–‰)
  • double buffering

Buffer Management

  • buffer manager : a sw component of a DBMS
  • buffer pool
  • pin count
  • dirty bit

Buffer replacements strategies : ๋ฒ„ํผ ๊ต์ฒด ์ „๋žต

  • Least recently used : LRU
  • clock policy
  • first in first out : FIFO
  • most recently used : MRU

<Placing file records on disk>

records and record types

file, fixed length records, viriable length record

record blocking and spanned vs unspanned

์‹ ์žฅ ๋ฐฉ์‹ : ๋ ˆ์ฝ”๋“œ๋ฅผ ์ชผ๊ฐœ์„œ ๋นˆํ‹ˆ์—†์ด / ๋น„์‹ ์žฅ ๋ฐฉ์‹ : ๋ ˆ์ฝ”๋“œ ๋ณด์กด, ๊ณต๊ฐ„๋‚ญ๋น„ ์žˆ์Œ

๊ณ ์ •๊ธธ์ด blocking factor : bfr = B/R ์ดํ•˜์˜ ์ตœ๋Œ€์ •์ˆ˜

๊ฐ€๋ณ€๊ธธ์ด bfr : ๋ธ”๋ก์— ์žˆ๋Š” ๋ ˆ์ฝ”๋“œ ์ˆ˜์˜ ํ‰๊ท 

  • ํ•„์š”ํ•œ ๋ธ”๋ก ์ˆ˜ b = ๋ ˆ์ฝ”๋“œ ์ˆ˜ / bfr

allocating file blocks on disk : ๋ธ”๋ก ํ• ๋‹น

  • contiguous allocation
  • linked allocation
  • clusters
  • indexed allocation

file headers / file descriptor

  • ๋ชฉํ‘œ : ์ตœ์†Œ์˜ ์ „์†ก์‹œ๊ฐ„์„ ์œ„ํ•œ ๋ธ”๋ก ๋ฐฐ์น˜

<Operations on files> : ํŒŒ์ผ ์—ฐ์‚ฐ

  • Retrieval : ๊ฒ€์ƒ‰
  • Update : ๊ฐฑ์‹ 
  • Open/close
  • record at a time (ํ•œ๋ฒˆ์— ํ•œ ๋ ˆ์ฝ”๋“œ์”ฉ๋งŒ ์—ฐ์‚ฐ๋˜๋Š”๊ฒƒ) : reset,find,read,findnext,delete,modify,insert
  • set at a time (ํ•œ๋ฒˆ์— ์—ฌ๋Ÿฌ๊ฐœ) : findall, find n, findOrdered, Reorganize
  • Scan : find + findNext + read

<Files of unordered records (Heap files)>

heap or pile file : ์‚ฝ์ž…๋œ ์ˆœ์„œ๋Œ€๋กœ ์ €์žฅ. ์„ ํ˜•ํƒ์ƒ‰(๋ธ”๋Ÿญ์ˆ˜/2)

  • ์‚ฝ์ž… : ๋ฒ„ํผ์— ๋ธ”๋Ÿญ ๋ณต์‚ฌ, ์‚ฝ์ž…, ๋””์Šคํฌ์— ๋‹ค์‹œ ์“ฐ๊ธฐ
  • ์‚ญ์ œ : deletion marker ์‚ฌ์šฉ

To read all records in order → external sorting method (์ž˜๋ผ์„œ ๋ถ€๋ถ„์ •๋ ฌ ํ›„ merge)

Relative or direct file

  • disk ์‹ค๋ฆฐ๋”๋ž‘ ๋น„์Šทํ•œ ์—ฐ์‚ฐ

<Files of ordered records (sorted files)>

Ordered or sequentail files : ์ˆœ์ฐจํŒŒ์ผ

  • ์ด์ง„ํƒ์ƒ‰ : log2 (๋ธ”๋Ÿญ ์ˆ˜)
  • ์„ ํ˜•ํƒ์ƒ‰ : ๋ธ”๋Ÿญ์ˆ˜/2 (์–˜๊ฐ€ ํ›จ์”ฌ ํฐ ์‹œ๊ฐ„)

Searchign in ordered files

  • ํšจ์œจ์ , ์ˆœ์ฐจ๊ธฐ์ค€ ์•„๋‹Œ๊ฑธ๋กœ ๊ฒ€์ƒ‰ํ•˜๋ฉด ์˜๋ฏธ์—†์Œ

Insertion and deletion in ์ˆœ์ฐจํŒŒ์ผ

  • ordering field ๊ฐ’ ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฝ์ž…
  • ๊ณต๊ฐ„๋ถ€์กฑ ์‹œ ์ค‘๊ฐ„์ค‘๊ฐ„์˜ ์—ฌ๋ถ„๊ณต๊ฐ„(reserved space)์— ์‚ฝ์ž…ํ•˜๊ฑฐ๋‚˜ ์ƒˆ ๊ณต๊ฐ„์— ์ž„์‹œ๋ณด๊ด€
  • ์‚ญ์ œ ์‹œ deletion marker, periodic reorganizaiton

Using a Transaction File

  • unordered file : overflow of transaction file

์ˆ˜์ •๋‚ด์šฉ์„ ๋ณด๊ด€ํ•จ. ์‚ฝ์ž…์€ ํšจ์œจ์ ์ด์ง€๋งŒ ๊ฒ€์ƒ‰์€ ๋ณต์žก

  • ๋น„์ •๋ ฌ ํ•„๋“œ ์ˆ˜์ • : ๋ ˆ์ฝ”๋“œ ๋ณ€๊ฒฝํ›„ ๋ฎ์–ด์“ฐ๊ธฐ
  • ์ •๋ ฌ ํ•„๋“œ ์ˆ˜์ • : old ๋ ˆ์ฝ”๋“œ๋ฅผ ์ง€์šฐ๊ณ  new ๋ ˆ์ฝ”๋“œ๋ฅผ ์‚ฝ์ž…

Reorganization

  1. sort the records in the overflow file
  2. merge them with the master file

< Average Access Times for a file of b Blocks under basic file organizations >

<Hashing Techniques>

ํ‚ค๊ฐ’ ํ•จ์ˆ˜๋กœ ์ฃผ์†Œ๋ฅผ ๊ตฌํ•จ.

search for the record within the block can be carried out in a main memory buffer

neeed only a single block access

Internal Hashing

Collision → open addressing, chaining, multiple hashing

Collision Resolutions

  • open addressing : ์ž…๋ ฅํ•œ ๋‹ค์Œ์ฃผ์†Œ์— ์ €์žฅ
  • chaining : ๋ณ„๋„์˜ ๋ธ”๋Ÿญํ• ๋‹น ํ›„ ์—ฐ๊ฒฐ๋ฆฌ์ŠคํŠธ๋กœ ์—ฐ๊ฒฐ
  • multiple hashing : applies a second hash function
  • ์ €์žฅ์„ ์ ๊ฒŒํ• ์ˆ˜๋ก ์ถฉ๋Œํ™•๋ฅ ์ด ๋‚ฎ์•„์ง€๋ฏ€๋กœ ์—ฌ์œ ๊ณต๊ฐ„์„ ๋‚จ๊ธฐ๊ธฐ

External Hashing for Disk files

  • hashing foir disk files is called external hashing
  • buckets
  • key → a relative bucket number
  • use a variation of chaining in each bucket

์ƒ๋Œ€๋ฒ„ํ‚ท๋ฒˆํ˜ธ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋””์Šคํฌ์ƒ์˜ ์‹ค์ œ ์ฃผ์†Œ ์ฐพ๊ธฐ

operations on ์™ธ๋ถ€ ํ•ด์‹ฑ

  • ๊ฒ€์ƒ‰ : ๋น„์ •๋ ฌ ํŒŒ์ผ์—๋Š” ๋น„์‹ผ ์—ฐ์‚ฐ. overflow ๋ผ์ธ์„ ๋”ฐ๋ผ์„œ ๊ฐ
  • ์‚ญ์ œ : ๋ฒ„ํ‚ท์—์„œ ์ง€์šฐ๊ณ  ์˜ค๋ฒ„ํ”Œ๋กœ์šฐ ๋ผ์ธ ๋‹ค์Œ๊ฐ’์„ ์ฑ„์›Œ๋„ฃ๊ธฐ

  • ์ˆ˜์ • : ๊ฒ€์ƒ‰ ํ›„ ํ•ด์‹œํ•„๋“œ๊ฐ€ ์•„๋‹ˆ๋ฉด ๊ทธ๋ƒฅ ๋ฎ์–ด์“ฐ๊ณ , ํ•ด์‹œํ•„๋“œ์ผ๋•Œ๋Š” ์‚ญ์ œ ํ›„ ์‚ฝ์ž…

Static hashing : a fixed number of buckets

reorganize : change the number of blocks and ์ƒˆ๋กœ์šด ํ•ด์‹œํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉ → ๋ถ€๋‹ด์ด ํฐ ์—ฐ์‚ฐ

Hashing Techniques that allow dynamic file expansion

  • extendible hashing
  • linear hashing
  • dynamic hashing

์•ก์„ธ์Šค ๊ตฌ์กฐ๋Š” ๋ ˆ์ฝ”๋“œ์˜ ํ•ด์‹œ ๊ฐ’์˜ ์ด์ง„ ํ‘œํ˜„์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

Extendible hashing

  • ์ด์ง„์ˆ˜ ์ฃผ์†Œ ์‚ฌ์šฉ
  • d=global depth of the directory, d` = local depth (≤ d)

d = d` and overflow : ๋”๋ธ”๋ง

d > d` for all the buckets : ํ•ด๋ธ”๋ง

 

๋ ˆ์ฝ”๋“œ์˜ ๋ชจ์กฐํ‚ค(pseudo key) : key๊ฐ’์„ ์ด์ง„ ์ •๋ณด๋กœ ๋ฐ”๊ฟ”์ค„๋•Œ ์‚ฌ์šฉ

๐Ÿ’ก ํ™•์žฅํ˜• ํ•ด์‹ฑ์˜ ์žฅ๋‹จ์ 

  • ์žฅ์  : ์„ฑ๋Šฅ์ €ํ•˜๊ฐ€ ์—†๊ณ , ๋ฏธ๋ฆฌ ๊ณต๊ฐ„ํ• ๋‹นํ•  ํ•„์š” ์—†๊ณ , ์„ฑ๋Šฅ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์ด ์ ์Œ
  • ๋‹จ์  : ์ •์  ํ•ด์‹ฑ์—์„œ๋Š” ํ•œ๋ธ”๋Ÿญ ์ ‘๊ทผํ•˜๋ฉด ๋˜์ง€๋งŒ ํ™•์žฅํ˜•์€ 2๋ธ”๋Ÿญ์— ์ ‘๊ทผํ•ด์•ผํ•จ (h(K)→ ๋””๋ ‰ํ† ๋ฆฌ→ ๋ฒ„ํ‚ท)

 

Dynamic hashing

Linear hashing

  • overflow : split into two buckets. ์˜ค๋ฒ„ํ”Œ๋กœ๋œ ๋ฒ„ํ‚ท์˜ ๋ ˆ์ฝ”๋“œ๋Š” ๋‹ค๋ฅธ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‘ ๋ฒ„ํ‚ท ์‚ฌ์ด์— ๋ถ„๋ฐฐ.
  • ์‹ค์ œ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ 2๋ฐฐ๊ฐ€ ๋˜๋Š”๊ฒƒ์€ ์•„๋‹˜

< Parallelizing Disk access using RAID >

  • RAID : redundant arrays of inexpensive/independent disks
  • ์ž‘์€ ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋””์Šคํฌ๋ฅผ ์ด์–ด์„œ ํฐ ํ•˜๋‚˜์˜ ์ €์žฅ์†Œ์ฒ˜๋Ÿผ ์‚ฌ์šฉ.
  • Data striping : ๋””์Šคํฌ ์„ฑ๋Šฅ์„ ๊ฐœ์„ . (์—ฌ๋Ÿฌ ๋””์Šคํฌ์— ๋ถ„ํ• ์ €์žฅ)
    • bit level
    • block level

Improving Reliability with RAID : ์‹ ๋ขฐ๋„ ๋†’์ด๊ธฐ

  • ๋””์Šคํฌ๊ฐ€ ๋งŽ์„์ˆ˜๋ก ๊ณ ์žฅ์œจ์ด ๋†’๋‹ค.
  • MTBF (Mean time between failures) of a disk = 22.8 years
  • MTBF of 100 disk drives = 83.8 day
  • ์†”๋ฃจ์…˜ : ๋ฐ์ดํ„ฐ๋ฅผ ์ค‘๋ณต์‹œํ‚ค๋ฉด ๋จ. ๊ทผ๋ฐ ์ถ”๊ฐ€์  ์—ฐ์‚ฐ์ด ํ•„์š”ํ•˜๊ณ  ๋น„์šฉ์ด ๋งŽ์ด ๋“ ๋‹ค.

Improving Performance with RAID

  • the technique of data striping achieves ๋” ๋†’์€ ์ „์†ก ์†๋„(์ „์†ก์œจ, tr)

RAID organizations and levels

  • the two factors of granularitiy of data interleaving(striping)
  • ์ปดํ“จํ„ฐ๊ฐ€ ์ค‘๋ณต์ •๋ณด๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํŒจํ„ด

 

RAID level 0~6