DynamoDB Notes

Scaling, optimization, etc. are all handled by DynamoDB

Types of NoSQL

  • key-value databases
  • document databases
  • column-family databases
  • Graph databases

DynamoDB is a mix of key-value and document database

Datamodel

items are like rows

Attributes are like columns

Keys -- single or composite

An item can be at most 400 KB, counting attribute names and values, so no single attribute can exceed that

Creating tables

Keys

  • Hash Primary Key

    • Unique, required, single attribute
    • DynamoDB creates unordered index
      • You can't run range queries on an unordered index
    • Use when you know the id/key
  • Hash and Range Primary Key

    • Key is comprised of two attributes
    • Combination must be unique
    • Unordered index + sorted range index
    • Best for query, grouping scenarios
    • example: client_id as hash and order_id as range
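
As a rough sketch of the client_id / order_id example, the table definition could be written as the parameter dict below, in the shape that boto3's create_table call accepts. The table name "orders", the string attribute types, and the throughput numbers are assumptions made for illustration, not values from these notes.

```python
# Hypothetical table definition for the client_id / order_id example.
# Table name, attribute types, and throughput values are assumptions.
orders_table = {
    "TableName": "orders",
    "AttributeDefinitions": [
        {"AttributeName": "client_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "client_id", "KeyType": "HASH"},   # hash key
        {"AttributeName": "order_id", "KeyType": "RANGE"},   # range key
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def key_types(table):
    """Map each key attribute to its role (HASH or RANGE)."""
    return {k["AttributeName"]: k["KeyType"] for k in table["KeySchema"]}


print(key_types(orders_table))
```

With boto3 installed and credentials configured, the dict would be passed as client.create_table(**orders_table). A hash-only table would simply drop the RANGE entry and its attribute definition.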

Data Types

  • Scalar
    • string, number, binary, boolean, null
  • multi-value types
    • String, number, and binary sets
  • Document types
    • List (array)
    • Map (Object) (should use this for OJ Stripe object)
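
To make the type list concrete, here is a minimal sketch of how Python values could be encoded in DynamoDB's typed attribute-value format (S, N, B, BOOL, NULL, SS/NS/BS, L, M). The helper name to_attribute_value and the sample charge object are hypothetical; boto3 ships its own serializer, which this does not try to replace.

```python
def to_attribute_value(value):
    """Encode a Python value in DynamoDB's typed attribute-value format."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        raise TypeError("a set must hold only strings, only numbers, or only bytes")
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical nested object stored as a Map attribute.
charge = {"id": "ch_1", "amount": 500, "paid": True, "tags": {"a", "b"}, "note": None}
print(to_attribute_value(charge))
```

A nested object such as the one above ends up as a single Map attribute, which is why Map suits structured payloads better than flattening them into many top-level attributes.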

Throughput units

  • Reads in blocks of 4 KB

  • Writes in blocks of 1 KB

  • Read capacity units per second

  • Write capacity units per second

  • Eventually consistent vs strongly consistent reads

  • Impact of secondary indexes
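
The block sizes above translate into capacity units roughly as follows. This is a small sketch under the usual rule that an eventually consistent read costs half of a strongly consistent one; the function names are made up for illustration.

```python
import math


def read_capacity_units(item_bytes, strongly_consistent=False):
    """Capacity units consumed by one read of an item of the given size."""
    blocks = max(1, math.ceil(item_bytes / 4096))  # reads are billed per 4 KB block
    return blocks if strongly_consistent else blocks / 2


def write_capacity_units(item_bytes):
    """Capacity units consumed by one write of an item of the given size."""
    return max(1, math.ceil(item_bytes / 1024))  # writes are billed per 1 KB block


size = 6 * 1024  # a 6 KB item
print(read_capacity_units(size, strongly_consistent=True))
print(read_capacity_units(size))
print(write_capacity_units(size))
```

Multiplying these per-request figures by the expected requests per second gives a first estimate of the read and write capacity to provision. Writes to a table with secondary indexes also consume capacity for each index they touch.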

Partitioning

  • Single partition can hold ~10 GB of data
  • Throughput spread across partitions
  • DynamoDB automatically partitions based on hash key
  • each partition is limited to about 3000 read capacity units and 1000 write capacity units
  • DynamoDB will never reduce the number of partitions once it has split them
  • Unused capacity reserved for bursts
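
Using the approximate limits listed above (about 10 GB, 3000 read units, and 1000 write units per partition), a back-of-the-envelope partition estimate might look like the sketch below. The real partitioning logic is internal to DynamoDB, so treat this as a rule of thumb; the function names are hypothetical.

```python
import math


def estimate_partitions(table_gb, rcu, wcu):
    """Rough partition count from table size and provisioned throughput."""
    by_size = math.ceil(table_gb / 10)                  # ~10 GB per partition
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)  # ~3000 RCU / 1000 WCU each
    return max(by_size, by_throughput, 1)


def per_partition_throughput(rcu, wcu, partitions):
    """Provisioned throughput is spread evenly across partitions."""
    return rcu / partitions, wcu / partitions


parts = estimate_partitions(table_gb=25, rcu=3000, wcu=1000)
print(parts)
print(per_partition_throughput(3000, 1000, parts))
```

The second function shows why hot keys hurt: each partition only gets its share of the table's throughput, so traffic concentrated on one hash key can be throttled even when the table as a whole is under its limit.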

Table design

  • Avoid hot keys
    • really important to avoid hot keys
    • Need uniform access
    • Random extension to hash key
  • Time series data in multiple tables
    • put into tables based on monthly or weekly data
  • Test applications ahead of time
  • Store large items elsewhere
  • Use caching solutions for popular items
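
Two of the ideas above, the random extension to the hash key and per-month tables for time series data, can be sketched as small helpers. The shard count, the "#" separator, and the table naming scheme are all assumptions chosen for illustration.

```python
import datetime
import random

SHARDS = 10  # assumed number of suffixes to spread one hot key across


def sharded_key(base_key, shards=SHARDS):
    """Write side: append a random suffix so writes land on different hash keys."""
    return f"{base_key}#{random.randrange(shards)}"


def all_shard_keys(base_key, shards=SHARDS):
    """Read side: every suffix must be queried and the results merged."""
    return [f"{base_key}#{n}" for n in range(shards)]


def monthly_table(prefix, day):
    """Name of the table holding time series data for the month of `day`."""
    return f"{prefix}_{day:%Y_%m}"


print(sharded_key("2015-03-09"))
print(all_shard_keys("2015-03-09", shards=3))
print(monthly_table("events", datetime.date(2015, 3, 9)))
```

The trade-off of the random suffix is on the read path: fetching everything for one logical key now takes one query per shard. Per-month tables make it possible to lower throughput on, or drop, old tables without touching current data.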

Querying

  • GetItem

    • uses primary key
    • eventually consistent by default
  • Query

    • Find items via primary key attribute(s) for table or index
    • Retrieved in sorted order when using range key
  • Scan

    • Reads every item in a table or index
    • slow as table grows
    • you can run parallel scans
  • Filters and pagination support for query or scan
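
Pagination for Query and Scan works by passing the LastEvaluatedKey from one response as the ExclusiveStartKey of the next request. The loop below is a minimal sketch of that pattern; fetch_all is a hypothetical helper, and fake_scan is a stand-in that imitates the response shape so the example runs without AWS access.

```python
def fetch_all(page_fn, **params):
    """Collect items across pages by following LastEvaluatedKey."""
    items = []
    while True:
        page = page_fn(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:  # no more pages
            return items
        params["ExclusiveStartKey"] = last_key


def fake_scan(ExclusiveStartKey=None, Limit=2, **_):
    """Stand-in for a real scan call: five items, returned Limit at a time."""
    data = [{"id": n} for n in range(5)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    chunk = data[start:start + Limit]
    page = {"Items": chunk}
    if start + Limit < len(data):
        page["LastEvaluatedKey"] = chunk[-1]
    return page


print(fetch_all(fake_scan, Limit=2))
```

With boto3, page_fn would be client.scan or client.query and params would carry TableName and any key conditions. A parallel scan runs the same loop once per worker, each with its own Segment value out of TotalSegments.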

Secondary Indexes

  • Alternate keys for querying and scanning
  • up to 5 local indexes per source table; global indexes have a higher default quota (20)
  • contains all or subset of attributes
  • Automatically maintained
  • uses provisioned throughput
    • A global index is provisioned its own capacity units; a local index consumes the table's
  • no size limit

Global Secondary Index

  • For querying non-key attributes
  • Hash key or hash and range key
    • different view into the data
  • projected attributes get copied into the index
  • query or scan only
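
As an illustration of the points above, a global secondary index could be described with an entry like the one built below, in the shape boto3 expects under GlobalSecondaryIndexes. The helper name global_index, the index name, and the attribute names are hypothetical.

```python
def global_index(name, hash_attr, range_attr=None, include=None, rcu=5, wcu=5):
    """Build one GlobalSecondaryIndexes entry for a create_table request."""
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr:
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
    if include:
        # Copy only the listed non-key attributes into the index.
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(include)}
    else:
        # Copy every attribute into the index.
        projection = {"ProjectionType": "ALL"}
    return {
        "IndexName": name,
        "KeySchema": key_schema,
        "Projection": projection,
        # A global index carries its own throughput settings.
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


# Hypothetical index: look up orders by status, newest first via order_date.
print(global_index("by_status", "status", "order_date", include=["total"]))
```

Every attribute named in the index key schema also has to appear in the table's AttributeDefinitions. Projecting fewer attributes keeps the index smaller and cheaper to write, at the cost of not being able to read the others from the index.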

Local Secondary Index

  • Alternate range key for hash key
  • For each hash key, 10GB max
  • Applied when creating the table
  • Basically a different index
  • Projected attributes copied into the index
  • Query or scan only
  • throughput taken from the main table throughput
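
A local secondary index entry looks similar but keeps the table's hash key, swaps in an alternate range key, and has no throughput settings of its own. The sketch below uses hypothetical names (local_index, by_order_date, order_date) and shows only the part of the create_table request that concerns the index.

```python
def local_index(name, table_hash_attr, alt_range_attr, include=None):
    """Build one LocalSecondaryIndexes entry (same hash key, alternate range key)."""
    if include:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(include)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": table_hash_attr, "KeyType": "HASH"},
            {"AttributeName": alt_range_attr, "KeyType": "RANGE"},
        ],
        "Projection": projection,
        # No ProvisionedThroughput here: reads and writes draw on the table's.
    }


# Partial create_table parameters; KeySchema, AttributeDefinitions and
# ProvisionedThroughput for the table itself are omitted for brevity.
create_params = {
    "TableName": "orders",
    "LocalSecondaryIndexes": [
        local_index("by_order_date", "client_id", "order_date", include=["total"]),
    ],
}
print(create_params)
```

Because local indexes are part of the create request, they have to be planned up front, and all items sharing one hash key, together with their index entries, must fit within the 10 GB limit noted above.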