DynamoDB Notes
Scaling, optimization, etc. are all handled by DynamoDB
Types of NoSQL databases
- key-value databases
- document databases
- column-family databases
- Graph databases
DynamoDB is a mix of a key-value and a document database
Data model
Items are like rows
Attributes are like columns
Keys are either single-attribute or composite
An item (all of its attributes together) can hold at most 400 KB of data
Creating tables
Keys
- Hash Primary Key
  - Unique, required, single attribute
  - DynamoDB creates an unordered index
  - You can't run range queries on an unordered index, only exact-key lookups
  - Use when you know the id/key
- Hash and Range Primary Key
  - Key consists of two attributes
  - Combination must be unique
  - Unordered index on the hash attribute + sorted index on the range attribute
  - Best for query and grouping scenarios
  - Example: client_id as hash and order_id as range
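A minimal sketch of the client_id/order_id example as table-creation parameters. The dictionary follows the shape that boto3's `create_table` accepts; the helper name `table_params`, the table name `orders`, the string attribute type, and the capacity values are assumptions made for illustration.

```python
def table_params(table_name, hash_attr, range_attr=None, rcu=5, wcu=5):
    """Build keyword arguments in the shape expected by create_table.

    Hypothetical helper: both key attributes are assumed to be strings ("S").
    """
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    attr_defs = [{"AttributeName": hash_attr, "AttributeType": "S"}]
    if range_attr is not None:
        # A composite key adds a RANGE element after the HASH element.
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
        attr_defs.append({"AttributeName": range_attr, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attr_defs,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        },
    }


# Hash-only table vs. hash-and-range table.
users = table_params("users", "user_id")
orders = table_params("orders", "client_id", "order_id")
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("dynamodb").create_table(**orders)`.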
Data Types
- Scalar
  - String, number, binary, boolean, null
- Multi-value types
  - String, number, and binary sets
- Document types
  - List (array)
  - Map (object) (should use this for OJ Stripe object)
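To make the type list concrete, here is a rough sketch that encodes plain Python values into the typed attribute-value format used by the low-level DynamoDB API (`S`, `N`, `B`, `BOOL`, `NULL`, `SS`, `NS`, `BS`, `L`, `M`). The function `to_attribute_value` and the sample item are hypothetical; boto3 ships its own serializer, so this hand-rolled version is only illustrative.

```python
def to_attribute_value(value):
    """Encode a Python value as a DynamoDB-style typed attribute value."""
    if value is None:
        return {"NULL": True}
    # bool is checked before numbers because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("sets must not be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("set members must all be str, number, or bytes")
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical order item mixing scalar, set, and document types.
item = {
    "client_id": "c-1",
    "total": 42,
    "paid": True,
    "coupon": None,
    "tags": {"gift", "rush"},
    "lines": [{"sku": "A1", "qty": 2}],
}
encoded = {name: to_attribute_value(v) for name, v in item.items()}
```

A nested object such as a payment record would be encoded as a Map (`M`) in the same way as each entry of `lines`.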
Throughput units
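- Reads are counted in blocks of 4 KB
- Writes are counted in blocks of 1 KB
- Read capacity units per second
- Write capacity units per second
- Eventually consistent vs. strongly consistent reads (eventually consistent reads cost half as much)
- Impact of secondary indexes (writes to the table also consume write capacity on its indexes)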
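The block sizes above translate into capacity units roughly as follows. This is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes that an eventually consistent read costs half of a strongly consistent one and that item sizes round up to the next full block.

```python
import math

READ_BLOCK_BYTES = 4 * 1024   # reads are counted in 4 KB blocks
WRITE_BLOCK_BYTES = 1 * 1024  # writes are counted in 1 KB blocks


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=False):
    """Estimate the read capacity units needed for a steady read rate."""
    blocks = math.ceil(item_size_bytes / READ_BLOCK_BYTES)
    units = blocks * reads_per_second
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    return math.ceil(item_size_bytes / WRITE_BLOCK_BYTES) * writes_per_second


# 6,000-byte items span two 4 KB read blocks and six 1 KB write blocks.
strong = read_capacity_units(6000, reads_per_second=10, strongly_consistent=True)
eventual = read_capacity_units(6000, reads_per_second=10)
writes = write_capacity_units(6000, writes_per_second=10)
```

The example shows why item size matters: an item just over a block boundary is billed for the whole next block.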
Partitioning
- A single partition can hold ~10 GB of data
- Throughput is spread across partitions
- DynamoDB automatically partitions based on the hash key
- Each partition is limited to about 3,000 read capacity units and 1,000 write capacity units
- DynamoDB will never shrink the size of your partitions
- Unused capacity is reserved for bursts
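A rough way to reason about these limits, using only the approximate figures from the notes (10 GB, 3,000 RCU, 1,000 WCU per partition). The way the size and throughput limits are combined below is an assumption, not a documented guarantee, and the function names are hypothetical.

```python
import math

PARTITION_SIZE_GB = 10   # approximate figure from the notes
PARTITION_MAX_RCU = 3000
PARTITION_MAX_WCU = 1000


def estimate_partitions(table_size_gb, rcu, wcu):
    """Estimate the partition count as the larger of the size and throughput needs."""
    by_size = math.ceil(table_size_gb / PARTITION_SIZE_GB)
    by_throughput = math.ceil(rcu / PARTITION_MAX_RCU + wcu / PARTITION_MAX_WCU)
    return max(by_size, by_throughput, 1)


def per_partition_throughput(rcu, wcu, partitions):
    """Throughput is spread evenly, so each partition gets an equal share."""
    return rcu / partitions, wcu / partitions


# 8 GB fits in one partition by size, but 6,000 RCU + 2,000 WCU needs more.
partitions = estimate_partitions(table_size_gb=8, rcu=6000, wcu=2000)
share = per_partition_throughput(6000, 2000, partitions)
```

The per-partition share is the number to watch: a single hot hash key can use at most the share of the one partition it lives on.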
Table design
- Avoid hot keys
  - Really important to avoid hot keys
  - Need uniform access
  - Add a random extension to the hash key
- Time series data in multiple tables
  - Put into tables based on monthly or weekly data
- Test applications ahead of time
- Store large items elsewhere
- Use caching solutions for popular items
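Two of the ideas above, the random extension to the hash key and one table per month for time series data, can be sketched as follows. The shard count, the `#` separator, the table naming scheme, and all function names are assumptions chosen for illustration.

```python
import random
from datetime import date

SHARD_COUNT = 10  # hypothetical number of suffixes per logical key


def sharded_key(hash_key, rng=random):
    """Append a random suffix so writes for one logical key spread out."""
    return f"{hash_key}#{rng.randrange(SHARD_COUNT)}"


def all_shards(hash_key):
    """Every physical key that must be queried to read one logical key back."""
    return [f"{hash_key}#{n}" for n in range(SHARD_COUNT)]


def monthly_table(prefix, day):
    """Name of the table that holds the data for the month containing `day`."""
    return f"{prefix}_{day:%Y_%m}"


write_key = sharded_key("2024-03-15", random.Random(0))
read_keys = all_shards("2024-03-15")
table = monthly_table("events", date(2024, 3, 15))
```

The trade-off of the random suffix is on the read side: reading one logical key means querying every shard and merging the results.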
Querying
- GetItem
  - Uses the primary key
  - Eventually consistent by default
- Query
  - Finds items via primary key attribute(s) for a table or index
  - Results are returned in sorted order when using a range key
- Scan
  - Reads every item in a table or index
  - Slows down as the table grows
  - You can run parallel scans
- Filters and pagination are supported for Query and Scan
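Pagination for Query and Scan works by passing the `LastEvaluatedKey` of one response as the `ExclusiveStartKey` of the next request. The loop below is a minimal sketch of that pattern; `fetch_all` is a hypothetical helper, and `fake_query` is a stand-in for a real client call so the example runs without AWS access.

```python
def fetch_all(fetch_page, **params):
    """Call fetch_page repeatedly, following LastEvaluatedKey until it is absent."""
    items = []
    while True:
        page = fetch_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


def fake_query(ExclusiveStartKey=None, **_ignored):
    """Stand-in for a Query call: returns five items, two per page."""
    data = [{"order_id": {"S": f"o-{n}"}} for n in range(5)]
    start = 0 if ExclusiveStartKey is None else int(ExclusiveStartKey["pos"]["N"])
    response = {"Items": data[start:start + 2]}
    if start + 2 < len(data):
        response["LastEvaluatedKey"] = {"pos": {"N": str(start + 2)}}
    return response


orders = fetch_all(fake_query, TableName="orders")
```

With boto3, the same loop could be driven by `client.query` or `client.scan` in place of `fake_query`; adding `ConsistentRead=True` to the parameters requests a strongly consistent read instead of the default.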
Secondary Indexes
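- Alternate keys for querying and scanning
- Up to 5 local secondary indexes per table (global secondary indexes have their own per-table quota, 20 by default)
- Contains all or a subset of the table's attributes
- Automatically maintained
- Uses provisioned throughput
- Throughput capacity units are shared with the table for local indexes; global indexes have their own provisioned capacity
- No size limit for global indexes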
Global Secondary Index
- For querying non-key attributes
- Hash key or hash and range key
- different view into the data
- projected attributes get copied into the index
- query or scan only
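A sketch of what a global secondary index definition and a query against it might look like as request parameters. The key names follow the low-level DynamoDB API (`IndexName`, `KeySchema`, `Projection`, `KeyConditionExpression`); the helpers, the index name `by_status`, and the attributes `status`, `created_at`, and `total` are hypothetical.

```python
def global_index(index_name, hash_attr, range_attr=None, include=(), rcu=5, wcu=5):
    """Build one entry for the GlobalSecondaryIndexes list of create_table."""
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr is not None:
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
    if include:
        # Only the listed non-key attributes are copied into the index.
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(include)}
    else:
        projection = {"ProjectionType": "KEYS_ONLY"}
    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": projection,
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


def index_query(table_name, index_name, hash_attr, hash_value):
    """Build Query parameters that target the index instead of the table."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": hash_attr},
        "ExpressionAttributeValues": {":v": {"S": hash_value}},
    }


by_status = global_index("by_status", "status", "created_at", include=["total"])
query = index_query("orders", "by_status", "status", "shipped")
```

The index key attributes also need entries in the table's `AttributeDefinitions`. The `#k` placeholder keeps the expression valid even when the attribute name collides with a reserved word.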
Local Secondary Index
- Alternate range key for the same hash key
- 10 GB max per hash key value
- Must be defined when creating the table
- Basically a different sort order over the same hash key
- Projected attributes copied into the index
- Query or scan only
- throughput taken from the main table throughput
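For comparison, a local secondary index reuses the table's hash key and only swaps the range key, and it carries no throughput settings of its own. The sketch below builds such a definition in the shape of a `LocalSecondaryIndexes` entry for `create_table`; the helper `local_index`, the index name, and the `created_at` attribute are hypothetical.

```python
def local_index(table_key_schema, index_name, range_attr, projection="KEYS_ONLY"):
    """Build one LocalSecondaryIndexes entry that shares the table's hash key."""
    hash_attr = next(
        k["AttributeName"] for k in table_key_schema if k["KeyType"] == "HASH"
    )
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": hash_attr, "KeyType": "HASH"},    # same as the table
            {"AttributeName": range_attr, "KeyType": "RANGE"},  # alternate range key
        ],
        "Projection": {"ProjectionType": projection},
        # No ProvisionedThroughput here: capacity comes from the table.
    }


table_keys = [
    {"AttributeName": "client_id", "KeyType": "HASH"},
    {"AttributeName": "order_id", "KeyType": "RANGE"},
]
by_date = local_index(table_keys, "by_created_at", "created_at")
```

Because the definition is part of the `create_table` request, the set of local indexes has to be decided when the table is created.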