DynamoDB Notes
Scaling, optimization, etc. are all handled by DynamoDB
Types of NoSQL databases
- Key-value databases
- Document databases
- Column-family databases
- Graph databases
DynamoDB is a mix of key-value and document database
Throughput units
Items are like rows
Attributes are like columns
Keys: single (hash) or composite (hash and range)
An item can be at most 400 KB, attribute names included
Creating tables
Hash Primary Key
- Unique, required, single attribute
- DynamoDB creates an unordered hash index on it
- An unordered index only supports exact-key lookups, not range or sorted queries
- Use when you know the id/key
Hash and Range Primary Key
- Key is composed of two attributes
- The combination must be unique
- Unordered hash index + sorted range index
- Best for query and grouping scenarios
- example: client_id as hash and order_id as range
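A minimal sketch of how the client_id/order_id example could be written as table-creation parameters. The table name, the string attribute types, and the throughput values are assumptions for illustration; the helper `key_attribute_names` is hypothetical and only inspects the dictionary.

```python
# Parameters for a table with a hash and range primary key.
# "orders", the "S" (string) types, and the capacity values are assumed.
create_table_params = {
    "TableName": "orders",
    "AttributeDefinitions": [
        {"AttributeName": "client_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "client_id", "KeyType": "HASH"},   # unordered hash index
        {"AttributeName": "order_id", "KeyType": "RANGE"},   # sorted range index
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def key_attribute_names(params):
    """Return (hash_key, range_key_or_None) read from a KeySchema."""
    by_type = {k["KeyType"]: k["AttributeName"] for k in params["KeySchema"]}
    return by_type["HASH"], by_type.get("RANGE")


print(key_attribute_names(create_table_params))
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").create_table(**create_table_params)`. A hash-only table would simply drop the RANGE entry and its attribute definition.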
Data Types
- Scalar
  - String, number, binary, boolean, null
- Multi-value types
  - String, number, and binary sets
- Document types
  - List (array)
  - Map (object) (should use this for OJ Stripe object)
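As a rough illustration of the types above, here is one item written in DynamoDB's low-level JSON shape, where each value is wrapped in a type descriptor. All attribute names and values are made up for the example, including the Stripe-like map; the helper `type_descriptors` is hypothetical.

```python
import base64

# One item using every type from the list above (values are invented).
item = {
    "client_id": {"S": "c-42"},                       # string
    "order_total": {"N": "19.99"},                    # numbers travel as strings
    "receipt": {"B": base64.b64encode(b"raw bytes").decode("ascii")},  # binary
    "paid": {"BOOL": True},                           # boolean
    "coupon": {"NULL": True},                         # null
    "tags": {"SS": ["gift", "rush"]},                 # string set
    "quantities": {"NS": ["1", "3"]},                 # number set
    "line_items": {"L": [{"S": "widget"}, {"N": "2"}]},  # list
    "stripe_charge": {                                # map (hypothetical fields)
        "M": {"id": {"S": "ch_hypothetical"}, "amount": {"N": "1999"}}
    },
}


def type_descriptors(dynamo_item):
    """Collect the top-level type descriptors used by an item."""
    return {next(iter(value)) for value in dynamo_item.values()}


print(sorted(type_descriptors(item)))
```

The base64 text for the binary attribute mirrors the JSON wire format; boto3's client accepts raw `bytes` for `B` values instead.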
Throughput units
Reads are metered in blocks of 4 KB
Writes are metered in blocks of 1 KB
Read capacity units (RCU) are provisioned per second
Write capacity units (WCU) are provisioned per second
Eventually consistent reads cost half as much as strongly consistent reads
Impact of secondary indexes
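The block sizes above translate into a simple capacity calculation. This is a sketch, not an official calculator: it assumes every item has the same size and uses the rule that an eventually consistent read costs half a strongly consistent one. The function names are my own.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: item size rounded up to 4 KB blocks, times reads per second."""
    blocks = math.ceil(item_size_bytes / 4096)
    units = blocks * reads_per_second
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: item size rounded up to 1 KB blocks, times writes per second."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


# A 6 KB item read and written 10 times per second.
print(read_capacity_units(6144, 10))                             # 20
print(read_capacity_units(6144, 10, strongly_consistent=False))  # 10
print(write_capacity_units(6144, 10))                            # 60
```

Rounding up per item is why many small reads of a 4.1 KB item cost twice as much as reads of a 4 KB item.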
Partitioning
- A single partition can hold about 10 GB of data
- Throughput spread across partitions
- DynamoDB partitions automatically, based on the hash key
- Each partition is limited to about 3,000 read capacity units and 1,000 write capacity units
- DynamoDB never merges partitions back together, so the partition count does not shrink
- Unused capacity is kept in reserve for short bursts
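The limits above suggest a rule-of-thumb estimate for how many partitions a table needs and how thin the throughput gets on each one. This is only a sketch built from the approximate figures in these notes (10 GB, 3,000 RCU, 1,000 WCU per partition); the real partitioning is internal to DynamoDB and the function names are hypothetical.

```python
import math


def estimate_partitions(rcu, wcu, table_size_gb):
    """Rough partition count: the larger of the throughput and size requirements."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    return max(by_throughput, by_size, 1)


def per_partition_throughput(rcu, wcu, partitions):
    """Provisioned throughput is spread evenly across partitions."""
    return rcu / partitions, wcu / partitions


partitions = estimate_partitions(rcu=5000, wcu=2000, table_size_gb=8)
print(partitions)                                        # 4
print(per_partition_throughput(5000, 2000, partitions))  # (1250.0, 500.0)
```

In this example a single hot hash key can use at most about a quarter of the table's provisioned capacity, which is why the key design advice below matters.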
Table design
- Avoid hot keys; this is really important
- Aim for uniform access across hash keys
- Add a random extension (suffix) to the hash key to spread writes
- Split time series data across multiple tables
  - for example one table per month or per week
- Test applications ahead of time
- Store large items elsewhere
- Use a caching solution for popular items
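Two of the ideas above, the random hash key extension and per-period tables for time series data, can be sketched as small helpers. The shard count, the `#` separator, and the table naming scheme are all assumptions chosen for the example.

```python
import random
from datetime import date

SHARD_COUNT = 10  # assumed number of suffixes; tune to the write volume


def sharded_hash_key(base_key, shard_count=SHARD_COUNT, rng=random):
    """Append a random suffix so writes for one logical key spread over shards."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Every key a reader must query to reassemble the logical key."""
    return [f"{base_key}#{i}" for i in range(shard_count)]


def monthly_table_name(prefix, day):
    """Name of the table holding time series data for the month of `day`."""
    return f"{prefix}_{day:%Y_%m}"


print(sharded_hash_key("2015-03-09"))
print(all_shard_keys("2015-03-09", shard_count=3))
print(monthly_table_name("events", date(2015, 3, 9)))  # events_2015_03
```

The trade-off of the random suffix is on the read side: fetching everything for one logical key means one query per shard key, with the results merged in the application.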
GetItem
- Uses the primary key
- Eventually consistent by default
Query
- Finds items via the primary key attribute(s) of a table or index
- Results come back in sorted order when a range key is used
Scan
- Reads every item in a table or index
- Slows down as the table grows
- You can run parallel scans
Filters and pagination support for query or scan
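Pagination for query and scan works by passing the `LastEvaluatedKey` of one response as the `ExclusiveStartKey` of the next request. The loop below is a sketch of that pattern; `fetch_all` and `fake_query` are hypothetical helpers, and the fake stands in for a real client call so the example runs without AWS access.

```python
def fetch_all(fetch_page, **params):
    """Call a query/scan-style function until no LastEvaluatedKey is returned."""
    items = []
    while True:
        page = fetch_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key  # resume where the last page ended


# Query parameters for all orders of one client (names are assumed).
query_params = {
    "TableName": "orders",
    "KeyConditionExpression": "client_id = :c",
    "ExpressionAttributeValues": {":c": {"S": "c-42"}},
    "ConsistentRead": False,  # eventually consistent, the default
}

# Stand-in for a real client: serves 5 items in pages of 2.
_DATA = [{"order_id": {"S": f"o-{i}"}} for i in range(5)]


def fake_query(**params):
    start = params.get("ExclusiveStartKey", {"n": 0})["n"]
    page = {"Items": _DATA[start:start + 2]}
    if start + 2 < len(_DATA):
        page["LastEvaluatedKey"] = {"n": start + 2}
    return page


print(len(fetch_all(fake_query, **query_params)))  # 5
```

In real use `fetch_page` would be `boto3.client("dynamodb").query` or `.scan`. A parallel scan adds `Segment` and `TotalSegments` to the scan parameters, with one such loop per segment.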
Secondary Indexes
- Alternate keys for querying and scanning
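- Up to 5 local secondary indexes per table; the default quota for global secondary indexes is 20
- Contain all or a subset of the table's attributes
- Automatically maintained by DynamoDB
- Consume provisioned throughput
- Local indexes draw on the table's capacity units; global indexes are provisioned separately
- No overall size limit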
Global Secondary Index
- For querying non-key attributes
- Hash key or hash and range key
- different view into the data
- projected attributes get copied into the index
- query or scan only
Local Secondary Index
- Alternate range key for hash key
- 10 GB max for each hash key value
- Must be defined when the table is created
- Basically a different sort order over items that share a hash key
- Projected attributes copied into the index
- Query or scan only
- throughput taken from the main table throughput
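The difference between the two index types shows up in how they are declared. The definitions below are a sketch with invented index and attribute names; the helper `is_valid_local_index` is hypothetical and only checks the two properties listed in these notes (same hash key as the table, no throughput of its own).

```python
# Global secondary index: its own hash (and optional range) key and throughput.
global_index = {
    "IndexName": "by_status",  # assumed name
    "KeySchema": [
        {"AttributeName": "status", "KeyType": "HASH"},
        {"AttributeName": "created_at", "KeyType": "RANGE"},
    ],
    # Only the listed attributes (plus keys) are copied into the index.
    "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["order_total"]},
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}

# Local secondary index: same hash key as the table, alternate range key.
local_index = {
    "IndexName": "by_created_at",  # assumed name
    "KeySchema": [
        {"AttributeName": "client_id", "KeyType": "HASH"},
        {"AttributeName": "created_at", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
}


def is_valid_local_index(index, table_hash_key):
    """A local index reuses the table's hash key and has no throughput of its own."""
    hash_keys = [k["AttributeName"] for k in index["KeySchema"] if k["KeyType"] == "HASH"]
    return hash_keys == [table_hash_key] and "ProvisionedThroughput" not in index


print(is_valid_local_index(local_index, "client_id"))   # True
print(is_valid_local_index(global_index, "client_id"))  # False
```

In a `create_table` call these would go under `GlobalSecondaryIndexes` and `LocalSecondaryIndexes`, and every attribute used in an index key also needs an entry in `AttributeDefinitions`.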