Characteristics | Data Warehouse
| Data Lake |
Data | Relational data from transactional systems, operational databases, and line of business applications
| All data, including structured, semi-structured, and unstructured
|
Schema | Often designed prior to the data warehouse implementation but also can be written at the time of analysis(schema-on-write or schema-on-read)
| Written at the time of analysis (schema-on-read)
|
Price/Performance
| Fastest query results using local storage
| Query results getting faster using low-cost storage and decoupling of compute and storage
|
Data quality
| Highly curated data that serves as the central version of the truth
| Any data that may or may not be curated (i.e. raw data)
|
Users | Business analysts, data scientists, and data developers
| Business analysts (using curated data), data scientists, data developers, data engineers, and data architects
|
Analytics | Batch reporting, BI, and visualizations
| Machine learning, exploratory analytics, data discovery, streaming, operational analytics, big data, and profiling
|
Characteristics | Data Warehouse
| Transactional Database |
Suitable workloads
| Analytics, reporting, big data
| Transaction processing |
Data source
| Data collected and normalized from many sources | Data captured as-is from a single source, such as a transactional system
|
Data capture
| Bulk write operations typically on a predetermined batch schedule
| Optimized for continuous write operations as new data is available to maximize transaction throughput
|
Data normalization
| De-normalized schemas, such as the Star schema or Snowflake schema
| Highly normalized, static schemas
|
Data storage | Optimized for simplicity of access and high-speed query performance using columnar storage
| Optimized for high throughout write operations to a single row-oriented physical block
|
Data access | Optimized to minimize I/O and maximize data throughput
| High volumes of small read operations
|