Notice
This post quotes and summarizes most of the paper "The Multi-streamed Solid-State Drive".
https://www.usenix.org/conference/hotstorage14/workshop-program/presentation/kang
Paper Review
Abstract
This paper makes a case for the multi-streamed solid state drive (SSD). It offers an intuitive storage interface for the host system to inform the SSD about the expected lifetime of data being written.
Introduction
- SSDs are becoming a popular main storage device
- To provide the illusion that SSDs work similarly to HDDs, a flash translation layer (FTL) runs inside the SSD
- As the SSD is continuously written, the underlying NAND flash medium can become fragmented
- When the FTL reclaims free space to absorb further write traffic, it incurs internal data movement (garbage collection, GC)
- To reduce the GC overhead, the paper proposes multi-streaming, an interface mechanism that helps close the semantic gap between the host system and the SSD
- Host system can explicitly open "streams" in the SSD and send write requests to different streams according to their expected lifetime.
- Data in the same stream are not only written together to a physically related NAND flash space, but also separated from data in other streams.
Background
- SSD aging results in more frequent GC as the SSD is filled with more data and fragmented.
- There have been prior studies to improve the GC performance
- Classifying access patterns (sequential or random) ▶ Vulnerable to workloads that change frequently
- Detecting and separating hot and cold data based on the access history of locations ▶ Performance degrades when the access pattern of specific locations changes (LFS)
- TRIM: The host tells the device that specific LBAs are unmapped, which reduces the amount of valid data that needs to be copied during GC
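The TRIM point above can be illustrated with a minimal Python sketch. It assumes a toy model in which a victim erase block is just a list of LBAs; the function name `gc_copy_cost` and the sample LBAs are hypothetical and exist only for illustration.

```python
def gc_copy_cost(block_lbas, trimmed_lbas):
    """Count the pages GC must copy out of a victim block before erasing it.

    Without TRIM the SSD cannot know that the host deleted an LBA,
    so it treats every page in the block as still valid.
    """
    return sum(1 for lba in block_lbas if lba not in trimmed_lbas)


victim_block = [10, 11, 12, 13, 14, 15, 16, 17]  # hypothetical LBAs in one erase block
deleted_by_host = {12, 13, 14, 15}               # files the host has already removed

without_trim = gc_copy_cost(victim_block, set())            # SSD was never told
with_trim = gc_copy_cost(victim_block, deleted_by_host)     # host sent TRIM

print(without_trim, with_trim)  # prints "8 4"
```

In this toy case TRIM halves the data GC has to move, which is the effect the bullet describes.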
The Multi-Streamed SSD
- Why do traditional write-pattern optimizations (LFS) fail to solve the SSD aging problem?
- Because the SSD's GC overhead depends not only on the current write pattern, but also on how data has already been placed in the SSD.
- Challenges in addressing the SSD aging problem:
- How to predict the lifetime of data written to the SSD
- How to ensure that data with similar lifetime are placed in the same erase unit
- Solution (Multi-Streaming, Stream)
- The host should provide adequate information about data lifetime!
- The SSD is responsible for placing data with similar lifetime into the same erase unit!
- Stream: Abstraction of SSD capacity allocation that stores a set of data with the same lifetime expectancy.
- An application can use the `fadvise` system call to pass the stream ID for a `file`
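To make the placement argument concrete, here is a rough Python sketch of a toy SSD that fills erase blocks either in arrival order (one stream) or per stream. It is not the paper's FTL: `PAGES_PER_BLOCK`, `place`, `gc_copies`, and the alternating workload are all assumptions made up for this illustration.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # assumed toy erase-block size


def place(writes, use_streams):
    """Group (lba, stream_id) writes into erase blocks.

    With use_streams=False every write goes to one shared open block,
    like a conventional SSD; otherwise each stream has its own open block.
    """
    open_blocks = defaultdict(list)
    closed = []
    for lba, stream in writes:
        key = stream if use_streams else 0
        open_blocks[key].append((lba, stream))
        if len(open_blocks[key]) == PAGES_PER_BLOCK:
            closed.append(open_blocks.pop(key))
    closed.extend(block for block in open_blocks.values() if block)
    return closed


def gc_copies(blocks, dead_stream):
    """Valid pages that must be copied to reclaim every block holding dead data."""
    copies = 0
    for block in blocks:
        dead = sum(1 for _, stream in block if stream == dead_stream)
        if dead:
            copies += len(block) - dead
    return copies


# Stream 0 is short-lived data, stream 1 is long-lived data, interleaved in time.
writes = [(lba, lba % 2) for lba in range(16)]

single = gc_copies(place(writes, use_streams=False), dead_stream=0)
multi = gc_copies(place(writes, use_streams=True), dead_stream=0)
print(single, multi)  # prints "8 0"
```

Once the short-lived data dies, the single-stream layout has to copy the long-lived pages mixed into every block, while the multi-stream layout can erase whole blocks without copying anything.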
Evaluation
- Workload: Cassandra
- SSTable (Sorted Strings Table): Append-only table that is immutable
- Memtable: In-memory data structure
- CommitLogs: Used to recover from sudden power failure
- SSTables are periodically compacted to form a new larger SSTable to reduce the space and time overheads of maintaining many fragmented SSTables
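As a rough illustration of the compaction step, the sketch below merges several small sorted tables into one. It is a simplification, not Cassandra's implementation: SSTables are modeled as lists of `(key, value)` pairs, and the name `compact` is hypothetical.

```python
def compact(sstables):
    """Merge SSTables (ordered oldest to newest) into one sorted table.

    When the same key appears in several tables, the newest value wins.
    """
    merged = {}
    for table in sstables:
        merged.update(dict(table))
    return sorted(merged.items())


older = [("a", 1), ("c", 3)]
newer = [("a", 9), ("b", 2)]
print(compact([older, newer]))  # prints "[('a', 9), ('b', 2), ('c', 3)]"
```

After compaction the input tables are no longer needed and can be deleted together, so a group of SSTables tends to become invalid at about the same time.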
- Four different model types to compare
- Normal: Single stream conventional SSD
- Single: Separates Cassandra data from system data. (Examples of system data are ext4 file system metadata and journal data.)
- Multi-Log: Uses three streams in total; the one additional stream compared to the "Single" model carries CommitLog traffic.
- Multi-Data: Further separates SSTables in different tiers into three independent streams
- Intuition: SSTables in the same tier would have similar lifetimes
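The four configurations can be read as a mapping from the kind of data being written to a stream ID. The Python sketch below is one possible encoding under stated assumptions: the concrete stream numbers, the `kind` labels, and the rule that tiers 0, 1, and 2+ map to separate streams are illustrative choices, not the paper's assignment table.

```python
CONFIGS = ("Normal", "Single", "Multi-Log", "Multi-Data")


def stream_for(config, kind, tier=0):
    """Pick a stream ID for a write (hypothetical numbering).

    kind is "system", "commitlog", or "sstable"; tier is only used for
    SSTables under "Multi-Data".
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")
    if config == "Normal" or kind == "system":
        return 0  # conventional SSD, or system data in any configuration
    if config == "Single":
        return 1  # all Cassandra data shares one stream
    if kind == "commitlog":
        return 1  # Multi-Log and Multi-Data isolate CommitLog traffic
    if config == "Multi-Log":
        return 2  # all SSTables share one stream
    return 2 + min(tier, 2)  # Multi-Data: assumed tiers 0, 1, 2+ -> streams 2, 3, 4


for config in CONFIGS:
    row = [stream_for(config, "system"), stream_for(config, "commitlog")]
    row += [stream_for(config, "sstable", tier) for tier in range(3)]
    print(config, row)
```

Each printed row lists the stream for system data, CommitLog, and SSTables of tier 0 to 2, which makes it easy to see how each configuration separates more traffic than the previous one.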
- Results:
- TRIM is shown to be critically important for sustained performance ▶ Tested on the Normal model
- GC overhead correlates very well with throughput
- Multi-Data outperforms all other configurations and sustains the throughput
- Additional Experiment ▶ In-SSD mechanism to detect multiple sequential access patterns on the fly and assign adequate stream ID
- The improvement was marginal (minor)
- Because of how the ext4 file system allocates LBAs, large files may not get sequential LBAs when the file system is fragmented
- Expanding existing files may not get consecutive LBAs
- Therefore, the host should provide the stream information!
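The weakness of in-SSD detection can be sketched with a toy detector that assigns a stream by matching each write against the LBA where an earlier sequential run ended. This is a hypothetical design for illustration only; the paper's actual mechanism is not described in these notes, and `SequentialDetector`, `max_streams`, and the LBA values are assumptions.

```python
class SequentialDetector:
    """Toy in-SSD detector: a write that continues a known run keeps its stream."""

    def __init__(self, max_streams=4):
        self.max_streams = max_streams
        self.next_lba = {}   # stream_id -> LBA expected next for that run
        self.last_used = {}  # stream_id -> logical time of last write
        self.clock = 0

    def assign(self, lba, length):
        self.clock += 1
        for sid, expected in self.next_lba.items():
            if expected == lba:
                break  # this write continues an existing sequential run
        else:
            if len(self.next_lba) < self.max_streams:
                sid = len(self.next_lba)  # open a new stream
            else:
                sid = min(self.last_used, key=self.last_used.get)  # reuse the oldest
        self.next_lba[sid] = lba + length
        self.last_used[sid] = self.clock
        return sid


# One file written to consecutive LBAs stays in a single stream.
contiguous = SequentialDetector()
print([contiguous.assign(lba, 8) for lba in (0, 8, 16)])   # prints "[0, 0, 0]"

# The same file split into scattered extents looks like three unrelated runs.
fragmented = SequentialDetector()
print([fragmented.assign(lba, 8) for lba in (0, 100, 50)])  # prints "[0, 1, 2]"
```

The device only sees LBAs, so a fragmented file is indistinguishable from several independent writers, whereas the host knows which file each write belongs to.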
- There are many workloads that may benefit from the multistream SSD
- Databases built on LSM trees
- Commit logs, undo logs, temporary table data in OLTP applications
- Multi-head LFS
- Flash storage OS
Personal Notes