Notice
This post directly quotes and summarizes the contents of the paper "The Design and Implementation of a Log-Structured File System".
https://dl.acm.org/doi/10.1145/146941.146943
Paper Review
Abstract
This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit.
Introduction
- CPU speeds have increased dramatically while disk access times have only improved slowly
- LFS is based on the assumption that files are cached in main memory and that increasing memory sizes will make the caches more and more effective at satisfying read requests ▶ Disk traffic becomes dominated by writes
- New information is written to disk sequentially, in a log ▶ Eliminates almost all seeks and permits faster crash recovery
- LFS differs from recent studies that use a log structure in that it does not treat the log as temporary storage; rather, it uses the log as the permanent home for information
- The most difficult challenge is ensuring that large extents of free space are always available ▶ A segment cleaner process continually regenerates empty segments by compressing the live data out of heavily fragmented segments
- The authors explore several cleaning policies and settle on a simple but effective one based on cost and benefit (see the sketch after this list) ▶ It segregates older, slowly changing data from young, rapidly changing data and treats them differently
- Sprite LFS is at least as fast as Unix in all cases but one ▶ Sequentially reading a file after it was written randomly
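The cost-benefit policy can be stated compactly: cleaning a segment costs 1+u (read the whole segment, write back the live fraction u) and frees 1-u of a segment, weighted by the age of its data. Below is a minimal C sketch of that scoring function; the struct and field names are illustrative assumptions, not Sprite LFS identifiers.

```c
#include <stdio.h>

/* Illustrative segment descriptor; field names are assumptions,
 * not Sprite LFS identifiers. */
struct segment {
    double u;   /* utilization: fraction of live data, 0.0 .. 1.0 */
    double age; /* age of the youngest data in the segment        */
};

/* Cost-benefit score from the paper:
 *   benefit/cost = (1 - u) * age / (1 + u)
 * Cleaning costs 1 + u (read the whole segment, write back the live
 * fraction u) and yields 1 - u of free space, weighted by how long
 * that space is likely to stay free (the age of the data). */
static double cost_benefit(const struct segment *s)
{
    return (1.0 - s->u) * s->age / (1.0 + s->u);
}

int main(void)
{
    struct segment old_cold  = { .u = 0.75, .age = 100.0 };
    struct segment young_hot = { .u = 0.30, .age = 5.0 };

    /* The cleaner picks the segment with the highest score. */
    printf("old, cold segment:  %.2f\n", cost_benefit(&old_cold));  /* ~14.29 */
    printf("young, hot segment: %.2f\n", cost_benefit(&young_hot)); /* ~2.69  */
    return 0;
}
```

Even at 75% utilization the old segment wins, because its freed space is expected to stay free for a long time; this is the mechanism that segregates cold data from hot data.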
Design for file systems of the 1990's
- File system design is governed by two general forces: Technology and Workload
- Technology
- Three components of technology are particularly significant for FS design ▶ Processors, disks, main memory
- Processors ▶ Speed is increasing at a nearly exponential rate
- Disks ▶ Improvements have been primarily in cost and capacity rather than performance
- Two components of disk performance ▶ Transfer bandwidth and access time
- Main memory ▶ Size is increasing at an exponential rate ▶ Larger file caches become possible
- A larger file cache has two effects
- It absorbs a greater fraction of the read requests
- It can serve as a write buffer, collecting many modified blocks before writing them to disk at once
- Workloads
- One of the most difficult workloads for file system designs to handle efficiently is found in office and engineering environments ▶ Accesses to small files ▶ Small random disk I/Os, where creation and deletion times are dominated by updates to file system metadata
- Supercomputing environments ▶ Sequential accesses ▶ Do not pose interesting problems for the FS layer; they are instead a challenge for hardware designers to improve bandwidth for large-file accesses
- Problems with existing file systems
- They spread information around the disk in a way that causes too many small accesses ▶ FFS physically separates different files from each other, a file's inode (attributes) from its contents, and the directory entry containing the file's name from both
- Creating a new file in Unix FFS requires five separate disk I/Os, each preceded by a seek ▶ Two for the file's attributes, one for the file's data, one for the directory's data, and one for the directory's attributes
- Current file systems tend to write metadata synchronously ▶ The application must wait for the write to complete (see the sketch below)
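To make the synchronous-write cost concrete, here is a minimal POSIX C sketch of creating one small file; the file name and the explicit fsync are my own illustration, not from the paper, and the comments tally the five FFS disk I/Os counted above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Creating one small file: under FFS this touches five separate
     * on-disk structures, each behind its own seek: the new inode
     * (written twice), the file's data block, the directory's data
     * block, and the directory's inode. Because metadata writes are
     * issued synchronously, the application stalls on the disk. */
    int fd = open("example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "hello\n";
    if (write(fd, msg, sizeof msg - 1) < 0) { perror("write"); return 1; }

    /* fsync forces the data and metadata out, making the wait
     * explicit; FFS imposed a similar wait on every create/delete. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    return close(fd);
}
```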
Log-structured file systems
- The fundamental idea of LFS is to improve write performance by buffering a sequence of file system changes in the file cache and then writing all the changes to disk sequentially in a single disk write operation
- For workloads that contain many small files, an LFS converts the many small synchronous random writes of a traditional FS into large asynchronous sequential transfers
- Although the idea might seem simple, there are two key issues to resolve
- How to retrieve (read) information from the log
- How to manage free space on disk so that large extents of free space are always available for writing
- File location and reading
- The basic goal was to match or exceed the read performance of FFS
- Use index structures in the log to permit random-access retrievals
- Inodes in FFS are at fixed positions; inodes in LFS are written to the log, so their positions change ▶ An inode map is needed to locate them
- Inode map ▶ Divided into blocks that are written to the log; the active parts are cached in main memory, and the locations of all the inode map blocks are recorded in a fixed checkpoint region (see the sketch below)
- Free Space management: Segments
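The write path and the inode map can be pictured with a toy in-memory sketch. All names here (log_append, write_inode, the array sizes) are hypothetical illustrations, not Sprite LFS code: every update is appended at the log's tail, and the inode map is repointed at the newest copy so reads still find it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A toy log: writes always go to the tail, so every update is one
 * sequential transfer. Names and sizes are illustrative only. */
#define BLOCK_SIZE 4096
#define LOG_BLOCKS 1024
#define MAX_INODES 128

static char     log_space[LOG_BLOCKS][BLOCK_SIZE];
static uint32_t log_tail;              /* next free block in the log */
static uint32_t inode_map[MAX_INODES]; /* inode number -> log block  */

/* Append one block of data at the tail of the log. */
static uint32_t log_append(const void *data, size_t len)
{
    uint32_t addr = log_tail++;
    memcpy(log_space[addr], data, len < BLOCK_SIZE ? len : BLOCK_SIZE);
    return addr;
}

/* Updating a file never overwrites in place: the new version (here
 * collapsed to a single block) is appended to the log and the inode
 * map is repointed at it; the old copy becomes dead space. */
static void write_inode(uint32_t ino, const void *data, size_t len)
{
    inode_map[ino] = log_append(data, len);
}

int main(void)
{
    write_inode(7, "version 1", 10);
    write_inode(7, "version 2", 10); /* the old block is now garbage */
    printf("inode 7 lives at log block %u\n", inode_map[7]);
    printf("contents: %s\n", log_space[inode_map[7]]);
    return 0;
}
```

Note what the second write leaves behind: the first block is now dead space in the log, and reclaiming such dead blocks in bulk is exactly the job of the segment cleaner described above.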
Crash Recovery
Experience with the Sprite LFS
Related Work