Notice
This post directly quotes and summarizes the contents of the paper "The Design and Implementation of a Log-Structured File System".
https://dl.acm.org/doi/10.1145/146941.146943
Paper Review
Abstract
This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit.
Introduction
- CPU speeds have increased dramatically while disk access times have only improved slowly
- LFS is based on the assumption that files are cached in main memory and that increasing memory sizes will make the caches more and more effective at satisfying read requests ▶ Disk traffic becomes dominated by writes
- New information is written to disk sequentially, in a log ▶ Eliminates almost all seeks and permits faster crash recovery
- LFS differs from recent studies that use a log structure in that it does not treat the log as temporary storage; rather, it uses the log as the permanent home for information
- The most difficult challenge is ensuring that large extents of free space are always available ▶ A segment cleaner process continually regenerates empty segments by compressing the live data out of heavily fragmented segments
- The authors explore several cleaning policies and settle on a simple but effective one based on cost and benefit (see the sketch after this list) ▶ It segregates older, slowly changing data from young, rapidly changing data and treats them differently
- Sprite LFS is at least as fast as Unix in all cases but one ▶ Sequentially reading a file after it was written randomly
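The cost-benefit policy can be stated compactly: cleaning a segment costs 1+u (read the whole segment, write back the live fraction u) and frees 1-u of a segment, weighted by the age of its data. Below is a minimal C sketch of that scoring function; the struct and field names are illustrative assumptions, not Sprite LFS identifiers.

```c
#include <stdio.h>

/* Illustrative segment descriptor; field names are assumptions,
 * not Sprite LFS identifiers. */
struct segment {
    double u;   /* utilization: fraction of live data, 0.0 .. 1.0 */
    double age; /* age of the youngest data in the segment        */
};

/* Cost-benefit score from the paper:
 *   benefit/cost = (1 - u) * age / (1 + u)
 * Cleaning costs 1 + u (read the whole segment, write back the live
 * fraction u) and yields 1 - u of free space, weighted by how long
 * that space is likely to stay free (the age of the data). */
static double cost_benefit(const struct segment *s)
{
    return (1.0 - s->u) * s->age / (1.0 + s->u);
}

int main(void)
{
    struct segment old_cold  = { .u = 0.75, .age = 100.0 };
    struct segment young_hot = { .u = 0.30, .age = 5.0 };

    /* The cleaner picks the segment with the highest score. */
    printf("old, cold segment:  %.2f\n", cost_benefit(&old_cold));  /* ~14.29 */
    printf("young, hot segment: %.2f\n", cost_benefit(&young_hot)); /* ~2.69  */
    return 0;
}
```

Even at 75% utilization the old segment wins, because its freed space is expected to stay free for a long time; this is the mechanism that segregates cold data from hot data.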
Design for file systems of the 1990's
- File system design is governed by two general forces: Technology and Workload
- Technology
- Three components of technology are particularly significant for FS design ▶ Processors, disks, main memory
- Processors ▶ Speed is increasing at a nearly exponential rate
- Disks ▶ Improvements have been primarily in cost and capacity rather than performance
- Two components of disk performance ▶ Transfer bandwidth and access time
- Main memory ▶ Size is increasing at an exponential rate ▶ Larger file caches become possible
- A larger file cache has two effects
- It absorbs a greater fraction of the read requests
- It can serve as a write buffer, collecting many modified blocks before writing them to disk at once
- Workloads
- One of the most difficult workloads for file system designs to handle efficiently is found in office and engineering environments ▶ Accesses to small files ▶ Small random disk I/Os, where creation and deletion times are dominated by updates to file system metadata
- Supercomputing environments ▶ Sequential accesses ▶ Do not pose interesting problems for the FS layer; they are instead a challenge for hardware designers to improve bandwidth for large-file accesses
- Problems with existing file systems
- They spread information around the disk in a way that causes too many small accesses ▶ FFS physically separates different files from each other, a file's inode (attributes) from its contents, and the directory entry containing the file's name from both
- Creating a new file in Unix FFS requires five separate disk I/Os, each preceded by a seek ▶ Two for the file's attributes, one for the file's data, one for the directory's data, and one for the directory's attributes
- Current file systems tend to write metadata synchronously ▶ The application must wait for the write to complete (see the sketch below)
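To make the synchronous-write cost concrete, here is a minimal POSIX C sketch of creating one small file; the file name and the explicit fsync are my own illustration, not from the paper, and the comments tally the five FFS disk I/Os counted above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Creating one small file: under FFS this touches five separate
     * on-disk structures, each behind its own seek: the new inode
     * (written twice), the file's data block, the directory's data
     * block, and the directory's inode. Because metadata writes are
     * issued synchronously, the application stalls on the disk. */
    int fd = open("example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "hello\n";
    if (write(fd, msg, sizeof msg - 1) < 0) { perror("write"); return 1; }

    /* fsync forces the data and metadata out, making the wait
     * explicit; FFS imposed a similar wait on every create/delete. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    return close(fd);
}
```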
Log-structured file systems
- The fundamental idea of LFS is to improve write performance by buffering a sequence of file system changes in the file cache and then writing all the changes to disk sequentially in a single disk write operation
- For workloads that contain many small files, an LFS converts the many small synchronous random writes of a traditional FS into large asynchronous sequential transfers
- Although the idea might seem simple, there are two key issues to resolve
- How to retrieve (read) information from the log
- How to manage free space on disk so that large extents of free space are always available for writing
- File location and reading
- The basic goal was to match or exceed the read performance of FFS
- Use index structures in the log to permit random-access retrievals
- Inodes in FFS are at fixed positions; inodes in LFS are written to the log, so their positions change ▶ An inode map is needed to locate them
- Inode map ▶ Divided into blocks that are written to the log; the active parts are cached in main memory, and the locations of all the inode map blocks are recorded in a fixed checkpoint region (see the sketch below)
- Free Space management: Segments
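The write path and the inode map can be pictured with a toy in-memory sketch. All names here (log_append, write_inode, the array sizes) are hypothetical illustrations, not Sprite LFS code: every update is appended at the log's tail, and the inode map is repointed at the newest copy so reads still find it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A toy log: writes always go to the tail, so every update is one
 * sequential transfer. Names and sizes are illustrative only. */
#define BLOCK_SIZE 4096
#define LOG_BLOCKS 1024
#define MAX_INODES 128

static char     log_space[LOG_BLOCKS][BLOCK_SIZE];
static uint32_t log_tail;              /* next free block in the log */
static uint32_t inode_map[MAX_INODES]; /* inode number -> log block  */

/* Append one block of data at the tail of the log. */
static uint32_t log_append(const void *data, size_t len)
{
    uint32_t addr = log_tail++;
    memcpy(log_space[addr], data, len < BLOCK_SIZE ? len : BLOCK_SIZE);
    return addr;
}

/* Updating a file never overwrites in place: the new version (here
 * collapsed to a single block) is appended to the log and the inode
 * map is repointed at it; the old copy becomes dead space. */
static void write_inode(uint32_t ino, const void *data, size_t len)
{
    inode_map[ino] = log_append(data, len);
}

int main(void)
{
    write_inode(7, "version 1", 10);
    write_inode(7, "version 2", 10); /* the old block is now garbage */
    printf("inode 7 lives at log block %u\n", inode_map[7]);
    printf("contents: %s\n", log_space[inode_map[7]]);
    return 0;
}
```

Note what the second write leaves behind: the first block is now dead space in the log, and reclaiming such dead blocks in bulk is exactly the job of the segment cleaner described above.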
Crash Recovery
Experience with the Sprite LFS
Related Work