
Paper Review

[OS] Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism

telomere37 2022. 12. 11. 21:47

 

 

Paper: Scheduler activations: effective kernel support for the user-level management of parallelism, ACM SIGOPS Operating Systems Review (dl.acm.org)


 

Abstract

  • Existing Problem
    1. Performance of kernel threads is inherently worse than that of user-level threads
    2. Kernel threads are the wrong abstraction on which to support user-level management of parallelism
  • Solution
    • Implement a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism

 


Introduction

  • In order to provide a high level of parallelism, the cost of creating and managing parallelism should be low
  • One way to write a parallel program: sharing memory between processes ▶ Too inefficient ▶ Use of threads
  • Two ways to use thread level parallelism
    • User-level
      • Managed by runtime library routines linked into each application
      • Requires no kernel intervention
      • Excellent performance (PCR, FastThreads)
      • Cost is within an order of magnitude of the cost of a procedure call
      • Flexible (Can be customized)
      • Execute within the context of traditional processes ▶ Can exhibit poor performance (or even incorrect behavior), because the kernel schedules and preempts processes without any knowledge of the user-level thread state
    • Kernel-level
      • Multiprocessor operating systems provide direct kernel support for multiple threads per address space.
      • Using kernel threads avoids the system integration problems, because the kernel directly schedules each application's threads onto physical processors.
      • Heavyweight
      • Performance is an order of magnitude better than traditional processes
      • But an order of magnitude worse than the best-case performance of user-level threads
      • Because of this performance gap, user-level threads are often built on top of kernel threads ▶ this reintroduces exactly the same problems as user-level threads built on top of processes
  • Dilemma!
    • User-level threads: good performance, but correct behavior only when uniprogrammed with no I/O
    • Kernel-level threads: worse performance, but fewer restrictions
  • Goal of the paper
    • For the common case, where thread operations do not need kernel intervention, provide performance equal to the best user-level threads
    • For the infrequent cases where the kernel must be involved (processor reallocation, I/O), mimic the behavior of a kernel thread management system
      • No processor idles when there are ready threads
      • No high-priority thread waits for low-priority threads
      • When a thread traps to the kernel to block, the processor can be used to run another thread
    • The user-level part of the system is structured to simplify application-specific customization (easy to change the scheduling policy or provide a different concurrency model)
  • In order to achieve such goals
    • the kernel should be able to access the user-level scheduling information (how much parallelism there is)
    • user-level software needs to be aware of kernel events (processor reallocation, I/O request, etc)
  • Approach
    • Provide each application with a virtual multiprocessor ▶ an abstraction of a dedicated physical machine
    • Each application knows how many (virtual) processors have been allocated
      ▶ Kernel should notify the address space thread scheduler of every kernel event that affects the address space
    • The OS has complete control over the allocation of processors among address spaces (applications)
      ▶ Thread system in each address space notifies the kernel of actions that can affect processor allocation decisions
  • Scheduler activations ▶ vector control from the kernel to the address space thread scheduler on a kernel event

User-Level Threads: performance advantages and functionality limitations

  •  User-level thread advantage
    • Flexibility
    • Performance
  • Kernel-Level thread disadvantage
    • The cost of thread management operations is high ▶ every operation must cross the kernel boundary
    • The cost of generality is high ▶ a single kernel interface must serve the needs of most applications
    • Experiment
      • Null Fork (overhead of forking a thread): 34 μs with user-level FastThreads vs. 948 μs with Topaz kernel threads
      • Signal-Wait (overhead of synchronizing two threads): 37 μs vs. 441 μs
  • Difficulties that arise when user-level threads are built on top of kernel-level threads
    • Kernel threads block, resume, and are preempted without notifying the user level
    • Kernel threads are scheduled obliviously with respect to the user-level thread state
      • Solution: Could create more kernel threads than physical processors
        ▶ Problem: scheduling is hard
        • user level thread may hold spin locks
        • priority issues

Effective kernel support for the user-level management of parallelism

  • Kernel allocates processors to address spaces
  • Address space schedules the threads on allocated processors
  • Kernel notifies the user-level thread system of changes in the number of allocated processors and of user-level threads blocking or unblocking in the kernel
  • User-level thread system notifies the kernel when more or fewer processors are needed
  • Application sees no difference
  • Scheduler Activation
    • Communication between the kernel processor allocator and the user-level thread system
    • Roles
      1. Serves as a vessel for running user-level threads
      2. Notifies user-level thread system of kernel event
      3. Provides space in kernel for saving the processor context of the current user-level thread when blocked
    • Two execution stacks (kernel, application)
    • When a user-level thread calls into the kernel, it uses its activation's kernel stack
    • The user-level thread scheduler runs on the activation's user-level stack
    • Kernel maintains an activation control block to record the state of the blocked user level thread
    • How a program starts to work
      1. Kernel creates a scheduler activation
      2. Assigns it to a processor
      3. upcalls into the application address space at a fixed entry point
      4. User-level thread management system receives the upcall
      5. User-level thread management system uses that activation as the context to initialize itself and run the main application thread
      6. As the program creates additional threads, the kernel creates more scheduler activations, one for each newly allocated processor, and upcalls into the user level
    • Difference between kernel threads ▶ when an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel. A new scheduler activation is created to notify the user-level thread system that the thread has been stopped.
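The start-up sequence above can be sketched as a toy simulation. Everything here (class names, the Python method calls) is invented for illustration; the real mechanism is a kernel upcall at a fixed address in the application, not a Python API:

```python
# Toy model of program start-up under scheduler activations.
# All names here are invented for illustration.

class UserLevelScheduler:
    """User-level thread system living in one address space."""

    def __init__(self):
        self.ready = []     # ready user-level threads (just names here)
        self.running = {}   # activation id -> thread name

    def upcall_new_processor(self, activation_id):
        # Steps 4-5: the fixed entry point; use this activation as the
        # context to run the next ready thread.
        if self.ready:
            thread = self.ready.pop(0)
            self.running[activation_id] = thread
            return thread
        return None


class Kernel:
    """Kernel side: creates activations and upcalls into the address space."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.next_activation = 0

    def add_processor(self):
        # Steps 1-3: create a scheduler activation, bind it to a
        # processor, and upcall at the fixed entry point.
        self.next_activation += 1
        return self.scheduler.upcall_new_processor(self.next_activation)


sched = UserLevelScheduler()
sched.ready = ["main"]           # step 5: the main application thread
kernel = Kernel(sched)
print(kernel.add_processor())    # prints: main

# Step 6: the program creates more threads; the kernel hands the
# address space another processor via a fresh activation.
sched.ready.append("worker")
print(kernel.add_processor())    # prints: worker
```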
  • Scheduler Activation Upcall Points
    • Add this processor (Processor #)
    • Processor(Scheduler activation) has been preempted (preempted activation #, machine state)
    • Scheduler activation has blocked (blocked activation #)
    • Scheduler activation has unblocked (unblocked activation#, machine state)
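A minimal sketch of how a user-level thread system might dispatch on these four upcall points. All names and data shapes are invented; a real implementation receives saved machine state (registers), not Python strings:

```python
# Toy dispatcher for the four upcall points listed above.

class ThreadSystem:
    def __init__(self):
        self.ready = []   # runnable user-level threads
        self.log = []     # record of upcalls, for inspection

    # "Add this processor": run a ready thread on the new processor.
    def add_this_processor(self, processor):
        thread = self.ready.pop(0) if self.ready else None
        self.log.append(("add", processor, thread))
        return thread

    # "Processor has been preempted": the preempted activation's
    # user-level thread state comes back; put it on the ready list.
    def processor_preempted(self, activation, machine_state):
        self.ready.append(machine_state)
        self.log.append(("preempted", activation))

    # "Scheduler activation has blocked": its thread is stuck in the
    # kernel; pick another ready thread to run on this new activation.
    def activation_blocked(self, activation):
        self.log.append(("blocked", activation))
        return self.ready.pop(0) if self.ready else None

    # "Scheduler activation has unblocked": the saved thread state is
    # runnable again.
    def activation_unblocked(self, activation, machine_state):
        self.ready.append(machine_state)
        self.log.append(("unblocked", activation))


ts = ThreadSystem()
ts.ready = ["t1", "t2"]
first = ts.add_this_processor(0)    # t1 runs on processor 0
second = ts.activation_blocked(1)   # t1 blocks in the kernel; t2 takes over
ts.activation_unblocked(1, "t1")    # t1's saved state is runnable again
print(first, second, ts.ready)      # prints: t1 t2 ['t1']
```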

  • Some details
    1. If threads have priorities, an additional preemption may have to take place
    2. Application is free to build any other concurrency model on top of scheduler activations
    3. scheduler activations work properly even when a preemption or a page fault occurs in the user-level thread manager, when no user-level thread is running
    4. A user-level thread that has blocked in the kernel may still need to execute further in kernel mode when the I/O completes
  • Notifying the Kernel
    • The user-level thread system need not tell the kernel about every thread operation, only about the small subset that can affect the kernel's processor allocation decision
    • More runnable threads than processors or More processors than runnable threads
    • Kernel call
      • Add more processors (additional # of processors needed)
      • This processor is idle()
    • A malicious or greedy address space could claim it needs more processors than it does
      • Multi-level feedback can be used
      • Favor address spaces that use fewer processors (in the spirit of shortest-job-first)
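The two kernel calls can be sketched with a toy allocator. The class and method names are my own; the paper only specifies the calls' roles, not an API:

```python
# Toy sketch of the two kernel calls: the kernel tracks how many
# processors each address space has and grants or reclaims them.

class ProcessorAllocator:
    def __init__(self, total):
        self.free = total
        self.allocated = {}   # address space -> processor count

    # "Add more processors(n)": grant up to n if any are free.
    def add_more_processors(self, addr_space, n):
        grant = min(n, self.free)
        self.free -= grant
        self.allocated[addr_space] = self.allocated.get(addr_space, 0) + grant
        return grant

    # "This processor is idle()": the address space has more processors
    # than runnable threads; give one back.
    def this_processor_is_idle(self, addr_space):
        if self.allocated.get(addr_space, 0) > 0:
            self.allocated[addr_space] -= 1
            self.free += 1


alloc = ProcessorAllocator(total=4)
print(alloc.add_more_processors("A", 3))   # prints: 3
print(alloc.add_more_processors("B", 3))   # prints: 1 (only one left)
alloc.this_processor_is_idle("A")
print(alloc.free)                          # prints: 1
```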
  • Handling Critical Section
    • User-level thread could be executing in a critical section
    • Two possible problems: poor performance, deadlock
    • Solution: Prevention and Recovery
      • Prevention
        • Inopportune preemptions are avoided through a scheduling and locking protocol between the kernel and the user level ▶ but this requires the kernel to yield control over processor allocation to the user level
      • Recovery
        • When a user-level thread has been preempted or unblocked, the thread system checks whether it was executing in a critical section. If so, the thread is continued temporarily until it exits the critical section
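The recovery path can be sketched as follows, assuming a per-thread flag marks critical sections. The flag, the step counter, and all names are invented for illustration (the real system detects critical sections from the preempted thread's program counter, not a flag):

```python
# Toy sketch of the recovery strategy: on a preemption notification,
# check whether the preempted thread was inside a critical section
# and, if so, let it run ahead to the section's end before switching.

class Thread:
    def __init__(self, name):
        self.name = name
        self.in_critical_section = False
        self.steps_to_exit = 0   # toy stand-in for "run to section end"

def handle_preemption(preempted, ready_queue):
    """Return the thread that should run next on the freed processor."""
    if preempted.in_critical_section:
        # Continue the preempted thread temporarily: in the real system
        # this is a user-level context switch that runs it until it
        # exits the critical section (releasing its locks).
        while preempted.steps_to_exit > 0:
            preempted.steps_to_exit -= 1
        preempted.in_critical_section = False
    ready_queue.append(preempted)   # now safe to put it back
    return ready_queue.pop(0)


t_lock_holder = Thread("holder")
t_lock_holder.in_critical_section = True
t_lock_holder.steps_to_exit = 3
ready = [Thread("other")]

next_thread = handle_preemption(t_lock_holder, ready)
print(next_thread.name)                    # prints: other
print(t_lock_holder.in_critical_section)   # prints: False
```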

Implementation

  • Processor Allocation Policy (Kernel)
    • Policy: "space sharing" ▶ processors are divided evenly among the highest-priority address spaces
  • Thread scheduling Policy (Address space)
    • Application's choice
  • Performance Enhancements (optimization)
    1. Critical sections ▶ to detect whether a preempted thread was in a critical section without adding overhead to the common case, make a copy of every low-level critical section (delimit each critical section in the C source code, then post-process the compiler-generated assembly)
    2. Management of scheduler activations ▶ cache discarded scheduler activations for reuse, and return them to the kernel in bulk

Performance

  • Cost of user-level thread operations
    • Close to FastThreads, with a small added overhead
      • Null Fork: +3 μs, for incrementing and decrementing the count of busy threads and determining whether the kernel must be notified
      • Signal-Wait: +5 μs, the cost of checking whether a preempted thread is being resumed
  • Cost of communication between the kernel and the user level (upcalls)
    • Upcall performance is important because...
      • Helps determine the "break-even" point needed for user-level threads to begin to outperform kernel threads
      • Determines how long other threads running in the application may have to wait for a critical resource held by the preempted thread
    • Upcalls are significantly slower than Topaz kernel thread operations, likely an artifact of the implementation rather than the design
      • Built on top of the existing Topaz kernel thread implementation
      • The Topaz thread system itself is written in carefully tuned assembler
  • Overall effect on the performance of applications
    • When application makes minimal use of kernel services (Using only User-level threads)
      • Topaz Lock overhead (kernel thread overhead) is much greater because of lock contention
    • When application requires kernel involvement because of I/O (Kernel intervention)
      • When the working set does not fit in memory a sharp decrease in performance
      • FastThreads degrades more quickly ▶ when a user-level thread blocks in the kernel, its kernel thread (virtual processor) blocks too, leaving the physical processor idle
    • Effect on multiprogramming causing system-induced kernel events
      • FastThreads: physical processors idle waiting for a lock to be released
      • Topaz Threads: common thread operations are more expensive

Conclusion

The authors tried to provide an interface for parallelism to the application. Their main goal was to provide the functionality of kernel threads while achieving the performance and flexibility of user-level threads. They came up with the idea of a virtual multiprocessor, where...

  • Kernel allocates processors
  • Address space schedules threads
  • Kernel notifies events (preemption, unblock, block)
  • Address space notifies events that affect processor allocation decisions

To talk to the address space, the kernel uses something called scheduler activations. A scheduler activation is the execution context for vectoring control from the kernel to the address space on a kernel event.