Study guide for CMPU 334 Final, Spring 2020 This study guide is supposed to help you prepare for the final exam. The final exam will be a similar format to the concurrency midterm due to our current quarantine situation. I hope everyone stays healthy and has enjoyed learning about operating systems! The best resource is to read the books and watch the lectures. But here I will try to highlight parts that I think are most important, especially homeworks from the book's website that you should definitely practice!!! Hint Hint!!!! This study guide is not a guarantee of what will or will not be on the test, it's just another (of many) resources so use/don't use at your own risk!!! Overview: Why learn about OS? The OS gives you a peek under the covers into how everything really works. Application programmers complain that programming is hard, but they have no idea! The OS makes it easy for them! The OS has 2 main roles (both of which make things easy for the programmer) - virtual machine - standard library And it always tries to maintain both safety and efficiency while performing those roles. We followed the "three easy pieces" organization of topics: virtualization, concurrency and persistence. Part I: Virtualization Recall the basic diagram of a computer. The CPU and memory are two resources that we focus on with virtualization. How do we make it look like a CPU is available for every program regardless of how many CPUs are on the system? How do we make it look like each program has its own dedicated memory? These are the questions we answer in this unit. First, CPU. The main abstraction here is the "process". Review the three states a process can be in: READY, RUNNING, BLOCKED, and why/when a process transitions. Now that you know about concurrency primitives, hopefully you can see where a condition variable or semaphore may be handy for the BLOCKED state! Under the covers, we discussed the mechanisms around this abstraction, primarily Limited Direct Execution (LDE). For example, refer to the diagrams (e.g., Figure 6.2) in the book: http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-mechanisms.pdf What hardware primitives did we need? Why? How do the OS, hardware, user process cooperate to make this happen? How does the OS maintain control/safety? Can you think through what would happen when switching between processes? Now that you know about timer interrupts and scheduling algorithms, do you see how LDE underpins everything? Do you see when you would switch page tables (via the PTBR) and registers when switching between different processes? Speaking of scheduling, you should review some of the common scheduler policies: FIFO, SJF, RR, MLFQ. What metrics (e.g., turnaround time, response time) do they try to optimize for? What is different about the approach taken by proportional share schedulers? We spent a long time on virtual memory, and it's a big topic. Remember the goal of virtual memory: this beautiful illusion that every process has its own private memory. Don't forget exactly how often the CPU is translating a memory address from virtual to physical. Good thing we have TLBs, huh? Although we spent the most time on paging (and indeed paging is the dominant memory translation technology out there), make sure you remember segmentation and understand the tradeoffs between it and paging. Be sure to brush up on basic, linear page table translations. Definitely go to the book's [homework site] (http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/homework.html), download the [linear page table problem generator] (http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/HW-Paging-LinearTranslate.tgz), read the README and work on some problems. I would recommend practicing with the default parameters, perhaps with the verbose mode (-v) but varying the random seed. Test yourself by computing with and without answers (-c). Practice these until you can do them in your sleep and reach out if you are having trouble! One more note on paging: make sure you remember the big challenges with page tables: translation is slow, and page tables can be big. Review approaches that address these problems: primarily TLBs and multi-level paging. TLBs are so core to modern systems, make sure you are comfortable with how they fit in the picture, both if managed by H/W or S/W. Finally, understand the need for swapping, which parts are policies (FIFO, OPT, LRU, etc.) and which are mechanisms. Part II: Concurrency Next we shifted focus to the problem of concurrency: running multiple programs at the same time that share state. The primary abstraction is that of a thread; threads are like processes, but share an address space (including code, heap and globals), but have their own stack and cpu regs. The first issue that comes up is the one that we saw the first day of class with two threads trying to update a shared counter. We saw nondeterministic behavior due to a race condition; remember to think of the assembly instructions that are actually being executed. And think like an EVIL SCHEDULER! For that example we needed mutual exclusion, so talked about how to build locks. Make sure you know how to build a spinlock from an atomic H/W primitive! There are many to practice on: atomic xchg, test and set, compare and swap, load-linked/store-conditional, fetch and add, etc. For a primitive like this: int xchg(int *lock, int val) { int old = *lock *lock = val; return *old; } Know how to fill out the lock structures, like this: init (int *lock) { *lock = 0; } lock (int *lock) { while (xchg(lock, 1) == 1) ; } unlock (int *lock) { *lock = 0; } Then, make sure you understand what is good/bad about spinlocks (starvation? wasting resources?) and how to address them. Also, now you know about semaphores, you can use them for mutual exclusion as well! Just make sure you are careful when identifying the critical section! Think like the EVIL SCHEDULER! We talked about another form of synchronization: waiting and signaling, and how condition variables are used to solve these problems. Remember to always use a while for MESA semantics around your cond_waits! Review problems that come up all the time, like Producer/consumer. And again, think about how some problems can be nicely solved using semaphores, especially when the state variables are automatically encoded in the counter (e.g., the "done" variable for the "thread join" problem). We talked about deadlock (the deadly embrace) and the conditions that needed to hold for it, which can be used to try to prevent it. Alternatively, scheduling can avoid deadlock, or you could imagine detecting it and trying to recover. Part III: Persistence You haven't been tested on this unit yet, so expect a number of questions about it on the final. The main thing is to understand the basics of how a simple filesystem is implemented underneath that beloved filesystem API, what's on disk and what's only in memory. Speaking of the API, while open, read and write on files might be familiar to you, don't forget about links! Both soft (symbolic) links and hard links, and remember the mystery of deleting a file (it was unlink all along!)! And don't forget about directories, which are a special type of file. The FSCK lab should have been good practice on the filesystem structures, but for more, use the [homework generator from the textbook's website] (http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/HW-VSFS.tgz) Seriously, work through these! Given an operation, be sure you can figure out how the state will change and vice-versa. Generate problems like this: ./vsfs.py -n 6 ./vsfs.py -n 6 -r and generate more practice problems by varying the seed (-s). Test yourself by computing with and without answers (-c). Make sure you can do them both ways in your sleep! We talked about how the inodes are set up and different options for supporting large files (e.g., indirect pointers, double indirect pointers!!!). We also talked about the incredible number of disk accesses that even some really simple seeming FS operations take, like file read, path traversal, and creating a file. When thinking about these inefficiencies, we talked about disks, and make sure you understand where they perform well (sequential access) vs. not well (random access) and why. Be sure to be able to calculate how to get a certain efficiency by amortizing disk seeks with larger block sizes. Recall how disk scheduling algorithms have similarities to some of the process scheduling algorithms (SSTF vs. SJF), and how the elevator algorithm solves starvation. Be sure to review Fast File System and the insights of treating the disk like a disk. Finally, we looked at crashes, and all of the weird and (not so) wonderful states a filesystem can be in because of the fact that only single writes are atomic and our disk scheduling algorithms might shuffle the many writes that it takes to do ANYTHING in a filesystem (updating data and inodes for both directories and files). What does it mean for a filesystem to be consistent? You should all be masters at FSCK right now, but why does FSCK suffer for larger disks? And how does journaling work? Finally, we talked about log structured filesystems, a dramatic rearrangement of the structures on the disk in order to achieve fully sequential writes (as much as possible). In Summary: I hope this helps condense a lot of the topics into a more manageable high-level picture of operating systems. The goal of this course was for you to develop a mental model of what is actually going on under the covers. If nothing else, I hope you see that there's a lot going on down there! From all the memory translations to enable virtual memory to all the disk accesses to give us a convenient filesystem abstraction (thank goodness for caching in both cases, via TLB or page cache!) to seemingly effortless concurrency, so many computer scientists take the OS for granted. Not you! I hope you have gained a new appreciation and learned a lot this semester and had some fun along the way. I wish we could have had a normal semester without the backdrop of a global pandemic! Good luck studying and please reach out to me with questions, comments, or clarifications!