Study guide for CMPU 334 Final, Spring 2020

This study guide is supposed to help you prepare for the final exam.
The final exam will be a similar format to the concurrency midterm due
to our current quarantine situation.  I hope everyone stays healthy
and has enjoyed learning about operating systems!

The best resource is to read the books and watch the lectures.  But
here I will try to highlight parts that I think are most important,
especially homeworks from the book's website that you should
definitely practice!!! Hint Hint!!!!

This study guide is not a guarantee of what will or will not be on the
test, it's just another (of many) resources so use/don't use at your
own risk!!!

Overview:

Why learn about OS?  The OS gives you a peek under the covers into how
everything really works.  Application programmers complain that
programming is hard, but they have no idea!  The OS makes it easy for
them!

The OS has 2 main roles (both of which make things easy for the programmer)
- virtual machine
- standard library

And it always tries to maintain both safety and efficiency while
performing those roles.

We followed the "three easy pieces" organization of topics:
virtualization, concurrency and persistence.


Part I: Virtualization

Recall the basic diagram of a computer.  The CPU and memory are two
resources that we focus on with virtualization.  How do we make it
look like a CPU is available for every program regardless of how many
CPUs are on the system?  How do we make it look like each program has
its own dedicated memory?  These are the questions we answer in this
unit.

First, CPU.  The main abstraction here is the "process".  Review the
three states a process can be in: READY, RUNNING, BLOCKED, and
why/when a process transitions.  Now that you know about concurrency
primitives, hopefully you can see where a condition variable or
semaphore may be handy for the BLOCKED state!

Under the covers, we discussed the mechanisms around this abstraction,
primarily Limited Direct Execution (LDE).

For example, refer to the diagrams (e.g., Figure 6.2) in the book:
http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-mechanisms.pdf
What hardware primitives did we need?  Why?  How do the OS, hardware,
user process cooperate to make this happen?  How does the OS maintain
control/safety?

Can you think through what would happen when switching between
processes?  Now that you know about timer interrupts and scheduling
algorithms, do you see how LDE underpins everything?  Do you see when
you would switch page tables (via the PTBR) and registers when
switching between different processes?

Speaking of scheduling, you should review some of the common scheduler
policies: FIFO, SJF, RR, MLFQ.  What metrics (e.g., turnaround time,
response time) do they try to optimize for?  What is different about
the approach taken by proportional share schedulers?

We spent a long time on virtual memory, and it's a big topic.
Remember the goal of virtual memory: this beautiful illusion that
every process has its own private memory.  Don't forget exactly how
often the CPU is translating a memory address from virtual to
physical.  Good thing we have TLBs, huh?

Although we spent the most time on paging (and indeed paging is the
dominant memory translation technology out there), make sure you
remember segmentation and understand the tradeoffs between it and
paging.

Be sure to brush up on basic, linear page table translations.
Definitely go to the book's [homework site]
(http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/homework.html),
download the [linear page table problem generator]
(http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/HW-Paging-LinearTranslate.tgz),
read the README and work on some problems.  I would recommend
practicing with the default parameters, perhaps with the verbose mode
(-v) but varying the random seed.  Test yourself by computing with and
without answers (-c).  Practice these until you can do them in your
sleep and reach out if you are having trouble!

One more note on paging: make sure you remember the big challenges
with page tables: translation is slow, and page tables can be big.
Review approaches that address these problems: primarily TLBs and
multi-level paging.  TLBs are so core to modern systems, make sure you
are comfortable with how they fit in the picture, both if managed by
H/W or S/W.

Finally, understand the need for swapping, which parts are policies
(FIFO, OPT, LRU, etc.) and which are mechanisms.


Part II: Concurrency

Next we shifted focus to the problem of concurrency: running multiple
programs at the same time that share state.  The primary abstraction
is that of a thread; threads are like processes, but share an address
space (including code, heap and globals), but have their own stack and
cpu regs.

The first issue that comes up is the one that we saw the first day of
class with two threads trying to update a shared counter.  We saw
nondeterministic behavior due to a race condition; remember to think
of the assembly instructions that are actually being executed.  And
think like an EVIL SCHEDULER!

For that example we needed mutual exclusion, so talked about how to
build locks.  Make sure you know how to build a spinlock from an
atomic H/W primitive!  There are many to practice on: atomic xchg,
test and set, compare and swap, load-linked/store-conditional, fetch
and add, etc. 

For a primitive like this:

       int xchg(int *lock, int val) {
         int old = *lock
         *lock = val;
         return *old;
       }

Know how to fill out the lock structures, like this:

         init (int *lock) {
           *lock = 0;
         }
  
         lock (int *lock) {
           while (xchg(lock, 1) == 1)
             ;
         }
  
         unlock (int *lock) {
           *lock = 0;
         }

Then, make sure you understand what is good/bad about spinlocks
(starvation?  wasting resources?) and how to address them.  Also, now
you know about semaphores, you can use them for mutual exclusion as
well!  Just make sure you are careful when identifying the critical
section!  Think like the EVIL SCHEDULER!

We talked about another form of synchronization: waiting and
signaling, and how condition variables are used to solve these
problems.  Remember to always use a while for MESA semantics around
your cond_waits!  Review problems that come up all the time, like
Producer/consumer.  And again, think about how some problems can be
nicely solved using semaphores, especially when the state variables
are automatically encoded in the counter (e.g., the "done" variable
for the "thread join" problem).

We talked about deadlock (the deadly embrace) and the conditions that
needed to hold for it, which can be used to try to prevent it.
Alternatively, scheduling can avoid deadlock, or you could imagine
detecting it and trying to recover.


Part III: Persistence

You haven't been tested on this unit yet, so expect a number of
questions about it on the final.  The main thing is to understand the
basics of how a simple filesystem is implemented underneath that
beloved filesystem API, what's on disk and what's only in memory.  

Speaking of the API, while open, read and write on files might be
familiar to you, don't forget about links!  Both soft (symbolic) links
and hard links, and remember the mystery of deleting a file (it was
unlink all along!)!  And don't forget about directories, which are a
special type of file.

The FSCK lab should have been good practice on the filesystem
structures, but for more, use the [homework generator from the
textbook's website]
(http://pages.cs.wisc.edu/~remzi/OSTEP/Homework/HW-VSFS.tgz)
Seriously, work through these!  Given an operation, be sure you can
figure out how the state will change and vice-versa.

Generate problems like this:

./vsfs.py -n 6
./vsfs.py -n 6 -r

and generate more practice problems by varying the seed (-s).  Test
yourself by computing with and without answers (-c).  Make sure you
can do them both ways in your sleep!

We talked about how the inodes are set up and different options for
supporting large files (e.g., indirect pointers, double indirect
pointers!!!).  We also talked about the incredible number of disk
accesses that even some really simple seeming FS operations take, like
file read, path traversal, and creating a file.

When thinking about these inefficiencies, we talked about disks, and
make sure you understand where they perform well (sequential access)
vs. not well (random access) and why.  Be sure to be able to calculate
how to get a certain efficiency by amortizing disk seeks with larger
block sizes.  Recall how disk scheduling algorithms have similarities
to some of the process scheduling algorithms (SSTF vs. SJF), and how
the elevator algorithm solves starvation.  Be sure to review Fast File
System and the insights of treating the disk like a disk.

Finally, we looked at crashes, and all of the weird and (not so)
wonderful states a filesystem can be in because of the fact that only
single writes are atomic and our disk scheduling algorithms might
shuffle the many writes that it takes to do ANYTHING in a filesystem
(updating data and inodes for both directories and files).  What does
it mean for a filesystem to be consistent?  You should all be masters
at FSCK right now, but why does FSCK suffer for larger disks?  And how
does journaling work?

Finally, we talked about log structured filesystems, a dramatic
rearrangement of the structures on the disk in order to achieve fully
sequential writes (as much as possible).


In Summary:

I hope this helps condense a lot of the topics into a more manageable
high-level picture of operating systems.  The goal of this course was
for you to develop a mental model of what is actually going on under
the covers.  If nothing else, I hope you see that there's a lot going
on down there!  From all the memory translations to enable virtual
memory to all the disk accesses to give us a convenient filesystem
abstraction (thank goodness for caching in both cases, via TLB or page
cache!) to seemingly effortless concurrency, so many computer
scientists take the OS for granted.  Not you!

I hope you have gained a new appreciation and learned a lot this
semester and had some fun along the way.  I wish we could have had a
normal semester without the backdrop of a global pandemic!

Good luck studying and please reach out to me with questions,
comments, or clarifications!