<html> <h1 id=“lab-3-tables”>Lab 3: Tables</h1> <p>In this lab you’re going to be working with tables! The goal of this lab is to give you practice with:</p> <ul> <li>Extracting rows and columns from a table</li> <li>Writing and testing helper functions</li> <li>Filtering data with
filter-with
</li> <li>Writing nested functions</li> <li>Adding a column to a table</li> <li>Summarizing columns</li> </ul> <h2 id=“setup”>Setup</h2> <p><a href=“https://en.wikipedia.org/wiki/FiveThirtyEight”>FiveThirtyEight</a> conducted a survey in which tens of thousands of people were asked to choose between two candies; from the survey’s responses, they computed the winning percentage of each candy. Then, they compiled a data-set with
Number
attributes like the winning percentage, relative price, and sugar percentage, and also
Boolean
attributes such as chocolate, fruity, caramel, and hard. This <a href=“https://github.com/fivethirtyeight/data/blob/master/candy-power-ranking/candy-data.csv”>data</a> has been analyzed in a FiveThirtyEight article, <a href=“https://fivethirtyeight.com/videos/the-ultimate-halloween-candy-power-ranking/”>“The Ultimate Halloween Candy Power Ranking,”</a> which is definitely worth a read after the lab.</p> <p>In this lab, you’ll be looking at the <strong>relationships between these columns</strong>.</p> <p>First, to get started include this code at the top of your program:</p> <pre>
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep") include gdrive-sheets ssid = "1fJp5pbmutjQ9HlyUA8b0Yc-BqNxSd0i9wrZgZxpSCKM" data-sheet = <span class="hljs-keyword">load</span>-spreadsheet(ssid) candy-<span class="hljs-keyword">data</span> = <span class="hljs-keyword">load</span>-<span class="hljs-keyword">table</span>: <span class="hljs-keyword">name</span>, chocolate, fruity, caramel, nutty, nougat, crisped-rice, hard, bar, pluribus, sugar-<span class="hljs-keyword">percent</span>, price-<span class="hljs-keyword">percent</span>, win-<span class="hljs-keyword">percent</span> <span class="hljs-keyword">source</span>: <span class="hljs-keyword">data</span>-sheet.sheet-<span class="hljs-keyword">by</span>-<span class="hljs-keyword">name</span>(<span class="hljs-string">"candy-data"</span>, <span class="hljs-literal">true</span>) <span class="hljs-keyword">end</span>
</pre><p>This code loads a Pyret table from a Google Sheet. Press the “Run” button in the top right corner to process the code in the definitions window. Then type
candy-data
in the interactions window to see the data as a table.</p> <p>For this lab, you’ll want to refer to the <a href=“https://hackmd.io/@cs111/table”>Pyret Tables Documentation</a> and not the built-in Pyret documentation.</p> <h2 id=“part-1-filtering”>Part 1: Filtering</h2> <p>Let’s use the power of filtering to learn more from our candy data.</p> <h3 id=“1-1-sugar-rush”>1.1: Sugar Rush</h3> <p>We want to know which candies have the most sugar. Write an expression in the definitions window that produces a table containing only the candies where
sugar-percent > 0.75
. To do this, you'll first need to write a function that takes a table
Row
as input and returns a
Boolean
that answers the question is the
sugar-percent
of that row greater than 75%. We've started the function for you. For this function you are just trying to determine if the input,
r
, has a value in the
sugar-percent
column that is greater than 75%. Be sure to test your function with a
where
block. For example, if you pass in the row
candy-data.row-n(0)
to your function, it should return
false
, and the inputting row
candy-data.row-n(4)
should cause your function to return
true
. As a reminder, the first row in the table has the index of
0
.</p> <pre>
<span class="hljs-function"><span class="hljs-keyword">fun</span> over-75-percent-<span class="hljs-title">sugar</span><span class="hljs-params">(r :: <span class="hljs-type">Row</span>)</span></span> -> <span class="hljs-built_in">Boolean</span>: doc: <span class="hljs-string">"Returns true if the sugar-percent column in a row is over 75%"</span> ... # Replace the <span class="hljs-string">"..."</span> and <span class="hljs-keyword">this</span> comment with your code end
</pre><p>Once you have written that function, we'll write an <strong>expression</strong> (not a function) to filter our table. Take a look at the <a href=“https://hackmd.io/@cs111/table#Creating-and-Extracting-Tables”>filter-with</a> function. The first input to the
filter-with
function is the
Table
to be filtered and second input is the name of the filter function, which is the function you just wrote above. That's right, as you hopefully remember, functions can be passed as inputs to other functions! Call over a coach if you want help using this structure. If you've got everything right, your expression will evaluate to a table with 15 rows in it. Hint: <em>your expression should be a call to
filter-with
with the proper inputs.</em></p> <p>Suggestion: When you’re done writing the expression, copy and paste this line</p> <pre>
<span class="hljs-meta">#-----------------------------------</span>
</pre><p>into the definitions window directly below it to separate it from the next part.</p> <h3 id=“1-2-bougie”>1.2: Bougie</h3> <p>Now that we know how to satisfy our sweet tooth, write an expression in the definitions window that produces a table containing only the candies where
price-percent
is greater than 90%. Be sure to test your predicate function with a
where
block. If you got your filter correct, you should have a table with 8 rows in it.</p> <p>Suggestion: Once again, separate it from the next part by pasting this line</p> <pre>
<span class="hljs-meta">#-----------------------------------</span>
</pre><p>on the line below it.</p> <h3 id=“1-3-chocolate”>1.3: Chocolate</h3> <p>How many of the candies have chocolate?</p> <p>Write an expression in the definitions window that outputs this number in the interactions window. You can get the length of a table by writing
<table>.length()
where
<table>
is the name of (or an expression that evaluates to) a table. If you got your filter correct, you'll find a table with 37 chocolate candies in it.</p> <p>Suggestion: Once again, separate it from the next part by pasting this line</p> <pre>
<span class="hljs-meta">#-----------------------------------</span>
</pre><p>on the line below it.</p> <h3 id=“1-4-chocolate-and-caramel”>1.4: Chocolate and Caramel</h3> <p>Of the candies that have chocolate, what proportion also have caramel?</p> <p>Write an expression in the definitions window that outputs this proportion in the interactions window. If you got the right answer, you'll see that 10/37 or about 27% of the chocolate candies also have caramel.</p> <p>Suggestion: Once again, separate it from the next part by pasting this line</p> <pre>
<span class="hljs-meta">#-----------------------------------</span>
</pre><p>on the line below it.</p> <h3 id=“1-5-chocolate-and-anything-optional-”>1.5: Chocolate and Anything (optional)</h3> <p>Next, we'd like to know what attribute is paired most frequently with chocolate. We could write filter functions like we did for part 1.4 for all the other possible pairings, but that would be very tedious and redundant.</p> <p>So instead, to help answer this question, let's write a function that generalizes the expression in 1.4. This function will take in a
String
representing the name of the ingredient being paired with chocolate and a
Table
that is only candies that have the chocolate attribute. This function returns a number which is the proportion of candies that have the attribute specified to the total number of chocolate candies.</p> <p>The function declaration should look something like this:</p> <pre>
<span class="hljs-function"><span class="hljs-keyword">fun</span> candy-with-<span class="hljs-title">attr</span><span class="hljs-params">(t :: <span class="hljs-type">Table</span>, attr :: <span class="hljs-type">String</span>)</span></span> -> Number:
</pre><p>Look for commonalities across the code you wrote for the predicate functions in (1.3) and in (1.4). To do this you will need to use nested functions (a function defined inside of another function). This will allow the filter function to access the
attr
string, even though it is not being passed into the function.</p> <p>Compare the results of your function on the inputs “fruity,” “nutty,” and “caramel”. Hopefully your answer to “caramel” matches the answer you got in part 1.4! You don’t have to write code for this comparison – just use your new function to compute all three proportions and compare them manually.</p> <hr> <p><strong>Checkpoint</strong>: Call over a coach once you reach this point and talk over your code with them.</p> <hr> <h2 id=“part-2-building-columns-and-analyzing-them”>Part 2: Building columns and analyzing them</h2> <h3 id=“2-1-new-column”>2.1: New column</h3> <p>Build a column of Boolean values that indicates whether a candy is fruity and hard, but not a pluribus (multiple candies in a packet like Skittles or M&Ms). This should produce a new table with an added column. Write an expression in the definitions window that uses this new column to compute how many candies meet this condition.</p> <p>Hint: Take a look at the
build-column
function. Call over a coach if you want help using this.</p> <h3 id=“2-2-maximum”>2.2: Maximum</h3> <p>Of the candies for which this
Boolean
is true (fruity and hard, but not a pluribus), which has the highest winning percentage?</p> <p>Hint: This requires the use of the
order-by
function, in addition to the
row-n
and
filter-with
functions.</p> <p>An example of the
row-n
function to take the fifth row of a table:</p> <pre>
table.row-<span class="hljs-built_in">n</span>(<span class="hljs-comment">4</span>)
</pre><h3 id=“2-3-mean”>2.3: Mean</h3> <p>Of the candies for which this
Boolean
is true, what’s the average winning percentage?</p> <p>Hint: This requires the use of the
mean
function.</p> <h2 id=“takeaways”>Takeaways</h2> <p>This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking about our course’s focus on data: what patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions?</p> <p>Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer.</p> <h2 id=“submitting-the-lab”>Submitting the Lab</h2> <p>When you've complete the exercises, show your code to your instructor or one of the coaches.</p> <p>Then, upload your
lab03.arr
file to the Lab 3 assignment on <a href=“https://www.gradescope.com”>Gradescope</a>.</p> <h2 id=“acknowledgments”>Acknowledgments</h2> <p>This lab includes material adapted from Kathi Fisler and colleagues at Brown University.</p> </html>