Homework 6: SQL

hw06.zip

Due by 11:59pm on Thursday, August 7

Instructions

Download hw06.zip. Inside the archive, you will find a file called hw06.sql, along with a copy of the ok autograder.

Submission: When you are done, submit the assignment by uploading all code files you've edited to Gradescope. You may submit more than once before the deadline; only the final submission will be scored. Check that you have successfully submitted your code on Gradescope. See Lab 0 for more instructions on submitting assignments.

Using Ok: If you have any questions about using Ok, please refer to this guide.

Grading: Homework is graded based on correctness. Each incorrect problem will decrease the total score by one point. This homework is out of 2 points.

To check your progress, you can run sqlite3 directly by running:

python3 sqlite_shell.py --init hw06.sql

You should also check your work using ok:

python3 ok

Visualizing SQL

The CS61A SQL Web Interpreter is a great tool for visualizing and debugging SQL statements!

To get started, visit code.cs61a.org and hit Start SQL interpreter on the launch screen.

Most tables used in assignments are already available for use, so let's try to execute a SELECT statement:

In addition to displaying a visual representation of the output table, the "Step-by-step" button lets us step through the SQL execution and visualize every transformation that takes place. For our example, clicking on the next arrow will produce the following visuals, demonstrating exactly how SQL is grouping our rows to form the final output!

Required Questions

Getting Started Videos

These videos may provide some helpful direction for tackling the coding problems on this assignment.

To see these videos, you should be logged into your berkeley.edu email.

YouTube link

SQL

A SELECT statement describes an output table based on input rows. To write one:

Describe the input rows using FROM and WHERE clauses.
Group those rows and determine which groups should appear as output rows using GROUP BY and HAVING clauses.
Format and order the output rows and columns using SELECT and ORDER BY clauses.

SELECT (Step 3) FROM (Step 1) WHERE (Step 1) GROUP BY (Step 2) HAVING (Step 2) ORDER BY (Step 3);

Step 1 may involve joining tables (using commas) to form input rows that consist of two or more rows from existing tables.

The WHERE, GROUP BY, HAVING, and ORDER BY clauses are optional.

Consult the drop-down for a refresher on SQL. It's okay to skip directly to the questions and refer back here should you get stuck.

SQL Basics

Creating Tables

You can create SQL tables either from scratch or from existing tables.

The following statement creates a table by specifying column names and values without referencing another table. Each SELECT clause specifies the values for one row, and UNION is used to join rows together. The AS clauses give a name to each column; it need not be repeated in subsequent rows after the first.

CREATE TABLE [table_name] AS
  SELECT [val1] AS [column1], [val2] AS [column2], ... UNION
  SELECT [val3]             , [val4]             , ... UNION
  SELECT [val5]             , [val6]             , ...;

Let's say we want to make the following table called big_game which records the scores for the Big Game each year. This table has three columns: berkeley, stanford, and year.

We could do so with the following CREATE TABLE statement:

CREATE TABLE big_game AS
  SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
  SELECT 28,             16,            2003         UNION
  SELECT 17,             38,            2014;

Selecting From Tables

More commonly, we will create new tables by selecting specific columns that we want from existing tables by using a SELECT statement as follows:

SELECT [columns] FROM [tables] WHERE [condition] ORDER BY [columns] LIMIT [limit];

Let's break down this statement:

SELECT [columns] tells SQL that we want to include the given columns in our output table; [columns] is a comma-separated list of column names, and * can be used to select all columns
FROM [table] tells SQL that the columns we want to select are from the given table
WHERE [condition] filters the output table by only including rows whose values satisfy the given [condition], a boolean expression
ORDER BY [columns] orders the rows in the output table by the given comma-separated list of columns; by default, values are sorted in ascending order (ASC), but you can use DESC to sort in descending order
LIMIT [limit] limits the number of rows in the output table by the integer [limit]

Here are some examples:

Select all of Berkeley's scores from the big_game table, but only include scores from years past 2002:

sqlite> SELECT berkeley FROM big_game WHERE year > 2002;
28
17

Select the scores for both schools in years that Berkeley won:

sqlite> SELECT berkeley, stanford FROM big_game WHERE berkeley > stanford;
30|7
28|16

Select the years that Stanford scored more than 15 points:

sqlite> SELECT year FROM big_game WHERE stanford > 15;
2003
2014

SQL operators

Expressions in the SELECT, WHERE, and ORDER BY clauses can contain one or more of the following operators:

comparison operators: =, >, <, <=, >=, <> or != ("not equal")
boolean operators: AND, OR
arithmetic operators: +, -, *, /
concatenation operator: ||

Output the ratio of Berkeley's score to Stanford's score each year:

sqlite> select berkeley * 1.0 / stanford from big_game;
0.447368421052632
1.75
4.28571428571429

Output the sum of scores in years where both teams scored over 10 points:

sqlite> select berkeley + stanford from big_game where berkeley > 10 and stanford > 10;
55
44

Output a table with a single column and single row containing the value "hello world":

sqlite> SELECT "hello" || " " || "world";
hello world

Dog Data

In each question below, you will define a new table based on the following tables.

CREATE TABLE parents AS
  SELECT "ace" AS parent, "bella" AS child UNION
  SELECT "ace"          , "charlie"        UNION
  SELECT "daisy"        , "hank"           UNION
  SELECT "finn"         , "ace"            UNION
  SELECT "finn"         , "daisy"          UNION
  SELECT "finn"         , "ginger"         UNION
  SELECT "ellie"        , "finn";

CREATE TABLE dogs AS
  SELECT "ace" AS name, "long" AS fur, 26 AS height UNION
  SELECT "bella"      , "short"      , 52           UNION
  SELECT "charlie"    , "long"       , 47           UNION
  SELECT "daisy"      , "long"       , 46           UNION
  SELECT "ellie"      , "short"      , 35           UNION
  SELECT "finn"       , "curly"      , 32           UNION
  SELECT "ginger"     , "short"      , 28           UNION
  SELECT "hank"       , "curly"      , 31;

CREATE TABLE sizes AS
  SELECT "toy" AS size, 24 AS min, 28 AS max UNION
  SELECT "mini"       , 28       , 35        UNION
  SELECT "medium"     , 35       , 45        UNION
  SELECT "standard"   , 45       , 60;

Your tables should still perform correctly even if the values in these tables change. For example, if you are asked to list all dogs with a name that starts with h, you should write:

SELECT name FROM dogs WHERE "h" <= name AND name < "i";

In other words, you should not assume that the dogs table has only the data in the table above by writing:

SELECT "hank";

The former query would still be correct if the name ginger were changed to gigi or a row was added with the name harry. Contrastingly, writing SELECT "hank"; would not.

Q1: By Parent Height

Create a table by_parent_height that has a column of the names of all dogs that have a parent, ordered by the height of the parent dog from tallest parent to shortest parent.

-- All dogs with parents ordered by decreasing height of their parent
CREATE TABLE by_parent_height AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

For example, finn has a parent ellie with height 35, and so should appear before ginger who has a parent finn with height 32. The names of dogs with parents of the same height should appear together in any order. For example, bella and charlie should both appear at the end, but either one can come before the other.

For our example tables, the by_parent_height table should look like this:

+----------+
| chil     |
+----------+
| hank     |
| finn     |
| ace      |
| daisy    |
| ginger   |
| bella    |
| charlie  |
+----------+

Use Ok to test your code:

python3 ok -q by_parent_height

Q2: Size of Dogs

The Fédération Cynologique Internationale classifies a standard poodle as over 45 cm and up to 60 cm. The sizes table describes this and other such classifications, where a dog must be over the min and less than or equal to the max in height to qualify as size.

Create a size_of_dogs table with two columns, one for each dog's name and another for its size.

-- The size of each dog
CREATE TABLE size_of_dogs AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

The size_of_dogs table should look like this:

+----------+----------+
| name     | size     |
+----------+----------+
| ace      | toy      |
| bella    | standard |
| charlie  | standard |
| daisy    | standard |
| ellie    | mini     |
| finn     | mini     |
| ginger   | toy      |
| hank     | mini     |
+----------+----------+

Use Ok to test your code:

python3 ok -q size_of_dogs

Q3: Sentences

There are two pairs of siblings that have the same size. Create a table that contains a row with a sentence describing the siblings by their size for each pair.

-- [Optional] Filling out this helper table is recommended
CREATE TABLE siblings AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

-- Sentences about siblings that are the same size
CREATE TABLE sentences AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

Each sibling pair should appear only once in the output, and siblings should be listed in alphabetical order (e.g. "bella and charlie..." instead of "charlie and bella..."), as follows:

sqlite> SELECT * FROM sentences;
The two siblings, bella and charlie, have the same size: standard
The two siblings, ace and ginger, have the same size: toy

Hint: First, create a helper table containing the names of each pair of siblings. This will make comparing the sizes of siblings when constructing the main table easier. Make sure to not pair a child with themselves and do not include duplicate pairs.

Hint: If you join a table with itself, use AS within the FROM clause to give each table an alias.

Hint: In order to concatenate two strings into one, use the || operator, e.g. SELECT "hello" || "world"; will return helloworld.

Use Ok to test your code:

python3 ok -q sentences

Q4: Low Variance

We want to create a table that contains the height range (defined as the difference between maximum and minimum height) of all dogs that share a fur type. However, we'll only consider fur types where each dog with that fur type is within 30% of the average height of all dogs with that fur type; we call this the low variance criterion.

For example, if the average height for short-haired dogs is 10, then in order to be included in our output, all dogs with short hair must have a height of at most 13 and at least 7 (inclusive).

Hint: MIN, MAX, and AVG will be useful here. Hint: You may want to first find the average height and make sure that:
* There are no heights smaller than 0.7 (i.e. 70%) of the average.
* There are no heights greater than 1.3 (i.e. 130%) of the average.

-- Height range for each fur type where all of the heights differ by no more than 30% from the average height
CREATE TABLE low_variance AS
  SELECT "REPLACE THIS LINE WITH YOUR SOLUTION";

Your output should have two columns, in this order: the fur type and the height_range for the fur types that meet this criteria. It should look like this:

+------------+--------------+
| fur        | height_range |
+------------+--------------+
| Curly      | 1            |
+------------+--------------+

The average height of long-haired dogs is 39.7, so the low variance criterion requires the height of each long-haired dog to be between 27.8 and 51.6. However, ace is a long-haired dog with height 26, which is outside this range. For short-haired dogs, bella falls outside the valid range (check!). Thus, neither short nor long haired dogs are included in the output. There are two curly haired dogs: finn with height 32 and hank with height 31. This gives a height range of 1.

Use Ok to test your code:

python3 ok -q low_variance

Check Your Score Locally

You can locally check your score on each question of this assignment by running

python3 ok --score

This does NOT submit the assignment! When you are satisfied with your score, submit the assignment to Gradescope to receive credit for it.

Submit Assignment

Submit this assignment by uploading any files you've edited to the appropriate Gradescope assignment. Lab 00 has detailed instructions.

Exam Practice

Homework assignments will also contain prior exam questions for you to try. These questions have no submission component; feel free to attempt them if you'd like some practice!