93acac6049a8532fa1c06bfcdcdc50ecc75ec022
public/symmetric.md
... | ... | @@ -0,0 +1,283 @@ |
1 | +#### Note added 15-Sep-2017 |
|
2 | + |
|
3 | +I talked to Lloyd Tabb at the Looker Join conference and asked him how he came up with this strategy: A colleague asked him whether he knew that you could do a `sum(distinct . . .)`. Lloyd did not know, but when told, this idea came to him very quickly. |
|
4 | + |
|
5 | +It has been submitted as a [patent](https://www.google.sr/patents/CA2965831A1?cl=en&dq=inassignee:%22Looker+Data+Sciences,+Inc.%22&hl=en&sa=X&ved=0ahUKEwi0uOCgkqfWAhUK64MKHYIRDtgQ6AEIJDAA). |
|
6 | + |
|
7 | +----- |
|
8 | + |
|
9 | +### Looker and Symmetric Aggregates (the 1:many "fanout" problem with aggregates) |
|
10 | + |
|
11 | +#### I was looking at some Looker-generated SQL and saw this: |
|
12 | + |
|
13 | + SELECT |
|
14 | + COALESCE(SUM(customers.visits ), 0) AS "customers.total_visits" |
|
15 | + FROM public.customers AS customers |
|
16 | + |
|
17 | +(See <http://localhost:9999/explore/fanout2/customers?qid=ek2NHkVfoJ34yRkLHgagFj>) |
|
18 | + |
|
19 | +Simple. Then I added an aggregate from a joined view (a "view" in Looker is a |
|
20 | +thin façade on a table) and I saw this: |
|
21 | + |
|
22 | + SELECT |
|
23 | + COALESCE(COALESCE( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(customers.visits ,0)*(1000000*1.0)) AS DECIMAL(65,0))) + ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) ) - SUM(DISTINCT ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) ) / (1000000*1.0), 0), 0) AS "customers.total_visits", |
|
24 | + COALESCE(COALESCE( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(orders.amount ,0)*(1000000*1.0)) AS DECIMAL(65,0))) + ('x' || MD5(orders.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(orders.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) ) - SUM(DISTINCT ('x' || MD5(orders.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(orders.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) ) / (1000000*1.0), 0), 0) AS "orders.total_amount" |
|
25 | + FROM public.customers AS customers |
|
26 | + LEFT JOIN public.orders AS orders ON customers.id = orders.customer_id |
|
27 | + |
|
28 | +What are they doing? [Looker [rightly, I think] brags about this and calls it "symmetric aggregates"](https://www.youtube.com/watch?v=oRXQ4oNsyNk) ([and here is the business case](https://discourse.looker.com/t/symmetric-aggregates/261)), |
|
29 | +but they don't explain how it works. I conducted some searches and it would |
|
30 | +seem that Looker's strategy here is unique and a genuine innovation. |
|
31 | + |
|
32 | +#### Sample database |
|
33 | + |
|
34 | + drop table if exists customers; |
|
35 | + drop table if exists orders; |
|
36 | + |
|
37 | + create table customers (id int, first_name varchar, last_name varchar, visits int); |
|
38 | + create table orders (id int, amount numeric(6,2), customer_id int); |
|
39 | + |
|
40 | + insert into customers (id, first_name, last_name, visits) values (1, 'Amelia', 'Earhart', 2); |
|
41 | + insert into customers (id, first_name, last_name, visits) values (2, 'Charles', 'Lindberg', 2); |
|
42 | + insert into customers (id, first_name, last_name, visits) values (3, 'Wilbur', 'Wright', 4); |
|
43 | + |
|
44 | + insert into orders (id, amount, customer_id) values (1, 25.00, 1); |
|
45 | + insert into orders (id, amount, customer_id) values (2, 50.00, 1); |
|
46 | + insert into orders (id, amount, customer_id) values (3, 75.00, 2); |
|
47 | + insert into orders (id, amount, customer_id) values (4, 100.00, 3); |
|
48 | + |
|
49 | +How many visits are there? |
|
50 | + |
|
51 | + select sum(visits) from customers; |
|
52 | + -- returns 8 |
|
53 | + |
|
54 | +Yay! What is the average number of visits per customer? |
|
55 | + |
|
56 | + select avg(visits) from customers; |
|
57 | + -- returns 2.66 |
|
58 | + |
|
59 | +And now let's look at the classic "surprise" for beginning users of SQL: |
|
60 | +Join in a table, and . . . |
|
61 | + |
|
62 | + select avg(visits) from customers left join orders on orders.customer_id = customers.id; |
|
63 | + -- returns 2.5 |
|
64 | + -- oops! |
|
65 | + |
|
66 | +Let's add some more columns, remove the aggregate, and look at the data: |
|
67 | + |
|
68 | + select |
|
69 | + customers.id as customer_id, |
|
70 | + customers.first_name, |
|
71 | + customers.last_name, |
|
72 | + customers.visits, |
|
73 | + orders.id as order_id, |
|
74 | + orders.amount |
|
75 | + from customers left join orders on orders.customer_id = customers.id |
|
76 | + |
|
77 | +(<http://localhost:9999/sql/mx9cgdrvfhwsdc>) |
|
78 | + |
|
79 | +That produces: |
|
80 | + |
|
81 | +| customer_id | first_name | last_name | visits | order_id | amount | |
|
82 | +|-------------|------------|-----------|:------:|:--------:|-------:| |
|
83 | +| 1 | Amelia | Earhart | 2 | 1 | 25.00 | |
|
84 | +| 1 | Amelia | Earhart | 2 | 2 | 50.00 | |
|
85 | +| 2 | Charles | Lindberg | 2 | 3 | 75.00 | |
|
86 | +| 3 | Wilbur | Wright | 4 | 4 | 100.00 | |
|
87 | + |
|
88 | +Clear enough. We have "extra" values in the visits column |
|
89 | +because of the 1:many ("fanout") relationship |
|
90 | +between customers and orders. |
|
91 | + |
|
92 | +If we want the counts of customers and the counts of orders, we can just use |
|
93 | +`count(distinct ...)`: |
|
94 | + |
|
95 | + select |
|
96 | + count(distinct customers.id) as "customers count", |
|
97 | + count(distinct orders.id) as "orders count" |
|
98 | + from customers left join orders on orders.customer_id = customers.id; |
|
99 | + |
|
100 | +Works great! However, `sum` is problematic. |
|
101 | + |
|
102 | + select |
|
103 | + sum(customers.visits) as "customers total visits", |
|
104 | + sum(orders.amount) as "orders total amount" |
|
105 | + from customers left join orders on orders.customer_id = customers.id; |
|
106 | + |
|
107 | +(<http://localhost:9999/sql/2brb8hdmwdbhtx>) |
|
108 | + |
|
109 | +Result: |
|
110 | + |
|
111 | +| customers total visits | orders total amount | |
|
112 | +| :--------------------: | :-----------------: | |
|
113 | +| 10 | 250.00 | |
|
114 | + |
|
115 | +Clearly this is going to seem "wrong" to people who aren't comfortable with |
|
116 | +joins. The classic ways to fix this: with a [subquery](http://www.sqlteam.com/article/aggregating-correlated-sub-queries) or [creative self-joins](http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables) or [window functions](https://stackoverflow.com/a/13169627). However, these |
|
117 | +solutions can |

118 | +introduce as many problems as they solve (subqueries and window functions |
|
119 | +aren't as easily composable as the Looker solution), especially for an analytic tool |
|
120 | +where we don't want people to have to worry about SQL. |
|
121 | + |
|
122 | +#### Quick demo in Looker |
|
123 | + |
|
124 | +(Show the live SQL query-writing.) |
|
125 | + |
|
126 | +#### How they do it |
|
127 | + |
|
128 | +The technique is to do a `sum(distinct something)` where the "something" |
|
129 | +for visits is _the same_ for each combination of `customers.id` and `visits`. |
|
130 | +That way when we do `sum(distinct something)` there will be only one value |
|
131 | +for the combination of `customers.id = 1` and `visits = 2` (see the table |
|
132 | +above that joins `customers` and `orders`). We can then take this sum and |
|
133 | +_subtract_ the "something" applied _only_ to the customer id, and the remainder |
|
134 | +will be the total of visits from the customers table. I am going to call the calculation on the |
|
135 | +customer id `id_offset`. |
|
136 | + |
|
137 | +Another way to think about this: The aggregate is being calculated not |
|
138 | +against all of the values in the column (which can have duplicates because |
|
139 | +of the join), but against the distinct primary |
|
140 | +key from the source table. |
|
141 | + |
|
142 | +Alright, let's try a naive solution. We'll say that `something` is |
|
143 | +`(customers.id + 100) + customers.visits`. In other words, |
|
144 | + |
|
145 | + select |
|
146 | + customers.id as customer_id, |
|
147 | + customers.first_name, |
|
148 | + customers.last_name, |
|
149 | + customers.visits, |
|
150 | + (customers.id + 100) as id_offset, |
|
151 | + (customers.id + 100) + customers.visits as something, |
|
152 | + orders.id as order_id, |
|
153 | + orders.amount |
|
154 | + from customers left join orders on orders.customer_id = customers.id; |
|
155 | + |
|
156 | +(<http://localhost:9999/sql/jvctwzbqnkywjr>) |
|
157 | + |
|
158 | +Now we can get a correct sum of the visits column like so: |
|
159 | + |
|
160 | + select sum(distinct something) - sum(distinct id_offset) as "customers total visits" |
|
161 | + from ( |
|
162 | + select |
|
163 | + customers.id as customer_id, |
|
164 | + customers.first_name, |
|
165 | + customers.last_name, |
|
166 | + customers.visits, |
|
167 | + (customers.id + 100) as id_offset, |
|
168 | + (customers.id + 100) + customers.visits as something, |
|
169 | + orders.id as order_id, |
|
170 | + orders.amount |
|
171 | + from customers left join orders on orders.customer_id = customers.id |
|
172 | + ) as orig; |
|
173 | + -- returns 8 (correct!) |
|
174 | +(<http://localhost:9999/sql/bbctxqmdvwnv2z>) |
|
175 | + |
|
176 | +Right? `(103 + 104 + 107) - (101 + 102 + 103) = 8`. |
|
177 | + |
|
178 | +That's the concept in a nutshell. However, the selection of the function |
|
179 | +is critical: each customer/visit combination must produce a unique value for `something`. |

180 | +Here's an example where the column data doesn't work with the `+ 100` function. |

181 | +If we had two customers like this (id: 1, visits: 2; id: 2, visits: 1), |

182 | +with three total visits, we would get |
|
183 | + |
|
184 | +| customer id | visits | id_offset | something | |
|
185 | +| :---------: | :----: | :-------: | :-------: | |
|
186 | +| 1 | 2 | 101 | 103 | |
|
187 | +| 2 | 1 | 102 | 103 | |
|
188 | + |
|
189 | +Here, `sum(distinct something)` is 103, but `sum(distinct id_offset)` is |

190 | +203, and we would produce a total for visits of -100. Oops. So we are screwed. We get a bit of improvement with `(customers.id * 100)`: |
|
191 | + |
|
192 | +| customer id | visits | id_offset | something | |
|
193 | +| :---------: | :----: | :-------: | :-------: | |
|
194 | +| 1 | 2 | 100 | 102 | |
|
195 | +| 2 | 1 | 200 | 201 | |
|
196 | + |
|
197 | +Now `sum(distinct something)` = 303, and `sum(distinct id_offset)` = 300, so |

198 | +303 - 300 = 3, and we're good again. |
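
To check the arithmetic outside the database, here is a minimal Ruby sketch of the same idea. It is only an illustration: `sum_distinct_trick`, `PLUS_100`, `TIMES_100` and the row hashes are names I made up for this note, and `uniq` stands in for SQL's `distinct`.

```ruby
# Joined rows from the customers/orders example; customer 1 appears twice.
ROWS = [
  { customer_id: 1, visits: 2, order_id: 1 },
  { customer_id: 1, visits: 2, order_id: 2 },
  { customer_id: 2, visits: 2, order_id: 3 },
  { customer_id: 3, visits: 4, order_id: 4 }
].freeze

# Two customers whose `something` values collide under the `+ 100` function.
COLLIDING = [
  { customer_id: 1, visits: 2 },
  { customer_id: 2, visits: 1 }
].freeze

PLUS_100  = ->(id) { id + 100 }
TIMES_100 = ->(id) { id * 100 }

# Emulates sum(distinct something) - sum(distinct id_offset).
# `offset` maps a customer id to its id_offset.
def sum_distinct_trick(rows, offset)
  somethings = rows.map { |row| offset.call(row[:customer_id]) + row[:visits] }.uniq
  offsets    = rows.map { |row| offset.call(row[:customer_id]) }.uniq
  somethings.sum - offsets.sum
end

puts ROWS.sum { |row| row[:visits] }            # 10, inflated by the fanout
puts sum_distinct_trick(ROWS, PLUS_100)         # 8
puts sum_distinct_trick(COLLIDING, PLUS_100)    # -100, the collision case
puts sum_distinct_trick(COLLIDING, TIMES_100)   # 3
```

The `* 100` function only postpones the problem: it collides again as soon as a visits value reaches 100, which is why a hash with a much larger range is attractive.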
|
199 | + |
|
200 | +Therefore, generalizing the function would help. Here's how Looker does it: |
|
201 | + |
|
202 | + COALESCE(COALESCE( ( |
|
203 | + SUM(DISTINCT (CAST(FLOOR(COALESCE(customers.visits ,0)*(1000000*1.0)) AS DECIMAL(65,0))) |
|
204 | + + ('x' || MD5(customers.id::varchar))::bit(64)::bigint::DECIMAL(65,0) * 18446744073709551616 |
|
205 | + + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) ) |
|
206 | + - SUM(DISTINCT ('x' || MD5(customers.id::varchar))::bit(64)::bigint::DECIMAL(65,0) * 18446744073709551616 |
|
207 | + + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) ) |
|
208 | + / (1000000*1.0), 0), 0) |
|
209 | + AS "customers.total_visits" |
|
210 | + |
|
211 | +You should be able to see the skeleton of the subtraction of what I have |
|
212 | +called the `sum(distinct id_offset)` from the `sum(distinct something)`. |
|
213 | + |
|
214 | +The meat of this is in the use of the MD5 function to get a 128 bit hash value. |
|
215 | +What they are doing is getting a decimal number out of the hash. Here are a |
|
216 | +few queries that will build up to the actual Looker query and start to make this clear. |
|
217 | + |
|
218 | + select md5(1::varchar); |
|
219 | + -- the basic hash: c4ca4238a0b923820dcc509a6f75849b |
|
220 | + select 'x' || md5(1::varchar); |
|
221 | + -- makes something that can be cast to a bit(64): xc4ca4238a0b923820dcc509a6f75849b |
|
222 | + select ('x' || md5(1::varchar))::bit(64); |
|
223 | + -- cast to binary: 1100010011001010010000100011100010100000101110010010001110000010 |
|
224 | + select ('x' || md5(1::varchar))::bit(64)::bigint; |
|
225 | + -- cast to 8-byte bigint: -4266524885998034046 |
|
226 | + select ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0); |
|
227 | + -- cast to exact precision decimal: -4266524885998034046 |
|
228 | + |
|
229 | +Some references to the PostgreSQL documentation: [casting](https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS); [bit string constants expressed as hexadecimal and the leading "x"](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-BIT-STRINGS); [bit string type](https://www.postgresql.org/docs/current/static/datatype-bit.html); [numeric/decimal type and arbitrary precision arithmetic](https://www.postgresql.org/docs/9.6/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL). |
|
230 | + |
|
231 | +Remember that the first 64 bits of the 128 bit md5 were cast to an 8-byte bigint, which happens to |

232 | +be signed. The range of this int is -9223372036854775808 to +9223372036854775807; |

233 | +there are 18446744073709551616 (2^64) values in this range. The multiplication is |

234 | +essentially shifting that first half of the md5 value left by 64 bits, to make room for the second half. |
|
235 | + |
|
236 | + select ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0) * 18446744073709551616; |
|
237 | + -- Expand the space: -78703492656118554855266830188162318336 |
|
238 | + |
|
239 | +Now conduct the same computation on the rest of the hash, the 16 hex characters starting at position 17 (I guess |

240 | +this is to use all 128 bits and so reduce the chance of collisions): |
|
241 | + |
|
242 | + select ('x' || substr(md5(1::varchar), 17))::bit(64)::bigint::decimal(65,0); |
|
243 | + -- produces: 994258241967195291 |
|
244 | + |
|
245 | +And add these together: |
|
246 | + |
|
247 | + select |
|
248 | + ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0) * 18446744073709551616 |
|
249 | + + ('x' || substr(md5(1::varchar), 17))::bit(64)::bigint::decimal(65,0); |
|
250 | + -- produces: -78703492656118554854272571946195123045 |
|
251 | + |
|
252 | +The same idea is done for the subtrahend. |
|
253 | + |
|
254 | +The last piece of the puzzle is the multiplication times 1000000 and division |
|
255 | +by 1000000. This is done to preserve fractional value in integer arithmetic. |
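
Putting the pieces together, here is a rough Ruby sketch of the whole calculation, assuming the SQL behaves as described above. The names `signed_64`, `key_hash`, `symmetric_sum` and `JOINED_ROWS` are mine, not Looker's, and this is a reading of the generated SQL rather than Looker's actual implementation.

```ruby
require 'digest/md5'

TWO_64 = 2**64      # 18446744073709551616
SCALE  = 1_000_000  # keeps six decimal places in integer arithmetic

# Reads 16 hex characters as a signed 64-bit integer,
# mirroring ('x' || hex)::bit(64)::bigint in PostgreSQL.
def signed_64(hex16)
  value = hex16.to_i(16)
  value >= 2**63 ? value - TWO_64 : value
end

# The per-key number: first half of the MD5 times 2**64, plus the second half.
def key_hash(id)
  hex = Digest::MD5.hexdigest(id.to_s)
  signed_64(hex[0, 16]) * TWO_64 + signed_64(hex[16, 16])
end

# sum(distinct scaled_value + key_hash) - sum(distinct key_hash), then unscaled.
# Rows with a nil key are skipped, the way SQL's SUM skips NULLs.
def symmetric_sum(rows, key:, value:)
  keyed = rows.reject { |row| row[key].nil? }
  with_value = keyed.map { |row| ((row[value] || 0) * SCALE).floor + key_hash(row[key]) }.uniq
  keys_only  = keyed.map { |row| key_hash(row[key]) }.uniq
  (with_value.sum - keys_only.sum) / SCALE.to_f
end

# The joined customers/orders rows from the sample database.
JOINED_ROWS = [
  { customer_id: 1, visits: 2, order_id: 1, amount: 25.0 },
  { customer_id: 1, visits: 2, order_id: 2, amount: 50.0 },
  { customer_id: 2, visits: 2, order_id: 3, amount: 75.0 },
  { customer_id: 3, visits: 4, order_id: 4, amount: 100.0 }
].freeze

puts key_hash(1)  # -78703492656118554854272571946195123045
puts symmetric_sum(JOINED_ROWS, key: :customer_id, value: :visits)  # 8.0
puts symmetric_sum(JOINED_ROWS, key: :order_id, value: :amount)     # 250.0
```

Ruby integers are arbitrary precision, so nothing here needs the `DECIMAL(65,0)` cast; in SQL that cast is what keeps the 128-bit arithmetic exact.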
|
256 | + |
|
257 | +#### Why the additional addition of part of the hash? |
|
258 | + |
|
259 | +As I say, probably they are trying to avoid md5 collisions. Seems to work pretty well without it. |
|
260 | +Here I'm commenting it out. Same result. |
|
261 | + |
|
262 | + SELECT |
|
263 | + ( |
|
264 | + SUM(DISTINCT |
|
265 | + ( |
|
266 | + CAST(FLOOR(customers.visits * (1000000*1.0)) AS DECIMAL(65,0))) |
|
267 | + + ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 |
|
268 | + -- + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) |
|
269 | + ) |
|
270 | + - |
|
271 | + SUM(DISTINCT |
|
272 | + ( |
|
273 | + 'x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 |
|
274 | + -- + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) |
|
275 | + ) |
|
276 | + |
|
277 | + ) |
|
278 | + |
|
279 | + / (1000000*1.0) |
|
280 | + |
|
281 | + AS "customers.total_visits" |
|
282 | + FROM public.customers AS customers |
|
283 | + LEFT JOIN public.orders AS orders ON customers.id = orders.customer_id |
public/test-doubles.md
... | ... | @@ -0,0 +1,314 @@ |
1 | +Page references to xUnit Test Patterns (XTP), especially Chapter 11 ("Test Doubles"), "Four-Phase Test" (pp. 358-361) and Practical Object-Oriented Design in Ruby (POODR), especially Chapter 9 ("Designing Cost-Effective Tests"). |
|
2 | + |
|
3 | +Other resources: |
|
4 | + |
|
5 | +* <https://robots.thoughtbot.com/four-phase-test> |
|
6 | + |
|
7 | +### The thing we are testing (the SUT) |
|
8 | + |
|
9 | +The thing we are testing is called a "System under test" (SUT) (XTP) or "object under test" (POODR, p. 195, Figure 9.1). We'll use SUT. It is fair to think about other things being under test, such as a class, object, method, or application, but the idea here is that the thing under test is defined by the scope of the test itself. Other objects may be involved, such as a "depended-on component" (DOC). |
|
10 | + |
|
11 | + |
|
12 | +(POODR, Figure 9.1) |
|
13 | + |
|
14 | +### Testing messages received from others |
|
15 | + |
|
16 | +When we receive a message from another object, we want to verify a state change in our own object. |
|
17 | + |
|
18 | +Example: Suppose we have an object that converts Fahrenheit to Celsius. When we pass in a measurement |

19 | +in Fahrenheit, we want to verify that the computation of the Celsius value is correct. |
|
20 | + |
|
21 | +```ruby |
|
22 | +require 'minitest/autorun' |
|
23 | + |
|
24 | +class TemperatureConverter |
|
25 | + def f_to_c(f) |
|
26 | + (f - 32.0) * 5.0 / 9.0 |
|
27 | + end |
|
28 | +end |
|
29 | + |
|
30 | +class TemperatureConverterTest < Minitest::Test |
|
31 | + def setup |
|
32 | + @tc = TemperatureConverter.new |
|
33 | + end |
|
34 | + |
|
35 | + def test_f_to_c |
|
36 | + assert_in_delta 0.0, @tc.f_to_c(32), 0.01 |
|
37 | + end |
|
38 | +end |
|
39 | +``` |
|
40 | + |
|
41 | +This is very easy. The SUT does not depend on other classes; the state change can be verified directly. |
|
42 | + |
|
43 | +### Testing messages received from others -- When to use a stub |
|
44 | + |
|
45 | +Now let's say that we have a converter object that is capable of a variety of conversions. The |
|
46 | +way this is going to work is that we are going to pass in the type of conversion we want, along |
|
47 | +with a value to be converted, and we want to get the right result. |
|
48 | + |
|
49 | +It will look something like this: |
|
50 | + |
|
51 | + Converter.new(FToCConverter).convert(32) |
|
52 | + |
|
53 | +Now our Converter class depends on a collaborator, FToCConverter. XTP calls the data the SUT gets back from a collaborator an |

54 | +"indirect input" (p. 125), and the name it gives to this kind of collaborator is a DOC -- a "depended-on component." |

55 | +We could test Converter with a specific collaborator (FToCConverter) or we can try to test the SUT in isolation. |
|
56 | +If we can test the SUT in isolation, then there will be fewer dependencies on other objects, |
|
57 | +which will make our test less brittle. |
|
58 | + |
|
59 | +When we test the converter, we are not attempting to establish whether the conversion is correct; |
|
60 | +instead, we want to verify that it can delegate the 32 to the specific converter and return a |
|
61 | +value. For the verification of the specific converter, we will write separate tests for that. |
|
62 | +Additionally, we may be writing the main converter first. We may not even know the range of |
|
63 | +specific converters we are going to want, or have one in hand. |
|
64 | + |
|
65 | +#### Stubbing |
|
66 | + |
|
67 | +A stub is an implementation that returns a canned answer (POODR, p. 210). |
|
68 | + |
|
69 | +**NOTE:** We stub on the DOC, not on the SUT. For some guidance on this, see <https://robots.thoughtbot.com/don-t-stub-the-system-under-test>. |
|
70 | + |
|
71 | +If we create a stub manually, we will want the `convert` method of an FToCConverter instance to return |
|
72 | +a value that we can verify. It might look like this: |
|
73 | + |
|
74 | +```ruby |
|
75 | +require 'minitest/autorun' |
|
76 | + |
|
77 | +class FToCConverter |
|
78 | + def convert(value) |
|
79 | + 500 |
|
80 | + end |
|
81 | +end |
|
82 | + |
|
83 | +class Converter |
|
84 | + def initialize(specific_converter_class) |
|
85 | + @specific_converter = specific_converter_class.new |
|
86 | + end |
|
87 | + def convert(value) |
|
88 | + @specific_converter.convert(value) |
|
89 | + end |
|
90 | +end |
|
91 | + |
|
92 | +class ConverterTest < Minitest::Test |
|
93 | + def setup |
|
94 | + @c = Converter.new(FToCConverter) |
|
95 | + end |
|
96 | + |
|
97 | + def test_convert |
|
98 | + assert_in_delta 500, @c.convert(32), 0.01 |
|
99 | + end |
|
100 | +end |
|
101 | +``` |
|
102 | + |
|
103 | +**NOTE:** In the real world, this is not quite how it's done. Why? Because in this test, _we don't care about |
|
104 | +FToCConverter_. All we care about is making available to the test an object that exposes a `convert` |
|
105 | +method that returns a specific value, so that we can validate that the main `convert` method on |
|
106 | +`Converter` leverages it. In short, we want to write the least amount of code to see that the plumbing |
|
107 | +is working. In MiniTest, it might look like this: |
|
108 | + |
|
109 | +```ruby |
|
110 | +require 'minitest/autorun' |
|
111 | + |
|
112 | +class FToCConverter |
|
113 | + def convert(value) |
|
114 | + 500 |
|
115 | + end |
|
116 | +end |
|
117 | + |
|
118 | +class Converter |
|
119 | + def initialize(specific_converter) |
|
120 | + @specific_converter = specific_converter |
|
121 | + end |
|
122 | + def convert(value) |
|
123 | + @specific_converter.convert(value) |
|
124 | + end |
|
125 | +end |
|
126 | + |
|
127 | +class ConverterTest < Minitest::Test |
|
128 | + def specific_converter |
|
129 | + @specific_converter ||= FToCConverter.new |
|
130 | + end |
|
131 | + |
|
132 | + def setup |
|
133 | + @c = Converter.new(specific_converter) |
|
134 | + end |
|
135 | + |
|
136 | + def test_convert |
|
137 | + specific_converter.stub :convert, 400 do |
|
138 | + assert_in_delta 400, @c.convert(32), 0.01 |
|
139 | + end |
|
140 | + end |
|
141 | +end |
|
142 | +``` |
|
143 | + |
|
144 | +Notice that here our implementation of `FToCConverter#convert` returns 500. But our stub of |

145 | +this method returns 400, and we verify against that. What this means is that we can check the plumbing |

146 | +of the delegation to the specific converter even if the specific converter is wrong; our stubbed test doesn't depend on |

147 | +what the actual class does at all. Our test is completely independent. Additionally, we have moved the |

148 | +key verification value closer to the assertion, which is easier to read. |
|
149 | + |
|
150 | +From here on out, we will use a different syntax in MiniTest. Here's the same test but using |
|
151 | +spec-style syntax using keywords such as "describe" and "expect." |
|
152 | + |
|
153 | +```ruby |
|
154 | +require 'minitest/autorun' |
|
155 | + |
|
156 | +class FToCConverter |
|
157 | + def convert(value) |
|
158 | + 500 |
|
159 | + end |
|
160 | +end |
|
161 | + |
|
162 | +class Converter |
|
163 | + def initialize(specific_converter) |
|
164 | + @specific_converter = specific_converter |
|
165 | + end |
|
166 | + def convert(value) |
|
167 | + @specific_converter.convert(value) |
|
168 | + end |
|
169 | +end |
|
170 | + |
|
171 | +describe Converter do |
|
172 | + let(:specific_converter) { FToCConverter.new } |
|
173 | + subject { Converter.new(specific_converter) } |
|
174 | + |
|
175 | + it "can delegate to a specific converter" do |
|
176 | + specific_converter.stub :convert, 400 do |
|
177 | + expect(subject.convert(32)).must_be_within_epsilon 400, 0.01 |
|
178 | + end |
|
179 | + end |
|
180 | +end |
|
181 | +``` |
|
182 | + |
|
183 | +`let` and `subject` provide for dynamically creating methods that return what's defined in the blocks. |
|
184 | +With `let`, the name of the method comes from the symbol you pass in. For `subject`, the method |
|
185 | +is called `subject`. They are both lazy. So if you never use `specific_converter` the block `{ FToCConverter.new }` will |
|
186 | +never run. Also, because they are lazy, you can reverse the order. Many people like to have the |
|
187 | +subject of the test at the top of the spec. I'll do that below. |
|
188 | + |
|
189 | +Stubbing in MiniTest requires that the stubbed method actually exist in the instance being stubbed. Other |
|
190 | +testing frameworks are more lenient in this respect, which can result in more concise tests -- but at |
|
191 | +the expense of having test doubles that are _too_ fake and don't match up to your real collaborators. For |
|
192 | +example, you might stub a method `convert` but over time your collaborators change that method name |
|
193 | +to `transform`: Your tests would continue to pass because you've stubbed a method that doesn't exist |
|
194 | +on the real objects. |
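
Here is a small plain-Ruby sketch of that drift, with no test framework involved. Everything in it is hypothetical: `RenamedFToCConverter` is the collaborator after the imagined rename, `StaleConverterStub` is a hand-rolled double that still answers the old message, and `missing_on_real` is one possible guard, not a MiniTest feature.

```ruby
# The real collaborator after a hypothetical rename from `convert` to `transform`.
class RenamedFToCConverter
  def transform(value)
    (value - 32.0) * 5.0 / 9.0
  end
end

# A hand-rolled stub that still answers the old message.
class StaleConverterStub
  def convert(_value)
    400
  end
end

class Converter
  def initialize(specific_converter)
    @specific_converter = specific_converter
  end

  def convert(value)
    @specific_converter.convert(value)
  end
end

# True when the SUT can delegate to the given collaborator without blowing up.
def delegates_cleanly?(specific_converter)
  Converter.new(specific_converter).convert(32)
  true
rescue NoMethodError
  false
end

# Public methods the double defines that the real class does not.
def missing_on_real(double_class, real_class)
  double_class.public_instance_methods(false) - real_class.public_instance_methods(false)
end

puts delegates_cleanly?(StaleConverterStub.new)    # true: the test double still "works"
puts delegates_cleanly?(RenamedFToCConverter.new)  # false: the real object does not
p missing_on_real(StaleConverterStub, RenamedFToCConverter)  # [:convert]
```

A check like `missing_on_real` in the test suite is one cheap way to notice when a double has fallen out of step with the class it stands in for.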
|
195 | + |
|
196 | +### The Four-Phase Test |
|
197 | + |
|
198 | +It is conventional to think of tests as having four phases: |
|
199 | + |
|
200 | +1. Setup |
|
201 | +2. Exercise |
|
202 | +3. Verify |
|
203 | +4. Teardown |
|
204 | + |
|
205 | +Typically teardown (releasing resources) is done for you by the framework. Here's our spec of `Converter` |
|
206 | +with comments showing the phases: |
|
207 | + |
|
208 | +```ruby |
|
209 | +describe Converter do |
|
210 | + let(:specific_converter) { FToCConverter.new } # setup |
|
211 | + subject { Converter.new(specific_converter) } # setup |
|
212 | + |
|
213 | + it "can delegate to a specific converter" do |
|
214 | + specific_converter.stub :convert, 400 do |
|
215 | + converted_value = subject.convert(32) # exercise |
|
216 | + expect(converted_value).must_be_within_epsilon 400, 0.01 # verify |
|
217 | + end |
|
218 | + end |
|
219 | + # teardown |
|
220 | +end |
|
221 | +``` |
|
222 | + |
|
223 | +### Testing messages sent to others -- when to mock |
|
224 | + |
|
225 | +When we send a message to another object that results in a side effect, we want to verify the side effect. |
|
226 | + |
|
227 | +In other words, we want to prove that a promised behavior change in a collaborator has been triggered. |
|
228 | + |
|
229 | +Example: |
|
230 | + |
|
231 | +Our `Converter` provides a means for there to be logging of the conversion. |
|
232 | + |
|
233 | +Let's make a few changes to our design of `Converter`. First off, let's allow that it's easier to |
|
234 | +provide configuration parameters via a hash. We'll also provide sensible defaults -- a specific converter |
|
235 | +that does nothing and `nil` for the logger. We won't trigger a logger if no logger is set. |
|
236 | + |
|
237 | +```ruby |
|
238 | +class Converter |
|
239 | + class PassThroughConverter |
|
240 | + def convert(value) |
|
241 | + value |
|
242 | + end |
|
243 | + end |
|
244 | + |
|
245 | + attr_reader :converter, :logger |
|
246 | + |
|
247 | + def initialize(args = {}) |
|
248 | + @converter = args[:converter] || PassThroughConverter.new |
|
249 | + @logger = args[:logger] |
|
250 | + end |
|
251 | + def convert(value) |
|
252 | + converted_value = converter.convert(value) |
|
253 | + log(value, converted_value) |
|
254 | + converted_value |
|
255 | + end |
|
256 | + |
|
257 | + private |
|
258 | + |
|
259 | + def log(value, converted_value) |
|
260 | + logger.log(value, converted_value) if logger |
|
261 | + end |
|
262 | +end |
|
263 | +``` |
|
264 | + |
|
265 | +Now, what do we want to verify? We do *not* want to verify that the `log` method on `Converter` gets |
|
266 | +called with `value` and `converted_value` -- what we want to know is whether the collaborator is |
|
267 | +sent the right message. In this case, we want to know if a logger instance would be sent |
|
268 | +the message `log` with the right values. At this point, we don't even have a logger class. We just |
|
269 | +know that it is going to expose a method `log` and expect that the method call will pass the |
|
270 | +value and its conversion. |
|
271 | + |
|
272 | +To make this happen, we want to create a `Mock`. Notice that we are still using the stubbed converter. |
|
273 | +But now we write `logger.expect` to set up our expectations for what will happen on the collaborating |
|
274 | +object; and then we verify it afterward. |
|
275 | + |
|
276 | +It is critical to understand that we are not mocking an object or a class; we are just verifying |
|
277 | +that the delegate is sent the `log` message with the right parameters. What this means is that we are |

278 | +verifying a "role" -- just one aspect of the outgoing messages from the SUT. |
|
279 | + |
|
280 | +I've annotated this with the four phases. |
|
281 | + |
|
282 | +```ruby |
|
283 | +describe Converter, "delegation to logger" do |
|
284 | + subject { Converter.new(logger: logger) } # setup |
|
285 | + let(:logger) { Minitest::Mock.new } # setup |
|
286 | + let(:converter) { subject.converter } # setup |
|
287 | + |
|
288 | + it "logs the value and the converted value" do |
|
289 | + converter.stub :convert, 400 do |
|
290 | + logger.expect(:log, nil, [32, 400]) # verify |
|
291 | + subject.convert(32) # exercise |
|
292 | + end |
|
293 | + logger.verify # verify |
|
294 | + end |
|
295 | +end |
|
296 | +``` |
|
297 | + |
|
298 | + |
|
299 | +### Other topics to add |
|
300 | + |
|
301 | +* Metz's method for verifying that an object conforms to an interface |
|
302 | +* RSpec's "behaves like" pattern |
|
303 | +* Fixtures |
|
304 | + |
|
305 | +### Terms |
|
306 | + |
|
307 | +* SUT - System under test |
|
308 | +* DOC - Depended-on component |
|
309 | +* "indirect input" - This is data that gets into the SUT from a DOC. I.e., we're not calling a method with parameters; inputs are getting into the SUT via some DOC. (126) |
|
310 | +* "indirect output" - We want to verify at an "observation point" that calls to the DOC are happening correctly (127). |
|
311 | +* "Stubbing" for Indirect Input - When the SUT makes calls to the DOC, it may take data from the DOC. This data would be an "indirect input." We want to simulate these indirect inputs. Why? Because the DOC may be unpredictable or unavailable. A thing that stands in for the DOC so as to provide indirect inputs to the SUT is called a stub. The stub receives the calls and returns pre-configured responses. We want to "install a Test Stub in place of the DOC" (129). Want to provide indirect inputs? We say that we install the Test Stub to act as a "control point" (135). We call it a "control point" because we are trying to force the SUT down some path (524). |
|
312 | +* "Test Spies" or "Mocking" for Indirect Output - By "indirect output," we mean the calls the SUT makes to DOCs. Example: the SUT makes calls to a logger. We want to ensure that the DOC is getting called properly. |
|
313 | + * Procedural Behavior Verification. We want to capture the calls to the DOC during SUT execution and see what happens. This means installing a Test Spy. It receives the calls and records them; then afterwards we make assertions on what is recorded in the Spy. Want to check indirect outputs? We say that that happens at an "observation point" (e.g., 137). |
|
314 | + * Expected Behavior. We install a Mock Object, and say in advance what we expect. If the Mock doesn't get what we expect, it fails the test. |
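
The mock example earlier covers the "Expected Behavior" style. For contrast, here is a minimal hand-rolled sketch of the Test Spy style in plain Ruby. `SpyLogger` and `FixedConverter` are names invented for this sketch, and `Converter` is a cut-down copy of the class above so that the example runs on its own.

```ruby
# Hand-rolled Test Spy: records every call to `log` for later inspection.
class SpyLogger
  attr_reader :calls

  def initialize
    @calls = []
  end

  def log(value, converted_value)
    @calls << [value, converted_value]
    nil
  end
end

# Canned-answer stub standing in for a specific converter.
class FixedConverter
  def initialize(answer)
    @answer = answer
  end

  def convert(_value)
    @answer
  end
end

# Cut-down copy of the Converter above.
class Converter
  def initialize(args = {})
    @converter = args[:converter]
    @logger = args[:logger]
  end

  def convert(value)
    converted_value = @converter.convert(value)
    @logger.log(value, converted_value) if @logger
    converted_value
  end
end

spy = SpyLogger.new                                                   # setup
sut = Converter.new(converter: FixedConverter.new(400), logger: spy)  # setup
sut.convert(32)                                                       # exercise
raise 'unexpected calls to logger' unless spy.calls == [[32, 400]]    # verify
```

The difference from the mock is one of timing: the spy accepts any call and the test inspects the recording afterwards, whereas the mock is told what to expect before the SUT is exercised.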
test-doubles.md
... | ... | @@ -1,314 +0,0 @@ |
1 | -Page references to xUnit Test Patterns (XTP), especially Chapter 11 ("Test Doubles"), "Four-Phase Test" (pp. 358-361) and Practical Object-Oriented Design in Ruby (POODR), especially Chapter 9 ("Designing Cost-Effective Tests"). |
|
2 | - |
|
3 | -Other resources: |
|
4 | - |
|
5 | -* <https://robots.thoughtbot.com/four-phase-test> |
|
6 | - |
|
7 | -### The thing we are testing (the SUT) |
|
8 | - |
|
9 | -The thing we are testing is called a "System under test" (SUT) (XTP) or "object under test" (POODR, p. 195, Figure 9.1). We'll use SUT. It is fair to think about other things being under test, such as a class, object, method, or application, but the idea here is that the thing under test is defined by the scope of the test itself. Other objects may be involved, such as a "depended-on component" (DOC). |
|
10 | - |
|
11 | - |
|
12 | -(POODR, Figure 9.1) |
|
13 | - |
|
14 | -### Testing messages received from others |
|
15 | - |
|
16 | -When we receive a message from another object, we want to verify a state change in our own object. |
|
17 | - |
|
18 | -Example: Suppose we have an object that converts Fahrenheit to Celsius. When we pass in a measurement

19 | -in Fahrenheit, we want to verify that the computation of the Celsius value is correct.
|
20 | - |
|
21 | -```ruby |
|
22 | -require 'minitest/autorun' |
|
23 | - |
|
24 | -class TemperatureConverter |
|
25 | - def f_to_c(f) |
|
26 | - (f - 32.0) * 5.0 / 9.0 |
|
27 | - end |
|
28 | -end |
|
29 | - |
|
30 | -class TemperatureConverterTest < Minitest::Test
|
31 | - def setup |
|
32 | - @tc = TemperatureConverter.new |
|
33 | - end |
|
34 | - |
|
35 | - def test_f_to_c |
|
36 | - assert_in_delta 0.0, @tc.f_to_c(32), 0.01 |
|
37 | - end |
|
38 | -end |
|
39 | -``` |
|
40 | - |
|
41 | -This is very easy. The SUT does not depend on other classes; the state change can be verified directly. |
|
42 | - |
|
43 | -### Testing messages received from others -- When to use a stub |
|
44 | - |
|
45 | -Now let's say that we have a converter object that is capable of a variety of conversions. The |
|
46 | -way this is going to work is that we are going to pass in the type of conversion we want, along |
|
47 | -with a value to be converted, and we want to get the right result. |
|
48 | - |
|
49 | -It will look something like this: |
|
50 | - |
|
51 | - Converter.new(FToCConverter).convert(32) |
|
52 | - |
|
53 | -Now our Converter class depends on a collaborator, FToCConverter. XTP calls the value such a

54 | -collaborator returns to the SUT an "indirect input" (p. 125), and it calls the collaborator itself a DOC -- a "depended-on component."
|
55 | -We could test Converter with a specific Collaborator (FToCConverter) or we can try to test the SUT in isolation. |
|
56 | -If we can test the SUT in isolation, then there will be fewer dependencies on other objects, |
|
57 | -which will make our test less brittle. |
|
58 | - |
|
59 | -When we test the converter, we are not attempting to establish whether the conversion is correct; |
|
60 | -instead, we want to verify that it can delegate the 32 to the specific converter and return a |
|
61 | -value. For the verification of the specific converter, we will write separate tests for that. |
|
62 | -Additionally, we may be writing the main converter first. We may not even know the range of |
|
63 | -specific converters we are going to want, or have one in hand. |
|
64 | - |
|
65 | -#### Stubbing |
|
66 | - |
|
67 | -A stub is an implementation that returns a canned answer (POODR, p. 210). |
|
68 | - |
|
69 | -**NOTE:** We stub on the DOC, not on the SUT. For some guidance on this, see <https://robots.thoughtbot.com/don-t-stub-the-system-under-test>. |
|
70 | - |
|
71 | -If we create a stub manually, we will want an instance of FToCConverter's convert method to return |
|
72 | -a value that we can verify. It might look like this: |
|
73 | - |
|
74 | -```ruby |
|
75 | -require 'minitest/autorun' |
|
76 | - |
|
77 | -class FToCConverter |
|
78 | - def convert(value) |
|
79 | - 500 |
|
80 | - end |
|
81 | -end |
|
82 | - |
|
83 | -class Converter |
|
84 | - def initialize(specific_converter_class) |
|
85 | - @specific_converter = specific_converter_class.new |
|
86 | - end |
|
87 | - def convert(value) |
|
88 | - @specific_converter.convert(value) |
|
89 | - end |
|
90 | -end |
|
91 | - |
|
92 | -class ConverterTest < Minitest::Test
|
93 | - def setup |
|
94 | - @c = Converter.new(FToCConverter) |
|
95 | - end |
|
96 | - |
|
97 | - def test_convert |
|
98 | - assert_in_delta 500, @c.convert(32), 0.01 |
|
99 | - end |
|
100 | -end |
|
101 | -``` |
|
102 | - |
|
103 | -**NOTE:** In the real world, this is not quite how it's done. Why? Because in this test, _we don't care about |
|
104 | -FToCConverter_. All we care about is making available to the test an object that exposes a `convert` |
|
105 | -method that returns a specific value, so that we can validate that the main `convert` method on
|
106 | -`Converter` leverages it. In short, we want to write the least amount of code to see that the plumbing |
|
107 | -is working. In MiniTest, it might look like this: |
|
108 | - |
|
109 | -```ruby |
|
110 | -require 'minitest/autorun' |
|
111 | - |
|
112 | -class FToCConverter |
|
113 | - def convert(value) |
|
114 | - 500 |
|
115 | - end |
|
116 | -end |
|
117 | - |
|
118 | -class Converter |
|
119 | - def initialize(specific_converter) |
|
120 | - @specific_converter = specific_converter |
|
121 | - end |
|
122 | - def convert(value) |
|
123 | - @specific_converter.convert(value) |
|
124 | - end |
|
125 | -end |
|
126 | - |
|
127 | -class ConverterTest < Minitest::Test
|
128 | - def specific_converter |
|
129 | - @specific_converter ||= FToCConverter.new |
|
130 | - end |
|
131 | - |
|
132 | - def setup |
|
133 | - @c = Converter.new(specific_converter) |
|
134 | - end |
|
135 | - |
|
136 | - def test_convert |
|
137 | - specific_converter.stub :convert, 400 do |
|
138 | - assert_in_delta 400, @c.convert(32), 0.01 |
|
139 | - end |
|
140 | - end |
|
141 | -end |
|
142 | -``` |
|
143 | - |
|
144 | -Notice that here our implementation of `FToCConverter#convert` returns 500. But our stub of |
|
145 | -this method returns 400, and we verify against that. What this means is that we can check the plumbing

146 | -of the delegation to the specific converter even if that converter's implementation is wrong; our stubbed test doesn't depend on

147 | -what the actual class does at all, so our test is completely independent. Additionally, we have moved the
|
148 | -key verification value closer to the assertion, which is easier to read. |
|
149 | - |
|
150 | -From here on out, we will use a different syntax in MiniTest. Here's the same test but using
|
151 | -spec-style syntax using keywords such as "describe" and "expect." |
|
152 | - |
|
153 | -```ruby |
|
154 | -require 'minitest/autorun' |
|
155 | - |
|
156 | -class FToCConverter |
|
157 | - def convert(value) |
|
158 | - 500 |
|
159 | - end |
|
160 | -end |
|
161 | - |
|
162 | -class Converter |
|
163 | - def initialize(specific_converter) |
|
164 | - @specific_converter = specific_converter |
|
165 | - end |
|
166 | - def convert(value) |
|
167 | - @specific_converter.convert(value) |
|
168 | - end |
|
169 | -end |
|
170 | - |
|
171 | -describe Converter do |
|
172 | - let(:specific_converter) { FToCConverter.new } |
|
173 | - subject { Converter.new(specific_converter) } |
|
174 | - |
|
175 | - it "can delegate to a specific converter" do |
|
176 | - specific_converter.stub :convert, 400 do |
|
177 | - expect(subject.convert(32)).must_be_within_epsilon 400, 0.01
|
178 | - end |
|
179 | - end |
|
180 | -end |
|
181 | -``` |
|
182 | - |
|
183 | -`let` and `subject` provide for dynamically creating methods that return what's defined in the blocks. |
|
184 | -With `let`, the name of the method comes from the symbol you pass in. For `subject`, the method |
|
185 | -is called `subject`. They are both lazy. So if you never use `specific_converter` the block `{ FToCConverter.new }` will |
|
186 | -never run. Also, because they are lazy, you can reverse the order. Many people like to have the
|
187 | -subject of the test at the top of the spec. I'll do that below. |
|
188 | - |
|
189 | -Stubbing in MiniTest requires that the stubbed method actually exist in the instance being stubbed. Other
|
190 | -testing frameworks are more lenient in this respect, which can result in more concise tests -- but at |
|
191 | -the expense of having test doubles that are _too_ fake and don't match up to your real collaborators. For |
|
192 | -example, you might stub a method `convert` but over time your collaborators change that method name |
|
193 | -to `transform`: Your tests would continue to pass because you've stubbed a method that doesn't exist |
|
194 | -on the real objects. |
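
One way to guard against that kind of drift when you hand-roll your doubles is to compare the double's public methods against the real collaborator. The sketch below is only illustrative: `StubConverter` and the helper `methods_missing_from` are hypothetical names, not part of MiniTest or of either book, and the check covers method names only (not arity or return types).

```ruby
# Real collaborator (a stand-in for the FToCConverter used above).
class FToCConverter
  def convert(value)
    (value - 32.0) * 5.0 / 9.0
  end
end

# Hypothetical hand-rolled double that returns a canned answer.
class StubConverter
  def convert(_value)
    400
  end
end

# Hypothetical helper: returns the public methods the double defines itself
# that the real class no longer provides.
def methods_missing_from(real_class, double_class)
  double_class.public_instance_methods(false).reject do |name|
    real_class.method_defined?(name)
  end
end

drift = methods_missing_from(FToCConverter, StubConverter)
if drift.empty?
  puts "double matches FToCConverter"
else
  puts "double has drifted: #{drift.inspect}"
end
```

If `FToCConverter#convert` were later renamed to `transform`, the helper would report `:convert` as missing, which is the failure a too-lenient stubbing library would hide.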
|
195 | - |
|
196 | -### The Four-Phase Test |
|
197 | - |
|
198 | -It is conventional to think of tests as having four phases: |
|
199 | - |
|
200 | -1. Setup |
|
201 | -2. Exercise |
|
202 | -3. Verify |
|
203 | -4. Teardown |
|
204 | - |
|
205 | -Typically teardown (releasing resources) is done for you by the framework. Here's our spec of `Converter` |
|
206 | -with comments showing the phases: |
|
207 | - |
|
208 | -```ruby |
|
209 | -describe Converter do |
|
210 | - let(:specific_converter) { FToCConverter.new } # setup |
|
211 | - subject { Converter.new(specific_converter) } # setup |
|
212 | - |
|
213 | - it "can delegate to a specific converter" do |
|
214 | - specific_converter.stub :convert, 400 do |
|
215 | - converted_value = subject.convert(32) # exercise |
|
216 | - expect(converted_value).must_be_within_epsilon 400, 0.01 # verify
|
217 | - end |
|
218 | - end |
|
219 | - # teardown |
|
220 | -end |
|
221 | -``` |
|
222 | - |
|
223 | -### Testing messages sent to others -- when to mock |
|
224 | - |
|
225 | -When we send a message to another object that results in a side effect, we want to verify the side effect. |
|
226 | - |
|
227 | -In other words, we want to prove that a promised behavior change in a collaborator has been triggered. |
|
228 | - |
|
229 | -Example: |
|
230 | - |
|
231 | -Our `Converter` provides a way to log each conversion.
|
232 | - |
|
233 | -Let's make a few changes to our design of `Converter`. First off, let's allow that it's easier to |
|
234 | -provide configuration parameters via a hash. We'll also provide sensible defaults -- a specific converter |
|
235 | -that does nothing and `nil` for the logger. We won't trigger a logger if no logger is set. |
|
236 | - |
|
237 | -```ruby |
|
238 | -class Converter |
|
239 | - class PassThroughConverter |
|
240 | - def convert(value) |
|
241 | - value |
|
242 | - end |
|
243 | - end |
|
244 | - |
|
245 | - attr_reader :converter, :logger |
|
246 | - |
|
247 | - def initialize(args = {}) |
|
248 | - @converter = args[:converter] || PassThroughConverter.new |
|
249 | - @logger = args[:logger] |
|
250 | - end |
|
251 | - def convert(value) |
|
252 | - converted_value = converter.convert(value) |
|
253 | - log(value, converted_value) |
|
254 | - converted_value |
|
255 | - end |
|
256 | - |
|
257 | - private |
|
258 | - |
|
259 | - def log(value, converted_value) |
|
260 | - logger.log(value, converted_value) if logger |
|
261 | - end |
|
262 | -end |
|
263 | -``` |
|
264 | - |
|
265 | -Now, what do we want to verify? We do *not* want to verify that the `log` method on `Converter` gets |
|
266 | -called with `value` and `converted_value` -- what we want to know is whether the collaborator is |
|
267 | -sent the right message. In this case, we want to know if a logger instance would be sent |
|
268 | -the message `log` with the right values. At this point, we don't even have a logger class. We just |
|
269 | -know that it is going to expose a method `log` and expect that the method call will pass the |
|
270 | -value and its conversion. |
|
271 | - |
|
272 | -To make this happen, we want to create a `Mock`. Notice that we are still using the stubbed converter. |
|
273 | -But now we write `logger.expect` to set up our expectations for what will happen on the collaborating |
|
274 | -object; and then we verify it afterward. |
|
275 | - |
|
276 | -It is critical to understand that we are not mocking an object or a class; we are just verifying |
|
277 | -that the delegate is sent the `log` message with the right arguments. What this means is that we are

278 | -verifying a "role" -- just one aspect of the outgoing messages from the SUT.
|
279 | - |
|
280 | -I've annotated this with the four phases. |
|
281 | - |
|
282 | -```ruby |
|
283 | -describe Converter, "delegation to logger" do |
|
284 | - subject { Converter.new(logger: logger) } # setup |
|
285 | - let(:logger) { Minitest::Mock.new } # setup |
|
286 | - let(:converter) { subject.converter } # setup |
|
287 | - |
|
288 | - it "logs the value and the converted value" do |
|
289 | - converter.stub :convert, 400 do |
|
290 | - logger.expect(:log, nil, [32, 400]) # setup (expectation)
|
291 | - subject.convert(32) # exercise |
|
292 | - end |
|
293 | - logger.verify # verify |
|
294 | - end |
|
295 | -end |
|
296 | -``` |
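
For comparison, the same outgoing message can be checked with a hand-rolled Test Spy instead of a mock: the spy records the calls, and the assertion happens afterwards. This is a minimal sketch, not how MiniTest does it internally; `LoggerSpy` and `CannedConverter` are hypothetical names, and the `Converter` here is a trimmed copy of the one above so that the example runs on its own.

```ruby
# Trimmed copy of the Converter above, so the sketch is self-contained.
class Converter
  class PassThroughConverter
    def convert(value)
      value
    end
  end

  def initialize(args = {})
    @converter = args[:converter] || PassThroughConverter.new
    @logger = args[:logger]
  end

  def convert(value)
    converted_value = @converter.convert(value)
    @logger.log(value, converted_value) if @logger
    converted_value
  end
end

# Hypothetical hand-rolled Test Spy: records every call to `log`.
class LoggerSpy
  attr_reader :calls

  def initialize
    @calls = []
  end

  def log(value, converted_value)
    @calls << [value, converted_value]
    nil
  end
end

# Hypothetical stub converter that returns a canned answer.
class CannedConverter
  def convert(_value)
    400
  end
end

spy = LoggerSpy.new                                                     # setup
converter = Converter.new(converter: CannedConverter.new, logger: spy) # setup
result = converter.convert(32)                                          # exercise
unless spy.calls == [[32, 400]]                                         # verify
  raise "unexpected log calls: #{spy.calls.inspect}"
end
puts "logged: #{spy.calls.inspect}"
```

The difference from the mock version is where the expectation lives: the mock is told in advance what to expect, while the spy is inspected after the exercise phase.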
|
297 | - |
|
298 | - |
|
299 | -### Other topics to add |
|
300 | - |
|
301 | -* Metz's method for verifying that an object conforms to an interface |
|
302 | -* RSpec's "behaves like" pattern |
|
303 | -* Fixtures |
|
304 | - |
|
305 | -### Terms |
|
306 | - |
|
307 | -* SUT - System under test |
|
308 | -* DOC - Depended-on component |
|
309 | -* "indirect input" - This is data that gets into the SUT from a DOC. I.e., we're not calling a method with parameters; inputs are getting into the SUT via some DOC. (126) |
|
310 | -* "indirect output" - We want to verify at an "observation point" that calls to the DOC are happening correctly (127). |
|
311 | -* "Stubbing" for Indirect Input - When the SUT makes calls to the DOC, it may take data from the DOC. This data would be an "indirect input." We want to simulate these indirect inputs. Why? Because the DOC may be unpredictable or unavailable. A thing that stands in for the DOC so as to provide indirect inputs to the SUT is called a stub. The stub receives the calls and returns pre-configured responses. We want to "install a Test Stub in place of the DOC" (129). Want to provide indirect inputs? We say that we install the Test Stub to act as a "control point" (135). We call it a "control point" because we are trying to force the SUT down some path (524).
|
312 | -* "Test Spies" or "Mocking" for Indirect Output - By "indirect output," we mean the calls the SUT makes to DOCs. Example: the SUT makes calls to a logger. We want to ensure that the DOC is getting called properly. |
|
313 | - * Procedural Behavior Verification. We want to capture the calls to the DOC during SUT execution and see what happens. This means installing a Test Spy. It receives the calls and records them; then afterwards we make assertions on what is recorded in the Spy. Want to check indirect outputs? We say that this happens at an "observation point" (e.g., 137).
|
314 | - * Expected Behavior. We install a Mock Object, and say in advance what we expect. If the Mock doesn't get what we expect, it fails the test. |
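
To make the "control point" idea concrete, here is a small sketch in which a Test Stub feeds an indirect input to the SUT in order to force it down a particular branch. Everything in it is hypothetical and chosen only for illustration: `FreezeWarning` plays the SUT, a sensor object plays the DOC, and `SensorStub` is the stub installed in its place.

```ruby
# Hypothetical SUT: pulls a reading from a DOC and branches on it.
class FreezeWarning
  def initialize(sensor)
    @sensor = sensor # the DOC
  end

  def message
    # The reading is an indirect input: it arrives via the DOC,
    # not via a parameter of this method.
    celsius = (@sensor.read_fahrenheit - 32.0) * 5.0 / 9.0
    celsius <= 0 ? "freezing" : "above freezing"
  end
end

# Hypothetical Test Stub: the control point. It returns a pre-configured
# reading so the test can choose which branch the SUT takes.
class SensorStub
  def initialize(canned_reading)
    @canned_reading = canned_reading
  end

  def read_fahrenheit
    @canned_reading
  end
end

puts FreezeWarning.new(SensorStub.new(14)).message # 14 F is -10 C
puts FreezeWarning.new(SensorStub.new(68)).message # 68 F is 20 C
```

A real sensor would be unpredictable or unavailable in a test run; the stub makes both branches reachable on demand.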