public/symmetric.md
... ...
@@ -0,0 +1,283 @@
1
+#### Note added 15-Sep-2017
2
+
3
+I talked to Lloyd Tabb at the Looker Join conference and asked him how he came up with this strategy: A colleague asked him whether he knew that you could do a `sum(distinct . . .)`. Lloyd did not know, but when told, this idea came to him very quickly.
4
+
5
+It has been submitted as a [patent](https://www.google.sr/patents/CA2965831A1?cl=en&dq=inassignee:%22Looker+Data+Sciences,+Inc.%22&hl=en&sa=X&ved=0ahUKEwi0uOCgkqfWAhUK64MKHYIRDtgQ6AEIJDAA).
6
+
7
+-----
8
+
9
+### Looker and Symmetric Aggregates (the 1:many "fanout" problem with aggregates)
10
+
11
+#### I was looking at some Looker-generated SQL and saw this:
12
+
13
+ SELECT
14
+ COALESCE(SUM(customers.visits ), 0) AS "customers.total_visits"
15
+ FROM public.customers AS customers
16
+
17
+(See <http://localhost:9999/explore/fanout2/customers?qid=ek2NHkVfoJ34yRkLHgagFj>)
18
+
19
+Simple. Then I added an aggregate from a joined view (a "view" in Looker is a
20
+thin façade on a table) and I saw this:
21
+
22
+ SELECT
23
+ COALESCE(COALESCE( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(customers.visits ,0)*(1000000*1.0)) AS DECIMAL(65,0))) + ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) ) - SUM(DISTINCT ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) ) / (1000000*1.0), 0), 0) AS "customers.total_visits",
24
+ COALESCE(COALESCE( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(orders.amount ,0)*(1000000*1.0)) AS DECIMAL(65,0))) + ('x' || MD5(orders.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(orders.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) ) - SUM(DISTINCT ('x' || MD5(orders.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616 + ('x' || SUBSTR(MD5(orders.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) ) / (1000000*1.0), 0), 0) AS "orders.total_amount"
25
+ FROM public.customers AS customers
26
+ LEFT JOIN public.orders AS orders ON customers.id = orders.customer_id
27
+
28
+What are they doing? [Looker [rightly, I think] brags about this and calls it "symmetric aggregates"](https://www.youtube.com/watch?v=oRXQ4oNsyNk) ([and here is the business case](https://discourse.looker.com/t/symmetric-aggregates/261)),
29
+but they don't explain how it works. I conducted some searches and it would
30
+seem that Looker's strategy here is unique and a genuine innovation.
31
+
32
+#### Sample database
33
+
34
+ drop table if exists customers;
35
+ drop table if exists orders;
36
+
37
+ create table customers (id int, first_name varchar, last_name varchar, visits int);
38
+ create table orders (id int, amount numeric(6,2), customer_id int);
39
+
40
+ insert into customers (id, first_name, last_name, visits) values (1, 'Amelia', 'Earhart', 2);
41
+ insert into customers (id, first_name, last_name, visits) values (2, 'Charles', 'Lindberg', 2);
42
+ insert into customers (id, first_name, last_name, visits) values (3, 'Wilbur', 'Wright', 4);
43
+
44
+ insert into orders (id, amount, customer_id) values (1, 25.00, 1);
45
+ insert into orders (id, amount, customer_id) values (2, 50.00, 1);
46
+ insert into orders (id, amount, customer_id) values (3, 75.00, 2);
47
+ insert into orders (id, amount, customer_id) values (4, 100.00, 3);
48
+
49
+How many visits are there?
50
+
51
+ select sum(visits) from customers;
52
+ -- returns 8
53
+
54
+Yay! What is the average number of visits per customer?
55
+
56
+ select avg(visits) from customers;
57
+ -- returns 2.67 (8/3, rounded)
58
+
59
+And now let's look at the classic "surprise" for beginning users of SQL:
60
+Join in a table, and . . .
61
+
62
+ select avg(visits) from customers left join orders on orders.customer_id = customers.id;
63
+ -- returns 2.5
64
+ -- oops!
65
+
66
+Let's add some more columns, remove the aggregate, and look at the data:
67
+
68
+ select
69
+ customers.id as customer_id,
70
+ customers.first_name,
71
+ customers.last_name,
72
+ customers.visits,
73
+ orders.id as order_id,
74
+ orders.amount
75
+ from customers left join orders on orders.customer_id = customers.id
76
+
77
+(<http://localhost:9999/sql/mx9cgdrvfhwsdc>)
78
+
79
+That produces:
80
+
81
+| customer_id | first_name | last_name | visits | order_id | amount |
82
+|-------------|------------|-----------|:------:|:--------:|-------:|
83
+| 1 | Amelia | Earhart | 2 | 1 | 25.00 |
84
+| 1 | Amelia | Earhart | 2 | 2 | 50.00 |
85
+| 2 | Charles | Lindberg | 2 | 3 | 75.00 |
86
+| 3 | Wilbur | Wright | 4 | 4 | 100.00 |
87
+
88
+Clear enough. We have "extra" values in the visits column
89
+because of the 1:many ("fanout") relationship
90
+between customers and orders.
91
+
92
+If we want the counts of customers and the counts of orders, we can just use
93
+`count(distinct ...)`:
94
+
95
+ select
96
+ count(distinct customers.id) as "customers count",
97
+ count(distinct orders.id) as "orders count"
98
+ from customers left join orders on orders.customer_id = customers.id;
99
+
100
+Works great! However, `sum` is problematic.
101
+
102
+ select
103
+ sum(customers.visits) as "customers total visits",
104
+ sum(orders.amount) as "orders total amount"
105
+ from customers left join orders on orders.customer_id = customers.id;
106
+
107
+(<http://localhost:9999/sql/2brb8hdmwdbhtx>)
108
+
109
+Result:
110
+
111
+| customers total visits | orders total amount |
112
+| :--------------------: | :-----------------: |
113
+| 10 | 250.00 |
114
+
115
+Clearly this is going to seem "wrong" to people who aren't comfortable with
116
+joins. The classic ways to fix this: with a [subquery](http://www.sqlteam.com/article/aggregating-correlated-sub-queries) or [creative self-joins](http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables) or [window functions](https://stackoverflow.com/a/13169627). However, these
117
+solutions can
118
+introduce as many problems as they solve (subqueries and window functions
119
+aren't as easily composable as the Looker solution), especially for an analytic tool
120
+where we don't want people to have to worry about SQL.
121
+
122
+#### Quick demo in Looker
123
+
124
+(Show the live SQL query-writing.)
125
+
126
+#### How they do it
127
+
128
+The technique is to do a `sum(distinct something)` where the "something"
129
+for visits is _the same_ for each combination of `customers.id` and `visits`.
130
+That way when we do `sum(distinct something)` there will be only one value
131
+for the combination of `customers.id = 1` and `visits = 2` (see the table
132
+above that joins `customers` and `orders`). We can then take this sum and
133
+_subtract_ the "something" applied _only_ to the customer id, and the remainder
134
+will be the total of visits from the customers table. I am going to call the calculation on the
135
+customer id `id_offset`.
136
+
137
+Another way to think about this: The aggregate is being calculated not
138
+against all of the values in the column (which can have duplicates because
139
+of the join), but against the distinct primary
140
+key from the source table.
141
+
142
+Alright, let's try a naive solution. We'll say that `something` is
143
+`(customers.id + 100) + visits`. In other words,
144
+
145
+ select
146
+ customers.id as customer_id,
147
+ customers.first_name,
148
+ customers.last_name,
149
+ customers.visits,
150
+ (customers.id + 100) as id_offset,
151
+ (customers.id + 100) + customers.visits as something,
152
+ orders.id as order_id,
153
+ orders.amount
154
+ from customers left join orders on orders.customer_id = customers.id;
155
+
156
+(<http://localhost:9999/sql/jvctwzbqnkywjr>)
157
+
158
+Now we can get a correct sum of the visits column like so:
159
+
160
+ select sum(distinct something) - sum(distinct id_offset) as "customers total visits"
161
+ from (
162
+ select
163
+ customers.id as customer_id,
164
+ customers.first_name,
165
+ customers.last_name,
166
+ customers.visits,
167
+ (customers.id + 100) as id_offset,
168
+ (customers.id + 100) + customers.visits as something,
169
+ orders.id as order_id,
170
+ orders.amount
171
+ from customers left join orders on orders.customer_id = customers.id
172
+ ) as orig;
173
+ -- returns 8 (correct!)
174
+(<http://localhost:9999/sql/bbctxqmdvwnv2z>)
175
+
176
+Right? `(103 + 104 + 107) - (101 + 102 + 103) = 8`.
177
+
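To make the arithmetic concrete outside the database, here is a minimal Ruby sketch of the same naive trick. The `ROWS` array is a hand-copied version of the joined result above, and `sum_distinct`, `naive_total`, and `symmetric_total` are hypothetical helper names chosen for illustration; none of this comes from Looker.

```ruby
# Joined rows (customers LEFT JOIN orders), copied by hand from the table above.
ROWS = [
  { customer_id: 1, visits: 2, order_id: 1, amount: 25.00 },
  { customer_id: 1, visits: 2, order_id: 2, amount: 50.00 },
  { customer_id: 2, visits: 2, order_id: 3, amount: 75.00 },
  { customer_id: 3, visits: 4, order_id: 4, amount: 100.00 },
].freeze

# Hypothetical stand-in for SQL's sum(distinct expr).
def sum_distinct(values)
  values.uniq.sum
end

# Plain sum over the fanned-out rows: customer 1 is counted twice.
def naive_total(rows)
  rows.sum { |row| row[:visits] }
end

# sum(distinct something) - sum(distinct id_offset), with the naive "+ 100" offset.
def symmetric_total(rows)
  id_offsets = rows.map { |row| row[:customer_id] + 100 }
  somethings = rows.map { |row| row[:customer_id] + 100 + row[:visits] }
  sum_distinct(somethings) - sum_distinct(id_offsets)
end

puts "naive sum:     #{naive_total(ROWS)}"      # 10
puts "symmetric sum: #{symmetric_total(ROWS)}"  # 8
```

The duplicate row for customer 1 contributes the same `something` (103) twice, so `uniq` drops it exactly as `distinct` does in SQL.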
178
+That's the concept in a nutshell. However, the selection of the function
179
+is critical. Here's an example where the column data doesn't work with
180
+the `+ 100` function: Customer/visit combinations must have a value for `something` that is unique.
181
+So if we had two customers like this (id: 1, visits: 2; id: 2, visits: 1),
182
+with three total visits we would get
183
+
184
+| customer id | visits | id_offset | something |
185
+| :---------: | :----: | :-------: | :-------: |
186
+| 1 | 2 | 101 | 103 |
187
+| 2 | 1 | 102 | 103 |
188
+
189
+Here, `sum(distinct something)` is 103, but `sum(distinct id_offset)` is
190
+203, and we would produce a total for visits of -100 instead of 3. Oops. So we are screwed. We get a bit of improvement with `(customers.id * 100)`:
191
+
192
+| customer id | visits | id_offset | something |
193
+| :---------: | :----: | :-------: | :-------: |
194
+| 1 | 2 | 100 | 102 |
195
+| 2 | 1 | 200 | 201 |
196
+
197
+Now `sum(distinct something)` = 303, and `sum(distinct id_offset)` = 300, so
198
+303 - 300 = 3, and we're good again.
199
+
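The collision is easy to reproduce. In this sketch (plain Ruby, with a hypothetical `symmetric_sum` helper that takes the offset function as a block), the `+ 100` offset collapses the two customers into a single `something` value, while `* 100` keeps them apart as long as every visits value stays between 0 and 99.

```ruby
# Two customers whose id + visits happen to be equal.
CUSTOMERS = [
  { id: 1, visits: 2 },
  { id: 2, visits: 1 },
].freeze

# Hypothetical helper: sum(distinct offset + visits) - sum(distinct offset),
# where the block computes the offset from the customer id.
def symmetric_sum(rows)
  offsets    = rows.map { |row| yield(row[:id]) }
  somethings = rows.map { |row| yield(row[:id]) + row[:visits] }
  somethings.uniq.sum - offsets.uniq.sum
end

puts symmetric_sum(CUSTOMERS) { |id| id + 100 }  # 103 - 203 = -100 (wrong)
puts symmetric_sum(CUSTOMERS) { |id| id * 100 }  # 303 - 300 = 3 (right)
```

Any offset function works only while distinct keys can never produce the same `something`, which is the motivation for replacing the small multiplier with a very large hash.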
200
+Therefore, generalizing the function would help. Here's how Looker does it:
201
+
202
+ COALESCE(COALESCE( (
203
+ SUM(DISTINCT (CAST(FLOOR(COALESCE(customers.visits ,0)*(1000000*1.0)) AS DECIMAL(65,0)))
204
+ + ('x' || MD5(customers.id::varchar))::bit(64)::bigint::DECIMAL(65,0) * 18446744073709551616
205
+ + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0) )
206
+ - SUM(DISTINCT ('x' || MD5(customers.id::varchar))::bit(64)::bigint::DECIMAL(65,0) * 18446744073709551616
207
+ + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)) )
208
+ / (1000000*1.0), 0), 0)
209
+ AS "customers.total_visits"
210
+
211
+You should be able to see the skeleton of the subtraction of what I have
212
+called the `sum(distinct id_offset)` from the `sum(distinct something)`.
213
+
214
+The meat of this is in the use of the MD5 function to get a 128 bit hash value.
215
+What they are doing is getting a decimal number out of the hash. Here are a
216
+few queries that will build up to the actual Looker query and start to make this clear.
217
+
218
+ select md5(1::varchar);
219
+ -- the basic hash: c4ca4238a0b923820dcc509a6f75849b
220
+ select 'x' || md5(1::varchar);
221
+ -- makes something that can be cast to a bit(64): xc4ca4238a0b923820dcc509a6f75849b
222
+ select ('x' || md5(1::varchar))::bit(64);
223
+ -- cast to binary: 1100010011001010010000100011100010100000101110010010001110000010
224
+ select ('x' || md5(1::varchar))::bit(64)::bigint;
225
+ -- cast to 8-byte bigint: -4266524885998034046
226
+ select ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0);
227
+ -- cast to exact precision decimal: -4266524885998034046
228
+
229
+Some references to the PostgreSQL documentation: [casting](https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS); [bit string constants expressed as hexadecimal and the leading "x"](https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-BIT-STRINGS); [bit string type](https://www.postgresql.org/docs/current/static/datatype-bit.html); [numeric/decimal type and arbitrary precision arithmetic](https://www.postgresql.org/docs/9.6/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL).
230
+
231
+Remember that the first 64 bits of the 128-bit md5 were cast to an 8-byte bigint, which happens to
232
+be signed. The range of this int is -9223372036854775808 to +9223372036854775807;
233
+there are 18446744073709551616 (2^64) values in this range. The multiplication
234
+essentially shifts the first half of the md5 value 64 bits to the left.
235
+
236
+ select ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0) * 18446744073709551616;
237
+ -- Expand the space: -78703492656118554855266830188162318336
238
+
239
+Now conduct the same computation on the second half of the hash, the 16 characters from position 17 onward (I guess
240
+this is to avoid md5 collisions):
241
+
242
+ select ('x' || substr(md5(1::varchar), 17))::bit(64)::bigint::decimal(65,0);
243
+ -- produces: 994258241967195291
244
+
245
+And add these together:
246
+
247
+ select
248
+ ('x' || md5(1::varchar))::bit(64)::bigint::decimal(65,0) * 18446744073709551616
249
+ + ('x' || substr(md5(1::varchar), 17))::bit(64)::bigint::decimal(65,0);
250
+ -- produces: -78703492656118554854272571946195123045
251
+
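The same build-up can be reproduced outside PostgreSQL as a cross-check. This is a small Ruby sketch using the standard `Digest::MD5` class; the helper names `signed64` and `id_offset` are my own, chosen to match the `id_offset` term used earlier.

```ruby
require 'digest/md5'

# Interpret 16 hex characters as a signed 64-bit integer,
# mirroring the ::bit(64)::bigint casts in the SQL.
def signed64(hex16)
  value = hex16.to_i(16)
  value >= 2**63 ? value - 2**64 : value
end

# Combine both halves of the MD5 digest of the id into one large integer.
def id_offset(id)
  hex  = Digest::MD5.hexdigest(id.to_s)
  high = signed64(hex[0, 16])   # like ('x' || md5(...))::bit(64)::bigint
  low  = signed64(hex[16, 16])  # like ('x' || substr(md5(...), 17))::bit(64)::bigint
  high * 2**64 + low
end

puts signed64(Digest::MD5.hexdigest('1')[0, 16])  # -4266524885998034046
puts id_offset(1)                                 # -78703492656118554854272571946195123045
```

Ruby integers are arbitrary precision, so no equivalent of the `DECIMAL(65,0)` cast is needed here.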
252
+The same idea is done for the subtrahend.
253
+
254
+The last piece of the puzzle is the multiplication by 1000000 and division
255
+by 1000000. This is done to preserve fractional values (up to six decimal places) in integer arithmetic.
256
+
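Putting the pieces together, the whole expression can be sketched in Ruby against the sample data. This is an illustration of the idea under my own assumptions, not Looker's implementation: `SCALE`, `key_hash`, and `symmetric_sum` are hypothetical names, and the rows are the joined result from earlier.

```ruby
require 'digest/md5'

SCALE = 1_000_000  # keeps six decimal places while the sums run on integers

# Hypothetical helper: hash a primary key to a large signed integer,
# combining both 64-bit halves of its MD5 digest as the SQL does.
def key_hash(key)
  hex = Digest::MD5.hexdigest(key.to_s)
  high, low = [hex[0, 16], hex[16, 16]].map do |half|
    value = half.to_i(16)
    value >= 2**63 ? value - 2**64 : value
  end
  high * 2**64 + low
end

# sum(distinct scaled_value + key_hash) - sum(distinct key_hash), then unscale.
def symmetric_sum(rows, key:, value:)
  hashes = rows.map { |row| key_hash(row[key]) }
  scaled = rows.map { |row| ((row[value] || 0) * SCALE).floor + key_hash(row[key]) }
  Rational(scaled.uniq.sum - hashes.uniq.sum, SCALE)
end

JOINED = [
  { customer_id: 1, visits: 2, order_id: 1, amount: 25.0 },
  { customer_id: 1, visits: 2, order_id: 2, amount: 50.0 },
  { customer_id: 2, visits: 2, order_id: 3, amount: 75.0 },
  { customer_id: 3, visits: 4, order_id: 4, amount: 100.0 },
].freeze

puts symmetric_sum(JOINED, key: :customer_id, value: :visits).to_f  # 8.0
puts symmetric_sum(JOINED, key: :order_id, value: :amount).to_f     # 250.0
```

Each measure is keyed on the primary key of its own table, which is why the same join yields the right total for both `visits` and `amount`.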
257
+#### Why the additional addition of part of the hash?
258
+
259
+As I say, probably they are trying to avoid md5 collisions. Seems to work pretty well without it.
260
+Here I'm commenting it out. Same result.
261
+
262
+ SELECT
263
+ (
264
+ SUM(DISTINCT
265
+ (
266
+ CAST(FLOOR(customers.visits * (1000000*1.0)) AS DECIMAL(65,0)))
267
+ + ('x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616
268
+ -- + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)
269
+ )
270
+ -
271
+ SUM(DISTINCT
272
+ (
273
+ 'x' || MD5(customers.id ::varchar))::bit(64)::bigint::DECIMAL(65,0) *18446744073709551616
274
+ -- + ('x' || SUBSTR(MD5(customers.id ::varchar),17))::bit(64)::bigint::DECIMAL(65,0)
275
+ )
276
+
277
+ )
278
+
279
+ / (1000000*1.0)
280
+
281
+ AS "customers.total_visits"
282
+ FROM public.customers AS customers
283
+ LEFT JOIN public.orders AS orders ON customers.id = orders.customer_id
public/test-doubles.md
... ...
@@ -0,0 +1,314 @@
1
+Page references are to xUnit Test Patterns (XTP), especially Chapter 11 ("Test Doubles") and "Four-Phase Test" (pp. 358-361), and to Practical Object-Oriented Design in Ruby (POODR), especially Chapter 9 ("Designing Cost-Effective Tests").
2
+
3
+Other resources:
4
+
5
+* <https://robots.thoughtbot.com/four-phase-test>
6
+
7
+### The thing we are testing (the SUT)
8
+
9
+The thing we are testing is called a "System under test" (SUT) (XTP) or "object under test" (POODR, p. 195, Figure 9.1). We'll use SUT. It is fair to think about other things being under test, such as a class, object, method, or application, but the idea here is that the thing under test is defined by the scope of the test itself. Other objects may be involved, such as a "depended-on component" (DOC).
10
+
11
+![](images/metz-poodr-testing.png)
12
+(POODR, Figure 9.1)
13
+
14
+### Testing messages received from others
15
+
16
+When we receive a message from another object, we want to verify a state change in our own object.
17
+
18
+Example: Suppose we have an object that converts fahrenheit to celsius. When we pass in a measurement
19
+in fahrenheit, we want to verify that the computation of the celsius value is correct.
20
+
21
+```ruby
22
+require 'minitest/autorun'
23
+
24
+class TemperatureConverter
25
+ def f_to_c(f)
26
+ (f - 32.0) * 5.0 / 9.0
27
+ end
28
+end
29
+
30
+class TemperatureConverterTest < Minitest::Test
31
+ def setup
32
+ @tc = TemperatureConverter.new
33
+ end
34
+
35
+ def test_f_to_c
36
+ assert_in_delta 0.0, @tc.f_to_c(32), 0.01
37
+ end
38
+end
39
+```
40
+
41
+This is very easy. The SUT does not depend on other classes; the state change can be verified directly.
42
+
43
+### Testing messages received from others -- When to use a stub
44
+
45
+Now let's say that we have a converter object that is capable of a variety of conversions. The
46
+way this is going to work is that we are going to pass in the type of conversion we want, along
47
+with a value to be converted, and we want to get the right result.
48
+
49
+It will look something like this:
50
+
51
+ Converter.new(FToCConverter).convert(32)
52
+
53
+Now our Converter class depends on a collaborator, FToCConverter. XTP calls this an
54
+"indirect input" (p. 125), and the name it gives to this kind of collaborator is a DOC -- a "depended-on component."
55
+We could test Converter with a specific Collaborator (FToCConverter) or we can try to test the SUT in isolation.
56
+If we can test the SUT in isolation, then there will be fewer dependencies on other objects,
57
+which will make our test less brittle.
58
+
59
+When we test the converter, we are not attempting to establish whether the conversion is correct;
60
+instead, we want to verify that it can delegate the 32 to the specific converter and return a
61
+value. For the verification of the specific converter, we will write separate tests for that.
62
+Additionally, we may be writing the main converter first. We may not even know the range of
63
+specific converters we are going to want, or have one in hand.
64
+
65
+#### Stubbing
66
+
67
+A stub is an implementation that returns a canned answer (POODR, p. 210).
68
+
69
+**NOTE:** We stub on the DOC, not on the SUT. For some guidance on this, see <https://robots.thoughtbot.com/don-t-stub-the-system-under-test>.
70
+
71
+If we create a stub manually, we will want an instance of FToCConverter's convert method to return
72
+a value that we can verify. It might look like this:
73
+
74
+```ruby
75
+require 'minitest/autorun'
76
+
77
+class FToCConverter
78
+ def convert(value)
79
+ 500
80
+ end
81
+end
82
+
83
+class Converter
84
+ def initialize(specific_converter_class)
85
+ @specific_converter = specific_converter_class.new
86
+ end
87
+ def convert(value)
88
+ @specific_converter.convert(value)
89
+ end
90
+end
91
+
92
+class ConverterTest < Minitest::Test
93
+ def setup
94
+ @c = Converter.new(FToCConverter)
95
+ end
96
+
97
+ def test_convert
98
+ assert_in_delta 500, @c.convert(32), 0.01
99
+ end
100
+end
101
+```
102
+
103
+**NOTE:** In the real world, this is not quite how it's done. Why? Because in this test, _we don't care about
104
+FToCConverter_. All we care about is making available to the test an object that exposes a `convert`
105
+method that returns a specific value, so that we can validate that the main `convert` method on
106
+`Converter` leverages it. In short, we want to write the least amount of code to see that the plumbing
107
+is working. In MiniTest, it might look like this:
108
+
109
+```ruby
110
+require 'minitest/autorun'
111
+
112
+class FToCConverter
113
+ def convert(value)
114
+ 500
115
+ end
116
+end
117
+
118
+class Converter
119
+ def initialize(specific_converter)
120
+ @specific_converter = specific_converter
121
+ end
122
+ def convert(value)
123
+ @specific_converter.convert(value)
124
+ end
125
+end
126
+
127
+class ConverterTest < Minitest::Test
128
+ def specific_converter
129
+ @specific_converter ||= FToCConverter.new
130
+ end
131
+
132
+ def setup
133
+ @c = Converter.new(specific_converter)
134
+ end
135
+
136
+ def test_convert
137
+ specific_converter.stub :convert, 400 do
138
+ assert_in_delta 400, @c.convert(32), 0.01
139
+ end
140
+ end
141
+end
142
+```
143
+
144
+Notice that here our implementation of `FToCConverter#convert` returns 500. But our stub of
145
+this method returns 400, and we verify against that. What this means is that we can check the plumbing
146
+of the delegation to the specific converter even if it's wrong; our stubbed test doesn't depend on
147
+what the actual class does at all: our test is completely independent. Additionally, we have moved the
148
+key verification value closer to the assertion, which is easier to read.
149
+
150
+From here on out, we will use a different syntax in MiniTest. Here's the same test but using
151
+spec-style syntax using keywords such as "describe" and "expect."
152
+
153
+```ruby
154
+require 'minitest/autorun'
155
+
156
+class FToCConverter
157
+ def convert(value)
158
+ 500
159
+ end
160
+end
161
+
162
+class Converter
163
+ def initialize(specific_converter)
164
+ @specific_converter = specific_converter
165
+ end
166
+ def convert(value)
167
+ @specific_converter.convert(value)
168
+ end
169
+end
170
+
171
+describe Converter do
172
+ let(:specific_converter) { FToCConverter.new }
173
+ subject { Converter.new(specific_converter) }
174
+
175
+ it "can delegate to a specific converter" do
176
+ specific_converter.stub :convert, 400 do
177
+ expect(subject.convert(32)).must_be_within_epsilon 400, 0.01
178
+ end
179
+ end
180
+end
181
+```
182
+
183
+`let` and `subject` provide for dynamically creating methods that return what's defined in the blocks.
184
+With `let`, the name of the method comes from the symbol you pass in. For `subject`, the method
185
+is called `subject`. They are both lazy. So if you never use `specific_converter` the block `{ FToCConverter.new }` will
186
+never run. Also, because they are lazy, you can reverse the order. Many people like to have the
187
+subject of the test at the top of the spec. I'll do that below.
188
+
189
+Stubbing in MiniTest requires that the stubbed method actually exist in the instance being stubbed. Other
190
+testing frameworks are more lenient in this respect, which can result in more concise tests -- but at
191
+the expense of having test doubles that are _too_ fake and don't match up to your real collaborators. For
192
+example, you might stub a method `convert` but over time your collaborators change that method name
193
+to `transform`: Your tests would continue to pass because you've stubbed a method that doesn't exist
194
+on the real objects.
195
+
196
+### The Four-Phase Test
197
+
198
+It is conventional to think of tests as having four phases:
199
+
200
+1. Setup
201
+2. Exercise
202
+3. Verify
203
+4. Teardown
204
+
205
+Typically teardown (releasing resources) is done for you by the framework. Here's our spec of `Converter`
206
+with comments showing the phases:
207
+
208
+```ruby
209
+describe Converter do
210
+ let(:specific_converter) { FToCConverter.new } # setup
211
+ subject { Converter.new(specific_converter) } # setup
212
+
213
+ it "can delegate to a specific converter" do
214
+ specific_converter.stub :convert, 400 do
215
+ converted_value = subject.convert(32) # exercise
216
+ expect(converted_value).must_be_within_epsilon 400, 0.01 # verify
217
+ end
218
+ end
219
+ # teardown
220
+end
221
+```
222
+
223
+### Testing messages sent to others -- when to mock
224
+
225
+When we send a message to another object that results in a side effect, we want to verify the side effect.
226
+
227
+In other words, we want to prove that a promised behavior change in a collaborator has been triggered.
228
+
229
+Example:
230
+
231
+Our `Converter` provides a means for there to be logging of the conversion.
232
+
233
+Let's make a few changes to our design of `Converter`. First off, let's allow that it's easier to
234
+provide configuration parameters via a hash. We'll also provide sensible defaults -- a specific converter
235
+that does nothing and `nil` for the logger. We won't trigger a logger if no logger is set.
236
+
237
+```ruby
238
+class Converter
239
+ class PassThroughConverter
240
+ def convert(value)
241
+ value
242
+ end
243
+ end
244
+
245
+ attr_reader :converter, :logger
246
+
247
+ def initialize(args = {})
248
+ @converter = args[:converter] || PassThroughConverter.new
249
+ @logger = args[:logger]
250
+ end
251
+ def convert(value)
252
+ converted_value = converter.convert(value)
253
+ log(value, converted_value)
254
+ converted_value
255
+ end
256
+
257
+ private
258
+
259
+ def log(value, converted_value)
260
+ logger.log(value, converted_value) if logger
261
+ end
262
+end
263
+```
264
+
265
+Now, what do we want to verify? We do *not* want to verify that the `log` method on `Converter` gets
266
+called with `value` and `converted_value` -- what we want to know is whether the collaborator is
267
+sent the right message. In this case, we want to know if a logger instance would be sent
268
+the message `log` with the right values. At this point, we don't even have a logger class. We just
269
+know that it is going to expose a method `log` and expect that the method call will pass the
270
+value and its conversion.
271
+
272
+To make this happen, we want to create a `Mock`. Notice that we are still using the stubbed converter.
273
+But now we write `logger.expect` to set up our expectations for what will happen on the collaborating
274
+object; and then we verify it afterward.
275
+
276
+It is critical to understand that we are not mocking an object or a class; we are just verifying
277
+that the delegate is sent the `log` method with the right parameters. What this means is that we are
278
+verifying a "role" -- just one aspect of the outgoing messages from the SUT.
279
+
280
+I've annotated this with the four phases.
281
+
282
+```ruby
283
+describe Converter, "delegation to logger" do
284
+ subject { Converter.new(logger: logger) } # setup
285
+ let(:logger) { Minitest::Mock.new } # setup
286
+ let(:converter) { subject.converter } # setup
287
+
288
+ it "logs the value and the converted value" do
289
+ converter.stub :convert, 400 do
290
+ logger.expect(:log, nil, [32, 400]) # setup (expectation)
291
+ subject.convert(32) # exercise
292
+ end
293
+ logger.verify # verify
294
+ end
295
+end
296
+```
297
+
298
+
299
+### Other topics to add
300
+
301
+* Metz's method for verifying that an object conforms to an interface
302
+* RSpec's "behaves like" pattern
303
+* Fixtures
304
+
305
+### Terms
306
+
307
+* SUT - System under test
308
+* DOC - Depended-on component
309
+* "indirect input" - This is data that gets into the SUT from a DOC. I.e., we're not calling a method with parameters; inputs are getting into the SUT via some DOC. (126)
310
+* "indirect output" - We want to verify at an "observation point" that calls to the DOC are happening correctly (127).
311
+* "Stubbing" for Indirect Input - When the SUT makes calls to the DOC, it may take data from the DOC. This data would be an "indirect input." We want to simulate these indirect inputs. Why? Because the DOC may be unpredictable or unavailable. A thing that stands in for the DOC so as to provide indirect inputs to the SUT is called a stub. The stub receives the calls and returns pre-configured responses. We want to "install a Test Stub in place of the DOC" (129). Want to provide indirect inputs? We say that we install the Test Stub to act as a "control point" (135). We call it a "control point" because we are trying to force the SUT down some path (524).
312
+* "Test Spies" or "Mocking" for Indirect Output - By "indirect output," we mean the calls the SUT makes to DOCs. Example: the SUT makes calls to a logger. We want to ensure that the DOC is getting called properly.
313
+ * Procedural Behavior Verification. We want to capture the calls to the DOC during SUT execution and see what happens. This means installing a Test Spy. It receives the calls and records them; then afterwards we make assertions on what is recorded in the Spy. Want to check indirect outputs? We say that that happens at an "observation point" (e.g., 137).
314
+ * Expected Behavior. We install a Mock Object, and say in advance what we expect. If the Mock doesn't get what we expect, it fails the test.
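The Test Spy variant in the list above can be sketched by hand, without any mocking library. `SpyLogger` and `CannedConverter` are hypothetical names invented for this illustration, and the `Converter` here is a trimmed copy of the one defined earlier (without the default pass-through converter).

```ruby
# Hand-rolled Test Spy: records every call so the test can inspect them afterwards.
class SpyLogger
  attr_reader :calls

  def initialize
    @calls = []
  end

  def log(value, converted_value)
    @calls << [value, converted_value]
    nil
  end
end

# Hand-rolled Test Stub: always returns the same canned answer.
class CannedConverter
  def convert(_value)
    400
  end
end

# Trimmed copy of the Converter from the section above.
class Converter
  def initialize(args = {})
    @converter = args[:converter]
    @logger = args[:logger]
  end

  def convert(value)
    converted_value = @converter.convert(value)
    @logger.log(value, converted_value) if @logger
    converted_value
  end
end

spy = SpyLogger.new                                                 # setup
sut = Converter.new(converter: CannedConverter.new, logger: spy)    # setup
result = sut.convert(32)                                            # exercise
p result     # 400
p spy.calls  # [[32, 400]]                                          # verify by inspecting the spy
```

The difference from the `Minitest::Mock` version is where the knowledge lives: the spy only records, and the test decides afterwards what counts as correct, whereas the mock is told in advance what to expect.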
test-doubles.md
... ...
@@ -1,314 +0,0 @@
1
-Page references to xUnit Test Patterns (XTP), especially Chapter 11 ("Test Doubles"), "Four-Phase Test" (pp. 358-361) and Practical Object-Oriented Design in Ruby (POODR), especially Chapter 9 ("Designing Cost-Effective Tests").
2
-
3
-Other resources:
4
-
5
-* <https://robots.thoughtbot.com/four-phase-test>
6
-
7
-### The thing we are testing (the SUT)
8
-
9
-The thing we are testing is called a "System under test" (SUT) (XTP) or "object under test" (POODR, p. 195, Figure 9.1). We'll use SUT. It is fair to think about other things being under test, such as a class, object, method, or application, but the idea here is that the thing under test is defined by the scope of the test itself. Other objects may be involved, such as a "depended-on component" (DOC).
10
-
11
-![](images/metz-poodr-testing.png)
12
-(POODR, Figure 9.1)
13
-
14
-### Testing messages received from others
15
-
16
-When we receive a message from another object, we want to verify a state change in our own object.
17
-
18
-Example: Suppose we have an object that converts fahrenheit to celsius. When we pass in a measurement
19
-in fahrenheit, we want to verify that the computation of the celsius value is correct.
20
-
21
-```ruby
22
-require 'minitest/autorun'
23
-
24
-class TemperatureConverter
25
- def f_to_c(f)
26
- (f - 32.0) * 5.0 / 9.0
27
- end
28
-end
29
-
30
-class TemperatureConverterTest < MiniTest::Test
31
- def setup
32
- @tc = TemperatureConverter.new
33
- end
34
-
35
- def test_f_to_c
36
- assert_in_delta 0.0, @tc.f_to_c(32), 0.01
37
- end
38
-end
39
-```
40
-
41
-This is very easy. The SUT does not depend on other classes; the state change can be verified directly.
42
-
43
-### Testing messages received from others -- When to use a stub
44
-
45
-Now let's say that we have a converter object that is capable of a variety of conversions. The
46
-way this is going to work is that we are going to pass in the type of conversion we want, along
47
-with a value to be converted, and we want to get the right result.
48
-
49
-It will look something like this:
50
-
51
- Converter.new(FToCConverter).convert(32)
52
-
53
-Now our Converter class depends on a collaborator, FToCConverter. XTP calls this an
54
-"indirect input" (p. 125), and the name it gives to this kind of collaborator is a DOC -- a "depended-on component."
55
-We could test Converter with a specific Collaborator (FToCConverter) or we can try to test the SUT in isolation.
56
-If we can test the SUT in isolation, then there will be fewer dependencies on other objects,
57
-which will make our test less brittle.
58
-
59
-When we test the converter, we are not attempting to establish whether the conversion is correct;
60
-instead, we want to verify that it can delegate the 32 to the specific converter and return a
61
-value. For the verification of the specific converter, we will write separate tests for that.
62
-Additionally, we may be writing the main converter first. We may not even know the range of
63
-specific converters we are going to want, or have one in hand.
64
-
65
-#### Stubbing
66
-
67
-A stub is an implementation that returns a canned answer (POODR, p. 210).
68
-
69
-**NOTE:** We stub on the DOC, not on the SUT. For some guidance on this, see <https://robots.thoughtbot.com/don-t-stub-the-system-under-test>.
70
-
71
-If we create a stub manually, we will want an instance of FToCConverter's convert method to return
72
-a value that we can verify. It might look like this:
73
-
74
-```ruby
75
-require 'minitest/autorun'
76
-
77
-class FToCConverter
78
- def convert(value)
79
- 500
80
- end
81
-end
82
-
83
-class Converter
84
- def initialize(specific_converter_class)
85
- @specific_converter = specific_converter_class.new
86
- end
87
- def convert(value)
88
- @specific_converter.convert(value)
89
- end
90
-end
91
-
92
-class ConverterTest < MiniTest::Test
93
- def setup
94
- @c = Converter.new(FToCConverter)
95
- end
96
-
97
- def test_convert
98
- assert_in_delta 500, @c.convert(32), 0.01
99
- end
100
-end
101
-```
102
-
103
-**NOTE:** In the real world, this is not quite how it's done. Why? Because in this test, _we don't care about
104
-FToCConverter_. All we care about is making available to the test an object that exposes a `convert`
105
-method that returns a specific value, so that we van validate that the main `convert` method on
106
-`Converter` leverages it. In short, we want to write the least amount of code to see that the plumbing
107
-is working. In MiniTest, it might look like this:
108
-
109
-```ruby
110
-require 'minitest/autorun'
111
-
112
-class FToCConverter
113
- def convert(value)
114
- 500
115
- end
116
-end
117
-
118
-class Converter
119
- def initialize(specific_converter)
120
- @specific_converter = specific_converter
121
- end
122
- def convert(value)
123
- @specific_converter.convert(value)
124
- end
125
-end
126
-
127
-class ConverterTest < Minitest::Test
128
- def specific_converter
129
- @specific_converter ||= FToCConverter.new
130
- end
131
-
132
- def setup
133
- @c = Converter.new(specific_converter)
134
- end
135
-
136
- def test_convert
137
- specific_converter.stub :convert, 400 do
138
- assert_in_delta 400, @c.convert(32), 0.01
139
- end
140
- end
141
-end
142
-```
143
-
144
-Notice that here our implementation of `FToCConverter#convert` returns 500. But our stub of
145
-this method returns 400, and we verify against that. This means that we can check the plumbing
146
-of the delegation to the specific converter even if it's wrong; our stubbed test doesn't depend on
147
-what the actual class does at all: our test is completely independent. Additionally, we have moved the
148
-key verification value closer to the assertion, which is easier to read.
149
-
150
-From here on out, we will use a different syntax in MiniTest. Here's the same test but using
151
-spec-style syntax using keywords such as "describe" and "expect."
152
-
153
-```ruby
154
-require 'minitest/autorun'
155
-
156
-class FToCConverter
157
- def convert(value)
158
- 500
159
- end
160
-end
161
-
162
-class Converter
163
- def initialize(specific_converter)
164
- @specific_converter = specific_converter
165
- end
166
- def convert(value)
167
- @specific_converter.convert(value)
168
- end
169
-end
170
-
171
-describe Converter do
172
- let(:specific_converter) { FToCConverter.new }
173
- subject { Converter.new(specific_converter) }
174
-
175
- it "can delegate to a specific converter" do
176
- specific_converter.stub :convert, 400 do
177
- expect(subject.convert(32)).must_be_within_epsilon 400, 0.01
178
- end
179
- end
180
-end
181
-```
182
-
183
-`let` and `subject` provide for dynamically creating methods that return what's defined in the blocks.
184
-With `let`, the name of the method comes from the symbol you pass in. For `subject`, the method
185
-is called `subject`. They are both lazy. So if you never use `specific_converter` the block `{ FToCConverter.new }` will
186
-never run. Also, because they are lazy, you can reverse the order. Many people like to have the
187
-subject of the test at the top of the spec. I'll do that below.
188
-
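The laziness and memoization of `let` can be observed directly. The following is a minimal sketch, not part of the original examples: `BUILD_LOG` and the `expensive` helper are hypothetical names introduced only to record when the block runs.

```ruby
require 'minitest/autorun'

# BUILD_LOG is a hypothetical top-level array, used only to observe
# when the `let` block actually runs.
BUILD_LOG = []

describe "lazy let" do
  let(:expensive) { BUILD_LOG << :built; :value }

  it "runs the block on first use and memoizes the result" do
    BUILD_LOG.clear
    expect(BUILD_LOG).must_be_empty        # nothing has run yet

    expensive                              # first call runs the block
    expensive                              # second call reuses the memoized value

    expect(BUILD_LOG.size).must_equal 1
    expect(expensive).must_equal :value
  end
end
```

The block runs once per test at most; each test gets a fresh instance, so the memoized value does not leak between tests.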
189
-Stubbing in MiniTest requires that the stubbed method actually exist in the instance being stubbed. Other
190
-testing frameworks are more lenient in this respect, which can result in more concise tests -- but at
191
-the expense of having test doubles that are _too_ fake and don't match up to your real collaborators. For
192
-example, you might stub a method `convert` but over time your collaborators change that method name
193
-to `transform`: Your tests would continue to pass because you've stubbed a method that doesn't exist
194
-on the real objects.
195
-
196
-### The Four-Phase Test
197
-
198
-It is conventional to think of tests as having four phases:
199
-
200
-1. Setup
201
-2. Exercise
202
-3. Verify
203
-4. Teardown
204
-
205
-Typically teardown (releasing resources) is done for you by the framework. Here's our spec of `Converter`
206
-with comments showing the phases:
207
-
208
-```ruby
209
-describe Converter do
210
- let(:specific_converter) { FToCConverter.new } # setup
211
- subject { Converter.new(specific_converter) } # setup
212
-
213
- it "can delegate to a specific converter" do
214
- specific_converter.stub :convert, 400 do
215
- converted_value = subject.convert(32) # exercise
216
- expect(converted_value).must_be_within_epsilon 400, 0.01 # verify
217
- end
218
- end
219
- # teardown
220
-end
221
-```
222
-
223
-### Testing messages sent to others -- when to mock
224
-
225
-When we send a message to another object that results in a side effect, we want to verify the side effect.
226
-
227
-In other words, we want to prove that a promised behavior change in a collaborator has been triggered.
228
-
229
-Example:
230
-
231
-Our `Converter` provides a means for there to be logging of the conversion.
232
-
233
-Let's make a few changes to our design of `Converter`. First off, let's allow that it's easier to
234
-provide configuration parameters via a hash. We'll also provide sensible defaults -- a specific converter
235
-that does nothing and `nil` for the logger. We won't trigger a logger if no logger is set.
236
-
237
-```ruby
238
-class Converter
239
- class PassThroughConverter
240
- def convert(value)
241
- value
242
- end
243
- end
244
-
245
- attr_reader :converter, :logger
246
-
247
- def initialize(args = {})
248
- @converter = args[:converter] || PassThroughConverter.new
249
- @logger = args[:logger]
250
- end
251
- def convert(value)
252
- converted_value = converter.convert(value)
253
- log(value, converted_value)
254
- converted_value
255
- end
256
-
257
- private
258
-
259
- def log(value, converted_value)
260
- logger.log(value, converted_value) if logger
261
- end
262
-end
263
-```
264
-
265
-Now, what do we want to verify? We do *not* want to verify that the `log` method on `Converter` gets
266
-called with `value` and `converted_value` -- what we want to know is whether the collaborator is
267
-sent the right message. In this case, we want to know if a logger instance would be sent
268
-the message `log` with the right values. At this point, we don't even have a logger class. We just
269
-know that it is going to expose a method `log` and expect that the method call will pass the
270
-value and its conversion.
271
-
272
-To make this happen, we want to create a `Mock`. Notice that we are still using the stubbed converter.
273
-But now we write `logger.expect` to set up our expectations for what will happen on the collaborating
274
-object; and then we verify it afterward.
275
-
276
-It is critical to understand that we are not mocking an object or a class; we are just verifying
277
-that the delegate is sent the `log` method with the right parameters. What this means is that we are
278
-verifying a "role" -- just one aspect of the outgoing messages from the SUT.
279
-
280
-I've annotated this with the four phases.
281
-
282
-```ruby
283
-describe Converter, "delegation to logger" do
284
- subject { Converter.new(logger: logger) } # setup
285
- let(:logger) { Minitest::Mock.new } # setup
286
- let(:converter) { subject.converter } # setup
287
-
288
- it "logs the value and the converted value" do
289
- converter.stub :convert, 400 do
290
- logger.expect(:log, nil, [32, 400]) # verify
291
- subject.convert(32) # exercise
292
- end
293
- logger.verify # verify
294
- end
295
-end
296
-```
297
-
298
-
299
-### Other topics to add
300
-
301
-* Metz's method for verifying that an object conforms to an interface
302
-* RSpec's "behaves like" pattern
303
-* Fixtures
304
-
305
-### Terms
306
-
307
-* SUT - System under test
308
-* DOC - Depended-on component
309
-* "indirect input" - This is data that gets into the SUT from a DOC. I.e., we're not calling a method with parameters; inputs are getting into the SUT via some DOC. (126)
310
-* "indirect output" - We want to verify at an "observation point" that calls to the DOC are happening correctly (127).
311
-* "Stubbing" for Indirect Input - When the SUT makes calls to the DOC, it may take data from the DOC. This data would be an "indirect input." We want to simulate these indirect inputs. Why? Because the DOC may be unpredictable or unavailable. A thing that stands in for the DOC so as to provide indirect inputs to the SUT is called a stub. The stub receives the calls and returns pre-configured responses. We want to "install a Test Stub in place of the DOC" (129). Want to provide indirect inputs? We say that we install the Test Stub to act as a "control point" (135). We call it a "control point" because we are trying to force the SUT down some path (524).
312
-* "Test Spies" or "Mocking" for Indirect Output - By "indirect output," we mean the calls the SUT makes to DOCs. Example: the SUT makes calls to a logger. We want to ensure that the DOC is getting called properly.
313
-    * Procedural Behavior Verification. We want to capture the calls to the DOC during SUT execution and see what happens. This means installing a Test Spy. It receives the calls and records them; then afterwards we make assertions on what is recorded in the Spy. Want to check indirect outputs? We say that this happens at an "observation point" (e.g., 137).
314
- * Expected Behavior. We install a Mock Object, and say in advance what we expect. If the Mock doesn't get what we expect, it fails the test.
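To contrast the two approaches to indirect output, here is a rough sketch of a hand-rolled Test Spy. `SpyLogger` is a hypothetical class written for this illustration, and `Converter` is a trimmed copy of the hash-configured version shown earlier.

```ruby
# Trimmed copy of the Converter from the logging example.
class Converter
  class PassThroughConverter
    def convert(value)
      value
    end
  end

  def initialize(args = {})
    @converter = args[:converter] || PassThroughConverter.new
    @logger = args[:logger]
  end

  def convert(value)
    converted_value = @converter.convert(value)
    @logger.log(value, converted_value) if @logger
    converted_value
  end
end

# SpyLogger is a hypothetical hand-rolled Test Spy: it records every
# call so the test can inspect them after the exercise phase.
class SpyLogger
  attr_reader :calls

  def initialize
    @calls = []
  end

  def log(value, converted_value)
    @calls << [value, converted_value]
  end
end

spy = SpyLogger.new
Converter.new(logger: spy).convert(32)   # exercise
p spy.calls                              # observation point: inspect afterward
```

With a spy, the expectation is stated after the exercise phase as an ordinary assertion on `spy.calls`; with a mock, it is stated up front and checked by `verify`.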