I just don't really like GUI applications. There are roughly four things open on my screen at a time: editor, terminal, browser and slack.
But I would like to try it out anyway. I just need to find a use case where it makes sense.
I looked at the tool you linked to - does it have a cli?
What happens if there are significant differences that you expected? (Think adding a field to an invoice.)
For us at least, plain diffing turned out to be unworkable, because the noise level was way too high.
How do you handle differences that turn out to be non-errors? If you have a large diff, isn't it likely that an error slips through if you can't filter out the insignificant ones?
Yes, you can think of it as just producing a version of the data that is ideal for comparisons. An alternative could be something that indicates how comparisons should be carried out.
Also, we didn't really know much about the data, only that there shouldn't be any differences, so when they did turn up, we had to explore both the old and new systems to understand why it happened.
I think the approach you describe can be useful for smaller data sets where the data can be organized better for human consumption. I used my tool for testing transformations of data sets that were several gigabytes but with an initial sampling step.
With my system, if I used a normalizer to remove data that would otherwise trigger an actual error, I would be mixing intents in my test code. The term "filter" doesn't convey the same intent of preserving all significant information.
A common example would be arrays that are actually used as sets, so the order of the entries isn't significant and duplicates shouldn't occur. The elements can then be sorted and deduplicated in the normalizer (or even better - converted to an actual set, if available).
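The tool itself is proprietary, so this is only a sketch of what such a normalizer could look like. The payload shape and the `tags` field are made up for illustration; the idea is just sort-plus-deduplicate on an array that is semantically a set:

```python
# Hypothetical normalizer sketch. "tags" is an invented field standing in
# for any array that is really a set: order is not significant and
# duplicates should not occur, so we sort and deduplicate before comparing.

def normalize(payload):
    tags = payload.get("tags", [])
    return {**payload, "tags": sorted(set(tags))}

# Two payloads that differ only in ordering/duplication of the set-like array:
v1 = {"id": 7, "tags": ["b", "a", "a"]}
v2 = {"id": 7, "tags": ["a", "b"]}
assert normalize(v1) == normalize(v2)  # no spurious difference reported
```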
Yes. I am using the term normalizer because I am thinking about it as a function that should ideally preserve all information, but produce a normalized version of data that is ideal for comparison.
I can be very impatient when I know that I am waiting for no obvious reason, so I always explore how to make tests as fast as possible. The test tool is very fast in offline mode, so you can run it in watch mode and get continuous feedback.
And of course there are ways to filter the test results, so you can focus on the output of a specific test.
I added an offline flag, so you can capture the payloads to disk and rerun the tests against the payloads on disk for as long as you are combing through the test results, writing accept() and todo() calls. This makes iterating very fast.
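The actual tool isn't public, but the capture-to-disk idea can be sketched in a few lines. Everything here (the helper name, the cache layout, the `offline` flag as a function parameter) is an assumption for illustration:

```python
# Hypothetical sketch of offline mode: capture a payload to disk once,
# then keep rerunning comparisons against the cached copy while iterating.
import json
import os

def fetch_cached(name, fetch, cache_dir="captures", offline=False):
    path = os.path.join(cache_dir, f"{name}.json")
    if offline and os.path.exists(path):
        with open(path) as f:
            return json.load(f)          # reuse the captured payload
    payload = fetch()                    # hit the live system
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(payload, f)            # capture for subsequent offline runs
    return payload
```

The first run captures; every later run with `offline=True` reads from disk, which is what makes the accept()/todo() iteration loop fast.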
Obviously, it is possible to re-capture v1 at some point where you feel confident that v2 is correct and then start the whole process over again.
This dramatically reduces the amount of noise and doesn't require you to mark the differences as correct by hand. You just use code to mark the expected differences.
By explicitly allowing differences, but writing functions that mark them as accepted, it becomes much more obvious what happens as you progress, because you are retaining the original baseline (v1) and just marking the differences found as accepted.
The function todo() works much like accept(), except it marks the difference as unresolved, removing it from the output yet still failing the test.
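The accept()/todo() mechanics can be sketched as a filtering pass over the difference stream. The real tool is proprietary, so the signatures below are assumptions; only the semantics (accepted differences vanish, todo differences are hidden but still fail the test) come from the description above:

```python
# Hypothetical sketch: split raw differences into reported errors and
# silenced-but-still-failing ones, per the accept()/todo() semantics.

def run_filters(differences, accept=None, todo=None):
    reported, pending = [], []
    for diff in differences:
        if accept and accept(diff):
            continue                      # expected difference: drop entirely
        if todo and todo(diff):
            pending.append(diff)          # unresolved: hidden, but test fails
            continue
        reported.append(diff)             # genuine, unexplained difference
    passed = not reported and not pending
    return reported, passed

# Example: a version bump is expected, a changed total is not.
diffs = [("invoice/version", "edited", 1, 2),
         ("invoice/total", "edited", 100, 90)]
reported, passed = run_filters(diffs, accept=lambda d: d[0] == "invoice/version")
# reported -> [("invoice/total", "edited", 100, 90)]; passed -> False
```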
(As part of your test definition.)
The function accept() is called with each difference, and if it returns true for a given difference, that difference is no longer considered an error and is removed from the output.
I suppose that these concepts are relatively familiar to you if you have used approval testing, but in addition to this, each test is accompanied by two utility functions that you can call.
We realized that it is common to introduce a normalizer that normalizes the payloads returned by the v1 and v2 handlers (there may even be separate normalizers for each of them), where the purpose is to remove non-significant noise from the data that is being returned.
This comparison produces a stream of differences, each containing the path ("foo/bar"), the difference type ("added", "removed" or "edited") and the values found. If the stream contains no items, the test passes; otherwise it fails.
So if v1 = { foo: { bar: 'baz' } } and v2 = { foo: { bar: 42 } } then the path "foo/bar" is reported to have an error with old = 'baz' and new = 42.
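The deep comparison described above can be sketched in a few lines. The real tool isn't public, so the function name and tuple shape here are assumptions; the example data is the same as in the post:

```python
# Hypothetical sketch of the deep comparison: walk both nested structures
# and yield (path, kind, old_value, new_value) for every difference found.

def deep_diff(old, new, path=""):
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}/{key}" if path else key
            if key not in new:
                yield (child, "removed", old[key], None)
            elif key not in old:
                yield (child, "added", None, new[key])
            else:
                yield from deep_diff(old[key], new[key], child)
    elif old != new:
        yield (path, "edited", old, new)

v1 = {"foo": {"bar": "baz"}}
v2 = {"foo": {"bar": 42}}
print(list(deep_diff(v1, v2)))
# -> [('foo/bar', 'edited', 'baz', 42)]
```

An empty stream means the test passes; any yielded tuple is a reported difference.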
When you run a test defined this way, the two handlers are called and the resulting objects are deep-compared, with every difference reported as a path to the point of difference.
The basic idea is that you define tests by specifying handlers that fetch v1 (old data) and v2 (new data) respectively. How the fetching is done is up to you, but the result must be a nested object structure, possibly representing json.
I am working as a contractor, so the tool is proprietary, but I think I can persuade them to release it.
Also, I only use it for (sometimes rather large) refactoring where the overall functionality of the system is expected to be unchanged, but some non-significant changes in the captured data are to be expected.
I think that Ken's concern about winding up blindly approving differences necessitates tooling like the stuff I built, to make approval testing create sufficient value.
@emilybache.com I watched your video about approval testing. I've been using this technique since 2005 and have found a much more reliable way to differentiate between expected and unexpected differences.
He is interacting with her much the same way I've been interacting with chatbots. It's as if he is unable to even imagine that she has agency.