COMP61511 (Fall 2018)

Software Engineering Concepts
In Practice

Week 2

Bijan Parsia & Christos Kotselidis

<bijan.parsia, christos.kotselidis@manchester.ac.uk>
(bug reports welcome!)

FizzBuzz in Way Too Much Detail

NP-1

The Naivest Fizzbuzz

  • Any proposals?
  • Let's see the obvious!

The Naivest Fizzbuzz (source)

print("""1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
FizzBuzz
46
47
Fizz
49
Buzz
Fizz
52
53
Fizz
Buzz
56
Fizz
58
59
FizzBuzz
61
62
Fizz
64
Buzz
Fizz
67
68
Fizz
Buzz
71
Fizz
73
74
FizzBuzz
76
77
Fizz
79
Buzz
Fizz
82
83
Fizz
Buzz
86
Fizz
88
89
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz""")

A Rational FizzBuzz

A Rational FizzBuzz (source)

for i in range(1,101):
    if i % 3 == 0 and i % 5 == 0:
        print('FizzBuzz')
    elif i % 3 == 0:
        print('Fizz')
    elif i % 5 == 0:
        print('Buzz')
    else:
        print(i)

DRY

  • "Don't Repeat Yourself"
    • A fundamental principle of SE
    • It counsels against
      • Cut and Paste reuse
      • Not Invented Here syndrome
  • Is our current version DRY?

A Dryer Version

A Dryer Version (source)

for i in range(1,101):
    fizz = i % 3 == 0
    buzz = i % 5 == 0
    if fizz and buzz:
        print('FizzBuzz')
    elif fizz:
        print('Fizz')
    elif buzz:
        print('Buzz')
    else:
        print(i)

EVEN DRIER!!!

  • We repeat the _ % _ == 0 pattern!
  • We say print a lot
  • We can fix it!

EVEN DRIER!!! (source)

FIZZ = 'Fizz'
BUZZ = 'Buzz'

def divisible_by(numerator, denominator):
    return numerator % denominator == 0

def fizzit(num):
    fizz = divisible_by(num, 3)
    buzz = divisible_by(num, 5)
    if fizz and buzz:
        return FIZZ + BUZZ
    elif fizz:
        return FIZZ
    elif buzz:
        return BUZZ
    else:
        return num

for i in range(1,101):
    print(fizzit(i))

Parameterization

  • Basic software principle: Don't hard code stuff!
    • Make your code parameterisable!
  • The current version hard codes a lot, e.g.,
    FIZZ = 'Fizz'
    BUZZ = 'Buzz'
    
  • We have to modify the source code if we want to change this!
    • What else is hard coded?
    • We can fix it!

Parameterization (source)

"""We parameterise by:
* The range of integers covered.
* The text that is output.
* The multiples that trigger text to be output

https://www.tomdalling.com/blog/software-design/fizzbuzz-in-too-much-detail/"""

def fizzbuzz(bounds, triggers):
    for i in bounds:
        result = ''
        for text, divisor in triggers:
            result += text if i % divisor == 0 else ''
        print(result if result else i)

fizzbuzz(range(1, 101), [
    ['Fizz', 3],
    ['Buzz', 5]])

Still Hard Coding!

  • The kind of test is hard coded
  • We can fix that!

Still Hard Coding! (source)

def fizzbuzz(bounds, triggers):
    for i in bounds:
        result = ''
        for text, predicate in triggers:
            result += text if predicate(i) else ''
        print(result if result else i)

fizzbuzz(range(1, 101), [
    ['Fizz', lambda i: i % 3 == 0],
    ['Buzz', lambda i: i % 5 == 0],
    ['Zazz', lambda i: i < 10]
])

The Path to Hell...

  • ...is paved with good intentions!
  • Each choice was somehow reasonable
    • We applied good SE principles
    • We made choices that are often good
  • But we ended up in nonsense land
    • Local sense led to global nonsense

Judgement

  • Software engineers can't just follow rules
  • Good software engineering requires judgement
    • When to apply which rules
    • When to break rules
    • How to apply or break them
    • The reason for each rule
      • And whether it makes sense now

Acknowledgement

This lecture was derived from the excellent blog post FizzBuzz In Too Much Detail by Tom Dalling.

Tom uses Ruby and goes a couple of steps further. Worth a read!

Product Qualities

Qualities (or "Properties")

  • Software has a variety of characteristics
    • Size, implementation language, license...
    • User base, user satisfaction, market share...
    • Crashingness, bugginess, performance, functions...
    • Usability, prettiness, slickness...

"Quality" of Success

  • Success is determined by
    • the success criteria
      • i.e., the nature and degree of desired characteristics
    • whether the software fulfils those criteria
      • i.e., possesses the desired characteristics to the desired degree

Inducing Success

  • While success is determined by qualities
    • the determination isn't straightforward
    • the determination isn't strict
      • for example, luck plays a role!
    • it depends on how you specify the critical success factors

Software Quality Landscape

McConnell, 20.1: Characteristics of Software Quality

External vs. Internal (rough thought)

  • External qualities:
    • McConnell: those "that a user of the software product is aware of"
  • Internal qualities:
    • "non-external characteristics that a developer directly experiences while working on that software"
  • Boundary varies with the kind of user!

External Definition

  • External qualities:
    • McConnell: those "that a user of the software product is aware of"
    • This isn't quite right!
      • A user might be aware of the implementation language
    • "characteristics of software that a user directly experiences in the normal use of that software"?

Internal Definition

  • Internal qualities:
    • "non-external characteristics that a developer directly experiences while working on that software"
    • Intuitively, "under the hood"

External: Functional vs. Non-functional

  • Functional ≈ What the software does
    • Behavioural
    • What does it accomplish for the user
    • Primary requirements
  • Non-functional ≈ How it does it
    • Quality of service
      • There can be requirements here!
    • Ecological features

Key Functional: Correctness

  • Correctness
    • Freedom from faults in
      • spec,
      • design,
      • implementation
    • Does the job
    • Fulfills all the use cases or user stories

Implementation and design could be perfect, but if there was a spec misunderstanding, ambiguity, or change, the software will not be correct!

External: "Qualities of Service"

  • Usability — can the user make it go
  • Efficiency — wrt time & space
  • Reliability — long MTBF
  • Integrity
    • Corruption/loss free
    • Attack resistance/secure
  • Robustness — behaves well on strange input

All these contribute to the user experience (UX)!

Internal: Testability

  • A critical property!
    • Relative to a target quality
      • A system could be
        • highly testable for correctness
        • lowly testable for efficiency
    • Partly determined by test infrastructure
      • Having great hooks for tests is pointless without actual tests (see the sketch below)
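
A concrete illustration (mine, not McConnell's): the fizzit refactor from earlier, which returns its result instead of printing it, is exactly such a hook, and it only pays off once a test like the doctest below actually exists.

def fizzit(num):
    """Return the FizzBuzz text (or the number itself) for num.

    >>> fizzit(15)
    'FizzBuzz'
    >>> fizzit(7)
    7
    """
    fizz, buzz = num % 3 == 0, num % 5 == 0
    if fizz and buzz:
        return 'FizzBuzz'
    if fizz:
        return 'Fizz'
    if buzz:
        return 'Buzz'
    return num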

Internal: Testability

  • Practically speaking
    • Low testability blocks knowing qualities
    • Test-based evidence is essential

Comprehending Product Qualities

Comprehension?

  • We can distinguish two forms:
    • Know-that
      • You believe a true claim about the software
      • ...with appropriate evidence
    • Know-how
      • You have a competency with respect to the software
      • E.g., you know-how to recompile it for a different platform
  • Both require significant effort!

Quality Levels

  • We talked about different kinds of quality
    • Coming in degrees or amounts
    • "Easy" example: Good vs. poor performance
  • Most qualities in principle are quantifiable
    • Most things are quantifiable
  • But reasonable quantification isn't always possible
    • Or worth it

Defects as Quality Lacks

A defect in a software system is a quality level (for some quality) that is not acceptable.

  • Quality levels need to be elicited and negotiated
    • All parties must agree on
      • what they are,
      • their operational definition
      • their significance

What counts as a defect is often determined late in the game!

Question

If your program crashes then it

  1. definitely has a bug.
  2. is highly likely to have a bug.
  3. may or may not have a bug.

Question

If your program crashes, and the cause is in your code, then it

  1. definitely has a bug.
  2. is highly likely to have a bug.
  3. may or may not have a bug.

Bug or Feature?

(Does QA hate you? — scroll for the cartoons as well as the wisdom.)

  • Even a crashing code path can be a feature!
  • Contention arises when the stakes are high
    • and sometimes the stakes can seem high to some people!
    • defect rectification costs the same
      • whether the defect is detected...
      • ...or a feature is redefined
  • Defects (even redefined features) aren't personal

Problem Definition

This is a logical, not temporal, order.

Problem Definition

The penalty for failing to define the problem is that you can waste a lot of time solving the wrong problem. This is a double-barreled penalty because you also don't solve the right problem.
McConnell, 3.3

Quality Assurance

  • Defect Avoidance or Prevention
    • "Prerequisite" work can help
      • Requirement negotiation
      • Design
      • Tech choice
    • Methodology
  • Defect Detection & Rectification
    • If a defect exists,
      • Find it
      • Fix it

The Points of Quality

  1. Defect prevention
    • Design care, code reviews, etc.
  2. Defect appraisal
    • Detection, triaging, etc.
  3. Internal rectification
    • We fix/mitigate before shipping
  4. External rectification
    • We cope after shipping

Defect Detection Techniques

Defect Detection Techniques

Experiencing Software

  • It's one thing to know that there are bugs
    • All software has bugs!
  • It's another to be able to trigger a bug
    • Not just a specific bug!
    • If you understand the software
      • You know how to break it.
  • Similarly, for making changes
    • tweaks, extensions, adaptations, etc.
  • The more command you have of the software, the more modalities of mastery

Lab!

Revisiting Rainfall

We're going to look at your rainfall submissions before discussing them in detail.

We're going to do a code review!

You're going to work in 2-person teams!

Three Tasks

  1. Do a code review!
  2. Write some tests based on your code review!
  3. Do an essay review!

To the lab! Material in the usual place.

Testing Rainfall

A study in rain

Rainfail

  • 14 out of 29 students submitted
  • Key point: 1 out of 15 programs passed all 13 tests
    • 1 program passed ALL tests
    • 1 passed 9
    • 6 passed 8
    • 3 passed 6
    • 1 passed 4
    • 1 passed 0
    • 2 "We could not compile your code."

The rainfall problem is still a challenge!

Let's Talk Testing

  • You had limited time
    • So test generation had to be quick!
    • Typically ad hoc
      • Can we do better?
  • How testable is rainfall.py?
    • You were responsible only for average_rainfall(input_list)
      • Only this unit! Can ignore all else!
        • Perfect for doctest

Problem Statement

Design a program called rainfall that consumes a list of numbers representing daily rainfall amounts as entered by a user. The list may contain the number -999 indicating the end of the data of interest. Produce the average of the non-negative values in the list up to the first -999 (if it shows up). There may be negative numbers other than -999 in the list.

Set up

def average_rainfall(input_list):
    """>>> average_rainfall(<<FIRST TEST INPUT>>)
    <<FIRST EXPECTED RESULT>>
    """
    # Here is where your code should go
    return "Your computed average" #<-- change this!
$ python 1setup.py 
Your computed average

First Test Run

$ python -m doctest 1setup.py 
**********************************************************************
File "/Users/bparsia/Documents/2018/Teaching/COMP61511/labs/lab1/followup/1setup.py", line 2, in 1setup.average_rainfall
Failed example:
    average_rainfall(<<FIRST TEST INPUT>>)
Exception raised:
    Traceback (most recent call last):
      File "//anaconda/lib/python3.5/doctest.py", line 1320, in __run
        compileflags, 1), test.globs)
      File "<doctest 1setup.average_rainfall[0]>", line 1
        average_rainfall(<<FIRST TEST INPUT>>)
                          ^
    SyntaxError: invalid syntax
**********************************************************************
1 items had failures:
   1 of   1 in 1setup.average_rainfall
***Test Failed*** 1 failures.

First Test

  • Where do we get our first real test?
    • Hint: Read the docs:

Convert to Appropriate doctest

  • For a system test, we'd need to use subprocess etc.
    • But we can just test our unit!
      • average_rainfall(input_list)
      • But it takes a list not a string as input!
    • '2 3 4 67 -999' ==> [2, 3, 4, 67, -999]
      • We had to massage the input to get our test! (see the sketch below)
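
A minimal sketch of that massaging step (the variable names are mine, not part of the coursework): split the raw user string and convert each token before handing the list to the unit under test.

raw = '2 3 4 67 -999'                               # what a user might type
input_list = [int(token) for token in raw.split()]  # float() if fractional amounts matter
print(input_list)                                   # [2, 3, 4, 67, -999]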

Tested average_rainfall v 2

def average_rainfall(input_list):
    """>>> average_rainfall([2,3,4,67, -999])
    19.0
    """
    # Here is where your code should go
    return "Your computed average" #<-- change this!
$ python 1setup.py 
Your computed average

Second Test Run

$ python -m doctest 2firstfull.py 
**********************************************************************
File "/Users/bparsia/Documents/2018/Teaching/COMP61511/labs/lab1/followup/2firstfull.py", line 2, in 2firstfull.average_rainfall
Failed example:
    average_rainfall([2,3,4,67, -999])
Expected:
        19.0
Got:
    'Your computed average'
**********************************************************************
1 items had failures:
   1 of   1 in 2firstfull.average_rainfall
***Test Failed*** 1 failures.

Yay!

  • We have a real and reasonable test!
    • And a clear format for subsequent tests
    • And an infrastructure that makes it easy to run tests
  • We have a broken implementation
    • As witnessed by a test!
  • We Can Fix It!

Rosie Sez

First Implementation

def average_rainfall(input_list):
    """>>> average_rainfall([2,3,4,67, -999])
    19.0
    """
    # Here is where your code should go
    return sum(input_list)/len(input_list)
  • Will this fail this test?
  • Is there a test that it will pass?

First Implementation with Test

def average_rainfall(input_list):
    """>>> average_rainfall([2,3,4,67, -999])
    19.0
    >>> average_rainfall([2,3,4,67])
    19.0
    """
    # Here is where your code should go
    return sum(input_list)/len(input_list)

Third Test Run

$ python -m doctest 4firstimpl2.py 
**********************************************************************
File "/Users/bparsia/Documents/2018/Teaching/COMP61511/labs/lab1/followup/4firstimpl2.py", line 2, in 4firstimpl2.average_rainfall
Failed example:
    average_rainfall([2,3,4,67, -999])
Expected:
        19.0
Got:
    -184.6
**********************************************************************
1 items had failures:
   1 of   2 in 4firstimpl2.average_rainfall
***Test Failed*** 1 failures.

Second Implementation

def average_rainfall(input_list):
    """>>> average_rainfall([2,3,4,67, -999])
    19.0
    >>> average_rainfall([2,3,4,67])
    19.0
    """
    # Here is where your code should go
    return sum(input_list[:-1])/len(input_list[:-1])
  • Fixes one test but not the other!
  • Tests work together

Third Implementation

def average_rainfall(input_list):
    """>>> average_rainfall([2, 3, 4, 67, -999])
    19.0
    >>> average_rainfall([2, 3, 4, 67])
    19.0
    """
    rainfall_sum = 0
    count = 0
    for i in input_list:
        if i == -999:
            break
        else:
            rainfall_sum += i
            count += 1
    # Here is where your code should go
    return rainfall_sum/count

Fourth Test Run

$ python -m doctest 5secondimpl.py 
**********************************************************************
File "/Users/bparsia/Documents/2018/Teaching/COMP61511/labs/lab1/followup/5secondimpl.py", line 2, in 5secondimpl.average_rainfall
Failed example:
    average_rainfall([2,3,4,67, -999])
Expected:
        19.0
Got:
    19.0
**********************************************************************
1 items had failures:
   1 of   2 in 5secondimpl.average_rainfall
***Test Failed*** 1 failures.

Whaaaaaaaaaaaaaaaaaat?!

A Bug!

  • There was a bug in our tests

    • All along!
      def average_rainfall(input_list):
      """>>> average_rainfall([2, 3, 4, 67, -999])
      19.0
      
      vs.
      def average_rainfall(input_list):
      """    >>> average_rainfall([2, 3, 4, 67])
      19.0
      
  • Because the first example starts in column 0 of the docstring (right after the quotes), doctest strips no indentation from its expected output, so it compares against '    19.0', spaces and all

  • Earlier tests failed for two reasons!

  • One bug concealed the other!!!

Yay!

$ python -m doctest 6secondimpl2.py 
$
$ python -m doctest -v 6secondimpl2.py 
Trying:
    average_rainfall([2,3,4,67, -999])
Expecting:
    19.0
ok
Trying:
    average_rainfall([2,3,4,67])
Expecting:
    19.0
ok
1 items had no tests:
    6secondimpl2
1 items passed all tests:
   2 tests in 6secondimpl2.average_rainfall
2 tests in 2 items.
2 passed and 0 failed.
Test passed. 

Next Tests?

  • These tests clearly aren't enough
  • What next?
    • Look for boundary conditions ([-999])
    • Look for "odd equivalents"
      • Is [-999, 1] the same as [-999]?
      • How about [] and [-999]?
      • How about [-999] and [-999, 0]
    • Look for normal cases you haven't covered
      • [-1, 0, 10, -999]
      • For each new feature iterate the earlier moves!
        • e.g., is [-1, -2, -3, -999, 1] the same as []? (see the doctest sketch below)
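
A sketch of what a couple of these might look like as doctests. The expected values below are derived from the problem statement (values after the first -999 are ignored; only non-negative values count), so they would also expose that the third implementation above still averages negative values. The boundary cases such as [-999] are deliberately left open: the spec doesn't dictate an answer, and the current implementation would raise ZeroDivisionError.

def average_rainfall(input_list):
    """
    >>> average_rainfall([2, 3, 4, 67, -999, 1])   # data after the first -999 is ignored
    19.0
    >>> average_rainfall([-1, 0, 10, -999])        # only the non-negative values count
    5.0
    """
    # ... implementation as before ...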

A Classification of Tests

A Classification of Tests

  • Based on a 5W+H approach by Ray Sinnema (archived)
    • Who (Programmer vs. customer vs. manager vs...)
    • What (Correctness vs. Performance vs. Usability vs...)
    • When (Before writing code or after)
      • Or even before architecting!
    • Where (Unit vs. Component vs. Integration vs. System)
      • Or lab vs. field
    • Why (Verification vs. specification vs. design)
    • How (Manual vs. automated)
      • On demand vs. continuous

Who?

  • Sinnema: Tests give confidence in the system
    • I.e., they are evidence of a quality
    • Who is getting the evidence?
      • Users? Tests focus on external qualities
        • Can I accept this software?
      • Programmers? Tests focus on internal qualities
        • Can I check in this code?
      • Managers? Both?
        • Are we ready to release?
  • But also, who is writing the test?
    • A bug report is a (typically partial) test case!

What?

  • Which qualities am I trying to show?
    • Internal vs. external
    • Functional vs. non-functional?
    • Most developer testing is functional (i.e., correctness)
      • And at the unit level
      • Does this class behave as designed?

When?

  • When is the test written?
    • Before the code is written?
    • After the code is written?
  • Perhaps a better distinction
    • Tests written with existing code/design in mind
    • Test written without regard for existing code/design
    • This is related to white vs. black box testing
      • Main difference is whether you respect the existing API

Where?

  • Unit
    • Smallest "chunk" of coherent code
    • Method, routine, sometimes a class
    • McConnell: "the execution of a complete class, routine, or small program that has been written by a single programmer or team of programmers, which is tested in isolation from the more complete system"
  • Component (McConnell specific, I think)
    • "work of multiple programmers or programming teams" and in isolation

Where? (ctnd)

  • Integration
    • Testing the interaction of two or more units/components
  • System
    • Testing the system as a whole
    • In the lab
      • I.e., in a controlled setting
    • In the field
      • I.e., in "natural", uncontrolled settings

Where? (ctnd encore)

  • Regression
    • A bit of a funny one
    • Backward looking and change oriented
      • Ensure a change hasn't broken anything
      • Especially previous fixes.

Why?

  • Three big reasons
    1. Verification (or validation)
      • Does the system possess a quality to a certain degree?
    2. Design
      • Impose constraints on the design space
        • Both structure and function
    3. Comprehension
      • How does the system work?
        • Reverse engineering
      • How do I work with the system?

How?

  • Manual
    • Typically interactive
      • Human intervention for more than initiation
    • Expectations flexible
  • Automated
    • The test executes and evaluates on initiation
    • Often also run automatically (e.g., continuously); see the example below
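
For instance, the rainfall doctests from earlier can be run on demand, or collected by an automated runner (assuming pytest is available; it isn't required here):

$ python -m doctest rainfall.py          # on demand; silent unless a test fails
$ python -m pytest --doctest-modules     # automated collection of doctests across modules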

Test Coverage(s)

Coverage

  • Esp. for fine-grained tests, generality is a problem
  • We want a set of tests that
    • determines some property
    • at a reasonable level of confidence
  • This typically requires coverage

Coverage and Requirements

  • Consider acceptance testing
    • For a test suite to support acceptance
      • It needs to provide information about all the critical requirements
  • Consider test driven development
    • Where tests drive design
    • What happens without requirements coverage?

Code Coverage

  • A test case (or suite) covers a line of code
    • if running the test executes that line
  • Code coverage is a minimal sort of completeness
    • See McConnell on "basis" testing
      • Aim for minimal test suite with full code coverage
    • See coverage.py
    • The tricky bit typically involves branches
      • The more branches, the harder it is to achieve full coverage (see the sketch below)
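
A small illustration (my example, not McConnell's) of why branches drive the size of a minimal covering suite: each of the three paths below needs its own input before every line has been executed, so a basis suite for this function has at least three tests.

def fizz_class(n):
    if n % 15 == 0:
        return 'FizzBuzz'   # reached only by multiples of 15, e.g. 30
    elif n % 3 == 0:
        return 'Fizz'       # reached only by multiples of 3 that aren't multiples of 15, e.g. 9
    return str(n)           # reached by everything else, e.g. 7

Assuming coverage.py is installed, a coverage run over the doctests reports which lines actually executed:

$ coverage run -m doctest rainfall.py
$ coverage report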

Input Coverage

  • Input spaces are (typically) too large to cover directly
    • So we need a sample
    • A purely random sample is probably inadequate
      • The space is too large and mostly uninteresting
    • We want a biased sample
      • E.g., where the bugs are
        • Hence, attention to boundary cases
      • E.g., common inputs
        • That is, what's likely to be seen

Situation/Scenario Coverage

  • Inputs aren't everything
    • Machine configuration
    • History of use
    • Interaction patterns
  • Field testing helps
    • Hence alpha plus narrow and wide beta testing
  • System tests answer to this!

Limits of (Developer) Testing

Developing Test Strategies

  • Have one! However preliminary
    • Ad hoc testing rarely works out well
  • Review it regularly
    • You may need adjustments based on
      • Individual or team psychology
      • Situation
  • The McConnell basic strategy (22.2) is a good default

Developer Test Strategies

McConnell: 22.2 Recommended Approach to Developer Testing

  • "Test for each relevant requirement to make sure that the requirements have been implemented."
  • "Test for each relevant design concern to make sure that the design has been implemented... as early as possible"
  • "Use "basis testing" ...At a minimum, you should test every line of code."
  • "Use a checklist of the kinds of errors you've made on the project to date or have made on previous projects."
  • Design the test cases along with the product.

What about input coverage in WC?

  • By reverse engineering wc, we aim for an alternative Python implementation
  • With a clear spec according to CW1
  • How can we achieve functional correctness of miniwc?
    • By achieving 100% input coverage to satisfy the specification
    • Let's see some examples...

Empty text file

Common case: 1 line

Common case: 2 lines
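
A doctest-style sketch of these three cases (the counts helper and its (lines, words, bytes) return shape are invented here for illustration; they are not the CW1 API):

def counts(text):
    r"""Return (lines, words, bytes) for a string, wc-style.

    >>> counts('')                  # empty text file
    (0, 0, 0)
    >>> counts('hello world\n')     # common case: 1 line
    (1, 2, 12)
    >>> counts('hello\nworld\n')    # common case: 2 lines
    (2, 2, 12)
    """
    return (text.count('\n'), len(text.split()), len(text.encode('utf-8')))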

Visualising potential errors

  • Guard against program input
    • What kind of file? Different types, wrong names...
    • Contents of file?
  • Provide input coverage for every output dimension
    • Number of lines (single, multiple)
    • Number of characters (common case, large, small)
    • Number of words (how are words counted?)
    • Number of bytes (encoding?)

Coursework Recap

Coursework Activities

  • Reading
  • Q1
    • Mostly related to reading
    • Mostly "Recall"...with some interpretation
      • They will go higher up Bloom's taxonomy!
  • SE1
    • Reading and analysing
  • CW1
    • Reverse engineered a specification
    • Reengineered miniwc from the spec
      • Program construction

A note on marks

  • UK marks run from 0-100%
    • <=49% = Fail (below 40% is a serious failure)
    • 50-59% = Pass
    • 60-69% = Merit
    • 70%+ = Distinction
    • NOTE THE WIDE BAND AT THE TOP AND BOTTOM
  • A 65% is a good mark
  • An 85% is exceedingly rare
  • Over 70% is fairly rare

Q1

  • Mean of 3.40 (68%)
    • 2018: 3.57 (71%)
    • 2017: 3.71 (74%)
  • We will do some "in exam conditions"
  • Let's delve

SE1

  • Mean of 3.39 (68%)
  • SE2 is more challenging
  • We will do some "in exam conditions"
  • Talk with the TAs!

Simplified Problem

  • This was a small problem
    • With clear boundaries
  • Even here:
    • We ended up with support programs
      • And corners cut
  • Software engineering is (complex) system engineering
    • On both the product and project sides
    • We use a complex infrastructure!

Challenges

What were the challenges you encountered?

What challenges were inherent to the problem?

What challenges were environmental?

CW1 Marks

  • In Blackboard!
    • Average: 4.44/10 (40%) (same as last year)
    • Max: 8 (same as last year)
      • >70% 4
      • 60%s 4
      • 50%s 5
      • <49% 15
        • Min: 1
  • Not unusual for first assignment!
    • Final coursework average tends to be ≈62%

CW1 Feedback

  • Feedback is in Blackboard
  • Feedback is detailed but abstract
There seems to be a miniwc.py: 0.5/0.5 points. 
There seems to be a doctest_miniwc.py and test files: 0.5/0.5 points. 
Prohibited libraries have been used: 0/1 points. 
Formatting was correct: 1/1 points. 
The script passed 35.7% of miniwc simple tests: 2/5 points. 
The script passed 0.0% of miniwc binary tests: 0/1 points. 
The script passed 0.0% of miniwc unicode tests: 0/1 points. 
Penalties: none. 
Total marks: 4.0/10.0

Next

  • We're going to do more wc
    • Job 1 is to fix your miniwc.py (now called wc.py)
      • Try to figure out what went wrong!
    • Job 2 is to add new functionality
      • Flags! Multiple input files
    • Job 3 is to update your tests
      • Fix your tests!
      • Add more tests!
        • Unicode! Binary!
      • Note that Job 3 isn't temporally last!

Other coursework

  • SE2!
    • You need to read No Silver Bullet
  • SE1!
    • TAs are available to discuss