# Integrity of Input Data

The integrity of data means its accuracy and completeness. Data has integrity if it has not been lost or corrupted in any way. (Security of files.)

In business and industry a great deal of time and effort is devoted to making sure that data is accurate and that it is not corrupted.

## Causes of Inaccuracies in Input Data

The following are common causes of errors in input data.

## Mistakes or inaccuracies in collecting the data

Example

A computer is used to control the temperature at which a chemical process takes place. A digital thermometer connected to the computer is faulty and as a result the computer sets the temperature to the wrong value.

## Failure to organize the data in the way required by the program

Example

If a program expects names to be entered as "Christian name, Surname" and "JOHN SMITH", "BARRY JONES" is entered, then the computer will take JOHN SMITH to be the Christian name and BARRY JONES the surname of the same person.
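The mix-up can be reproduced with a minimal Python sketch. The function `parse_name` and its rule of splitting at the comma are assumptions made for illustration; they do not come from any particular program.

```python
def parse_name(entry):
    """Split an entry of the form 'Christian name, Surname' at the comma.

    Hypothetical helper: it trusts the layout and does no checking.
    """
    christian_name, surname = [part.strip().strip('"') for part in entry.split(",")]
    return christian_name, surname


# Data entered in the expected layout.
print(parse_name("JOHN, SMITH"))

# Two full names entered instead: the program cannot tell the difference
# and treats them as the two parts of one person's name.
print(parse_name('"JOHN SMITH", "BARRY JONES"'))
```

The program runs without complaint in both cases, which is why this kind of error has to be caught by checking the data rather than by the processing itself.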

## Hardware errors

Common examples are:

1 Transmission errors: data sent from one device to another is changed due to a hardware failure.

2 Read errors: failure by an input device to read the input medium correctly.

Examples

1 A letter C is transmitted in ASCII code over a telephone line. One bit is changed and the character is read as a letter A. (In ASCII, letter C = 1000011 and letter A = 1000001; see Fig 4.)

2 Exactly the same type of error could occur if the letter C is encoded in ASCII on magnetic tape and there is dirt on the reading head of the magnetic tape unit.
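The effect of a single changed bit can be shown with a short Python sketch. The helper `flip_bit` is a hypothetical name used only for this illustration; it simulates the fault rather than modelling any real transmission hardware.

```python
def flip_bit(char, position):
    """Return the character obtained by inverting one bit of an ASCII code.

    position 0 is the rightmost (least significant) bit.
    """
    return chr(ord(char) ^ (1 << position))


sent = "C"
received = flip_bit(sent, 1)  # the second bit from the right is corrupted

# Show the 7-bit codes of the character sent and the character received.
print(format(ord(sent), "07b"), sent)
print(format(ord(received), "07b"), received)
```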

## Errors in preparing data

Common types are:

1 Simple typing errors.

2 Misreading characters on coding sheets.

Examples

1 Transposition: typing digits or letters in the wrong order (for example, 3256 instead of 3526).

2 Misreading a 2 as a Z. Often the Z is crossed to distinguish it from 2, i.e. Ƶ. Similarly 0 (zero) is often crossed to distinguish it from O (the letter), i.e. Ø. The letter I is given a bar at the top and the bottom to distinguish it from 1 (one), i.e. I.

## Organized methods of data collection

If a large amount of data is to be collected for a system then data integrity can be improved by organizing the data collection, for example:

1 Pre-printed forms: forms are printed with spaces provided for the data to be filled in (Fig 4).

2 In some situations the data itself can be pre-printed, for example, in bar codes.

## Adding a value to data just as a check

A calculation is done on data which produces a value to add on to it. Then if the data is transmitted or typed, etc., the value can be checked by doing the same calculation again. If the value is different then the data itself must have been corrupted in some way.

Some commonly used examples are:

1 A parity bit. This is a method of ensuring the integrity of bit strings.

The parity of a bit string depends on the number of 1's in it.

A bit string has even parity if the number of 1's in it is an even number. It has odd parity if the number of 1's in it is an odd number.

A parity bit is a bit added to a bit string to adjust the parity. In most systems which have a parity bit the parity is even.

If a bit string has even parity and one bit gets changed (i.e. a 1 becomes a 0 or a 0 becomes a 1) then the parity becomes odd. A computer can check all the bit strings. If one string has odd parity then there must be an error in it.

2 A check digit. A check digit is an extra digit added to a number to ensure that, if the number is changed by mistake, the error will be detected. Check digits are most commonly added to numbers which it is very important to get right, for example, customer account numbers and International Standard Book Numbers (ISBNs).

3 A control total. A control total is the sum worked out for a group of records by adding up particular items. The addition is done before and after an operation to check that all of the records are in fact processed. A hash total is a control total for which the sum used does not have any actual meaning, i.e. a 'nonsense total'.

## Worked question (parity bit)

Add a parity bit at the left of the following bit strings to give them even parity:

1 1011011.

2 1011010.

1 The number of 1's in 1011011 is 5, so its parity is odd.

Parity bit is 1 to make the parity even.

String with parity bit added = 11011011.

2 The number of 1's in 1011010 is 4, so its parity is already even.

Parity bit is 0.

String with parity bit added = 01011010.
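The same working can be sketched in Python. The function names `add_even_parity` and `has_even_parity` are chosen for this illustration only, and bit strings are assumed to be held as text made up of the characters 0 and 1.

```python
def add_even_parity(bits):
    """Prefix a bit string with the parity bit that makes the count of 1's even."""
    parity_bit = "1" if bits.count("1") % 2 == 1 else "0"
    return parity_bit + bits


def has_even_parity(bits):
    """Return True if the bit string contains an even number of 1's."""
    return bits.count("1") % 2 == 0


for data in ("1011011", "1011010"):
    print(data, "->", add_even_parity(data))

# A single changed bit is detected because the parity becomes odd.
corrupted = "11011001"  # 11011011 with one bit changed
print(has_even_parity("11011011"), has_even_parity(corrupted))
```

Note that a check of this kind detects a single changed bit; if two bits in the same string are changed, the parity is even again and the error passes unnoticed.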

## Worked question (check digit)

Five-figure account numbers have a digit added to them according to the following rules:

1 Starting from the right, multiply the first digit by 1, the second by 2, the third by 3, etc.

2 Add the results together.

3 Use the last digit of the result as a check digit to add to the end of the number.

Find the new value, with the check digit added, of (a) 56037 (b) 50637.

Comment on the result.

(a) For 56037 rules (1) and (2) give: 5×5 + 6×4 + 0×3 + 3×2 + 7×1 = 62

From rule (3) check digit = 2

New version of the number = 560372

(b) For 50637 rules (1) and (2) give: 5×5 + 0×4 + 6×3 + 3×2 + 7×1 = 56

From rule (3) check digit = 6

New version of the number = 506376

Comment: The only difference between the two numbers 56037 and 50637 is that the digits 6 and 0 have been transposed. However this method gives them different check digits. Thus it would be a useful method for detecting transposition errors.
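The rules of this worked question can be sketched in Python as follows. The names `add_check_digit` and `is_valid` are hypothetical, and account numbers are assumed to be held as strings of digits.

```python
def add_check_digit(number):
    """Append the check digit defined by the rules of the worked question."""
    # Rule 1: weight the digits 1, 2, 3, ... starting from the right.
    weighted = [int(d) * w for w, d in enumerate(reversed(number), start=1)]
    # Rule 2: add the products together (as in the worked answer).
    total = sum(weighted)
    # Rule 3: the last digit of the total is the check digit.
    return number + str(total % 10)


def is_valid(number_with_check):
    """Recalculate the check digit and compare it with the one supplied."""
    return add_check_digit(number_with_check[:-1]) == number_with_check


print(add_check_digit("56037"))
print(add_check_digit("50637"))

# Transposing the 6 and the 0 of 560372 leaves a check digit that no longer fits.
print(is_valid("560372"), is_valid("506372"))
```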

## Example of a control total

For a file of customer accounts the actual account numbers can be added up and used as a check. The sum of the account numbers does not mean anything but, after any operation, the account numbers can be added up again. If the total is the same it can be assumed that all the right records were used.

Note: This is also an example of a hash total.
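A minimal Python sketch of such a check is given below. The record layout, the account numbers and the name `hash_total` are invented for illustration.

```python
def hash_total(records):
    """Add up the account numbers of a batch of records."""
    return sum(record["account"] for record in records)


# Invented sample data for illustration only.
batch = [
    {"account": 560372, "balance": 120.50},
    {"account": 506376, "balance": 75.00},
    {"account": 413209, "balance": 310.25},
]

total_before = hash_total(batch)

# Simulate an operation that accidentally loses the last record.
processed = batch[:-1]
total_after = hash_total(processed)

if total_before == total_after:
    print("Totals agree: all records appear to have been processed.")
else:
    print("Totals differ: a record was lost, added or altered.")
```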

## Verification

Verification is the checking of data which has been copied from one medium to another to see that it still represents the original data.

## Examples of verification of data

1 When data is encoded onto disc, a keyboard operator reads the data from a source document and types it at a keystation, the data being recorded on disc. This data is then verified by a second operator, who retypes it all. The computer controlling the keystation checks the data stored against the data now being typed and reports differences, so that any errors can be corrected.

2 A magnetic tape cassette can be used to store the contents of part of a computer's main store. If the tape is rewound, the computer can then read what is recorded and verify that the contents have been 'saved' correctly.
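The comparison made in the first example can be sketched in Python. This is only an illustration under simple assumptions: each keying is held as a list of lines, and the function name `verify` and the sample data are invented.

```python
def verify(first_keying, second_keying):
    """Compare two keyings of the same document line by line.

    Returns a list of (line number, first version, second version) for
    every line on which the two operators disagree.
    """
    differences = []
    pairs = zip(first_keying, second_keying)
    for number, (first, second) in enumerate(pairs, start=1):
        if first != second:
            differences.append((number, first, second))
    # A missing or extra line is also a difference.
    if len(first_keying) != len(second_keying):
        differences.append(("length", len(first_keying), len(second_keying)))
    return differences


# Invented sample data: the first operator transposed two digits on line 2.
first = ["560372 SMITH 120.50", "506736 JONES 75.00"]
second = ["560372 SMITH 120.50", "506376 JONES 75.00"]

for difference in verify(first, second):
    print(difference)
```

The program can only report that the two versions differ; deciding which version is correct still means referring back to the source document.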

## Validation

Validation is the checking of data before the main processing to see that it is acceptable for the process.

Validation may include checks that the data is:

1 In the right format.

2 Of the right type.

3 Complete.

4 Within the range of possible values.

## Examples of validation checks

1 A type check. A set of numbers which are to be totalled can be checked to make sure that:

(a) All the characters are either decimal digits or decimal points.

(b) There is at most one decimal point per number.

2 A range check. The month of a date of birth can be checked to see that it is a whole number between 1 and 12 inclusive.
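Both checks can be sketched in Python. The function names are hypothetical, the data is assumed to arrive as text, and the requirement that a number contains at least one digit is an extra assumption not stated above.

```python
DIGITS = "0123456789"


def is_valid_number(text):
    """Type check: only decimal digits and at most one decimal point."""
    all_allowed = all(ch in DIGITS + "." for ch in text)
    # Extra assumption: an entry with no digits at all is rejected.
    has_digit = any(ch in DIGITS for ch in text)
    return all_allowed and has_digit and text.count(".") <= 1


def is_valid_month(text):
    """Range check: a whole number between 1 and 12 inclusive."""
    is_whole_number = text != "" and all(ch in DIGITS for ch in text)
    return is_whole_number and 1 <= int(text) <= 12


for entry in ("12.5", "12.5.1", "12a"):
    print(entry, is_valid_number(entry))

for entry in ("7", "13", "0"):
    print(entry, is_valid_month(entry))
```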

## Errors in Calculation

Common causes of calculation errors are as follows.

## When the program attempts an operation which is impossible

Examples

1 Division by zero.

2 Attempts to find the square root of a negative number.

Note: Usually, in a high level language, attempting either of these operations would result in an execution error message.
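Python is one language that behaves in this way: dividing by zero raises `ZeroDivisionError`, and `math.sqrt` raises `ValueError` for a negative argument. The sketch below traps both errors; the wrappers `safe_divide` and `safe_sqrt`, and the choice of returning `None`, are assumptions made for illustration.

```python
import math


def safe_divide(a, b):
    """Return a / b, or None if the division is impossible."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


def safe_sqrt(x):
    """Return the square root of x, or None if x is negative."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None


print(safe_divide(6, 3), safe_divide(1, 0))
print(safe_sqrt(9), safe_sqrt(-4))
```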

## Overflow

This occurs when the result of a calculation is too large for the store reserved for it. In a digital computer only a certain number of bits are used to store any given number.
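Python's own integers grow as needed, so the sketch below simulates a fixed-size store instead. The width of 8 bits, the unsigned representation and the function names are all assumptions chosen for illustration.

```python
BITS = 8  # assumed width of the store, chosen for illustration
LARGEST = 2 ** BITS - 1  # 255 is the largest unsigned value that fits


def store(value):
    """Keep only the lowest BITS bits, as a fixed-width unsigned store would."""
    return value & LARGEST


def add_with_overflow_check(a, b):
    """Return (stored result, overflow flag) for an unsigned addition."""
    result = a + b
    return store(result), result > LARGEST


print(add_with_overflow_check(100, 100))  # fits in 8 bits
print(add_with_overflow_check(200, 100))  # 300 needs 9 bits, so it overflows
```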

## Loss of accuracy

This again is because of the limited amount of store available for each number. Each number may be inaccurate because some of its digits cannot be stored. If arithmetic operations are then performed on these inaccurate numbers the result may be even more inaccurate.

If not all of the digits of a number can be stored the number can be truncated or rounded. If a number is truncated the digits which cannot be stored are simply lost. If a number is rounded the last digits which can be stored are adjusted to make the number as accurate as possible.
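The way small inaccuracies build up can be seen in Python's floating-point arithmetic, where 0.1 has no exact binary representation. This is a minimal illustration, not a description of any particular machine.

```python
total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 cannot be stored exactly in binary floating point

# Ten lots of 0.1 should come to 1.0, but the stored errors accumulate.
print(total)
print(total == 1.0)
```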

## Worked question

In a certain machine only 4 digits of any decimal number can be stored. For the numbers (1) 53.647 and (2) 65.2145 give the result (a) if the numbers are truncated and (b) if they are rounded.

1 (a) 53.647 truncated to 4 figures = 53.64.

(b) 53.647 rounded to 4 figures = 53.65.

2 (a) 65.2145 truncated = 65.21.

(b) 65.2145 rounded = 65.21.
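These answers can be checked with Python's `decimal` module, which allows the number of digits stored to be limited. The function names `truncate` and `round_off` are hypothetical, and rounding a final 5 upwards (`ROUND_HALF_UP`) is an assumption about the rounding rule intended.

```python
from decimal import Context, ROUND_DOWN, ROUND_HALF_UP


def truncate(number, digits=4):
    """Keep the first `digits` significant digits and drop the rest."""
    return Context(prec=digits, rounding=ROUND_DOWN).create_decimal(number)


def round_off(number, digits=4):
    """Keep `digits` significant digits, adjusting the last one if necessary."""
    return Context(prec=digits, rounding=ROUND_HALF_UP).create_decimal(number)


# The numbers are passed as strings so that no accuracy is lost beforehand.
for number in ("53.647", "65.2145"):
    print(number, truncate(number), round_off(number))
```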