CSCI 241 - Homework 4: Bits are bits

Due by 11:59.59pm Wednesday, March 17

Introduction

In this lab you will get experience dealing with data at the binary level. You will also work with static local variables and global variables. In addition, you'll be expected to use conditional compilation via a Makefile, create header files, etc.

You will work with a partner on this assignment. It is expected that you work together and contribute equally to the development of your solution. You are both responsible for understanding how your solution works. You need only submit one assignment per group, but clearly indicate your partnership in the README and in the comments of your files. You should also try collaborating on GitHub while you work.

The URL for the GitHub repository for this assignment is https://classroom.github.com/a/A0KZpaxf

Part 1 - frequency analysis

In this part, we will start using arrays to contain frequencies of characters and print a summary of a text's character distribution.

In C, an array is defined as

    int arr[LENGTH];
  
where LENGTH must be a constant. Typically, we will declare something like
    #define LENGTH 256
  
somewhere in our program before we define the array. Note, you cannot use a variable for the array length. It must be something that is evaluated at compile time, not at run-time.

Your assignment is to create a program that has an array with one entry for each lowercase letter (a-z). Your program will read from stdin, like the programs last week, and count how many times each letter has occurred.

You will then print out, in tabular form, the letter, the number of times that it has appeared, and the percentage of all letters that this letter represents. Following this table, print out which is the most frequent and least frequent letter. If there are multiple letters that are most or least frequent, you should print them all out in sequence.
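The counting step described above can be sketched as follows. This is only a minimal illustration, not the expected solution: the names ALPHABET, count_letters, and percentage are hypothetical, the helper takes a string instead of reading stdin, it counts only lowercase letters, and it assumes a character set (such as ASCII) in which 'a' through 'z' are contiguous.

```c
#include <stddef.h>

#define ALPHABET 26  /* hypothetical constant: number of letters a-z */

/* Adds the lowercase letters found in text to counts[0..25] and
   returns how many letters were counted. Assumes 'a'..'z' are
   contiguous in the character set (true for ASCII). */
long count_letters(const char *text, long counts[ALPHABET])
{
    long total = 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        char ch = text[i];
        if (ch >= 'a' && ch <= 'z') {
            counts[ch - 'a']++;   /* index 0 is 'a', index 25 is 'z' */
            total++;
        }
    }
    return total;
}

/* Share of all counted letters, as a percentage. Returns 0 when no
   letters were seen, to avoid dividing by zero. */
double percentage(long count, long total)
{
    return total == 0 ? 0.0 : 100.0 * (double)count / (double)total;
}
```

In a real program the same indexing would be applied to each character returned by getchar(), and the array would need to be zeroed before counting starts.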

An example run is as follows, using the file "hamlet.txt" located in ~rhoyle/pub/cs241/hw04/hamlet.txt

rhoyle:hw2 rhoyle$ ./freq < ~/Downloads/hamlet.
char        Frequencies         Percentage
a:               9950             7.6459
b:               1830             1.4062
c:               2606             2.0025
d:               5025             3.8614
e:              14960            11.4958
f:               2698             2.0732
g:               2420             1.8596
h:               8731             6.7092
i:               8511             6.5401
j:                110             0.0845
k:               1272             0.9774
l:               5847             4.4930
m:               4253             3.2681
n:               8297             6.3757
o:              11218             8.6203
p:               2016             1.5492
q:                220             0.1691
r:               7777             5.9761
s:               8379             6.4387
t:              11863             9.1159
u:               4343             3.3373
v:               1222             0.9390
w:               3132             2.4067
x:                179             0.1375
y:               3204             2.4621
z:                 72             0.0553
Maximum character(s): E
Minimum character(s): Z
  

Part 2 - converting data to binary and back

In this part, you will be creating 2 programs: encode_bits, which will generate the "binary" representation of a file, and decode_bits, which will take that representation and convert it back to the original format.

encode_bits

Create a program called encode_bits. This program should use getchar() to read in characters one at a time and then call print_bits() (see below) to output that character as a sequence of '1' and '0' characters. It should stop on EOF.

decode_bits

Create a program called decode_bits. This program should use getchar() to read in characters one at a time and then call decode_bits() (see below) to output that sequence of '1' and '0' characters as actual characters. It should stop on EOF.

bits.c & bits.h

Create a file called bits.c that contains the following two functions and a header file bits.h that contains a guard against multiple inclusion, other needed includes, and function prototypes.

print_bits()
Takes the character ch and outputs its value in binary format with all leading zeros and with the MSB first. For example, the letter 'A' has a value of 0x41, and should be output as 01000001. A newline character has a value of 0x0a and should be output as 00001010.
You should not assume the number of bits that are in a char, instead use the constant CHAR_BIT from limits.h.
void decode_bits(int ch)
If the character is whitespace, skip it. (You might want to use isspace() in ctype.h.)
If the character is a '1' or a '0', you should add it into an output buffer (numerically, shifting the current contents appropriately). Once you've seen CHAR_BIT bits, you should print the corresponding character out.
If the character is neither white space, nor a binary digit, you should print a message to the screen, and then use exit() to quit your program with a non-zero value.
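The MSB-first output described for print_bits() can be sketched with a shift and a mask. The helper below is a hypothetical illustration (the name bits_to_string is not part of the assignment): it writes the digits into a caller-supplied buffer rather than printing them, which makes the bit order easy to check.

```c
#include <limits.h>

/* Hypothetical helper: writes the CHAR_BIT binary digits of ch into out,
   most significant bit first, followed by a terminating '\0'.
   out must have room for CHAR_BIT + 1 characters. */
void bits_to_string(int ch, char out[CHAR_BIT + 1])
{
    unsigned char value = (unsigned char)ch;

    for (int i = 0; i < CHAR_BIT; i++) {
        int shift = CHAR_BIT - 1 - i;           /* highest bit first */
        out[i] = ((value >> shift) & 1) ? '1' : '0';
    }
    out[CHAR_BIT] = '\0';
}
```

A print_bits() built on the same loop could call putchar() for each digit instead of storing it.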

Programming hints for part 2

You'll probably need to use static local variables to handle decode_bits since it only prints out a character every CHAR_BIT calls to it.
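As a rough sketch of the static-local idea, the hypothetical helper below (accumulate_bit is an invented name) keeps its buffer and bit count between calls. It returns -1 until CHAR_BIT bits have arrived, then returns the assembled value and resets itself. Whitespace skipping and error handling are deliberately left out.

```c
#include <limits.h>

/* Hypothetical helper: feed one bit (0 or 1) per call. Returns the
   completed character value after every CHAR_BIT-th bit, -1 otherwise. */
int accumulate_bit(int bit)
{
    static unsigned int buffer = 0;  /* bits collected so far */
    static int count = 0;            /* how many bits are in buffer */

    /* Shift the existing bits left and add the new bit at the bottom. */
    buffer = (buffer << 1) | (unsigned int)(bit & 1);
    count++;

    if (count < CHAR_BIT)
        return -1;                   /* character not complete yet */

    int result = (int)buffer;
    buffer = 0;                      /* reset for the next character */
    count = 0;
    return result;
}
```

Because the two variables are static, they keep their values from one call to the next, yet they are not visible outside the function.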

Don't forget to write rules in your Makefile that compile everything correctly. bits.o should be a dependency of the two other programs, and it should have its own separate compilation rule.

Part 3 - Number Transformation

For this part, you will be creating a function that reads in a signed integer value and stores the result in a long integer variable. You will also be creating 4 short programs that will use that function to read in integers and output them in one of 4 different formats -- binary, decimal, octal, or hexadecimal.

reading in a number

Create a file called getnum.c and another called getnum.h. In getnum.c you will create the function getnum() that is used by your other programs.

long getnum(void)
Read in a signed integer in one of 4 formats described below and then return the value.
Skip leading whitespace and stop reading in a number on the next occurrence of whitespace.
All formats optionally begin with a - to indicate negative value. Positive values do not have this (i.e., no '+' for positive).
  1. Binary - begins with a leading 0b and then a sequence of 0 and 1 characters
  2. Octal - begins with a leading 0 followed by 1 or more digits from 0-7
  3. Decimal - either a single 0 or a digit from 1-9 followed by zero or more digits from 0-9
  4. Hexadecimal - begins with a leading 0x followed by 1 or more digits from 0-9 or A-F
If the integer read is invalid, somehow signal the caller that the value returned is not valid. I'd suggest you consider a global variable.
If the input is invalid, skip to the next whitespace in the input or EOF.
You might find it useful to "unread" a character. You can do so using ungetc(ch, stdin); where "ch" is the character you just read. Note that you can only un-read one character at a time until you read in a new character.
Building a state diagram like the one below might be useful in visualizing the behavior of this function.
[Figure: state machine diagram]
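The four formats and the global validity flag can be illustrated on a string that has already been split off at whitespace. This is a sketch under stated assumptions, not the required getnum(): parse_token, digit_value, and parse_ok are hypothetical names, lowercase a-f is accepted as well as A-F, overflow is not checked, and the real function would read one character at a time from stdin.

```c
/* Hypothetical global flag: 1 if the last parse succeeded, 0 otherwise. */
int parse_ok = 0;

/* Value of a digit character in the given base, or -1 if it is not a
   valid digit for that base. Assumes a-f and A-F are contiguous. */
static int digit_value(int ch, int base)
{
    int v;

    if (ch >= '0' && ch <= '9')
        v = ch - '0';
    else if (ch >= 'a' && ch <= 'f')
        v = ch - 'a' + 10;
    else if (ch >= 'A' && ch <= 'F')
        v = ch - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

/* Parses one whitespace-free token. Sets parse_ok and returns the value,
   or returns 0 with parse_ok == 0 if the token is not a valid number. */
long parse_token(const char *s)
{
    int negative = 0;
    int base = 10;
    long value = 0;

    parse_ok = 0;
    if (*s == '-') {
        negative = 1;
        s++;
    }

    /* The prefix selects the base: 0b binary, 0x hex, other leading 0 octal. */
    if (s[0] == '0' && s[1] == 'b') {
        base = 2;
        s += 2;
    } else if (s[0] == '0' && s[1] == 'x') {
        base = 16;
        s += 2;
    } else if (s[0] == '0' && s[1] != '\0') {
        base = 8;
        s += 1;
    }

    if (*s == '\0')
        return 0;                    /* sign or prefix with no digits */

    for (; *s != '\0'; s++) {
        int d = digit_value(*s, base);
        if (d < 0)
            return 0;                /* invalid digit for this base */
        value = value * base + d;
    }

    parse_ok = 1;
    return negative ? -value : value;
}
```

A caller would check parse_ok after each call before trusting the returned value, since 0 is also a legitimate result.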

Printing out a number

You will then create 4 short programs that will read in a sequence of numbers and then output them one per line in a specified format. All 4 will be using sign-magnitude format. (If negative, print out the sign and then the rest as if it were positive -- important for binary.)

In the event that the integer being read is invalid, simply print "ERROR" to the screen.

Loop until no more integers remain.
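Sign-magnitude output matters most for binary, where printing the raw bits of a negative long would show its internal representation instead. The sketch below uses a hypothetical helper, format_binary, that writes into a buffer rather than printing. It assumes the output keeps the 0b prefix so that it can be read back by getnum(), which the assignment does not state explicitly.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes n in sign-magnitude binary, e.g. -5 becomes
   "-0b101". out must hold at least sizeof(long) * CHAR_BIT + 4 chars. */
void format_binary(long n, char *out)
{
    /* Take the magnitude in unsigned arithmetic so that LONG_MIN works. */
    unsigned long mag = (n < 0) ? 0UL - (unsigned long)n : (unsigned long)n;
    char digits[sizeof(unsigned long) * CHAR_BIT];
    size_t count = 0;
    size_t pos = 0;

    /* Collect digits least significant first; zero still yields one digit. */
    do {
        digits[count++] = (char)('0' + (mag & 1UL));
        mag >>= 1;
    } while (mag != 0);

    if (n < 0)
        out[pos++] = '-';
    out[pos++] = '0';
    out[pos++] = 'b';

    /* Emit the collected digits in reverse, most significant first. */
    while (count > 0)
        out[pos++] = digits[--count];
    out[pos] = '\0';
}
```

The octal and hexadecimal programs can follow the same pattern, or print the magnitude with printf() after handling the sign separately.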

Programming hints for part 3

Don't forget to add these targets into your Makefile.


The rest of the info

Sample program

I've included my sample solution in ~rhoyle/pub/cs241/hw04/ with binaries that should work on the lab machines.

decode_bits should exactly undo encode_bits. For example, you should get no output from the following:

% ./encode_bits < ./encode_bits | ./decode_bits > output 
% diff -q encode_bits output 

You can also chain together the base transformation if you like

% echo -32767 0xffff 071 0b101 cat | ./tobinary | ./tohex | ./tooctal | ./todecimal 
-32767
65535
57
5
ERROR

Extra Credit

Read in a flag on the command line that specifies the base for the output. It should be one of:

Note, your output, if octal, should be the regular number without the leading 0. The number, if hex, should be the regular number without the leading 0x. You will need to modify both encode_bits and decode_bits for full credit, as you'll need to be able to decode each type of encoding.

README

As with the first project, I want you to create a file called README containing the following:

  1. Your name (and partner's name) and the date
  2. A list of the programs with a short one-line description of each
  3. A description of how someone should use your version of getnum() including a description of how it signals the validity of the value read.
  4. A list of all remaining compilation problems, warnings, or errors. Note that for full marks, it is expected that you will have corrected all of these things.
  5. A statement about any valgrind errors which occur
  6. An estimate of the amount of time you spent designing, implementing, and debugging these programs
  7. The honor code statement: I have adhered to the Honor Code in this assignment

Submission

Now you should make clean to get rid of your executables and commit your folder containing your source files, README, and Makefile through git, as you did in last week's assignment. For a refresher, refer to those instructions.

Grading

Here is what I am looking for in this assignment:


Last Modified: Sep 11, 2017 - Roberto Hoyle, with material from Ben Kuperman