Graph Representation and Traversal

Duration: 1 lab session

For this assignment, you can only get full credit if you do not use code from outside sources.

- Have developed a data structure to represent directed graphs using adjacency lists.
- Will have implemented depth-first, breadth-first (optionally both) traversal algorithms.
- Will be able to explain the advantages of adjacency lists over adjacency matrix representation.

- an algorithm which finds the node with the largest out-degree.
- an algorithm which finds the node with the largest in-degree.
- an algorithm which takes a node as input, and finds the node with the largest out-degree which can be reached from that input node.

**Deadlines**: The unextended deadline is the end of your scheduled lab. If you need it, if you attend the lab you will get an automatic
extension to your next COMP26120 lab session (you must use 'submit' to prove you
finished in time and get it marked at the start of your next scheduled
lab).
You can also get an extension for good reason e.g. medical problems.

This lab is intended to give you a gentle introduction to algorithms on graphs. If you are taking COMP26120, the data structure and the algorithm you develop in this lab will be used and extended in the next lab, so it is important that you do this lab on time in order for you to be able to carry on with the later labs.

We will be working with "directed graphs". This means that the edges have a direction. A edge points from a source node to a target node. An adjacency list representation associates with each node a list of the nodes which are targets for the node's edges.

To get you started, we will give you the representation of a
node. This can be found in the file `graph.h`

.

A node is a structure which contains fields for the following: its name (a string), an adjacency list of indices of the nodes it has links pointing to (a linked list of int), its out-degree, which is the length of the adjacency list (an int) and a field for its pagerank score. Not all of these fields are needed for this lab; some will be used in the next lab.

A graph is a collection of nodes. You will set up the data
structure for this. For this lab, the size of graph (the number of
nodes it contains) will be given as the first piece of information
about the graph. Thus, you know when the graph is initialized how much
memory will be required. You will need to produce code to initialize a
graph, and add nodes and add edges. Put this in the file
`graph_functions.c`

.

Produce code to find the following three nodes: the node with the
largest out-degree, the node with the largest in-degree, and the node
with the smallest (non-zero) in-degree. Put this code in
`part1.c`

. This part should be very easy, as you just need
loop through the graph an appropriate number of times. Think about,
and be prepared to answer how these tasks would be different had we
used adjacency matrix representation of the graph. Which would be easy
or run faster. Which would be harder or run slower.

Produce functions to traverse the graph. This should take as input a node index, and search all the nodes reachable from that node. Use this code to answer the following questions:

- Is the node with the largest out-degree reachable from the node with the smallest (non-zero) out-degree?
- If not, what is the largest out-degree of a node which can be reached from the node with the smallest (non-zero) out-degree of all the nodes in the graph?
- Can the entire graph be reached from the node with the smallest (non-zero) out-degree?

For full credit, produce a depth-first search and a breadth-first search traversal. For depth-first search, you can use the recursive algorithm described in the lecture. For breadth-first search, you need to maintain a search queue and a list of explored nodes. To carry out a step of search, you get and remove the first node from the search queue and process it. You then get its adjacency list, and check each node to see if it is in the explored list. If it is not, add it to the back of the queue, and to the explored list. When the search list is empty, the algorithm will have explored all the nodes reachable from the starting node.

In the above description of breadth-first search, the search list is a queue; you remove from the front and add to the back. I.e. it is first-in first-out (FIFO). If you replace the search list by stack in which you remove from the front and add to the front (FILO), the traversal becomes depth-first search. Thus, you can implement depth-first search and breadth-first search by sharing most of the code, and use conditional compilation to determine which parts of the code get compiled. I.e. surround the code which adds to the search list at the back by:

#ifdef BFS ... #endifand the bit which adds to the front by:

#ifdef DFS ... #endifThen put at the start of your file containing the search functions:

#define BFS #undef DFSto get breadth-first search, and

#define DFS #undef BFSto get depth-first search. Put the search code in the file

`graph_search.c`

. Put the main for this part in
`part2.c`

. (We separate them into two files because you
will reuse the graph and search functions in later labs.)
Is it possible to say which is better for this task, breadth-first or depth first search?

In the directory `/opt/info/courses/COMP26120/problems/ex11/data`

you will find files with the suffix .gx. These contain descriptions of
graphs, using a simple format. Here is an example:

MAX 10 NODE 1 one EDGE 1 2 EDGE 1 3 EDGE 1 1 NODE 3 two EDGE 3 5 EDGE 3 1 NODE 5 three EDGE 5 7 EDGE 5 1 NODE 7 four EDGE 7 7 NODE 9 five NODE 2 sixThe first line "MAX 10" tells that the node will have indices 1-10. The first line of these graphical files will always have such a line. Lines of the form "NODE 1 name" defines node number 1 with name "name". Lines of the form "EDGE 3 5" define an edge from node 3 to node 5. So, in the above example, Node 1 has as its adjacency list (1,2,3) since there are edges from 1 to 1, 1 to 2 and 1 to 3. The order of these lines is unimportant except the MAX line must come first. We have given you code which reads in such files, which can be found in

`graph_functions.c`

.
- Modify
`graph.h`

so it contains the definition of data type GRAPH. - Modify
`graph_functions.c`

so the functions`initialize_graph`

,`insert_graph_node`

and`insert_graph_link`

are implemented. If you change the prototypes, be sure to change the calls in`read_graph`

. - Produce
`part1.c`

to run through the graph looking for the nodes with the largest and smallest (non-zero) in-degrees and out-degrees. Run`make part1`

to make this. - Think about the question of how the code and run-times would be different if you used adjacency matrix representation.
- Produce
`graph_search.c`

with the graph traversal functions (depth-first search, breadth-first search or both) and`part2.c`

containing the main which calls this to find the node with the largest outdegree which can be reached from the node with the smallest (non-zero) out-degree, and count the number of nodes reachable from the node with the smallest (non-zero) out-degree. Run`make part2`

to make this. Note that this does not recompile`graph_functions.c`

if this was not changed. - Is performance any different for depth-first and breadth-first search?

The marks are awarded as follows:

Graph representation: 1 mark: completing the representation of the graph 2 marks: code to initialize the graph, insert nodes and edges. Graph search: 1 mark: simple code to find nodes with extremal degrees. 2 marks: depth-first search traversal 2 marks: breadth-first search traversal Understanding: 2 marks: answering questions about efficiency of adjacency list versus adjacency matrix representation, and breadth-first versus depth-first search. Total 10