Writting a C# Program to Read Excel files

Introduction

I am recently given a task to write automation tests for my C++ program. The automation project use Ranorex tool, its library to support the test operations. However, at the core of the project are some simple operations as below:

  1. Read excel file for test data.
  2. Parse the data and invoke the C++ program.
  3. Compare the C++ program result with the test data's expected output.

This is a good opportunity to learn a new language and get my hands dirty on a different technology such as Dot Net Framework. I decide to write some small blogs to describe the technical details for the above 3 basic operations. Ranorex and its framework, tools will not be covered. The blog series will use codes that are built from scratch.

This is the first blog which will focus on the codes to read test data from an excel file.

Requirements

The following are requirements for the program:

  1. Test data are on an excel file.
  2. There are at least 3 sheets, Operation, Input, Output.
  3. Operation sheet lists the functions or APIs to be invoked in the test program, and their parameters on each row. The first row lists the column name, and subsequent rows list the data.
  4. Input sheet lists the input data that the tested program is provided with. Each row is corresponding to the data given to the program when a function or API, identified on the same row number in Operation sheet is invoked.
  5. Output sheet lists the output data that the tested program generates. Each row is corresponding to the data that the tested program generates when a function or API, identified on the same row number in Operation sheet, is invoked.
  6. The program reads the data from the excel file and put the data on appropriate data structures.
  7. The data should be kept as string, and need not be parsed to different data types.
  8. The program should recognise empty string. That is it should recognize to stop reading empty cell without column name.
  9. Sometime, the cell with a column name is not provided on purpose, and the program should take this as empty (null) data.
  10. The program should use only open source library.

C# concepts

I find the following concepts especially useful in an automation test project using C#.

Nullable

  • Nullable: nullable types allow you to represent a value type (such as int, float, bool, etc.) with the ability to assign an additional value of null to it. This is especially useful in writing test cases where the null value logic is to be tested.
  • Value type variable needs to be declared nullable before it can be assigned null value. Likewise, it need to be null-check before assigned a value. Nullable operator is used to declare a variable null.
using System;

public class HelloWorld
{
    public static void Main(string[] args)
    {
        int myNum = 5;
    // error Cannot convert null to 'int' because it is a non-nullable value type
        // myNum = null; 
    // declare a variable as nullable using null-condition operator
        int ? myNullableNum = 10;
        Console.WriteLine($"myNullableNum: {myNullableNum}");
        myNullableNum = null;
        Console.WriteLine($"myNullableNum: {myNullableNum}");
    }
}

The output on the second line is an empty value while the first is 15:

myNullableNum: 10
myNullableNum: 
  • String in C# is a reference type and thus could be null by default. We don't have to make it nullable when we declare a string.
  • Nullable item should be check null before access. For example, the following string should be check null before we call on any function on the string; otherwise, NullReferenceException error occurs
string myString = null;
Console.WriteLine(myString.Length);
// error: System.NullReferenceException: Object reference not set to an instance of an object
  • Reference type data structure are nullable, and should be check null before access. Sample of reference type data structure in C# are class, list, dictionary. Struct is not reference type, and thus is not nullable by default. I have this blog post to discuss about reference in C#
  • null-coalescing operator is used to assign values to a variable if it checks the variable to be null or not null. For example, the below codes evaluate the integer value and print its value. If it is null, then it prints "null values", if it is not null, then it prints the value
int ? a = null;
Console.WriteLine($"Value of num: {a ?? 0}");  //output Value of num: 0
a = 10;
Console.WriteLine($"Value of num: {a ?? 0}");  // output Value of num: 10
  • null-coalescing assignment operator ??= assigns the value of its right-hand operand to its left-hand operand only if the left-hand operand evaluates to null. Otherwise, it keeps the values of the left-hand operand.
string  myString = null;
myString ??=  "null string";
Console.WriteLine(myString);  // Output null string
myString = "freewind";
myString ??=  "null string";
Console.WriteLine(myString);  // Output: freewind

Implementation

Installing EPPlus library:

EPPlus library is a 3rd party open source library that provides API to support reading and manipulating Excel files.

On Visual Studio, choose Tools -> Nuget Package Manager -> Package Manager Console and input the command provide from EPP website and Nuget gallery:

NuGet\Install-Package EPPlus -Version 6.2.4

The C# program could then use the library by

using OfficeOpenXml;

Code

I write a class to implement excel reading.

  • The class is initialized with a provided fileName and sheetName.
  • After reading the sheetName to a data structure List<Dictionary<string,string>> TestCaseDataList.
    • The Dictionary stores the value of each cell, where the key is the column name.
    • The list contains the values in each row.
  • Since we have the first row containing the column names, we need to read the column names first, and use them as key in the Dictionary.
    • The boundary of the table is the last column with column name. We don't have to add the add the empty column to column name list. Null-conditional operator ? is useful for this case.
var columnName = worksheet.Cells[1, column].Value?.ToString();
  • Take notice of the indexes. The row and column has index starting from 1, as it is read that way by EPP. However, the list would have index starting from 0.
  • To handle the case when the cell is empty in the Excel file and add an empty value to the dictionary for the corresponding key, I use null-conditional operator ? and null-coalescing operator ??
rowData[columnName] = cellValue?.ToString() ?? string.Empty;

The following is full code of the C# ExcelReader:

using OfficeOpenXml;
using System;
using System.Collections.Generic;
using System.IO;


namespace ExcelReader
{
    internal class ExcelReader
    {
        string filePath;
        string sheetName;
        List<Dictionary<string, string>> testCaseDataList;

        // allow getting testCaseDataList but not setting. Setting is done in ReadExcelFile
        public List<Dictionary<string, string>> TestCaseDataList { get => testCaseDataList; private set => testCaseDataList = value; }

        public ExcelReader(string filePath, string sheetName)
        {
            this.filePath = filePath;
            this.sheetName=sheetName;
            TestCaseDataList = new List<Dictionary<string, string>>();
            ExcelPackage.LicenseContext = LicenseContext.NonCommercial;
        }

        private ExcelWorksheet FindWorksheetByName(ExcelPackage package, string sheetName)
        {
            foreach (var worksheet in package.Workbook.Worksheets)
            {
                if (worksheet.Name.Equals(sheetName, StringComparison.OrdinalIgnoreCase))
                {
                    return worksheet;
                }
            }

            return null;
        }

        public void readExcelFile()
        {
            using (var package = new ExcelPackage(new FileInfo(filePath)))
            {
                ExcelWorksheet worksheet = FindWorksheetByName(package, this.sheetName);

                if (worksheet != null)
                {
                    int rowCount = worksheet.Dimension.Rows;
                    int columnCount = worksheet.Dimension.Columns;

                    List<string> columnNames = new List<string>();

                    // Get the column names from the first row
                    for (int column = 1; column <= columnCount; column++)
                    {
                        var columnName = worksheet.Cells[1, column].Value?.ToString();
                        columnNames.Add(columnName);
                    }

                    // Read data row by row, then add to testCases
                    for (int row = 2; row <= rowCount; row++)
                    {
                        var rowData = new Dictionary<string, string>();

                        for (int column = 1; column <= columnCount; column++)
                        {
                            var columnName = columnNames[column - 1];
                            var cellValue = worksheet.Cells[row, column].Value;
                            rowData[columnName] = cellValue?.ToString() ?? string.Empty;
                            Console.WriteLine($"Read Data: Column_Name: {columnName} Cell_Value {cellValue}");
                        }
                        TestCaseDataList.Add(rowData);
                    }
                }
                else
                {
                    Console.WriteLine($"Sheet '{sheetName}' not found in the Excel file.");
                }
            }
        }
    }
}

I define a struct that stores a test case. Each test case has 3 Dictionary data structures: Operation, Input and Output.

struct TestCase
{
     public List<Dictionary<string, string>> Operation;
     public List<Dictionary<string, string>> Input;
     public List<Dictionary<string, string>> Output;
}

The main function utilized all the above data structures. The main function invokes ExcelReader to read 3 sheets and pass the information to TestCase data structure.

using System;

namespace ExcelReader
{
    internal class Program
    {
        static void Main(string[] args)
        {
            TestCase testCase = new TestCase();

            ExcelReader reader = new ExcelReader("test.xlsx", "Operation");
            reader.readExcelFile();         
            testCase.Operation =  reader.TestCaseDataList;

            reader = new ExcelReader("test.xlsx", "Input");
            reader.readExcelFile();
            testCase.Input = reader.TestCaseDataList;

            reader = new ExcelReader("test.xlsx", "Output");
            reader.readExcelFile();
            testCase.Output = reader.TestCaseDataList;

            Console.ReadLine();
        }
    }
}

Leave a Reply

Your email address will not be published. Required fields are marked *