Welcome to the first post of my data analysis series: 24 years and 8 elections. Ghana has been described as one of the most democratic countries on the continent of Africa. The country has successfully organised 10 elections since its independence, 8 of which were held consecutively from 1992 to 2016.

This post will be about the 1992 elections, the 3rd ever held in the country. Data about the presidential election are stored on the Electoral Commission’s website and are accessible here. There was no data about the parliamentary elections, so we will work with what we currently have.

Load the required libraries and read in the data

# load the needed libraries
library(tabulizer)
library(tidyverse)

# read the table from the PDF as a data frame
elect_1992 <- extract_tables("1992-presidential-election-results.pdf",output="data.frame")

# check the details about the variable: list has 1 element
class(elect_1992)
## [1] "list"
length(elect_1992)
## [1] 1
str(elect_1992)
## List of 1
##  $ :'data.frame':    25 obs. of  14 variables:
##   ..$ X1992.PRESIDENTIAL.ELECTIONS.RESULTS: chr [1:25] "" "" "REGION" "Darko" ...
##   ..$ X                                   : chr [1:25] "" "NPP" "A. Adu Boahen" "%" ...
##   ..$ X.1                                 : chr [1:25] "" "PNC" "%" "J.J Rawlings" ...
##   ..$ X.2                                 : chr [1:25] "" "NIP" "Dr. Hilla Liman" "%" ...
##   ..$ X.3                                 : chr [1:25] "" "NDC" "%" "Gen. Erskine" ...
##   ..$ X.4                                 : chr [1:25] "" "PHP" "Kwabena" "%" ...
##   ..$ X.5                                 : chr [1:25] "" "" "" "Val.votes" ...
##   ..$ X.6                                 : chr [1:25] "" "" "" "% Turno" ...
##   ..$ X.7                                 : chr [1:25] "" "" "" "tReg. Voters" ...
##   ..$ X.8                                 : chr [1:25] "" "" "" "" ...
##   ..$ X.9                                 : num [1:25] NA NA NA NA 2.4 NA 2.2 NA 1.3 NA ...
##   ..$ X.10                                : chr [1:25] "" "" "" "" ...
##   ..$ X.11                                : num [1:25] NA NA NA NA 47.8 NA 47.7 NA 46 NA ...
##   ..$ X.12                                : chr [1:25] "" "" "" "" ...
# check the class of the element
class(elect_1992[[1]])
## [1] "data.frame"
# extract the first item into a variable
# verify the extracted element is a dataframe
dframe_1992 <- elect_1992[[1]]
class(dframe_1992)
## [1] "data.frame"
# view the first few rows of the dataframe
head(dframe_1992)
##   X1992.PRESIDENTIAL.ELECTIONS.RESULTS             X          X.1
## 1                                                                
## 2                                                NPP          PNC
## 3                               REGION A. Adu Boahen            %
## 4                                Darko             % J.J Rawlings
## 5                              WESTERN        89,800         22.8
## 6                                                                
##               X.2          X.3     X.4       X.5     X.6          X.7
## 1                                                                    
## 2             NIP          NDC     PHP                               
## 3 Dr. Hilla Liman            % Kwabena                               
## 4               % Gen. Erskine       % Val.votes % Turno tReg. Voters
## 5          33,760          8.6  21,924       5.6 239,477         60.7
## 6                                                                    
##     X.8 X.9    X.10 X.11    X.12
## 1        NA           NA        
## 2        NA           NA        
## 3        NA           NA        
## 4        NA           NA        
## 5 9,325 2.4 394,286 47.8 858,246
## 6        NA           NA

Cleaning the column and row names

We will be cleaning the row and column names. First of all, the first 2 rows and the final row of the dataframe will be deleted. The first row is empty and the second holds only the party abbreviations, which are already covered by the new column names. The last row displays the total sum of the values in each column. I have no need for this data, so I will remove it as well.

# remove the first 2 and last rows
dframe_1992 <- dframe_1992[-c(1,2, nrow(dframe_1992)),]

# rename the columns: each candidate's vote count is followed by
# the party's percentage share of the vote
colnames(dframe_1992) <- c("Region","A.Adu Boahen","NPP%Vote",
                           "Dr.Hilla Liman","PNC%Vote",
                           "Kwabena Darko","NIP%Vote",
                           "J.J Rawlings","NDC%Vote",
                           "Gen. Erskine","PHP%Vote",
                           "Valid Votes","Turnout%",
                           "Tt_Reg_Voters")

# confirm the changes in column header
colnames(dframe_1992)
##  [1] "Region"         "A.Adu Boahen"   "NPP%Vote"       "Dr.Hilla Liman"
##  [5] "PNC%Vote"       "Kwabena Darko"  "NIP%Vote"       "J.J Rawlings"  
##  [9] "NDC%Vote"       "Gen. Erskine"   "PHP%Vote"       "Valid Votes"   
## [13] "Turnout%"       "Tt_Reg_Voters"
head(dframe_1992, n=3)
##    Region  A.Adu Boahen     NPP%Vote  Dr.Hilla Liman     PNC%Vote
## 3  REGION A. Adu Boahen            % Dr. Hilla Liman            %
## 4   Darko             % J.J Rawlings               % Gen. Erskine
## 5 WESTERN        89,800         22.8          33,760          8.6
##   Kwabena Darko  NIP%Vote J.J Rawlings     NDC%Vote Gen. Erskine PHP%Vote
## 3       Kwabena                                                        NA
## 4             % Val.votes      % Turno tReg. Voters                    NA
## 5        21,924       5.6      239,477         60.7        9,325      2.4
##   Valid Votes Turnout% Tt_Reg_Voters
## 3                   NA              
## 4                   NA              
## 5     394,286     47.8       858,246
# delete the next 2 rows as they are a duplication of the column headers
# remove all empty rows with na.omit()
dframe_1992 <- dframe_1992[-c(1:2),]
dframe_1992 <- na.omit(dframe_1992)

# reset the row names to start from 1
rownames(dframe_1992) <- seq_len(nrow(dframe_1992))

# view the dimensions (number of rows and columns)
dim(dframe_1992)
## [1] 10 14
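
The cleaning steps above can be tried out on a small toy data frame. The sketch below uses base R only, and the `toy` object and its values are invented for illustration. Note that na.omit() drops only rows that contain NA. Empty strings do not count, so the blank rows in the extracted table disappear because their numeric columns hold NA.

```r
# Toy data frame with invented values: a blank row, a header row,
# two data rows separated by a blank row, and a totals row
toy <- data.frame(
  label = c("", "header", "A", "", "B", "TOTAL"),
  value = c(NA, NA, 1.5, NA, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# Drop the first two rows and the last row with negative indexing
cleaned <- toy[-c(1, 2, nrow(toy)), ]

# Drop the remaining blank row: its numeric column is NA
cleaned <- na.omit(cleaned)

# Reset the row names so they run 1, 2, ...
rownames(cleaned) <- NULL

print(cleaned)
```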

Putting the data into a long format

One of the “Principles of tidy data” says that each observation should be a row and each variable a column. A quick quiz: is the dataframe in a tidy format? 1, 2, 3 secs, 4 and 5. Time is up.

No, it’s not tidy. The columns from A.Adu Boahen to PHP%Vote can be collapsed so that each row describes one candidate in one region, with the number of votes and the vote in percent as values. After many tries, I decided to use 2 functions: dframe_Names and dframe_Percent.

dframe_Names

dframe_Names <- function(dframe, rowNum){
    namesOnly <- dframe[, !grepl("%", colnames(dframe))]
    namesOnly <- gather(namesOnly, Candidate, NumVote, -1)
    namesOnly <- namesOnly[-c(rowNum:nrow(namesOnly)), ]
    
    return(namesOnly)
}

This function takes 2 arguments: a dataframe and a row number. The row number is important as we want details about the candidates only. Other information per region, such as valid votes, the percentage of turnout and the total number of registered voters, is not needed.

- 1st line: Extract the columns that do not have ‘%’ in their names.
- 2nd line: Collapse the extracted columns into key-value pairs, with the exception of the Region column (-1).
- 3rd line: Delete all rows that do not hold a candidate’s name.
- 4th line: Return the dataframe to the caller.

Let me explain line 3 further. 5 candidates stood for this election and there are 10 regions in this dataset. If we want one candidate/region combination per row, then 50 rows are all we require, since the first 5 columns after Region returned from line 1 are the names of candidates. This means that rows 51 till the end of the dataframe contain values from the columns Valid Votes and Tt_Reg_Voters (Turnout% was already dropped by line 1), and we do not want these rows as part of the returned dataframe.

# call the function: pass dframe_1992 and 51 as arguments
names_1992 <- dframe_Names(dframe_1992, 51)
dim(names_1992)
## [1] 50  3
head(names_1992)
##     Region    Candidate NumVote
## 1  WESTERN A.Adu Boahen  89,800
## 2  CENTRAL A.Adu Boahen  86,683
## 3 GT ACCRA A.Adu Boahen 188,000
## 4    VOLTA A.Adu Boahen  17,295
## 5  EASTERN A.Adu Boahen 190,327
## 6  ASHANTI A.Adu Boahen 431,380
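
An alternative that avoids counting rows is to name the candidate columns up front and reshape only those. This is a sketch rather than the approach used in this post. It assumes tidyr 1.0.0 or later for pivot_longer(), and the `toy` data frame, its values and the `candidate_cols` vector are invented for illustration.

```r
library(tidyr)

# Invented toy values; only the column layout mirrors dframe_1992
toy <- data.frame(
  Region        = c("R1", "R2"),
  `Cand A`      = c("1,200", "800"),
  `AAA%Vote`    = c("60", "40"),
  `Cand B`      = c("800", "1,200"),
  `BBB%Vote`    = c("40", "60"),
  `Valid Votes` = c("2,000", "2,000"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the candidate columns, named explicitly
candidate_cols <- c("Cand A", "Cand B")

# Keep Region plus the candidate columns, then go from wide to long
names_long <- pivot_longer(
  toy[, c("Region", candidate_cols)],
  cols      = all_of(candidate_cols),
  names_to  = "Candidate",
  values_to = "NumVote"
)

print(names_long)
```

Because only the candidate columns are reshaped, no trailing rows need to be trimmed afterwards. The rows come out grouped by region rather than by candidate, so a sort may be needed before combining with other results.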

dframe_Percent

The function returns the party name and the votes as a percentage for each candidate. This function follows in the footsteps of dframe_Names. The major differences are:

- A Region column was added, as the columns returned from grepl do not include the names of the regions.
- The Turnout% column was removed.
- The PartyName column was created by splitting the column names at the % character.

The function takes 2 arguments: a dataframe and the row number to delete from.

dframe_Percent <- function(dframe, rowNum){
    percentOnly <- dframe[,grepl("%",colnames(dframe))]
    # add the Regions
    percentOnly <- percentOnly %>%
        mutate(Region = c("WESTERN","CENTRAL","GT ACCRA","VOLTA",
                          "EASTERN","ASHANTI","B/AHAFO","NORTHERN",
                          "UPPER EAST","UPPER WEST"),
               `Turnout%` = NULL)

    # convert from wide to long format
    # separate to get party name
    percentLong <- percentOnly %>%
        gather(Details, PercentVote) %>%
        separate(Details, c("PartyName","Vote"),"%")

    percentLong <- percentLong[-c(rowNum:nrow(percentLong)),]
    return(percentLong)
}

Just as with the dframe_Names function, we pass a dataframe and the row number from which to delete the extra rows. We pass 51 as the rows from the 51st to the 60th do not contain data about any candidate.

Don’t worry if a warning message about missing pieces is shown. The key Region does not contain a % character, so separate() fills the missing piece with NA and informs you about it.

percent_1992 <- dframe_Percent(dframe_1992, 51)
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 10 rows [51,
## 52, 53, 54, 55, 56, 57, 58, 59, 60].
dim(percent_1992)
## [1] 50  3
head(percent_1992)
##   PartyName Vote PercentVote
## 1       NPP Vote        22.8
## 2       NPP Vote          26
## 3       NPP Vote          37
## 4       NPP Vote         3.6
## 5       NPP Vote        37.7
## 6       NPP Vote        60.5
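
The party abbreviation can also be derived straight from the column names, without going through separate(). A minimal base R sketch, assuming the percentage columns all end in %Vote as they do above (`pct_cols` is a name introduced here for illustration):

```r
# Percentage column names as assigned earlier in this post
pct_cols <- c("NPP%Vote", "PNC%Vote", "NIP%Vote", "NDC%Vote", "PHP%Vote")

# Strip the trailing "%Vote" to leave only the party abbreviation
party <- sub("%Vote$", "", pct_cols)

print(party)
# [1] "NPP" "PNC" "NIP" "NDC" "PHP"
```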

Combining dataframes

The final step is to combine both dataframes. We also need to delete the 5th column, as it is just a repetition of the word Vote.

full_1992 <- cbind(names_1992, percent_1992)
# delete 5th column
full_1992 <- full_1992[, -5]

full_1992
##        Region      Candidate NumVote PartyName PercentVote
## 1     WESTERN   A.Adu Boahen  89,800       NPP        22.8
## 2     CENTRAL   A.Adu Boahen  86,683       NPP          26
## 3    GT ACCRA   A.Adu Boahen 188,000       NPP          37
## 4       VOLTA   A.Adu Boahen  17,295       NPP         3.6
## 5     EASTERN   A.Adu Boahen 190,327       NPP        37.7
## 6     ASHANTI   A.Adu Boahen 431,380       NPP        60.5
## 7     B/AHAFO   A.Adu Boahen 116,041       NPP        29.5
## 8    NORTHERN   A.Adu Boahen  52,539       NPP        16.3
## 9  UPPER EAST   A.Adu Boahen  21,164       NPP        10.5
## 10 UPPER WEST   A.Adu Boahen  11,535       NPP         8.9
## 11    WESTERN Dr.Hilla Liman  33,760       PNC         8.6
## 12    CENTRAL Dr.Hilla Liman   6,308       PNC         1.9
## 13   GT ACCRA Dr.Hilla Liman  22,027       PNC         403
## 14      VOLTA Dr.Hilla Liman   7,431       PNC         1.6
## 15    EASTERN Dr.Hilla Liman   9,747       PNC         1.9
## 16    ASHANTI Dr.Hilla Liman  17,620       PNC         2.5
## 17    B/AHAFO Dr.Hilla Liman  20,646       PNC         5.3
## 18   NORTHERN Dr.Hilla Liman  35,452       PNC          11
## 19 UPPER EAST Dr.Hilla Liman  65,644       PNC        32.5
## 20 UPPER WEST Dr.Hilla Liman  48,075       PNC        37.1
## 21    WESTERN  Kwabena Darko  21,924       NIP         5.6
## 22    CENTRAL  Kwabena Darko  11,631       NIP         3.5
## 23   GT ACCRA  Kwabena Darko  20,731       NIP         4.1
## 24      VOLTA  Kwabena Darko   3,534       NIP         0.7
## 25    EASTERN  Kwabena Darko  11,730       NIP         2.3
## 26    ASHANTI  Kwabena Darko  25,298       NIP         3.6
## 27    B/AHAFO  Kwabena Darko   8,979       NIP         2.3
## 28   NORTHERN  Kwabena Darko   4,682       NIP         1.5
## 29 UPPER EAST  Kwabena Darko   2,791       NIP         1.4
## 30 UPPER WEST  Kwabena Darko   2,329       NIP         1.8
## 31    WESTERN   J.J Rawlings 239,477       NDC        60.7
## 32    CENTRAL   J.J Rawlings 222,097       NDC        66.5
## 33   GT ACCRA   J.J Rawlings 270,825       NDC        53.4
## 34      VOLTA   J.J Rawlings 446,365       NDC        93.2
## 35    EASTERN   J.J Rawlings 288,726       NDC        57.3
## 36    ASHANTI   J.J Rawlings 234,237       NDC        32.9
## 37    B/AHAFO   J.J Rawlings 243,361       NDC        61.9
## 38   NORTHERN   J.J Rawlings 203,004       NDC          63
## 39 UPPER EAST   J.J Rawlings 108,999       NDC          54
## 40 UPPER WEST   J.J Rawlings  66,049       NDC          51
## 41    WESTERN   Gen. Erskine   9,325       PHP         2.4
## 42    CENTRAL   Gen. Erskine   7,312       PHP         2.2
## 43   GT ACCRA   Gen. Erskine   5,861       PHP         1.3
## 44      VOLTA   Gen. Erskine   4,105       PHP         0.9
## 45    EASTERN   Gen. Erskine   3,663       PHP         0.7
## 46    ASHANTI   Gen. Erskine   4,049       PHP        0.57
## 47    B/AHAFO   Gen. Erskine   3,837       PHP           1
## 48   NORTHERN   Gen. Erskine  26,715       PHP         8.3
## 49 UPPER EAST   Gen. Erskine   3,348       PHP         1.7
## 50 UPPER WEST   Gen. Erskine   1,612       PHP         1.2
full_1992[,1] <- as.factor(full_1992[,1])
full_1992[,5] <- as.numeric(full_1992[,5])

dim(full_1992)
## [1] 50  5
head(full_1992)
##     Region    Candidate NumVote PartyName PercentVote
## 1  WESTERN A.Adu Boahen  89,800       NPP        22.8
## 2  CENTRAL A.Adu Boahen  86,683       NPP        26.0
## 3 GT ACCRA A.Adu Boahen 188,000       NPP        37.0
## 4    VOLTA A.Adu Boahen  17,295       NPP         3.6
## 5  EASTERN A.Adu Boahen 190,327       NPP        37.7
## 6  ASHANTI A.Adu Boahen 431,380       NPP        60.5
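
NumVote is still stored as text because of the thousands separators. Should numeric vote counts be needed later, one way to convert them is sketched below, using three values from the output above (`num_vote` is an illustrative name, not a variable from this post):

```r
# Vote counts as they appear in the NumVote column (character strings)
num_vote <- c("89,800", "86,683", "188,000")

# Remove the thousands separators, then convert to numeric
num_vote_numeric <- as.numeric(gsub(",", "", num_vote, fixed = TRUE))

print(num_vote_numeric)
```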

Quick sanity check

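A check through our dataframe shows that the percentage vote for Dr. Hilla Liman in the Greater Accra Region reads 403, which cannot be right for a percentage. His 22,027 votes out of the 507,444 votes listed for the five candidates in the region work out to about 4.3%, so that is the value we set.

full_1992[13,5] <- 4.3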
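
A suspicious percentage like this one can be cross-checked by recomputing each candidate's share from the vote counts. The sketch below uses the Greater Accra counts from the table above. The `gt_accra` vector is introduced here for illustration, and the published percentages may be based on a slightly different denominator, so small differences are to be expected.

```r
# Greater Accra vote counts per party, copied from rows 3, 13, 23, 33 and 43
gt_accra <- c(NPP = 188000, PNC = 22027, NIP = 20731, NDC = 270825, PHP = 5861)

# Each candidate's share of the five candidates' combined votes, in percent
shares <- round(100 * gt_accra / sum(gt_accra), 1)

print(shares)
```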

I am glad you made it this far. The dataset is available here. Thank you, and do check out the next episode.

NB: In other articles, J.J Rawlings is listed as representing the Progressive Alliance, a coalition made up of three political parties: National Democratic Congress, National Convention Party and Every Ghanaian Living Everywhere.