Episode 1: The 1992 Presidential Results
Welcome to the first post of my data analysis: 24 years and 8 elections. Ghana has been described as one of the most democratic countries on the continent of Africa. The country has successfully organised 10 elections since its independence, 8 of which were held consecutively from 1992 to 2016.
This post is about the 1992 elections, the 3rd ever held in the country. Data about the presidential election is stored on the Electoral Commission’s website and is accessible here. There was no data about the parliamentary elections, so we will work with what we currently have.
Load the required libraries and read in the data
# load the needed libraries
library(tabulizer)
library(tidyverse)
# read the table from the PDF as a data frame
elect_1992 <- extract_tables("1992-presidential-election-results.pdf",output="data.frame")
# check the details about the variable: list has 1 element
class(elect_1992)
## [1] "list"
length(elect_1992)
## [1] 1
str(elect_1992)
## List of 1
## $ :'data.frame': 25 obs. of 14 variables:
## ..$ X1992.PRESIDENTIAL.ELECTIONS.RESULTS: chr [1:25] "" "" "REGION" "Darko" ...
## ..$ X : chr [1:25] "" "NPP" "A. Adu Boahen" "%" ...
## ..$ X.1 : chr [1:25] "" "PNC" "%" "J.J Rawlings" ...
## ..$ X.2 : chr [1:25] "" "NIP" "Dr. Hilla Liman" "%" ...
## ..$ X.3 : chr [1:25] "" "NDC" "%" "Gen. Erskine" ...
## ..$ X.4 : chr [1:25] "" "PHP" "Kwabena" "%" ...
## ..$ X.5 : chr [1:25] "" "" "" "Val.votes" ...
## ..$ X.6 : chr [1:25] "" "" "" "% Turno" ...
## ..$ X.7 : chr [1:25] "" "" "" "tReg. Voters" ...
## ..$ X.8 : chr [1:25] "" "" "" "" ...
## ..$ X.9 : num [1:25] NA NA NA NA 2.4 NA 2.2 NA 1.3 NA ...
## ..$ X.10 : chr [1:25] "" "" "" "" ...
## ..$ X.11 : num [1:25] NA NA NA NA 47.8 NA 47.7 NA 46 NA ...
## ..$ X.12 : chr [1:25] "" "" "" "" ...
# check the class of the element
class(elect_1992[[1]])
## [1] "data.frame"
# extract the first item into a variable
# verify the extracted element is a dataframe
dframe_1992 <- elect_1992[[1]]
class(dframe_1992)
## [1] "data.frame"
# view the first few rows of the dataframe
head(dframe_1992)
## X1992.PRESIDENTIAL.ELECTIONS.RESULTS X X.1
## 1
## 2 NPP PNC
## 3 REGION A. Adu Boahen %
## 4 Darko % J.J Rawlings
## 5 WESTERN 89,800 22.8
## 6
## X.2 X.3 X.4 X.5 X.6 X.7
## 1
## 2 NIP NDC PHP
## 3 Dr. Hilla Liman % Kwabena
## 4 % Gen. Erskine % Val.votes % Turno tReg. Voters
## 5 33,760 8.6 21,924 5.6 239,477 60.7
## 6
## X.8 X.9 X.10 X.11 X.12
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 9,325 2.4 394,286 47.8 858,246
## 6 NA NA
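The difference between single and double brackets is what makes the extraction step above work. A minimal sketch, using a made-up one-element list (`tables` is hypothetical and only mimics the shape of the extracted result):

```r
# hypothetical one-element list, mimicking the shape of the extracted tables
tables <- list(data.frame(x = 1:3))

# single brackets keep the list wrapper
class(tables[1])

# double brackets return the element itself, here a data frame
class(tables[[1]])
```

Only the double-bracket form gives an object that functions such as `head()` and `colnames()` treat as a table.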
Cleaning the column and row names
We will be cleaning the row and column names. First of all, the first 2 rows and the final row of the dataframe will be deleted. As the output above shows, the first row is empty and the second holds only the party abbreviations. The last row displays the total sum of the values in each column. I have no need of this data, so I will remove it as well.
# remove the first 2 and last rows
dframe_1992 <- dframe_1992[-c(1,2, nrow(dframe_1992)),]
# re-assign column headers
# rename columns in dataframe
colnames(dframe_1992) <- c("Region","A.Adu Boahen","NPP%Vote",
"Dr.Hilla Liman","PNC%Vote",
"Kwabena Darko","NIP%Vote",
"J.J Rawlings","NDC%Vote",
"Gen. Erskine","PHP%Vote",
"Valid Votes","Turnout%",
"Tt_Reg_Voters")
# confirm the changes in column header
colnames(dframe_1992)
## [1] "Region" "A.Adu Boahen" "NPP%Vote" "Dr.Hilla Liman"
## [5] "PNC%Vote" "Kwabena Darko" "NIP%Vote" "J.J Rawlings"
## [9] "NDC%Vote" "Gen. Erskine" "PHP%Vote" "Valid Votes"
## [13] "Turnout%" "Tt_Reg_Voters"
head(dframe_1992, n=3)
## Region A.Adu Boahen NPP%Vote Dr.Hilla Liman PNC%Vote
## 3 REGION A. Adu Boahen % Dr. Hilla Liman %
## 4 Darko % J.J Rawlings % Gen. Erskine
## 5 WESTERN 89,800 22.8 33,760 8.6
## Kwabena Darko NIP%Vote J.J Rawlings NDC%Vote Gen. Erskine PHP%Vote
## 3 Kwabena NA
## 4 % Val.votes % Turno tReg. Voters NA
## 5 21,924 5.6 239,477 60.7 9,325 2.4
## Valid Votes Turnout% Tt_Reg_Voters
## 3 NA
## 4 NA
## 5 394,286 47.8 858,246
# delete the next 2 rows as they are a duplication of the column headers
# remove all empty rows with na.omit()
dframe_1992 <- dframe_1992[-c(1:2),]
dframe_1992 <- na.omit(dframe_1992)
# reset the row names to start from 1
rownames(dframe_1992) <- seq_len(nrow(dframe_1992))
# view the dimensions (number of rows and columns)
dim(dframe_1992)
## [1] 10 14
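The cleaning steps above can be tried out on a small made-up table first. This is only a sketch: `toy` and its values are hypothetical and stand in for the extracted election data.

```r
# hypothetical extract: a blank row, a header-like row, data rows, a totals row
toy <- data.frame(
  region = c("", "REGION", "WEST", "", "EAST", "TOTAL"),
  pct    = c(NA, NA, 22.8, NA, 37.7, 100),
  stringsAsFactors = FALSE
)

# drop the first two rows and the last row by position
toy <- toy[-c(1, 2, nrow(toy)), ]

# drop the remaining rows that contain NA (the blank spacer row)
toy <- na.omit(toy)

# renumber the rows from 1
rownames(toy) <- NULL

toy
```

Assigning `NULL` to the row names is an alternative way to renumber them; it resets them to the default 1, 2, ... sequence.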
Putting the data into a long format
One of the “Principles of tidy data” says that each observation should be a row and each variable a column. A quick quiz: is the dataframe in a tidy format? 1, 2, 3 secs, 4 and 5. Time is up.
No, it’s not tidy. The columns from A.Adu Boahen to PHP%Vote can be placed into 2 columns: Candidate and PercentVote. So each candidate has data for the number of valid votes per region and the vote in percent. After many tries, I decided to use 2 functions: dframe_Names and dframe_Percent.
dframe_Names
dframe_Names <- function(dframe, rowNum){
  namesOnly <- dframe[, !grepl("%", colnames(dframe))]
  namesOnly <- gather(namesOnly, Candidate, NumVote, -1)
  namesOnly <- namesOnly[-c(rowNum:nrow(namesOnly)), ]
  return(namesOnly)
}
This function takes 2 arguments: a dataframe and a row number. The row number is important as we want details about the candidates only. Any other information per region, such as the valid votes, the turnout percentage and the total number of registered voters, is not needed.
- 1st line: Extract the columns that do not have ‘%’ in their names.
- 2nd line: Collapse the extracted columns into key-value pairs, with the exception of the Region column (-1).
- 3rd line: Delete all rows that do not have a candidate’s name.
- 4th line: Return the dataframe to the variable that calls this function.
For line 3, let me explain further. 5 candidates stood for this election and there are 10 regions in this dataset. If we want each candidate/region combination on its own row, then a total of 50 rows is all that we require, since the first 5 columns after Region returned from line 1 are names of candidates. This means that rows 51 till the end of the dataframe contain values from the columns Valid Votes and Tt_Reg_Voters (Turnout% was already dropped in line 1 because its name contains ‘%’), and we do not want these rows as part of the returned dataframe.
# call the function: pass dframe_1992 and 51 as arguments
names_1992 <- dframe_Names(dframe_1992, 51)
dim(names_1992)
## [1] 50 3
head(names_1992)
## Region Candidate NumVote
## 1 WESTERN A.Adu Boahen 89,800
## 2 CENTRAL A.Adu Boahen 86,683
## 3 GT ACCRA A.Adu Boahen 188,000
## 4 VOLTA A.Adu Boahen 17,295
## 5 EASTERN A.Adu Boahen 190,327
## 6 ASHANTI A.Adu Boahen 431,380
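As an alternative sketch, selecting only the candidate columns before reshaping avoids trimming rows afterwards. It uses `pivot_longer()` from tidyr, which has superseded `gather()`. The data frame `toy`, its column names and its values are made up for illustration and are not the election data.

```r
library(tidyr)

# hypothetical two-region, two-candidate table in wide format
toy <- data.frame(
  Region        = c("WEST", "EAST"),
  `Cand A`      = c("10", "30"),
  `Cand B`      = c("20", "40"),
  `Valid Votes` = c("30", "70"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# reshape only the candidate columns; Valid Votes is left out up front
long <- pivot_longer(
  toy[, c("Region", "Cand A", "Cand B")],
  cols      = -Region,
  names_to  = "Candidate",
  values_to = "NumVote"
)

dim(long)
```

One design point to keep in mind: `pivot_longer()` orders the rows region by region, whereas `gather()` orders them candidate by candidate, so the result would need sorting before being combined by position with another dataframe.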
dframe_Percent
The function returns the party name and the votes as a percentage for each candidate. This function follows in the footsteps of dframe_Names. The major differences are:
- The Region column was added, as the columns returned from grepl do not include the names of the regions.
- The column Turnout% was removed.
- The column names were split on ‘%’ to create the PartyName column.
The function takes 2 arguments: a dataframe and the row number to delete from.
dframe_Percent <- function(dframe, rowNum){
  percentOnly <- dframe[, grepl("%", colnames(dframe))]
  # add the Regions
  percentOnly <- percentOnly %>%
    mutate(Region = c("WESTERN","CENTRAL","GT ACCRA","VOLTA",
                      "EASTERN","ASHANTI","B/AHAFO","NORTHERN",
                      "UPPER EAST","UPPER WEST"),
           `Turnout%` = NULL)
  # convert from wide to long format
  # separate to get party name
  percentLong <- percentOnly %>%
    gather(Details, PercentVote) %>%
    separate(Details, c("PartyName","Vote"),"%")
  percentLong <- percentLong[-c(rowNum:nrow(percentLong)),]
  return(percentLong)
}
Just as with the dframe_Names function, we pass a dataframe and the row number to delete the extra rows from. We pass 51, as the 51st to the 60th rows don’t contain data about any candidate.
Don’t worry if a warning message about missing pieces is shown. The rows that come from the Region column do not have a % character in their key, so R fills the missing second piece with NA and informs you about it.
percent_1992 <- dframe_Percent(dframe_1992, 51)
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 10 rows [51,
## 52, 53, 54, 55, 56, 57, 58, 59, 60].
dim(percent_1992)
## [1] 50 3
head(percent_1992)
## PartyName Vote PercentVote
## 1 NPP Vote 22.8
## 2 NPP Vote 26
## 3 NPP Vote 37
## 4 NPP Vote 3.6
## 5 NPP Vote 37.7
## 6 NPP Vote 60.5
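The source of the warning can be seen by splitting a few keys by hand. This is a base R sketch of the idea only, not what `separate()` does internally; the vector `keys` simply lists two of the renamed columns plus the added Region column.

```r
# keys as they appear after gathering: two percent columns and Region
keys <- c("NPP%Vote", "PNC%Vote", "Region")

# split each key on a literal "%"
pieces <- strsplit(keys, "%", fixed = TRUE)

# number of pieces per key: "Region" yields only one
lengths(pieces)

# the first piece of each key, which is the party abbreviation for the first two
first_piece <- vapply(pieces, `[`, character(1), 1)
first_piece
```

Keys with a `%` split into two pieces, while `Region` stays in one piece, which is why the second column has to be filled with `NA` for those rows.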
Combining dataframes
The final step is to combine both dataframes. Also, we need to delete the 5th column as it is a repetition of the word Vote.
full_1992 <- cbind(names_1992, percent_1992)
# delete 5th column
full_1992 <- full_1992[, -5]
full_1992
## Region Candidate NumVote PartyName PercentVote
## 1 WESTERN A.Adu Boahen 89,800 NPP 22.8
## 2 CENTRAL A.Adu Boahen 86,683 NPP 26
## 3 GT ACCRA A.Adu Boahen 188,000 NPP 37
## 4 VOLTA A.Adu Boahen 17,295 NPP 3.6
## 5 EASTERN A.Adu Boahen 190,327 NPP 37.7
## 6 ASHANTI A.Adu Boahen 431,380 NPP 60.5
## 7 B/AHAFO A.Adu Boahen 116,041 NPP 29.5
## 8 NORTHERN A.Adu Boahen 52,539 NPP 16.3
## 9 UPPER EAST A.Adu Boahen 21,164 NPP 10.5
## 10 UPPER WEST A.Adu Boahen 11,535 NPP 8.9
## 11 WESTERN Dr.Hilla Liman 33,760 PNC 8.6
## 12 CENTRAL Dr.Hilla Liman 6,308 PNC 1.9
## 13 GT ACCRA Dr.Hilla Liman 22,027 PNC 403
## 14 VOLTA Dr.Hilla Liman 7,431 PNC 1.6
## 15 EASTERN Dr.Hilla Liman 9,747 PNC 1.9
## 16 ASHANTI Dr.Hilla Liman 17,620 PNC 2.5
## 17 B/AHAFO Dr.Hilla Liman 20,646 PNC 5.3
## 18 NORTHERN Dr.Hilla Liman 35,452 PNC 11
## 19 UPPER EAST Dr.Hilla Liman 65,644 PNC 32.5
## 20 UPPER WEST Dr.Hilla Liman 48,075 PNC 37.1
## 21 WESTERN Kwabena Darko 21,924 NIP 5.6
## 22 CENTRAL Kwabena Darko 11,631 NIP 3.5
## 23 GT ACCRA Kwabena Darko 20,731 NIP 4.1
## 24 VOLTA Kwabena Darko 3,534 NIP 0.7
## 25 EASTERN Kwabena Darko 11,730 NIP 2.3
## 26 ASHANTI Kwabena Darko 25,298 NIP 3.6
## 27 B/AHAFO Kwabena Darko 8,979 NIP 2.3
## 28 NORTHERN Kwabena Darko 4,682 NIP 1.5
## 29 UPPER EAST Kwabena Darko 2,791 NIP 1.4
## 30 UPPER WEST Kwabena Darko 2,329 NIP 1.8
## 31 WESTERN J.J Rawlings 239,477 NDC 60.7
## 32 CENTRAL J.J Rawlings 222,097 NDC 66.5
## 33 GT ACCRA J.J Rawlings 270,825 NDC 53.4
## 34 VOLTA J.J Rawlings 446,365 NDC 93.2
## 35 EASTERN J.J Rawlings 288,726 NDC 57.3
## 36 ASHANTI J.J Rawlings 234,237 NDC 32.9
## 37 B/AHAFO J.J Rawlings 243,361 NDC 61.9
## 38 NORTHERN J.J Rawlings 203,004 NDC 63
## 39 UPPER EAST J.J Rawlings 108,999 NDC 54
## 40 UPPER WEST J.J Rawlings 66,049 NDC 51
## 41 WESTERN Gen. Erskine 9,325 PHP 2.4
## 42 CENTRAL Gen. Erskine 7,312 PHP 2.2
## 43 GT ACCRA Gen. Erskine 5,861 PHP 1.3
## 44 VOLTA Gen. Erskine 4,105 PHP 0.9
## 45 EASTERN Gen. Erskine 3,663 PHP 0.7
## 46 ASHANTI Gen. Erskine 4,049 PHP 0.57
## 47 B/AHAFO Gen. Erskine 3,837 PHP 1
## 48 NORTHERN Gen. Erskine 26,715 PHP 8.3
## 49 UPPER EAST Gen. Erskine 3,348 PHP 1.7
## 50 UPPER WEST Gen. Erskine 1,612 PHP 1.2
full_1992[,1] <- as.factor(full_1992[,1])
full_1992[,5] <- as.numeric(full_1992[,5])
dim(full_1992)
## [1] 50 5
head(full_1992)
## Region Candidate NumVote PartyName PercentVote
## 1 WESTERN A.Adu Boahen 89,800 NPP 22.8
## 2 CENTRAL A.Adu Boahen 86,683 NPP 26.0
## 3 GT ACCRA A.Adu Boahen 188,000 NPP 37.0
## 4 VOLTA A.Adu Boahen 17,295 NPP 3.6
## 5 EASTERN A.Adu Boahen 190,327 NPP 37.7
## 6 ASHANTI A.Adu Boahen 431,380 NPP 60.5
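Note that NumVote is still stored as text because of the thousands separators. If the counts are needed as numbers, one possible sketch is to strip the commas first. The vector `num_vote` below just copies the first three values shown above.

```r
# vote counts as they appear in the NumVote column
num_vote <- c("89,800", "86,683", "188,000")

# remove the thousands separators, then convert to numeric
votes <- as.numeric(gsub(",", "", num_vote, fixed = TRUE))
votes
```

Calling `as.numeric()` directly on `"89,800"` would return `NA` with a warning, which is why the commas are removed first.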
Quick sanity check
A check through our dataframe shows that the percentage vote for Dr. Hilla Liman in the Greater Accra Region was supposed to be 4.03 instead of 403.
full_1992[13,5] <- 4.03
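One way to cross-check a suspicious percentage is to recompute it from the vote counts. The sketch below copies the five Greater Accra counts from the table printed earlier and assumes that the percentage is each candidate's share of the sum of those five counts; the official figures may use a slightly different denominator.

```r
# Greater Accra vote counts for the five candidates, copied from the table
gt_accra <- c(
  "A.Adu Boahen"   = 188000,
  "Dr.Hilla Liman" = 22027,
  "Kwabena Darko"  = 20731,
  "J.J Rawlings"   = 270825,
  "Gen. Erskine"   = 5861
)

# each candidate's share of the summed votes, in percent (assumed denominator)
share <- round(100 * gt_accra / sum(gt_accra), 1)
share
```

Under that assumption the share for Dr. Hilla Liman comes out at about 4.3, which confirms that 403 is a misplaced decimal point. The exact digits are worth confirming against the source PDF.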
I am glad you made it this far. The dataset is available here. Thank you and do well to check the next episode.
NB: In other articles, J.J Rawlings is listed as representing the Progressive Alliance. It was a coalition made up of the political parties National Democratic Congress, National Convention Party and Every Ghanaian Living Everywhere.