Project Proposal
We want to understand how faculty salaries at ODU vary across departments. We also want to answer the following questions:
- What is the median salary for each department by position? How does it compare with the median salary for the same department at other universities in the U.S.A.?
- How geographically diverse is the faculty at ODU? In which state or country did faculty members obtain their degrees?
- How do faculty salaries compare with industry salaries for people holding the same degree?
- Which department has the highest median salary?
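The median questions above come down to a group-and-aggregate step over the salary rows. The following is a minimal sketch in Ruby, assuming each row carries department, position, and salary fields. The helper names (`median`, `median_salary_by_group`) and the sample rows are hypothetical, and the figures are made up for illustration.

```ruby
# Hypothetical helper: median of a non-empty list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Group rows by [department, position] and take the median salary of each group.
def median_salary_by_group(rows)
  rows.group_by { |r| [r[:department], r[:position]] }
      .transform_values { |group| median(group.map { |r| r[:salary].to_f }) }
end

# Made-up sample rows, for illustration only (not real ODU salaries).
SAMPLE_ROWS = [
  { department: 'Computer Science', position: 'Professor', salary: '100000' },
  { department: 'Computer Science', position: 'Professor', salary: '120000' },
  { department: 'Computer Science', position: 'Lecturer',  salary: '60000' },
  { department: 'Physics',          position: 'Professor', salary: '90000' }
].freeze

medians = median_salary_by_group(SAMPLE_ROWS)
medians.each { |(dept, pos), value| puts "#{dept} / #{pos}: #{value}" }

# Department with the highest overall median salary.
by_department = SAMPLE_ROWS.group_by { |r| r[:department] }
                           .transform_values { |g| median(g.map { |r| r[:salary].to_f }) }
top_department, top_median = by_department.max_by { |_, m| m }
puts "Highest median: #{top_department} (#{top_median})"
```

The median is used rather than the mean so that a few very high salaries do not dominate a department's figure.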
Ruby Script to extract data from ODU directory
require 'mechanize'
require 'csv'
require 'uri'

# Inputs: salary rows (last name, first name, position, salary) and two lookup tables.
salary_data = CSV.read('salary_data.csv')
universities = CSV.read('postscndryunivsrvy2013dirinfo12.csv', encoding: 'iso-8859-1:UTF-8', headers: true)
states = CSV.read('state_table.csv', encoding: 'iso-8859-1:UTF-8', headers: true)

mechanize = Mechanize.new # one agent, reused for every request
teachers = []

salary_data.each do |sd|
  first_name = sd[1].to_s
  last_name = sd[0].to_s
  teacher = {
    first_name: first_name,
    last_name: last_name,
    position: sd[2],
    salary: sd[3],
    department: '',
    university: '',
    st_abbr: '',
    state: '',
    year: '',
    major: '',
    degree: ''
  }
  puts '-' * 25

  # Search the directory by name; escape the names so spaces and punctuation are safe in the URL.
  query = URI.encode_www_form(F_NAME: first_name, L_NAME: last_name, SEARCH_IND: 'E')
  page = mechanize.get('https://www.odu.edu/directory?' + query)

  # Take the profile link from the search result row.
  link = ''
  page.search('table.bordertable tr:nth-child(3) td:first a').each do |a|
    link = a['href']
    puts 'Name - ' + a.text.strip
  end

  # No match in the directory: keep the salary row and move on.
  if link.empty?
    teachers << teacher
    puts "#{first_name} #{last_name} - No directory entry found"
    next
  end

  page = mechanize.get('https://www.odu.edu' + link)

  # A profile photo, when present, shifts the department down by one list item.
  img_url = ''
  page.search('section.alpha ul.left_column li:nth-child(1) img').each do |img|
    img_url = img['src']
    puts 'Image Url - ' + img_url
  end
  department_index = img_url.empty? ? 3 : 4
  department = ''
  page.search("section.alpha ul.left_column li:nth-child(#{department_index})").each do |li|
    department = li.text.strip
    puts 'Department - ' + department
  end
  teacher[:department] = department

  # Collect every education entry listed on the profile.
  education = []
  page.search('.tab-content ul.ul_in_tab li.fas_education dl').each do |dl|
    university = year = major = degree = ''
    dl.search('dt').each do |dt|
      university, year = dt.text.strip.split(',')
    end
    dl.search('dd:first').each do |dd|
      dd.search('strong').remove # drop the field label, keep the value
      major = dd.text
    end
    dl.search('dd:last').each do |dd|
      dd.search('strong').remove
      degree = dd.text
    end
    education << {
      'university' => (university || '').strip,
      'year' => (year || '').strip,
      'major' => (major || '').strip,
      'degree' => (degree || '').strip
    }
  end
  puts 'education'
  puts education.first.inspect

  # Use the first listed degree and resolve its university to a state.
  if education.any?
    first_degree = education.first
    teacher[:university] = first_degree['university']
    uni = universities.select { |u| u['INSTNM'] == first_degree['university'] }
    puts uni.inspect
    teacher[:st_abbr] = uni.first['STABBR'] if uni.any?
    state = states.select { |s| s['abbreviation'] == teacher[:st_abbr] }
    puts state.inspect
    teacher[:state] = state.first['name'] if state.any?
    teacher[:year] = first_degree['year']
    teacher[:major] = first_degree['major']
    teacher[:degree] = first_degree['degree']
  end

  teachers << teacher
  puts "#{first_name} #{last_name} - Done"
end

puts teachers.inspect

# Write one row per faculty member, with the hash keys as the header row.
CSV.open('processed_salary_data.csv', 'w') do |csv|
  csv << teachers.first.keys
  teachers.each { |row| csv << row.values }
end
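Once processed_salary_data.csv exists, the geographic-diversity question can be approached by tallying the state column. The sketch below is only an illustration: it parses a small inline sample that reuses the script's column names instead of reading the real file, and the people and values in it are invented. Counting blank states as "Unknown" is an assumption about how missing lookups should be reported.

```ruby
require 'csv'

# Made-up rows using the same column names the script writes out.
SAMPLE_CSV = <<~DATA
  first_name,last_name,position,salary,department,university,st_abbr,state,year,major,degree
  Ada,Example,Professor,100000,Computer Science,Example University,VA,Virginia,1999,Computer Science,PhD
  Bob,Sample,Lecturer,60000,Physics,Sample College,NY,New York,2005,Physics,MS
  Cy,Placeholder,Professor,90000,Physics,,,,,,
DATA

rows = CSV.parse(SAMPLE_CSV, headers: true)

# Tally faculty per degree-granting state; blank states are counted as "Unknown".
state_counts = Hash.new(0)
rows.each do |row|
  state = row['state'].to_s.strip
  state_counts[state.empty? ? 'Unknown' : state] += 1
end

# Print the states from most to least common.
state_counts.sort_by { |_, count| -count }.each do |state, count|
  puts "#{state}: #{count}"
end
```

To run this against the real output, replace the `CSV.parse` call with `CSV.read('processed_salary_data.csv', headers: true)`.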
What-Why-How framework
- What : Data – Tables with Items & Attributes. Here the attributes are used as filters, and the items are the values for each attribute.
- The attribute types were mostly categorical and quantitative.
- The entire dataset was a static file.
- What : Derived – Combination of original attributes to form new attributes. For example, combining all Masters and Doctorate degrees into just two different attributes.
- Why : Tasks – Cross-attribute comparison, for example across different colleges and departments. Finding trends within the attributes or derived attributes.
- How : Encode – Histograms, Line charts, Choropleth maps.
- How : Reduce – Dynamic filtering and aggregation.
- How : Manipulate – Navigate with zoom in and zoom out.
- How : Facet – Multiple juxtaposed views with linked highlighting.
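The "Derived" and "Reduce" items above can be illustrated with a short Ruby sketch: derive a degree level from the raw degree string, then filter and aggregate on it. The abbreviation lists, the `degree_level` helper, and the sample rows are all assumptions made for illustration. The real directory data may spell degrees differently and would need its own mapping.

```ruby
# Assumed abbreviations; extend these lists to match the actual data.
DOCTORATES = %w[PHD EDD DSC].freeze
MASTERS = %w[MS MA MBA MFA MED MENG].freeze

# Hypothetical helper: collapse a raw degree string into one derived level.
def degree_level(degree)
  key = degree.to_s.upcase.gsub(/[^A-Z]/, '') # "Ph.D." becomes "PHD"
  return 'Doctorate' if DOCTORATES.include?(key) || key.start_with?('DOCTOR')
  return 'Masters' if MASTERS.include?(key) || key.start_with?('MASTER')
  'Other'
end

# Made-up sample rows, for illustration only.
SAMPLE = [
  { department: 'Computer Science', degree: 'Ph.D.', salary: 100_000 },
  { department: 'Computer Science', degree: 'M.S.',  salary: 60_000 },
  { department: 'Physics',          degree: 'PhD',   salary: 90_000 },
  { department: 'Physics',          degree: 'B.S.',  salary: 50_000 }
].freeze

# Derive: add the new attribute to each row.
derived = SAMPLE.map { |row| row.merge(degree_level: degree_level(row[:degree])) }

# Reduce: filter to one level, then aggregate (count) per department.
doctorates = derived.select { |row| row[:degree_level] == 'Doctorate' }
counts = doctorates.group_by { |row| row[:department] }.transform_values(&:count)
puts counts.inspect
```

In an interactive view the filter value ('Doctorate' here) would come from a user control rather than being fixed in the code.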
Planned milestone check-in dates
March 27th & April 27th
List of data sources
- http://www.richmond.com/data-center/salaries-virginia-state-employees-2013/
- http://www1.salary.com/Edu-Govt-and-Nonprofit-Industry-Education-Salaries.html
- https://www.odu.edu/directory
- http://www.payscale.com/
- Mace & Crown
- http://chronicle.com/article/2013-14-AAUP-Faculty-Salary/145679
Links to tool and technologies used
- http://d3js.org/
- http://www.tableau.com/
Link to the Visualization
Authors and Contributors
Prasanna Sajjan
Avinash Gosavi