Visualizing the History of Philosophy as a Social Network: The Problem with Hegel

2012-09-20 19:47:38 · data visualization, gephi, graph theory, networks, r-bloggers, social network analysis

Introduction

This is Part I of a series.  Part II is available here, and has an updated graphic.

How Important is Hegel?!

I was surprised I hadn't seen this graphic at Drunks and Lampposts, made with Gephi, until a friend posted it on Facebook last week.  The original is here, and here's my version:

 

[Figure: Graph History of Philosophy]

Using a scrape of the data behind Wikipedia's sidebar for philosophers, Simon Raper put together a fantastic visualization of the schools and interconnections among philosophers.  Griffsgraphs followed up by expanding the scrape to the entire network of influencers and influenced on Wikipedia.  Both of these are insightful humanities studies in graphs and visualization.  Even though the algorithm wasn't told which common ideas link Hegel and Marx, it saw that they were similar enough to be grouped together (shown by giving them the same color), and that the way Hegel influenced, say, Husserl was different enough to warrant another school, simply by observing that a different group of people followed them.

That's a solid aggregation of a lot of humanities information.  Who knew Skynet's tweed jacket had patches on the elbows?

However, looking at the original graphs on D&L and Griffs, I was struck that Hegel seems far too influential in the domain of philosophers.  If you've ever taken the single continental philosophy course offered by your local analytic university department, you'll know what I mean: that simply isn't his status in the field.  Aufhebung and The End of History are great concepts and were powerful historical influences, but do they really warrant his being judged the single most influential philosopher of all time, even when the sample also includes the philosophers whose shoulders he stood on to think through those ideas (i.e., Socratic dialectic) and the later traditions that cast him aside?  My money would have been on Plato, but this graphic suggests that in fact, even now, Hegel is king.

So, to explore why this was the case, I replicated this graph based on Simon's excellent instructions.  I wrote it up in an ugly little R script which you're welcome to pull yourself to take care of the copying and Excel-munging.  My overarching question here: what is causing this influence, and what alternate measures of authority show Hegel's influence more in line with these other network effects?

What I Really Want to do is Direct

My first guess was that changing this from an undirected to a directed graph would make all the difference.  That is, knowing that de Morgan and Wittgenstein are connected is less informative than knowing that de Morgan influenced Wittgenstein, and switching from undirected to directed should make de Morgan a larger node and Wittgenstein a smaller one.  A lot of this, quite frankly, is simply going to reflect time: you'd be hard pressed to influence Socrates today.  While temporally unfair, it reflects something additional about the state of the world and might solve our super-Hegel problem.  So, when I was about to make the switch, I anticipated that if we added arrows to the lines, we might see something very dramatic, perhaps as large as seeing all of Western thought divide itself between Plato and Aristotle, with everyone else just peripheral.

What happened when I ran it?

Undirected

[Figure: undirected graph of philosophers]

Directed

[Figure: directed graph of philosophers]

I don't see much of a difference.  Unfortunately, switching to a directed graph did not really reduce the super-Hegel problem.
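
To make the undirected-versus-directed distinction concrete, here is a minimal sketch in base R.  The toy edge list is made up for illustration (it is not the scraped data), and it assumes the same layout as the script at the end of this post: Source is the person influenced, Target is the person doing the influencing.  It uses plain degree counts as a stand-in for node size, not Gephi's actual sizing measure.

```r
# Toy edge list (hypothetical): each row points from the influenced thinker
# to the thinker who influenced them.
toy_edges <- data.frame(
  Source = c("Wittgenstein", "Marx", "Husserl", "Hegel", "Aristotle"),
  Target = c("de Morgan", "Hegel", "Hegel", "Kant", "Plato"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(toy_edges$Source, toy_edges$Target)))

# Undirected: count every edge touching a node, whichever way it points
undirected_degree <- table(factor(c(toy_edges$Source, toy_edges$Target), levels = nodes))

# Directed: count only inbound edges, i.e. how many people name this node as an influence
in_degree <- table(factor(toy_edges$Target, levels = nodes))

comparison <- data.frame(
  node = nodes,
  undirected = as.integer(undirected_degree),
  directed_in = as.integer(in_degree),
  stringsAsFactors = FALSE
)
print(comparison)
```

In the undirected count de Morgan and Wittgenstein look identical; once direction is taken into account, only de Morgan keeps his edge.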

Two technical notes:

Man is the measure of... Man?

Second, I assumed Simon used Gephi's default Authority measure for node sizing rather than PageRank.  PageRank is the core of Google's algorithm for deciding who's important on the internet.  It's a recursive measure that asks, roughly, whether more important people are referring to you and how much, which is maybe more in line with our intuition of what "influence" means.  It turns out that the algorithm Gephi uses for measuring Authority has a known bug, and only uses inbound links.  A-ha, this had to be the answer!

(Directed) Authority

[Figure: directed graph sized by Authority]

(Directed) PageRank

[Figure: directed graph sized by PageRank]

Switching from Authority to PageRank did help some, but did not eliminate the super-Hegel problem.  Aristotle and Descartes won some points, as did a few pre-Socratic philosophers like Democritus and Heraclitus.

Still, it did not give much more weight to my Aristotle v Plato hypothesis.  Hegel is still number 1.
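
For readers who want to see the recursive idea behind PageRank spelled out, here is a rough base-R sketch using plain power iteration.  It is not Gephi's implementation: the function name `pagerank_sketch`, the damping factor of 0.85, the tolerance, and the toy edge list are all my own assumptions for illustration.  Edges again run from the influenced person (Source) to the influencer (Target).

```r
# Plain power-iteration PageRank on a Source -> Target edge list.
# `damping`, `tol`, and `max_iter` are illustrative defaults, not Gephi's settings.
pagerank_sketch <- function(edges, damping = 0.85, tol = 1e-10, max_iter = 1000) {
  nodes <- sort(unique(c(edges$Source, edges$Target)))
  n <- length(nodes)
  from <- match(edges$Source, nodes)
  to <- match(edges$Target, nodes)

  # Out-degree of each node; nodes with no outbound links are "dangling"
  out_deg <- tabulate(from, nbins = n)

  rank <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    # Each node splits its current rank evenly across its outbound links
    share <- rank[from] / out_deg[from]
    incoming <- tapply(share, factor(to, levels = seq_len(n)), sum)
    incoming[is.na(incoming)] <- 0
    incoming <- as.numeric(incoming)

    # Rank held by dangling nodes is spread evenly over everyone
    dangling <- sum(rank[out_deg == 0])
    new_rank <- (1 - damping) / n + damping * (incoming + dangling / n)

    converged <- sum(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  setNames(rank, nodes)
}

# Hypothetical toy graph: Hegel and Plato both have two inbound links
toy_edges <- data.frame(
  Source = c("Marx", "Husserl", "Hegel", "Kant", "Aristotle"),
  Target = c("Hegel", "Hegel", "Kant", "Plato", "Plato"),
  stringsAsFactors = FALSE
)
pr <- pagerank_sketch(toy_edges)
print(sort(pr, decreasing = TRUE))
```

In this toy graph Hegel and Plato tie on a simple count of inbound links, but Plato comes out ahead under PageRank because one of his links comes from Kant, who is himself well referred to.  That is the kind of shift I was hoping to see in the real data.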

Making it even better

The guesses above were low-hanging fruit.  Anything more would require a deep dive into the data and a fair amount of time.  Here are my thoughts on how to do that better if you were interested in refining this to be a better resource:

  1. Data Ontology: There seem to be some issues with the ontology in the database behind the scenes.  So, for example, Aristotle apparently had so many people catalogued that rather than list them, there is an entire page dedicated to "List of writers influenced by Aristotle."  They don't show up individually in the query.  I went onto this page and added them by hand, which helped some, but I suspect there are similar issues elsewhere with other major figures. 
     
  2. Data Reliability: Several abstract nouns were listed as philosophers.  Among them were:

    Western Philosophy
    Philosophy
    Naturalism (philosophy)
    Chinese Philosophy
    Philosophy of Science
    Philosophy of Artificial Intelligence
    Platonism in the Renaissance
    Indonesian Ulema Council
    May 1968 in France
    Solidarity (Polish Trade Union)
    Unity of the Brethren
    World Trade Organization Ministerial Conference of 1999 protest activity

    While these entities convey aggregate influence, they're inconsistently applied, which muddies the graph.  A better data munger than me would disaggregate these down to the philosophers influenced by or contributing to them.  This may be possible with something as simple as improving the query to return only individuals, or perhaps a Wikipedia editor fixing the ontology in the database.
     
  3. Sample Accuracy: There may be some sample issues from Wikipedians' bias.  Given that the sample pulled in was not limited to conventional philosophers, I also wonder whether the general level of coverage of continental voices on Wikipedia is just plain higher than that of analytic or ancient ones, a bias arising from the interests of the participants.  As a point of comparison, if you read any Wikipedia articles on the discipline of economics, you would probably arrive at the conclusion that "Austrian School Economics" was on par in the field with the neo-Keynesian or neo-Classical schools, although it's a non-contender in economics departments.
     
  4. Sample Precision: Looking at the query Simon developed, I see that the set being drawn here is not exclusively philosophers.  Rather, it contains people who have been influenced by philosophers, be they artists, musicians, writers, whatever.  This explains a big portion of the post-Hegelian continental slant.  Interestingly, it also pulls up a few political figures who had positions like administrative heads of propaganda in former Soviet states.  Yes, they were influenced by Marx (and Hegel), but they are arguably not philosophers.  Whether the domain of philosophy is or ought to be restricted purely to professionals doing academic philosophy is a fair argument; it's just not the lens on the field I'm interested in for this current dig.

    This isn't an easy line to draw, however.  Both Leo Strauss and Allan Bloom are in the sample, as are Harold Bloom and Terry Eagleton.  I'd argue all four, right or left, are something more like "applied textual philosophers" who, in the literary sphere, are to philosophy what engineering is to science.  While knowing their influences might be useful for understanding the influence of philosophers, it likely does not tell us the influence of philosophers on philosophy.

    A further problem with drawing this line is the departmentalization of ideas that were once entirely philosophy and became siloed fields of their own.  You can see a group of political philosophers (from Rousseau, Burke, a lot of the US Constitution Framers) quite clearly, as well as a group of Utilitarians that have as their descendants modern economics as much as modern philosophical or distributional ethics.  How do the heirs of philosophical traditions fit into determining the influence of philosophers on philosophy?  Is it fair to count philosophy's daughter fields at all?  (Is it fair not to?!)
     
  5. Maybe I'm just wrong.  Maybe Hegel is king after all.
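
As a rough illustration of the data reliability point in item 2 above, here is a base-R sketch that simply drops edges touching the non-person entries.  This is the blunt option, not the disaggregation suggested above, and it is not something I ran for the graphs in this post.  The helper `drop_non_people`, the blocklist variable, and the demo edge list are all hypothetical.

```r
# Hypothetical slice of an edge list, in the same Source/Target/Weight layout
# that the script at the end of this post writes out
edges_demo <- data.frame(
  Source = c("Karl Marx", "May 1968 in France", "Edmund Husserl", "Philosophy of Science"),
  Target = c("Georg Wilhelm Friedrich Hegel", "Karl Marx",
             "Georg Wilhelm Friedrich Hegel", "Karl Popper"),
  Weight = 1,
  stringsAsFactors = FALSE
)

# Hand-maintained blocklist copied from the entries listed in item 2
not_people <- c(
  "Western Philosophy", "Philosophy", "Naturalism (philosophy)",
  "Chinese Philosophy", "Philosophy of Science",
  "Philosophy of Artificial Intelligence", "Platonism in the Renaissance",
  "Indonesian Ulema Council", "May 1968 in France",
  "Solidarity (Polish Trade Union)", "Unity of the Brethren",
  "World Trade Organization Ministerial Conference of 1999 protest activity"
)

# Keep an edge only if neither endpoint is on the blocklist
drop_non_people <- function(edges, blocklist) {
  keep <- !(edges$Source %in% blocklist) & !(edges$Target %in% blocklist)
  edges[keep, , drop = FALSE]
}

edges_clean <- drop_non_people(edges_demo, not_people)
print(edges_clean)
```

A blocklist like this only catches the entries you already know about, so it is a patch rather than a fix for the underlying ontology.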

Summary

Anyway, overall, there are issues in the sample, some clear and some debatable, but Drunks and Lampposts really has won a victory for data munging.  All of this data and the tools to process it have been available for some time.  But it took Simon, bricoleur-style, stitching together a hodge-podge of scraping, database, and visualization techniques to make an insightful graphic.  And this is why I'd definitely support the claim that these kinds of "hacking skills" are on par with traditional statistics, subject expertise, and visualization skills in the "Data Science" (I hate this phrase, but it's useful here) toolbox: all the latter are as good as nothing if you can't fashion the information together usefully.  Right-brained creativity and narrative thinking are key.

Here's the R Code to Pull the Data

library(SPARQL)  # the only package this script calls directly

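# Create the query
# NOTE: the angle-bracketed URIs were missing from the query as published here.
# The DBpedia ontology class and property below are a reconstruction; check them
# against Simon's original instructions before relying on them.
qq <- 'SELECT * WHERE { ?p a <http://dbpedia.org/ontology/Philosopher> . ?p <http://dbpedia.org/ontology/influenced> ?influenced. }'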

# Use it in SPARQL
data <- SPARQL(url='http://dbpedia.org/sparql',query=qq)

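# Make it directed: Source is the influenced thinker, Target the influencer
# (the same direction the Aristotle fixup below uses)
orig <- unlist(data$results[[2]], use.names=FALSE)
dest <- unlist(data$results[[1]], use.names=FALSE)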

# Turn URLs into handsome names
clean_name <- function(uri) {
  uri <- gsub("^<+|>+$", "", uri)                                   # strip the angle brackets
  uri <- vapply(strsplit(uri, "/"), function(p) tail(p, 1), character(1))  # keep the last path segment
  uri <- vapply(uri, URLdecode, character(1), USE.NAMES = FALSE)    # decode %XX escapes
  gsub("_", " ", uri)                                               # underscores to spaces
}
orig <- clean_name(orig)
dest <- clean_name(dest)

# Format it as an edge list, with gephi-friendly column names
edges <- data.frame(Source = orig, Target = dest, Weight = 1, stringsAsFactors = FALSE)

# Clean up Aristotle: These names were on the "List of writers influenced by Aristotle" page
## You can cut out this Aristotle fixup section and it still works
I_by_Aristotle<- c("Francis Bacon",
"Franco Burgersdijck",
"Nicolaus Copernicus",
"René Descartes",
"Georg Wilhelm Friedrich Hegel",
"Thomas Hobbes",
"Immanuel Kant",
"Jean-Jacques Rousseau",
"Baruch Spinoza",
"Mortimer Adler",
"Hannah Arendt",
"Philippa Foot",
"Hans-Georg Gadamer",
"Martin Heidegger",
"Muhammad Iqbal",
"James Joyce",
"Alasdair MacIntyre",
"Jacques Maritain",
"Martha Nussbaum",
"Leo Strauss")
# If someone Aristotle influenced doesn't already have a node, drop them.
I_by_Aristotle <- I_by_Aristotle[I_by_Aristotle %in% orig]
# Append one edge per remaining name, from the influenced person to Aristotle.
# The length check avoids adding a bogus row when no names survive the filter.
if (length(I_by_Aristotle) > 0) {
  edges <- rbind(edges, data.frame(Source = I_by_Aristotle, Target = "Aristotle",
                                   Weight = 1, stringsAsFactors = FALSE))
}
## End of Aristotle fixup section

# Write the file
write.csv(edges,file="Edge_file.csv", row.names=FALSE)

Next up

The first graph I showed has better colors, a more aesthetic layout, and even a different authority measure (Katz Centrality) than the others shown in the body.  To accomplish these things yourself, we'll need to add some R power to Gephi.  In a later post, I'll show you how to do this directly in R using igraph, and after that, move on to an even cooler expansion.  Stay tuned for Part II.

Questions, comments, corrections?  Contact me!
