Time Constrained Networks

One of the key features of complex networks is that they capture interactions on which there are few limitations.  In most electronic systems, be they Facebook, emails or web pages, we can make connections across the world with little if any cost.

However what if there are constraints on the links made in a network? Surely we should change the way we study networks if space, time or some other constraint is having a significant effect on the formation or use of the network.  This has been a major interest of mine over the last few years. Space is one obvious limitation, as in some cases long-distance links are less likely to be made. There has been a lot of work in this area over many decades but I will leave this constraint for another blog.

It is only more recently that the role of time in networks has begun to receive more attention. A lot of this recent interest is in how to deal with networks where the connections are made at one moment in time.  That is because most communication networks, emails, phone calls and so forth, are of this type. The recent review by Holme and Saramäki (2012) covers such temporal edge networks.

Yet networks are made of two parts: vertices and edges. My recent work has focussed on the case where it is the vertices, not the edges, which are created at a definite time. In such temporal vertex networks, causality forces the interactions between nodes to always point in one direction. For example, consider a citation network formed by academic papers. The nodes in our network are the academic papers and the links are formed by their bibliographies.  So if paper A refers to another paper B then we can be (almost) sure that A was written after B. Information can therefore flow only from B to A. In fact, since any set of documents can only refer to older ones, such networks are common. In law, judges refer to previous judgments to support their arguments.  When registering a patent, prior art needs to be cited, that is other previously granted work which may have some relevance to the current claim.

The same types of structure occur in several other situations.  Any situation where there is a logical flow has the same causal structure.  If we have a project where the nodes represent individual tasks, then an edge from task S to task T could represent the fact that task T requires task S to have been completed before task T is started.  This has been the focus of work on temporal vertex networks in computer science. The logical flow of a mathematical argument or an Excel spreadsheet shows the same properties.  These networks define what is called a partially ordered set or poset, and it is under this title that you find relevant work coming from mathematicians. A final example comes from the Causal Sets approach to quantum gravity (see Dowker 2006 for a review).  Here space-time is discrete, not continuous, and these discrete points are the nodes of the network.  The nodes are connected by edges only if they are causally connected, and causality again gives these edges a common direction.
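The ordering that causality enforces in the task example is exactly what computer scientists call a topological sort. A minimal sketch in Python (the task names are invented for illustration; `graphlib` is in the standard library):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must be completed before it starts.
# An edge S -> T ("T requires S") is stored as T: {S}.
tasks = {
    "design": set(),
    "build": {"design"},
    "test": {"build"},
    "document": {"design"},
    "release": {"test", "document"},
}

# static_order() yields the tasks in a causally consistent sequence:
# every task appears only after all of its prerequisites.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

Such an ordering exists precisely because the dependency network contains no directed loops.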

All of these temporal vertex networks have a key property: they contain no loops if you always follow the direction on the edges.  You cannot go backwards in time.  Thus the traditional name for such a network is a directed acyclic graph, or DAG for short.

So the question is how can we adapt traditional network measures to deal with the fact that these networks, DAGs, are constrained by causality?  Are there new measures we should employ which give more insight into such networks?

I’ve been looking at these problems with several students (undergraduates in their final year projects and some MSc students), one of whom, James Clough, is now working for his PhD on this topic.

Paths in networks are always important.  However one feature of a DAG we have been exploiting is that if we always follow the direction of the arrows, the direction of time, then not all nodes are connected. If we like, we could add an edge whenever there is such a path connecting a later node to an earlier one, a process known as transitive completion.  On the other hand we could remove as many edges as we can while leaving the causal relationships intact, a process known as transitive reduction. That is, if there is a path between two nodes in the network before transitive reduction, then there will still be a path between them afterwards.

Example of the transitive reduction (left) and the transitive completion (right) of the directed acyclic graph in the centre.
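Both operations are standard graph algorithms available in common libraries. As a quick sketch (using the networkx library, not the code from our paper): on a three-node DAG where one edge duplicates a longer path, transitive reduction drops that edge, while transitive closure (networkx's name for the completion) makes every implied link explicit.

```python
import networkx as nx

# A toy DAG: the edge 1 -> 3 is implied by the path 1 -> 2 -> 3.
G = nx.DiGraph([(1, 2), (2, 3), (1, 3)])

TR = nx.transitive_reduction(G)  # drop edges implied by longer paths
TC = nx.transitive_closure(G)    # add an edge for every directed path

print(sorted(TR.edges()))  # the shortcut 1 -> 3 has gone
print(sorted(TC.edges()))  # every causal relation is now a direct edge
```

Reachability, and hence the causal structure, is identical in `G`, `TR` and `TC`.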

What we have done (in Transitive Reduction of Citation Networks) is look at how real data from citation networks behaves after transitive reduction.  What we find is that different types of citation network behave very differently. The citation network formed from academic papers taken from the arXiv repository and the network of US Supreme Court judgments both show that about 80% of the edges are not needed to retain all the causal relationships.  On the other hand the patents network shows the opposite behaviour, with all but 15% of edges being essential. The edges removed tend to be the citations to older papers. One interpretation is that academics and judges may be citing well-known early papers and judgments even though their current work is only directly related to more recent documents. Perhaps some of these citations do not indicate that the early work was needed but reflect other motivations, such as simple copying of popular papers or reviews of the field which at best have only general relevance. For academic papers this interpretation is supported by the work of Simkin and Roychowdhury (2003).

The number of citations to a document after transitive reduction certainly gives us a different view of the importance of different documents. For instance paper hep-th/9802109 on the arXiv (Gauge Theory Correlators from Non-Critical String Theory by Gubser et al.) was cited by 1641 papers in the network, but only three citations remained after transitive reduction! On the other hand, paper hep-th/9905111 (Large N Field Theories, String Theory and Gravity by Aharony et al.) also has a large number of citations in the raw data, 806, yet after transitive reduction it retains 77, a far higher fraction of its original citations. Perhaps the information in the second paper was used more diversely.

We can find similar examples in the US Supreme Court citation network. The case Schneider vs. New Jersey (1939) has 144 citations in the original data but this drops to just one after transitive reduction. Stromberg vs. California (1931) also falls from 132 citations to just one. Conversely, the case Heller vs. New York (1973) shows only a slight fall after transitive reduction, from 68 to 48 citations, and has the most citations in our reduced network. The second most cited case after transitive reduction is Hamling vs. United States, which drops from 68 to 38 citations. Wikipedia lists hundreds of Supreme Court cases but the last two are not famous enough to make that list.  Our analysis suggests they may have more importance than a simple citation count would indicate. At the very least it might be worth checking out documents that remain highly cited after transitive reduction.

Another way to look at citation networks is to see if we can define a dimension for the network.  That is, we can try to quantify how much variation there is in the citation process.  A low dimension means that there are few directions, few distinct themes, relevant for citation in a document.  A high dimension indicates that there is a wide range of relevant but distinct directions from which a document will draw inspiration.  What James Clough and I found (in What is the dimension of citation space?) is that we were often able to assign a meaningful value for the dimension of our citation data.  For academic papers, we found that different fields of research have different dimensions.  For papers in the hep-th arXiv section (largely string theory) we found a low dimension of around 2, while for theoretical papers closely linked to particle physics experiments (hep-ph section) we found more variation, as indicated by a higher dimension of 3. The quant-ph section was also around 3, while the astro-ph section had a slightly higher dimension of around 3.5. So despite similarities in the raw data under standard measures, our time-aware dimension measures show clear differences in the citation behaviour of different areas. String theory in particular seems to be a tightly knit collection of work, with each paper largely dependent on all the others; few independent directions can be pursued.

The US Supreme Court judgments were more complicated.  Small samples (usually from modern judgments) showed a dimension of around 2.5 to 3, but larger samples, typically ranging from modern back to the earliest judgments, had lower dimensions, closer to 2. We interpreted this as reflecting the fact that there were few early judgments compared to the number produced today, so that the further back we traced the influence of judgments on recent ones, the smaller the variation. Perhaps that is not so surprising, and we might expect a similar shape if we could follow scientific papers back to the 18th century! Patents, on the other hand, showed a much higher dimension, though again the results were more involved.
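For the curious, the kind of estimator involved can be sketched numerically. One standard causal-set approach (the Myrheim–Meyer dimension) measures the fraction of pairs of points that are causally related; in d-dimensional Minkowski space this ordering fraction takes a known d-dependent value, so measuring it on data and inverting the relation gives a dimension estimate. A minimal illustration of the principle in two dimensions, not the analysis code from our paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Sprinkle n points uniformly into a 2D causal interval (a "diamond").
# In light-cone coordinates (u, v) the diamond is just the unit square,
# and point i causally precedes point j iff u_i < u_j and v_i < v_j.
u = rng.random(n)
v = rng.random(n)

# Count causally related pairs; each unordered pair is counted once.
related = (u[:, None] < u[None, :]) & (v[:, None] < v[None, :])
ordering_fraction = related.sum() / (n * (n - 1) / 2)
print(ordering_fraction)
```

In two dimensions the expected ordering fraction is 1/2; in higher dimensions it is smaller, which is what lets a measured fraction be converted into a dimension.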

It is clear from just the few studies we have made that time makes a crucial difference to the structure of a network. We have tried a few new measures adapted to take account of time and in doing so we have thrown up some intriguing features in real data.  There is surely much more to find when networks are embedded in time.

References

Clough, J.R. & Evans, T.S. What is the dimension of citation space?, arXiv:1408.1274

Clough, J. R.; Gollings, J.; Loach, T. V. & Evans, T. S., Transitive Reduction of Citation Networks, J.Complex Networks to appear 2014, arXiv:1310.8224 DOI: 10.1093/comnet/cnu039 (open access)

Dowker, F., 2006. Causal sets as discrete spacetime. Contemporary Physics, 47, 1-9

Holme, P. & Saramäki, J., 2012. Temporal Networks. Physics Reports, 519, 97-125

Simkin M.V. and Roychowdhury V.P., 2003. Read before you cite! Complex Systems 14 269-274

Sculplexity: sculptures of complexity using 3D printing


I have just had an unusual paper published by European Physics Letters (doi: 10.1209/0295-5075/104/48001). 3D printing (or additive manufacturing technology, to give the field its more formal name) is very much in vogue: the first 3D printed gun, Obama’s 2013 State of the Union address, and so forth. I like to know about new technological advances, even better if I can play with them. So how can a theoretical physicist get involved with 3D printing?

One of the ways I learn about a new topic is to set it up as a project for final year undergraduate students or as a masters level project. We have some of the best students in the UK here so all I have to do is come up with an outline and the students usually rise to the challenge.

What I needed was a connection between my work in complexity and 3D printing. This link came from one of those serendipitous moments. Another advantage of being at Imperial is that it is part of a Victorian complex of arts and science institutions, so we are surrounded by national museums. One is the V&A (Victoria and Albert Museum), which is dedicated to all aspects of design, from all ages and all locations. It also has some amazing tea rooms, covered in tiles designed by William Morris, great when I have a visitor and a little more time. It was on one of those trips that I just happened to walk past something that caught my eye. At one level it is just a table. However I saw this table as a branching process. To the designers, it was inspired by the tree-like structures of nature. They had embraced technology to realise this, using Computer Aided Design (CAD) and 3D printing. On further investigation, this turned out to be the Fractal.MGX table designed by WertelOberfell and Matthias Bär, the first 3D printed object the V&A had acquired.


The Fractal.MGX table designed by WertelOberfell and Matthias Bär, photo credit Stephane Briolant (Image rights WertelOberfell).

Branching processes occur in many problems in complex systems and they have a long history of mathematical investigation. So here was the link I was looking for. The question I asked was what other physics models familiar to me could be turned into a 3D printed object? How would you actually do this conversion? Does the tool, the 3D printer, impose its own limitations on the type of mathematical model we can use? Are these new limitations interesting mathematically in their own right? Until now researchers had only seen these mathematical models visualised using two-dimensional representations, often not even using perspective to give the impression of a 3D object. Making a 3D printed object opens up uncharted territory. My project shows one can move from traditional visualisations to new “tactilisations”. So can we gain new insights by using touch rather than vision?

The approach might also be useful for outreach as well as research. The same things that got my students and me interested in the project might intrigue school children, a retired judge or whoever. These objects might be particularly useful when explaining science to those whose sense of touch is better than their sight. However we could also go back to where this project started and see if models of complexity can produce something of aesthetic value alone – hence Sculplexity: sculptures of complexity.

The basic idea is simple. A 3D printer builds up its object in layers. So the height of the object can be thought of as time. Suppose I have a model which defines a flat (two-dimensional) picture. Typically this will be a grid with some squares full and some empty. The model also has to describe how this picture evolves in time. For instance, there is a famous example known as Conway’s Game of Life, for which there are many 2D visualisations. What I do is use the model at each point in time to define what the printer should print at one height. The next time step in the model then defines what to print on top of the first layer, and so forth.
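As an illustrative sketch of this layering idea (a toy Game of Life on a small wrapped grid, not our actual printing pipeline), each generation becomes one layer, so stacking generations turns model time into printed height:

```python
import numpy as np

def life_step(grid):
    """One synchronous update of Conway's Game of Life on a wrapped grid."""
    # Sum the eight neighbours by rolling the grid in every direction.
    nbrs = sum(np.roll(np.roll(grid, di, 0), dj, 1)
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))
    # A cell is alive next step with 3 neighbours, or 2 if already alive.
    return (nbrs == 3) | (grid & (nbrs == 2))

# A "blinker": three cells in a row, which oscillates with period 2.
grid = np.zeros((8, 8), dtype=bool)
grid[4, 3:6] = True

# Stack successive generations: each one becomes a printed layer,
# so the height of the final solid plays the role of time.
layers = [grid]
for _ in range(9):
    layers.append(life_step(layers[-1]))
voxels = np.stack(layers)  # shape (time, y, x): a printable volume
```

The `voxels` array is exactly the kind of full/empty volume description that then has to be translated (with some care) into the geometry files a printer accepts.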

In fact while the basic idea is straightforward, the implementation turned out to be much, much harder than I expected. It is to the real credit of the undergraduate students working with me on this project, Dominic Reiss and Joshua Price, that we pushed this idea through and actually got a final 3D printed object representing our modified forest fire model. OK so our final result is a bit of a lump of black plastic compared to the inspiration for this project, the Fractal.MGX table in the V&A. But this is just the start.

Now that we have shown that this can be done, there is so much more to explore.  We are adding another dimension to representations of mathematical models and the possibilities for 3D printing are endless.  All we have done is make the first step in terms of 3D printing and mathematical models. We have highlighted the key problems and given at least one way to fix them. I can already see ways to extend the existing approach, new solutions to some of the problems, and different ways to create objects from a wider variety of theoretical models. Imagination and ingenuity are all that are required.


Top view of the output produced from the Sculplexity project. Yes, it’s on a bed of rice, since rice was used in an experiment to simulate another classic model of complexity – the sandpile model.

Note added

I have since found some other work using 3D printing to visualise maths which is full of useful ideas.  So look for the work of Aboufadel, Krawczyk and Sherman-Bennett as well as that by Knill and Slavkovsky listed below.

References

  1. Reiss D.S., Price J.J. and Evans T.S., 2013. Sculplexity: Sculptures of Complexity using 3D printing, European Physics Letters 104 (2013) 48001,  doi 10.1209/0295-5075/104/48001.
    Copy of Sculplexity: Sculptures of Complexity using 3D printing on personal web page.
    Chosen to be part of IOPselect, a collection of papers chosen by the editors for their novelty, significance and potential impact on future research.
    (altmetrics for paper).
  2. Evans T.S., 2013. Images of 3D printing output for Sculptures of Complexity – Sculplexity, http://dx.doi.org/10.6084/m9.figshare.868866
  3. Reiss D.S. and Price J.J., 2013. Source-code for Complex Processes and 3D Printing, https://github.com/doreiss/3D_Print, doi 10.6084/m9.figshare.718155 .
  4. Reiss D.S., 2013. Complex Processes and 3D Printing, project report, http://dx.doi.org/10.6084/m9.figshare.718146.
  5. Price J.J., 2013. Complex Processes and 3D Printing, project report, http://dx.doi.org/10.6084/m9.figshare.718147.
  6. 3D printing used as a tool to explain theoretical physics by Laura Gallagher, Imperial College News and Events, 9th December 2013
  7. “3D-printed models of complex theoretical physics”, The IET Engineering and Technology Magazine, 9th December 2013.
  8. Aboufadel E., Krawczyk S.V. and Sherman-Bennett M., “3D Printing for Math Professors and Their Students”, arXiv:1308.3420, 2013.
  9. Knill O. and Slavkovsky E., “Illustrating Mathematics using 3D Printers”, arXiv:1306.5599, 2013.

Power Laws

Why are power laws still in the news? Michael Stumpf and Mason Porter recently published a commentary on the topic in Science (“Critical Truths About Power Laws”).  Yet the nature of long-tailed distributions has been a debate in physics since the work on critical phenomena in the 1970s.  This then continued in the context of fractals in the 1980s, self-organised criticality in the 1990s and now in terms of complex networks. So the issues should be familiar to theoretical physicists and applied mathematicians working in these areas.  However the field of complexity is multidisciplinary and there is a real need to reach out to researchers from other backgrounds and explain what we have learnt in all these debates about power laws.  Often they have their own experiences to add to this debate – financial mathematicians have long been interested in long-tailed distributions.  Physicists may understand the subtext behind attention-grabbing headlines but there is no reason to think others know how large a pinch of salt to add to these claims. That has certainly been my impression when hearing experts from other fields refer to some of the early claims about complex networks. Perhaps the most worrying aspect is that many of the points raised in this age-old debate are still not addressed in many modern publications. This is part of the frustration I hear when reading the Stumpf and Porter article. To me this is a good example of the repeated failure in the quality control derived from the current refereeing system (but that is a debate for another time).

One of the key points made by Stumpf and Porter, though I’ve heard this many times before, is the lack of statistical support behind many claims for a power law. Identifying power-law behaviour is not trivial. I always remember Kim Christensen at Imperial recommending to me, after his experiences with power laws, that four decades of data were needed, while Stumpf and Porter suggest two.  Many examples fail to pass even this lower limit.

Aaron Clauset, Cosma Shalizi and Mark Newman provide one popular statistical recipe and code that addresses most issues (“Power-law distributions in empirical data” – and check out the blogs on the arXiv trackback).  The approach will produce much more useful results than I’ve seen in several published papers. It might not be perfect though. I think I’ve seen warnings against the use of cumulative distributions, as problems in the early part of the data will affect many more data points than issues in later data points.
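For reference, the heart of that recipe is a one-line maximum-likelihood estimator for the exponent; the full Clauset–Shalizi–Newman method also selects the lower cut-off x_min and runs goodness-of-fit tests, which this sketch omits:

```python
import numpy as np

def alpha_mle(x, xmin):
    """Continuous MLE for the exponent of p(x) ~ x^(-alpha), x >= xmin."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

# Check it on synthetic data with a known exponent, alpha = 2.5,
# drawn by inverse-transform sampling from a pure power law.
rng = np.random.default_rng(1)
u = rng.random(100_000)
x = 1.0 * (1.0 - u) ** (-1.0 / (2.5 - 1.0))
alpha_hat = alpha_mle(x, xmin=1.0)
print(alpha_hat)
```

On clean synthetic data the estimator recovers the generating exponent; on real data the hard part, as Stumpf and Porter stress, is deciding whether a power law is an appropriate description at all.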

In fact I would go further. I often reject papers with no error estimates in measured quantities, no uncertainties in data points – I mark our own physics students down for these failings. Why show points and a best fit on a log-log plot except as an illustration? A plot of residuals (differences between data and fit) is far more informative scientifically. Many refereed papers appear to let these things go, limiting the impact of their message.

Another part of the debate, and a key message in Stumpf and Porter, was the meaning of a power law. Most researchers I know realised any hope of universality in the powers seen in network data was misplaced in the early noughties. For me this was one of my first questions, and as I did not see it answered in the literature it led me to think about the formation of scale-free networks in the real world.  I wrote this up in a paper with Jari Saramäki from Finland (“Realistic models for the formation of scale-free networks”).  Existing models just didn’t explain this at the time, which is precisely what Stumpf and Porter say is still missing in many discussions about power laws even today.


Power laws in a scale-free network formed using random walks. Even with a theoretical model I fell just short of 4 decades of data. Note the right-hand plot shows non-trivial finite-size effects hiding in the tail. This was all for networks with one million vertices. Fig 2 of arXiv:cond-mat/0411390.

There was a technical point in the Stumpf and Porter article that did catch my eye but in the end left me disappointed.  They highlighted that for random numbers drawn from any long-tailed distribution there is a generalisation of the central limit theorem.  It tells us that sums of many random numbers drawn from such distributions will lie in distributions with a power-law tail (you have to define all this carefully though). However this is not as powerful as it sounds. The definition of a long-tailed distribution in this context is one with a power-law tail. Seems circular to me.

Less seriously, I thought the sketch provided in the Stumpf and Porter Science article was a bad example to set in an article about bad representations of data. One axis is a qualitative judgment while the other is an unspecified statistical measure. The ‘data points’ on the plot are mostly unspecified. The one marked Zipf’s law is presumably for the classic data on cities, though which of the many data sets on this topic I’m not sure. What I think was intended was a sketch of the authors’ subjective judgments on the power laws suggested in different fields, in terms of their two main themes: the statistical support for power laws, and the modelling and interpretation of results.  In the end the sketch didn’t convey anything useful to me.

Still these are minor quibbles. If an article can keep the Imperial Complexity and Networks programme’s weekly meeting talking for an extra half an hour, a discussion led by Gunnar Pruessner from Imperial, it has got to be doing something good.