Computational and Numerical Resources

This is a list of software and data (at the end) I have come across along with some comments on my experience with each.

  • Several of the institutes listed on have useful links pages but you could also try the following places:-
  • visone
    A software tool intended for research and teaching in social network analysis. Has an interactive graphical user interface, tailored to social networks. Built on the yFiles Java library so works across all platforms but not open source. Easily installed on Windows. Still supported and under development. Its primary file format is GraphML but it can import and export graphs in several formats and can export visualisations in many formats including JPEG, PDF, SVG, and postscript.

    Visone is one of my key tools for networks, especially for its excellent visualisations. You can easily edit network visualisations and produce publication quality images but there is also a specific editor, yEd. Visone is fairly intuitive to use (try right clicking on objects to select their properties) though I need to do some experimentation as I find the documentation seems limited, especially on the output side. That is improving as there is an increasing body of questions and answers which you can find by searching the web for keywords.

    My initial starting point was that it was the only package at the time where I could specify coloured edges. The edges can also be one of a limited number of different types (solid, dashed, dotted etc.), usually enough if you need to distinguish edges in black and white. At the time there was a bug in the way it saved this in its graphml format (the line style is in the wrong place) so the style was lost when read back in. I used to think that another limitation was that edges have to be straight lines but I have found how to get round that. So again you can produce excellent pictures with the built in tools and with a little hand editing you can really get exactly what you need.

    Since my initial exploration I have found you have great control over every aspect of the visualisation. Basically you can link any aspect of the visualisation (text size, node colour or size, edge type and colour, etc.) to any parameter associated with an object, whether calculated within visone or given as a user defined value in the input file. The analysis side can associate values with objects and then you use them in the visualisation as required. This means I can now control more and more aspects from my Java routine which produces the relevant graphml files for visone. I have yet to work out how to get the values of measurements out of visone in a file and I have not worked out (yet) how to generate networks within visone, but what it does it does well and reliably.

    Several useful file formats are supported. However visone does not seem to require all of the information seen in its own files when it exports graphs. So to see what you need create a simple graph, export an example, then edit the produced file and try reimporting. Visone uses the yfiles library so more advanced information may be contained there. This library is also not open source though there is information freely available on it. In more detail

    • The Adjacency Matrix network format import does not seem to require much, but I have found that larger examples generated from other places failed to work, probably because I had not got something right in the format. Also I cannot see how to get single edges for an undirected network.
    • The Edge List format is simple – source vertex and target vertex for each edge listed on its own line. Numbering the vertices from 1 upwards did not seem to be a problem.
    • The pajek .net network format is very simple. I found that visone (as many programmes) is sensitive to variations in the format so experiment with simple examples. See detailed comments on pajek net format.
    • The graphml format allows one to control many many aspects e.g. edge colouring, that I wasn’t able to do in other applications. By exporting simple examples one can see that visone produces quite detailed graphml files using the yfiles graphml extensions and these are detailed on the developers pages of yfiles.
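    As a rough illustration of producing GraphML from your own code, here is a minimal Python sketch using only the standard library. It writes core GraphML with one user defined edge attribute; the attribute name "colour" and the function name build_graphml are my own assumptions for illustration, and this is not the yfiles extension that visone itself writes.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(edges, edge_colours=None):
    """Return core GraphML text for an undirected graph.

    edges: list of (source, target) pairs.
    edge_colours: optional dict mapping an edge pair to a colour string,
    stored under the hypothetical attribute name "colour".
    """
    edge_colours = edge_colours or {}
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare a user defined string attribute that edges may carry.
    ET.SubElement(root, "key", {"id": "d0", "for": "edge",
                                "attr.name": "colour",
                                "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    # Each vertex is declared once, in sorted order.
    for v in sorted({v for edge in edges for v in edge}):
        ET.SubElement(graph, "node", id=f"n{v}")
    for i, (s, t) in enumerate(edges):
        edge = ET.SubElement(graph, "edge", id=f"e{i}",
                             source=f"n{s}", target=f"n{t}")
        if (s, t) in edge_colours:
            data = ET.SubElement(edge, "data", key="d0")
            data.text = edge_colours[(s, t)]
    return ET.tostring(root, encoding="unicode")


graphml_text = build_graphml([(1, 2), (2, 3), (3, 1)], {(1, 2): "red"})
print(graphml_text)
```

    A value stored this way is just data attached to the edge; whether a given programme picks it up for drawing is something to check by importing a small example.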
  • yEd
    Free multiplatform graph editor based on the yFiles Java library used by visone and, I think, gephi so works well alongside that. Easily installed on Windows. Networks can be read and saved in graphml and gml formats as well as some others, and there is a wide range of graphic output formats too. Looks like a nice way to tidy up graphs or to draw some simple examples for publication.
    Note this uses the yfiles library so more advanced information may be contained there.
  • NetworkX
    Describes itself as "High productivity software for complex networks". A widely used, cross platform, free open source package for Python. It is fairly easy to use, it is fast for developing projects and it is acceptable in terms of resources for many projects. See my Python Tips page for things I found useful to know.

    This is now my main package, making Python my language of choice for most of my numerical work with networks, whether for teaching or research. It came highly recommended in a blog about Python packages (which also refers to NumPy, SciPy, Matplotlib, Pycluster and RPy as key tools for Python). Lots of information on the web. If you have a problem search the web and you will probably find something relevant. The only problem I found is that the visualisation is not as good as you might need. You can produce visualisations within Python with standard modules (or link through to Graphviz drawing programmes or matlab graph drawing) but I switch to Visone to produce pictures for publication.

    I have tried this on Windows and found that most, but not all, aspects are not very Windows friendly. Most major packages come with a Windows installer – just make sure you pick the one for the correct version of Python (it's obvious from the file names). The main Python package worked fine and I have a version 2.7 with the IDLE GUI interface. Just make sure that c:\Python27 is on the PATH environment variable.

    The main problem came installing packages. The easy way is to use the easy_install command mentioned on many web pages but they fail to tell you this is part of another package that you have to install first. So first try to install setuptools from the Python package index. This worked easily enough. Note that you download a script for a python programme so you need python installed to run it (either double click on the file or run it with python from a command window). Before you use easy_install you may have to import setuptools inside python.

    I found that there were easy Windows executables for the Windows versions of NumPy and SciPy. Watch the order you add them, one may depend on the other. I would probably now use the easy_install route.

    The NetworkX I installed by downloading the source code, following instructions on the web site. The easy_install suggested on the web site would be easier and better.

    I wanted to use the links to Graphviz (for the dot file conversion for instance) but I never got the pygraphviz package to install. One tip I found and tried is that a lot of these Python things want some Microsoft C libraries which come with Visual Studio Express 2008, which is free to download. The 2008 is critical here, versions from other years don’t work. In the end I found the pydot package did what I wanted and installed without a problem using easy_install.

    For an easy to use machine learning library in Python take a look at scikit-learn. This was recommended on a page at kaggle.
  • Gephi
    Gephi is an open source visualisation and exploration platform using Java Netbeans technology so it is free and it works on all systems. It is a GUI so easier to use than a library. It seems to be well supported and widely used so I am finding it is increasingly easy to find help and support. The only drawback is that it is still under development and I find it crashes too often. It works on many types of networks including dynamic and hierarchical graphs. It can import various file formats including GDF (GUESS), GraphML (NodeXL), GML, NET (Pajek), and its own GEXF format. I am beginning to use this as my GUI network analysis package of choice and I see it used by many others. You can play with the data and values you produce, exporting them or linking these to visualisations, though I have yet to learn how to do this as well as I can in visone. It also has plugins so there is a great chance its basic capabilities can be extended e.g. vertex layout by geographical coordinates. I should highlight that there are some excellent gephi tutorials and cheat sheets by Clement Levalois.
  • pajek
    Widely used free package for MS Windows only, though it says it runs under Windows emulators on Linux. GUI driven though it can be driven by a script (apparently, but it doesn’t seem user friendly to do it that way and I have never penetrated the script language). As there is no serious online help, I have found it impossible to use without the book Exploratory Social Network Analysis with Pajek by de Nooy, Mrvar and Batagelj. The book is also not a bad introduction and survey of what social scientists have been doing with networks for many years. There is also a pajek wiki.
    I use this but probably only because it was the only GUI option initially so I have had time to build up my expertise. Having the book makes it useable but I would almost certainly now drop this for one of the modern open source cross platform offerings provided I was confident they had sufficient documentation.
    The pajek file format for graphs, in its most basic form, is one of the most complete and simplest I know and many other packages understand this for both input and output. However as soon as one goes beyond the basics I have found the documentation very limited on the file format. This also means there appears to be no precise standard so different programmes may be tighter or looser on different aspects of the format. Looking at the examples given with each programme and experimenting seems the best approach. Also see my comments elsewhere on the pajek file format.
    The visualisations can be output in several formats including SVG and postscript. I found it difficult to control the postscript in relation to the screen rendering; the two can look quite different for sizes and colours of objects.
    Included with the pajek code and on the associated web site (not sure if the lists are identical) are several useful data sets in pajek format, usually small and usually with a social science context.
  • UCINET Software for Social Network Analysis
    Commercial but cheap. Can read and write many types of simple text file, can deal with excel files and can export data for Mage and Pajek. Includes NetDraw for drawing diagrams. Also it contains many standard data sets from social science, such as the Karate Club, which can easily be exported to a text format. These may be available with the trial version.
  • JUNG (Java Universal Network/Graph Framework)
    Powerful Java based code for networks. I work in Java and I ought to use this open code more. Again documentation is limited, though here there are extensive example programmes, and I found you needed to be relatively fluent in Java to use it. Just beginning to get the idea now so I may use this more. Still not been able to understand how to get fast random access to vertices in graphs which are continuously being updated in JUNG, probably because my understanding of Java is too fortranesque. JUNG can read GraphML, and read and write pajek file formats. Development seems to have stopped however.
    Graph Layout implements the ARF algorithm to layout a graph. Includes an interface with the JUNG graph package.
  • Google Drive has Google Fusion Tables, i.e. database tables. This is in beta but you can visualise these as networks. It looks like a simple way to create small graphs but I am not sure if it can do any analysis. Could be useful for simple undergraduate projects or quick sketches for a talk.
    To find it go to Google Drive (as Docs is now called). I clicked on “create” but chose the “more” option and there was the hidden “table” option.  It allowed me to import a spreadsheet (I created this separately)  and interpret two columns as source and target.  The network visualisation was under “Experiment” as “Network Graph”. See the example in a post on the Networks Network board.
    The whole google fusion tables programme seems to be aimed at allowing you to work with large public data sets so it may be of wider use.
  • JGraph and JGraphT
    JGraph is an open source network visualisation library using standard Java Swing.
    JGraphT seems to be for modelling large graphs, optimised for analysis and claims to be able to handle millions of nodes. Last time I looked it could export to a good range of file formats (I couldn’t see how to initialise from a file) but had a strange collection of graph algorithms e.g. no clustering coefficient but it can find the chromatic number of a graph.
  • NetworKit is a high performance package with a focus on parallelism and scalability. NetworKit is a Python module but the high-performance algorithms are written in C++ and accessed via Python.
  • GUESS – The Graph Exploration System
    GUESS is an exploratory data analysis and visualisation tool for graphs and networks. Free, containing an embedded language called Gython (an extension of Python, or more specifically Jython). Contains “limited, but fairly functional GraphML and pajek” file readers. Contained within Network Workbench.
  • iGraph.
    iGraph is a free package for undirected and directed graphs, including modern community detection algorithms. Written in C, but available in forms making it accessible from Python or Ruby, and from the free R statistical package. Can use GraphML, pajek and GML file formats for networks.
  • Network packages for the R statistical package
    R is free, multiplatform and has very powerful analysis tools. It is not GUI orientated and I don’t find it the easiest to use. Data import and export is basic but there are always other packages to add on that may help (see below). Its graphics are excellent but again it may be hard work to get the exact results you need. Visualisation export options are plentiful. I have some general notes on things I found hard to learn or remember in R on my R Tips Page and other packages are mentioned there. There are several packages for networks:-

    There is a good overview of R For Networks

  • Graphviz – graph visualisation software
    Graphviz is open source graph visualisation software. Provides command line tools but also has a limited graphical front end. I got this working easily on Windows. Uses its own DOT format for files. Visualisation output is in very many formats including postscript.
  • BOOST Graph Library (BGL)
    Boost is a widely used and reliable set of routines in C++ covering many topics. It is also available as a Matlab package – MatlabBGL. The BGL is a section of BOOST devoted to graphs. I thought the C++ version was highly extendable but mostly consisted of templates rather than actual implementations. However the Matlab version suggests otherwise.
    GOBLIN is a C++ class library focussed on graph optimisation and network programming problems.
  • SNAP (Stanford Network Analysis Package)
    A general purpose library for network analysis. Written in C++ and suitable for massive networks. Includes tutorials and data sets. Has a simple interface via Microsoft Office in NodeXL. I found this too complicated to use without investing more time but students of mine have got it working.
  • yFiles
    A commercial Java library of algorithms and components for the visualisation and analysis of networks. Used by both visone and yEd, so some of the more advanced points may be found in the yfiles own documentation. For instance they all use a yfiles extension to the graphml format and information on this is in the yfiles developers pages.
  • NeAT – Network analysis tools
    Performs many operations on networks and clusters to enable statistical analysis on large graphs. Good documentation and a useful tutorial plus a web based interface make it extremely useable in principle. I’ve found the web interface a bit hit and miss sometimes.
    One extremely useful feature is a network file conversion page. This can handle gml, adjacency matrix and a tab-delimited format which includes the edge list format.
    There is a free UNIX version but it requires a (tedious) registration and licensing process by fax.
  • Cytoscape
    Cytoscape is an open source bioinformatics multiplatform Java based package which handles network functions. It uses its own format, .sif files, but seems to be able to handle many other formats including GML, XGMML, and simple text file formats. These latter seem to be of the form of two files, a list of vertices (one line per vertex, named as you like) and a second with a list of edges (one edge per line). In each case additional columns (named in the first line) are used to add any further information. This can then be used to set the colour, size etc. of the network. Various analysis packages are available as plug-ins. I have seen this being used and it looks very powerful. Visualisations can be hand edited and may be exported in all the useful formats: PDF, EPS, SVG, PNG, JPEG, and BMP. It has been recommended to me for when large graphs are to be visualised (but you may need to increase the Java memory allocation) but it is maybe not as nice as visone. It is at least as powerful as visone and really should be able to do everything you need. It's well supported and maintained. The big obstacle is the language – everything is aimed at bioinformatics so is described in that language. Someone needs to write a translation sheet!
  • SPaTo Visual Explorer
    SPaTo is an interactive visualisation tool for exploring complex networks based around the reduction of the network to a shortest path tree from a selected node. The definition of this shortest path is of course the key and that can be done in different ways. Not sure this is good for rigorous analysis but as a quick way to get a feel for a network it could be a nice tool. Open source, Java based and so cross platform. Biggest drawback seems to be the limited file format – a simple but non-standard XML format. It will work well with MATLAB however.
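    To make the idea of a shortest path tree concrete, here is a small Python sketch. It assumes the simplest definition, where path length is the number of edges, so a breadth first search from the chosen root is enough; this is my own illustration and not necessarily the definition or code SPaTo uses. The function name shortest_path_tree is hypothetical.

```python
from collections import deque


def shortest_path_tree(adjacency, root):
    """Return {vertex: parent} for a tree of unweighted shortest paths.

    adjacency maps each vertex to a list of its neighbours.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in parent:  # first visit is along a shortest path
                parent[w] = v
                queue.append(w)
    return parent


# A square 1-2-3-4-1: vertex 3 can be reached via 2 or via 4.
square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
tree = shortest_path_tree(square, root=1)
print(tree)
```

    Vertex 3 is two steps from the root by either route, and the tree keeps only one of them, which is exactly why the choice of definition matters.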
  • VisANT – Integrative Visual Analysis Tool for Biological Networks and Pathways
    VisANT is specifically for the visual data-mining of multi-scale Bio-Network/Pathways.
  • NetDraw
    NetDraw is a free program for visualising social network data. It reads UCINET and pajek format files as well as working with its own formats.
  • NetzCope
    Supposedly a package for Windows or Linux designed for bipartite networks though works with unipartite ones too. Edge list or pajek formatted input accepted. Unable to find decent web site to download code.
  • JavaScript solutions are best for highly compatible web based solutions:-
    • JavaScript Information Visualization Toolkit (JIT) has various different layouts, some of a type I haven’t seen elsewhere in any language. Nice looking toolkit for interactive visualisation of small parts of graphs on screen. Thus looks useful if you need to use the information gleaned from graph analysis. Has several interesting ways of displaying the information.
    • D3.js is a wider set of JavaScript tools but contains several network visualisations.
    • BINViz is a visualization tool implemented in JavaScript for web based network visualization applications. An alternative BINViz location and screen shots.
    • js-mindmap javascript force-directed layout for graphs
    • Moowheel provides “a unique and elegant way to visualize data using Javascript and the canvas object. This type of visualization can be used to display connections between many different objects, be them people, places, things, or otherwise. The script is licensed under an MIT-style license.”
  • Graph-Easy.
    Written in perl, it converts between many formats.
    Export filters for Graphviz, VCG (Visualizing Compiler Graphs), GDL (Graph Description Language) and GraphML.
    Import filters for Graphviz, VCG and GDL.
    See also CPAN Graph-Easy pages.
  • pyNetConv
    pyNetConv is a Python library which converts various network file formats. In particular it knows about Pajek (.net and .clu), Cytoscape (.sif) and GML files.
  • Orange
    Orange is an open source data visualisation and analysis package which includes some network tools but goes way beyond that e.g. features for text mining and bioinformatics. Works through visual programming or Python scripting.
  • Sonivis
    Aims to create free network analysis and network mining software that will run on all major platforms.
  • NodeXL
    Aims to avoid difficult network applications, file formats or programming languages by working through Excel 2007. Windows only.
  • Network Workbench (NWB).
    A large-scale network analysis, modelling and visualisation toolkit for research. Free and Java based so cross platform. Easily installed on my Windows system. Has its own network file format (.nwb) but can also deal with Pajek, GraphML and XGMML formats, as well as the edge list format (.edge) and several other more specialised formats. It contains several other packages underneath including GUESS and JUNG amongst others. Seems too buggy for reliable use even in the official release version 1.0.0 from 2009. There appears to be no work on this project since then. Pity as it had the potential to be very powerful. For instance it was extremely easy to get a very nice visualisation of community structure.
  • Finding communities in large networks.
    These pages are focussed on MAM: a Multi-Level Aggregation Method for optimising modularity, a.k.a. the Louvain method of community detection. A C++ and a matlab version of the program are freely available for download and other information is there too.
  • CFinder – Clique Percolation method for finding overlapping communities in networks.
    These pages have free code and information on the Clique Percolation Method (CPM) of Palla et al., Nature 435, 814-818 (2005). This includes a list of papers using the method, visualisations and data sets.
  • Markov Cluster Algorithm (MCL)
    A fast clustering algorithm for graphs (community detection) devised by Stijn van Dongen. Seems to be popular in bioinformatics. The author’s web pages contain lots of information at different levels and implementations in perl and C, and links to implementations by others in R, Java and for MATLAB.
  • LEDA a commercial C++ library for graphs, Windows (Visual C++) and Linux. Has a free edition for academic users. I tried it ages ago and I didn’t get it to work for me. It seems to be used in GraphCrunch.
  • GraphCrunch finds the “best” model fitting given data on a network. It uses the LEDA library and is available for Windows or for Linux (the latter may need a little more effort to get working).
  • Map of Science (CX-NETS)
    Various network representations of the fields of scientific research and their relationships.
  • Places and Spaces: Mapping Science
    Various different types of networks of science.
  • visualcomplexity
    “For anyone interested in the visualisation of complex networks”. Seems to be as much about design and visuals as about science though there is no reason these should be incompatible. Indeed they may enhance each other. I rather like the example based on the cover versions of Love Will Tear Us Apart Again but it illustrates the fact that some of these examples have rather limited network content.
    Lots of resources here especially for books on network and visualisation and a wide range of links on networks including datasets and packages used.
  • SCImago Journal and Country Rank
    Ranks journals using a Google-like PageRank type method. This compensates for the facts that some areas may have longer bibliographies which push up their citation counts per paper. One still sees that if a field has twice as many people working in it and twice as many papers produced then its top journals are probably going to be more highly ranked. The same group also produces an Atlas of Science, a graphical representation of IberoAmerican science research.
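    For readers who have not met PageRank, here is a minimal Python sketch of the plain algorithm by power iteration. This is only the textbook version under my own assumptions (a damping factor of 0.85, a fixed number of iterations, and rank from nodes with no outgoing links shared equally); it is not SCImago's actual indicator. The function name pagerank and the toy citation data are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    links maps each node to the list of nodes it cites.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared equally.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            for t in targets:
                # Each citation passes on an equal share of the citer's rank.
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Toy example: journals A and B both cite C, and C cites A.
rank = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
for name in sorted(rank, key=rank.get, reverse=True):
    print(name, round(rank[name], 3))
```

    Because each citation passes on a share of the citing node's rank divided by the length of its reference list, a long bibliography does not by itself inflate the scores of the things it cites.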
  • Graph file formats. These are a mess. You really need a network to see how these relate to the different packages but luckily someone has already done this, see the UNC network data formats section. They include:
    • The Adjacency Matrix network format is a simple network format yet I have found larger examples generated from other places failed to work. This may well be the usual text file issues of spaces vs tabs or other whitespace as separators, or end-of-line formats. See the notes on the pajek net format. A triangle graph would be given as
         0 1 1
         1 0 1
         1 1 0
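      As a rough sketch of how such a file can be read, the Python below parses an adjacency matrix and lists each undirected edge once by looking only at the upper triangle. It sidesteps the separator and end-of-line issues by splitting on any whitespace. The function names are my own, and numbering vertices from 1 is an assumption.

```python
def read_adjacency_matrix(text):
    """Parse a whitespace-separated adjacency matrix into rows of numbers."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("adjacency matrix must be square")
    return rows


def undirected_edges(matrix):
    """List each undirected edge once, with vertices numbered from 1."""
    n = len(matrix)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only
            if matrix[i][j] != 0:
                edges.append((i + 1, j + 1))
    return edges


triangle = """
0 1 1
1 0 1
1 1 0
"""
print(undirected_edges(read_adjacency_matrix(triangle)))
# prints [(1, 2), (1, 3), (2, 3)]
```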
    • The Edge List format can be as simple as the source vertex and target vertex for each edge listed on its own line. If numbers are used watch that the first vertex may have to be 0 or 1, or it may be OK to use anything. Likewise it may be that all numbers from smallest to largest must be present, or the way missing numbers are dealt with may vary. Note vertices with no edges can not be specified, a drawback of this approach. In some cases the name of each vertex might be allowed, rather than some index. A triangle could be given as
         1 2
         2 3
         3 1
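      A minimal Python sketch of reading such a file is given below. It keeps vertex names as strings, so it does not care whether numbering starts at 0 or 1 or whether names are used instead. Since an edge list cannot mention vertices with no edges, these have to be supplied separately; the isolated argument and the function names are my own inventions for illustration.

```python
def read_edge_list(text):
    """Parse 'source target' lines; vertex names are kept as strings."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"expected two entries, got: {line!r}")
        edges.append((fields[0], fields[1]))
    return edges


def vertex_set(edges, isolated=()):
    """Collect vertices; isolated ones must be supplied separately."""
    vertices = {v for edge in edges for v in edge}
    vertices.update(isolated)
    return vertices


edges = read_edge_list("1 2\n2 3\n3 1\n")
print(edges)
# prints [('1', '2'), ('2', '3'), ('3', '1')]
print(sorted(vertex_set(edges, isolated=["4"])))
# prints ['1', '2', '3', '4']
```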
    • The pajek .net format is very simple, is comprehensive, widespread and does not waste too much space. The big advantage over edge lists is that it can encode vertices with no edges. The big disadvantage is that there is no precise standard I know of so several problems below are generic issues associated with text files. Experiment with the programme of interest and a simple example if needed. For instance pajek seems to prefer spaces as separators, tab separators may not be acceptable. Try the pajek wiki for help. There are likely to be several converters which might help, for instance try text2pajek and excel2pajek programs:
      The basic format for a fancy triangle graph is

         *Vertices 3
         1 "AAA"
         2 "Vertex 2"
         *Edges 2
         1 2 1.0
         3 1
         *Arcs 1
         2 3 1.5

      Here are some issues I have encountered:-

      • The separator between items is a space. Tabs do not seem to work.
      • The end of line seems to be the Windows text file standard, not sure how other standards are tolerated. If needed I find file compression programmes and my editor WinEdt can be used to change end of line formats in text files but you may need to delve into the options.
      • No space is tolerated between * and the keywords
      • The number of vertices and the number of edges or arcs must be specified.
      • pajek assumes the existence of the number of vertices specified after the keyword *Vertices, and they are indexed (numbered) from 1 to the number of vertices inclusive (so not starting from 0). If a vertex is not listed it will be created with no other information.
      • If a vertex is given explicitly, the first entry for each vertex is an index.
      • For each vertex there is an optional label. This must be in quotes. Not sure what characters are allowed by different programmes which read these files. At one point I got annoying labels in German if I did not include labels but not sure what was causing that.
      • The *Edges keyword indicates a list of undirected edges follows. These can coexist with directed edges given in a similar list starting with the keyword *Arcs followed by the number of directed edges.
      • The lists of edges and arcs have the index of the source vertex followed by the index of the target vertex.
      • Edges and arcs need not have a weight (or rather they are given a default value of 1.0) but if required a weight is given as a third entry after source and target vertices.
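      The points above can be turned into a small reader. The Python sketch below handles only the basic format described here (the *Vertices, *Edges and *Arcs sections, optional quoted labels, optional weights defaulting to 1.0) and is more tolerant than pajek itself, for instance about tabs. It is my own illustration under those assumptions, not a reference implementation, and the function name read_pajek is hypothetical.

```python
import shlex


def read_pajek(text):
    """Read the basic pajek .net format.

    Returns (labels, edges, arcs): labels maps vertex index to its label
    or None; edges and arcs are lists of (source, target, weight).
    """
    labels, edges, arcs = {}, [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            fields = line.split()
            section = fields[0].lower()
            if section == "*vertices":
                # All vertices 1..n exist even if never listed.
                labels = {i: None for i in range(1, int(fields[1]) + 1)}
            continue
        fields = shlex.split(line)  # keeps "Vertex 2" as one field
        if section == "*vertices":
            labels[int(fields[0])] = fields[1] if len(fields) > 1 else None
        elif section in ("*edges", "*arcs"):
            weight = float(fields[2]) if len(fields) > 2 else 1.0
            bucket = edges if section == "*edges" else arcs
            bucket.append((int(fields[0]), int(fields[1]), weight))
    return labels, edges, arcs


sample = """*Vertices 3
1 "AAA"
2 "Vertex 2"
*Edges 2
1 2 1.0
3 1
*Arcs 1
2 3 1.5
"""
labels, edges, arcs = read_pajek(sample)
print(labels)
print(edges)
print(arcs)
```

      Note that vertex 3 is never listed but still appears, with no label, because the count after *Vertices says it exists.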
    • GraphML (.graphml files) is an XML based format and seems widely used e.g. read only in JUNG, read and write in igraph, used by visone, yEd and yfiles. However the core format only covers basic graph topology so other features (e.g. edge colour or style) are additions that must be defined in the relevant packages. According to the Gephi notes, NodeXL, Sonivis, GUESS, NetworkX and Gephi can handle this format to some extent.
    • pajek (.net files) uses a simple format and is produced or read by many other systems.
    • Graphviz .gv and DOT files are often read or written by other systems. I found the description of .dot files extremely confusing. However a few simple examples show it is basically very simple. I used the python packages NetworkX and pydot to find out what was going on. Here are some simple examples, one undirected, one directed.
      strict graph G {
      0 -- 1;
      0 -- 2;
      0 -- 3;
      1 -- 2;
      1 -- 3;
      2 -- 3;
      }

      digraph G {
      1 -> 3;
      1 -> 2;
      0 -> 1;
      0 -> 3;
      0 -> 2;
      2 -> 3;
      }
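      Files like these are easy to generate from your own code. The Python sketch below writes the same simple subset of DOT shown in the examples, choosing the header and the edge connector according to whether the graph is directed. The function name to_dot is hypothetical and only plain vertex names without spaces are assumed.

```python
def to_dot(edges, directed=False, name="G"):
    """Return DOT text for a list of (source, target) pairs."""
    header = "digraph" if directed else "strict graph"
    connector = "->" if directed else "--"
    lines = [f"{header} {name} {{"]
    for source, target in edges:
        lines.append(f"{source} {connector} {target};")
    lines.append("}")  # every graph must end with a closing brace
    return "\n".join(lines)


print(to_dot([(0, 1), (0, 2), (1, 2)]))
print(to_dot([(0, 1), (1, 2)], directed=True))
```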
    • GDL – Graph Description Language.
    • GML – Graph Modelling Language (.gml files) seems like an old format born in 1995. According to the Gephi notes, Pajek (my version doesn’t seem to), yEd, LEDA, NetworkX and Gephi can handle this format. Used for instance for the various network data sets of Mark Newman. This is quite different from GraphML.
    • XGMML (eXtensible Graph Markup and Modeling Language) is an XML application based on GML.
    • Network WorkBench files (.nwb files) used by Network WorkBench.
    • Cytoscape (.sif) files.

    There must be many ways of converting between them all. For instance try the format conversion page of NeAT for edge lists, adjacency matrices, GML and DOT files. Alternatively look at the conversion programmes pyNetConv and Graph-Easy. Matthieu Latapy has written some useful network routines including format conversions. The list of input and output formats against programmes on the Wikipedia page for Social network analysis software is useful too.
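    For very simple cases it may be quicker to write your own converter. As an illustration, the Python sketch below turns an edge list into basic pajek .net text, using single spaces as separators and Windows line endings as pajek seemed to prefer. The function name edges_to_pajek and the generated labels "v1", "v2" and so on are my own assumptions.

```python
def edges_to_pajek(edges, n_vertices=None):
    """Write undirected edges (1-based integer pairs) as pajek .net text.

    n_vertices may exceed the largest index so that isolated vertices,
    which an edge list cannot express, are still declared.
    """
    largest = max((max(s, t) for s, t in edges), default=0)
    if n_vertices is None:
        n_vertices = largest
    if largest > n_vertices or any(min(s, t) < 1 for s, t in edges):
        raise ValueError("vertex indices must lie in 1..n_vertices")
    lines = [f"*Vertices {n_vertices}"]
    lines += [f'{i} "v{i}"' for i in range(1, n_vertices + 1)]
    lines.append(f"*Edges {len(edges)}")
    lines += [f"{s} {t}" for s, t in edges]  # single spaces, no tabs
    # Windows line endings, which pajek seemed to expect.
    return "\r\n".join(lines) + "\r\n"


net_text = edges_to_pajek([(1, 2), (2, 3), (3, 1)], n_vertices=4)
print(net_text)
```

    If you save the result from Python, open the file with newline="" so that the line endings are written exactly as given.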

  • Network Data Sets. Numerous entries above contain network data sets – search for data or data set on this web page. If you are really desperate you could use DataThief to lift data from figures. For electronic data sets try these places:-
  • American College Football network of Girvan and Newman. Mark Newman provides a football.gml file which contains the network of American football games between Division IA colleges during regular season Fall 2000. The file asks you to cite M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002). There are two issues with the original GN file. First, three teams met twice in one season so the graph is not simple. This is easily dealt with if required. Secondly, the assignments to conferences, the node values, seem to be for the 2001 season and not the 2000 season. The games do appear to be for the 2000 season as stated. For instance the Big West conference existed for football till 2000 while the Sun Belt conference was only started in 2001. Also there were 11 conferences and 5 independents in 2001 but 10 conferences and 8 independents in 2000. I have provided a set of files footballTSE* which define a simple graph with the correct conference assignments in a figshare archive here:- Corrected American College Football Network zip file. The details about the problems with this data and the solutions are given in T.S. Evans, “Clique Graphs and Overlapping Communities”, J. Stat. Mech. (2010) P12037 [arXiv:1009.0638] which would be the appropriate source to cite along with the original GN publication.
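    Turning a graph with repeated edges into a simple graph takes only a few lines. The Python sketch below is a generic illustration with made up data, not the procedure used to build the footballTSE files: it drops self-loops and keeps one copy of each undirected edge. The function name simplify is hypothetical.

```python
def simplify(edges):
    """Drop self-loops and repeated undirected edges, keeping first order."""
    seen = set()
    simple = []
    for s, t in edges:
        if s == t:
            continue  # a simple graph has no self-loops
        key = (s, t) if s <= t else (t, s)  # orientation is irrelevant
        if key not in seen:
            seen.add(key)
            simple.append(key)
    return simple


# Made up example: teams 1 and 2 meet three times, once listed as (2, 1).
games = [(1, 2), (2, 3), (2, 1), (3, 1), (1, 2)]
print(simplify(games))
# prints [(1, 2), (2, 3), (1, 3)]
```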
  • Analysis of Spatial Data. This can be related to some network analysis
    • sp is a package for spatial data – points, lines, polygons, grids etc.
    • MapWindow is a free and open source spatial data viewer and geographic information system. It is built on MapWinGIS and has other interesting add ons. Worked with shape files on my Windows PC but I have yet to push it.
    • OpenGeoDa is good for working with spatial data analysis. They also have a page listing OpenGeoDa affiliated software.