Thursday, 14 February 2013

Matching Problems II


A typically uninteresting week and a half since my last post, and not a lot of results to show for it unfortunately. I have spent the time attempting to resolve an issue with multi-matching algorithm, but to no avail.

Analysis of the processes (and when I say processes I mean the hundreds and thousands of lines of information output at each stage of the coarsening scheme) showed that the cause comes from minute decisions when choosing matchings. It appears as though these decisions come from the method of storing adjacent vertices within each vertex as opposed to a set of edges - and will only occur in large graphs with many levels (graphs over 20k vertices normally present with more levels in the multilevel scheme).

Attempted fixes did little to fix the issue (but did result in a more efficient coarsening algorithm - thank you hash maps!).

In any case, I suspect my methodology is flawed as I am iterating over vertices normally - it gives vertices at the beginning more chance of being matched whereas though near the end will have less available partners. For now, I need to check with my supervisor's previous work in order to determine whether the number of coarser levels is comparable, or whether it is some part of my code that is giving bad answers.

As a break away from problem solving, I intend to spend the rest of this week on the new matching and pre-processing techniques (using only standard edge contraction - known to work as expected). This will enable me to play around with some of the ideas and get some preliminary results.

Interesting thought for the day - it seems finan512, a ring of repeated groupings of vertices, does not actually coarsen to a ring in any of my coarsening. Instead, reducing to a sparse type graph. Looking at the structure makes me think that if I were to identify these repeated vertex groupings, I could provide layout for one and then just replicate the same layout around the circumference of the ring. This is a play on identifying automorphisms[?] within the graph and placing them in appropriate locations as suggested long ago by Lipton et al. It is a bit biased to just the finan512 graph, but if they could be identified with ease in any graph - would it help? not really my concern at the moment, but food for thought.

Wednesday, 6 February 2013

Back to MPhil? Oh, and the multi-matching is broken too

Interesting week this week, so most annoying and worrisome topic first: looks like I am still an MPhil student. It has become delightfully apparent that the examiners have not yet sent the Assessment Outcome form (of MPhil to PhD), and so as a result, I am still registered as an MPhil student. I just hope it doesn't interfere with my plans for finishing in 3 years (the Research Committee have been stubborn regarding forms in the past, and if they read the form as me having just completed the transfer, may dislike the idea of me trying to go for PhD with such small amount of time left).

Now that I have complained about it a little, I can move on to the next complaint: the multi-matching and its dislike of certain graphs. In the past few posts, I have reported good results regarding the reduction of runtime for many smaller graphs; however, tests on larger graphs took longer to run and so I was left waiting for those results.

It seems only a few of these tests came back with positive results, with some, specifically finan512, coming back with much higher runtimes (nearly 8x the original runtime measured in the Redosmium/MGF implementation). This extended runtime exists for all matching techniques, including the standard edge contraction.

The main difference in algorithms is the multi-matching, which relies on adjacent vertices being stored as a collection that is associated with individual vertices (similar to Walshaw, O(2E)), as opposed to the previous method of storing them as a separate collection (O(E)). Having received results for smaller graphs which were similar to the MGF, it was an assumption that the new coarsening algorithm and method of storing worked as it should of - these new results suggest differently.

As mentioned, finan512 has the unusually high runtime. Some analysis shows that the number of levels in the multilevel scheme has risen from 35 (on average) to over 600; reinforcing the notion that the problem lies in the coarsening scheme. Having investigated, it is difficult to see where the problem is coming from - the coarsening scheme runs as expected for smaller graphs.

Unable to see the problem, I have begun getting restless and irritated (perhaps why this MPhil thing has annoyed me more than it should). In any case, I plan to look at why finan512 and sierpinski10 have issues while others, such as dime20, do not, and if possible, fix this multi-matching/coarsening scheme so I can get some reliable results.

As a finishing note, it's been a while since I have shown any pictures of output, so here's two more recent layouts I have found particularly pleasing or interesting.

add20 drawn using a circular placement primitive (originally for smaller star type graphs but accidentally applied to the original graph - oddly enough).

dime20 drawn very quickly with a matching number of vertices (so any vertices with 7 or less adjacent vertices coarsened at once). The layout is very pleasing and seems not to show any issues with the local layout, however, the global layout appears tangled still. Investigation into the cause would be ideal (and would help global layout generation for huge graphs) but due to the above problem with multi-matching, is not yet a concern.

Friday, 1 February 2013

Primitive Coarsening

Primitive Coarsening - identifying primitive graph layouts during the coarsening phase, so to avoid any further coarsening deemed unnecessary and wasteful, and to provide expected positions to vertices.

Having presented my ideas and preliminary results to my primary supervisor, we discussed  a long standing topic in my research - how to deal with graphs when they are coarsened to a primitive graph type such as a trail (chain/path/thing), ring or star of connected vertices. Previously, our implementation of the multilevel scheme has either ignored these and continued to coarsen to two vertices - or if the diffference between levels is less than some tolerance (normally 1 vertex), stop coarsening.

Trailing (new name pending) and Multi-matching has been shown to decrease runtime, with trailing being the more appropriate method (Multi-matching requires a matching number, and each graph may run better or worse with matching numbers, which have so far been unpredictable - should put some results on here soon I guess). Having shown that Trailing works so far, and copes with graphs when coarsened to a trail, we look at the star and ring primitives.

So some some definitions:

  • Trail/Chain/Path - two vertices with only 1 adjacent vertex (the ends), and all other vertices between having 2 adjacent vertices.
  • Star - one vertex (the center) having V-1 adjacent vertices, and V-1 vertices have 1 adjacent vertex.
  • Ring - every vertex having 2 adjacent vertices.

As these structures have a known layout, FDP is unnecessary. The layout can be computed quickly and applied to this level, providing the graph with an "initial layout". This is immediately passed to the next finest graph and does not require refinement.

So far, trailing has providing useful as a preprocessing for the coarsening phases, and results in large reductions in runtime for Add20 and Add32, with some small changes to other more regular structured graphs. Quality, on average, has some small improvement across the board, however, the range makes it rather unreliable - so we need some further investigation on this.

Stopping coarsening when a primitive structure is found further decreases this runtime; as an example, Add20 was previously 10seconds with MGF, this was decreased to 4.7 seconds with Trailing, and further decreased to 3.6seconds with the introduction of Primitive Identification/Coarsening. This is still slower than some of the methods used by Hu, but is a vast improvement on the multilevel runtimes.

Next, I have intention to try an alternate (or additonal) preprocessing method which identifies small star structures in a graph (which may be connected to other star structures), and coarsen those to simplify a graph when trails to not exist but when vertices loosely connected do.

For now, there is a lot of clearing up to do regarding the work and theory, and many many results to collect, analyse and write up. I shall update again and put everything into a less erratic form in the future. Back to work for me, have a good weekend!