YouTube video tutorials on Antz (thanks to Joe Adams and other Antzers)
Advanced Visualization of Scopus Search Results (and other datasets)
My first application of Antz was an attempt to visualize search results from the Scopus citation database. The search was semi-arbitrary, consisting of several keywords of interest to scientists working on ant research (thus the name, Antz) within the context of complex biological systems.
The dataset itself consists of 17,118 records of publications returned by Scopus based on keyword search terms included in this file.
This data is derived from the "All_Terms.xls" spreadsheet. It is a subset of data that cleaned up fairly well.
The dataset may be explored in a variety of ways, some of which are included here.
Exploration Method 1: Index Keyword Count Distribution Per Article
With this method of data exploration, we begin by performing a frequency count on the number of index keywords for each of the 17,118 articles. We get the following results:
Contrast this with the frequency distribution of the number of authors for each article:
Author Count Per Article
The above distribution is uniform and to be expected. However, the distribution of the index keywords is non-uniform. If we evaluate the index keyword distribution based on journal (i.e. source title), we observe the results in the image below:
Index Keyword Distributions Per Journal (Source Title)
The Journal of Chemical Ecology includes articles with index keyword counts in the range of 25-40 keywords per article. This appears to have a great influence on any distributions of search terms. Thus the terms 'chem' and 'eco' are expected to occur more frequently at higher index keyword counts than terms such as 'tech' or 'info', which would be expected to occur more frequently at index keyword counts between 1 and 10. This provides a method of visually categorizing search results based on terminology.
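The counting step behind these distributions can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the Antz dataset: the field names `source_title` and `index_keywords`, the semicolon separator, and the sample records are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical records standing in for rows of the Scopus export.
# Field names and the ";" separator are assumptions for this sketch.
records = [
    {"source_title": "Journal A", "index_keywords": "ants; behavior; pheromone"},
    {"source_title": "Journal A", "index_keywords": "ants; foraging"},
    {"source_title": "Journal B", "index_keywords": "robotics"},
    {"source_title": "Journal B", "index_keywords": ""},
]

def keyword_count(record, field="index_keywords", sep=";"):
    """Number of non-empty entries in a delimited keyword field."""
    return sum(1 for kw in record[field].split(sep) if kw.strip())

def count_distribution(records):
    """Map each keyword count to the number of articles that have it."""
    return Counter(keyword_count(r) for r in records)

def distribution_by_journal(records):
    """The same distribution, split by source title."""
    per_journal = {}
    for r in records:
        counts = per_journal.setdefault(r["source_title"], Counter())
        counts[keyword_count(r)] += 1
    return per_journal

overall = count_distribution(records)
by_journal = distribution_by_journal(records)
```

Pointing `field` at an author field (with the matching separator) would give the author count distribution instead.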
For example, using the distribution for Index Keyword Word Count Per Article, we perform a search on the following 12 terms:
statistical, physical, animal, optical, insect, behavior, chemical, ecology, chem, ecol, biochem, bio
This search is processed and categorized based on the frequency distribution of index keywords. The image below shows the distribution of the results. The term "behavior" appears to dominate at index keyword counts of about 10 per article.
In the range of higher values of index keyword frequencies, we observe different search terms dominating:
This suggests the visualization technique may serve as a visual method for differentiating between tendencies among journals to use varying numbers of keywords.
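As a rough sketch of how such a categorized search might be tallied, the Python below groups articles by their index keyword count and, within each group, counts how many articles match each of the 12 terms. The matching rule (case-insensitive substring match against the keyword text), the semicolon separator, and the sample keyword strings are assumptions for illustration; the source does not specify how Antz processes the query.

```python
from collections import Counter, defaultdict

TERMS = ["statistical", "physical", "animal", "optical", "insect", "behavior",
         "chemical", "ecology", "chem", "ecol", "biochem", "bio"]

# Made-up index keyword strings, one per article (illustrative only).
articles = [
    "Insect behavior; Chemical ecology",
    "Animal behavior; Foraging",
    "Optical sensors",
]

def split_keywords(text, sep=";"):
    """Lower-cased, non-empty keywords from a delimited string."""
    return [kw.strip().lower() for kw in text.split(sep) if kw.strip()]

def terms_by_keyword_count(articles, terms):
    """For each keyword count, count the articles matching each term."""
    bins = defaultdict(Counter)
    for text in articles:
        keywords = split_keywords(text)
        joined = " ".join(keywords)
        for term in terms:
            if term in joined:  # assumed rule: substring match
                bins[len(keywords)][term] += 1
    return bins

bins = terms_by_keyword_count(articles, TERMS)
dominant = {count: tally.most_common(1)[0][0] for count, tally in bins.items()}
```

With substring matching, a stem such as 'chem' also matches 'chemical', so the tallies for overlapping terms are not independent.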
Try your own query using the following terms: tech, info, physic, mech, eng, phy, chem, bio, eco, psy, anim, plant. The resulting visualization should resemble the image below. Color-coding runs counterclockwise from the top, beginning with red, green, and blue: red at the 12 o'clock position, green at 11 o'clock, blue at 10 o'clock, and so on:
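The counterclockwise placement can be written out as a small Python sketch. It assumes that the terms are assigned to positions in query order, which the text does not confirm, and it fills in only the three colors that are stated (red, green, blue); the remaining colors are left unset rather than guessed.

```python
TERMS = ["tech", "info", "physic", "mech", "eng", "phy",
         "chem", "bio", "eco", "psy", "anim", "plant"]

# Only the first three colors are given in the text; the rest are unknown here.
KNOWN_COLORS = ["red", "green", "blue"]

def clock_layout(terms):
    """Return (term, clock hour, degrees counterclockwise from top, color).

    Assumes query order maps to positions 12, 11, 10, ... around the toroid.
    """
    if len(terms) > 12:
        raise ValueError("at most 12 terms fit on the clock face")
    layout = []
    for i, term in enumerate(terms):
        hour = 12 - i                      # 12, 11, 10, ..., 1
        angle_ccw = i * 30                 # 360 degrees / 12 positions
        color = KNOWN_COLORS[i] if i < len(KNOWN_COLORS) else None
        layout.append((term, hour, angle_ccw, color))
    return layout

layout = clock_layout(TERMS)
```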
Exploration Method 2: Author Count Per Article
This is a similar query to method 1 above, but instead of performing an initial count on index keywords per article, we perform a count on the number of authors per article, then search further on 12 terms.
Try running a search for each of these first two methods using the same list of terms to compare results and to get a better understanding of how to interpret this visualization technique. For example, the following terms return the image below:
physic, chem, bio, eco, anthro, cosmo, tech, engineer, science, life, animal, plant
Below is an image of the results comparing method 1 with method 2:
Notice the distribution of the large gray main toroids and their similarity to the bar chart below:
In fact, we could break the above bar chart down into a stacked 3D chart using the results of our 12 search terms (exercise for the reader). Our toroid visualization is roughly equivalent to a stacked bar chart in terms of basic information representation. However, what if we were to then perform a tertiary search based on yet another field such as year, journal, or a different set of keywords?
Exploration Method 3
This is a basic search of author publications vs. citations.
Exploration Method 4
Top Level Search on Year Parsed with 12 Terms
Exploration Method 5
Individual Publications Latitude, Longitude, Year, Citations, and Journal Parsed With Search Terms
Exploration Method 6
All Publications For Specific Journal With Latitude, Longitude, Year, Citations Parsed With Search Terms
3rd Level Search on Year
Instead of using author or keyword counts as a starting point, we could use the correlation matrix distribution between article and journal.
Another option would be to use one of the journal distributions (see image below):
Advanced Visualizations of the Antz Citation Database. Pin and main toroid color represent journal (the top 20 are colored, all others are white); main toroid size represents author citations (the big brown "mother ship" is Per Bak, for example); octahedra color represents the keyword search term, and octahedra size represents the number of occurrences of that term.
2009 data on student population (male vs. female), # of desks/chairs.
Only a single month (April) is fully represented with data to simplify the visualization.
The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders.
Data Set Information:
This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.
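A minimal Python sketch of reading a table with this layout and splitting the recordings by the "status" column is given below. The "name" and "status" columns come from the description above; the `measure_a` column and the three sample rows are made up for illustration and are not taken from the real data.

```python
import csv
import io

# Made-up sample in the layout described above: one row per recording,
# one column per voice measure, plus "name" and "status".
SAMPLE = """name,measure_a,status
rec_001,0.12,1
rec_002,0.08,0
rec_003,0.15,1
"""

def split_by_status(rows):
    """Group recording names by status: 0 = healthy, 1 = PD."""
    groups = {0: [], 1: []}
    for row in rows:
        groups[int(row["status"])].append(row["name"])
    return groups

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
groups = split_by_status(rows)
```

For a real file, passing an open file handle to `csv.DictReader` in place of the `io.StringIO` wrapper would work the same way.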
The following images were generated by Dave Warner and others. They were derived from the above datasets and modified to serve as hypothetical 'visionary' visualizations for demonstration purposes: