Antz

More Antz

Explore more pics on my Picasa site.

YouTube video tutorials on Antz (thanks to Joe Adams and other Antzers)

 




Advanced Visualization of Scopus Search Results (and other datasets)

My first application of Antz was an attempt to visualize search results from the Scopus citation database. The search was semi-arbitrary, consisting of several keywords of interest to scientists working on ant research (thus the name, Antz) within the context of complex biological systems.

The dataset itself consists of 17,118 publication records returned by Scopus for the keyword search terms included in this file.

This data is derived from the "All_Terms.xls" spreadsheet. It is the subset of the data that cleaned up fairly well.

The dataset may be explored in a variety of ways, some of which are included here.

 


Exploration Method 1: Index Keyword Count Distribution Per Article

With this method of data exploration, we begin by performing a frequency count on the number of index keywords for each of the 17,118 articles. We get the following results:
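As an illustration of the counting step itself, here is a minimal Python sketch. It assumes each record is a dictionary with an "Index Keywords" field holding a semicolon-separated string; that field name, the separator, and the sample records are assumptions made for the example, not details taken from the dataset.

```python
from collections import Counter

def keyword_count_distribution(records, field="Index Keywords", sep=";"):
    """Map 'number of index keywords' to 'number of articles with that many'."""
    dist = Counter()
    for rec in records:
        raw = rec.get(field) or ""
        # Split the keyword string and ignore empty fragments.
        keywords = [k.strip() for k in raw.split(sep) if k.strip()]
        dist[len(keywords)] += 1
    return dist

# Made-up records, for illustration only.
sample = [
    {"Title": "A", "Index Keywords": "ant; pheromone; behavior"},
    {"Title": "B", "Index Keywords": "ant; colony"},
    {"Title": "C", "Index Keywords": ""},
    {"Title": "D", "Index Keywords": "insect; ecology; chemistry"},
]

dist = keyword_count_distribution(sample)
for n in sorted(dist):
    print(n, "keywords:", dist[n], "article(s)")
```

The same function could be pointed at an author field to produce the author count distribution discussed next.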

Contrast this with the frequency distribution of # of authors for each article:

Author Count Per Article

The above distribution is uniform and to be expected. However, the distribution of the index keywords is non-uniform. If we evaluate the index keyword distribution based on journal (i.e. source title), we observe the results in the image below:

Index Keyword Distributions Per Journal (Source Title)

Index Keywords Per Journal

The Journal of Chemical Ecology includes articles with 25-40 index keywords per article. This appears to have a strong influence on the distribution of any search terms. Thus the terms 'chem' and 'eco' are expected to occur more frequently at higher index keyword counts than terms such as 'tech' or 'info', which would be expected to occur more frequently at index keyword counts between 1 and 10. This provides a method of visually categorizing search results based on terminology.

For example, using the distribution for Index Keyword Word Count Per Article, we perform a search on the following 12 terms:

statistical, physical, animal, optical, insect, behavior, chemical, ecology, chem, ecol, biochem, bio

This search is processed and categorized based on the frequency distribution of index keywords. The image below shows the distribution of the results. The term "behavior" appears to dominate at index keyword counts of about 10 per article.

In the range of higher values of index keyword frequencies, we observe different search terms dominating:

 

This suggests the visualization technique may serve as a visual method for differentiating between tendencies among journals to use varying numbers of keywords.
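The sub-search described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: records are dictionaries with a semicolon-separated "Index Keywords" field, and a term counts as a hit when it appears as a case-insensitive substring of any keyword. How the original query matched terms is not stated on this page, so the matching rule and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def term_hits_by_keyword_count(records, terms, field="Index Keywords", sep=";"):
    """For each keyword count, tally how many articles match each search term."""
    hits = defaultdict(Counter)
    for rec in records:
        raw = rec.get(field) or ""
        keywords = [k.strip().lower() for k in raw.split(sep) if k.strip()]
        n = len(keywords)  # top-level bin: number of index keywords
        for term in terms:
            # Assumed rule: substring match against any single keyword.
            if any(term in k for k in keywords):
                hits[n][term] += 1
    return hits

# Made-up records, for illustration only.
records = [
    {"Index Keywords": "Behavior; Insect; Ant"},
    {"Index Keywords": "Chemical ecology; Biochemistry; Pheromone; Insect"},
]
terms = ["behavior", "insect", "chem", "bio"]

hits = term_hits_by_keyword_count(records, terms)
for n in sorted(hits):
    print(n, dict(hits[n]))
```

Note that overlapping terms such as 'chem' and 'biochem' will both match the same keyword under this rule, so per-term counts are not mutually exclusive.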

 

Try your own query

Get Index Keyword Count With 12 Term Sub-Search

Try your own query using the following terms: tech, info, physic, mech, eng, phy, chem, bio, eco, psy, anim, plant. The resulting visualization should resemble the image below. Colors are assigned counterclockwise by clock position, starting from the top: red at 12 o'clock, green at 11, blue at 10, and so on:

Here is the csv file.
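The clock-style layout described above can be sketched as follows. The sketch assumes the first term in the list sits at 12 o'clock and each later term moves one hour counterclockwise; that ordering is an assumption. Only the three colors named on this page are filled in, and the function name and output fields are hypothetical.

```python
import math

# Only these three colors are stated on the page; the rest are left unset.
KNOWN_COLORS = {12: "red", 11: "green", 10: "blue"}

def clock_layout(terms):
    """Place up to 12 terms counterclockwise on a unit circle, first at 12 o'clock."""
    if len(terms) > 12:
        raise ValueError("this layout holds at most 12 terms")
    layout = []
    for i, term in enumerate(terms):
        hour = 12 - i                 # 12, 11, 10, ..., 1
        theta = math.radians(30 * i)  # angle counterclockwise from the top
        layout.append({
            "term": term,
            "hour": hour,
            "color": KNOWN_COLORS.get(hour),         # None where the page gives no color
            "x": round(-math.sin(theta), 3) + 0.0,   # + 0.0 turns -0.0 into 0.0
            "y": round(math.cos(theta), 3) + 0.0,
        })
    return layout

terms = ["tech", "info", "physic", "mech", "eng", "phy",
         "chem", "bio", "eco", "psy", "anim", "plant"]
layout = clock_layout(terms)
for slot in layout[:4]:
    print(slot)
```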


Exploration Method 2: Author Count Per Article

This is a similar query to method 1 above, but instead of performing an initial count on index keywords per article, we perform a count on the number of authors per article, then search further on 12 terms.
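Since methods 1 and 2 differ only in which field supplies the top-level count, both can be expressed with one function. The following Python sketch assumes semicolon-separated "Authors" and "Index Keywords" fields and substring matching of the terms against the keyword text; the field names, separators, matching rule, and sample records are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def count_items(raw, sep):
    """Count the non-empty items in a delimited string."""
    return len([p for p in (raw or "").split(sep) if p.strip()])

def sub_search(records, terms, group_field, group_sep=";",
               search_field="Index Keywords"):
    """Bin articles by the item count of group_field, then tally term hits per bin."""
    out = defaultdict(Counter)
    for rec in records:
        group = count_items(rec.get(group_field), group_sep)
        text = (rec.get(search_field) or "").lower()
        for term in terms:
            if term in text:
                out[group][term] += 1
    return out

# Made-up records, for illustration only.
records = [
    {"Authors": "Author A; Author B", "Index Keywords": "Biology; Plant"},
    {"Authors": "Author C", "Index Keywords": "Chemistry; Ecology; Animal"},
    {"Authors": "Author D; Author E", "Index Keywords": "Physics"},
]
terms = ["physic", "chem", "bio", "eco", "animal", "plant"]

by_keywords = sub_search(records, terms, "Index Keywords")  # method 1
by_authors = sub_search(records, terms, "Authors")          # method 2
print("method 1:", {n: dict(c) for n, c in sorted(by_keywords.items())})
print("method 2:", {n: dict(c) for n, c in sorted(by_authors.items())})
```

Running both groupings over the same term list makes it easier to see how the choice of top-level count redistributes the same hits.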

 

Try your own query

Get Author Count With 12 Term Sub-Search

Try running a search for each of these first two methods using the same list of terms to compare results and to get a better understanding of how to interpret this visualization technique. For example, the following terms return the image below:

physic, chem, bio, eco, anthro, cosmo, tech, engineer, science, life, animal, plant

Below is an image of the results comparing method 1 with method 2:

Notice the distribution of the large gray main toroids and their similarity to the bar chart below:

In fact, we could break the above bar chart down into a stacked 3D chart using the results of our 12 search terms (an exercise for the reader). Our toroid visualization is roughly equivalent to a stacked bar chart in terms of basic information representation. However, what if we were to then perform a tertiary search based on yet another field, such as year, journal, or a different set of keywords?


Exploration Method 3

This is a basic search of author publications vs. citations.

Try your own query

Get Author # of Citations vs. # of Publications sorted alphabetically.
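A per-author tally of this kind could be built as in the Python sketch below. It assumes semicolon-separated "Authors" and a numeric "Cited by" field, and it credits every co-author with the full citation count of each paper. Those field names and the crediting rule are assumptions, not details confirmed by this page.

```python
from collections import defaultdict

def author_stats(records, author_field="Authors", cite_field="Cited by", sep=";"):
    """Return (author, publications, citations) tuples sorted alphabetically."""
    stats = defaultdict(lambda: [0, 0])  # author -> [publications, citations]
    for rec in records:
        cites = int(rec.get(cite_field) or 0)  # empty citation field counts as 0
        raw = rec.get(author_field) or ""
        # A set avoids double-counting an author listed twice on one paper.
        authors = {a.strip() for a in raw.split(sep) if a.strip()}
        for author in authors:
            stats[author][0] += 1
            stats[author][1] += cites
    return sorted((name, pubs, cites) for name, (pubs, cites) in stats.items())

# Made-up records, for illustration only.
records = [
    {"Authors": "Author A; Author B", "Cited by": "10"},
    {"Authors": "Author B", "Cited by": "5"},
    {"Authors": "Author C; Author A", "Cited by": ""},
]

table = author_stats(records)
for name, pubs, cites in table:
    print(name, pubs, cites)
```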

 


Exploration Method 4

Top Level Search on Year Parsed with 12 Terms

# of Titles Per Year Parsed with 12 Terms

# of Titles Per Year Parsed with 12 Terms Selected From List of Top 1000 terms (index keywords)

 


Exploration Method 5

Individual Publications Latitude, Longitude, Year, Citations, and Journal Parsed With Search Terms

Alternate search on just the Top 20 Journals. (Journal list and color-coded legend TBD)

Alternate search on just the Top 12 Journals. (Journal list and color-coded legend TBD)

 


Exploration Method 6

All Publications For Specific Journal With Latitude, Longitude, Year, Citations Parsed With Search Terms

 


Future Directions

3rd Level Search on Year

3rd Level search on year (under development)


Instead of using author or keyword counts as a starting point, we could use the correlation matrix distribution between article and journal.

Article-Journal Correlation Matrix

Correlation for Index Keywords Between First 50 Titles and Top 96 Journals
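This page does not say how the article-journal correlation is computed. As a rough stand-in, the Python sketch below scores each article against each journal using the Jaccard similarity between the article's index keywords and the pooled keywords of all articles in that journal. The similarity measure, the "Source title" and "Index Keywords" field names, and the sample records are all assumptions.

```python
def keyword_set(raw, sep=";"):
    """Lower-cased set of the keywords in a delimited string."""
    return {k.strip().lower() for k in (raw or "").split(sep) if k.strip()}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def article_journal_matrix(records, kw_field="Index Keywords",
                           journal_field="Source title"):
    """Rows are articles, columns are journals (sorted by name)."""
    journal_kw = {}
    for rec in records:
        pooled = journal_kw.setdefault(rec[journal_field], set())
        pooled.update(keyword_set(rec.get(kw_field)))
    journals = sorted(journal_kw)
    matrix = [
        [jaccard(keyword_set(rec.get(kw_field)), journal_kw[j]) for j in journals]
        for rec in records
    ]
    return journals, matrix

# Made-up records, for illustration only.
records = [
    {"Source title": "Journal X", "Index Keywords": "ant; pheromone"},
    {"Source title": "Journal X", "Index Keywords": "ant; colony"},
    {"Source title": "Journal Y", "Index Keywords": "colony; network"},
]

journals, matrix = article_journal_matrix(records)
print(journals)
for row in matrix:
    print([round(v, 2) for v in row])
```

Because each article contributes to its own journal's pooled keywords, scores against the home journal are biased upward; a more careful version would leave the article out of the pool when scoring it.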

 

Another option would be to use one of the journal distributions (see image below):

Index Keywords Per Journal

 

Advanced Visualizations

Advanced visualizations of the Antz citation database. Pin and main toroid color represent the journal (the top 20 are colored; all others are white). Main toroid size represents author citations (the big brown "mother ship" is Per Bak, for example). Octahedron color represents the keyword search term, and octahedron size represents the number of occurrences of that term.

Afghanistan - 2009 Nangarhar Province School Data (students, resources):

2009 data on student population (male vs. female), # of desks/chairs.

 

Visualization of 12 Parameters for Three Wine Classes of White Wine From Spain

 

Single Season for the Arizona Diamondbacks

Only a single month (April) is fully represented with data to simplify the visualization.

Baseball Viz

 

Parkinson's Disease Patient Speech Pattern Visualization

Source:

The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The original study published the feature extraction methods for general voice disorders.


Data Set Information:

This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.

The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem '@' robots.ox.ac.uk).
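Given the layout described above (one row per recording, a "name" column, and a "status" column of 0 or 1), a first look at the data could split one voice measure by status. The Python sketch below does this on a tiny inline CSV; the "measure_1" column and all of the values are made up for illustration and do not come from the real dataset.

```python
import csv
import io
from collections import defaultdict

def summarize_by_status(csv_text, measure):
    """Return {status: (number of recordings, mean of the chosen measure)}."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[int(row["status"])].append(float(row[measure]))
    return {s: (len(v), sum(v) / len(v)) for s, v in groups.items()}

# Made-up rows; "measure_1" is a hypothetical stand-in for a real voice measure.
sample = """name,measure_1,status
rec_a,0.25,1
rec_b,0.75,1
rec_c,0.125,0
"""

summary = summarize_by_status(sample, "measure_1")
for status in sorted(summary):
    count, mean = summary[status]
    print("status", status, "recordings", count, "mean", mean)
```

With the real file, the same function could be called once per measure column to see which measures separate the two groups most clearly.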

Further details are contained in the following reference -- if you use this dataset, please cite: 
Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).

Miscellaneous Visualization Visions

The following images were generated by Dave Warner and others. They were derived from the above datasets and modified to serve as hypothetical 'visionary' visualizations for demonstration purposes: