Client Projects

Visualizing Our Global World | Ambition | Visualizing Missing Species Interactions Data | Visualizing a Research Library
The Millennium Project | Mapping Indiana University International Collaborations | What's in a Name? SBMA
Press Coverage of NAMS Position Statement | Visualization of the Occupational Therapy Researcher Database
Social Network Analysis of “Tutor/Mentor Networking” | CoBRA: Comic Book Readership Archive | Globalization of the United States

Visualizing Our Global World

Client Name: The GDELT Project

Project Description (goal/scientific or practical value):
The GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, sources, emotions, quotes, and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

Taken together, this collection represents one of the largest unclassified quantitative archives of codified human society. The dataset’s unique scale, scope, and coverage have made visualization challenging, given the number of data modalities it contains, which span spatial, temporal, and network attributes, content and citation graphs, etc.

The dataset’s global focus means it captures the chaotic and conflicting nature of the real world, from false and conflicting information to shifting narratives and baselines to a constant exponential increase in monitoring volume, requiring visualizations that are capable of not only coping with this environment, but ideally conveying the resident uncertainty to end users.

There is a need not only for polished print-ready visualizations optimized for consumption by policymakers and the general public that summarize macro-level patterns, but also for interactive displays that allow hypothesis testing and live “speed of thought” exploration of massive multi-modality data.

For example, could one build a dashboard that displays macro-level “bursts” of unrest using the news data, bringing to bear the human rights data to highlight those pockets of unrest occurring in areas prone to human rights abuses, leveraging the academic literature to identify the underlying influencing factors and top academic experts on that area based on the citation graph, and finally using the television data to show how American domestic television is portraying the evolving situation?
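
One small piece of such a dashboard might be sketched as follows. This is only an illustration under stated assumptions: the record layout (one `(country, day_index)` pair per unrest report), the `at_risk` set of areas drawn from the human rights data, and the `ratio` threshold are all hypothetical and are not part of any GDELT file format.

```python
from collections import Counter, defaultdict
from statistics import mean


def unrest_bursts(events, at_risk, ratio=2.0):
    """Flag countries whose latest daily count far exceeds their earlier average.

    events  -- iterable of (country, day_index) pairs (hypothetical layout)
    at_risk -- set of countries flagged in the human rights data (assumed input)
    ratio   -- how many times the baseline counts as a burst (arbitrary choice)
    """
    daily = defaultdict(Counter)
    for country, day in events:
        daily[country][day] += 1

    flagged = []
    for country, counts in daily.items():
        last_day = max(counts)
        # Days with no reports count as zero in the baseline.
        earlier = [counts[d] for d in range(min(counts), last_day)]
        if not earlier:
            continue  # only one day of data, so there is no baseline
        baseline = mean(earlier)
        if counts[last_day] >= ratio * baseline:
            flagged.append((country, counts[last_day], baseline, country in at_risk))
    return sorted(flagged)
```

Each returned tuple holds the country, its latest daily count, its baseline, and whether it also appears in the at-risk set, which is the overlap the dashboard question asks about.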

Visualizations that combine the spatial, temporal, and network dimensions are of especial interest, as are those that examine the data in innovative ways that help uncover new and nonobvious underlying patterns. Visualizations that could ultimately be scripted/automated such that they could be updated automatically on a daily or real-time basis are also of particular interest, especially global dashboards.

Information on dataset(s) to be used:
The Africa and Middle East Global Knowledge Graph encodes a massive array of socio-cultural information, including ethnic and religious group mentions, and the underlying citation graph over more than 21 billion words of academic literature comprising the majority of output of the humanities and social sciences literature over Africa and the Middle East since 1945 (JSTOR+DTIC+CORE+CiteSeerX+CIA+Internet Archive). This dataset uniquely includes the full extracted list of all citations from each journal article, making it possible to explore and visualize the entire citation graph over this literature. See more at GDELT Blog announcement for the Africa and Middle East Global Knowledge Graph.

The Human Rights Global Knowledge Graph encodes more than 110,000 documents from Amnesty International, FIDH, Human Rights Watch, ICC, ICG, US State, and the United Nations dating back to 1960 documenting human rights abuses across the world. See more at GDELT Blog announcement for the Human Rights Global Knowledge Graph.

The American Television News special collection encodes over 2,200 emotions and themes over all 540,000 hours of English-language American television news broadcasts monitored by the Internet Archive since July 2010.

The news-based GDELT Event dataset is also available in Google BigQuery, though this requires signing up for a free Google account and can incur charges to you if you go above the free quota provided to you by Google each month.

Web-link to dataset(s):
All datasets can be downloaded at the links above, though most require substantial technical expertise to process. The GDELT Analysis Service can be used to perform basic visualization and create small-scale extracts from the primary news-based collections (the special collections are not yet supported).

Relevant publications, websites, etc.:
GDELT Project website provides an overview of the project.
GDELT Project Blog showcases a number of applications and information about the project.
GDELT Pilot Dashboards for Global Conflict show some examples of using the data in a mapping context.
GDELT Pilot Visualization of Emotions in American Television News showcases visualizing the emotional dimensions over time.
A YouTube video that briefly introduces the project can be found in this presentation at the Google Developer Group DevFestDC conference.

Publication Notes:
No restrictions of any kind other than requesting that all work include a citation to the GDELT Project website. I would love to see students widely publish and disseminate their results and maximize visibility of what they are able to do with the data.



Client Name: Pedro Russo

Project Description (goal/scientific or practical value):

Ambition is a science fiction movie about the essence of what it means to be human: to attempt difficult things, to reach for seemingly impossible goals, to learn, adapt and evolve.

And at the heart of this film is Rosetta, the European Space Agency’s real mission to rendezvous with, escort and land on a comet. A mission that began as a dream, but that after decades of planning, construction and flight through the Solar System, has arrived at its goal.

Its aim? To unlock the secrets hidden within the icy treasure chest for 4.6 billion years, to study its make-up and its history, and to search for clues as to our own origins.

From 100 km distance, to 50, 30 and then, defying all expectations, to just 10 km, Rosetta continues to captivate and intrigue with every image and every data packet returned.

In this project we will explore the public's opinion about the mission through content analysis of the comments people made about Ambition.

Information on dataset(s) to be used:

Students will analyze and visualize the hundreds of comments about the short movie Ambition, which may be accessed using the YouTube API for developers.

Documentation for retrieving comments from YouTube videos.
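
A minimal sketch of the retrieval and a first-pass content analysis is shown below. It assumes the `commentThreads` endpoint of the YouTube Data API v3 and its usual response layout; please check both against the current documentation. The video ID and API key are placeholders you must supply, and the word-count step is only one simple form of content analysis.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Assumed endpoint of the YouTube Data API v3; verify before use.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def comment_request_url(video_id, api_key, page_token=None):
    """Build the request URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"


def comment_texts(response):
    """Pull the comment text out of one parsed JSON response page."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]


def top_terms(comments, n=10, min_length=4):
    """Count the most frequent words of at least min_length letters."""
    words = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.update(t for t in tokens if len(t) >= min_length)
    return words.most_common(n)
```

Fetching the URL, parsing the JSON, and following `nextPageToken` until it is absent are left out so that the sketch runs without network access or credentials.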

Web-link to dataset(s):
Comments for the Ambition video are viewable at:

Relevant publications, websites, etc.:
Ambition film website
Rosetta: the ambition to turn science fiction into science fact

Publication Notes:
No notes about publishing on the project were provided.

Visualizing Missing Species Interactions Data

Client Name: Jorrit Poelen

Project Description (goal/scientific or practical value):
Global Biotic Interactions (GloBI) is one of the largest (if not the largest!) openly accessible linked datasets that describe how, when, and where organisms interact. Various projects currently use the data to their benefit.

As the ecosystems that sustain life on earth are under increased stress due to human activity, a better understanding of how organisms interact with each other (e.g. predator-prey, parasite-host, pollinator-plant) is increasingly important for nature conservation efforts. In the past 2 years, IVMOOC students developed methods to visualize and browse openly available species interaction data. The challenge for this year is to visualize missing species interaction data (e.g., data gaps, incomplete records). Missing interaction records may be inferred from species occurrence records or species distribution models under the basic assumption that if an organism occurs in a specific location, it will interact with at least one other organism.
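
The basic assumption above can be sketched as a set difference: any species that occurs at a location but has no interaction record there is a candidate data gap. The tuple layouts below are hypothetical simplifications, not the GloBI or GBIF schemas.

```python
def interaction_gaps(occurrences, interactions):
    """Return (species, location) pairs with an occurrence but no interaction.

    occurrences  -- iterable of (species, location) pairs (hypothetical layout)
    interactions -- iterable of (source, target, location) triples
                    (hypothetical layout)
    """
    observed = set()
    for source, target, location in interactions:
        # Both partners of an interaction count as "covered" at that location.
        observed.add((source, location))
        observed.add((target, location))
    return sorted(set(occurrences) - observed)
```

The resulting list could then be aggregated by location or by taxon to show where interaction data is thinnest.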

With the outcome of this project, current and future research efforts can be directed to focus on under-studied organisms or locations. Ideally, the resulting visualization can be automatically updated whenever new data becomes available, but creating a static prototype would already be a huge leap forward for the eco-informatics community.

Information on dataset(s) to be used:
The primary data set is provided through Global Biotic Interactions (GloBI) and consists of open-access, integrated, species interaction datasets. Secondary datasets might include Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio) or similar open-access projects to provide additional information on where and when individual species occurred.

The dataset can be accessed in various ways:

  • a web API,
  • an R package,
  • a JavaScript library,
  • various data archives and online databases (e.g., Neo4j, Fuseki).

Web-link to dataset(s):
GloBI Biological Species Interactions Datasets

Relevant publications, websites, etc.:
Global Biotic Interactions Project
Global Biotic Interactions Project Blog
GloBI Github Repository
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics.

Publication Notes:
You are free to publish results however you'd like. Attribution of the data sources and associated contributors is highly encouraged.

GloBI Data is licensed under



Visualizing a Research Library: An Exploratory Analysis of Indiana University's Library Collections

Client Name: Dr. Andrew Asher

Project Description (goal/scientific or practical value):
The Indiana University Bloomington (IUB) Libraries represent one of the largest collections of research materials in the United States, with over 10 million volumes and an annual collections budget of approximately $20 million.

This project will help library collection managers better serve the Libraries' diverse constituencies by providing analysis of how IUB's collections are used and recommending areas in which to focus future resources. Specifically, we seek to answer the following questions: What are the current strengths and weaknesses of the IUB Libraries' collections? What are the disciplinary differences in usage patterns? Are there meaningful clusters of usage that suggest links between disciplines or subject areas? Are there patterns in past usage that can be used to predict future usage, and by extension plan purchasing decisions? As this is an exploratory analysis, students may develop additional research questions during the course of the project.

Information on dataset(s) to be used:
The dataset represents the holdings of the Indiana University Bloomington libraries. Each line represents one item and its associated metadata (e.g. subject area, format, language, last circulation date, etc.), with about 10 million records total.
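
As a first look at disciplinary differences in usage, one could tally items and circulated items per subject area. The sketch below assumes a CSV export with a header row; the column names `subject` and `last_circulation_date` are hypothetical and must be replaced with the actual field names, and an empty date is assumed to mean the item never circulated.

```python
import csv
from collections import defaultdict


def usage_by_subject(lines, subject_field="subject",
                     date_field="last_circulation_date"):
    """Return {subject: (items, circulated_items, circulated_share)}.

    lines -- an open CSV file or any iterable of CSV lines with a header row
    """
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(lines):
        tally = totals[row[subject_field]]
        tally[0] += 1
        # Assumption: a blank date means the item has never circulated.
        if row[date_field].strip():
            tally[1] += 1
    return {
        subject: (items, circulated, circulated / items)
        for subject, (items, circulated) in totals.items()
    }
```

Because the file is read one row at a time, the same function should cope with the full set of about 10 million records without loading them all into memory.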

Web-link to dataset(s):

Indiana University Library Collections data set

Relevant publications, websites, etc.:
Visualizing Columbia University's Library Data
Ted Polly and Brianna Marshall visualizations for Wabash College's library

Publication Notes:

I would like to approve the project results before they are presented publicly; however, I would expect an outcome of this project to be a project website that students could link to on their resumes. After the project is complete, I would like to work with students to present the results as a co-authored conference presentation or journal article.



The Millennium Project

Client Name: Jerome Glenn

Project Description (goal/scientific or practical value):
The Millennium Project is a global participatory think tank that works to improve thinking about the future and what should be done today, using the State of the Future Index, the Global Futures Intelligence System, State of the Future reports, and a series of global futures research studies conducted since 1996. Please see the relevant publications and websites listed below for details.

Web-Link to dataset(s):

Please contact course instructors to gain access to the data.

Information on dataset(s) to be used:

Excel files of 28 variables over 20 years, with ten-year projections for each, integrated into a State of the Future Index. The variables include, among others:

  • Internet Users (per 100 people)
  • Population growth (annual %)
  • Life expectancy at birth (years)
  • Mortality rate, infant (per 1,000 live births)
  • Prevalence of undernourishment (% population)
  • Health expenditure per capita (US$)
  • Physicians (per 1,000 people)
  • Improved water sources (% population with access)
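
To explore how variables like these might be combined, the sketch below builds a generic composite index using min-max scaling and a weighted average. This is not the State of the Future Index methodology, which is not described here; the scaling, the equal default weights, and the "higher is better" flags are all assumptions made for illustration.

```python
def normalize(series, higher_is_better=True):
    """Min-max scale a series to [0, 1], inverting it if lower is better."""
    low, high = min(series), max(series)
    if high == low:
        return [0.5 for _ in series]  # a flat series carries no signal
    scaled = [(value - low) / (high - low) for value in series]
    return scaled if higher_is_better else [1 - s for s in scaled]


def composite_index(variables, weights=None):
    """Combine variables into one value per year.

    variables -- {name: (yearly_values, higher_is_better)}; every series
                 must cover the same years in the same order
    weights   -- optional {name: weight}; equal weights if omitted
    """
    names = list(variables)
    weights = weights or {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    scaled = {name: normalize(*variables[name]) for name in names}
    years = len(next(iter(scaled.values())))
    return [
        sum(weights[name] * scaled[name][year] for name in names) / total_weight
        for year in range(years)
    ]
```

For example, life expectancy would be passed with `higher_is_better=True` and infant mortality with `higher_is_better=False`, so that an improvement in either one raises the index.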

Relevant Publications, websites, etc.:

Publication Notes:
Free to publish. We ask that the work be linked to the Global Futures Intelligence System and that the visualizations be shared for use by The Millennium Project.



Mapping Indiana University International Collaborations

Client Name: Lynn Schoch, Office of the Vice President for International Affairs

Project Description (goal/scientific or practical value):

Use Web of Science data to determine the extent, variety, and patterns of the international collaborations of IU faculty based on authorship of scholarly articles. The results will be published as part of the IU International Fact Book, which examines the university's international activity by campus, school and department, discipline, and region of the world.

Information on dataset(s) to be used:
IU faculty publications drawn from the complete set of Thomson Reuters’ Web of Science XML raw data (Web of Knowledge version 5), retrieved from the IUNI Web of Science data enclave.
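
Once the author affiliations have been extracted from the XML, counting international collaborations is a small aggregation step. The sketch below assumes that each article has already been reduced to a list of affiliation countries; that input format, and the `"USA"` label for the home country, are hypothetical.

```python
from collections import Counter


def international_collaborations(papers, home_country="USA"):
    """Summarize co-authorship with institutions outside the home country.

    papers -- iterable of lists of affiliation countries, one list per
              article (hypothetical, pre-extracted format)
    Returns (international_articles, total_articles, partner_counts).
    """
    partner_counts = Counter()
    international = 0
    total = 0
    for countries in papers:
        total += 1
        partners = set(countries) - {home_country}
        if partners:
            international += 1
            # Each partner country counts once per article.
            partner_counts.update(partners)
    return international, total, partner_counts
```

The `partner_counts` tally maps directly onto a choropleth or a flow map, and the same function can be run per campus, school, or discipline.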

Web-Link to dataset(s):
Data set coming soon.

Relevant Publications, websites, etc.:

IU World Factbook for 2014

Publication Notes:
Data can be freely used and students can claim credit or publish the results.



What's in a Name?

Client Name: Dr. Shelley Arvin and Dr. Heather Rayl

Project Description (goal/scientific or practical value):
Spinal and bulbar muscular atrophy (SBMA) is a rare X-linked neuromuscular disease with 206 known name variants that identify it. With no known cure, the adult male patients experience long-term progressive muscular weakness.

The most commonly used name for SBMA retrieves only about 40% of the literature. This complicates searches as researchers attempt to monitor SBMA research studies. Is one name rising to ascendancy in usage? Are some of the many names no longer in use and need not be included in a current search strategy? Has there been a change since the initial research study? Are multiple names typical in the development of a name for rare or emerging diseases? What do searchers need to know about naming to effectively track the literature for this and other rare and emerging diseases?

Information on dataset(s) to be used:
"Arvin SBMA Name Search" is maintained by Arvin and includes records from PubMed that identify the disease by name variant(s) in the title and/or abstract. (The complete, comprehensive, up-to-date dataset that includes bibliographic records from multiple science and medical databases is held and maintained by Shelley Arvin and Heather Rayl in the EndNote software, but there are potential copyright violations when sharing database records across institutions. This is being investigated and evaluated.)
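
To see whether one name is rising to ascendancy, a first step is to count how many records mention each variant per year. The sketch assumes the records have already been parsed into `(year, title, abstract)` tuples, which is a hypothetical format, and it uses plain case-insensitive substring matching, so overlapping variants are each counted.

```python
from collections import Counter, defaultdict


def variant_usage_by_year(records, variants):
    """Return {year: Counter({variant: number_of_records})}.

    records  -- iterable of (year, title, abstract) tuples
                (hypothetical, pre-parsed format)
    variants -- the name variants to look for
    """
    usage = defaultdict(Counter)
    for year, title, abstract in records:
        text = f"{title} {abstract}".lower()
        for variant in variants:
            # A record counts once per variant, however often it appears.
            if variant.lower() in text:
                usage[year][variant] += 1
    return usage
```

Plotting each variant's yearly count, or its share of that year's records, would show which names are gaining ground and which have fallen out of use.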

Web-link to dataset(s):
Arvin SBMA Name Search publication list (downloadable in XML)

Relevant publications, websites, etc.:
Arvin SD. Analysis of inconsistencies in terminology of spinal and bulbar muscular atrophy (SBMA) and its effect upon retrieval of research. Journal of the Medical Library Association. 2013;101(2):147-150.

Publication Notes:
IBRC clients request co-authorship on publications, and approval of results by client before publication and communication during the life of the project.



Press Coverage of NAMS Position Statement

Client Name: Janet Carpenter

Project Description (goal/scientific or practical value):
The paper “Nonhormonal management of menopause-associated vasomotor symptoms: 2015 position statement of The North American Menopause Society” published in the Menopause journal on Sep 21, 2015 was covered by more than 100 news outlets.

The goal of this project is to analyze and visualize exactly how the paper’s content diffused over time, geospatial space, and topical space. Did different scholarly facts spread via alternative diffusion pathways? Were there bursts of activity? Do some news outlets act as superspreaders? How did the message content and sentiment change over time?
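
One simple way to look for bursts of activity is to count stories per day and flag days that sit well above the average. The sketch below uses a mean-plus-standard-deviations cutoff, which is a deliberately basic stand-in for more formal burst detection methods; the `threshold` value is an arbitrary assumption.

```python
from collections import Counter
from datetime import timedelta
from statistics import mean, pstdev


def burst_days(publication_dates, threshold=2.0):
    """Return (date, count) pairs for days with unusually many stories.

    publication_dates -- iterable of datetime.date objects, one per story
    threshold         -- standard deviations above the mean (assumed value)
    """
    counts = Counter(publication_dates)
    first, last = min(counts), max(counts)
    # Include days without any stories so that quiet days lower the mean.
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    series = [counts[day] for day in days]
    cutoff = mean(series) + threshold * pstdev(series)
    return [(day, n) for day, n in zip(days, series) if n > cutoff]
```

Running the same function separately per news outlet or per region would give a first indication of where and through whom the message spread.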

Information on dataset(s) to be used:
Listing of relevant web pages and news stations is provided. Students are encouraged to expand these lists.

Web-Link to dataset(s):

Relevant Publications, websites, etc.:
Nonhormonal management of menopause-associated vasomotor symptoms: 2015 position statement of The North American Menopause Society. (2015). Menopause (New York, N.Y.), 22(11), 1155–1174.

Publication Notes:
If any results of this project are suitable for publication, publication will be pursued. Students who were or remain actively involved with the project will be included as co-authors.



Visualization of the Occupational Therapy Researcher Database

Client Name: Julie Bass, Director of Research, AOTF

Project Description (goal/scientific or practical value):
The American Occupational Therapy Foundation (AOTF) recently identified asset mapping of our scientists/researchers as a strategic goal for future research initiatives. At this time, we do not have a comprehensive understanding of our scientific community as is necessary for building scientific networks, identifying scientific leaders for specific initiatives, or summarizing our capacity to external stakeholder groups. A few years ago, a researcher database was launched with our partner, the American Occupational Therapy Association (AOTA). Before we proceed to larger asset mapping projects, it is timely to conduct analyses on the existing researcher database.


  1. Complete a temporal analysis and project analysis (diagnosis, ICF, categories) and identify times and project areas where there are bursts of activity.
  2. Complete a geospatial analysis and map to identify potential collaboration links between institutions.
  3. Complete a topical analysis using text associated with projects and researchers and map the evolving topic space.
  4. Complete a network analysis and identify bimodal networks of experts and their institutions/expertise areas, projects, and diagnosis/ICF/categories, then visualize and animate these over time.

Information on dataset(s) to be used:
The variables in the dataset include:
Researcher Profile: Name, ResearcherID, Institution, Underrepresented Group

Research Project: ResearcherID, ProjectID, Start Date, Keywords, Principal Investigator, Project Title, NIH Funding (Division, Grant #, Direct Costs, Total Funding), Other Federal Funding (Division, Grant #, Direct Costs, Total Funding), Non Federal Funding (Division, Grant #, Direct Costs, Total Funding)

Research Project Details: Diagnoses, Ages, Practice Settings, ICF Category, Research Agenda Category
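
As an illustration of the bimodal networks mentioned in the analysis goals, the sketch below links institutions to ICF categories through the projects their researchers carry out. The field names come from the variable summary above, but the dictionary layouts are hypothetical, since the final dataset format is not yet known.

```python
from collections import Counter


def institution_category_edges(profiles, projects, details):
    """Build weighted edges of an institution-to-ICF-category network.

    profiles -- {ResearcherID: Institution}
    projects -- {ProjectID: ResearcherID}
    details  -- {ProjectID: [ICF Category, ...]}
    (all three layouts are assumptions about how the data will be loaded)
    Returns a Counter mapping (institution, category) to a project count.
    """
    edges = Counter()
    for project_id, researcher_id in projects.items():
        institution = profiles[researcher_id]
        # Projects without detail records contribute no edges.
        for category in details.get(project_id, []):
            edges[(institution, category)] += 1
    return edges
```

The edge list can be exported as a two-column CSV with weights and loaded into a network tool; filtering `projects` by start date first would give one snapshot per time slice for animation.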

Web-link to dataset(s):
Dr. Börner reviewed a sample dataset in MS Excel format and indicated it was appropriate for the purposes of student projects in the class. The final dataset will be prepared by the AOTA prior to the beginning of the class.

Relevant publications, websites, etc.:
American Occupational Therapy Foundation
American Occupational Therapy Association: Researcher Database

Publication Notes:
The AOTF and AOTA support opportunities for students to publish results and/or add project results to their resume. Because this is a new initiative, the AOTF and AOTA request additional time to develop the conditions for these activities. Conditions will be shared with students prior to the start of the class.



Social Network Analysis of “Tutor/Mentor Networking” Conferences

Client Name: Daniel F. Bassill

Project Description (goal/scientific or practical value):

Since 1993, I've used GIS maps and a variety of visualizations to communicate ideas and strategies that leaders throughout the Chicago region could use to build and sustain non-school tutor/mentor programs that connect inner city youth and adult volunteers and help those youth move through school into jobs and careers. Since 2006, interns from a variety of universities, including IU, have created new interpretations of my work, applying their own talent.

I use these in blogs, web sites, social media to influence how others use their own time, talent and dollars to help kids in Chicago and other cities. In doing so, I demonstrate how a small group of people can influence actions of others throughout the world, by how they communicate their ideas on the Internet.

This project is a continuation of work started during the 2015 IVMOOC. It asks students to create an information visualization that shows participation in Tutor/Mentor Networking Conferences held in Chicago since 1994. The goal is to show growth in participation in Tutor/Mentor conferences over time and look at how organizations and job titles function within the network. The visualizations and analysis will help participants and organizers find ways to connect with each other after our conferences. Work done will be shared with others who organize conferences and events.

Information on dataset(s) to be used:
The data to be used for this project was created by a volunteer in 2010. Their description of the initial project and analysis may be found at their blog. The data is composed of conference attendance lists for most of the Tutor/Mentor Conferences held in Chicago from May 1994 through Nov. 2014. The data comes in 42 Google Sheets files which are easily converted into CSV documents.

The 2015 IVMOOC team helped clean up data from 42 different Excel spreadsheets. I'd like the 2016 team(s) to use the cleaned Excel files and spend their time creating social network analysis (SNA) visualizations that show conference participation. I'd like to focus on SNA rather than GIS, although both are important. I'd also like the team to create an interactive SNA format where layers of information can be turned on and off.
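
A possible starting point for the SNA work is a co-attendance network in which two organizations are linked each time they attend the same conference. The sketch assumes the attendance lists have already been loaded into a dictionary keyed by conference; that layout and the labels in it are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_attendance_edges(attendance):
    """Return a Counter mapping (org_a, org_b) to shared conferences.

    attendance -- {conference_label: iterable of organization names}
                  (hypothetical layout)
    """
    edges = Counter()
    for organizations in attendance.values():
        # Sorting gives each pair one canonical order; set() removes repeats.
        for a, b in combinations(sorted(set(organizations)), 2):
            edges[(a, b)] += 1
    return edges


def participation_over_time(attendance):
    """Return {conference_label: number of distinct organizations}."""
    return {label: len(set(orgs)) for label, orgs in attendance.items()}
```

The edge weights can drive line thickness in a network layout, and attributes such as job title or organization type could be added as node properties to support layers that are switched on and off.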

Web-link to dataset(s):
Dataset on Google Drive

Relevant publications, websites, etc.:

Final report from the 2015 team. I'd like this to be a starting point for the 2016 team.

This is a blog article where I shared work done by the IVMOOC team.
Browse other articles to see more uses of maps and visualizations.

Publication Notes:
I encourage publication and sharing of this information, with myself included as much as possible, since my goal is that anyone who is hosting conferences and convening groups of people will learn to map participation using SNA and GIS as part of the on-going network building needed to build movements capable of solving complex social, health and environmental problems.



CoBRA: Comic Book Readership Archive

Client Name: Dr. John Walsh

Project Description (goal/scientific or practical value):

The Comic Book Readership Archive (CoBRA) project proposes to build a digital archive of primary source material and related data sets to document American comic book readership and fandom. The archive will include content from such sources as: fan mail, fan club publications and membership rolls, contests sponsored by publishers and fan clubs, fanzines, and programs and attendee records from comic book conventions and similar events.

Comics scholarship is an established area of academic research and the subject of thousands of dissertations, journal articles, book chapters, monographs, and digital projects. Comics readership has been a specific target of scholarly attention. However, previous studies have not fully considered the vast documentary record of comic book readership that will be compiled and analyzed in the CoBRA project.

In the “Bibliographic Essay” concluding his study, Of Comics and Men: A Cultural History of American Comic Books, Jean-Paul Gabilliet writes: “fan mail constitutes a largely unexplored source of information about the reception of characters, stories and creators.”

The CoBRA project will address this gap in comics scholarship by providing access to a large and growing archive for the study of comic book readership, including fan mail. Our archive will allow new research questions to be asked and will enable new forms of research, such as interactive maps, timelines, other information visualizations, and computationally-assisted content and data analysis. Possible research questions include:

  • Who are the most frequently published fan mail writers? How common is it for individuals to have more than one letter published?
  • Can we observe trends among prolific letter writers? Are they following particular characters, titles, writers, artists?
  • What is the gender breakdown among identified readers? Are there significant differences across genres, titles, and years?
  • What are the geographic sources (cities, states, and countries) of fan mail? Are there significant differences across genres, titles, and years?

Information on dataset(s) to be used:

To date we have generated 6,500 MySQL records of comic book publications. The records include:

  • name of letter author / fan club member
  • street address
  • city
  • state
  • postal code / zip code
  • country
  • authoritative information about each source from the Grand Comics Database
  • Series title
  • Issue number
  • Publication date
  • Publisher
  • Notes (occasional notes include details about well-known figures, or interesting information about individual: e.g., college student, faculty member, military personnel, etc.)
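
The first research question, who the most frequently published letter writers are, can be approached with a simple tally. The sketch assumes the records have been exported as dictionaries with the keys `name` and `state`; those keys are hypothetical, and treating the same name in the same state as one person is a rough assumption that real data cleaning would need to refine.

```python
from collections import Counter


def letters_per_writer(records):
    """Count published letters per writer.

    records -- iterable of dicts with hypothetical keys 'name' and 'state'
    Returns (counts, repeat_share), where counts maps (name, state) to the
    number of letters and repeat_share is the fraction of writers with
    more than one published letter.
    """
    counts = Counter(
        (record["name"].strip().lower(), record["state"].strip().upper())
        for record in records
    )
    repeat_writers = sum(1 for n in counts.values() if n > 1)
    repeat_share = repeat_writers / len(counts) if counts else 0.0
    return counts, repeat_share
```

Calling `counts.most_common(20)` then lists the most prolific writers, and grouping the same records by series title or publication year supports the follow-up questions about trends.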

By the end of our current grant period in January 2016, we anticipate having over 10,000 records in the CoBRA data set. In addition to authors of fan mail, the data set will grow to include the names, cities, and states of fan club members whose names were published in the monthly comics in regular lists of new members.

Web-link to dataset(s):

Relevant publications, websites, etc.:
Comic Book Readership Archive
Comic Book Readership Archive project description
Gabilliet, J.-P. (2010). Of comics and men: A cultural history of American comic books. Jackson, MS: University Press of Mississippi. p. 364.


Publication Notes:
Students are free to publish the results of their work, preferably in an open access publication, and to include their work on their résumés. I would like to be included as a co-author and have a role in reviewing the results.



Globalization of the United States, 1789-1861

Client Name: Konstantin Dierks

Project Description (goal/scientific or practical value):

The aim of this historical GIS project is to project historical data onto historical world basemaps, with an interactive menu to enable users to choose data, and with an interactive timeline to enable users to choose a particular historical moment or interval. A second aim is to create D3 visualizations in dynamic relation to the digital map.
The scientific value of the project is to achieve effective presentation of historical GIS, since modern basemaps do not suit historical data. The practical value is to achieve clear presentation of multiple data variables simultaneously on a digital map as well as a data visualization.

Information on dataset(s) to be used:
The dataset concerns diplomatic and military activities in the wider world conducted from the United States between the American Revolution and the American Civil War. (Historical basemaps can be found online.)
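
The interactive menu and timeline described above come down to filtering the records by data type and by year before they are drawn on the map. The sketch below shows that filtering step only; the record keys `year` and `kind` are hypothetical, since the dataset's actual fields are not listed here.

```python
def events_in_interval(events, start_year, end_year, kinds=None):
    """Select the records to display for a chosen interval and data menu.

    events     -- iterable of dicts with hypothetical keys 'year' and 'kind'
    start_year -- first year of the interval (inclusive)
    end_year   -- last year of the interval (inclusive)
    kinds      -- optional set of kinds to keep; None keeps every kind
    """
    return [
        event
        for event in events
        if start_year <= event["year"] <= end_year
        and (kinds is None or event["kind"] in kinds)
    ]
```

Passing the same year for `start_year` and `end_year` selects a single historical moment, and the filtered list can feed both the map layer and a linked visualization so that the two stay in sync.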

Web-link to dataset(s):

Relevant publications, websites, etc.:
Two interesting websites using historical basemaps:

Publication Notes:
Publication requires consultation with, and permission from, me as principal investigator for an ongoing digital history project.