Client Projects

Visualizing GDELT | Learning Analytics for PLAE Project | Mapping the Knowledge Landscape of Design
THOR: Uptake and Reach of Data and Researcher PID Services | Visualizing User Behavior on the Places & Spaces Website
Visualizing IU e-Text Usage and Achievement | GloBI: Social Graph of Life on Earth | Scientometric Mapping of Interpreting
Globalization of the United States, 1789-1861


Visualizing Our Global World

Client Name: The GDELT Project

Project Description (goal/scientific or practical value):
The GDELT Visual Global Knowledge Graph explores the visual narratives of the world's online news media imagery. Each day it samples up to 700,000 online news images from across the planet and processes them through Google's Cloud Vision deep learning algorithms to catalog the objects, activities, facial emotions, OCR text, violence levels, and even estimated locations in each image, codifying the results as a massive graph of the world's visual representations. We'd love to see what can be done to understand visual portrayals across the world.

Information on dataset(s) to be used:
GDELT Visual Global Knowledge Graph. A new snapshot containing the full 175-million-image dataset can be created.

Relevant publications, websites, etc.:
GDELT Project website provides an overview of the project.
GDELT Project Blog showcases a number of applications and information about the project.
Blog post announcing the Visual Global Knowledge Graph data set.
A YouTube video of a presentation at the Google Developer Group DevFestDC conference briefly introduces the project.

Publication Notes:
There are no restrictions of any kind, other than a request that all work include a citation to the GDELT Project website. I would love to see students widely publish and disseminate their results and maximize the visibility of what they are able to do with the data.

Photo showing STEP Project smart classroom set-up that tracks student movement.

Learning Analytics for PLAE Project

Client Name: Joshua Danish

Project Description (goal/scientific or practical value):
The PLAE Project explores how young students can learn about complex science concepts by engaging in embodied play activities that are tracked using the OpenPTrack system. Their actions are incorporated into a computer simulation, and students also have an opportunity to annotate the simulation using an iPad app. In the next iteration, our goal is to begin analyzing the log data of students' motion along with key events from the simulation, to enhance our prior analyses based on pre/post measures and video data. Being able to visualize this kind of information will advance both our current research on learning and the field's efforts to use multimodal learning analytics.

Information on dataset(s) to be used:
Our goal is to be able to analyze log data generated by this system. Depending on the timing, we may be working with simulated log data created by adults, with the hope of reusing any scripts or tools later, once data has been collected with students. Either way, the data will be de-identified and thus shareable. The data will consist of two streams of Excel-readable tables: one contains the tracked positions of individuals within the system, generated by OpenPTrack, and the other contains events generated by the simulation. The two streams will need to be stitched together by entity ID and timestamp.
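The stitching step could look roughly like the sketch below. It assumes both tables have been exported (for example, to CSV) and loaded as rows with hypothetical `entity_id` and `timestamp` fields (timestamps in seconds). Each simulation event is matched to the most recent tracked position of the same entity at or before the event time, an "as-of" join. The field names and the matching rule are illustrative assumptions, not the PLAE team's method.

```python
from bisect import bisect_right
from collections import defaultdict


def stitch(positions, events):
    """Attach to each simulation event the latest tracked position of the
    same entity at or before the event's timestamp (an "as-of" join).

    positions: dicts with hypothetical keys entity_id, timestamp, x, y
    events:    dicts with hypothetical keys entity_id, timestamp, event
    """
    # Group position rows by entity and sort each group by time.
    by_entity = defaultdict(list)
    for p in positions:
        by_entity[p["entity_id"]].append(p)
    times = {}
    for eid, rows in by_entity.items():
        rows.sort(key=lambda r: r["timestamp"])
        times[eid] = [r["timestamp"] for r in rows]

    stitched = []
    for e in events:
        match = None
        eid = e["entity_id"]
        if eid in by_entity:
            # Index of the last position whose timestamp <= event timestamp.
            i = bisect_right(times[eid], e["timestamp"]) - 1
            if i >= 0:
                match = by_entity[eid][i]
        # Events with no earlier position for that entity get None coordinates.
        stitched.append({
            **e,
            "x": match["x"] if match else None,
            "y": match["y"] if match else None,
        })
    return stitched
```

Matching "at or before" avoids attaching a position that was recorded after the event; a tolerance window could be added if the two streams are sampled at very different rates.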

Relevant publications, websites, etc.:
PLAE is an extension of the STEP project:
A video can be seen here:

Publication Notes:
Analysis of students' movements will need to be discussed with the PLAE team, but presentations about the visualization and similar work are permitted as long as the PLAE Project is acknowledged.

Mapping the Knowledge Landscape of Design

Client Name: Dr. Eswaran Subrahmanian

Project Description (goal/scientific or practical value):
The goal of this project is to trace the history of research and debate over the last three decades in areas of “design”. Because design is a vast subject, the work has both practical and scientific value: it would help the diverse design community see the full scope of design issues and research, and could help bring that community together. Students are invited to analyze and visualize temporal growth and bursts of activity, map the evolution of collaboration networks, and overlay the data on a map of science so that changes in topical coverage can be understood and communicated.
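One way to quantify changes in topical coverage before overlaying them on a map of science is sketched below. It assumes each publication record has been reduced to a (journal, year) pair and that a hypothetical `journal_to_field` lookup assigns each journal to a field of whichever science map is used; the function returns each field's share of publications per time period.

```python
from collections import Counter, defaultdict


def coverage_by_period(records, journal_to_field, period_years=5):
    """Share of publications per science-map field, per time period.

    records: iterable of (journal, year) pairs.
    journal_to_field: hypothetical lookup from journal title to a field of
        a science map; journals missing from it count as "unclassified".
    """
    counts = defaultdict(Counter)
    for journal, year in records:
        # Bin years into fixed-width periods, e.g. 1987 -> 1985 for 5-year bins.
        period = year - year % period_years
        counts[period][journal_to_field.get(journal, "unclassified")] += 1

    shares = {}
    for period, field_counts in sorted(counts.items()):
        total = sum(field_counts.values())
        shares[period] = {f: n / total for f, n in field_counts.items()}
    return shares
```

Shares rather than raw counts make periods of very different publication volume comparable, which matters when the overall literature is growing.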

Information on dataset(s) to be used:
The data for this project include publication records related to design and a set of Design Society documents. The publication records, covering 32 design journals, were retrieved from the IUNI Web of Science dataset. They can be shared for the purposes of this project but cannot be distributed beyond it.

Relevant publications, websites, etc.:
Website for Dr. Eswaran Subrahmanian

Publication Notes:
Students may use the project results in their resumes and portfolios; for publications, I would like to approve the results and be a co-author.


THOR - Uptake and Reach of Data and Researcher PID Services

Client Name: THOR Project

Project Description (goal/scientific or practical value):
The EU-funded THOR project (Technical and Human Infrastructure for Open Research) aims to improve the interoperability of persistent identifiers (PIDs) for data and researchers, so that every researcher in Europe can find and connect to her own data and that of her peers. As part of this project, we are monitoring the landscape of PIDs for trends that can help us with our outreach efforts. Although we have access to the basic metadata associated with PIDs, we are missing additional insights that can only be gained through a more comprehensive study of all the available metadata. We are undertaking that broader study ourselves, but we’re interested in the insights information visualization students might gain from a subset of the data; fresh eyes could lead to additional points of view.

Specifically, the project we are proposing is an assessment of the reach and uptake of PID services for researchers within and across geographical regions. Students will evaluate the ORCID public data files from recent years for insights on the geographical distribution of researchers with ORCID IDs and how that relates to other aspects of the researcher profiles.
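A starting point for the geographical distribution is sketched below. It assumes the affiliation entries have already been extracted from the ORCID public data file into (ORCID iD, country code) pairs; the extraction step is not shown. Each researcher is counted once per country, even if they list several affiliations there.

```python
from collections import defaultdict


def researchers_per_country(affiliations):
    """Count distinct researchers per affiliation country.

    affiliations: iterable of (orcid_id, country_code) pairs, one per
        affiliation entry, assumed to be pre-extracted from the ORCID
        public data file.
    """
    ids_by_country = defaultdict(set)
    for orcid_id, country in affiliations:
        if country:  # skip affiliation entries without a country
            ids_by_country[country].add(orcid_id)
    counts = {c: len(ids) for c, ids in ids_by_country.items()}
    # Sort by descending researcher count, then alphabetically by country.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
```

A researcher with affiliations in two countries is counted in both, so per-country totals can exceed the number of distinct iDs; whether that is desirable depends on the question being asked.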

Information on dataset(s) to be used:
The ORCID public data file contains all the information in the ORCID system that is marked as public. This is information about researchers, such as their institutional affiliation and their list of attributed works.

More information is available in the blog post accompanying this year’s data release and in the ORCID data use policy.

Relevant publications, websites, etc.:
THOR project:
ORCID Project Mission:

Publication Notes:
Results should be approved by the THOR project team prior to publication. Publication is encouraged as long as it includes attribution of the data used (as per the ORCID public data file’s use policy), the THOR project, and any associated contributors. The THOR project may reuse the outputs of this work in future project deliverables, and will credit the students accordingly.


Visualizing User Behavior on the Places & Spaces Website

Client Name: Lisel Record and Mike Gallant

Project Description (goal/scientific or practical value):
The Places & Spaces: Mapping Science exhibit introduces science mapping techniques to the general public and to experts across disciplines for educational, scientific, and practical purposes. It is meant to inspire cross-disciplinary discussion on how best to track and communicate scholarly activity and scientific progress on a global scale. The exhibit website provides information about the people behind the exhibit; showcases maps and macroscopes; lists past, present, and planned exhibit venues; and links to publications, the store, news, and contact information.

The website was redesigned in 2015 to update its organization and user interface, and the exhibit curators are interested in understanding the impact of this redesign. Analysis and visualization should include descriptive statistics of visitor demographics and sessions, page visits, and document downloads; the geospatial origin of visits; and temporal analysis of site traffic (e.g., do bursts of activity correlate with press or venue events?). Students may suggest other analyses and visualizations. Results may take the form of static or interactive visualizations and/or a dashboard tool.
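A first pass at the burst question could be as simple as the sketch below. It assumes monthly visit totals have already been extracted from the Webalizer reports into a dict keyed by "YYYY-MM" and that press and venue events are listed by month. It flags months whose visits exceed the mean plus k standard deviations; this is a deliberately simple threshold rule, not Kleinberg's burst detection algorithm.

```python
from statistics import mean, pstdev


def flag_bursts(monthly_visits, k=2.0):
    """Return months whose visit count exceeds mean + k * std over all months.

    monthly_visits: dict mapping "YYYY-MM" to total visits (assumed to be
        extracted from the monthly Webalizer reports).
    """
    values = list(monthly_visits.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(m for m, v in monthly_visits.items() if v > threshold)


def bursts_near_events(bursts, event_months):
    """Split burst months into those that coincide with a press or venue
    event month and those that do not."""
    events = set(event_months)
    matched = [m for m in bursts if m in events]
    unmatched = [m for m in bursts if m not in events]
    return matched, unmatched
```

Because a single large spike inflates the standard deviation, a smaller k (or a robust statistic such as the median) may be needed over short periods; comparing the pre- and post-2015 periods separately would also keep the redesign itself from masking smaller bursts.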

Information on dataset(s) to be used:
The data set consists of monthly Webalizer reports for the website covering 10 years (March 2007 to February 2017), which will be provided in HTML format.

Relevant publications, websites, etc.:
Places and Spaces: Mapping Science Exhibit web page

Publication Notes:
Students can publish approved results. Students can add the project's results to their resumes.

Visualizing IU e-Text Usage and Achievement

Client Name: Dr. Serdar Abaci

Project Description (goal/scientific or practical value):
The IU e-text program is an ongoing initiative that offers a low-cost E-Texts platform for students and instructors to engage with relevant course material, share resources, and interact with peers. Since its inception within the IU system, over 33,000 users have used the platform, and their interactions with it have been stored since 2012.

The project will use data collected in activity logs from the IU E-Texts program to help address the following research questions through statistical analysis and information visualizations:

  1. How does platform usage differ, both temporally and summatively, across departmental and disciplinary boundaries?
  2. How does platform usage differ, both temporally and summatively, by student performance?
  3. If differences in disciplinary and departmental use exist, do these differences influence student outcomes?
  4. Do past patterns of student and instructor use inform or indicate future uses of the tool?
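For the summative side of question 2, a grouping like the sketch below would be a reasonable first step. It assumes the anonymized tables have been joined into one row per student per course with hypothetical `student_id`, `grade`, and `page_views` fields; the temporal side of the question would need per-week or per-day page views instead of totals.

```python
from collections import defaultdict
from statistics import mean, median


def usage_by_grade(records):
    """Summarize total page views per student, grouped by final grade.

    records: iterable of dicts with hypothetical keys student_id, grade,
        page_views (one row per student per course).
    """
    groups = defaultdict(list)
    for r in records:
        groups[r["grade"]].append(r["page_views"])
    # Report group size, mean, and median; the median is less sensitive
    # to a few very heavy readers.
    return {
        grade: {"n": len(views), "mean": mean(views), "median": median(views)}
        for grade, views in sorted(groups.items())
    }
```

The same grouping keyed by department instead of grade would address question 1.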

Project Restrictions: Participation in this project is restricted to students enrolled at Indiana University.

Information on dataset(s) to be used:
The data set to be used in this project covers several courses across several semesters and is contained in multiple tables. Several of the tables contain anonymized student records and activities within the E-Texts platform, including student gender, class standing, course enrollment information, page views, annotation/markup activity (e.g., bookmarks, notes, questions), and grades. Other tables include instructor page views and annotations.

Relevant publications, websites, etc.:
UITS E-Texts Information Page
Serdar Abaci, Anastasia S. Morrone, and Alan R. Dennis. (2015). Instructor Engagement with E-Texts. EDUCAUSE Review.
Reynol Junco and C. Clem. (2015). Predicting course outcomes with digital textbook usage data. The Internet and Higher Education, 27.

Publication Notes:
My team and I would like to approve final project results prior to any public presentation of these results. After the project is complete, my team and I would like to work with students to co-author a conference or journal publication.

GloBI: Social Graph of Life on Earth Beyond Humans

Client Name: Jorrit Poelen

Project Description (goal/scientific or practical value)
Social networking platforms document interactions between humans (Homo sapiens) in ever-increasing detail. These records interest advertising and intelligence agencies, but what do we really know about the other organisms that occupy our planet?

Global Biotic Interactions (GloBI) is one of the largest (if not the largest!) openly accessible linked datasets that describe how, when, and where organisms interact. Various projects currently use the data to their benefit.

One of the challenges in improving and growing GloBI is to better visualize millions of interactions. Currently available tools let users explore parts of the interaction graph (e.g., what do sea otters eat?). However, few comprehensive visualizations exist that show the big picture (e.g., how do all species interact?). This project asks students to use the GloBI datasets and develop methods to analyze millions of records and show species interaction patterns among thousands of organisms.

Information on dataset(s) to be used:
GloBI mines existing datasets that describe how organisms interact. As of November 2016, GloBI included about 2 million interactions across more than 100,000 taxa. This makes GloBI one of the largest openly accessible resources of species interaction records available today.

The primary data set, provided through GloBI, consists of open-access, integrated species interaction datasets. Secondary datasets might include the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), or similar open-access projects, to provide additional information on where and when individual species occurred.

GloBI data records are available as TSV (tab-separated values) files, N-Quads, a web API, an R package, and JavaScript libraries, among other formats.
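To get from millions of TSV rows to a graph small enough to visualize, a first step could be collapsing repeated records into weighted edges, as in the sketch below. It streams the file row by row and assumes GloBI-style header columns named `sourceTaxonName`, `interactionTypeName`, and `targetTaxonName`; check these against the header of the file you actually download.

```python
import csv
from collections import Counter


def aggregate_interactions(handle):
    """Count records per (source taxon, interaction type, target taxon).

    handle: an open text file (or file-like object) containing a GloBI-style
        TSV; column names are assumed and should be verified.
    """
    # QUOTE_NONE treats quote characters as ordinary text, which suits TSV.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    edges = Counter()
    for row in reader:
        key = (row["sourceTaxonName"], row["interactionTypeName"],
               row["targetTaxonName"])
        if all(key):  # skip rows with a missing or empty name
            edges[key] += 1
    return edges
```

For a compressed download, `gzip.open(path, "rt", newline="")` gives a text handle that can be passed in directly. Rolling taxon names up to a higher rank (genus or family) before counting would shrink the graph further for the "big picture" view.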

Relevant publications, websites, etc.:
Global Biotic Interactions Project
Global Biotic Interactions Project Blog
GloBI Github Repository
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics.

Publication Notes:
You are free to publish results however you'd like. Attribution of the data sources and associated contributors is highly encouraged. GloBI data is licensed under

Client sponsor:
Jorrit Poelen lives and works in Oakland, CA, and is a freelance software engineer. Over the last 15+ years, Jorrit has been active in academic, government, corporate, and start-up settings across fields such as neuroscience, health informatics, financial risk management, and, more recently, ecological informatics. Current projects include,, and

Scientometric Mapping of Interpreting

Client Name: Lluis Baixauli-Olmos

Project Description (goal/scientific or practical value):
Although science mapping is relatively well established in other domains, the field of “interpreting” has not yet looked at itself from a scientometric perspective. This project is a first step toward gaining insights into this academic field by analyzing the articles published in its main journal, Interpreting, using scientometric analysis methods and information visualization.

Interpreting is an interdisciplinary journal that publishes research and debate on all aspects of interpreting, in its various modes, modalities (spoken and signed) and settings (conferences, media, courtroom, healthcare and others).

Students will use citation metadata, collected from Scopus, for 182 articles published in Interpreting between 2007 and 2016. Students will create visualizations that map the topic areas covered in the journal, trace the historical evolution of topics in the field of interpreting, and show collaborations between authors as networks.

Information on dataset(s) to be used:

The project will use citation metadata for articles published in Interpreting, collected from the online database Scopus. The citation records include the following fields:

  • Authors
  • Title
  • Year
  • Cited by
  • Affiliations
  • Authors with affiliations
  • Abstract
  • Author Keywords
  • References
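The collaboration networks could be built from the Authors field alone, roughly as sketched below: every pair of authors on the same article adds one to that pair's edge weight. The semicolon separator is an assumption (Scopus exports have used different conventions for separating author names), so `AUTHOR_SEP` should be checked against the actual file.

```python
import csv
from collections import Counter
from itertools import combinations

AUTHOR_SEP = ";"  # assumption: verify against the actual Scopus export


def coauthor_edges(handle):
    """Build a weighted co-authorship edge list from a Scopus CSV export.

    Each unordered pair of authors listed on the same article adds 1 to
    that pair's weight; single-author articles contribute no edges.
    """
    edges = Counter()
    for row in csv.DictReader(handle):
        names = {a.strip() for a in row["Authors"].split(AUTHOR_SEP) if a.strip()}
        # Sorting makes (a, b) and (b, a) the same key.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges
```

Scopus CSV files may begin with a byte-order mark, so opening them with `encoding="utf-8-sig"` keeps the first header readable as `Authors`. Name variants (e.g., differing initials) would need to be merged before the network is meaningful.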

Publication Notes:
The project sponsor will be a co-author on any publications from this project.

Globalization of the United States, 1789-1861

Client Name: Konstantin Dierks

Project Description (goal/scientific or practical value):
The aim of this historical GIS project is to display historical data on historical world basemaps, with an interactive menu that lets users choose data and an interactive timeline that lets them choose a particular historical moment or interval. A second aim is to create D3 visualizations that are dynamically linked to the digital map.
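The core logic behind the menu and timeline can be sketched as a filter over event records, as below. The `Event` fields and category names are hypothetical; in the finished project this filtering would likely run in the browser alongside D3, so the Python version only illustrates the rule: a single year when no end is given, an inclusive interval otherwise, optionally restricted to the categories chosen in the menu.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    year: int
    kind: str  # hypothetical categories, e.g. "diplomatic" or "military"
    place: str
    lat: float
    lon: float


def select_events(events: Iterable[Event], start: int,
                  end: Optional[int] = None,
                  kinds: Optional[Set[str]] = None) -> List[Event]:
    """Return events within [start, end] inclusive (a single year if end is
    None), optionally restricted to the kinds selected in the data menu."""
    end = start if end is None else end
    return [e for e in events
            if start <= e.year <= end and (kinds is None or e.kind in kinds)]
```

Keeping latitude and longitude on each record means the same filtered list can drive both the map markers and the linked D3 charts.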

The scientific value of the project lies in presenting historical GIS data effectively, since modern basemaps do not suit historical data. The practical value lies in presenting multiple data variables clearly and simultaneously, both on a digital map and in a data visualization.

Information on dataset(s) to be used:
The dataset concerns diplomatic and military activities in the wider world conducted from the United States between the American Revolution and the American Civil War. (Historical basemaps can be found online.)

Relevant publications, websites, etc.:
Two interesting websites using historical basemaps:

Publication Notes:
The project client requests approval for any publications that come from the results.