Website screen shot and heatmap overlay comparison

Visualizing the Evolution of Website Design

With over 25 years of history, the web itself has become a significant cultural artifact. We are studying how website design has changed over time, and how these changes reflect changes in culture, technology, aesthetics, etc. We are studying these changes through both user studies (e.g., interviewing web designers) and through automated analysis using computer vision and machine learning. We would like help in creating visualizations to explore and discover patterns of web design changes over time.

About Dataset

The dataset for this project uses screenshots from several hundred websites over a period of about 15 years. The dataset is derived from the Internet Archive’s Wayback Machine using a web scraper.

Publication Notes

The project sponsor would be very happy if students get publishable results and would be happy to co-author a paper with them if they choose to do so. Students may freely mention results on resumes.


Dr. David J. Crandall,
IU Computer Vision Lab

Dr. David J. Crandall is an Associate Professor at the School of Informatics, Computing, and Engineering at Indiana University where he directs the IU Computer Vision Lab. Dr. Crandall works in computer vision, the area of computer science concerned with automatically inferring semantic meaning from images – teaching computers to “see.”

Logo: GloBI

Visualizing Research Silos in Ecological Interaction datasets

Open access to high-quality, integrated ecological datasets is important to better understand and preserve the ecosystems that sustain life on Earth. The goal of this project is to visualize and identify research silos (or overlap) in geospatial-temporal species interaction datasets (e.g., bees pollinate plants, sea otters eat crabs, and the black plague is spread by rodents). The scientific and practical value of this is to help researchers and their funders identify collaboration opportunities in an effort to increase the quality of datasets while avoiding duplicate work.

Even though acquiring ecological data is resource intensive and time sensitive, anecdotal evidence suggests that research efforts in ecology are siloed. These siloed research efforts might explain the sparseness (aka the “Eltonian Shortfall”) of openly available datasets that describe how organisms interact in our global ecosystems.

About Dataset

GloBI mines existing datasets that describe how organisms interact. As of November 2017, GloBI includes about 2M interactions across over 100,000 taxa (see //). This makes GloBI one of the largest openly accessible resources of species interaction records available today.

The primary dataset, provided through GloBI, consists of open-access, integrated species interaction datasets. Secondary datasets include the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), and similar open-access projects that provide additional information on where and when individual species occurred.

GloBI data records are available as TSV (tab-separated values) files, as N-Quads, and through a web API, an R package, and JavaScript libraries, among others.
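As a minimal sketch of working with the TSV format, the snippet below counts interaction types in tab-separated text using Python's standard `csv` module. The column names (`source_taxon`, `interaction_type`, `target_taxon`) and the sample rows are hypothetical placeholders, not GloBI's documented schema; check the header of the actual file before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real GloBI TSV header may use different column names.
SAMPLE_TSV = (
    "source_taxon\tinteraction_type\ttarget_taxon\n"
    "Apis mellifera\tpollinates\tTrifolium repens\n"
    "Enhydra lutris\teats\tCancer magister\n"
    "Bombus terrestris\tpollinates\tTrifolium repens\n"
)


def count_interaction_types(tsv_file, type_column="interaction_type"):
    """Count how often each interaction type appears in a TSV stream."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[type_column] for row in reader)


counts = count_interaction_types(io.StringIO(SAMPLE_TSV))
print(counts.most_common())
```

To read a downloaded file instead of the in-memory sample, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the `io.StringIO` object.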

Publication Notes

Students may publish the results as they please; however, the project sponsor strongly encourages open access publications. For examples of prior student publications, please see previous IVMOOC projects at:


Jorrit Poelen

Jorrit Poelen lives and works in Oakland, CA and is a freelance software engineer. In the last 15+ years, Jorrit has been active in academic, government, corporate, and start-up settings across fields like neuroscience, health informatics, financial risk management and, more recently, ecological informatics. Current projects include //, //, and //

Logo: ChaCha

ChaCha Menopause queries

The ChaCha menopause query data is the foundation for building intervention modules to improve people’s knowledge and problem-solving skills related to menopause. For this project, students work with the data to identify topics that people want to know about related to menopause. My team will use the results to build intervention modules on each topic, apply for NIH R21 funding to pilot test their impact, and then seek NIH R01 funding to test their efficacy and/or effectiveness. I envision an intervention for menopausal women (ages 42-62, female) that would include modules around what their teenager, male partner/spouse, or female family members/friends might also want to know.

A secondary purpose of the project is to identify what the general public who are not women aged 42-62 (all other ages and genders) want to know about menopause, which will be reported back to the North American Menopause Society to improve their educational website materials.

About Dataset

In 2015, our laboratory became the only research team in the world with access to raw data from ChaCha for the years 2009 to 2012. ChaCha was a United States-based company that let users anonymously submit questions and receive a human-guided, real-time, anonymous, and verified answer.

Between 2009 and 2012, users submitted their questions via texts or the Web. After several queries from the same user, ChaCha asked the user to provide their age, sex, and zip code, but the information was not required. ChaCha is not a social media platform and its anonymity appears to allow users to ask unusual and potentially embarrassing or stigmatizing questions.

The ChaCha database contains 1.93 billion complete questions asked between January 2009 and November 2012 by 19.3 million users. A menopause-related subset was identified by searching for questions containing a set of keywords. We started with an initial set of keywords that was expanded based on our initial review of the data and variations in the spelling of terms. The final set of keywords we used can be grouped into five broad topics: (1) menopause/menopausal status; (2) hot flash symptoms; (3) medications; (4) surgery; (5) specialty provider.
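The keyword-based filtering described above can be sketched roughly as follows. The patterns here are hypothetical examples chosen for illustration; the study's actual keyword list is not reproduced in this description, and the real search may have been implemented differently.

```python
import re

# Hypothetical patterns with spelling variants; NOT the study's actual keyword list.
KEYWORD_PATTERNS = [
    r"menopaus\w*",      # menopause, menopausal
    r"menapaus\w*",      # an assumed misspelling
    r"hot\s*flash\w*",   # hot flash, hotflashes
    r"hysterectom\w*",   # an assumed surgery-related term
]
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORD_PATTERNS) + r")", re.IGNORECASE)


def is_related(question):
    """Return True if the question contains any of the keyword patterns."""
    return KEYWORD_RE.search(question) is not None


questions = [
    "What age does menopause start?",
    "Why do I get hot flashes at night?",
    "How tall is Mount Everest?",
    "Is menapause hereditary?",
]
matches = [q for q in questions if is_related(q)]
print(matches)
```

Matching on word stems with `\w*` is one simple way to catch inflections and some spelling variants; a review of unmatched queries would still be needed to expand the list, as the description notes.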

Our searches using these keywords resulted in 263,363 menopause-related queries in total (user age 13+) and 5,892 queries from women aged 40 to 62 years. The ChaCha user query data is provided to students in an Excel workbook. Students working on this project will be invited to be “collaborators” for the IU Box folder.

Publication Notes

The project sponsor would like to be a co-author for any publications that result from student projects using the dataset provided.


Dr. Janet S. Carpenter,
IU Bloomington

Dr. Janet Carpenter’s research has focused on oncology and women’s health, specifically the measurement, mechanisms, and management of menopausal symptoms in cancer survivors and midlife women without cancer. Dr. Carpenter has expertise in developing and testing measures (self-reported and physiological), theory-building, and the design and conduct of multi-site randomized controlled trials testing behavioral, pharmaceutical, and nutraceutical interventions. She has extensive experience in scientific writing and the review and critique of grant applications (e.g., NIH, ONS, ACS, DOD).

Her IVMOOC client project in 2017 resulted in a scholarly publication in a prominent journal, see //

Portrait: Dr. Chen X. Chen

Text-Mining of User-Generated Queries on Menstrual Pain

Menstrual pain is highly prevalent among women of reproductive age. By text-mining user-generated queries, we try to understand the public’s information needs and concerns related to menstrual pain. The insights gained from this work will be used to develop interventions to support menstrual pain management. The specific goals for this project are to: (1) visualize the text data; (2) cluster and categorize questions; (3) summarize patterns and themes.

About Dataset

The data to be mined are user-generated questions (i.e., text data) from the ChaCha search engine (see the ChaCha Menopause Queries project description for more details about the dataset). The ChaCha query results were analyzed to identify all queries that include questions on menstrual pain; the results were split into two subsets. The first subset contains about 507,000 menstrual pain-related queries from females; the second contains about 114,000 menstrual pain-related queries from males. Demographic information (e.g., age, gender, geographical location) is available for some users. The ChaCha user query data is provided to students in an Excel workbook. Students working on this project will be invited to be “collaborators” for the IU Box folder.

Publication Notes

Students are welcome to add results to their resume. Prior to publishing the results, review and approval from me is required. In addition, I request to be a co-author for presentations and papers related to this project.


Dr. Chen X. Chen

Dr. Chen’s program of research focuses on management of dysmenorrhea. Characterized by menstrual pain, dysmenorrhea is a prevalent pain condition among women of reproductive age that puts women at higher risk for developing other chronic pain conditions later in life. Dr. Chen’s goals for this program of research are to support dysmenorrhea management, improve women’s quality of life, and to some extent, reduce the risk for developing future pain conditions among affected women. In addition to building her program of research, Dr. Chen has been engaged in interdisciplinary collaboration in the areas of pain and symptom science, women’s health, and complementary integrative health. Her areas of methodological expertise include advanced psychometrics, survey research, and meta-analysis.

Logo: Biosim


BioSim is a participatory simulation where young students (grades K-3) enact the roles of ants and biological systems with the assistance of electronically enhanced e-puppets. It is designed to enhance youths’ understanding of complex systems through novel combinations of play, reflection, interaction, and exploration.

This project uses data generated by children while playing with electronic ant puppets within a defined indoor area. The visualizations should (1) show where ants travel during one game and (2) help students and teachers answer the question “where did the ants go the most?” The visualizations must help school teachers easily show and explain to young children which team of ants searched for food more efficiently and why.

About Dataset

The project will use data collected at an elementary school in Spring 2017. The data is recorded in plain JSON files. Each file captures the activity from one game; the trail/position data is stored in the “allActions” attribute with each “checkTrack” action. For a more detailed explanation, please contact the client directly. Data for this project is available through GitHub:
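As an illustration of the structure described above, the sketch below bins “checkTrack” positions into grid cells to count where the ants went most. Only the names `allActions` and `checkTrack` come from the dataset description; the `action`, `x`, and `y` field names and the sample record are assumptions, so inspect a real game file before relying on them.

```python
import json
from collections import Counter

# Hypothetical game record. Only "allActions" and "checkTrack" are taken from
# the dataset description; "action", "x", and "y" are assumed field names.
SAMPLE_GAME = json.loads("""
{
  "allActions": [
    {"action": "checkTrack", "x": 1.2, "y": 3.7},
    {"action": "pickUpFood", "x": 1.4, "y": 3.9},
    {"action": "checkTrack", "x": 1.8, "y": 3.1},
    {"action": "checkTrack", "x": 5.5, "y": 0.4}
  ]
}
""")


def visit_counts(game, cell_size=1.0):
    """Bin checkTrack positions into square grid cells and count visits per cell."""
    counts = Counter()
    for act in game.get("allActions", []):
        if act.get("action") != "checkTrack":
            continue  # skip actions that carry no trail data
        cell = (int(act["x"] // cell_size), int(act["y"] // cell_size))
        counts[cell] += 1
    return counts


counts = visit_counts(SAMPLE_GAME)
print(counts.most_common(1))
```

The per-cell counts could feed a heatmap of the play area, which is one simple way to answer “where did the ants go the most?”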

Publication Notes

The project sponsor would like to be a co-author for any publications that result from student projects using the dataset provided.


BioSim Team

The project is a collaboration of Dr. Kylie Peppler’s Creativity Labs and Dr. Joshua Danish’s Representations Activity Play and Technology (RAPT) lab, along with Dr. Armin Moczek of the Biology Department, all at Indiana University in Bloomington.

Logo: BLM

Visualizing Government Meetings

The City of Bloomington currently has 33 boards and commissions, in addition to the ongoing meetings of the Bloomington City Council. At present, the only way for a person to understand what happens from meeting to meeting is to navigate our city website and read minutes.

We would like to work with students from the IVMOOC to make it easier to understand Bloomington City Council meetings, including meeting participants, dates, topics, and policy and legislative outcomes, through information visualizations. This will have enormous value for the public by increasing government transparency and hopefully making it easier for folks to follow along with what is going on at city hall and in the community of Bloomington.

About Dataset

The data for this project is publicly available through the City of Bloomington website. Each meeting record includes the type of meeting, date and a variety of documents related to these meetings. These documents may include: meeting agenda, minutes of speakers at a meeting, legislation packets, and non-public meeting memos. The documents are unstructured data provided as PDFs.

Publication Notes

All of the information is in the public domain, so students are free to publish or present the outcomes of the project however they would like. Ideally, we would love to publish their work on the city website or another public website to make it accessible to the public.


Thomas Miller

Thomas Miller is the city's Director of Innovation, who works directly for Mayor John Hamilton; his work focuses on improving government service delivery and process.

Logo: BookTrust

Visualizing Student Book Choice

Book Trust is a non-profit organization that serves 53,000 students across 21 US states by providing a stipend each month for the students to choose their own books out of the Scholastic catalogue. We currently receive monthly data from Scholastic that shows us the books kids are choosing, but we don’t have a way to visualize that data and act upon it.

We are interested in developing visualization products and resources to help teachers understand what books their students select and potentially make recommendations. The visualizations should be developed with Tableau dashboard software.

About Dataset

We receive a txt file drop from Scholastic that is uploaded into our custom web-based application. We then upload data into Tableau for easy consumption by staff. Data will be accessed through our custom web-application that requires login for access, due to sensitive student information. Once the project team is approved, we will give students a login to access the dataset.

Publication Notes

The project sponsor requires approval of the results, and depending on the evolution of the project and our staff’s involvement, potentially a co-authorship on any publications.


Erika Weiss

Erika Weiss is the Vice President of Programs for BookTrust, and works to empower children from low-income families to choose and buy their own books throughout the school year, with a focus on book choice and ownership. The goal is increasing children’s literacy skills and fostering life-long learning.

Logo: Maker Ed

Visualizing Student Makerspace Portfolios

The Open Portfolio Project (OPP) aims to develop a common framework for documenting, sharing, and assessing learning through portfolios. Open portfolios are openly networked, decentralized, and distributed systems of documentation, curation, and reflection, which can showcase a learner’s abilities, interests, and voice in a way that test scores and grades cannot.

Portfolios provide opportunities for students and others to recognize the skills and ideas they have to offer and contribute, especially for students who may not excel in academics or high-stakes testing. Inherent to the creation of a portfolio is the process of reflecting on one’s work, curating what is most appropriate for an intended audience, and designing an artifact to articulate that evolution of learning and making.

About Dataset

This project uses survey data collected from youth-serving makerspaces across the United States, between 2014 and 2017. The data covers demographic information about youth and staff members, programs offered and how they relate to school subjects, as well as assessment approaches in makerspaces with a particular focus on open portfolio assessment.

Students working on this project might consider visualizations that show:

  • Participation in the Open Portfolio Project over geographic space, using youth and staff demographics and the annual diversity index, and comparing school and out-of-school makerspaces annually;
  • The topics and subjects offered in school and out-of-school makerspaces over time;
  • Interactive visualizations that let users explore assessment items in school and out-of-school makerspaces, by item type and overall performance;
  • Portfolio assessment visualizations that show the impact of portfolio assessments, the reasons for student portfolios, the challenges of implementation, and published versus unpublished portfolios, comparing in-school and out-of-school makerspaces.

Publication Notes

The project sponsor requests to approve results and share co-authorship prior to publication, and would like to use the visualizations in our reports.


Creativity Labs: Open Portfolios

The project is sponsored by Anna Keune, with a background in participatory design of digital media learning tools in Europe and India. Anna is passionate about participatory making. Her work focuses on maker culture, documentation practices of youth at makerspaces across the US, and co-designing equitable curricular approaches for making.

Maker Ed is a national non-profit organization that provides educators and institutions with the training, resources, and community of support they need to create engaging, inclusive, and motivating learning experiences through maker-centered education. We work to make it possible for every educator in America, particularly those in underserved communities, to facilitate interactive, student-driven, and open-ended learning experiences for youth.

Logo: Maker Ed

Re-Crafting Mathematics Education

This collaborative project, titled “Re-Crafting Mathematics Education: Designing Tangible Manipulatives Rooted in Traditional Female Crafts,” is based on our prior work studying design, mathematics, and traditional female crafts, particularly when integrated with electronics in what we call “e-textiles.” As part of this new project, we plan to extend this earlier work to better understand how traditional female crafting practices can make far-reaching improvements in a range of learning outcomes in science, technology, engineering, and mathematics (STEM) education. These types of investigations will help reveal key issues underlying the underrepresentation of women and girls in lifelong STEM learning.

We are conducting a series of ethnographies of female crafting circles to better theorize the connections between mathematics and traditional women’s crafts in particular. In fact, we are coming to understand craft as a lived mathematical practice; that is, craft and mathematics are closely intertwined. Following our initial ethnographic field work, we’re planning to develop and test a set of new hands-on classroom manipulatives for schools and after-school programs (targeted at youth in grades 5-9).

About Dataset

We generated a list of codes and coded interviews with crafters and their craft projects in relation to inherent mathematics concepts used across a range of crafts. We would like to see visualizations of the way in which math concepts are represented across crafts.

A copy of the data will be provided to students as a spreadsheet after the student group forms and contacts the project sponsor.

Publication Notes

The project sponsor would like to discuss any co-authorship or publications at the beginning of the project.


Creativity Labs

Logo: An Opioid-focused Surgery

Personalized Post-Surgery Opioid Risk Mitigation and Teaching

Nearly 60% of surgical patients get opioids when they leave the hospital. Of those prescribed opioids, 10% become chronic users and 1% overdose. Given that 50 million surgeries occur every year, the exposure to opioids is dramatic. Overprescribing and patients’ lack of knowledge about proper use and risk are, at least in part, implicated. When these patients leave the hospital with an opioid prescription, there is no well-designed worksheet to tell them how to take their medication and the risks involved.

The goal of this project is to create a personalized, auto-generated, easily understood opioid data sheet that presents best practice as well as use and risk data by procedure. We have created a database of typical opioid requirements for patients for numerous surgical procedures. We also have a database of the misuse risks associated with refills by surgical procedure. We also have a datasheet with several of the best practices organized as a list as well as examples of previous datasheets that are clearly inadequate. Now, we want you to help us build a worksheet surgeons around the country can give their patients to inform them and protect them from opioid misuse.

About Dataset

Data for this project includes:

  1. Database of typical opioid needs by patient type and surgery.
  2. Database of risk of misuse by patient type, number of refills, and surgery.
  3. Datasheet of best practices.

The project sponsor will share this data with the team at the beginning of the project.

Publication Notes

The author would like to approve any final publication and be included as a co-author of the work.


An Opioid-focused surgeon

Logo: OwlSky Cloud platform

Visualization of Time-Resolved Electronic Crystal Structure Relaxations

Dynamics on an electronic level play a crucial role in most mass transport mechanisms within crystals, such as ionic conduction in energy materials. Large datasets of time-resolved electron densities can be calculated for these kinds of processes by means of modern quantum-chemical modeling techniques, like density functional theory. However, an understanding of the particular interactions within the transport process (i.e. the time-resolved breaking and formation of chemical bonds) remains tedious. A detailed visualization of ionic and electronic relaxations and electron density redistribution would greatly enhance today’s capabilities of data analysis and interpretation.

About Dataset

The data sets for this project comprise 10 time steps, each reflected by a scalar electron density map on a 72x72x120 data point grid for the 3D-periodic supercell and a respective list of atomic positional coordinates. Students may use the software tool Cinector GmbH, in addition to open codes such as the Python Mayavi package.

Publication Notes

The project sponsor would like to approve the results and be a co-author on any publication; however, students may use the results of the project in their resume and portfolios.


DENRelax and Institute of Experimental Physics,
TU Bergakademie Freiberg

Image: A cartoon person pedaling a stationary bike

30 sec of Power Pedaling - How much energy can be produced at what age?

Visitors to the Experimenta Science Center in Heilbronn, Germany were asked to participate in an experiment by pedaling a stationary bike for 30 seconds. The data was collected to give people feedback on their performance in comparison to others of similar age and the same sex. Data from nearly 200,000 users was collected and is now available to analyze. Students are invited to analyze and visualize physical performance in relation to demographic features of participants such as age and sex. Students are welcome to develop and pursue their own ideas of what to do with the data.

About Dataset

The dataset consists of about 200,000 rows, each representing a visitor, and seven columns with information about them, including date and time, age, sex, peak power, and energy “produced”.
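One possible starting point, sketched under assumptions, is to group visitors by age band and sex and compare median peak power. The column names (`age`, `sex`, `peak_power_w`), the units, and the sample values below are hypothetical; the real file's header and encoding of sex may differ.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical sample rows; column names and values are illustrative only.
SAMPLE_CSV = """age,sex,peak_power_w
12,f,310
14,m,420
35,f,520
38,f,560
36,m,780
67,m,450
"""


def median_peak_power(csv_file, bin_width=10):
    """Return median peak power keyed by (age-band start, sex)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        age_bin = (int(row["age"]) // bin_width) * bin_width
        groups[(age_bin, row["sex"])].append(float(row["peak_power_w"]))
    return {key: median(values) for key, values in groups.items()}


result = median_peak_power(io.StringIO(SAMPLE_CSV))
for (age_bin, sex), value in sorted(result.items()):
    print(f"age band starting at {age_bin}, {sex}: {value:.0f}")
```

The median is used here rather than the mean because it is less sensitive to outliers, which seems plausible for self-selected science center visitors; that choice is a suggestion, not a requirement of the project.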

Publication Notes

The project sponsor notes that students can publish approved results and add the project results to their resumes. Attribution of the data sources and associated contributors is highly encouraged.


Science Center Psychologist

Dr. Katrin Hille is a psychologist working for the Science Center experimenta Heilbronn. Her tasks include the collection and analyses of visitors’ data for exhibit operation and for improvement of visitor experience.

Logo: IN Management Performance Hub

Exploring Patent Data in the State of Indiana

The State of Indiana does not have consumable information around patents. The Indiana Economic Development Corporation (IEDC) is interested in visualizing patent data for the State of Indiana. The following items in the patent data are of particular interest to IEDC:

  • Patent industry alignment with IEDC target industries – Agriculture, Aerospace / Defense, Smart Transportation, Life Sciences, Energy / Materials, Advanced Manufacturing, Tech / Cyber Security / Internet of Things
  • Patents that are available to be acted on by entrepreneurs.

The IEDC would like to have a visualization that allows for dynamic interactions and slicing / dicing of data related to patents. These visualizations may be presented to various stakeholders (IEDC, venture capital groups, and universities) to support IEDC policy goals and initiatives, such as linking surplus or stagnant patents with entrepreneurs and funding to spur innovation and entrepreneurship in the State of Indiana.

About Dataset

Data for this project comes from the US Bureau of Labor Statistics reports on Indiana. The reports include data on: labor force, workplace injuries and illnesses, employer benefits and pay, consumer price indices, consumer spending, energy prices. Data may be accessed through reports, tabular data downloads, or through the US Bureau of Labor Statistics’ open API for economic datasets.

Publication Notes

The project sponsor would like to approve any student work and results prior to publication.


Indiana Economic Development Corporation (IEDC)

The Indiana Management Performance Hub (MPH) provides analytics solutions tailored to address complex management and policy questions enabling improved outcomes for Hoosiers. We empower our partners to leverage data in innovative ways, facilitating data-driven decision making and data-informed policy making.