Portrait: Dr. Chen X. Chen

Visualizing dysmenorrhea symptoms (i.e., symptoms related to menstrual pain)

Project Description (Goal/Scientific Or Practical Value)

The goals of the project are to (1) visualize the relationships between symptoms and (2) visualize subgroups/clusters of women with dysmenorrhea. Such information has implications for menstrual symptom management and for tailoring treatment to subgroups.

Information On Dataset(s) To Be Used

My team collected dysmenorrhea symptom data in two studies from a total of over 1400 female participants. Specifically, we asked questions about types of symptoms and symptom severity. We also have data on participants' demographics and clinical characteristics.

Relevant Publications, Websites, Etc

Publication Notes

Students are welcome to add results to their resume. Prior to publishing the results, review and approval from me are required. In addition, I request to be a co-author for presentations and papers related to this project.

Graphic: INSPYRED Evolutionary Computation

Visualizing Evolutionary Computation

Project Description (Goal/Scientific Or Practical Value)

The project will visualize results of Inspyred Evolutionary Computation (EC) algorithms for the traveling salesman problem or another problem. The students will be exposed to Python libraries and NP-hard problems. Moreover, they will become familiar with EC optimization methods such as genetic algorithms. The end product of this work is a visualization of the progress of complex algorithms - advanced students may choose to integrate their solution into open source software.
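As a rough illustration of the kind of per-generation data such a visualization could plot, the sketch below runs a small genetic algorithm on a traveling salesman instance and records the best tour length in each generation. It deliberately uses only the Python standard library rather than the Inspyred API; the function names, parameters, and the choice of order crossover with swap mutation are assumptions made for this example, not the project's actual code.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(p1, p2, rng):
    """Copy a slice from p1, then fill the gaps with p2's cities in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(c for c in p2 if c not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child


def evolve(cities, pop_size=60, generations=100, mutation_rate=0.2, seed=0):
    """Return the best tour length seen in each generation (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, cities))
        history.append(tour_length(population[0], cities))
        # Keep the better half unchanged (elitism) and refill with offspring.
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return history
```

Because the better half of each generation is carried over unchanged, the recorded history never gets worse, which makes it a natural series for a convergence plot.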

Information On Dataset(s) To Be Used

The code and data that show the solution of the traveling salesman problem are available online as part of the archive.

Relevant Publications, Websites, Etc

https://pythonhosted.org/inspyred/reference.html

Publication Notes

Students can publish the project results and mention the Inspyred library. In fact, students will be encouraged to make their solution available as open source software.

Graphic: INSPYRED Evolutionary Computation

The Salons Project

Project Description (Goal/Scientific Or Practical Value)

The Salons Project is a part of Mapping the Republic of Letters at Stanford University. The co-editors are Melanie Conroy and Chloe Edmondson. We study gatherings in women's homes from 1700 to 1814. Our dataset consists of attendance records for the most prominent Parisian salons of the Enlightenment era. These salons were some of the key meeting places for the French Enlightenment. Questions include how social class, gender, nationality, and other social traits related to philosophical and other interests, and to what extent salons were open to women and radical philosophers.

Information On Dataset(s) To Be Used

No restrictions.

Relevant Publications, Websites, Etc

http://blogs.memphis.edu/salonsproject/

Publication Notes

No conditions so long as data source is acknowledged.

Logo: GloBI

Visualizing evidence of the oldest social graph on earth: our living ecosystems

Project Description (Goal/Scientific Or Practical Value)

Our ecosystems consist of intricate interdependent webs of interacting organisms. This ecological "social" graph of life on earth is becoming available digitally as (citizen) scientists are openly sharing their ecological datasets. Global Biotic Interactions (GloBI, https://globalbioticinteractions.org) carefully stitches together these open datasets to provide a global view of how species interact (e.g., predator-prey, pathogen-host, parasite-host, pollinator-plant). An ongoing challenge is to visualize the chain of evidence that supports an interaction claim. Traditionally, the origin, or provenance, of a claim is captured by a scientific reference. Currently, GloBI attempts to capture the authority and location of a claim. The challenge of this project is to design a visual representation of complex ecological evidence chains (e.g., Bees pollinate plants as described by ...) to help (citizen) scientists better navigate the millions of ecological claims made available through GloBI.

Information On Dataset(s) To Be Used

Global Biotic Interactions (GloBI) data captures, among other things, the scientific name, interaction type, location, time and life stage of interacting organisms. Also, GloBI data describes the authority and digital origins of the claim in the form of URLs, DOIs and human readable citations.

Relevant Publications, Websites, Etc

https://globalbioticinteractions.org

Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005

Publication Notes

GloBI data can be openly accessed. Please do note that academic tradition is to cite the appropriate sources to attribute data providers and to establish a chain of trust. Also, students are expected to openly publish their results in a reliable and citable publication platform like https://figshare.com, https://zenodo.org or https://osf.io to help share their insights to benefit future collaborators.

Image: Space Station

Student Experiments on the International Space Station

Project Description (Goal/Scientific Or Practical Value)

Our organization (a non-profit) manages the International Space Station US National Lab, including promoting large-scale use of the ISS by elementary, middle and high school students to do experiments on-orbit. Over the past five years, students have launched and operated hundreds of experiments, ranging from spiders in space, to Earth observation, on-orbit robotics and so many more. In fact, we've done so many student experiments, and have so many pathways to engage students, that now we need a better way to help students understand what is possible, inspire their imagination, support their experiments and share results with other students, educators and the public. This is a true revolution in educational access to space, and solving the challenges of this new venue will open this up to many more students.

Information On Dataset(s) To Be Used

The International Space Station is inherently global, with 18 countries supporting its overall operations, and many more accessing and using it as a platform for research, business and education. Our data set for this project is incredibly rich, including data from NASA telemetry about the ISS, data about the 20+ programs that use the ISS for education, and results from the student experiments.

Relevant Publications, Websites, Etc

Use the link above to explore the educational programs using the ISS - each has a wealth of data about student experiments.

Publication Notes

No constraints, except that you need to cite your partnership with the ISS National Lab. We also want to include the results in our web site and public communications.

Image: Colored graph

Respiratory modality after pediatric extubation

Project Description (Goal/Scientific Or Practical Value)

Pediatric patients can be assessed for extubation using different protocols (DSBT, CPAP/PS, or no extubation readiness trial). Our goal is to assess the respiratory modality progression (RA, NC, HFNC, NIPPV, ETT) in the first 72 hours after extubation.

Information On Dataset(s) To Be Used

We are preparing to publish our study, so the figures will be used for publication.

Relevant Publications, Websites, Etc

https://iu.box.com/s/feojvedkczxq9q199ougnoubabflvi2g

Publication Notes

We can add an acknowledgement for the student in our next publication, and the student can add it to their publication. We plan to submit a manuscript in Spring.

Image: Earth from space with molecular models on it

Atlas of Molecular Scaffolds

Project Description (Goal/Scientific Or Practical Value)

The project aims to visualize geographical interactions of chemical research. Given a data set of molecular scaffolds (molecular frameworks) and information about the papers in which they were published (PubMedID, publication date, author affiliations), frequently used scaffolds per country can be visualized. In addition, collaborative chemical research can be identified via the author affiliations. Possible research collaborations that have not yet been identified can also be displayed by creating connections between institutions that are working on the same scaffolds but have not yet published together in the same paper (can be identified via PubMedID). Ultimately, changes in collaborative chemical research over time can be displayed by including the publication date.

Information On Dataset(s) To Be Used

The provided data set (540,455 rows) includes information on the paper identifier ("PMID"), molecular scaffold ("Murcko scaffold"), country ("Country"), publication year ("Year"), publication month ("Month"), author affiliations ("Affiliations"), and the number of different affiliations given for a paper ("Number_Affiliations").
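One way to find institutions that work on the same scaffold but have never published together is sketched below. It uses the column names listed above, but the row format is an assumption: each row is taken to be a dictionary whose "Affiliations" value is already a list of institution names, which may not match how the real file stores that column. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def potential_collaborations(rows):
    """Return institution pairs that share a scaffold but never share a PMID."""
    by_paper = defaultdict(set)      # PMID -> institutions on that paper
    by_scaffold = defaultdict(set)   # scaffold -> institutions that used it
    for row in rows:
        affiliations = set(row["Affiliations"])  # assumed to be a list of names
        by_paper[row["PMID"]].update(affiliations)
        by_scaffold[row["Murcko scaffold"]].update(affiliations)

    # Every pair of institutions that appears together on at least one paper.
    coauthored = set()
    for institutions in by_paper.values():
        coauthored.update(combinations(sorted(institutions), 2))

    candidates = set()
    for institutions in by_scaffold.values():
        for pair in combinations(sorted(institutions), 2):
            if pair not in coauthored:
                candidates.add(pair)
    return sorted(candidates)
```

Sorting the institution names before pairing keeps each pair in one canonical order, so (A, B) and (B, A) are treated as the same link.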

Relevant Publications, Websites, Etc

In a previous paper we investigated the popularity of molecular scaffolds in the Medicinal Chemistry literature (https://www.dropbox.com/sh/04e1ga8gvakm3ag/AADQ1iBYe34ioHNd8_wo4yRha?dl=0). In this study, we want to focus on the geographic distribution of molecular scaffolds and changes over time as well as identify new possibilities for collaborative research in Medicinal Chemistry.

Publication Notes

The project sponsor would like to be a co-author for any publications that result from student projects using the dataset provided.

Logo: NEEC

Visualizing the impact of training datasets on machine learning performance

Project Description (Goal/Scientific Or Practical Value)

The ultimate goal of the project is to develop interactive visualizations that help explain the structure, run-time dynamics, and results of different machine learning (ML) algorithms. In this project, we are interested in visualizations that help communicate the impact of training data on learning performance. Specifically, we are looking for visualizations that map high-dimensional image training datasets into two-dimensional data spaces that can be readily understood and explored interactively. Different image similarity metrics should be explored, and dimension reduction techniques such as PCA, t-SNE, and UMAP should be used to lay out training data in two-dimensional spaces. Results should be used to estimate and communicate the decision boundaries of ML classifiers, to compare how different training datasets affect ML prediction performance, and even to see how prediction performance changes as training progresses.

Information On Dataset(s) To Be Used

Plane: The aim of this dataset is to help address the difficult task of detecting the location of airplanes in satellite images. Automating this process can be applied to many issues, including monitoring airports for activity and traffic patterns, and defense intelligence. Provided is a zipped directory, planesnet.zip, that contains the entire dataset as .png image chips. Each individual image filename follows a specific format: {label}__{scene id}__{longitude}_{latitude}.png. The pixel value data for each 20x20 RGB image is stored as a list of 1200 integers within the data list. The first 400 entries contain the red channel values, the next 400 the green, and the final 400 the blue. The image is stored in row-major order so that the first 20 entries of the array are the red channel values of the first row of the image.

Ship: The aim of this dataset is to help address the difficult task of detecting the location of large ships in satellite images. Automating this process can be applied to many issues, including monitoring port activity levels and supply chain analysis. The dataset consists of image chips extracted from Planet satellite imagery collected over the San Francisco Bay and San Pedro Bay areas of California. It includes 4000 80x80 RGB images labeled with either a "ship" or "no-ship" classification. Image chips were derived from PlanetScope full-frame visual scene products, which are orthorectified to 3-meter pixel size.

Other popular datasets for machine learning, such as Microsoft COCO and ImageNet, can also be used.

Download:

  • Plane: https://www.kaggle.com/rhammell/planesnet/download
  • Ship: https://www.kaggle.com/rhammell/ships-in-satellite-imagery/download
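The channel-by-channel, row-major layout described for the plane chips can be unpacked into per-pixel RGB values as in the minimal sketch below. The function name is hypothetical, and the example works on a plain Python list rather than on the actual dataset file.

```python
def decode_chip(pixels, size=20):
    """Turn a flat [all red..., all green..., all blue...] list into rows of (r, g, b).

    `pixels` must hold 3 * size * size integers, each channel in row-major order.
    """
    plane = size * size
    if len(pixels) != 3 * plane:
        raise ValueError(f"expected {3 * plane} values, got {len(pixels)}")
    red = pixels[:plane]
    green = pixels[plane:2 * plane]
    blue = pixels[2 * plane:]
    return [
        [
            (red[r * size + c], green[r * size + c], blue[r * size + c])
            for c in range(size)
        ]
        for r in range(size)
    ]
```

The result is indexed as image[row][column], so image[0] holds the 20 pixels of the first row of the chip.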

Publication Notes

The project sponsor would be very happy if students get publishable results and would be happy to co-author a paper with them if they choose to do so. Students may freely mention results on resumes.

Image: Logos and Indiana map showing state diversity

Visualizing IU’s multidisciplinary network of researchers tackling the Prepared for Environmental Change Grand Challenge

Project Description (Goal/Scientific Or Practical Value)

Research at the Environmental Resilience Institute involves more than 100 faculty, post-docs, students and organizations, partnering with businesses, organizations and government entities across the state. A goal of the project would be to visualize the various schools and disciplines of all affiliated researchers, leadership, advisory boards, and staff, as well as their contributions to the ERI working groups.

Information On Dataset(s) To Be Used

Spreadsheets of research affiliates can be provided with links to each faculty bio webpage as well as working group membership(s).

Relevant Publications, Websites, Etc

https://eri.iu.edu/who-we-work-with/index.html

Publication Notes

No conditions. The ability to add or remove entities to any produced visualizations as researchers, partners, and affiliates come and go would be ideal.

Image: Various visualizations

Solar Performance Analysis

Project Description (Goal/Scientific Or Practical Value)

The Problem: Residential solar photovoltaic (PV) systems are a growing part of the energy mix, but these systems are often poorly designed, deployed, and managed. They generate a significant amount of data that could be used to help drive improvements, but this data generally is not being used effectively today.

The Project Objective: A key to improving residential solar system deployments is to enable installers and homeowners to better visualize and understand how well their systems are performing and how they can be improved. Some questions that could be answered with the data that is readily available include:

  • How does actual performance compare to the installer's estimates? What is the cause of any differences?
  • What is the best system architecture for any given application? What utility rate structure should be selected? What potential benefits could battery storage provide, and how should it be managed?
  • Is solar a good or bad investment - environmentally and financially - for this specific residence?
  • How much can system performance be improved by relocating panels or obstructions?
  • How can consumption be reduced and time-shifted to better match the solar production and utility rate structure?
  • What is the impact of this system on the electric grid? How can the benefits be increased?
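As a starting point for the question of actual versus estimated performance, the sketch below totals time-stamped production samples by month and divides by an installer estimate for the same month. The input shapes are assumptions made for illustration (ISO 8601 timestamps paired with watt-hours, and estimates in kWh keyed by "YYYY-MM"); they are not taken from the monitoring API documentation, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_performance_ratio(samples, estimates_kwh):
    """Return {month: actual kWh / estimated kWh} for months present in both inputs.

    samples: iterable of (ISO 8601 timestamp string, energy in Wh) pairs (assumed shape).
    estimates_kwh: dict mapping "YYYY-MM" to the installer's estimate in kWh (assumed shape).
    """
    actual_kwh = defaultdict(float)
    for timestamp, watt_hours in samples:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        actual_kwh[month] += watt_hours / 1000.0
    return {
        month: actual_kwh[month] / estimate
        for month, estimate in estimates_kwh.items()
        if estimate > 0 and month in actual_kwh
    }
```

A ratio below 1 means the system produced less than estimated in that month; plotting the ratio over time is one simple way to surface seasonal shading or equipment problems.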

Information On Dataset(s) To Be Used

The primary data source for this project is time-series sample data generated by one or more local solar installations. The content and format of the data is described in the SolarEdge Cloud Monitoring Server API Documentation:

Publication Notes

No restrictions.

Logo: CNS NRT

Interdisciplinary Training in Complex Networks and Systems: Collaboration, Disciplines, and Mentorship Visualization

Project Description (Goal/Scientific Or Practical Value)

The Complex Networks and Systems NSF Research Traineeship (CNS-NRT) is an interdisciplinary, multifaceted STEM program with a rigorous dual-PhD degree focus, emphasizing collaborative skills and workforce development. The evaluation of this program requires the examination of diverse perspectives and outcomes.

In our current evaluation, we aim to understand the scope of academic progress by analyzing students’ publications:

  1. Are they developing a collaborative network?
  2. Do they publish in interdisciplinary journals?
  3. Do their publications represent only one discipline or two disciplines (interdisciplinary in nature), based on their choice of venue?
  4. Do they have co-authors?
  5. Are they well represented in Google Scholar (or other academic profiles)?

The second question relates to mentoring and interdisciplinarity:

  1. What is the current network between students and their mentors?
  2. How well are certain disciplines represented in the program, and which disciplines are weak?
  3. How is this network changing between Year 1, Year 2, and now Year 3 of the program?

Data Visualization:

  1. Dictionaries
    • student/discipline (not CNS discipline)*
    • advisor/discipline
    • student/advisor
    • advisor/student
  2. Co-authorship network
  3. Discipline network
  4. Student-advisor network
  5. Students’ presence on academic sites (e.g., Google Scholar)
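A co-authorship network like the one listed above can be derived from publication author lists by counting how often each pair of authors appears on the same paper. The sketch below assumes each publication is available as a simple list of author names, which is an assumption about the data rather than its documented format; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(publications):
    """Count shared papers for every pair of co-authors.

    publications: iterable of author-name lists, one list per paper (assumed shape).
    Returns a Counter mapping (author_a, author_b) with author_a < author_b to a weight.
    """
    edges = Counter()
    for authors in publications:
        # Deduplicate and sort so each pair has a single canonical ordering.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges
```

The resulting weighted edge list can be loaded into most network visualization tools; single-author papers contribute no edges.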

Information On Dataset(s) To Be Used

Data descriptions:

Publication Notes

The sponsor would like to be a co-author and approve results.

Image: Plate with text, knife and fork

Livable Communities Index

Project Description (Goal/Scientific Or Practical Value)

In order to establish a proactive measuring tool for the establishment of the Community Resiliency Platform, the Institute for Regenerative Design & Innovation aspires to build a web-based, comprehensive tool and analytics system known as the Livable Communities Index (LCI) that visually displays a variety of health and wellness indicators at a community level. Work will focus on a comprehensive assessment of the foundation required to construct a Google Maps-based visualizer, with adequate attention to developing specific core components of the proposed LCI. This assessment will provide a robust understanding of the proposed LCI coding architecture and will be formalized into a comprehensive specification document, which will include a detailed budget for building a fully integrated and operational LCI.

Information On Dataset(s) To Be Used

The LCI design, as it applies to Winston-Salem, will supply the means by which we measure the overall impacts of our unique health-based strategy, thus providing a science/evidence-based road map for regional transition to a healthier, more vibrant Restoration Economy, specifically targeting at-risk, low-wealth communities. The LCI will draw on primary data and secondary sources collected by professional field staff, epidemiologists, and analysts. The data will then be entered into the LCI database and will represent environmental, economic, and social indicators of livability. Information from the data clusters will be available through the LCI format and displayed within the Google Map we generate for our target region.

  • US Census Bureau
  • NC State Center for Health Statistics
    • There's an incredible amount of data available here, but the formatting is generally inconvenient. I suspect it's machine readable if you're in a position to use a scraper: https://schs.dph.ncdhhs.gov/data/county.cfm
  • CDC Wonder
    • This allows you to query some of the CDC's databases. It has an interesting assortment of data available, and you can generally get pretty granular information (for a public health data set): https://wonder.cdc.gov/
  • Electoral Participation Data
    • This is the link to the state FTP site for voter data: https://dl.ncsbe.gov/?prefix=data/
    • This is all person-level data, and it includes demographics and addresses.
    • The voter files have the voter registration data. Because of local policies about removing people who have moved or passed away and some of the quirks we've observed in the data, we strongly suspect that more transient groups and much older adults are over-represented in the registration rates, which makes their participation rates appear artificially low if you use that as the denominator. We've been using the number of adult citizens as a proxy for the eligible voter population instead.
    • The history files show the voting history of every person currently registered to vote in that county for each election they participated in. It is organized by the county where they currently reside, but it indicates in which county the historical ballot was cast. The trick for looking at historical data for any one county is that you have to pull all of the residents who have voted in that county from all of the county files.
    • The voter IDs are unique to each voter within the county but they repeat across counties. (E.g. Voter 102 in Forsyth County is not the same person as Voter 102 in Durham County).
    • We've spent some time on the phone with the board of elections trying to figure out what some of the fields mean. We don't have notes for everything, but we have notes for a bit. It's a quirky (but really interesting) dataset. I'd be happy to answer student questions on anything that I can if they work with this data.
    • We are Forsyth County (number 34). Here is a list of county codes if that helps: https://slph.ncpublichealth.com/doc/NorthCarolinaCountyCodes.pdf
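The two quirks described above (voter IDs that repeat across counties, and historical ballots that must be gathered from every county file) can be handled roughly as in the sketch below. The field names are hypothetical stand-ins rather than the actual column names in the state files, and the turnout denominator follows the adult-citizen proxy mentioned in the notes.

```python
def ballots_cast_in(county, history_rows):
    """Collect distinct ballots cast in `county`, scanning rows from all county files.

    history_rows: iterable of dicts with hypothetical keys 'residence_county',
    'voter_id', 'voted_county', and 'election'.
    """
    ballots = set()
    for row in history_rows:
        if row["voted_county"] != county:
            continue
        # Voter IDs repeat across counties, so the residence county is part of the key.
        ballots.add((row["residence_county"], row["voter_id"], row["election"]))
    return ballots


def turnout(ballots, election, adult_citizens):
    """Share of adult citizens (proxy for eligible voters) who voted in one election."""
    voters = sum(1 for (_, _, e) in ballots if e == election)
    return voters / adult_citizens
```

Keying on the pair of county and voter ID keeps Voter 102 in Forsyth County distinct from Voter 102 in Durham County even when both rows show a ballot cast in the same place.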

Publication Notes

LCI will be a hub for several types of users:

  • Community - can upload information related to their community, including images and videos, as well as personal fitness information.
  • Developers - have access to an open-source repository such as GitHub (github.com) and can pull data, review code, and submit code improvements.
  • Researchers - have access to non-community facing data sets and graphs. Researchers can use the LCI for comparing datasets, finding correlations, submitting research for peer review, and peer reviewing submitted work.
  • Administrators - have access to all areas of the content management system and can add/edit/delete users, approve/reject research accounts, curate submitted content, and create/manage community events.

Image: Text POPMOD made of smiley faces

Visualizing the Population Modeling Map

Project Description (Goal/Scientific Or Practical Value)

The field of population modeling is composed of many researchers that model populations of different types with overlapping methods. Recently, the population modeling working group started mapping the field to help researchers locate similar work and potentially reuse work done by other researchers in parallel fields.

Possible ideas on how to visualize the dataset better could be:

  1. Showing keywords associated with population modeling, sized according to how common they are. If a keyword is clicked, researchers whose work matches that keyword will be shown alongside links to their publications. For example, the terms Monte Carlo or Agent Based will be very common.
  2. Showing the modelers as icons in space and showing connections among their work through links, colored differently for each common element they use.
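The first idea depends on turning keyword frequencies into display sizes. A minimal sketch is shown below; it assumes each paper's keywords have already been extracted into a list, and the linear scaling, point sizes, and function name are illustrative choices rather than anything specified by the working group.

```python
from collections import Counter


def keyword_sizes(papers, min_pt=10, max_pt=48):
    """Map each keyword to a font size that grows linearly with its paper count.

    papers: iterable of keyword lists, one list per paper (assumed shape).
    """
    counts = Counter()
    for keywords in papers:
        # Count each keyword once per paper, ignoring case.
        counts.update({kw.lower() for kw in keywords})
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        kw: min_pt + (n - lo) * (max_pt - min_pt) / span
        for kw, n in counts.items()
    }
```

A logarithmic scale may read better if a few keywords dominate, but the linear version is the simplest place to start.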

Information On Dataset(s) To Be Used

The dataset consists of papers published by the working group members. Paper metadata and summaries are available as a spreadsheet that is publicly accessible.

Relevant Publications, Websites, Etc

Publication Notes

No special conditions. However, it would be nice if the students would report back to the mailing list and contact the researchers who contributed work to let them know about the new map. Resulting web sites or other visualization products should link to the population modeling working group page (https://simtk.org/projects/popmodwkgrpimag).

Image: Website screenshot and heatmap overlay comparison

Visualizing the Evolution of Website Design

Project Description (Goal/Scientific Or Practical Value)

With over 25 years of history, the web itself has become a significant cultural artifact. We are studying how website design has changed over time, and how these changes reflect changes in culture, technology, aesthetics, etc. We are studying these changes through both user studies (e.g., interviewing web designers) and through automated analysis using computer vision and machine learning. We would like help in creating visualizations to explore and discover patterns of web design changes over time.

Information On Dataset(s) To Be Used

The dataset for this project consists of screenshots from several hundred websites over a period of about 15 years. The dataset is derived from the Internet Archive’s Wayback Machine using a web scraper.

Publication Notes

The project sponsor would be very happy if students get publishable results and would be happy to co-author a paper with them if they choose to do so. Students may freely mention results on resumes.

Logo: GloBI

Visualizing Research Silos in Ecological Interaction datasets

Project Description (Goal/Scientific Or Practical Value)

Open access to high-quality, integrated ecological datasets is important to better understand and preserve the ecosystems that sustain life on earth. The goal of this project is to visualize and identify research silos (or overlap) in geospatial-temporal species interaction datasets (e.g., bees pollinate plants, sea otters eat crabs, and the black plague is spread by rodents). The scientific and practical value of this is to help researchers, and their funders, identify collaboration opportunities in an effort to increase the quality of datasets while avoiding duplicate work.

Even though acquiring ecological data is resource intensive and time sensitive, anecdotal evidence suggests that research efforts in ecology are siloed. These siloed research efforts might explain the sparseness (aka the “Eltonian Shortfall”) of openly available datasets that describe how organisms interact in our global ecosystems.

Information On Dataset(s) To Be Used

GloBI mines existing datasets that describe how organisms interact. As of November 2017, GloBI included about 2M interactions across over 100,000 taxa (see https://en.wikipedia.org/wiki/Taxon). This makes GloBI one of the largest openly accessible resources of species interaction records available today.

The primary dataset, provided through GloBI, consists of open-access, integrated species interaction datasets. Secondary datasets include Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), or similar open-access projects that provide additional information on where and when individual species occurred.

GloBI data records are available as TSV (tab-separated values) files and N-Quads, and through a web API, an R package, and JavaScript libraries, among others.

Publication Notes

Students may publish the results as they please; however, the project sponsor strongly encourages open access publications. For examples of prior student publications, please see previous IVMOOC projects at:

Logo: ChaCha

ChaCha Menopause queries

Project Description (Goal/Scientific Or Practical Value)

The ChaCha menopause query data is the foundation for building intervention modules to improve people’s knowledge and problem solving skills related to menopause. For this project, students work with the data to identify topics that people want to know about related to menopause. My team will use the results to build intervention modules on each topic, and apply for NIH R21 funding to pilot test their impact, and then get NIH R01 funding to test their efficacy and/or effectiveness. I envision an intervention for menopausal women (ages 42-62, female) that would include modules around what their teenager, male partner/spouse, or female family members/friends might also want to know.

A secondary purpose of the project is to identify what the general public who are not women aged 42-62 (all other ages and genders) want to know about menopause, which will be reported back to the North American Menopause Society to improve their educational website materials.

Information On Dataset(s) To Be Used

In 2015, our laboratory became the only research team in the world with access to raw data from ChaCha for the years 2009 to 2012. ChaCha was a United States-based company that let users anonymously submit questions and receive a human-guided, real-time, anonymous, and verified answer.

Between 2009 and 2012, users submitted their questions via texts or the Web. After several queries from the same user, ChaCha asked the user to provide their age, sex, and zip code, but the information was not required. ChaCha is not a social media platform and its anonymity appears to allow users to ask unusual and potentially embarrassing or stigmatizing questions.

The ChaCha database contains 1.93 billion complete questions asked between January 2009 and November 2012 by 19.3 million users. A menopause-related subset was identified by searching for questions containing a set of keywords. We started with an initial set of keywords that was expanded based on our initial review of the data and variations in spelling of terms. The final set of keywords we used can be grouped into five broad topics: (1) menopause/menopausal status; (2) hot flash symptoms; (3) medications; (4) surgery; (5) specialty provider.
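Keyword-based subsetting of this kind can be approximated as in the sketch below, which tags each question with every topic whose keywords it contains. The keyword lists shown are placeholders invented for the example; the actual keyword set used to build the subset is not reproduced here, and the function name is hypothetical.

```python
import re

# Placeholder keywords for two of the five topics; not the study's actual list.
EXAMPLE_TOPIC_KEYWORDS = {
    "menopause/menopausal status": ["menopause", "menopausal"],
    "hot flash symptoms": ["hot flash", "hot flush"],
}


def tag_queries(queries, topic_keywords):
    """Return (query, [matching topics]) for queries that match at least one keyword."""
    patterns = {
        topic: re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
        for topic, keywords in topic_keywords.items()
    }
    tagged = []
    for query in queries:
        topics = [topic for topic, pattern in patterns.items() if pattern.search(query)]
        if topics:
            tagged.append((query, topics))
    return tagged
```

Substring matching also catches plural and inflected forms such as "hot flashes", at the cost of occasional false positives that would need manual review.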

Our searches using these keywords resulted in 263,363 menopause-related queries in total (user age 13+) and 5,892 queries from women aged 40 to 62 years. The ChaCha user query data is provided to students in an Excel workbook. Students working on this project will be invited to be “collaborators” for the IU Box folder.

Publication Notes

The project sponsor would like to be a co-author for any publications that result from student projects using the dataset provided.

Portrait: Dr. Chen X. Chen

Text-Mining of User-Generated Queries on Menstrual Pain

Project Description (Goal/Scientific Or Practical Value)

Menstrual pain is highly prevalent among women of reproductive age. By text-mining of user-generated queries, we try to understand the public’s information needs and concerns related to menstrual pain. The insights gained from this work will be used to develop interventions to support menstrual pain management. The specific goals for this project are to: (1) visualize the text data; (2) cluster and categorize questions; (3) summarize patterns and themes.

Information On Dataset(s) To Be Used

The data to be mined are user-generated questions (i.e., text data) from the ChaCha search engine (see the ChaCha Menopause queries project description for more details about the data set). The ChaCha query results were analyzed to identify all queries that include questions on menstrual pain; the results were split into two subsets. The first subset contains about 507,000 menstrual pain-related queries from females; the second contains about 114,000 menstrual pain-related queries from males. Demographic information (e.g., age, gender, geographical location) is available for some users. The ChaCha user query data is provided to students in an Excel workbook. Students working on this project will be invited to be “collaborators” for the IU Box folder.

Publication Notes

Students are welcome to add results to their resume. Prior to publishing the results, review and approval from me are required. In addition, I request to be a co-author for presentations and papers related to this project.

Logo: Biosim

BioSimmer

Project Description (Goal/Scientific Or Practical Value)

BioSim is a participatory simulation where young students (grades K-3) enact the roles of ants and biological systems through the assistance of electronically-enhanced e-puppets. It is designed to enhance youths’ understanding of complex systems through novel combinations of play, reflection, interaction and exploration.

This project uses data generated by children while playing with electronic ant puppets within a defined indoor area. The visualizations should show (1) where ants travel during one game and (2) which students and teacher answer the question “where did the ants go the most?” The visualization must help school teachers easily show and explain to young children which team of ants was searching for food more efficiently and why.

Information On Dataset(s) To Be Used

The project will use data collected at an elementary school in Spring 2017. The data is recorded in plain JSON files. Each file captures the activity from one game; the trail/position data is stored in the “allActions” attribute with each “checkTrack” action. For a more detailed explanation, please contact the client directly. Data for this project is available through GitHub:
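To make the file layout described in this section more concrete, here is a minimal sketch of how the trail data might be pulled out of one game and binned for a heat map. Only “allActions” and “checkTrack” come from the description above; the "action", "x" and "y" field names, the "foundFood" entry and the sample values are assumptions made for illustration, so check the real files before relying on them.

```python
import json
from collections import Counter

def track_positions(game):
    """Collect (x, y) positions from every "checkTrack" entry in "allActions".

    ASSUMPTION: each action is a dict with an "action" name and "x"/"y" fields.
    Only "allActions" and "checkTrack" are named in the dataset description.
    """
    return [
        (a["x"], a["y"])
        for a in game.get("allActions", [])
        if a.get("action") == "checkTrack"
    ]

def visit_counts(positions, cell=1.0):
    """Count visits per square grid cell, a first step toward a heat map."""
    return Counter((int(x // cell), int(y // cell)) for x, y in positions)

# Invented sample standing in for one game file.
sample = json.loads("""
{"allActions": [
  {"action": "checkTrack", "x": 0.4, "y": 1.2},
  {"action": "checkTrack", "x": 0.6, "y": 1.9},
  {"action": "foundFood",  "x": 3.0, "y": 3.0},
  {"action": "checkTrack", "x": 2.5, "y": 0.1}
]}
""")

positions = track_positions(sample)
counts = visit_counts(positions)
print(counts.most_common(1))  # the most visited cell and its visit count
```

The cell with the highest count is one simple answer to “where did the ants go the most?”.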

Publication Notes

The project sponsor would like to be a co-author for any publications that result from student projects using the dataset provided.

Logo: BLM

Visualizing Government Meetings

Project Description (Goal/Scientific Or Practical Value)

Currently the City of Bloomington has 33 boards and commissions, in addition to the ongoing meetings of the Bloomington City Council. At present, the only way for a person to understand what happens from meeting to meeting is to navigate our city website and read minutes.

We would like to work with students from the IVMOOC to make it easier to understand Bloomington City Council meetings, including meeting participants, dates, topics, and the policy and legislative outcomes, through information visualizations. This will have enormous value for the public by increasing government transparency and hopefully making it easier for folks to follow along with what is going on at city hall and in the Bloomington community.

Information On Dataset(s) To Be Used

The data for this project is publicly available through the City of Bloomington website. Each meeting record includes the type of meeting, date and a variety of documents related to these meetings. These documents may include: meeting agenda, minutes of speakers at a meeting, legislation packets, and non-public meeting memos. The documents are unstructured data provided as PDFs.

Relevant Publications, Websites, Etc

Publication Notes

The information is all available in the public domain, so students are free to publish or present the outcomes of the project however they would like. Ideally, we would love to implement their work on the city website or via a public website to make the work accessible to the public.

Logo: BookTrust

Visualizing Student Book Choice

Project Description (Goal/Scientific Or Practical Value)

Book Trust is a non-profit organization that serves 53,000 students across 21 US states by providing a stipend each month for the students to choose their own books out of the Scholastic catalogue. We currently receive monthly data from Scholastic that shows us the books kids are choosing, but we don’t have a way to visualize that data and act upon it.

We are interested in developing visualization products and resources to help teachers understand what books their students select and potentially make recommendations. The visualizations should be developed with Tableau dashboard software.

Information On Dataset(s) To Be Used

We receive a txt file drop from Scholastic that is uploaded into our custom web-based application. We then upload data into Tableau for easy consumption by staff. Data will be accessed through our custom web-application that requires login for access, due to sensitive student information. Once the project team is approved, we will give students a login to access the dataset.

Relevant Publications, Websites, Etc

Publication Notes

The project sponsor requires approval of the results, and depending on the evolution of the project and our staff’s involvement, potentially a co-authorship on any publications.

Logo: Maker Ed

Visualizing Student Makerspace Portfolios

Project Description (Goal/Scientific Or Practical Value)

The Open Portfolio Project (OPP) aims to develop a common framework for documenting, sharing, and assessing learning through portfolios. Open portfolios are openly networked, decentralized, and distributed systems of documentation, curation, and reflection, which can showcase a learner’s abilities, interests, and voice in a way that test scores and grades cannot.

Portfolios provide opportunities for students and others to recognize the skills and ideas they have to offer and contribute, especially for students who may not excel in academics or high-stakes testing. Inherent to the creation of a portfolio is the process of reflecting on one’s work, curating what is most appropriate for an intended audience, and designing an artifact to articulate that evolution of learning and making.

Information On Dataset(s) To Be Used

This project uses survey data collected from youth-serving makerspaces across the United States, between 2014 and 2017. The data covers demographic information about youth and staff members, programs offered and how they relate to school subjects, as well as assessment approaches in makerspaces with a particular focus on open portfolio assessment.

Students working on this project might consider visualizations that show:

  • Participation in the Open Portfolio Project over geographic space, using youth and staff demographics and the annual diversity index, and comparing school and out-of-school makerspaces annually;
  • The topics and subjects offered in school and out-of-school makerspaces, over time;
  • Interactive visualizations that let users explore assessment items in school and out-of-school makerspaces, by item type and overall performance;
  • Portfolio assessment visualizations that show the impact of portfolio assessments and the reasons for student portfolios, highlighting the challenges of implementation and published/not-published portfolios, and comparing in-school and out-of-school makerspaces.

Publication Notes

The project sponsor requests to approve results and share co-authorship prior to publication, and would like to use the visualizations in its reports.

Logo: Maker Ed

Re-Crafting Mathematics Education

Project Description (Goal/Scientific Or Practical Value)

This collaborative project, titled “Re-Crafting Mathematics Education: Designing Tangible Manipulatives Rooted in Traditional Female Crafts,” is based on our prior work studying design, mathematics, and traditional female crafts, particularly when integrated with electronics in what we call “e-textiles.” As part of this new project, we plan to extend this earlier work to better understand how traditional female crafting practices can make far-reaching improvements in a range of learning outcomes in science, technology, engineering, and mathematics (STEM) education. These types of investigations will help reveal key issues underlying the underrepresentation of women and girls in lifelong STEM learning.

We are conducting a series of ethnographies of female crafting circles to better theorize the connections between mathematics and traditional women’s crafts in particular. In fact, we are coming to understand craft as a lived mathematical practice: craft and mathematics are closely intertwined. Following our initial ethnographic field work, we’re planning to develop and test a set of new hands-on classroom manipulatives for schools and after-school programs (targeted at youth in grades 5-9).

Information On Dataset(s) To Be Used

We generated a list of codes and coded interviews with crafters and their craft projects in relation to inherent mathematics concepts used across a range of crafts. We would like to see visualizations of the way in which math concepts are represented across crafts.

A copy of the data will be provided to students as a spreadsheet after the student group forms and contacts the project sponsor.
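One way to prepare coded data for a view of math concepts across crafts is a simple cross-tabulation of craft by concept. The sketch below assumes the spreadsheet can be reduced to (craft, concept) pairs; the craft names and concept codes shown are invented placeholders, not the sponsor's actual codes.

```python
from collections import Counter, defaultdict

def concept_by_craft(coded_rows):
    """Build a nested count table: craft -> math concept -> number of codes."""
    table = defaultdict(Counter)
    for craft, concept in coded_rows:
        table[craft][concept] += 1
    return table

# Invented rows; the real crafts and codes come from the sponsor's spreadsheet.
rows = [
    ("knitting", "counting"),
    ("knitting", "symmetry"),
    ("knitting", "counting"),
    ("quilting", "symmetry"),
    ("quilting", "tessellation"),
]

table = concept_by_craft(rows)
concepts = sorted({concept for _, concept in rows})

# Print one row of counts per craft, with columns in the order of `concepts`.
for craft in sorted(table):
    print(craft, [table[craft][c] for c in concepts])
```

A table like this maps directly onto a heat map or a bipartite craft-to-concept network.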

Publication Notes

The project sponsor would like to discuss any co-authorship or publications at the beginning of the project.

Logo: An Opioid-focused Surgery

Personalized Post-Surgery Opioid Risk Mitigation and Teaching

Project Description (Goal/Scientific Or Practical Value)

Nearly 60% of surgical patients receive opioids when they leave the hospital. Of those prescribed opioids, 10% become chronic users and 1% overdose. Given that 50 million surgeries occur every year, the exposure to opioids is dramatic. Overprescribing and patients’ lack of knowledge about proper use and risk are, at least in part, implicated. When these patients leave the hospital with an opioid prescription, there is no well-designed worksheet to tell them how to take their medication and what risks are involved.

The goal of this project is to create a personalized, auto-generated, easily understood opioid data sheet that presents best practice as well as use and risk data by procedure. We have created a database of typical opioid requirements for patients for numerous surgical procedures. We also have a database of the misuse risks associated with refills by surgical procedure. We also have a datasheet with several of the best practices organized as a list as well as examples of previous datasheets that are clearly inadequate. Now, we want you to help us build a worksheet surgeons around the country can give their patients to inform them and protect them from opioid misuse.

Information On Dataset(s) To Be Used

Data for this project includes:


  1. Database of typical opioid needs by patient type and surgery.
  2. Database of risk of misuse by patient type, number of refills, and surgery.
  3. Datasheet of best practices.

The project sponsor will share this data with the team at the beginning of the project.

Publication Notes

The author would like to approve any final publication and be included as a co-author of the work.

Logo: OwlSky Cloud platform

Visualization of Time-Resolved Electronic Crystal Structure Relaxations

Project Description (Goal/Scientific Or Practical Value)

Dynamics on an electronic level play a crucial role in most mass transport mechanisms within crystals, such as ionic conduction in energy materials. Large datasets of time-resolved electron densities can be calculated for these kinds of processes by means of modern quantum-chemical modeling techniques, like density functional theory. However, an understanding of the particular interactions within the transport process (i.e. the time-resolved breaking and formation of chemical bonds) remains tedious. A detailed visualization of ionic and electronic relaxations and electron density redistribution would greatly enhance today’s capabilities of data analysis and interpretation.

Information On Dataset(s) To Be Used

The data sets for this project comprise 10 time steps, each reflected by a scalar electron density map on a 72x72x120 data point grid for the 3D-periodic supercell and a respective list of atomic positional coordinates. Students may use the software tool Cinector GmbH, in addition to open codes such as the Python Mayavi package.
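As a starting point for showing electron density redistribution, the sketch below stacks the ten density maps into a single NumPy array and computes the change between consecutive time steps. The grid size and number of steps come from the description above; the array layout (time step first) and the placeholder values are assumptions, since the actual file format is not specified here.

```python
import numpy as np

N_STEPS, GRID = 10, (72, 72, 120)  # sizes taken from the dataset description

def density_changes(rho):
    """Return rho[t+1] - rho[t] for every consecutive pair of time steps."""
    rho = np.asarray(rho)
    if rho.shape != (N_STEPS, *GRID):
        raise ValueError(f"expected shape {(N_STEPS, *GRID)}, got {rho.shape}")
    return np.diff(rho, axis=0)

# Placeholder data only: zeros with one grid point whose density grows by 1 per step.
rho = np.zeros((N_STEPS, *GRID), dtype=np.float32)
rho[:, 10, 20, 30] = np.arange(N_STEPS, dtype=np.float32)

delta = density_changes(rho)
print(delta.shape)  # one difference map per pair of consecutive steps
```

Each slice `delta[t]` is a scalar field on the same grid and could be passed to a volume or isosurface renderer to highlight where density is gained or lost.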

Publication Notes

The project sponsor would like to approve the results and be a co-author on any publication; however, students may use the results of the project in their resume and portfolios.

Image: A cartoon person pedaling a stationary bike

30 sec of Power Pedaling - How much energy can be produced at what age?

Project Description (Goal/Scientific Or Practical Value)

Visitors to the Experimenta Science Center in Heilbronn, Germany, were asked to take part in an experiment by pedaling a stationary bike for 30 seconds. The data was collected to give people feedback on their performance in comparison to others of similar age and the same sex. Data from nearly 200,000 users was collected and is now available to analyze. Students are invited to analyze and visualize physical performance compared to demographic features of participants such as age and sex. Students are welcome to develop and pursue their own ideas of what to do with the data.

Information On Dataset(s) To Be Used

The dataset consists of about 200,000 rows, each representing a visitor, and seven columns with information about them, including date and time, age, sex, peak power, and energy “produced”.
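A natural first summary of such a table is the average peak power per age band and sex. The sketch below shows one way to compute it with the Python standard library; the column names (age, sex, peak_power), the ten-year band width and the sample values are assumptions for illustration and will need to be adapted to the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

def mean_peak_power(rows, band=10):
    """Average peak power per (age band start, sex) group."""
    groups = defaultdict(list)
    for row in rows:
        age_band = int(row["age"]) // band * band  # e.g. 34 -> 30
        groups[(age_band, row["sex"])].append(float(row["peak_power"]))
    return {key: mean(values) for key, values in groups.items()}

# Invented rows with hypothetical column names and values.
sample_csv = """age,sex,peak_power
12,f,300
17,f,400
34,m,650
38,m,750
"""

result = mean_peak_power(csv.DictReader(io.StringIO(sample_csv)))
print(result)
```

The resulting groups can be plotted as a line or bar chart of performance by age, with one series per sex.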

Relevant Publications, Websites, Etc

Publication Notes

The project sponsor notes that students can publish approved results and add the project results to their resumes. Attribution of the data sources and associated contributors is highly encouraged.

Logo: IN Management Performance Hub

Exploring Patent Data in the State of Indiana

Project Description (Goal/Scientific Or Practical Value)

The State of Indiana does not have consumable information around patents. The Indiana Economic Development Corporation (IEDC) is interested in visualizing patent data for the State of Indiana. The following items in the patent data are of particular interest to IEDC:

  • Patent industry alignment with IEDC target industries – Agriculture, Aerospace / Defense, Smart Transportation, Life Sciences, Energy / Materials, Advanced Manufacturing, Tech / Cyber Security / Internet of Things
  • Patents that are available to be acted on by entrepreneurs.

The IEDC would like to have a visualization that allows for dynamic interactions and slicing / dicing of data related to patents. These visualizations may be presented to various stakeholders (IEDC, venture capital groups, and universities) to support IEDC policy goals and initiatives, such as linking surplus or stagnant patents with entrepreneurs and funding to spur innovation and entrepreneurship in the State of Indiana.

Information On Dataset(s) To Be Used

Data for this project comes from the US Bureau of Labor Statistics reports on Indiana. The reports include data on: labor force, workplace injuries and illnesses, employer benefits and pay, consumer price indices, consumer spending, energy prices. Data may be accessed through reports, tabular data downloads, or through the US Bureau of Labor Statistics’ open API for economic datasets.

Publication Notes

The project sponsor would like to approve any student work and results prior to publication.