4 Best Practice Tips for Working with Survey Data

Data Culture
10 min read · Feb 25, 2022


By Amina Brown, Data Visualization Engineer @ Data Culture

Photo by Christin Hume on Unsplash

Asking a data professional about their thoughts on survey analysis often gets you an answer with a healthy dose of skepticism. Surveys can be an incredibly powerful tool: they gauge responses and gather information that isn't readily available in other datasets. But some common mistakes can lead to wasted time and opportunity. Here are our 4 best practice tips in short:

  1. Assess whether a survey is the best tool for the job.
  2. Use a survey tool to save time and avoid preventable errors.
  3. Handle open-field responses with care, or pay the price later!
  4. Consider the final visualizations you want to create, and use them to inform the phrasing and response types of the survey questions.

  1. The Format
  2. The Analysis
  3. The Report

How survey formatting can make or break the analysis

Before you create a survey, make sure it is the best way to get the answers you are looking for. Some quick questions to consider:

  • Can you get the answers you are looking for from data that already exists?
  • Do you have a large enough group to survey to yield a meaningful sample size for aggregation?
  • Do you have enough time and scope to create, manage, and then analyze a survey?

Once those questions are settled, the best thing you can do to improve your survey formatting is to use a survey tool or service. While there are valid reasons for wanting to build your own surveys from scratch, the time it takes to iron out all the issues rarely fits the timeline you have in mind for rolling out the survey. Using an existing tool will both save you time and keep you from making the most common mistakes. That being said, these tools will not solve all of your problems.

One of the often overlooked pieces of survey creation is being intentional with the wording of both the questions and the provided answers. Each respondent is likely to interpret these in their own way, so being as clear and concise as possible makes for stronger insights. For instance, if you were to ask “How many days a week do you take a personal vehicle to work?”, some will read it as taking a car, while others might include a bike or scooter under the definition of “vehicle”. Switching “vehicle” to “car” or “motor vehicle” narrows the definition and improves your confidence that respondents answered the question in the intended way. The same logic applies to the provided answers, whether it is a single- or multiple-selection question. These may seem like negligible changes, but it is these small details that bring more legitimacy to your analysis.

Even with a survey tool and intentional wording, many people still fall prey to the open text field. There are two places where you will have the option to add one. The first is an “other” option with a spot to elaborate on a multiple choice question. This can be useful, but is prone to issues when it comes to aggregating. The good news is that most respondents, given a well designed question, will use the provided answers, so any manipulation of the data that needs to be done post survey will only affect a small group. You can also group all of those responses together under the “other” option and not worry any further about that particular open text field. The second type of open text field, and by far the most problematic (maybe even across the whole survey pipeline), is the one provided as the only answer option for a question. In theory, this type of question can help respondents share their thoughts, but in practice it causes a myriad of issues during data cleaning and analysis.

As mentioned above, a simple aggregation of open text fields is next to impossible. Even a simple question like “How do you get to work on most days?”, which is likely to return a lot of similar answers, runs into the fact that, as anyone working with code can tell you, similar and same are very different things. For instance, “Car” and “car” can only be aggregated after making allowances for uppercase letters, which is doable, but making similar allowances for misspellings or variations (“Cars”, “Automobile”, “Auto Mobile”, etc.) will eat up valuable time that could have been saved by adjusting the question format. And really, that’s just the beginning of what can go wrong. Some people will give a description of their entire commute, others will tell you how they wish they got to work, and you’re even likely to get people complaining about their commute with the added bonus of some colorful language.
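To make that concrete, here is a minimal Python sketch of what normalizing those open-text answers can look like. The responses and the VARIANTS mapping are hypothetical, assuming you are working in pandas:

```python
import pandas as pd

# Hypothetical open-text answers to "How do you get to work on most days?"
responses = pd.Series([
    "Car", "car ", "Cars", "Automobile", "Auto Mobile",
    "I sit in traffic for 45 minutes and hate every second",
])

# Every variant you want counted as "car" has to be mapped by hand.
VARIANTS = {
    "car": "car",
    "cars": "car",
    "automobile": "car",
    "auto mobile": "car",
}

# Trim whitespace, lowercase, map known variants; anything
# unrecognized falls into an "other" bucket for manual review.
cleaned = (
    responses.str.strip()
    .str.lower()
    .map(VARIANTS)
    .fillna("other")
)

print(cleaned.value_counts())
# car      5
# other    1
```

A closed question with “Car” as a selectable option makes this entire mapping exercise unnecessary, which is exactly the point.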

All of these scenarios make the open text field an unpredictable element in your analysis, so even if you are just sharing the answers in an unaggregated format, you’ll likely still want someone filtering them, which is not always feasible. The last warning I will give about open text fields is that even if you point these issues out to the appropriate stakeholders, there will likely still be a push to use them. A stakeholder could easily suggest including them in the survey but not the analysis. While not a bad solution, this still leaves the door open for someone to ask for analysis or aggregation down the line, so be prepared to set the appropriate expectations and boundaries if you choose to head down that path.

Creating both structure and flexibility within your analysis

Once the survey closes, it’s time for the hard work to begin. How you choose to ingest the survey results depends on the situation. In some cases you might write the analysis directly off of the output table, enabled by the csv download option available in most survey tools. Other times, you might load the data into an existing database. The latter scenario is a good way to keep track of the output data and connect the survey responses to other data you have on your clients (company or member attributes being a common example), but it isn’t always an option.

Either way, how you name the columns and clean the dataset can define the rest of the analysis. Choosing the right name for each column affects how easily you can scale the analysis for later iterations. While running the survey again might not currently be in the scope of your project, it generally pays to be prepared for that scenario, as it can save you substantial amounts of time.

The same caution applies to how you clean the data. In a csv situation, it can be tempting to use a tool like Excel or Google Sheets to make edits directly to the table. While that may be easy the first time, if you need to pull the raw output again or run the survey another time, you will have to remember all the formatting and cleaning steps that were previously executed and then work through them again. Instead, writing a script in R or Python lets you work through all those issues once and replicate the process as many times as needed (and it provides you with a form of documentation for the cleaning phase).
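As a rough illustration, here is a minimal Python sketch of that kind of repeatable cleaning script. The column names and the file path are hypothetical stand-ins for whatever your survey tool actually exports:

```python
import pandas as pd

# Hypothetical mapping from the tool's exported headers to
# short, stable column names the rest of the analysis can rely on.
RAW_TO_CLEAN = {
    "Q1. How many days a week do you drive a car to work?": "commute_days",
    "Q2. How satisfied are you with your commute? (1-5)": "satisfaction",
    "Q3. What is your company size?": "company_size",
}

def clean_survey(path: str) -> pd.DataFrame:
    """Load a raw export and apply every cleaning step in one place,
    so the same pipeline can be rerun on a fresh export or a later
    wave of the survey."""
    df = pd.read_csv(path)
    df = df.rename(columns=RAW_TO_CLEAN)
    # Keep only the renamed columns, in a predictable order.
    df = df[list(RAW_TO_CLEAN.values())]
    # Coerce numeric answers; anything unparseable becomes NaN
    # instead of silently corrupting later aggregates.
    for col in ["commute_days", "satisfaction"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df

df = clean_survey("survey_export.csv")  # rerunnable on any new export
```

When the survey runs again, the only thing that should need updating is the RAW_TO_CLEAN mapping, which doubles as documentation of what was renamed.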

Another element to keep in mind as you work through your analysis is how you plan to present the findings. Specifically, this applies to the flexibility of the output values. For instance, the task might call for grouping the data by various attributes (company size, gender, race, etc.), so any output metrics should be calculated in a way that allows for efficient recalculation by those groups. It’s also likely that after the initial scoping of the project, requests will come in to split the data differently than planned. Building in the infrastructure for the planned slicing and dicing also sets the framework for unexpected requests. The key is to use variables for the grouping attribute, which minimizes the changes needed to add a new category to your analysis. In some cases, you’ll be able to use a loop, so adding new groups is as simple as appending them to a list. Whatever the situation, the most important part is to have a plan for expanding the analysis without redoing your work. And this applies across the board: working with survey data can be a surprisingly intricate process, so it’s best not to underestimate the challenges and requests that may come down the pipeline.
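Concretely, that variable-driven grouping might look something like the sketch below. It assumes the cleaned DataFrame from the earlier sketch, extended with hypothetical respondent attribute columns for each planned breakdown:

```python
import pandas as pd

# Adding a new breakdown later only means appending to this list.
GROUPING_ATTRIBUTES = ["company_size", "gender", "race"]

def metric_by(df: pd.DataFrame, attribute: str) -> pd.DataFrame:
    """Recompute the same headline metric for any grouping attribute.
    Assumes `df` carries a numeric satisfaction column plus the
    attribute columns listed above."""
    return (
        df.groupby(attribute)["satisfaction"]
        .agg(["mean", "count"])
        .reset_index()
    )

# One loop produces every requested breakdown; an unexpected request
# for, say, "region" becomes a one-line change to the list above.
breakdowns = {attr: metric_by(df, attr) for attr in GROUPING_ATTRIBUTES}
```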

Crafting the best representation of your findings

The final report and its visualizations are arguably the most important part of the entire survey analysis, since they are the face of the project. This means that the choices you make about how you represent the data will have a meaningful impact on how well the findings are received. As with any design project, each detail matters to the overall package. One pertinent example is the wording used throughout the report, a large piece of which is accurately representing the contents of the survey. After seeing the results, it can be tempting to reword the description of a question to better fit the talking point of a specific datapoint, but that doesn’t necessarily mean it’s the right call. Given that the questions were chosen for a reason, changing the wording in the report can actually be quite misleading, as the smallest edits can open up a whole new spectrum of interpretations.

Going back to the commuting example, compare the original question “How many days a week do you take a personal vehicle to work?” to the more specific “How many days a week do you drive a car to work?”: the takeaway is transformed. While the audience still sees the results as a version of the number of days, the first question lets them consider all the ways respondents may have interpreted it, while rewording it after the fact assumes the interpretation without providing the necessary context. A common reason for these edits is simply that the question wasn’t well written when it went into the survey, and someone has decided to improve the question itself to improve the appearance of the report. While the intentions are good, the outcome isn’t as beneficial as expected. That’s part of why being intentional about your choices throughout the whole process is so important.

A big element of creating your report is working through the responses and digesting what they actually mean. You might have an idea of what the audience thinks about the topic, but there’s always a chance that the responses do not support the narrative of the report. This becomes particularly apparent when the results show negative reviews of what the stakeholders are trying to accomplish, such as a low satisfaction score or a finding that the audience doesn’t use a certain product. You might also find that while the survey as a whole has a satisfactory number of responses, a question was skipped by enough respondents that its sample size becomes an issue. Here you reach a crossroads where you and/or the stakeholders will likely discuss whether to exclude the result from the report, and the solution may not be clear cut. Removing the question altogether could come across as ignoring unfavorable feedback, though depending on the context, the results may have been skewed by an unanticipated factor. You could choose to remove it but still mention its existence, or provide some additional context to account for its absence. The main thing to avoid is appearing to pick and choose which results to share simply to benefit the interested parties. Ultimately, the key is to be aware of the implications of removing “bad data” and, if you choose to do so, to do it with intention.

The last element to discuss is the visualizations themselves. While they’re only being mentioned now, they certainly shouldn’t be an afterthought: they serve as a translation of the findings, and the choices you make affect both the first impression readers get and their overall understanding of the content. Choosing the right tools and visualization types will help communicate the results effectively. Your choice of tool will largely be determined by what you have access to, but its available features will matter down the road. Most importantly, plan ahead for how you will share the report. Certain tools let you schedule pdf exports or allow users to create their own pdfs, and picking the wrong one could mean lots of manual work if you intend to share custom versions with multiple clients or allow for different filters. The other choice is what type of graphs or charts to use, which comes down to the graph literacy of your audience. While a box and whisker plot may be more unique and visually interesting, the margin for misunderstanding is a lot larger than that of a bar chart. The type will also be dictated by what’s available in your tool or, if you’re building a custom report, by how much time you have dedicated to building out the visualizations. Essentially, you are looking to strike a balance between form and function, where the appearance of the visualization shouldn’t detract from its readability.
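If you do end up building a custom report, even a plain bar chart takes only a few lines. Here is a minimal matplotlib sketch using made-up counts for the commuting question:

```python
import matplotlib.pyplot as plt

# Hypothetical aggregated counts for a single-choice question.
answers = ["Car", "Public transit", "Bike", "Walk", "Other"]
counts = [412, 236, 87, 54, 31]

fig, ax = plt.subplots(figsize=(7, 4))
# Horizontal bars keep long answer labels readable.
ax.barh(answers[::-1], counts[::-1])
ax.set_xlabel("Number of respondents")
ax.set_title("How do you get to work on most days?")
fig.tight_layout()
fig.savefig("commute_chart.png", dpi=150)
```

A bar chart like this trades visual novelty for a near-zero margin of misunderstanding, which is usually the right trade for a broad audience.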

To Sum It Up

Working with survey data is rife with difficulties and potential roadblocks, but the unique insights that the results can provide are worth the effort. Planning ahead and being prepared for those roadblocks will make the whole process more enjoyable and can even increase the impact of your final report. The fact is that survey data and analysis aren’t going anywhere, so being intentional about how you handle them is the best way to alleviate the associated headaches. Making the right decisions about how to design and run the survey starts you off with a strong foundation. That sets you up to build a versatile data cleaning and analysis pipeline that allows for scaling and replication. Together, these factors elevate both the content and the visuals of your final report, meaning you can create more impact simply by making the right decisions throughout the process.
