How often have you queried your data and come back with garbage? You know the question you are trying to answer: how many bills of sale went through a particular silo, why that vendor appears on one list but not another, or how many records are duplicates. These investigations seem easy to answer; you plan and refine your query, and then the dataset that comes back isn’t remotely what you thought it would be. So you have to go back and do the whole process again.
A common problem with data today is that the question you think you are asking is not how the data was entered, so you are asking an impossibility. If the system was never mapped or built to produce the answer you are looking for, it will fail or bring back what can be taken as misinformation. In our current era of data democratization, where the definition of data science is still being formed, technology and audit teams become disseminators of information that is not traditionally in their wheelhouse. Every business wants strategic planning information to be digestible and visual. We all jump through hoops to make that happen, but in most cases you are dragging information in and out of legacy systems and losing pieces along the way.
Do you have your end result in mind? Are your query and focus strict enough to bring back that answer?
Here are some tips on how to structure your ideas into smarter queries that deliver the answers you are looking for:
What is your question? Make it as simple as possible, and ask for a limited number of items per data pull. That way the result is easy to read, and you can always go back, pull more, and then join or merge the datasets.
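The "pull small, then join" idea can be sketched in a few lines of pandas. This is only an illustration under assumed data: the `vendors` and `invoices` tables and their column names are hypothetical stand-ins for your own narrow pulls.

```python
import pandas as pd

# Two narrow, easy-to-read pulls instead of one sprawling query
# (hypothetical vendor/invoice fields; substitute your own).
vendors = pd.DataFrame({
    "vendor_id": [101, 102, 103],
    "vendor_name": ["Acme", "Globex", "Initech"],
})
invoices = pd.DataFrame({
    "invoice_id": [1, 2, 3, 4],
    "vendor_id": [101, 101, 103, 104],
    "amount": [250.0, 75.5, 1200.0, 40.0],
})

# Merge the small pulls back together afterward; how="left" keeps every
# invoice, even one whose vendor is missing from the vendor pull.
combined = invoices.merge(vendors, on="vendor_id", how="left")
```

A missing vendor shows up as a blank `vendor_name` in the merged result, which is itself a useful clue about how the data was entered.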
What does your company track? Look at your internal controls and KPIs. Look at the data as information, focusing on the field names. Knowing your data is half the battle – and it is nearly won, simply because of your working knowledge of the business.
How do I get the data? This may be a request from your IT department, or maybe you can get the data yourself. Always be aware of the needed output. Is this a report? Is this a visualization? Is this just a piece of a larger puzzle?
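If you can get the data yourself, keeping the pull narrow helps. A minimal sketch using Python's built-in sqlite3 module, with a hypothetical in-memory `bills` table standing in for whatever system you can query directly:

```python
import sqlite3

# Hypothetical in-memory table standing in for a system you can query yourself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (id INTEGER, silo TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?)",
    [(1, "east"), (2, "east"), (3, "west")],
)

# Ask only for the fields and filter you need, with the end output in mind.
count = conn.execute(
    "SELECT COUNT(*) FROM bills WHERE silo = ?", ("east",)
).fetchone()[0]
```

Whether this feeds a report, a visualization, or a larger puzzle, a query scoped this tightly is easy to hand off or rerun.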
Is your data good? Data quality assurance will become more and more important as you compile more and more data, in some cases massive amounts daily. Noise within the data can be a huge problem when fishing out data intelligence. Combining software and services can help with this, and a project history can be essential, especially if the software automatically tracks the steps you’ve taken.
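Two of the simplest quality checks, duplicates and missing values, can be sketched with pandas. The `records` table here is hypothetical; the point is only to show the shape of the check:

```python
import pandas as pd

# Hypothetical pull with one exact duplicate row and one missing amount.
records = pd.DataFrame({
    "invoice_id": [1, 2, 2, 3],
    "amount": [250.0, 75.5, 75.5, None],
})

# keep=False flags every row involved in a duplicate, not just the repeats.
dupes = records[records.duplicated(keep=False)]

# Count blanks in a field before they quietly skew your totals.
missing = records["amount"].isna().sum()
```

Running checks like these on every pull, and logging the results, is a lightweight start on the project history described above.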
Pick your type of analysis. Are you looking for regression, cohort, predictive or prescriptive analysis? What kind of numbers are you looking for, and to what end?
Know your audience. What is the needed output? Does it need to be shareable, editable or locked?
Keep an open mind. Many times a dataset you think is garbage holds secret gold nuggets. Use these missteps as fodder for new and unique areas to investigate. Your imagination can go wild in areas such as cost controls. Not all queries are bad – some just give strange answers.
Data queries can be fun and even addictive in the right environments. Getting ideas from people with different skill sets is another good way to find new uses for your data and a competitive advantage. Data Oceans, Data Warehousing, Sourced Data, Free Data, Paid Outsourced Data, Lions, Tigers and Bears, Oh My! There is a lot of fun to be had in data. Don’t let the hoops you may have to jump through block your imagination about what you can find and do. Happy hunting!