Under the blue sky
Frustrations of a data scientist
Is your analysis used only to confirm what management already believes?
Data scientists often complain that management judges their output by whether it confirms what management already believes. If the analysis confirms what the business leaders believe, the data scientists get kudos. Otherwise, they get brickbats.
A similar thing is being highlighted here.
US News collects data on different factors about each university. These are then combined to produce a score that is used to rank universities.
The complaint highlighted here is that US News tries out many different sets of weights, looks at the resulting rankings, and only then decides which weights to use.
Is this wrong?
It does sound wrong. But what if you fixed the weights in advance, applied them to the data, and, say, Yale and Harvard dropped to 32nd and 27th? Could you still sell that ranking as a business?
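To make the mechanics concrete, here is a minimal Python sketch of a weighted scoring model. The universities, factor names, weights, and values are all made up for illustration (they are not US News data), and each factor is assumed to be pre-normalized to a 0-100 scale. Running the same data through two different weight sets shows how much the choice of weights alone can reorder a list.

```python
from __future__ import annotations

# Minimal weighted-scoring sketch. All universities, factors, weights, and
# values are made up for illustration; they are not US News data.

# Each factor is assumed to be pre-normalized to a 0-100 scale.
UNIVERSITIES = {
    "Univ A": {"reputation": 95, "graduation_rate": 90, "resources": 70},
    "Univ B": {"reputation": 80, "graduation_rate": 94, "resources": 88},
    "Univ C": {"reputation": 70, "graduation_rate": 85, "resources": 96},
}

# Two hypothetical weight sets a ranking publisher might try; each sums to 1.
REPUTATION_HEAVY = {"reputation": 0.6, "graduation_rate": 0.2, "resources": 0.2}
RESOURCES_HEAVY = {"reputation": 0.2, "graduation_rate": 0.2, "resources": 0.6}


def score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a university's factor values."""
    return sum(weights[name] * value for name, value in factors.items())


def rank(universities: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """University names ordered from highest to lowest score."""
    return sorted(universities, key=lambda u: score(universities[u], weights), reverse=True)


print(rank(UNIVERSITIES, REPUTATION_HEAVY))  # ['Univ A', 'Univ B', 'Univ C']
print(rank(UNIVERSITIES, RESOURCES_HEAVY))   # ['Univ C', 'Univ B', 'Univ A']
```

With the reputation-heavy weights Univ A comes first; with the resources-heavy weights the order flips completely, even though not a single data point changed.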
This is not a trivial problem even when the stakes are lower.
My wife and I recently built a scoring model when we were picking a school for our kid. We made a list of factors and tried to visit and learn enough about each school.
We tweaked the weights a lot before settling on a school. How important is the sports program? How much worse is a 45-minute commute than a 15-minute one?
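One way to make this kind of weight-tweaking more deliberate is a sensitivity check: sweep one weight and see where the top choice flips. The sketch below is a hypothetical illustration: two invented schools, a commute time converted to a score with an assumed linear penalty (2 points per minute, so 15 minutes scores 70 and 45 minutes scores 10), and whatever weight is not given to sports split evenly between academics and commute. All names, numbers, and the penalty function are illustrative assumptions.

```python
from __future__ import annotations

# Sensitivity-check sketch. The schools, numbers, and commute penalty are
# illustrative assumptions only.
SCHOOLS = {
    "School X": {"academics": 85, "sports": 40, "commute_min": 15},
    "School Y": {"academics": 80, "sports": 90, "commute_min": 45},
}


def commute_score(minutes: float) -> float:
    """Assumed linear penalty: 100 at 0 minutes, minus 2 points per minute, floored at 0."""
    return max(0.0, 100.0 - 2.0 * minutes)


def school_score(school: dict[str, float], sports_weight: float) -> float:
    """Weighted sum; weight not given to sports is split evenly between academics and commute."""
    rest = (1.0 - sports_weight) / 2.0
    return (
        rest * school["academics"]
        + sports_weight * school["sports"]
        + rest * commute_score(school["commute_min"])
    )


def best_school(sports_weight: float) -> str:
    """Name of the top-scoring school for a given sports weight."""
    return max(SCHOOLS, key=lambda name: school_score(SCHOOLS[name], sports_weight))


def flip_point(steps: int = 100) -> float | None:
    """Smallest sports weight on a grid of `steps` points where the winner
    differs from the winner at sports weight 0."""
    baseline = best_school(0.0)
    for i in range(1, steps + 1):
        weight = i / steps
        if best_school(weight) != baseline:
            return weight
    return None  # the same school wins across the whole range


print(commute_score(15), commute_score(45))  # 70.0 10.0
print(best_school(0.0), "->", best_school(1.0), "flips at sports weight", flip_point())
```

With these made-up numbers, the choice flips from School X to School Y once the sports weight reaches 0.4. If the weight you actually believe in is far from the flip point, the decision is robust; if it sits right next to it, the model is not really deciding anything.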
At the end of this process, we did choose a school, and we are reasonably happy with it. The data we collected turned out to be somewhat fuzzy compared with what we later experienced.
Was there a point to this data collection and model building?
Do we decide up front, consciously or subconsciously, and then use the analysis to justify the decision by tweaking the factors and weights? Or do we really pick the school the model tells us to?
Knowledge matters.
A seasoned business operator (B) carries a lot of knowledge in their head. Not much about their industry could possibly surprise them. So they have priors. Strong priors.
The data analyst (D), who may not have all this domain knowledge, does some analysis with whatever data they have and takes it to B.
Most of the time B nods their head because the analysis confirms what they believe. They say “well done” to D and move on.
But sometimes D brings up something surprising that contradicts what B believes. One of two things is now true:
1. There is a problem in the data and/or the analysis.
2. Something has really changed. The business person needs to update their priors.
D assumes #2 is what happens every time. But in many organizations, #1 is far more likely than #2, so B defaults to being dismissive when presented with such “surprises”.
Hence the frustrated data scientist.
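Why B's skepticism is often rational can be made concrete with Bayes' rule. The sketch below asks: given that an analysis looks surprising, what is the probability that something really changed (#2) rather than the data or analysis being wrong (#1)? The function name and all three input probabilities are hypothetical; in practice they would have to come from an organization's own track record.

```python
def prob_real_change(
    prior_change: float,
    p_surprise_if_change: float,
    p_surprise_if_no_change: float,
) -> float:
    """Bayes' rule: P(something really changed | the analysis looks surprising).

    prior_change: assumed share of analyses where the business really changed (#2).
    p_surprise_if_change: assumed chance a real change shows up as a surprise.
    p_surprise_if_no_change: assumed chance that data/analysis problems (#1)
        produce a spurious surprise.
    """
    true_alarm = p_surprise_if_change * prior_change
    false_alarm = p_surprise_if_no_change * (1.0 - prior_change)
    return true_alarm / (true_alarm + false_alarm)


# Hypothetical rates for an organization with messy data.
print(f"{prob_real_change(0.05, 0.9, 0.20):.2f}")  # 0.19
# Same organization after fixing many of its data and analysis problems.
print(f"{prob_real_change(0.05, 0.9, 0.02):.2f}")  # 0.70
```

With these assumed numbers, a surprise reflects a real change only about 19% of the time, so defaulting to skepticism is reasonable. Cutting the spurious-surprise rate from 20% to 2% raises that to about 70%, which is one way to see why fixing data and analysis problems makes D's surprises worth listening to.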
The link between analysts and the business matters
When B doesn’t like the conclusions, the best tool D has is humility. D should ask B why they believe the analysis is wrong and use the answer to update their own domain knowledge.
D (or at least their manager) should also have a very good idea of the base rate: how often #1 happens versus #2. Each instance of #1 should lead to fixes in data management or analytical processes, along with improvements in domain knowledge.
When you do this, you add more value even when you are just confirming what B already believes.
Let's say B has to choose between three options, and B feels Option #2 looks the best.
With improved domain knowledge, better data, and better processes, D can add confidence and depth to these decisions. For example, D might be able to show how much better Option #2 is than the others and which specific factors make it more attractive.
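A minimal sketch of what “how much better” and “which factors” could look like in practice: score each option with a weighted sum, then break the winner's margin over the runner-up into per-factor contributions. The option data, factor names, and weights are hypothetical, and every factor is assumed to be scored 0-100 so that higher is better.

```python
from __future__ import annotations

# Hypothetical options and weights; every factor is scored 0-100, higher = better.
WEIGHTS = {"cost_efficiency": 0.3, "upside": 0.5, "safety": 0.2}
OPTIONS = {
    "Option #1": {"cost_efficiency": 60, "upside": 50, "safety": 70},
    "Option #2": {"cost_efficiency": 55, "upside": 80, "safety": 65},
    "Option #3": {"cost_efficiency": 75, "upside": 45, "safety": 50},
}


def score(values: dict[str, float]) -> float:
    """Weighted sum of an option's factor scores."""
    return sum(WEIGHTS[factor] * value for factor, value in values.items())


def explain_margin(best: str, other: str) -> dict[str, float]:
    """Per-factor contribution to score(best) - score(other); the values sum to the margin."""
    return {f: WEIGHTS[f] * (OPTIONS[best][f] - OPTIONS[other][f]) for f in WEIGHTS}


ranked = sorted(OPTIONS, key=lambda name: score(OPTIONS[name]), reverse=True)
best, runner_up = ranked[0], ranked[1]
margin = score(OPTIONS[best]) - score(OPTIONS[runner_up])
print(f"{best} beats {runner_up} by {margin:.1f} points")

# List factors from the biggest help to the biggest drag on the margin.
contributions = explain_margin(best, runner_up)
for factor, contribution in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {factor}: {contribution:+.1f}")
```

With these made-up numbers, Option #2 wins entirely on upside and is slightly worse than Option #1 on cost efficiency and safety, which is exactly the kind of detail B can confirm or push back on.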
When you follow these processes, B should come to trust D more over time. So when D walks up with a “surprise”, they get taken much more seriously.
Hopefully there is less frustration all around!