Chad Sanderson 

@chadsanderson
Currently Head of Data Platform @ Convoy, Formerly Experimentation Product Leader @ Microsoft / SEPHORA / Subway

Big article coming… My next data manifesto

Why Data Quality Is So Hard In S3
An Industry Shift: Moving From Collecting to Automating Metadata

Starting a data infrastructure project with a clean slate can feel incredibly liberating. The whole business is excited to use valuable data for the first time. Years into the role, things start to change for the worse...

The baseline expectation is that data will be accessible and discoverable. You don't win any points just by doing your job. The mass of data debt that has arisen from a lack of well-defined upstream modeling and ownership has created a graveyard of dashboards, constantly runnin…

Data Contracts Book - Early Release

Self-serve BI is one of the largest culprits for poor data management. Teams believe they are 'democratizing data' but are just creating a graveyard of dashboards, views, and tables that no one uses anymore.

During the advent of the cloud, venture capitalists heavily funded Analytics and Business Intelligence software. The reason was simple: every company excited about migrating their data into S3 wanted to do something with it, and the primary use case for all Heads of Marketing, RevOps, and Pr…

Despite paying for a data warehouse, a data lake, and a data engineering team, most executives don't know how their data is being used, who is using it, which data products are built on top of it, or how valuable these data products are to the company.

Because of this, data teams are constantly undervalued. Their pain points are rarely heard or acted upon, and their ability to deliver solutions is hampered by a lack of infrastructure.

An essential role is needed to fill this gap: Data Produc…

Data is more like supply chain management than software engineering. Failing to understand this key difference results in broken, unscalable infrastructure.

The first similarity is workflow management: In a supply chain, the movement of goods from suppliers to customers requires a well-coordinated sequence of steps with clear visibility for the transformation of raw materials into useful products that a customer can leverage.

Similarly, data workflows involve ingestion, the transformation of raw …


The biggest reason data engineers are constantly fire-fighting is that their organization has no data change management process. When code changes unexpectedly, queries break.

Despite change management being a core component of a product engineering team’s workflow (unit testing, integration testing, pull requests), this same cycle of review and release is not present for data organizations, or for software engineers who maintain data sources such as production databases, APIs, or event st…
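To make the idea of a review-and-release cycle for data concrete, here is a rough sketch of a check that could run in a pull request before a schema change ships. The function name, the column names, and the plain-dict schema format are all assumptions made for illustration; they do not come from the post or from any particular tool.

```python
from __future__ import annotations


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two {column: type} schemas and list changes likely to break queries.

    Hypothetical helper: dropped columns and type changes are treated as
    breaking, while newly added columns are treated as safe.
    """
    issues = []
    for column, old_type in old.items():
        if column not in new:
            issues.append(f"dropped column: {column}")
        elif new[column] != old_type:
            issues.append(f"type change on {column}: {old_type} -> {new[column]}")
    return issues


# Illustrative schemas only; a real check would read them from the source system.
current = {"id": "bigint", "email": "varchar"}
proposed = {"id": "varchar", "created_at": "timestamp"}

for issue in breaking_changes(current, proposed):
    print(issue)
```

A check like this only catches structural changes. Changes in what a column means would still need a human review step.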

Without high-quality data, every AI and analytics initiative will be underwhelming at best and actively damaging to the business at worst.

To overcome this problem, data producers must be willing to take ownership of production data and collaborate with data consumers to support high-value use cases.

Data contracts are API-based agreements between producers and consumers designed to solve exactly that problem. Data contracts should be implemented by following the steps below:

1. Identify a production…
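As a loose illustration of what a producer-consumer agreement might look like in code, the sketch below declares a contract for a dataset and checks a record against it. Everything here is an assumption for the sake of example: the `DataContract` and `FieldSpec` classes, the `orders` dataset, its owner, and its fields are hypothetical, and this is not the implementation the post goes on to describe.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One agreed-upon field: its name, expected Python type, and nullability."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: a dataset, its owning producer, and its fields."""
    dataset: str
    owner: str
    fields: tuple[FieldSpec, ...]

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return a human-readable list of ways the record breaks the contract."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                problems.append(f"missing field: {spec.name}")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    problems.append(f"null not allowed: {spec.name}")
            elif not isinstance(value, spec.dtype):
                problems.append(
                    f"wrong type for {spec.name}: "
                    f"expected {spec.dtype.__name__}, got {type(value).__name__}"
                )
        return problems


# Illustrative contract; dataset, owner, and field names are made up.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    fields=(
        FieldSpec("order_id", str),
        FieldSpec("amount_cents", int),
        FieldSpec("coupon_code", str, nullable=True),
    ),
)

print(orders_contract.violations({"order_id": "A1", "amount_cents": "500"}))
```

Naming an owner on the contract is the part that reflects the ownership point above. The type checks are the easy part.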

I pitched a LOT of internal data infrastructure projects during my time leading data teams, and I was (almost) never turned down. Here is my playbook for getting executive buy-in for complex technology initiatives:

1. Research top-level initiatives: Find something an executive cares about that is impacted by the project you have in mind.

Example: We need to increase sales by 20% from Q2-Q4

2. Identify the problem to be overcome: What are the roadblocks that can be torn down through better infras…

ChatGPT will not replace data engineers. Yes, it can write SQL, but the hard part of data development is understanding how code translates to the real world.

Every business has a unique way of storing data. One customerID could be stored in a MySQL DB. Another could be imported from Mixpanel as nested JSON, and a third might be collected from a CDP. All three IDs and their properties are slightly (or significantly) semantically different and must be integrated into a single table in the Data War…
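A toy sketch of that integration step is shown below: three adapters, one per source, each mapping its own shape onto a common record before the records are merged into one table. The payload shapes, field names, and normalization rules are assumptions invented for illustration; real MySQL rows, Mixpanel exports, and CDP profiles will look different.

```python
from __future__ import annotations

import json


def from_mysql_row(row: dict) -> dict:
    # Assumed shape: a flat row with an integer customerID column.
    return {"customer_id": str(row["customerID"]), "source": "mysql"}


def from_mixpanel_event(raw_json: str) -> dict:
    # Assumed shape: nested JSON with the ID under a "properties" object.
    event = json.loads(raw_json)
    return {"customer_id": str(event["properties"]["customerID"]), "source": "mixpanel"}


def from_cdp_profile(profile: dict) -> dict:
    # Assumed quirk: the CDP ID arrives with stray whitespace and mixed case.
    return {"customer_id": str(profile["customerID"]).strip().lower(), "source": "cdp"}


def unify(records: list[dict]) -> list[dict]:
    """Merge per-source records into one row per customer_id, keeping provenance."""
    table: dict[str, dict] = {}
    for rec in records:
        row = table.setdefault(
            rec["customer_id"], {"customer_id": rec["customer_id"], "sources": []}
        )
        row["sources"].append(rec["source"])
    return list(table.values())


records = [
    from_mysql_row({"customerID": 42}),
    from_mixpanel_event('{"properties": {"customerID": "42"}}'),
    from_cdp_profile({"customerID": " C-7 "}),
]
for row in unify(records):
    print(row)
```

The code itself is short. Deciding whether `42` in one system really means the same customer as `42` in another is the judgment call the code cannot make.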