I need help setting up a data warehouse using HubSpot CRM as my source
I’m working on extracting data from our HubSpot CRM system and want to build a proper data warehouse structure. The main tables I’m pulling include:
Agreements
Customers
Opportunities
Prospects
I’m struggling with how to design the dimensional model properly. What would be the best approach for creating fact tables and dimension tables from this CRM data? Should I treat Opportunities as my main fact table and use Customers as a dimension?
Any guidance on the schema design would be really helpful. I want to make sure I’m following best practices for this type of CRM data warehousing project.
When I faced a similar situation during my CRM migration last year, a key step was to create distinct fact tables for various business processes instead of consolidating everything into one structure. For your scenario, it makes sense to utilize Opportunities as the central sales fact table, but consider implementing an activity fact table if tracking interactions is part of your strategy.
A significant challenge with HubSpot data is correctly addressing custom properties and deal stages. I established a separate dimension for deal stages, incorporating effective dating since these can evolve over time. Additionally, HubSpot’s contact merge feature can complicate your dimension tables, so you’ll need to factor this into your ETL process. Using the HubSpot contact ID as your natural key while generating your surrogate keys for the warehouse is advisable.
Initially, I overlooked the many-to-many relationship between contacts and companies, which required me to create a bridge table based on reporting needs.