Need help with complex JSONB structures after data sync
I’m working with CloudQuery to move data from Salesforce to PostgreSQL and running into issues with query complexity.
My setup
I have two configuration files for the sync process:
salesforce.yml
kind: source
spec:
name: "salesforce"
path: cloudquery/salesforce
registry: cloudquery
version: "v3.2.1"
destinations: ["postgresql"]
tables: ["*"]
spec:
api_key: "my_secret_key"
batch_settings:
batch_size: 500
retry_attempts: 3
entity_options:
salesforce_contacts:
related_objects: ['accounts', 'opportunities', 'activities', 'campaigns']
salesforce_accounts:
related_objects: ['contacts', 'opportunities', 'activities', 'campaigns']
postgresql.yml
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v7.3.2"
write_mode: "overwrite-delete-stale"
spec:
connection_string: "postgresql://admin:password@localhost:5432/crm_data?sslmode=disable"
The problem
The sync works fine but I end up with really complex JSONB columns. In the salesforce_contacts table, there’s a contact_info column like this:
{
"name": "John Smith",
"email_address": "[email protected]",
"phone_number": "+1234567890",
"created_at": "2024-07-15T14:30:22.158Z",
"record_id": "40298374529",
"updated_at": "2024-07-15T16:45:33.901Z"
}
And a related_entities column:
{
"accounts": {
"items": [
{
"record_id": "98765432101",
"relationship": "contact_to_account"
},
{
"record_id": "98765432101",
"relationship": "contact_to_account_primary"
}
]
}
}
I don’t understand why the same account appears twice with different relationship types.
Current query approach
Right now I’m using this complex query to get contacts with their accounts:
WITH account_relationships AS (
SELECT
ct.record_id AS contact_record_id,
ct.contact_info ->> 'name' AS contact_name,
ct.contact_info ->> 'email_address' AS contact_email,
jsonb_array_elements(ct.related_entities -> 'accounts' -> 'items') AS account_relation
FROM
salesforce_contacts ct
)
SELECT
DISTINCT ON (contact_record_id, account_record_id)
ar.contact_record_id,
ar.contact_name,
ar.contact_email,
acc.record_id AS account_record_id,
acc.contact_info ->> 'company_name' AS account_name
FROM
account_relationships ar
LEFT JOIN
salesforce_accounts acc
ON (ar.account_relation ->> 'record_id') = acc.record_id
ORDER BY
contact_record_id, account_record_id;
This gets messy fast when I need other relationships too. Writing queries with multiple JSONB operations is really hard for me.
What I need
Is there a way to transform this data before it goes into PostgreSQL? Or maybe a better approach to structure the data so queries aren’t so complicated?
I’m open to any suggestions on how to make this more manageable.