I need to extract all my HubSpot contacts into a dataframe but I’m running into the API limitation of 100 records per request. Currently I’m only getting the first batch of contacts.
Here’s my current implementation:
import hubspot
from hubspot.crm.contacts import ApiException

hs_client = hubspot.Client.create(api_key="YOUR_API_KEY")

try:
    response = hs_client.crm.contacts.basic_api.get_page(limit=100, archived=False)
    print(response)
except ApiException as error:
    print(f"API call failed: {error}")
How can I implement pagination to iterate through all available contacts? I want to keep making requests until there are no more contacts to retrieve. What’s the best approach to handle the pagination tokens and stop the loop when all data has been fetched?
That pagination approach works well, but watch out for rate limiting - HubSpot’s API quotas are pretty strict. I throw a small sleep between requests to avoid timeouts when dealing with large datasets. Make sure you’re grabbing response.results for the actual contact data, not the full response when building your dataframe. Heads up - deleted contacts can still show up in results depending on your query params, so double-check that archived=False is actually working. When I migrated our contact database, I wrapped the whole pagination loop in try-except to handle network hiccups and pick up from the last successful token.
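The "pick up from the last successful token" idea in this reply could be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for the real `get_page` call and is assumed to return a `(results, next_after)` tuple, `ConnectionError` stands in for whatever network error you actually see, and the fake page source at the bottom exists purely so the sketch runs without a HubSpot account.

```python
import time


def fetch_all(fetch_page, start_after=None, pause=0.0, max_failures=3):
    """Collect every page, retrying from the last good token on failure.

    fetch_page is a hypothetical callable: fetch_page(after) -> (results, next_after),
    where next_after is None on the final page.
    Returns (records, resume_token); resume_token is None when everything was fetched.
    """
    records = []
    after = start_after
    failures = 0
    while True:
        try:
            results, next_after = fetch_page(after)
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                # Give up, but hand back the token so the caller can resume later.
                return records, after
            continue  # retry the same token
        failures = 0
        records.extend(results)
        if next_after is None:
            return records, None
        after = next_after
        time.sleep(pause)  # small pause between requests


# Fake three-page source that fails once; used only to exercise the loop.
PAGES = {None: (["a", "b"], "2"), "2": (["c", "d"], "4"), "4": (["e"], None)}
_failed_once = set()


def flaky_fetch(after):
    if after == "2" and after not in _failed_once:
        _failed_once.add(after)
        raise ConnectionError("simulated network hiccup")
    return PAGES[after]


contacts, resume_token = fetch_all(flaky_fetch)
print(contacts, resume_token)
```

If the loop does give up, passing the returned token back in as `start_after` continues from the page that failed instead of starting over.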
The Problem:
You are trying to retrieve all your HubSpot contacts using the HubSpot API, but you’re only getting the first 100 contacts due to the API’s pagination limit. Your current code doesn’t handle pagination, preventing retrieval of the full contact list.
Understanding the “Why” (The Root Cause):
The HubSpot API returns contacts in pages of at most 100 records per request. To get all contacts, you need to implement pagination: call the API repeatedly, passing the paging.next.after value from the previous response as the after parameter to fetch the next page, until the response no longer includes a next page. Without this, you only retrieve the first page of results.
Step-by-Step Guide:
- Implement Pagination with a while Loop: The core solution is a while loop that keeps requesting pages until the API response indicates there are no more contacts. The after parameter carries the paging cursor returned by the previous response, which tells the API where the next page starts.
import time  # used to pause between requests

import hubspot
import pandas as pd
from hubspot.crm.contacts import ApiException

hs_client = hubspot.Client.create(api_key="YOUR_API_KEY")

all_contacts = []
after = None  # paging cursor; None requests the first page

while True:
    try:
        response = hs_client.crm.contacts.basic_api.get_page(
            limit=100, after=after, archived=False
        )
    except ApiException as e:
        print(f"An API error occurred: {e}")
        # Consider more robust handling here, such as retrying with exponential backoff
        break
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        break

    all_contacts.extend(response.results)

    # The last page has no paging information (or no next entry)
    if response.paging is None or response.paging.next is None:
        break
    after = response.paging.next.after
    time.sleep(1)  # pause to avoid hitting rate limits; adjust as needed

print(f"Total contacts retrieved: {len(all_contacts)}")

# Each result is a contact object, not a dict; its fields live in .properties
contacts_df = pd.DataFrame([contact.properties for contact in all_contacts])
print(contacts_df)
- Handle Errors and Rate Limits: The code uses a try...except block to catch ApiException and other potential exceptions. A time.sleep(1) pauses for one second between requests to help avoid hitting HubSpot’s API rate limits. Adjust the sleep duration based on your observed API behavior. More sophisticated rate limit handling might involve exponential backoff.
- Convert to DataFrame: The code converts the retrieved contact list into a pandas DataFrame for easier processing.
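For the DataFrame step, it can help to flatten each contact into a plain dict first. The sketch below assumes each result exposes an `id` attribute and a `properties` dict; that shape is an assumption about the client's contact objects, and `SimpleNamespace` is used here only as a stand-in so the example runs on its own. `contacts_to_rows` is a hypothetical helper name.

```python
from types import SimpleNamespace


def contacts_to_rows(contacts):
    """Turn contact-like objects into flat dicts, one per contact."""
    rows = []
    for contact in contacts:
        row = {"id": contact.id}
        # properties may be missing on some records, so fall back to an empty dict
        row.update(contact.properties or {})
        rows.append(row)
    return rows


# Stand-in records shaped like the assumed contact objects.
sample = [
    SimpleNamespace(id="1", properties={"email": "ada@example.com", "firstname": "Ada"}),
    SimpleNamespace(id="2", properties={"email": "alan@example.com", "firstname": "Alan"}),
]
rows = contacts_to_rows(sample)
print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per contact and one column per property.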
Common Pitfalls & What to Check Next:
- API Key: Double-check that you’ve replaced "YOUR_API_KEY" with your actual HubSpot API key. Note that HubSpot has retired API keys in favor of private app access tokens, so with a current client version you would likely pass access_token=... instead of api_key=....
- Rate Limiting: If you encounter rate limiting errors, increase the time.sleep() duration or implement a more advanced rate limiting strategy (e.g., exponential backoff).
- archived=False: Ensure you only retrieve active contacts. If you need archived contacts, remove or change this parameter.
- Error Handling: The provided error handling is basic. For production code, implement more robust error handling, including logging, retries with exponential backoff, and appropriate error messages to the user.
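The exponential backoff mentioned in the pitfalls above could look something like this minimal sketch. `call_with_backoff` is a hypothetical helper, not part of the HubSpot client; the `sleep` parameter is injectable only so the demo runs without actually waiting, and `TimeoutError` stands in for a rate-limit error.

```python
import random
import time


def call_with_backoff(func, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(), doubling the wait after each failure, up to max_attempts tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


# Demo: a request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate-limit response")
    return "page of contacts"


delays = []  # record the requested waits instead of sleeping
result = call_with_backoff(flaky_request, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)
```

With the real client you would presumably wrap the get_page call in a lambda and pass retry_on=(ApiException,), ideally retrying only when the error is a rate-limit (HTTP 429) response.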
Still running into issues? Share your (sanitized) config files, the exact command you ran, and any other relevant details. The community is here to help!
Check the paging attribute in your response; it holds the next token for pagination. Just loop while response.paging exists and pass the after parameter: get_page(limit=100, after=response.paging.next.after). Works perfectly!
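One way to package that loop is as a generator, so the caller just iterates over contacts. This is a sketch, not the client's own API: `iter_pages` is a hypothetical helper, and `fake_get_page` only imitates the response shape described above (a `results` list plus an optional `paging.next.after`) so the example is runnable without credentials.

```python
from types import SimpleNamespace


def iter_pages(get_page, limit=100):
    """Yield every record, following paging.next.after until it disappears."""
    after = None
    while True:
        response = get_page(limit=limit, after=after)
        yield from response.results
        paging = getattr(response, "paging", None)
        if paging is None or paging.next is None:
            return  # no further pages
        after = paging.next.after


# Fake endpoint over 250 records, mimicking the assumed response shape.
DATA = list(range(250))


def fake_get_page(limit, after=None):
    start = int(after or 0)
    end = start + limit
    paging = None
    if end < len(DATA):
        paging = SimpleNamespace(next=SimpleNamespace(after=str(end)))
    return SimpleNamespace(results=DATA[start:end], paging=paging)


all_items = list(iter_pages(fake_get_page))
print(len(all_items))
```

With the real client, the first argument would be something like a small wrapper around hs_client.crm.contacts.basic_api.get_page that also passes archived=False.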