I’m working with an API that returns paginated data and I need help combining the results properly. When I fetch data from multiple pages, each response gets appended to my file as a separate JSON array, so the file ends up with multiple back-to-back arrays and is no longer valid JSON:
[{"active":false, "count":12, "content":"sample text here"}]
[{"active":false, "count":12, "content":"sample text here"}]
[{"active":false, "count":12, "content":"sample text here"}]
I want to convert this into a proper CSV file later. What’s the best approach to combine these separate JSON arrays into one valid structure that I can easily work with?
You’re writing each response straight to the file, which breaks JSON syntax. I’ve hit this same issue before - just collect everything in memory first. Parse each API response with json.loads() and add the items to a master list. Use something like all_data.extend(json.loads(api_response.text)) in your loop. Once you’ve grabbed everything, write it all at once with json.dump(all_data, file). You’ll get a single valid JSON array that’s way easier to convert to CSV. Memory usage isn’t usually a problem unless you’re dealing with huge datasets.
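A minimal sketch of that loop, assuming a requests-style client; the endpoint URL, page range, and combined.json filename are placeholders, not your actual API:

```python
import json
import requests

all_data = []
for page in range(1, 6):  # placeholder: iterate however your API paginates
    api_response = requests.get(
        "https://api.example.com/items",  # hypothetical endpoint
        params={"page": page},
    )
    # Each response body is a JSON array; parse it and extend the
    # master list instead of appending raw text to the file.
    all_data.extend(json.loads(api_response.text))

# Write once, after the loop, so the file holds a single valid JSON array.
with open("combined.json", "w") as f:
    json.dump(all_data, f)
```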
Don’t append to the file like that - it’ll break your JSON structure. Instead, collect everything in memory first, then write it all at once: store the parsed responses in a list and use json.dump() to save the whole thing as one array.
This is a super common issue with paginated APIs. Instead of writing to the file on each loop iteration, collect all the responses in a Python list first, then flatten and write everything at once. Parse the JSON from each page into one list, then use itertools.chain.from_iterable() (or a simple nested loop) to flatten those per-page arrays into one flat list of records. You get better error handling too - no more half-written files when something breaks partway through. Once you’ve got all the data combined, writing it to CSV is easy with pandas or the csv module - see the sketch below.
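Roughly what that flatten-then-write step looks like; here raw_responses stands in for the page bodies you collected, and the CSV column names come from the sample data in the question:

```python
import csv
import json
from itertools import chain

# Stand-in for the JSON text of each page, however you fetched them.
raw_responses = [
    '[{"active": false, "count": 12, "content": "sample text here"}]',
    '[{"active": false, "count": 12, "content": "sample text here"}]',
]

# Parse each page into a list, then flatten the list-of-arrays
# into one flat list of record dicts.
pages = [json.loads(body) for body in raw_responses]
records = list(chain.from_iterable(pages))

# Write the combined records to CSV, using the keys from the sample data.
with open("combined.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["active", "count", "content"])
    writer.writeheader()
    writer.writerows(records)
```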