I’m working with a spreadsheet exported from Google Sheets and saved as an .xlsx file. When I use PHPExcel to find the last used column and row, the functions getHighestDataColumn() and getHighestDataRow() return values of 26 columns and 1000 rows respectively, even though most of those cells are blank. It appears that Google Sheets reserves more cells than necessary, causing memory inefficiencies when processing the file. I’ve attempted to loop through the rows and check for non-empty cells, but this method is slow and resource-intensive.
Below is a simplified example of what I tried:
function findLastUsedRow($worksheet) {
$maxRow = $worksheet->getHighestDataRow();
for ($row = $maxRow; $row >= 1; $row--) {
if (!$this->isRowEmpty($worksheet, $row)) {
return $row;
}
}
return 0;
}
function isRowEmpty($worksheet, $row) {
foreach ($worksheet->getRowIterator($row, $row) as $row) {
foreach ($row->getCellIterator() as $cell) {
if ($cell->getValue() !== null && $cell->getValue() !== '') {
return false;
}
}
}
return true;
}
Does anyone have suggestions for a more efficient approach?
hav u tried using the getHighestColumn() and getHighestRow() methods instead? they might give more accurate results for actual data. also, consider trimming the spreadsheet before processing - remove empty rows/columns in google sheets b4 exporting. that cud help with efficiency
I’ve dealt with this exact problem before, and it can be frustrating. Here’s what worked for me:
Instead of relying on getHighestDataColumn() and getHighestDataRow(), I found it more effective to use a reverse iteration approach. Start from the bottom-right corner and work your way up and left until you find data.
Here’s a rough outline of the logic:
Start at the maximum possible row and column (e.g., XFD1048576).
Check cells moving upwards until you find data. This gives you the last row.
From that row, move left until you find data. This gives you the last column.
This method is usually much faster, especially for spreadsheets with a lot of empty cells at the end. It avoids checking every single cell and focuses on finding the actual data boundaries.
Remember to implement appropriate error handling and consider edge cases, like completely empty sheets.
I’ve encountered similar issues with Google Sheets exports. One approach that’s worked well for me is using the PHPExcel ‘calculateWorksheetDimension()’ method. It tends to be more accurate in determining the actual data range.
Another technique is to start from the last cell (XFD1048576 in Excel) and work backwards, checking for content. This can be faster than iterating from the beginning, especially with sparse data.
If performance is still an issue, consider using a different library like PhpSpreadsheet, which is generally more efficient and actively maintained. It has similar methods for finding the highest row and column with data.