Issues with Pipeline: Downloading data from SFTP, converting to JSON, and modifying Airtable entries

I’m in the process of developing a pipeline that runs in Google Cloud and uses Cloud Scheduler for execution at specified intervals. My setup involves accessing an SFTP server to retrieve two XLS files using one module, processing those files into JSON format with another module, and finally sending a PATCH request to Airtable to refresh specific records with a third module. Initially, I considered organizing these functions in separate files, but I opted to consolidate them into a single main.py file, which also includes a requirements.txt for dependencies.

The main entry point of the Cloud Function is defined as follows:

import os
import pysftp
import pandas as pd
import requests
import json

def fetch_data_for_analysis(ftp_directory, file_names):
    '''
    Connects to an SFTP server, retrieves specified XLS files, and reads them into dataframes.
    '''
    HOST = os.environ.get('SFTP_HOST')
    USER = os.environ.get('SFTP_USER')
    PASSWORD = os.environ.get('SFTP_PASS')
    PORT = int(os.environ.get('SFTP_PORT'))

    data_frames = []
    options = pysftp.CnOpts()
    options.hostkeys = None

    with pysftp.Connection(HOST, username=USER, password=PASSWORD, port=PORT, cnopts=options) as sftp:
        print('Connected to SFTP')
        files = sftp.listdir()
        for file_name in file_names:
            with sftp.open(ftp_directory + file_name) as file:
                df = pd.read_excel(file)
                data_frames.append(df)
    return data_frames

However, I face the following error:

500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

In the logs, I see this snippet:

File "/layers/google.python.pip/pip/lib/python3.11/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()

Hey Ethan99! Have you checked the file permissions on your SFTP or tht Airtable key access rights? Sometimes misconfigs there throw server errors. Also, see if Flask debug mode gives more clues. Debugging these can often reveal hidden issues in the functinal flow.

You might be missing exception handling in your code. Try adding try-except blocks around the SFTP connection part to catch potential login or network issues. That way, you can log the actual problem and prevent a generic error response from the server.

It could be a timeout issue when connecting to the SFTP server, especially if the server is slow or there’s a network lag. Try increasing the connection timeout setting in pysftp and see if that resolves the error. Additionally, ensure that the server’s IP is whitelisted if you’re running this code on a cloud platform that might restrict outbound connections. This might not be an obvious issue since it works fine locally but can cause problems when deploying on cloud infrastructures.