Important Functions to deploy code on AWS SageMaker

Pranidhi Prabhat
3 min read · Nov 21, 2020


This blog is an extension of my previous blog on how to create and deploy text classification on SageMaker. Here, I will walk through the various functions that shape the flow of data once the endpoint is called, and how we can change them to suit our needs.

The input_fn, output_fn, predict_fn and model_fn methods are used by Amazon SageMaker to parse the data payload and reformat the response. These functions can be altered to suit the needs of the project.
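For context, these four functions live in the inference script that is passed as the entry_point when the model is deployed through the SageMaker Python SDK. Below is a minimal sketch of that wiring; the bucket, role and script names are placeholders for illustration:

from sagemaker.sklearn.model import SKLearnModel

# Placeholder S3 path, IAM role and script name; "inference.py" is the script
# that defines model_fn, input_fn, predict_fn and output_fn.
model = SKLearnModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    entry_point="inference.py",
    framework_version="0.23-1",
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")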

In the following example, the input method only accepts ‘text/csv’ as the content type, but it can easily be modified to accept other input formats such as JSON. A check on the number of columns can also be incorporated in this function to determine whether the incoming payload is training data, which includes the label, or prediction data. For text classification, the incoming text can be cleaned in this function as well.

from io import StringIO

import pandas as pd

# feature_columns_names and label_column are defined elsewhere in the script.
def input_fn(input_data, content_type):
    # Read the raw input data as CSV.
    if content_type == 'text/csv':
        df = pd.read_csv(StringIO(input_data), header=None)
        if len(df.columns) == len(feature_columns_names) + 1:
            # This is a labelled example, so attach the label column as well.
            df.columns = feature_columns_names + [label_column]
        elif len(df.columns) == len(feature_columns_names):
            # This is an unlabelled example, used at inference time.
            df.columns = feature_columns_names
        return df
    else:
        raise ValueError("{} not supported by script!".format(content_type))

The output method in the output_fn returns the response in JSON format because, by default, the Inference Pipeline expects JSON between containers, but it can be modified to support other output formats. The idea is to set the ContentType in such a way that the response is read correctly. Here is the default output_fn, included to give a holistic view of all these functions:

import json

from sagemaker_containers.beta.framework import encoders, worker

def output_fn(prediction, accept):
    if accept == "application/json":
        instances = []
        for row in prediction.tolist():
            instances.append({"features": row})

        json_output = {"instances": instances}

        return worker.Response(json.dumps(json_output), mimetype=accept)
    elif accept == 'text/csv':
        return worker.Response(encoders.encode(prediction, accept), mimetype=accept)
    else:
        raise RuntimeError("{} accept type is not supported by this script.".format(accept))
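The accept type that reaches output_fn (and the content type that reaches input_fn) comes from the headers sent when invoking the endpoint. As a quick illustration with boto3 (the endpoint name and request body here are placeholders):

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-text-classifier",   # placeholder endpoint name
    ContentType="text/csv",              # passed to input_fn as content_type
    Accept="application/json",           # passed to output_fn as accept
    Body="some raw text to classify",
)

print(response["Body"].read().decode("utf-8"))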

Next is the model_fn, which essentially takes the location of the serialized model and returns the deserialized model back to Amazon SageMaker. Note that this is the only method that does not have a default, because its definition is closely linked to the serialization method used in training. While there are numerous ways to serialize a model, since I was using Scikit-learn for my text classification, I used the joblib library that ships with Scikit-learn.

import os

import joblib

def model_fn(model_dir):
    """Deserialize the fitted model."""
    preprocessor = joblib.load(os.path.join(model_dir, "model.joblib"))
    return preprocessor
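To show how model_fn pairs with the serialization done at training time, here is roughly what the matching code in the training script looks like, assuming model is the fitted Scikit-learn pipeline (SageMaker exposes the model directory through the SM_MODEL_DIR environment variable):

import os

import joblib

# SageMaker provides the model directory (usually /opt/ml/model) via the
# SM_MODEL_DIR environment variable; everything saved here is packaged into
# model.tar.gz on S3 after training.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
joblib.dump(model, os.path.join(model_dir, "model.joblib"))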

The predict_fn is used to call the model. It takes the input data returned by the input_fn described above and the deserialized model from the model_fn, and uses them to transform the source data. The default predict_fn calls the model’s .predict() method, and since my example required exactly that, I did not make any changes to this function. There are cases where, instead of .predict(), we would require a .transform(); the predict_fn can then be changed in the following way:

def predict_fn(input_data, model):
    features = model.transform(input_data)
    return features
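For reference, the default behaviour mentioned above corresponds roughly to calling the model’s .predict() method, along these lines:

def predict_fn(input_data, model):
    # Default-style behaviour: run inference with the model's predict() method.
    return model.predict(input_data)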

Essentially, these functions only need to be altered when your use case calls for behaviour different from the defaults. Still, it is very important to understand how they work when putting together the deployment flow for your code.
