Automating Data Science Analysis with Flask APIs

Introduction

An API (Application Programming Interface) is a set of rules that allows different software applications to communicate with each other. It enables developers to access specific features or data of an application without exposing its internal code. APIs are widely used to integrate services, share data, and build scalable systems, making them essential in modern software development.

Why Are APIs Essential in Data Science?

APIs are invaluable in data science for integrating and sharing insights seamlessly across systems. They allow data scientists to deploy models, retrieve external data, and provide real-time predictions. Integrating APIs with tools like Excel makes advanced analytics accessible to non-technical users, enabling them to interact with models or datasets directly within spreadsheets for better decision-making.

Let’s explore how it all comes together.

Import Necessary Libraries

Library Name | Description
Flask | The core class used to create the web application.
jsonify | A Flask utility that converts Python objects (such as dictionaries) into a JSON response, suitable for web APIs.
pandas (pd) | Loads the CSV file and computes the correlation matrix.
from flask import Flask, jsonify
import pandas as pd

Initialize the Flask App

__name__ is a special Python variable: it is set to '__main__' when the script is run directly and to the module's name when the file is imported. Flask uses it to locate the application's resources, such as templates and static files.

app = Flask(__name__)

Define the Home Route

We define a simple home route (/) that will display “Hello, world!” when you access the root URL of the application. This is just to verify that the Flask app is running correctly.

@app.route("/")
def home():
    return "Hello, world!"

@app.route("/") tells Flask that this function will be triggered when the user visits the root URL of the app (http://localhost:5039/).
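Before opening a browser, you can also check the route from Python. The sketch below uses Flask's built-in test client (app.test_client()), which sends the request to the app in-process, so no server or port is involved. It repeats the app and home route from above so that it runs on its own.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def home():
    return "Hello, world!"


# The test client calls the app directly; no running server is needed.
with app.test_client() as client:
    response = client.get("/")

print(response.status_code)
print(response.get_data(as_text=True))
```

This is a quick way to confirm a route behaves as expected while you are still building the app.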

Create the Data Endpoint

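@app.route("/data")
def get_data():
    # Load the dataset (the CSV file is expected in the working directory)
    bank_loan = pd.read_csv('BANK LOAN.csv')
    # SN is only a record identifier, so exclude it from the correlations
    bank_loan = bank_loan.drop(columns=['SN'], errors='ignore')
    # Pairwise Pearson correlations of the numeric columns (pandas >= 1.5)
    correlation_matrix = bank_loan.corr(numeric_only=True)
    # orient='index' keeps the row labels: {row: {column: value}}
    return jsonify(correlation_matrix.to_dict(orient='index'))

Description of the Dataset

The BANK LOAN.csv file contains the following columns:

Column Name | Description
SN | Serial number that uniquely identifies each record.
AGE | Age of the individual.
EMPLOY | Years of employment.
ADDRESS | Years at the current address.
DEBTINC | Debt-to-income ratio.
CREDDEBT | Credit card debt.
OTHDEBT | Other forms of debt.
DEFAULTER | Whether the individual defaulted on a loan (1 = default, 0 = no default).
  • The get_data function is the route handler for the /data endpoint. It reads the BANK LOAN.csv file, drops the SN identifier column, computes the correlation matrix of the remaining columns with Pandas, and converts it into a nested dictionary with to_dict(orient='index'), in which each row label maps to that row's correlations. (With orient='records', the result would be a list of row dictionaries without the row labels.)
  • The jsonify() function then serializes the dictionary into a JSON response, which is returned to the client when it accesses the endpoint.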
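The orient argument of to_dict decides how the correlation matrix is shaped once it becomes JSON. The small comparison below uses a hypothetical three-column DataFrame with made-up values (not the real BANK LOAN.csv) to show the difference between orient='records' and orient='index', and how a client could rebuild the matrix from the latter.

```python
import pandas as pd

# Hypothetical toy values standing in for a few BANK LOAN.csv columns.
df = pd.DataFrame({
    "AGE": [25, 35, 45, 55],
    "DEBTINC": [10.0, 8.0, 6.0, 4.0],
    "DEFAULTER": [1, 1, 0, 0],
})
corr = df.corr()

# orient='records': a list with one dict per row; the row labels are lost.
records = corr.to_dict(orient="records")

# orient='index': {row label: {column label: value}}; the labels survive.
by_row = corr.to_dict(orient="index")

# A client can rebuild the full matrix from the 'index' form.
rebuilt = pd.DataFrame.from_dict(by_row, orient="index")

print(records)
# AGE and DEBTINC move in exact lockstep here, so this is -1 up to rounding.
print(by_row["AGE"]["DEBTINC"])
```

The 'records' form suits row-oriented tables such as the raw dataset, while the 'index' form fits a correlation matrix, where each row label carries meaning.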

Run the Flask App

if __name__ == '__main__':
    # use_reloader=False disables the reloader subprocess (useful in Jupyter)
    app.run(debug=True, port=5039, use_reloader=False)
 * Serving Flask app '__main__'
 * Debug mode: on
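  • if __name__ == '__main__': ensures that the server starts only when the script is run directly, not when it is imported as a module.

  • app.run(debug=True, port=5039, use_reloader=False): This starts the Flask development server at http://localhost:5039/ (you can choose a different port if needed). debug=True enables detailed error pages and the interactive debugger, and use_reloader=False turns off the automatic code reloader that debug mode would otherwise start, which is typically needed when running the app from a Jupyter notebook.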

  • Access the Home Route: Open a web browser and visit http://localhost:5039/. You should see the message “Hello, world!” displayed.

  • Access the Data Endpoint: Visit http://localhost:5039/data. You should see a JSON response containing the correlation matrix of the dataset.
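You can also check the /data endpoint without starting the server by using the test client again. This is a minimal sketch under stated assumptions: it writes hypothetical toy values to a temporary CSV instead of reading BANK LOAN.csv, uses a variant of the handler that returns to_dict(orient='index') so the row labels survive, and then rebuilds the matrix on the client side with pd.DataFrame.from_dict.

```python
import os
import tempfile

import pandas as pd
from flask import Flask, jsonify

# Hypothetical toy data written to a temporary CSV (stands in for BANK LOAN.csv).
toy = pd.DataFrame({
    "AGE": [25, 35, 45, 55],
    "EMPLOY": [2, 10, 20, 30],
    "DEFAULTER": [1, 1, 0, 0],
})
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "toy_loans.csv")
toy.to_csv(csv_path, index=False)

app = Flask(__name__)


@app.route("/data")
def get_data():
    df = pd.read_csv(csv_path)
    # orient='index' keeps the row labels in the JSON payload.
    return jsonify(df.corr().to_dict(orient="index"))


# Call the endpoint in-process and parse the JSON body.
with app.test_client() as client:
    response = client.get("/data")
    payload = response.get_json()

# Rebuild the correlation matrix from the nested JSON object.
matrix = pd.DataFrame.from_dict(payload, orient="index")
print(response.status_code)
print(matrix.round(3))
```

Flask's default JSON provider sorts object keys, so the rebuilt matrix may list the columns alphabetically rather than in CSV order; reindex it if the order matters.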

Conclusion

Congratulations! You’ve built a simple Flask API that serves a correlation matrix computed from a CSV file, making it easy to share data insights. In this tutorial, you learned how to set up a Flask app, work with data using Pandas, and return data in JSON format. In the next blog, we’ll show you how to integrate this API with Excel using macros.