Tutorial to Create a Data Science Agent: A Code Implementation using gemini-2.0-flash-lite model through Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis



In this tutorial, we demonstrate how to combine Pandas, Python's data manipulation library, with Google's generative AI capabilities through the google.generativeai package and the gemini-2.0-flash-lite model. After installing the necessary libraries, configuring the Google API key, and importing the IPython display utilities, the code takes a step-by-step approach to building a data science agent that analyzes a sample sales dataset. The example shows how to convert a DataFrame into markdown format and then use natural language queries to generate insights about the data, highlighting the potential of combining traditional data analysis tools with AI-driven methods.

!pip install pandas google-generativeai tabulate --quiet

First, we quietly install the Pandas and google-generativeai libraries, along with tabulate, which Pandas requires for DataFrame.to_markdown(). This sets up the environment for data manipulation and AI-powered analysis.

import pandas as pd
import google.generativeai as genai
from IPython.display import Markdown

We import Pandas for data manipulation, google.generativeai for accessing Google’s generative AI capabilities, and Markdown from IPython.display to render markdown-formatted outputs.
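As a minimal sketch of how the Markdown import could be put to use, the helper below renders a string as markdown when IPython is available and falls back to plain printing otherwise. The name show_answer and the sample text are hypothetical and not part of the original notebook; the same helper could be applied to any string the model returns.

```python
# Fall back gracefully when IPython is not installed (e.g. in a plain script).
try:
    from IPython.display import Markdown, display
except ImportError:
    Markdown = None
    display = print


def show_answer(answer_text):
    """Hypothetical helper: render text as markdown in a notebook, else print it."""
    if Markdown is not None:
        display(Markdown(answer_text))
    else:
        print(answer_text)
    return answer_text


# Hard-coded sample text standing in for a model response.
shown = show_answer("**Sample answer:** rendered as *markdown* in a notebook.")
```

Outside a notebook, display() only prints the object's representation, so the formatted rendering is visible only in Colab or Jupyter.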

GOOGLE_API_KEY = "Use Your API Key Here"
genai.configure(api_key=GOOGLE_API_KEY)

model = genai.GenerativeModel('gemini-2.0-flash-lite')

We assign a placeholder API key, configure the google.generativeai client with it, and initialize the ‘gemini-2.0-flash-lite’ GenerativeModel for generating content.
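Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here under the assumption that the key is stored in an environment variable named GOOGLE_API_KEY, is to read it at runtime. The helper load_api_key is hypothetical and not part of the original code.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Hypothetical helper: read the API key from an environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of a confusing API error later.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage with the tutorial's client (commented out because it needs a real key):
# import google.generativeai as genai
# genai.configure(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error raised on the first model call.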

data = {'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
        'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
        'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
        'Units Sold': [150, 200, 180, 120, 90, 250],
        'Price': [1200, 25, 75, 300, 50, 100]}
sales_df = pd.DataFrame(data)

print("Sample Sales Data:")
print(sales_df)
print("-" * 30)

Here, we create a Pandas DataFrame named sales_df containing sample sales data for various products, and then print the DataFrame followed by a separator line to visually distinguish the output.

def ask_gemini_about_data(dataframe, query):
    """
    Asks the Gemini model a question about the given Pandas DataFrame.

    Args:
        dataframe: The Pandas DataFrame to analyze.
        query: The natural language question about the DataFrame.

    Returns:
        The response from the Gemini model as a string.
    """
    # The prompt text is kept at column 0 so no extra indentation reaches the model.
    prompt = f"""You are a data analysis agent. Analyze the following pandas DataFrame and answer the question.

DataFrame:
```
{dataframe.to_markdown(index=False)}
```

Question: {query}

Answer:
"""
    response = model.generate_content(prompt)
    return response.text

Here, we construct a markdown-formatted prompt from a Pandas DataFrame and a natural language query, then use the Gemini model to generate and return an analytical response.
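To see what the model actually receives, it can help to build the prompt separately from the API call. The sketch below is a simplified variant and not the original function: build_prompt and table_as_text are hypothetical helpers, the backtick fence around the table is omitted, and a plain-text fallback is assumed for environments where tabulate is not installed.

```python
import pandas as pd


def table_as_text(dataframe):
    """Return the DataFrame as a markdown table, or plain text without tabulate."""
    try:
        return dataframe.to_markdown(index=False)
    except ImportError:
        # DataFrame.to_markdown() depends on the optional 'tabulate' package.
        return dataframe.to_string(index=False)


def build_prompt(dataframe, query):
    """Hypothetical helper: assemble the prompt without contacting the model."""
    return (
        "You are a data analysis agent. Analyze the following pandas DataFrame "
        "and answer the question.\n\n"
        f"DataFrame:\n{table_as_text(dataframe)}\n\n"
        f"Question: {query}\n\nAnswer:\n"
    )


# A two-row demo frame, so the printed prompt stays short.
demo_df = pd.DataFrame({"Product": ["Laptop", "Mouse"], "Units Sold": [150, 200]})
demo_prompt = build_prompt(demo_df, "Which product sold more units?")
print(demo_prompt)
```

Printing the prompt before sending it is a cheap way to check that column names and values survive the conversion to text.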

# Query 1: What is the total number of units sold across all products?
query1 = "What is the total number of units sold across all products?"
response1 = ask_gemini_about_data(sales_df, query1)
print(f"Question 1: {query1}")
print(f"Answer 1:\n{response1}")
print("-" * 30)

# Query 2: Which product had the highest number of units sold?
query2 = "Which product had the highest number of units sold?"
response2 = ask_gemini_about_data(sales_df, query2)
print(f"Question 2: {query2}")
print(f"Answer 2:\n{response2}")
print("-" * 30)

# Query 3: What is the average price of the products?
query3 = "What is the average price of the products?"
response3 = ask_gemini_about_data(sales_df, query3)
print(f"Question 3: {query3}")
print(f"Answer 3:\n{response3}")
print("-" * 30)

# Query 4: Show me the products sold in the 'North' region.
query4 = "Show me the products sold in the 'North' region."
response4 = ask_gemini_about_data(sales_df, query4)
print(f"Question 4: {query4}")
print(f"Answer 4:\n{response4}")
print("-" * 30)

# Query 5: A more complex query. Calculate the total revenue for each product.
query5 = "Calculate the total revenue (Units Sold * Price) for each product and present it in a table."
response5 = ask_gemini_about_data(sales_df, query5)
print(f"Question 5: {query5}")
print(f"Answer 5:\n{response5}")
print("-" * 30)
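Because a language model can make arithmetic mistakes, it may be worth cross-checking its answers against values computed directly with Pandas. The sketch below recomputes the five quantities from the same sample data. The variable names are illustrative and the Category column is left out because none of the queries use it.

```python
import pandas as pd

# Same sample data as the tutorial, minus the unused Category column.
sales_df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam", "Headphones"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Units Sold": [150, 200, 180, 120, 90, 250],
    "Price": [1200, 25, 75, 300, 50, 100],
})

# Query 1: total units sold.
total_units = int(sales_df["Units Sold"].sum())
# Query 2: product with the highest number of units sold.
top_product = sales_df.loc[sales_df["Units Sold"].idxmax(), "Product"]
# Query 3: average price.
average_price = float(sales_df["Price"].mean())
# Query 4: products sold in the North region.
north_products = sales_df.loc[sales_df["Region"] == "North", "Product"].tolist()
# Query 5: revenue per product (Units Sold * Price).
revenue = (sales_df["Units Sold"] * sales_df["Price"]).tolist()

print(total_units)              # 990
print(top_product)              # Headphones
print(round(average_price, 2))  # 291.67
print(north_products)           # ['Laptop', 'Webcam']
print(dict(zip(sales_df["Product"], revenue)))
```

If the model's answer disagrees with these figures, the Pandas result should be treated as the reference.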

In conclusion, this tutorial shows how Pandas, the google.generativeai package, and the gemini-2.0-flash-lite model can be combined to make data analysis more interactive. The approach simplifies querying and interpreting data, and it could be extended to use cases such as data cleaning, feature engineering, and exploratory data analysis. By working within the familiar Python ecosystem, data scientists can use these tools to derive insights from their datasets more quickly.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.


