In AI and machine learning, embeddings have become an increasingly important way to understand and analyze text data. In this blog, we will explore how to use Azure OpenAI embeddings with the Python SDK to analyze PowerPoint files (.pptx) and query them efficiently using a FAISS (Facebook AI Similarity Search) vectorstore.
Prerequisites
Before starting, make sure you have the following:
- An active Azure subscription.
- Azure OpenAI resource created in your Azure portal.
- API Key and Endpoint URL from Azure OpenAI.
- Python 3.8+ installed (langchain requires at least Python 3.8).
- Required Python libraries: langchain, openai, faiss-cpu, unstructured, and python-pptx (installed in the next section).
Setting Up Azure OpenAI
1. Create Azure OpenAI Resource
- Go to the Azure portal.
- Click on “Create a resource” and search for “Azure OpenAI”.
- Follow the on-screen instructions to create the resource.
2. Get the API Key and Endpoint
- Navigate to your Azure OpenAI resource.
- Under the “Keys and Endpoint” section, copy the API key and endpoint URL.
Integrating Azure OpenAI with Python SDK
1. Install Required Libraries
Open your terminal and install the required libraries using pip:
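A minimal install covering everything used below; the exact package set (langchain, openai, faiss-cpu, plus unstructured and python-pptx for PowerPoint parsing) is an assumption based on the loaders and vectorstore this walkthrough uses:
pip install langchain openai faiss-cpu unstructured python-pptx
Then import the required modules in your Python script: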
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import DirectoryLoader
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
2. Set Up Azure OpenAI Client and Embeddings
Initialize the Azure OpenAI client and embeddings:
# The angle-bracket values are placeholders for your own deployment names, key, and endpoint.
llm = AzureOpenAI(
    engine=<engine>,  # name of your GPT-4 deployment in Azure
    model_name="gpt-4",
    openai_api_key=<openai_api_key>,
    openai_api_base=<openai_api_base>,  # endpoint URL from "Keys and Endpoint"
    model_kwargs={
        "api_type": "azure",
        "api_version": "2023-03-15-preview",
    },
)
embeddings = OpenAIEmbeddings(
    deployment=<deployment>,  # name of your embeddings deployment in Azure
    model='text-embedding-ada-002',
    openai_api_key=<openai_api_key>,
    openai_api_base=<openai_api_base>,
    openai_api_type='azure',
    chunk_size=1,  # Azure processes embedding requests one input at a time
)
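As a quick sanity check that your key, endpoint, and deployment names are wired up correctly, you can embed a short string; text-embedding-ada-002 returns 1536-dimensional vectors:
vector = embeddings.embed_query("hello world")
print(len(vector))  # expect 1536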
3. Load and Split PowerPoint Files
Use DirectoryLoader to load and split PowerPoint files into text chunks (under the hood, .pptx parsing is handled by the unstructured package installed earlier):
loader = DirectoryLoader(<localFolderPath>, glob='**/*.pptx')  # recursively match every .pptx
pages = loader.load_and_split()  # one Document per text chunk
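It can help to inspect what was extracted before indexing. A quick peek, assuming at least one slide produced text:
print(len(pages), "chunks loaded")
print(pages[0].page_content[:200])  # first 200 characters of the first chunk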
4. Create and Load FAISS Vectorstore
Create a FAISS vectorstore from the loaded pages and save it locally; on later runs, FAISS.load_local(<localFolderPath>, embeddings) restores the index without re-embedding:
vectorStore = FAISS.from_documents(pages, embeddings)  # embed the chunks and build the index
vectorStore.save_local(<localFolderPath>)
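Before wiring up the chain, you can sanity-check retrieval directly. The query string below is just an illustrative placeholder:
docs = vectorStore.similarity_search("What topics do these slides cover?", k=2)
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:100])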
5. Set Up Retriever and Conversational Retrieval Chain
Create a retriever and set up a conversational retrieval chain. The chain passes a custom qa_prompt to its document-combining step, so define that first (a minimal sketch follows):
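A minimal qa_prompt, assuming a standard PromptTemplate with context and question variables; adapt the wording to your content:
from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following context from the slides to answer the question.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)
With the prompt in place, build the retriever and chain: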
retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k": 2})
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
    verbose=False,
    combine_docs_chain_kwargs={'prompt': qa_prompt},
)
6. Query the Model
Create a function to query the model with a user’s question:
def ask_question_with_context(qa, question, chat_history=None):
    # ConversationalRetrievalChain expects both the question and the running chat history.
    try:
        result = qa({"question": question, "chat_history": chat_history or []})
        print("answer:", result["answer"])
        return result["answer"]
    except Exception as e:
        return str(e)

response = ask_question_with_context(qa, user_message)  # user_message holds the user's question
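To keep a running conversation, collect each question/answer pair and feed it back in on the next turn. A minimal loop, assuming questions come from standard input:
chat_history = []
while True:
    user_message = input("Ask a question (blank to quit): ")
    if not user_message:
        break
    answer = ask_question_with_context(qa, user_message, chat_history)
    chat_history.append((user_message, answer))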