In AI and machine learning, embeddings have become an increasingly important way to understand and analyze text data. In this blog, we will explore how to use Azure OpenAI embeddings with the Python SDK to analyze PowerPoint files (.pptx) and query them efficiently using a FAISS (Facebook AI Similarity Search) vectorstore.
Prerequisites
Before starting, make sure you have the following:
- An active Azure subscription.
- Azure OpenAI resource created in your Azure portal.
- API Key and Endpoint URL from Azure OpenAI.
- Python 3.8+ installed (langchain requires at least Python 3.8).
- Required Python libraries: langchain, openai, faiss-cpu, unstructured, and python-pptx (installed in the next section).
Setting Up Azure OpenAI
1. Create Azure OpenAI Resource
- Go to the Azure portal.
- Click on “Create a resource” and search for “Azure OpenAI”.
- Follow the on-screen instructions to create the resource.
2. Get the API Key and Endpoint
- Navigate to your Azure OpenAI resource.
- Under the “Keys and Endpoint” section, copy the API key and endpoint URL.
Integrating Azure OpenAI with Python SDK
1. Install Required Libraries
Open your terminal and install the required libraries using pip:
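A minimal install covering everything used below; the exact package set (langchain, openai, faiss-cpu, plus unstructured and python-pptx for PowerPoint parsing) is an assumption based on the loaders and vectorstore this walkthrough uses:
pip install langchain openai faiss-cpu unstructured python-pptx
Then import the required modules in your Python script: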
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import DirectoryLoader
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
2. Set Up Azure OpenAI Client and Embeddings
Initialize the Azure OpenAI client and embeddings:
# The angle-bracket values are placeholders for your own deployment names, key, and endpoint.
llm = AzureOpenAI(
    engine=<engine>,  # name of your GPT-4 deployment in Azure
    model_name="gpt-4",
    openai_api_key=<openai_api_key>,
    openai_api_base=<openai_api_base>,  # endpoint URL from "Keys and Endpoint"
    model_kwargs={
        "api_type": "azure",
        "api_version": "2023-03-15-preview",
    },
)
embeddings = OpenAIEmbeddings(
    deployment=<deployment>,  # name of your embeddings deployment in Azure
    model='text-embedding-ada-002',
    openai_api_key=<openai_api_key>,
    openai_api_base=<openai_api_base>,
    openai_api_type='azure',
    chunk_size=1,  # Azure processes embedding requests one input at a time
)
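As a quick sanity check that your key, endpoint, and deployment names are wired up correctly, you can embed a short string; text-embedding-ada-002 returns 1536-dimensional vectors:
vector = embeddings.embed_query("hello world")
print(len(vector))  # expect 1536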
3. Load and Split PowerPoint Files
Use DirectoryLoader to load and split PowerPoint files into text chunks (under the hood, .pptx parsing is handled by the unstructured package installed earlier):
loader = DirectoryLoader(<localFolderPath>, glob='**/*.pptx')  # recursively match every .pptx
pages = loader.load_and_split()  # one Document per text chunk
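It can help to inspect what was extracted before indexing. A quick peek, assuming at least one slide produced text:
print(len(pages), "chunks loaded")
print(pages[0].page_content[:200])  # first 200 characters of the first chunk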
4. Create and Load FAISS Vectorstore
Create a FAISS vectorstore from the loaded pages and save it locally; on later runs, FAISS.load_local(<localFolderPath>, embeddings) restores the index without re-embedding:
vectorStore = FAISS.from_documents(pages, embeddings)  # embed the chunks and build the index
vectorStore.save_local(<localFolderPath>)
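Before wiring up the chain, you can sanity-check retrieval directly. The query string below is just an illustrative placeholder:
docs = vectorStore.similarity_search("What topics do these slides cover?", k=2)
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:100])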
5. Set Up Retriever and Conversational Retrieval Chain
Create a retriever and set up a conversational retrieval chain. The chain passes a custom qa_prompt to its document-combining step, so define that first (a minimal sketch follows):
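A minimal qa_prompt, assuming a standard PromptTemplate with context and question variables; adapt the wording to your content:
from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the following context from the slides to answer the question.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)
With the prompt in place, build the retriever and chain: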
retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k": 2})
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
    verbose=False,
    combine_docs_chain_kwargs={'prompt': qa_prompt},
)
6. Query the Model
Create a function to query the model with a user’s question:
def ask_question_with_context(qa, question, chat_history=None):
    # ConversationalRetrievalChain expects both the question and the running chat history.
    try:
        result = qa({"question": question, "chat_history": chat_history or []})
        print("answer:", result["answer"])
        return result["answer"]
    except Exception as e:
        return str(e)

response = ask_question_with_context(qa, user_message)  # user_message holds the user's question
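To keep a running conversation, collect each question/answer pair and feed it back in on the next turn. A minimal loop, assuming questions come from standard input:
chat_history = []
while True:
    user_message = input("Ask a question (blank to quit): ")
    if not user_message:
        break
    answer = ask_question_with_context(qa, user_message, chat_history)
    chat_history.append((user_message, answer))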