{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "You can download the `requirements.txt` for this course from the workspace of this lab. `File --> Open...`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# L2: Create Agents to Research and Write an Article\n", "\n", "In this lesson, you will be introduced to the foundational concepts of multi-agent systems and get an overview of the crewAI framework." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The libraries are already installed in the classroom. If you're running this notebook on your own machine, you can install the following:\n", "```Python\n", "!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "height": 64 }, "outputs": [], "source": [ "# Warning control\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Import from the crewAI libray." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "height": 166 }, "outputs": [], "source": [ "from crewai import Agent, Task, Crew\n", "from crewai_tools import (\n", " #DirectoryReadTool,\n", " #FileReadTool,\n", " #SerperDevTool,\n", " #WebsiteSearchTool,\n", " #DOCXSearchTool,\n", " #RagTool,\n", " TXTSearchTool\n", ")\n", "from datetime import datetime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- As a LLM for your agents, you'll be using OpenAI's `gpt-3.5-turbo`.\n", "\n", "**Optional Note:** crewAI also allow other popular models to be used as a LLM for your Agents. You can see some of the examples at the [bottom of the notebook](#1)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "height": 149 }, "outputs": [], "source": [ "import os\n", "from utils import get_openai_api_key\n", "\n", "# Set up API keys\n", "\n", "openai_api_key = get_openai_api_key()\n", "\n", "os.environ[\"OPENAI_MODEL_NAME\"] = 'gpt-3.5-turbo'\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-8nKBOgxInYCIqidcdSu7T3BlbkFJWac8qZbpFOE2TSn0OpId\"\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "height": 132 }, "outputs": [], "source": [ "def call_crew_kickoff(str_current_datetime):\n", " # Instantiate tools\n", " #meeting_trans_docs_tool = DirectoryReadTool(directory='./meeting-transcription')\n", " #brd_temp_docs_tool = DirectoryReadTool(directory='./brd-template')\n", " #file_tool = FileReadTool()\n", " #web_rag_tool = WebsiteSearchTool()\n", " #docx_search_tool = DOCXSearchTool()\n", " #rag_tool = RagTool()\n", " mt_tool = TXTSearchTool(txt='./meeting-transcription/meeting-transcript_' + str_current_datetime + '.txt')\n", " brd_tool = TXTSearchTool(txt='./brd-template/brd-template.txt')\n", "\n", " text_parsing_agent = Agent(\n", " role=\"Text Interpreter\",\n", " goal=\"Parse and interpret the raw text input, structuring it into manageable sections\"\n", " \"or data points that are relevant for analysis and processing.\",\n", " backstory=\"You excel at deciphering complex textual data. You act as the first line of analysis,\"\n", " \"turning unstructured text into organized segments. 
You should enhance efficiency in data \"\n", " \"handling and support subsequent stages of data processing.\",\n", " tools=[mt_tool],\n", " allow_delegation=True,\n", " verbose=True\n", " )\n", "\n", " data_extraction_agent = Agent(\n", " role=\"Data Extraction Agent\",\n", " goal=\"Identify and extract essential data points, statistics, \"\n", " \"and specific information from the parsed text that are crucial \"\n", " \"for drafting a Business Requirements Document.\",\n", " backstory=\"You should tackle the challenge of sifting through detailed textual data to \"\n", " \"find relevant information. You work with precision and speed, equipped \"\n", " \"with capabilities to recognize and categorize data efficiently, making you invaluable \"\n", " \"for projects requiring quick turnaround and accurate data handling.\",\n", " tools=[mt_tool, brd_tool],\n", " allow_delegation=True,\n", " verbose=True\n", " )\n", "\n", " brd_compiler_agent = Agent(\n", " role=\"BRD Compiler\",\n", " goal=\"Assemble the extracted data into a well-structured Business Requirements Document, \"\n", " \"ensuring that it is clear, coherent, and formatted according to standards.\",\n", " backstory=\"You are a meticulous Business Requirement Document compiler. You alleviate \"\n", " \"the burdens of manual document assembly. Ensure that all documents are crafted with \"\n", " \"precision, adhering to organizational standards, and ready for stakeholder review. You \"\n", " \"should automate routine documentation tasks, thus allowing human team members to focus \"\n", " \"on more strategic activities.\",\n", " tools=[brd_tool],\n", " allow_delegation=True,\n", " verbose=True\n", " )\n", "\n", " text_parsing = Task(\n", " description=(\n", " \"1. Open and read the contents of the input text file.\\n\"\n", " \"2. Analyze the document structure to identify headings, subheadings, and key paragraphs.\\n\"\n", " \"3. 
Extract text under each identified section, ensuring context is preserved.\\n\"\n", " \"4. Format the extracted text into a JSON structure with labels indicating the type \"\n", " \"of content (e.g., heading, detail).\"\n", " ),\n", " expected_output=\"Structured JSON object containing separated sections of \"\n", " \"text with labels based on their content type.\",\n", " agent=text_parsing_agent,\n", " )\n", "\n", " data_extraction = Task(\n", " description=(\n", " \"1. Take the JSON structured data from the Text Parsing Agent.\\n\"\n", " \"2. Identify and extract specific data points like project goals, technical requirements, \"\n", " \"and stakeholder information.\\n\"\n", " \"3. Organize the extracted data into relevant categories for easy access and use.\\n\"\n", " \"4. Format all extracted data into a structured form suitable for document generation, \"\n", " \"ensuring it's ready for template insertion.\\n\"\n", " ),\n", " expected_output=\"A comprehensive list of key data points organized by category, ready for use in document generation.\",\n", " agent=data_extraction_agent,\n", " )\n", "\n", " compile_brd = Task(\n", " description=(\n", " \"1. Accept the structured and categorized data from the Data Extraction Agent.\\n\"\n", " \"2. Open and read the BRD template for data insertion.\\n\"\n", " \"3. Insert the received data into the respective sections of the BRD template.\\n\"\n", " \"4. Apply formatting rules to ensure the document is professional and adheres to standards.\\n\"\n", " \"5. 
Save the populated and formatted document as a new markdown file, marking the task as complete.\\n\"\n", " ),\n", " expected_output=\"A complete Business Requirements Document in markdown format, ready for review and distribution.\",\n", " agent=brd_compiler_agent,\n", " output_file='generated-brd/brd_' + str_current_datetime + '.md', # The generated BRD will be saved here\n", " )\n", "\n", " crew = Crew(\n", " agents=[text_parsing_agent, data_extraction_agent, brd_compiler_agent],\n", " tasks=[text_parsing, data_extraction, compile_brd],\n", " verbose=2\n", " )\n", "\n", " result = crew.kickoff(inputs={'datetime': str_current_datetime})\n", "\n", " from IPython.display import Markdown\n", " Markdown(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Agents\n", "\n", "- Define your Agents, and provide them a `role`, `goal` and `backstory`.\n", "- LLMs have been observed to perform better when they are role-playing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Agent: Planner\n", "\n", "**Note**: The benefit of using _multiple strings_:\n", "```Python\n", "varname = (\"line 1 of text\"\n", " \"line 2 of text\")\n", "```\n", "\n", "versus the _triple quote docstring_:\n", "```Python\n", "varname = \"\"\"line 1 of text\n", " line 2 of text\n", " \"\"\"\n", "```\n", "is that it avoids adding extra whitespace and newline characters, making the text better formatted to be passed to the LLM."
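,
"\n",
"As a quick illustration of the difference (hypothetical strings, not part of the lesson's agents):\n",
"```Python\n",
"joined = (\"line 1 of text \"\n",
"          \"line 2 of text\")\n",
"multiline = \"\"\"line 1 of text\n",
"    line 2 of text\"\"\"\n",
"print(repr(joined))     # 'line 1 of text line 2 of text'\n",
"print(repr(multiline))  # 'line 1 of text\\n    line 2 of text'\n",
"```"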
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Agent: Writer" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "height": 217 }, "outputs": [], "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Agent: Editor" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "height": 251 }, "outputs": [], "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Tasks\n", "\n", "- Define your Tasks, and provide them a `description`, `expected_output` and `agent`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task: Plan" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "height": 217 }, "outputs": [], "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task: Write" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "height": 234 }, "outputs": [ { "data": { "text/plain": [ "' data_extraction = Task(\\n description=(\\n \"1. Take the JSON structured data from the Text Parsing Agent.\\n\"\\n \"2. Identify and extract specific data points like project goals, technical requirements,\"\\n \"and stakeholder information.\\n\"\\n \"3. Organize the extracted data into relevant categories for easy access and use.\\n\"\\n \"4. Format all extracted data into a structured form suitable for document generation,\"\\n \"ensuring it\\'s ready for template insertion.\\n\"\\n ),\\n expected_output=\"A comprehensive list of key data points organized by category, ready for use in document generation.\",\\n agent=data_extraction_agent,\\n )\\n'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' data_extraction = Task(\n", " description=(\n", " \"1. Take the JSON structured data from the Text Parsing Agent.\\n\"\n", " \"2. Identify and extract specific data points like project goals, technical requirements,\"\n", " \"and stakeholder information.\\n\"\n", " \"3. 
Organize the extracted data into relevant categories for easy access and use.\\n\"\n", " \"4. Format all extracted data into a structured form suitable for document generation,\"\n", " \"ensuring it's ready for template insertion.\\n\"\n", " ),\n", " expected_output=\"A comprehensive list of key data points organized by category, ready for use in document generation.\",\n", " agent=data_extraction_agent,\n", " )\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task: Edit" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "height": 200 }, "outputs": [ { "data": { "text/plain": [ "' compile_brd = Task(\\n description=(\\n \"1. Accept the structured and categorized data from the Data Extraction Agent.\\n\"\\n \"2. Open and read the BRD template for data insertion.\\n\"\\n \"3. Insert the received data into the respective sections of the BRD template.\\n\"\\n \"4. Apply formatting rules to ensure the document is professional and adheres to standards.\\n\"\\n \"5. Save the populated and formatted document as a new markdown file, marking the task as complete.\\n\"\\n ),\\n expected_output=\"A complete Business Requirements Document in markdown format, ready for review and distribution.\",\\n agent=brd_compiler_agent,\\n output_file=\\'generated-brd/brd_{datetime}.md\\', # The final blog post will be saved here\\n )\\n'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' compile_brd = Task(\n", " description=(\n", " \"1. Accept the structured and categorized data from the Data Extraction Agent.\\n\"\n", " \"2. Open and read the BRD template for data insertion.\\n\"\n", " \"3. Insert the received data into the respective sections of the BRD template.\\n\"\n", " \"4. Apply formatting rules to ensure the document is professional and adheres to standards.\\n\"\n", " \"5. 
Save the populated and formatted document as a new markdown file, marking the task as complete.\\n\"\\n ),\\n expected_output=\"A complete Business Requirements Document in markdown format, ready for review and distribution.\",\\n agent=brd_compiler_agent,\\n output_file=\\'generated-brd/brd_{datetime}.md\\', # The generated BRD will be saved here\\n )\\n'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' compile_brd = Task(\n", " description=(\n", " \"1. Accept the structured and categorized data from the Data Extraction Agent.\\n\"\n", " \"2. Open and read the BRD template for data insertion.\\n\"\n", " \"3. Insert the received data into the respective sections of the BRD template.\\n\"\n", " \"4. Apply formatting rules to ensure the document is professional and adheres to standards.\\n\"\n", " \"5. Save the populated and formatted document as a new markdown file, marking the task as complete.\\n\"\n", " ),\n", " expected_output=\"A complete Business Requirements Document in markdown format, ready for review and distribution.\",\n", " agent=brd_compiler_agent,\n", " output_file='generated-brd/brd_{datetime}.md', # The generated BRD will be saved here\n", " )\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the Crew\n", "\n", "- Create your crew of Agents.\n", "- Pass the tasks to be performed by those agents.\n", " - **Note**: *For this simple example*, the tasks will be performed sequentially (i.e., they are dependent on each other), so the _order_ of the tasks in the list _matters_.\n", "- `verbose=2` allows you to see all the logs of the execution. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "height": 98 }, "outputs": [ { "data": { "text/plain": [ "' crew = Crew(\\n agents=[text_parsing_agent, data_extraction_agent, brd_compiler_agent],\\n tasks=[text_parsing, data_extraction, compile_brd],\\n verbose=2\\n )\\n'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' crew = Crew(\n", " agents=[text_parsing_agent, data_extraction_agent, brd_compiler_agent],\n", " tasks=[text_parsing, data_extraction, compile_brd],\n", " verbose=2\n", " )\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running the Crew" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: LLMs can provide different outputs for the same input, so what you get might be different from what you see in the video."
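,
"\n",
"If you want runs that are easier to compare, one common approach is to give each agent a low-temperature model. This is a sketch, not the lesson's code; it assumes your crewAI version accepts an `llm` argument on `Agent` and that `langchain_openai` is installed:\n",
"```Python\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"steady_llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"agent = Agent(role=\"Text Interpreter\", goal=\"your goal\", backstory=\"your backstory\", llm=steady_llm)\n",
"```\n",
"Even at temperature 0, outputs are not guaranteed to be identical across runs."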
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "height": 30 }, "outputs": [ { "data": { "text/plain": [ "\"\\n result = crew.kickoff(inputs={'datetime': str_current_datetime})\\n\\n from IPython.display import Markdown\\n Markdown(result)\\n\"" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'''\n", " result = crew.kickoff(inputs={'datetime': str_current_datetime})\n", "\n", " from IPython.display import Markdown\n", " Markdown(result)\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Display the results of your execution as markdown in the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "height": 47 }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 9, "metadata": { "height": 30 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.11.9\n" ] } ], "source": [ "!python --version" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import gradio as gr" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def process_file(input_file):\n", " current_datetime = datetime.now().strftime(\"%Y-%m-%d %H-%M-%S\")\n", " print(\"Current date & time : \", current_datetime)\n", " \n", " # convert datetime obj to string\n", " str_current_datetime = str(current_datetime)\n", "\n", " fh = open(input_file, 'rb')\n", " #data = fh.read()\n", "\n", " # Ensure the target directory exists\n", " output_dir = \"meeting-transcription\"\n", " if not os.path.exists(output_dir):\n", " os.makedirs(output_dir)\n", " \n", " # Save the uploaded file to the specified folder with the specified name\n", " input_filepath = os.path.join(output_dir, \"meeting-transcript_\" + str_current_datetime + \".txt\")\n", " with open(input_filepath, \"wb\") as file:\n", " file.write(fh.read())\n", " fh.close()\n", "\n", " call_crew_kickoff(str_current_datetime)\n", 
"\n", " # Example of processing: adding Markdown formatting. Replace this with your actual processing.\n", " #processed_text = \"# Processed Output\\n\\n\" + \"\\n\".join(f\"- {line}\" for line in text.splitlines())\n", " \n", " output_filename = \"generated-brd/brd_\" + str_current_datetime + \".md\"\n", " #with open(output_filename, \"w\") as file:\n", " # file.write(processed_text)\n", "\n", " # Read the contents of the generated Markdown file\n", " with open(output_filename, \"r\") as md_file:\n", " markdown_content = md_file.read()\n", "\n", " # Return both the filename for download and the Markdown content for display\n", " return output_filename, markdown_content" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on local URL: http://127.0.0.1:7860\n", "Running on public URL: https://555181993525c76e6c.gradio.live\n", "\n", "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "