asyncio.run() cannot be called from a running event loop #179
Comments
Hey there, yes. We will also include other web drivers, like the one provided by Selenium.
Hi, @PeriniM. Thanks for the explanation of this error. But what happens if I don't use it? I tried to run this code in Google Colab and got the same error. Here's my code (I actually copied it from one of your scripts 👍):

```python
"""
Basic example of scraping pipeline using SmartScraper
"""
import os
from dotenv import load_dotenv
from scrapegraphai.utils import prettify_exec_info
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

from google.colab import userdata
gemini_key = userdata.get('Gemini_api_key')  # To access my Gemini API key in the Colab environment

# ************************************************
# Define the configuration for the graph
# ************************************************
graph_config = {
    "llm": {
        "api_key": gemini_key,
        "model": "gemini-pro",
    },
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the news with their description.",
    # also accepts a string with the already downloaded HTML code
    source="https://www.wired.com",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```

Can you explain why? Thank you.

EDIT: Here is the full error:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-9f47dddcb03f> in <cell line: 36>()
     34 )
     35
---> 36 result = smart_scraper_graph.run()
     37 print(result)
     38

5 frames

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    107
    108         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 109         self.final_state, self.execution_info = self.graph.execute(inputs)
    110
    111         return self.final_state.get("answer", "No answer found.")

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    105
    106             with get_openai_callback() as cb:
--> 107                 result = current_node.execute(state)
    108             node_exec_time = time.time() - curr_time
    109             total_exec_time += node_exec_time

/usr/local/lib/python3.10/dist-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
     86         )
     87
---> 88         document = loader.load()
     89         compressed_document = [
     90             Document(page_content=remover(str(document[0].page_content)))]

/usr/local/lib/python3.10/dist-packages/langchain_core/document_loaders/base.py in load(self)
     27     def load(self) -> List[Document]:
     28         """Load data into Document objects."""
---> 29         return list(self.lazy_load())
     30
     31     async def aload(self) -> List[Document]:

/usr/local/lib/python3.10/dist-packages/langchain_community/document_loaders/chromium.py in lazy_load(self)
     74         """
     75         for url in self.urls:
---> 76             html_content = asyncio.run(self.ascrape_playwright(url))
     77             metadata = {"source": url}
     78             yield Document(page_content=html_content, metadata=metadata)

/usr/lib/python3.10/asyncio/runners.py in run(main, debug)
     31     """
     32     if events._get_running_loop() is not None:
---> 33         raise RuntimeError(
     34             "asyncio.run() cannot be called from a running event loop")
     35

RuntimeError: asyncio.run() cannot be called from a running event loop
```
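For context on the traceback above: Colab and Jupyter notebooks already run an asyncio event loop, and `asyncio.run()` refuses to start a second loop from inside a running one, which is exactly what langchain's Chromium loader attempts here. A minimal, self-contained reproduction of the error (independent of scrapegraphai):

```python
import asyncio

async def demo() -> str:
    # A loop is already running inside this coroutine,
    # so a nested asyncio.run() must fail.
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"

print(asyncio.run(demo()))  # asyncio.run() cannot be called from a running event loop
```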
I have the same problem.
Please update to the new version.
Hello @VinciGit00, I just installed it and I'm getting the same error. I'm running the example from the website:

```python
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()
openai_key = os.getenv("OPENAI_APIKEY")

graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "gpt-3.5-turbo",
    },
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description.",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects/",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
```
Hey, I ran into something similar while trying to wrap the smart scraper graph with some FastAPI endpoints. What worked for me was to wrap the whole thing with `run_in_threadpool` from `starlette.concurrency` (running version 1.2.3).
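A sketch of that `run_in_threadpool` approach (the endpoint name, parameters, and config are illustrative assumptions, not from the original comment; assumes fastapi and scrapegraphai are installed):

```python
# Sketch only: run the blocking graph.run() in a worker thread so it
# does not block FastAPI's event loop. Endpoint shape is an assumption.
from fastapi import FastAPI
from starlette.concurrency import run_in_threadpool
from scrapegraphai.graphs import SmartScraperGraph

app = FastAPI()

@app.get("/scrape")
async def scrape(url: str, prompt: str):
    graph = SmartScraperGraph(
        prompt=prompt,
        source=url,
        config={"llm": {"api_key": "...", "model": "gpt-3.5-turbo"}},  # your config
    )
    # run_in_threadpool executes the synchronous call in a thread and awaits it
    result = await run_in_threadpool(graph.run)
    return {"result": result}
```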
Please give the example code.
I have the same error, even after adding the following: `import nest_asyncio`. After that I get a new error: `Exception: Connection closed while reading from the driver.`
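Worth noting: importing nest_asyncio alone has no effect; the patch is only installed by calling `nest_asyncio.apply()`. A minimal sketch (assumes the nest_asyncio package is installed; the follow-on driver error is a separate issue):

```python
import asyncio

import nest_asyncio

nest_asyncio.apply()  # patches asyncio so event loops can be re-entered

async def outer():
    # Without the patch, this nested call would raise
    # "asyncio.run() cannot be called from a running event loop".
    return asyncio.run(asyncio.sleep(0, result="ok"))

print(asyncio.run(outer()))
```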
Encountering this issue too, while trying to run the graph from an async function (in my case a NATS event handler), I found the following workaround. It executes the blocking call on another thread via an executor, while awaiting the result in the current event loop (note: `run_in_executor` only takes positional arguments, so keyword arguments are bound with `functools.partial`):

```python
import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

async def run_blocking_code_in_thread(blocking_func, *args, **kwargs):
    loop = asyncio.get_event_loop()
    # run_in_executor() does not accept keyword arguments, so bind them first
    return await loop.run_in_executor(
        executor, functools.partial(blocking_func, *args, **kwargs)
    )

async def your_async_method():
    smart_scraper_graph = SmartScraperGraph(
        prompt=...,
        source=...,
        config=...
    )
    result = await run_blocking_code_in_thread(smart_scraper_graph.run)
```

I'm not sure if there are any downsides to this approach, as I am fairly new to working with Python event loops. Looking forward to built-in support.
This answer solved my problem. |
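As a side note, on Python 3.9+ the standard library offers `asyncio.to_thread`, which does essentially the same thing as the executor workaround above without managing a `ThreadPoolExecutor` by hand. A minimal sketch with a placeholder blocking function standing in for `smart_scraper_graph.run`:

```python
import asyncio
import time

def blocking_scrape() -> str:
    # Placeholder for a synchronous, blocking call such as SmartScraperGraph.run()
    time.sleep(0.01)
    return "scraped"

async def main() -> str:
    # to_thread runs the blocking function in a worker thread and awaits the
    # result, so the event loop stays free to handle other tasks.
    return await asyncio.to_thread(blocking_scrape)

print(asyncio.run(main()))  # scraped
```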
I get this error when using this logic:

```python
from scrapegraphai.graphs import SmartScraperGraph
import json
import asyncio
from loguru import logger
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

graph_config = {
    "llm": {
        "model": "groq/llama3-8b-8192",
        "api_key": "....",
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434"
    },
    "max_results": 5,
    "format": "json"
}

async def read_urls_from_json_async(filename="urls.json"):
    """Asynchronously read URLs from a JSON file."""
    loop = asyncio.get_event_loop()
    try:
        with open(filename, 'r') as file:
            urls = await loop.run_in_executor(executor, json.load, file)
        return urls
    except FileNotFoundError:
        print(f"Error: The file {filename} was not found.")
        return []
    except json.JSONDecodeError:
        print("Error: Failed to decode JSON.")
        return []

async def run_blocking_code_in_thread(blocking_func, *args, **kwargs):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(executor, blocking_func, *args, **kwargs)

async def get_ad_async(url):
    ad_scraper = SmartScraperGraph(
        prompt="Extract all relevant data in a structured JSON.",
        source=url,
        config=graph_config
    )
    ad = await run_blocking_code_in_thread(ad_scraper.run)
    if ad:
        logger.info(json.dumps(ad, indent=4))

async def main():
    urls = await read_urls_from_json_async()
    if urls:
        tasks = [get_ad_async(url.get('url')) for url in urls]
        await asyncio.gather(*tasks)
    else:
        print("No URLs to process.")

if __name__ == '__main__':
    asyncio.run(main())
```
Please add all the code.
Updated my previous message.
Any idea?
I am having the same error as this thread when trying to execute the code with an Azure OpenAI configuration. This is my code:
Avoid using it.
Hi there,
I'm trying to get SmartScraperGraph running on FastAPI.

Config:

Error:

Any idea? Thanks!