Infrastructure Needed for Future Agents
Today, we discuss the use of generative AI, especially in text generation, and identify some pain points related to Retrieval-Augmented Generation (RAG) and AI search engines:
Why is AI support for professional work still unsatisfactory?
Are AI search engines the optimal solution to hallucinations?
What kind of infrastructure do we need to imagine a future ecosystem based on agents?
Why is AI Support for Professional Work Still Unsatisfactory?
The survey “Evaluation of Retrieval-Augmented Generation: A Survey” summarizes the key metrics used to evaluate RAG systems and the definitions of each.
In April of this year, SuperCLUE released the SuperCLUE-RAG (SC-RAG) benchmark for Chinese native retrieval-augmented generation and provided the first round of evaluation results, which indicated that only GPT-4 with Vision barely passed the test. Other third-party applications with RAG capabilities (such as Perplexity, FastGPT, Coze, Dify, etc.) were not tested.
From my recent testing of FastGPT, Coze, and Perplexity, I concluded that:
For small private models, workflows need to be broken down into finer tasks for better performance.
Decomposing workflows also allows different models to be used at different stages, reducing costs (see the sketch after this list).
The effectiveness of RAG cannot compete with very long contexts; when a model supports 1 million tokens, context should be prioritized.
The effectiveness of RAG on the enterprise side depends heavily on data governance, ingestion, and categorization (OpenAI’s acquisition of a data warehouse company targets this area).
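As a rough illustration of stage-wise model routing, here is a minimal sketch assuming the OpenAI Python client; the model names are placeholders for a cheap extraction model and a stronger synthesis model.

```python
# Minimal sketch: decompose a RAG workflow into stages and route each stage
# to a different model to control cost. Model names are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CHEAP_MODEL = "gpt-4o-mini"   # placeholder: small model for extraction
STRONG_MODEL = "gpt-4o"       # placeholder: stronger model for final synthesis

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_staged_workflow(question: str, documents: list[str]) -> str:
    # Stage 1 (cheap model): pull out only the passages relevant to the question.
    extracts = [
        ask(CHEAP_MODEL, f"Extract sentences relevant to '{question}':\n{doc}")
        for doc in documents
    ]
    # Stage 2 (strong model): synthesize a final answer from the extracts.
    context = "\n\n".join(extracts)
    return ask(STRONG_MODEL, f"Using only this context:\n{context}\n\nAnswer: {question}")
```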
Are AI Search Engines the Optimal Solution to Hallucinations?
When discussing RAG, AI search engines cannot be overlooked. With Bing integrating LLM capabilities, the narrative that LLMs could replace search engines has gained traction. However, while guiding interns recently, I noticed that AI search still lacks the ability to replace fundamental information retrieval skills.
For individuals lacking sufficient background knowledge or information discernment, AI search engines often lead them astray. Thus, I spent considerable effort teaching them basic information retrieval skills.
I emphasized that cross-validation cannot rely solely on AI, because AI search only reports what is already available online; even when a search engine returns subpar results, an LLM may still present them eloquently.
This led to two key points:
In the AI era, retrieval and analysis skills are not outdated; they are increasingly essential.
Professional knowledge retrieval should start not with general search engines but with personal or specialized knowledge bases.
General AI search engines depend heavily on underlying web search, whose results are shaped by domain authority and SEO. Specialized topics require dedicated information sources and custom workflows, which together culminate in a truly effective agent.
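To make “dedicated sources plus a custom workflow” concrete, here is a minimal sketch; the topic registry and the fetcher stubs are hypothetical placeholders for whatever specialized APIs or feeds you actually rely on.

```python
# Minimal sketch: routing a specialized query to a curated registry of sources
# instead of a general search engine. Registry and fetchers are hypothetical stubs.
from typing import Callable

def fetch_arxiv(query: str) -> list[str]:
    return []  # placeholder: call the arXiv API for papers matching the query

def fetch_github(query: str) -> list[str]:
    return []  # placeholder: search repositories or release notes

SOURCE_REGISTRY: dict[str, list[Callable[[str], list[str]]]] = {
    "ai_research": [fetch_arxiv, fetch_github],
}

def specialized_retrieve(topic: str, query: str) -> list[str]:
    results: list[str] = []
    for fetch in SOURCE_REGISTRY.get(topic, []):
        results.extend(fetch(query))  # each dedicated source is queried directly
    return results
```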
Imagine a future where individual agents may include financial agents, social agents, knowledge agents, career agents, psychological agents, health agents, etc.
Financial Agents assist in managing personal finances: income, expenditures, and overall economic status.
Social Agents record interactions and remember relationships.
Knowledge Agents help manage learning and can inspire new insights.
Career Agents aid in career planning and networking.
Psychological Agents offer private counseling.
Health Agents help manage diet, exercise, and sleep.
What Kind of Infrastructure Do We Need for a Future Agent Ecosystem?
With such a variety of specialized agents, what foundational infrastructure do they require?
Undoubtedly, they will need input mechanisms similar to human senses, along with LLM capabilities to “think” on our behalf. This represents the inevitable trend of machines supporting humanity.
To enable LLM capabilities effectively, each agent requires a comprehensive data infrastructure that can be privately deployed and customized. This implies that specialized agents rely primarily on personal knowledge bases and professional databases built from authoritative sources, rather than pulling information directly from search engines. The capabilities of this data infrastructure determine whether it can replace traditional search engines.
Personal Knowledge Base and Professional Database
These can be treated as a unified “knowledge management system,” most intuitively represented by note-taking and research tools such as EndNote, Zotero, Notion, Obsidian, Logseq, etc.
Key Functions of Zotero
Literature Collection and Import: Importing literature manually or through connectors.
Literature Organization and Management: Classification, tagging, and advanced retrieval.
Literature Citation and Formatting: Batch exporting references.
Synchronization and Backup: Utilizing Zotero’s cloud services.
Collaboration Features: Collaborative repositories.
Advanced Functions: API, RSS subscription, and plugin ecosystem.
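The API mentioned above makes it straightforward to pull a library programmatically. A minimal sketch using Zotero's web API follows; the user ID and API key are placeholders you generate in your Zotero account settings.

```python
# Minimal sketch: pulling recent items from a Zotero library through its web API.
# ZOTERO_USER_ID and ZOTERO_API_KEY are placeholders from your Zotero account.
import requests

ZOTERO_USER_ID = "1234567"        # placeholder
ZOTERO_API_KEY = "your-api-key"   # placeholder

def fetch_recent_items(limit: int = 10) -> list[dict]:
    url = f"https://api.zotero.org/users/{ZOTERO_USER_ID}/items/top"
    resp = requests.get(
        url,
        headers={"Zotero-API-Key": ZOTERO_API_KEY},
        params={"limit": limit, "sort": "dateAdded", "direction": "desc"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # each item carries metadata (title, creators, tags, ...)

for item in fetch_recent_items():
    print(item["data"].get("title", "(untitled)"))
```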
Key Functions of Obsidian
Local Storage: Complete local data storage.
Bidirectional Linking and Knowledge Graph: Creating connections between notes and visualizing them as a graph (see the sketch after this list).
Note Organization and Management: Organizing notes into folders within a local vault.
Plugin and Theme Extensions: Customizable extensions and themes.
Powerful Editor: Markdown editing with live preview.
Multi-window Support: Column views and multi-window capabilities.
Tagging and Search Features: Efficient tagging and searching functionalities.
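Because Obsidian stores notes as plain Markdown files and encodes connections as [[wikilinks]], an agent can inspect a vault directly. A minimal sketch follows; the vault path is a placeholder.

```python
# Minimal sketch: building a simple link graph from a local Obsidian vault.
# Notes are plain Markdown files; [[wikilinks]] encode the connections.
import re
from pathlib import Path

VAULT = Path("~/ObsidianVault").expanduser()   # placeholder path to your vault
WIKILINK = re.compile(r"\[\[([^\]|#]+)")       # captures the target of [[Note Name]]

def build_link_graph(vault: Path) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for note in vault.rglob("*.md"):
        targets = WIKILINK.findall(note.read_text(encoding="utf-8"))
        graph[note.stem] = {t.strip() for t in targets}
    return graph

graph = build_link_graph(VAULT)
print(f"{len(graph)} notes, {sum(len(v) for v in graph.values())} outgoing links")
```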
Envisioning a Smart Agent Integrated with Zotero and Obsidian
If Zotero (main input) and Obsidian (main output) were enhanced with agent capabilities, what could we achieve?
Consider an information source connected to Zotero; previously, this relied on human effort. For example, when conducting research, one would batch search for papers and relevant data. But could a Search Agent assist in finding related information from specialized sources (prioritizing non-search engine inputs) and rank this information based on metadata and research preferences?
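One way such a Search Agent could rank what it finds is by scoring each candidate's metadata against stated research preferences. A minimal sketch; the fields, weights, and preferred venues below are illustrative assumptions, not a fixed scheme.

```python
# Minimal sketch: scoring candidate papers by metadata against research preferences.
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    venue: str
    year: int
    cited_by: int

PREFERRED_VENUES = {"NeurIPS", "ACL", "EMNLP"}   # placeholder preferences

def preference_score(c: Candidate, keywords: list[str]) -> float:
    score = 0.0
    score += 2.0 if c.venue in PREFERRED_VENUES else 0.0
    score += 1.0 if c.year >= 2023 else 0.0        # favour recent work
    score += min(c.cited_by, 100) / 100.0          # mild citation boost
    score += sum(kw.lower() in c.title.lower() for kw in keywords)
    return score

def rank(candidates: list[Candidate], keywords: list[str]) -> list[Candidate]:
    return sorted(candidates, key=lambda c: preference_score(c, keywords), reverse=True)
```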
When organizing content in Obsidian, could tasks like categorization and tagging be assigned to the agent? Once a substantial amount of data accumulates, the agent could generate insights or knowledge graphs.
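Auto-tagging is probably the easiest of these tasks to delegate today. A minimal sketch, assuming the OpenAI Python client and a placeholder model name; the tag line written back into the note is a simplification of Obsidian's frontmatter.

```python
# Minimal sketch: asking an LLM to propose tags for an Obsidian note and writing
# them back at the top of the file. Model name and tag format are simplifications.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def suggest_tags(note_text: str, max_tags: int = 5) -> list[str]:
    prompt = (
        f"Suggest up to {max_tags} short topic tags for this note, "
        f"comma-separated, with no explanations:\n\n{note_text[:4000]}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return [t.strip() for t in resp.choices[0].message.content.split(",") if t.strip()]

def tag_note(path: Path) -> None:
    text = path.read_text(encoding="utf-8")
    tags = suggest_tags(text)
    path.write_text(f"tags: {', '.join(tags)}\n\n{text}", encoding="utf-8")
```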
Thus, a knowledge system designed to support diverse agents must offer:
Input Mechanism: Access to multimodal information from various sources, including the web, files, databases, and user inputs.
AI Information Toolbox: A persistent monitoring and retrieval system for crucial information sources via the knowledge management system.
RAG Module: A reliable RAG module with search mechanisms, self-evolution capabilities, and LLM-enhanced semantic capabilities (see the sketch after this list).
Additional Intelligent Modules: Smart features like personalized learning assistants, knowledge summaries, and intelligent reporting.
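To ground the RAG module, here is a minimal retrieval-then-answer sketch, assuming the OpenAI Python client for both embeddings and generation; the model names are placeholders, and the cosine-similarity search stands in for a proper vector store.

```python
# Minimal sketch of a RAG module: embed knowledge-base chunks, retrieve the
# closest ones for a query, and answer with that context. Model names are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 3) -> list[str]:
    q = embed([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def answer(query: str, chunks: list[str]) -> str:
    vecs = embed(chunks)
    context = "\n\n".join(retrieve(query, chunks, vecs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return resp.choices[0].message.content
```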
At a foundational level, one must establish a complete data warehouse solution, encompassing data collection, cleaning, processing, governance, as well as analysis and reporting functionalities.
In the classic paper “Overview of Big Data Systems and Analytical Techniques,” we see the complexity of a big data system, which includes various processing methods like batch processing, stream processing, interactive processing, and graph processing.
I do not want to overcomplicate things: building a full-scale big data system for individual use would be excessive. So let us instead assume the scenario of an agent-based version of Zotero.
What Could an Agent’s Query Language Be?
If SQL is the query language for databases and natural language serves for human queries, what might be the query language for agents?
You might ask: Why not simply let an LLM respond without external data? If LLM updates become cost-effective enough for daily re-training on full datasets, does external data still matter?
Since the cost of retraining LLMs is not falling that quickly, placing professional materials into the context via prompts remains the practical way to enhance LLM outputs. Thus, RAG will likely remain relevant for a significant time and will need iterative enhancements.
An agent’s query language may be a chain of thought or workflow — something self-explanatory or customizable, where instructions generate insights or more specialized workflows that agents can handle.
For instance, consider a venture capital professional seeking the latest funding insights, having a dialogue with the agent:
Q: Which startup has secured over ten million in funding recently?
A: Lists several startups…
Q: What round is Company X in? Which firms have invested?
A: C round, with participation from…
Q: What is the founder’s background?
A: Founder with experience from XXX…
Q: Compile this into a current investment report.
A: Outputs a structured Word document.
Q: Well done, consolidate all steps into a repeatable workflow.
A: Okay, I’ve…
Such interactions are increasingly realized in platforms like Perplexity or Mita AI, though the final step of workflow encapsulation remains absent. Ultimately, the quality of the investment report heavily relies on the initial data provided by the agent.
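That missing last step, encapsulating the dialogue into a repeatable workflow, could be as simple as an ordered list of prompt templates whose outputs feed the next step. A minimal sketch, assuming the OpenAI Python client; the step names and templates are illustrative.

```python
# Minimal sketch: the investment-research dialogue captured as a repeatable workflow.
# Each step is a prompt template; earlier outputs are substituted into later prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

WORKFLOW = [
    ("find_deals",      "List startups related to {topic} that recently raised over ten million."),
    ("funding_details", "For each startup below, state the round and the investors:\n{find_deals}"),
    ("founder_profile", "Summarize the founders' backgrounds for:\n{funding_details}"),
    ("report",          "Compile the following into an investment report:\n{find_deals}\n{funding_details}\n{founder_profile}"),
]

def run_step(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def run_workflow(topic: str) -> dict[str, str]:
    outputs = {"topic": topic}
    for name, template in WORKFLOW:
        outputs[name] = run_step(template.format(**outputs))
    return outputs
```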
Perhaps We Can “Turn Websites into APIs”?
Imagine converting your entire collection of websites into an automated service that collects and localizes relevant information, possibly transforming it into API services. Continuously gathering high-quality data from various sources can establish excellent information repositories across different fields.
Many high-quality information sources exist, potentially in your bookmarks or shared by expert analysts. If you only need the latest AI information, prioritize sources like arXiv, GitHub, Hugging Face, and leading companies’ blogs.
When your AI information agent needs to compile daily summaries, it must pull from these primary sources. For example, the AI News email service has already made strides in this area, providing a recap of trending AI topics through various platforms like Twitter, Reddit, Discord, etc.
In a programming context, I prefer frameworks like DSPy, which has recently outlined its roadmap on GitHub.
“Turning websites into APIs” would let you receive near-real-time updates from web pages. Even leaving historical data storage aside for now, this should be a baseline requirement for building a personal knowledge agent.
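A minimal sketch of this idea: poll a few curated feeds and keep the latest entries as local JSON that an agent can query like a small API. The feed URLs are examples; swap in your own bookmarked sources.

```python
# Minimal sketch: polling curated feeds and storing the latest entries locally,
# so a knowledge agent can query them like a small API. Feed URLs are examples.
import json
from pathlib import Path

import feedparser  # pip install feedparser

FEEDS = {
    "arxiv_cs_cl": "http://export.arxiv.org/rss/cs.CL",
    "hf_blog": "https://huggingface.co/blog/feed.xml",
}
STORE = Path("feed_store.json")

def refresh() -> dict[str, list[dict]]:
    store = {}
    for name, url in FEEDS.items():
        parsed = feedparser.parse(url)
        store[name] = [
            {"title": e.get("title", ""), "link": e.get("link", "")}
            for e in parsed.entries[:20]
        ]
    STORE.write_text(json.dumps(store, indent=2), encoding="utf-8")
    return store

def latest(source: str, n: int = 5) -> list[dict]:
    store = json.loads(STORE.read_text(encoding="utf-8"))
    return store.get(source, [])[:n]
```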