Web Retrievers
Overview
Web retrievers query online sources at request time and return structured results through mf.Retriever.web(...). The built-in wikipedia provider fetches article content with optional summaries and images.
1. Wikipedia Search
The wikipedia retriever fetches and returns Wikipedia article content at query time. Unlike lexical retrievers, it requires no pre-indexed corpus — it queries the Wikipedia API directly and returns structured results with title, content, and optionally images.
Dependencies
Requires the wikipedia package: pip install wikipedia
Parameters
| Parameter | Default | Description |
|---|---|---|
| language | "en" | Wikipedia language code ("pt", "es", "fr", …) |
| summary | None | Number of sentences to return; None returns the full article |
| return_images | False | Whether to include image URLs in results |
| max_return_images | 5 | Maximum number of image URLs per result |
Examples
# Single query: return a two-sentence summary
import msgflux as mf

retriever = mf.Retriever.web("wikipedia", summary=2)
response = retriever("Eiffel Tower")
print(response.data[0].results[0].data.content)
# Example output:
# Eiffel Tower
#
# The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.
# It is named after the engineer Gustave Eiffel, whose company designed and built it.
# Other languages: pass a Wikipedia language code
import msgflux as mf

queries = [
    ("pt", "inteligência artificial"),
    ("es", "aprendizaje automático"),
    ("fr", "réseau de neurones"),
]
for language, query in queries:
    # One retriever per language edition
    retriever = mf.Retriever.web("wikipedia", language=language, summary=3)
    response = retriever(query)
    print(response.data[0].results[0].data.content)
# Batch queries: response.data holds one entry per query, in order
import msgflux as mf

retriever = mf.Retriever.web("wikipedia", summary=2)
queries = ["Python programming", "Rust programming language", "Go programming"]
response = retriever(queries, top_k=1)
for i, query in enumerate(queries):
    result = response.data[i].results[0]
    print(f"\n{query}: {result.data.title}")
    print(result.data.content)
# Retrieval-augmented answer: pass Wikipedia summaries to a chat model
import msgflux as mf

retriever = mf.Retriever.web("wikipedia", summary=5)
chat = mf.Model.chat_completion("openai/gpt-4.1-mini")

def answer_with_wikipedia(question: str) -> str:
    # Fetch the top two articles and join their summaries into one context
    response = retriever(question, top_k=2)
    context = "\n\n".join(
        result.data.content
        for result in response.data[0].results
    )
    return chat(messages=[{
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}",
    }]).consume()

print(answer_with_wikipedia("How does the James Webb Space Telescope work?"))
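# Async batch queries with acall
import asyncio
import msgflux as mf

retriever = mf.Retriever.web("wikipedia", summary=3)
queries = ["quantum computing", "photosynthesis", "black holes"]

async def main() -> None:
    # await is only valid inside a coroutine, so the call is wrapped in main()
    response = await retriever.acall(queries, top_k=1)
    for i, query in enumerate(queries):
        result = response.data[i].results[0]
        print(f"\n{query}: {result.data.title}")

asyncio.run(main())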
2. SerpApi Search
The serpapi retriever queries SerpApi and returns structured search results from engines such as Google. Use it when you need general web, news, image, shopping, or localized search through SerpApi.
Dependencies
Requires httpx and the SERPAPI_KEY env variable:
pip install httpx
For compatibility, SERPAPI_API_KEY and SERP_API_KEY are also accepted.
Both synchronous and async calls use direct requests to
https://serpapi.com/search.json.
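The fallback between the three variable names can be pictured as a first-match lookup. The sketch below is not the library's code: resolve_serpapi_key is a hypothetical helper, and the precedence order (SERPAPI_KEY first, then the two compatibility names) is an assumption, since the text only says that all three are accepted.

```python
import os
from typing import Mapping, Optional

# Variable names come from the text above; the lookup order is an assumption.
SERPAPI_ENV_VARS = ("SERPAPI_KEY", "SERPAPI_API_KEY", "SERP_API_KEY")


def resolve_serpapi_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty SerpApi key found in the environment."""
    env = os.environ if env is None else env
    for name in SERPAPI_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set one of: " + ", ".join(SERPAPI_ENV_VARS))


# A plain dict stands in for os.environ so the example has no side effects.
print(resolve_serpapi_key({"SERP_API_KEY": "demo-key"}))  # demo-key
```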
Parameters
| Parameter | Default | Description |
|---|---|---|
| engine | "google" | Search engine to use, such as "google", "bing", or "yahoo" |
| location | None | Location for localized results, such as "Austin,Texas" |
| gl | None | Google country code, such as "us" or "br" |
| hl | None | Google UI language, such as "en" or "pt" |
| safe | None | Safe search mode, such as "active" or "off" |
| tbm | None | Search type, such as "nws" for news or "isch" for images |
Examples
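Because the retriever sends direct requests to https://serpapi.com/search.json, it can help to see how the parameters above might end up in a query string. This is a rough sketch under stated assumptions: build_serpapi_url is a hypothetical helper, q and api_key are SerpApi's own request parameter names, and dropping options left at None is an assumption about how unset values are handled. No request is sent.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(
    query: str,
    api_key: str,
    engine: str = "google",
    **options: Optional[str],
) -> str:
    """Build a SerpApi request URL, skipping options that are None."""
    params = {"q": query, "engine": engine, "api_key": api_key}
    # Assumption: unset (None) options are omitted rather than sent empty.
    params.update({name: value for name, value in options.items() if value is not None})
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


url = build_serpapi_url(
    "coffee shops",
    "demo-key",
    location="Austin,Texas",
    gl="us",
    hl="en",
    tbm=None,  # left unset, so it does not appear in the URL
)
print(sorted(parse_qs(urlsplit(url).query)))
```

The same keyword names (engine, location, gl, hl, safe, tbm) are the ones passed to mf.Retriever.web("serpapi", ...) according to the table.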
3. Brave Search
The brave retriever queries Brave Search and can return web, news, or image results. Use it when you need search results from Brave with a single provider interface.
Dependencies
Requires brave-search-python-client and the BRAVE_SEARCH_API_KEY env variable:
pip install brave-search-python-client
Parameters
| Parameter | Default | Description |
|---|---|---|
| mode | "search" | Search mode: "search", "news", or "image" |
| return_images | False | Whether to include thumbnail image URLs for web/news results |
Examples
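The two options interact: return_images is described only for web and news results, so it has no obvious meaning in "image" mode. A small sketch of that constraint follows. BraveOptions and wants_thumbnails are hypothetical names for illustration, and treating return_images as a no-op in "image" mode is an assumption, not documented behavior.

```python
from dataclasses import dataclass

# Modes listed in the parameter table.
BRAVE_MODES = ("search", "news", "image")


@dataclass(frozen=True)
class BraveOptions:
    mode: str = "search"
    return_images: bool = False

    def __post_init__(self) -> None:
        if self.mode not in BRAVE_MODES:
            raise ValueError(f"mode must be one of {BRAVE_MODES}, got {self.mode!r}")

    @property
    def wants_thumbnails(self) -> bool:
        # Assumption: thumbnails only apply to web ("search") and news results.
        return self.return_images and self.mode in ("search", "news")


print(BraveOptions(mode="news", return_images=True).wants_thumbnails)   # True
print(BraveOptions(mode="image", return_images=True).wants_thumbnails)  # False
```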
4. Tavily Search
The tavily retriever queries Tavily and returns search results optimized for AI applications. It supports search depth, topic filters, time ranges, domain filters, generated answers, images, and raw page content.
Dependencies
Requires tavily-python and the TAVILY_API_KEY env variable:
pip install tavily-python
Parameters
| Parameter | Default | Description |
|---|---|---|
| search_depth | "basic" | Search depth: "basic" or "advanced" |
| topic | "general" | Topic category: "general", "news", or "finance" |
| time_range | None | Time range: "day", "week", "month", "year" or "d", "w", "m", "y" |
| include_domains | None | Domains to restrict search to |
| exclude_domains | None | Domains to exclude from search |
| include_answer | False | Whether Tavily should include an AI-generated answer |
| include_images | False | Whether to include image results |
| include_raw_content | False | Whether to include raw page content |
Examples
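The time_range parameter accepts both long and short spellings, and most options default to "off". The sketch below shows one way to merge user overrides with the defaults from the table and normalize the short forms. tavily_options and normalize_time_range are hypothetical helpers, and whether msgflux normalizes the value itself or passes it through unchanged is not stated in the source.

```python
from typing import Any, Dict, Optional

TIME_RANGE_ALIASES = {"d": "day", "w": "week", "m": "month", "y": "year"}
TIME_RANGES = frozenset(TIME_RANGE_ALIASES.values())

# Defaults copied from the parameter table above.
TAVILY_DEFAULTS: Dict[str, Any] = {
    "search_depth": "basic",
    "topic": "general",
    "time_range": None,
    "include_domains": None,
    "exclude_domains": None,
    "include_answer": False,
    "include_images": False,
    "include_raw_content": False,
}


def normalize_time_range(value: Optional[str]) -> Optional[str]:
    """Map "d"/"w"/"m"/"y" to their long forms and reject anything else."""
    if value is None:
        return None
    key = value.strip().lower()
    key = TIME_RANGE_ALIASES.get(key, key)
    if key not in TIME_RANGES:
        raise ValueError(f"unsupported time_range: {value!r}")
    return key


def tavily_options(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults and drop options left at None."""
    unknown = sorted(set(overrides) - set(TAVILY_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown option(s): {', '.join(unknown)}")
    merged = {**TAVILY_DEFAULTS, **overrides}
    merged["time_range"] = normalize_time_range(merged["time_range"])
    return {name: value for name, value in merged.items() if value is not None}


options = tavily_options(topic="news", time_range="w", include_domains=["reuters.com"])
print(options["time_range"])  # week
```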
5. Linkup Search
The linkup retriever queries Linkup and returns AI-oriented web results. It supports standard search, deeper agentic search, domain filters, image inclusion, and sourced answers.
Dependencies
Requires linkup-sdk and the LINKUP_API_KEY env variable:
pip install linkup-sdk
Parameters
| Parameter | Default | Description |
|---|---|---|
| depth | "standard" | Search depth: "standard" for faster search or "deep" for agentic search |
| output_type | "searchResults" | Output mode: "searchResults" or "sourcedAnswer" |
| include_domains | None | Domains to restrict search to |
| exclude_domains | None | Domains to exclude from search |
| include_images | False | Whether to ask Linkup to include images |
Examples
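The include_domains and exclude_domains filters are applied by the provider, but their intent can be illustrated locally. This is a sketch of the semantics only: domain_allowed is a hypothetical helper, and two details are assumptions rather than documented behavior, namely that subdomains match their parent domain and that an exclusion wins over an inclusion.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def _matches(host: str, domain: str) -> bool:
    # Assumption: "python.org" also covers "docs.python.org".
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def domain_allowed(
    url: str,
    include_domains: Optional[Iterable[str]] = None,
    exclude_domains: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if the URL's host passes both domain filters."""
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: exclusions take priority over inclusions.
    if exclude_domains and any(_matches(host, d) for d in exclude_domains):
        return False
    if include_domains:
        return any(_matches(host, d) for d in include_domains)
    return True


print(domain_allowed("https://docs.python.org/3/", include_domains=["python.org"]))  # True
print(domain_allowed("https://notpython.org/", include_domains=["python.org"]))      # False
```

A check like this can also be run on returned result URLs as a client-side sanity filter.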
6. Exa Search
The exa retriever queries Exa for semantic web search results. It can return URLs only, or fetch page text together with each result for RAG and summarization workflows.
Dependencies
Requires exa-py and the EXA_API_KEY env variable:
pip install exa-py
Parameters
| Parameter | Default | Description |
|---|---|---|
| search_type | "auto" | Search type: "auto", "neural", "fast", or "deep" |
| include_domains | None | Domains to restrict search to |
| exclude_domains | None | Domains to exclude from search |
| start_published_date | None | ISO date filter for results published after a date |
| end_published_date | None | ISO date filter for results published before a date |
| include_text | True | Whether to fetch page text with each result |
| max_characters | None | Maximum number of text characters returned per result |
Examples
import msgflux as mf

retriever = mf.Retriever.web(
    "exa",
    include_domains=["python.org", "pypi.org"],
    start_published_date="2025-01-01",
    include_text=True,
    max_characters=2000,
)
response = retriever("packaging metadata standards", top_k=3)
for result in response.data[0].results:
    print(result.data.title)
    print(result.data.url)
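Two of the Exa parameters are easy to get wrong before a request is ever sent: the published-date filters must be ISO dates in the right order, and max_characters caps the text per result. A minimal sketch of both checks follows. parse_published_range and clip_text are hypothetical helpers; the sketch handles date-only strings such as "2025-01-01", and plain prefix truncation is an assumption about how the cap behaves.

```python
from datetime import date
from typing import Optional, Tuple


def parse_published_range(
    start: Optional[str], end: Optional[str]
) -> Tuple[Optional[date], Optional[date]]:
    """Parse ISO dates (YYYY-MM-DD) and check that start is not after end."""
    start_date = date.fromisoformat(start) if start else None
    end_date = date.fromisoformat(end) if end else None
    if start_date and end_date and start_date > end_date:
        raise ValueError("start_published_date is after end_published_date")
    return start_date, end_date


def clip_text(text: str, max_characters: Optional[int] = None) -> str:
    """Return at most max_characters characters; None means no limit."""
    if max_characters is None:
        return text
    if max_characters < 0:
        raise ValueError("max_characters must be non-negative")
    # Assumption: the cap is a simple prefix cut.
    return text[:max_characters]


print(parse_published_range("2025-01-01", "2025-06-30"))
print(clip_text("packaging metadata standards", 9))  # packaging
```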
7. arXiv Search
The arxiv retriever searches arXiv papers and returns structured academic metadata such as title, summary, authors, publication dates, categories, and PDF URLs.
Dependencies
Requires the arxiv package: pip install arxiv
Parameters
| Parameter | Default | Description |
|---|---|---|
| max_results | 10 | Maximum number of arXiv results fetched per query |
| sort_by | "relevance" | Sort criterion: "relevance", "lastUpdatedDate", or "submittedDate" |
| sort_order | "descending" | Sort order: "ascending" or "descending" |
Examples
import msgflux as mf

retriever = mf.Retriever.web(
    "arxiv",
    max_results=5,
    sort_by="submittedDate",
    sort_order="descending",
)
response = retriever("large language model agents", top_k=5)
for result in response.data[0].results:
    print(result.data.published)
    print(result.data.title)
    print(result.data.pdf_url)
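The sort_by and sort_order values match the sortBy and sortOrder parameters of the public arXiv API, which the arxiv package wraps. As a rough illustration of what such a query could look like on the wire, the sketch below builds the request URL without sending it. build_arxiv_url is a hypothetical helper, and whether msgflux issues exactly this request is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ARXIV_ENDPOINT = "http://export.arxiv.org/api/query"
SORT_BY = ("relevance", "lastUpdatedDate", "submittedDate")
SORT_ORDER = ("ascending", "descending")


def build_arxiv_url(
    query: str,
    max_results: int = 10,
    sort_by: str = "relevance",
    sort_order: str = "descending",
) -> str:
    """Validate the sort options and build an arXiv API query URL."""
    if sort_by not in SORT_BY:
        raise ValueError(f"sort_by must be one of {SORT_BY}")
    if sort_order not in SORT_ORDER:
        raise ValueError(f"sort_order must be one of {SORT_ORDER}")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_ENDPOINT}?{urlencode(params)}"


url = build_arxiv_url("large language model agents", max_results=5, sort_by="submittedDate")
print(parse_qs(urlsplit(url).query)["sortBy"])  # ['submittedDate']
```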