Name: Ideenatlas
Creator: Simon Stumpf
License: https://ideenatlas.eu/licenses

The Ideenatlas is an 'idea curation engine'. It solves a fundamental problem: usually, to find something, you need to know what it's called and what you're looking for. People with a fresh idea often lack the technical vocabulary or expertise needed to develop and pursue it; by the time they have acquired it, the idea is often forgotten or seems irrelevant.

The Atlas helps to locate these nascent ideas, uncover research gaps, and validate thoughts within the context of millions of scientific papers.

Classic search engines like Google rely on keywords. Without the exact term, relevant results often remain hidden. Academic alternatives like Semantic Scholar often rely on citation graphs, which favor already popular papers and disadvantage niche topics.

Instead, the Ideenatlas uses a semantic vector space. Ranking is not based on SEO, popularity, or citation counts, but purely on the mathematical proximity of content. What counts is semantic similarity, regardless of the exact wording.

No.

The Ideenatlas is operated as an open research prototype. The goal is to make scientific research transparent and accessible without paywalls or advertising.

Academic research is based on a fundamental paradox: to find relevant literature for a new idea, you must already know the technical vocabulary.

However, even experts often remain trapped in their specific jargon, thereby overlooking relevant research from other disciplines.

This leads to three major hurdles in science:

1. Silo Thinking: Research is extremely discipline-specific. An algorithm from computer science could solve a problem in astrophysics or biology. But because the disciplines speak a different 'language', they do not find each other, and valuable knowledge remains unused.

2. The Popularity Bias: Common search engines rely on keywords or citation networks. These systems inevitably favor what is already popular. An excellent but rarely cited paper from a niche systematically ends up in a blind spot.

3. The AI Trap (Hyper-Relevance & Black Box): Modern RAG systems and LLMs attempt to solve the problem, but are tuned for direct, 'perfect' answers (hyper-relevance). They systematically overlook non-obvious, creative cross-connections. In addition, the final text output of LLMs remains probabilistic. They act as a 'black box' with a risk of hallucination, instead of transparently and deterministically displaying the connections between ideas.

The Ideenatlas solves the keyword dilemma through transparent, semantic vector search. Multimodal user inputs and millions of papers are visually clustered on an interactive 2D map based on content proximity. Instead of rigid text lists, users immediately see how topics are connected and where unexpected cross-connections lie. This presentation, coupled with targeted serendipity, breaks down silo thinking:

Science becomes tangible and boundless.

The Ideenatlas makes science tangible, open, and interdisciplinary. It offers tailored added value for a wide variety of actors:

For...

... Novices & Non-experts: In addition to the appropriate vocabulary, they immediately receive a clear thematic classification. This allows well-founded, new ideas to be formed from simple everyday questions.

... Time-sensitive research: Those who need in-depth, reliable results under time pressure do not have to resort to hallucinating chatbots. Worthwhile, interdisciplinary research is validated and presented at a glance.

... Interdisciplinary research teams: Experts from completely different disciplines find a common thematic intersection, overcome their silo thinking, and can establish novel scientific connections.

... R&D Departments (Scouting): Companies and institutes can specifically track down 'hidden excellence' by finding brilliant but undiscovered works and winning over the bright minds behind them.

Conclusion: From curious students to established researchers, the Ideenatlas is the compass that frees academic research from the pure text desert and enables real discoveries.

The Ideenatlas is currently designed, developed, and operated by Simon Stumpf (see Imprint) as an independent research project and proof-of-concept.

However, this solo project receives essential technical support through the High Performance Computing of the state of Baden-Württemberg: It is only because of the massive computing capacities of the bwUniCluster 3.0 that the complex vectorization and clustering of millions of scientific papers is even feasible for me as a student.

The Vision: In five years, the Ideenatlas will have established itself as an indispensable standard tool for early-stage scientific research. It does not replace established major providers, but acts as the central key player in a completely different, complementary stage of research: as an interactive compass for cross-disciplinary idea generation and thematic orientation.

The Sustainable Impact: We are initiating a paradigm shift: Interdisciplinary collaboration will no longer fail due to technical language barriers. The Ideenatlas turns the breaking down of research silos from a matter of chance into a targeted standard process. It changes how we understand innovation: away from pure keyword search, towards intuitive pattern recognition in a global knowledge landscape.

Development Status in 5 Years:

Database: The pipeline includes the metadata of all relevant open-access repositories worldwide. In addition, strategic partnerships with closed-access providers exist to map the knowledge space holistically and on a scientifically sound basis.

Technology & Infrastructure: The system guarantees full data sovereignty and independence. The Ideenatlas runs on its own servers in Germany and uses an internally hosted, domain-specific LLM (without dependence on external APIs like Gemini).

Corporate Structure & Sustainable Business Model: To permanently secure its status as an ad-free and tracking-free 'Open Tool', the project operates as a non-profit limited liability company (gGmbH). Its financial viability is based on hybrid cross-subsidization: The use of the graphical web interface remains 100% free for all users at all times. The highly scalable operation (server & local LLM) is financed through a freemium model for machine access (REST API & MCP servers). While basic queries remain free for students and researchers, commercial R&D departments and companies pay for high-volume requests for automated data extraction. Supplemented by institutional funding, the Ideenatlas thus becomes a financially self-sufficient, independent common good for the global scientific community.

The Ideenatlas is an AI-supported research and curation platform that arranges scientific documents on an interactive 2D map to make interdisciplinary cross-connections and unexpected solutions ('serendipity') visible - entirely without needing to know the exact technical vocabulary.

The side menu is a global help and navigation menu that can be opened by clicking the hamburger icon or simply by swiping.

It contains settings, help functions, links to the various pages and your local result history.

'Recent' displays the result history. All locally stored results are shown there. They are sorted by access time; if results are starred, they are listed first. Additionally, starred results are not automatically deleted when the local storage limit is reached.

Despite the local history, it is recommended to download important results as HTML to avoid future changes or data loss (e.g., due to updates).
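The history ordering and the protection of starred results described above can be sketched as follows. This is a minimal illustration in Python, not the browser implementation: the `HistoryEntry` fields, the `max_entries` limit, the "most recent first" direction, and the "oldest unstarred first" eviction order are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    # Hypothetical fields; the real storage schema is not documented here.
    title: str
    last_access: float  # e.g. a Unix timestamp
    starred: bool = False


def sort_history(entries):
    """Starred entries first, then most recently accessed first (assumed direction)."""
    return sorted(entries, key=lambda e: (not e.starred, -e.last_access))


def evict(entries, max_entries):
    """Drop the least recently accessed unstarred entries until the limit is met.

    Starred entries are never removed, so the result may still exceed the limit.
    """
    kept = list(entries)
    # Oldest unstarred entries are the first candidates for deletion (assumption).
    candidates = sorted((e for e in kept if not e.starred), key=lambda e: e.last_access)
    for victim in candidates:
        if len(kept) <= max_entries:
            break
        kept.remove(victim)
    return kept
```

If every stored result is starred, `evict` removes nothing, which is one more reason to export important results.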

The design button allows you to switch between three different themes: Light, Dark and Midnight. The default 'System' mode detects whether the user's device is set to light or dark mode and selects the corresponding theme.

On desktop devices, clicking the button directly cycles through the themes. To open the menu, you must hover over the button or click the '>' icon.

After clicking the 'Translate' button, selection mode is activated. Any outlined text can then be clicked to be translated piece by piece into the selected target language.

The translator runs locally on our servers (which is why it may be slightly slower) and supports all selectable languages.

After clicking the 'Read aloud' button, selection mode is activated. Any outlined text can then be clicked. A small audio player will open, allowing you to pause or stop the playback. Additionally, you can navigate to any page of the Ideenatlas via the side menu while the text is being read; the playback will continue. Clicking the player popup itself takes you directly to the text currently being read.

The language is automatically detected, though detection may fail for extremely short texts or single words. Many, but not all, languages are supported.

The selected language determines the application's interface language and the target language for translations. The default 'System' mode detects the system language of the device.

Since all application texts are translated manually, currently only German is available, with English serving as the fallback language.

Clicking the 'Help & FAQ' button and then 'Search' opens a search bar that allows you to access pre-answered questions across the entire application.

Depending on which page you are on, a pre-selection of relevant questions is displayed even before you enter anything into the search bar. Clicking a result opens a popup providing the question and its corresponding answer. Some answers also display UI elements if they are relevant.

The 'Select Element' button activates the selection mode. Any outlined element can then be clicked to call up an explanation from the FAQ.

The FAQ entry is displayed in a popup providing the question and its corresponding answer. Some answers also display UI elements if they are relevant.

This is the ideal use case. The Atlas translates everyday language into a scientific context. You can use rough brainstorming for idea extraction, naive questions, summaries of complex texts, or simple term searches.

The Ideenatlas is multimodal and processes various input formats. Besides the obvious text input, you can directly record and submit audio with a built-in recorder. Furthermore, PDFs, images, videos, or any form of text files can be uploaded. Through idea extraction using the LLM and specifying the query's goal, the relevant information can be processed as text.

The inputs in the text field are optimized by an AI (before being processed locally) to maximize the utility of the answers.

For the AI to formulate the text appropriately, it needs to know what you expect. Depending on the button, it adjusts its focus. It ranges from pure summarization to enriching your idea with scientific terminology.

Choose this if you have a specific question or are still unsure about a topic.

The AI is instructed to fill the gaps in your knowledge with a solid scientific foundation and to generate technical answers that are well suited to the subsequent topic search.

Perfect for unstructured brainstorming or rough trains of thought.

The AI filters out your core ideas, formulates a clear hypothesis from them, and adds (if necessary) suitable scientific terms or methodologies to make your idea discoverable.

Intended for large amounts of text or uploaded documents (like PDFs).

The AI analyzes the text, extracts the central statements and methodologies, and condenses them into a precise summary. This way, you can check if similar research already exists.

Use this if you already have a very precise, finished text and do not want any further interpretation by the AI.

The input is translated strictly objectively into a machine-readable format for the vector search, without adding new context or explanations.

The idea text is a direct response generated by the AI language model (Gemini) based on the user's query. The model was not permitted to use Google Search, nor was it provided with the search results that the user sees. Since the response is based on potentially outdated training data, it may contain inaccuracies.

The response is not intended to be perfect; accuracy is secondary to the purpose of the Ideenatlas. Because it serves merely as a starting point for further processing, even hallucinations are not an issue. The user should therefore not view it as a final answer to a question, but rather as a thematic introduction, a primer for the search ahead.

The website is optimized for human users to stimulate critical thinking and foster serendipity. A simple 'let the AI think for me' button would undermine the core philosophy of the Ideenatlas. Our goal is not just to find one or two matching papers, but to enable you to explore the entire context of the knowledge space surrounding your idea.

If an AI summary is required, you can download the results as a Markdown file at any time and provide it to an AI language model with a sufficient context window. We recommend this approach if your goal is to engage in a direct discourse with the data.

The tabs offer different perspectives on the input. Each tab answers a slightly different question, ranging from very specific to broad.

'Your Idea' shows the direct analysis of the idea. The cluster hierarchy there shows where the idea would be categorized. The results return the most directly relevant scientific papers. Here you find the direct answer to a question.

'Related Topics' takes a bird's-eye view and leads to the nearest neighboring research fields. Instead of searching for individual scientific articles, entire topic areas are searched, and results are filtered by them.

'Serendipity' shows topics that are distant in content but structurally similar, to break down silo thinking and make solutions from other disciplines visible. Not all results here will fit, but when one does, it is often unexpected and valuable.

The hierarchy functions like the zoom on a map, from continent to country to city. It clarifies how the input was thematically categorized, from general to specific.

TLDR: The numbers range from 0 to 1, and higher is better.

The Score is not a judgment of quality. It simply measures mathematical proximity in vector space: a high value means strong content agreement with the query. It uses the same metric as 'Relevance', namely cosine similarity.

'Confidence' describes how sure the algorithm was when classifying the idea (or scientific work) into the topic field. If it is not absolutely sure (Confidence < 1.0), the idea is likely interdisciplinary or the classification is imprecise.

'Relevance' is specific to 'Related Topics' and 'Serendipity'; it is the cosine similarity between the input vector and the centroid of the respective topic area. A higher value means greater similarity and thus more promising results.
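For readers who want to see the metric itself, here is a minimal sketch of cosine similarity and of ranking topic areas by it. It uses plain Python lists instead of a vector database, and the function names and the `centroids` dictionary are hypothetical stand-ins, not the Ideenatlas code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_topics(query, centroids):
    """Return (topic_name, relevance) pairs, most similar centroid first."""
    scored = [(name, cosine_similarity(query, c)) for name, c in centroids.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because only the angle between the vectors matters, two texts of very different length can still score close to 1. Mathematically the value can be as low as -1; the numbers shown in the Ideenatlas are described above as falling between 0 and 1.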

'raw JSON' is an expandable area that contains all information for an entry in unformatted JSON.

It contains important information such as authors, publication date or DOI, which is not shown by default because of strongly varying data quality.

Because the data shown in 'raw JSON' is not formatted, it can be helpful when the shown Title or Abstract looks incorrectly formatted.

The BibTeX button copies all known data for an entry to the clipboard.

The data is taken from the raw JSON.

Because of the varying data quality, we advise checking the BibTeX before using it.
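To illustrate why the output deserves a check, here is a rough sketch of how a BibTeX record could be assembled from a metadata dictionary. The field names (`title`, `authors`, `year`, `doi`), the `@article` entry type, and the citation-key scheme are assumptions for this example; the actual raw JSON schema may differ.

```python
import re


def to_bibtex(entry):
    """Build a BibTeX @article record from a metadata dict, skipping missing fields."""
    authors = entry.get("authors") or []
    year = str(entry.get("year") or "")
    # Citation key: last word of the first author's name plus the year (assumed scheme).
    first_parts = authors[0].split() if authors else []
    first = first_parts[-1] if first_parts else "unknown"
    key = re.sub(r"[^A-Za-z0-9]", "", first + year) or "entry"
    fields = {
        "title": entry.get("title"),
        "author": " and ".join(authors) if authors else None,
        "year": year or None,
        "doi": entry.get("doi"),
    }
    lines = [f"@article{{{key},"]
    for name, value in fields.items():
        if value:  # varying data quality: emit only what is present
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```

An entry with no metadata at all still yields a syntactically valid but empty record, which is exactly the kind of result worth catching before it lands in a bibliography.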

The data source helps categorize an entry into a specific field of study.

For example, PMC (PubMed Central) primarily hosts medical research, while PhilPapers focuses on philosophy.

Additionally, the source is a clickable link that, much like the entry title, leads you directly to the freely available full text.

Lists are one-dimensional and hide connections. A map reveals patterns that would be lost in a text list, such as dense clusters or isolated outliers... and all at a glance.

The map functions as an atlas of knowledge. Colored clusters represent thematically related areas, while empty spaces show a lack of content overlap. Densely packed points indicate a well-researched field, while scattered points point to interdisciplinary connections.

The cloud button allows the current map view to be exported as a high-resolution PNG image, exactly as it is currently displayed.

The generated images are freely available under the CC BY-NC-SA 4.0 license. They may be used for non-commercial purposes, provided that the Ideenatlas is cited as the source and distribution occurs under the same conditions.

Navigation is intuitive via the mouse wheel or the +/- keys for zooming, as well as clicking and dragging to pan. To interact with the map, it must be active, meaning it needs the blue border. You activate it by clicking once on the map.

This toggle shows the crosshair, which marks the exact position of your input in the 2D knowledge universe.

It serves primarily for orientation: where is the idea located? Through interaction with the other visualization layers, further connections can be inferred.

This toggle highlights the relevant topic areas in color.

The high contrast allows for immediate recognition of where the topic areas are located, how large (specific) they are, and how they relate to each other.

It toggles the colored outlines of the active topic areas on and off.

This helps to visually delineate the boundaries of topic areas better, especially on maps with many overlapping colors. It is reminiscent of contour lines on a map and makes the structure more tangible.

This toggle displays the names and outlines of the parent, large-scale topic areas.

You can think of it as showing the continents on a world map to get a rough orientation in the vector space.

It displays the names of the topic areas on the map.

This is useful for quickly grasping the exact names of the areas.

However, especially with small, closely packed areas, this can be distracting and cluttered, which is why it is disabled by default. If you cannot find a topic area at a glance because it is too small or inconspicuous, it is worth briefly activating the labels as they stand out prominently.

It marks the positions of the found scientific articles (which appear in the results list below) as small, interactive dots directly on the map.

This makes it possible to see at a glance where the specific hits are located. Their positions allow for several conclusions to be drawn:

The Dense Spot: Are all results close together? Then the idea is firmly located within an established field.

The Scatter: Do the results spread across the entire map? This suggests an interdisciplinary approach connecting different "continents of knowledge".

The Outlier: Is a single neighbor extremely far away from everything else? This could indicate an interdisciplinary cross-connection worth investigating further.

And in conjunction with the crosshair: As a rule, the results from the 'Your Idea' tab should be located directly at the crosshair. If they are far from it, this may indicate that the database contains no suitable results, and it may make sense to extend the search to other platforms. Even so, it is particularly worthwhile to study these results closely, as they can reveal new insights and facets of the search.

The Ideenatlas is not a simple ChatGPT wrapper or RAG. Several data processing steps occur between data collection and the final page.

Once the raw data is collected and cleaned, each entry is converted into a vector using a sentence embedder (Title + Abstract -> Vector) and written to a vector database.

When this is finished, a UMAP model is created to reduce the vectors from 1024 dimensions down to 10. You can imagine this process like taking a picture: the camera transforms the 3D world into a 2D image. This allows us to bypass the curse of dimensionality in the subsequent steps.

Now, clustering takes place. HDBSCAN models are recursively created based on the low-dimensional vectors. First, the entire vector space is clustered. Then, the resulting clusters are clustered again. It is as if we were drawing continents on a map in the first step, followed by countries, states, cities, and neighborhoods.
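The recursion can be sketched independently of the clustering algorithm. In the following illustration, `build_hierarchy` takes any function that returns one label per point (with -1 for noise, the convention HDBSCAN also uses). The toy `largest_gap_split` is a deliberately simple stand-in for HDBSCAN on 1D data so that the example runs without extra libraries; all names and parameters are assumptions, not the Ideenatlas pipeline.

```python
def build_hierarchy(points, cluster_fn, min_size=2, max_depth=5, depth=0):
    """Recursively split `points` into nested clusters.

    `cluster_fn(points)` must return one integer label per point, -1 meaning noise.
    """
    node = {"size": len(points), "children": []}
    if depth >= max_depth or len(points) < min_size:
        return node
    labels = cluster_fn(points)
    groups = {}
    for point, label in zip(points, labels):
        if label != -1:  # noise points stay at this level only
            groups.setdefault(label, []).append(point)
    if len(groups) < 2:  # no further split found: this cluster is a leaf
        return node
    for label in sorted(groups):
        node["children"].append(
            build_hierarchy(groups[label], cluster_fn, min_size, max_depth, depth + 1)
        )
    return node


def largest_gap_split(values, min_gap=1.0):
    """Toy stand-in for HDBSCAN on 1D data: cut once at the widest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    labels = [0] * len(values)
    if gaps and max(gaps) >= min_gap:
        cut = gaps.index(max(gaps))
        for k in order[cut + 1:]:
            labels[k] = 1
    return labels
```

Calling `build_hierarchy([0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.1], largest_gap_split)` first separates the low values from the rest ("continents") and then splits the remaining group once more ("countries"), while the tight group around 0 stays a leaf.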

In the final step, the clusters are prepared for later use. First, the cluster centroids are calculated in their original dimensionality and written to a new vector database. Then, names and descriptions are generated for all clusters. We do not rely solely on an LLM for this: it is provided with all the important information, such as TF-IDF keywords, cluster size, cluster hierarchy, the keywords of surrounding clusters, and meaningful titles + abstracts of papers from within the cluster.
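The centroid step can be illustrated with a short sketch. It assumes that a centroid is the arithmetic mean of the member vectors and that noise points carry the label -1; the function name is hypothetical. The key point is that the labels come from clustering the reduced vectors, while the averaging runs over the original, full-dimensional vectors.

```python
def centroids_by_cluster(vectors, labels):
    """Mean vector per cluster label, computed in the vectors' full dimensionality."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label == -1:  # skip noise points (assumed labelling convention)
            continue
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}
```

Keeping the centroids in the original space means a query vector can be compared against them directly, without first passing through the dimensionality reduction.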

Finally, the most important element is generated: the map of the atlas. Generating a 2D representation of the vector space using UMAP is a no-brainer. To make it more navigable, cluster outlines are generated. To make this computationally feasible and visually appealing, convex hulls are wrapped around a random selection of 2D vectors from a cluster that has been denoised using DBSCAN.
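As an illustration of the outline step, the following sketch wraps a convex hull around a random sample of 2D points using Andrew's monotone chain algorithm. The DBSCAN denoising mentioned above is omitted, and the function names, `sample_size`, and `seed` are assumptions for the example rather than the values actually used.

```python
import random


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))  # points must be hashable, e.g. (x, y) tuples
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def cluster_outline(points, sample_size=500, seed=0):
    """Hull of a random subset, trading a slightly smaller outline for speed."""
    rng = random.Random(seed)
    sample = points if len(points) <= sample_size else rng.sample(points, sample_size)
    return convex_hull(sample)
```

Sampling keeps the cost bounded for very large clusters; the price is that the outline can miss a few extreme points, which matters little for a visual boundary.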

With every user query, all of these steps are then applied to the user's text.

The Ideenatlas database consists of metadata taken from established Open Access repositories such as arXiv, PubMed Central, or RePEc.

The database and the algorithms that process the data are updated at regular intervals. Over the last few months the frontend has been the main focus, and I have to reapply for access to the bwUniCluster 3.0, so the database currently reflects the state of August/September 2025.

Inputs are processed temporarily to perform vector analysis. Permanent storage on the server does not take place. Uploaded files are deleted immediately after analysis completion.

No. External APIs used for text optimization are subject to strict data protection regulations that exclude training with user data. The actual vector search takes place locally on the server.

Yes, data sovereignty is guaranteed. Results can be exported as interactive HTML for offline use, as raw JSON data for further processing, or as Markdown, which is optimized for LLMs.

The history is stored exclusively locally in the browser (IndexedDB) and does not leave the device. Clearing browser data also removes the history.

Frequently Asked Questions (FAQ)

Introduction & Basics

Questions from hei_INNOVATION Ideas Competition 2026

The side menu

The settings buttons

Search & Input

The intent buttons

Understanding the results

The result cards

Visualization in detail

The Analysis Layers (Toggle-Buttons)

Technology & Data

Data, Export & Security

Notice on Data Processing

What is your goal?

Extracted Idea:

Cluster Hierarchy:

Download Options

Record new audio