Stories matter

Explorian Devlog – Ep 1. The story

Listen

I am a passionate learner. Discovering new concepts, ideas, and their intricate connections is one of my favorite things. It enhances my understanding of our world and improves my ability to navigate it.

The best way to learn

You have a map when you visit a new place. It shouldn’t be much different when learning something new. A concept map outlining the key ideas and their connections is beneficial. Think of it like a classic map with intersections and the roads connecting them. A great map is abstract, brief, and provides directions and major landmarks at a glance.

That’s Explorian for humanity’s knowledge.

Explorian

Imagine a combined dictionary and thesaurus for concepts. Not just words but meaningful ideas. It gives you a brief definition and also lets you navigate to related ideas. It enables you to explore and navigate knowledge. Just choose a starting point and explore. Try it out and see if you find anything interesting.

Do you want to know the concorde effect? Are you interested in inflation and related concepts like debt monetization or quantitative easing? Or would you love to browse cognitive biases? Explorian is a tool to explore and learn about new ideas quickly and effectively. And it isn’t just the ideas, but their surroundings as well.

Explorian is a map of knowledge; however, one must not mistake the map for the territory.

Alternatives

There are, of course, alternatives, and one must not ignore them. A few come to mind immediately: encyclopedias and dictionaries, Wikipedia, and large-language model interfaces like ChatGPT, Bard, or Perplexity AI.

To be fair, I used all of these resources when building Explorian. They are great for their purpose. Encyclopaedias and dictionaries are carefully curated and give exact answers but lack meaningful connections to other concepts. Wikipedia is curated collective knowledge featuring links to related ideas. It is anything but brief and conscious. It has too much noise. It is more like a beautiful painting of a city than a map. GPT models come very close to Explorian. No wonder they are the heart of the engine as well. However, they lack a purpose-built interface: data structure, easy navigation around concepts, saving, recalling, etc.

Explorian is simply a purpose-built interface over large language models.

Key features

Speaking of purpose, Explorian is minimalistic and tries to get out of the way. It is built around two straightforward use cases.

I want to know what something is

This is an essential information lookup. You could try define inflation briefly and consciously with ChatGPT. In fact, that’s very close to the prompt I am using.

Inflation refers to the sustained increase in the general price level of goods and services in an economy over a specific period. It represents a decrease in the purchasing power of money, as more money is required to purchase the same amount of goods or services. Inflation can occur due to increased demand, rising production costs, changes in government policies, or changes in the money supply. It is an important economic indicator affecting consumers’ purchasing power, savings, investments, and overall financial stability.

OpenAI’s gpt3.5-turbo

What else is nearby?

When you learn, you don’t just want to understand a concept. You want to know how to place it in the grand scheme. It helps to know its surroundings: related and similar ideas.

Give me 10 related concepts to inflation.

Deflation
Consumer Price Index (CPI)
Hyperinflation
Monetary policy
Demand-pull inflation
Cost-push inflation
Phillips curve
Stagflation
Wage-price spiral
Quantitative easing (QE)

OpenAI’s gpt3.5-turbo

Wouldn’t it be nice if you could navigate these related concepts?

The first steps

The idea to build Explorian came to me during our AI Fundamentals training. We discussed SudoLang, and I started making Explorian using SudoLang. I had the basics done in a few minutes.

# Humanity's knowledge graph explorer

Let's roleplay. You are a world-class expert in all fields discovered by humanity, as well as
a world-class mentor and tutor. Your task is to help the chat user explore the
topic, using the following program:

Teacher {
    style: [clear, friendly, patient, expert, knowledgeable]
    length: concise, avoid unimportant words or details

    teach() {
        currentConcept = getInput("Staring concept: ")

        while (true) {
            log(currentConcept)
            explain(currentConcept):length=concise:detail=clear;
            getRelatedConcepts(currentConcept) |> log:length=compact:as=orderedList
        }

        currentConcept = getInput("Next: ");
    }
}

  function getRelatedConcepts(concept, n=5):length=medium;

  function getInput(prompt) {
    log(prompt):length=compact;
    STOP and wait for the user's response.
    return response
  }
}

Try it out! Paste it in a ChatGPT window using GPT4 and let your jaws drop.

HUMAN:
<the above code snippet>

ASSISTANT:
Sure, let's begin.
Starting concept:

HUMAN:
inflation

ASSISTANT:
**Inflation**

Inflation is an economic term that refers to an overall increase in prices and a fall in the purchasing value of money. It is typically measured as an annual percentage increase. As inflation rises, every dollar you own buys a smaller percentage of a good or service. Central banks attempt to limit inflation and avoid deflation to keep the economy running smoothly.

Related concepts:

1. Hyperinflation
2. Deflation
3. Consumer Price Index (CPI)
4. Monetary policy
5. Stagflation

Next:

HUMAN:
5

ASSISTANT:
**Stagflation**

Stagflation is a condition of slow economic growth and relatively high unemployment, or economic stagnation, accompanied by rising prices or inflation. It presents a dilemma for monetary policy since actions designed to lower inflation may exacerbate unemployment and vice versa.

Related concepts:

1. Unemployment
2. Keynesian economics
3. Supply shock
4. Recession
5. Inflation targeting

Next:

This is insane. You are bound by the ChatGPT window, but still. It is cumbersome to navigate and slow. You can’t save favorites, etc. The next version was a few API calls in Python using similar prompts and some json.dumps() in a for loop. At that point, I knew I was onto something. Before advancing, I had a couple of questions I wanted answered.

Feasibility questions

Could embedding be a good enough indicator of related concepts?

It would be nice if the only core storage needed was for the concept and embedding vector. While it worked relatively well, I got more meaningful results from directly asking the model using this prompt:

List related topics of <X> without description or definition.

It resulted in 10-20 highly relevant new concepts per call. I also expected synonyms and plurals to have virtually the same embedding. To my surprise, this was a naive expectation, far from the truth. This problem of non-exactness crept back during development. I’ll touch on this in a later entry. For now, I decided to save the connections and use them as a primary source of relatedness.

Could I get away with GPT3.5 or do I need GPT4?

GPT4 has incredible capabilities. However, Explorian doesn’t need those, at least not in v0.1. My tests clearly indicated that GPT3.5 works well enough for my use cases. I had to put some parser logic in, but nothing fancy. gpt4 could generate valid JSON and understand SudoLang, but gpt3.5-turbo’s input token cost is 20x, and its token generation cost is 30x cheaper than GPT4. And this brings me to costs.

What would this cost?

This one was easy. An average input prompt is ~15 tokens for definition and ~15 for the related concepts. Answers are ~100 tokens for the definition and ~110 for the related concepts. Overall, it’s 30 input tokens and 210 generated tokens.

Using gpt3.5-turbo, that’s (30/1000*0.0015)+(210/1000*0.002) = $0.000465 / concept.

Or about $5 / 10k concepts.

And the numbers are overestimates, so this won’t bankrupt me. Not yet, anyway.

So what would the architecture look like?

I am a big fan of evolving architecture. Starting with something very simple would allow me to understand the domain better and see where bottlenecks might occur. Of course, you could spend days or weeks planning various features and scenarios. In my experience, however, an agile mindset works better as we are relatively bad at predicting future needs. Therefore, the basic architecture is a web service and a database.

And that’s an evolution from the JSON files for persistence and text editor for viewing 🙂. More on this in the next episode.

Let’s get started

This project was feasible with significant wins, at least for me.

And it started. My initial commit occurred on the 15th of May 2023.


Posted

in

by

Tags: