Guessing User Intent

I work for a company where we develop a platform for working with data. The Keboola platform is very powerful and can handle many different use cases. This is awesome. On the other hand, the debates starting with “what does the user want” explode very quickly.

To give an example: there are platforms solely for orchestrating ETL workflows, and there are platforms solely for orchestrating ELT workflows. In our platform you can do both, anything in between, or any other workflow you can think of. But it’s not just for orchestrating data workflows; you can do ad-hoc analysis as well, and if you want to host data apps, well, that can be done too. So, why are people coming to the application?

That’s a question that haunts most product people behind more complex applications. Of course, if you’re the product lead of a puzzle game, you can be quite confident that people come to play puzzle games, or possibly watch someone play puzzle games, or possibly chat, but that’s mostly it.

Compared to that, we have absolutely no idea. So I decided to build a user action summarizing machine® to guess user intent. There are obviously tools that aim for this (e.g. Hotjar et al.). However, my goal is not to find out how the user interacted with the application, but why they came to the application.

OK, maybe I have at least a vague idea of what I’m going to do

I don’t have a list of user intents. What I do have is extensive audit data from existing projects. This is provided within the platform as events or versions tied to an object (configuration, table, …).

For the sake of simplicity, let’s assume the user wants to configure an ETL workflow extracting data from Google Analytics and an internal MySQL database, merging them, and sending them to Power BI. The close-to-infinite number of mouse clicks and mouse moves will be transformed into the following records:

  • MySQL extractor configuration version created (meaning that the configuration was created)
  • MySQL extractor configuration version created (meaning that the configuration was edited)
  • MySQL extractor configuration job was run (meaning that the user tried the configuration)
  • Table created by the MySQL extractor configuration was deleted (meaning that it was not what the user wanted)
  • MySQL extractor configuration version created (meaning that the configuration was edited)
  • And so on…

When I put it this way, it’s probably crystal clear, but will it work the other way round? There are more than 200 million jobs, 32 million versions of 8 million configurations, and over 800 million events in the database. By the way, most of the events come from automated jobs with no user interaction.

There isn’t any session tracking in the application for this purpose (obviously there is a session for a logged-in user, but that can last days and isn’t really tied to a person actively using the application). Let’s borrow simple techniques from web traffic analysis, from the ancient times before Google Analytics was invented.

I used a Time-Based Session Grouping analogy, i.e. actions from a single user ID in a single project ID within a certain time window can be grouped as a session. I combined it with Idle Time Detection: a gap between requests exceeding a threshold indicates the end of a session.
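
In code, the grouping can be sketched roughly like this. The record layout and names are my illustration, not the actual implementation; the 4-hour gap is the threshold I get to in the flaws section below:

from datetime import timedelta
from itertools import groupby

MAX_SESSION_GAP = timedelta(hours=4)  # illustrative; see the flaws section for the reasoning

def split_into_sessions(records):
    """Group audit records into sessions by user, project, and idle time.

    Each record is assumed to be a dict with "user_id", "project_id",
    and a datetime under "ts" (hypothetical layout).
    """
    by_user_project = lambda r: (r["user_id"], r["project_id"])
    records = sorted(records, key=lambda r: (by_user_project(r), r["ts"]))
    sessions = []
    for _, group in groupby(records, key=by_user_project):
        current, last_ts = [], None
        for record in group:
            # idle time detection: a long enough gap closes the current session
            if last_ts is not None and record["ts"] - last_ts > MAX_SESSION_GAP:
                sessions.append(current)
                current = []
            current.append(record)
            last_ts = record["ts"]
        if current:
            sessions.append(current)
    return sessions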

My first idea was to process these in the Snowflake DB where I have a sample of the audit data, but this collides with the approach above: I need to process all the records from different tables as one strict sequence of events, together with the semi-structured data of the associated objects. There is some heavily cut-down sample data in the repository. I tried to simplify the schema to the core columns, essentially reducing the object representation to a single field:

  • configurations — content
  • configuration rows (parts of configurations) — content
  • jobs — status
  • events — type

The content of a configuration (or configuration row) is arbitrarily complex JSON (described a bit more in an unrelated article). A job status is success or error, and an event type is one of about 30 types of events. OK, so I threw all this on one pile, as sketched below. A small meaningful session consists of at least about 50 items (configuration versions, jobs, and events).
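
A minimal sketch of the “one pile” idea: reduce each table to a timestamp, an identifier, and that single core field, then sort everything into one strict sequence. The input shapes here are hypothetical:

def build_pile(configurations, configuration_rows, jobs, events):
    """Merge the four source tables into one time-ordered event stream."""
    pile = []
    pile += [(c["ts"], "config", c["id"], c["content"]) for c in configurations]
    pile += [(r["ts"], "config_row", r["id"], r["content"]) for r in configuration_rows]
    pile += [(j["ts"], "job", j["id"], j["status"]) for j in jobs]
    pile += [(e["ts"], "event", e["id"], e["type"]) for e in events]
    pile.sort(key=lambda item: item[0])  # strict time order across all tables
    return pile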

Running this through an LLM to summarize didn’t yield meaningful results; one more trick was needed. I had to pre-process the configuration content (the JSON) to find the initial and final state and the difference between the two. Because that is what matters in the end: I don’t care that the user changed the configuration 20 times, I care that they extracted the account table. So from a set of records like this (these are my own example actions):

config_row,2023-12-04T15:48:54,ex-db-snowflake_6534_77846,configuration_row_json: {"processors":{},"parameters":{"columns":[],"primaryKey":[],"incremental":false,"outputTable":"in.c-keboola-ex-db-snowflake-653405026.ACCOUNT","table":{"schema":"HELP_TUTORIAL","tableName":"ACCOUNT"}}}
config_row,2023-12-04T15:48:54,ex-db-snowflake_6534_38777,configuration_row_json: {"processors":{},"parameters":{"columns":[],"primaryKey":[],"incremental":false,"outputTable":"in.c-keboola-ex-db-snowflake-653405026.USER","table":{"schema":"HELP_TUTORIAL","tableName":"USER"}}}
config_row,2023-12-04T15:48:54,ex-db-snowflake_6534_61553,configuration_row_json: {"processors":{},"parameters":{"columns":[],"primaryKey":[],"incremental":false,"outputTable":"in.c-keboola-ex-db-snowflake-653405026.OPPORTUNITY","table":{"schema":"HELP_TUTORIAL","tableName":"OPPORTUNITY"}}}
config_row,2023-12-04T15:48:54,ex-db-snowflake_6534_62112,configuration_row_json: {"processors":{},"parameters":{"columns":[],"primaryKey":[],"incremental":false,"outputTable":"in.c-keboola-ex-db-snowflake-653405026.DATATYPES","table":{"schema":"HELP_TUTORIAL","tableName":"DATATYPES"}}}
config,2023-12-04T15:48:55,ex-db-snowflake_6534,configuration_json: {"parameters":{"db":{"port":443,"host":"example.snowflakecomputing.com","user":"HELP_TUTORIAL","#password":"secret","database":"HELP_TUTORIAL","schema":"HELP_TUTORIAL","warehouse":"DEV"}}}
config,2023-12-04T15:48:55,ex-db-snowflake_6534,configuration_json: {"parameters":{"db":{"port":443,"host":"example.snowflakecomputing.com","user":"HELP_TUTORIAL","#password":"secret","database":"HELP_TUTORIAL","schema":"HELP_TUTORIAL","warehouse":"DEV"}}}
... 50 more ...

I made it to this:

{
  "modified_configuration_rows": [
    {
      "id": "3082_ex-db-snowflake_6534_77846",
      "config_id": "3082_ex-db-snowflake_6534",
      "initial_state": {
        "processors": {},
        "parameters": {}
      },
      "final_state": {
        "processors": {},
        "parameters": {
          "columns": [],
          "primaryKey": [],
          "incremental": false,
          "outputTable": "in.ex-db-snowflake-6534.ACCOUNT",
          "table": {
            "schema": "HELP_TUTORIAL",
            "tableName": "ACCOUNT"
          }
        }
      }
    }
  ],
  "modified_configurations": [...],
  "created_configuration_rows": [...],
  "created_configurations": [...],
  "deleted_configuration_rows": [...],
  "deleted_configurations": [...],
  "affected_tables": [...],
  "executed_jobs": [...]
}

And then aggregated further to this:

{
  "created_configurations": [],
  "modified_configurations": [
    "Modified a ex-db-snowflake configuration with updated parameters: {\"db\": {\"port\": 443, \"host\": \"example.snowflakecomputing.com\", \"user\": \"HELP_TUTORIAL\", \"#password\": \"secret\", \"database\": \"HELP_TUTORIAL\", \"schema\": \"HELP_TUTORIAL\", \"warehouse\": \"DEV\"}}",
    ...
  ],
  "created_configuration_rows": [],
  "modified_configuration_rows": [
    "Modified 6 configuration rows for configuration 3082_ex-db-snowflake_6534"
  ]
}
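
Conceptually, the collapse from “the user changed the configuration 20 times” to an initial state, a final state, and the difference between the two can be sketched like this. Again, the record shape and names are illustrative, not the actual code:

import json
from collections import defaultdict

def collapse_versions(version_records):
    """Reduce all versions of each configuration row within a session to
    an initial state, a final state, and the top-level keys that changed.

    Each record is assumed to be a (row_id, timestamp, json_string) tuple.
    """
    by_row = defaultdict(list)
    for row_id, ts, payload_json in sorted(version_records, key=lambda r: r[1]):
        by_row[row_id].append(json.loads(payload_json))

    collapsed = []
    for row_id, states in by_row.items():
        initial, final = states[0], states[-1]
        # shallow diff: which top-level keys differ between first and last version
        changed_keys = [k for k in {*initial, *final} if initial.get(k) != final.get(k)]
        collapsed.append({
            "id": row_id,
            "initial_state": initial,
            "final_state": final,
            "changed_keys": changed_keys,
        })
    return collapsed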

Only then did the summarization of user actions start to work nicely. The aggregated session above would be summarized as (abbreviated):

The user’s primary intent was to construct and execute a multi-stage data pipeline using Keboola’s orchestrator. This involved extracting data from diverse sources, followed by a Python-based transformation to process some of the extracted data. The overall process was iterative, involving the sequential addition of components and configurations to the orchestrator. The user’s journey can be broken down into the following key steps and their fulfillment: <snip> A keboola.ex-db-snowflake extractor (config 6534) to fetch data from a Snowflake database (example.snowflakecomputing.com, HELP_TUTORIAL schema). The user defined six configuration rows to extract specific tables: ACCOUNT, USER, OPPORTUNITY, DATATYPES, countries, and cars. While the Snowflake connection parameters were reviewed multiple times (evidenced by several identical configuration change entries), they remained unchanged, suggesting careful verification rather than debugging. <snip>

This is a quite accurate description of what I was doing. I’d prefer more concise language, but that can be improved with some prompt tuning. I added a couple of tweaks to the result:

{
  "start_time": "2023-12-04T15:46:07+00:00",
  "end_time": "2023-12-04T15:59:45+00:00",
  "token_id": "155991_kbc-eu-central-1",
  "project_id": null,
  "configuration_ids": [
    ... list of internal identifiers ...
  ],
  "intent_description": "The user's primary intent <snip - see above>",
  "is_successful": true,
  "session_id": "807f480c-8a42-4533-be4a-63579312e295",
  "fulfillment": "Successful Completion",
  "tags": [
    "pipeline-setup",
    "multi-source-extraction",
    "python-transformation",
    "data-integration"
  ],
  "classification": "ETL/ELT pipeline setup/Data export/sharing",
  "development_stage": "Creating new use cases",
  "summary": "I want to construct an orchestrated data pipeline to extract data from various sources including a currency API, Snowflake, and MySQL, and then use a Python script to transform a subset of this data."
}

I also added one more LLM step to create a short summary of the intent_description. Plus I added four classification fields: fulfillment, tags, classification, and development_stage. The idea behind these is to categorize the results more easily. Fulfillment can be positive or negative, but also partially successful. classification defines some major use-case classes like “Ad-hoc analysis/Data exploration/inspection”, “ETL/ELT pipeline setup/Data export/sharing”, or “Troubleshooting/Debugging”. development_stage defines the type of application use: “Creating new use cases”, “Updating existing use cases”, “Testing/validating configurations”.
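
To make the shape of the result concrete, here is a sketch of the schema, e.g. as a Pydantic model that a structured-output LLM call could target. The field names mirror the JSON above; the model itself and the exact value lists are my reconstruction, not the actual code:

from typing import Literal
from pydantic import BaseModel

class SessionSummary(BaseModel):
    intent_description: str   # long-form description of the guessed intent
    summary: str              # short first-person restatement of the intent
    is_successful: bool
    # example values; the real set at least distinguishes success,
    # partial success, and failure
    fulfillment: Literal["Successful Completion", "Partially Successful", "Failed"]
    tags: list[str]           # free-form tags like "pipeline-setup"
    classification: Literal[
        "Ad-hoc analysis/Data exploration/inspection",
        "ETL/ELT pipeline setup/Data export/sharing",
        "Troubleshooting/Debugging",
    ]
    development_stage: Literal[
        "Creating new use cases",
        "Updating existing use cases",
        "Testing/validating configurations",
    ]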

The Flaws

Before any product person jumps in and runs a similar thing on their app, let’s go through a sincere “do not try this with your home LLM” warning. My ultimate goal was not to improve the product or design a new product for customers. For such a case, the above approach is flawed on multiple fronts.

  1. Bad source data (1) — I only considered the interaction of existing users with the platform. This may be completely unrelated to how the product sells. It is probably the wrong data for guessing what (new) people want, so it’s questionable to use these results for product development.
  2. Bad source data (2) — In identifying a user session, I consider only those sequences that contain a job run, because that’s essentially what the platform charges for, so I considered it an important part. However, there may be many situations where the user didn’t run a job directly, or didn’t even want to (because it’s a billed operation, or for other reasons). I’m probably missing a lot of existing use cases here.
  3. Bad source data (3) — I only consider somewhat successful user actions. E.g. when a configuration is saved, it means the user must have got through the UI validation and entered at least mildly sane things. We know that we have some users (especially newcomers) who are completely disoriented in the product. As such, the classification of whether the user intent was successful is heavily biased towards success.
  4. Classification of success — This is a very hard problem and could probably be expanded into a book: “How to tell if the user was successful in using your software?”. Again, I took a very simplistic approach and considered the use case positive if there was at least one successful job. This again is heavily biased towards success, and it says nothing about user satisfaction.
  5. Session identification — In typical Time-Based Session Grouping of website requests, the session length is minutes (or lower tens of minutes). I set my maximum session break length to 4 hours, with a 24-hour break indicating a sure start of a new session. That is, if two operations occur within 4 hours, I still consider them part of a single session. This may look crazy long, but keep in mind that in the data I only see a small part of the user’s interaction with the platform. The user may be spending time writing or debugging a piece of SQL code, and I get no records in the selected audit tables. Events with a gap of 4–24 hours mark potential sessions to be analyzed later. Why so long? Because sometimes the user can be, for example, waiting for an external authorization to extract some data. Is it still one “session” or not?
  6. The general idea — The idea that user actions in a product correlate with user intent may be quite flawed too. Maybe they’re just doing the actions as a workaround for something the product can’t do. I remember in another old app we had a heavily used print-preview feature. It turned out to be used only because the normal page had overlapping numbers. People got used to looking at the print preview to see the correct rendering. No one actually wanted to print the page.
  7. Shitty implementation — The whole implementation is full of bugs, the tagging does not work correctly and should not be done this way, the classification to classes should be stricter, etc.

Why?

If the result is so flawed and buggy, why did I bother in the first place? And why did it make it into an article? For one, I consider the idea somewhat interesting, with a couple of nice learnings (see below). Second, the goal of this processing was to obtain training data for an LLM agent, not necessarily to guide product development. For that purpose, the results are completely sufficient (though I still must keep in mind that they are biased).

Mainly, though, this was an exercise in vibe-coding. I generated the entire code using Cursor.ai. I was looking for a suitable task to test some new approaches, and I needed a task of suitable size. Surely, I could go ahead and watch a video of a Pacman game being generated by an LLM, but that is not valuable. Code generation is just another tool. Being amazed by it is one thing, using it for practical purposes is another.

The learnings

The most enlightening learning was about forgetting the database structure. Sometimes ignoring the structure and details can be very beneficial: instead of processing the four source tables, I just threw them into one big pile of junk and dug through it. Reducing object (configuration, table, job) versions to just events of their creation and modification got me moving. While in hindsight it’s so obvious, sometimes one can’t see the forest for the trees.

Second, it was important to do the pre-processing for the LLM. Meaning: the events had to be grouped, and I had to prepare the difference between the initial and final state of the system. Again, this might seem soooo obvious (preparing the data still takes 80% of the time?) and I sort of expected it. But one must try again and again, just to check whether there isn’t a shortcut.

The tricky part with summarizing stuff with an LLM is that it will always create a summary; without the pre-processing, though, the summary is complete bullshit (but believable). Another learning is that building a summarizing machine like this involves really hard checking of the results. I have to sit through it, and it takes much more time than building the summarizing machine itself.

As I said above, this was mostly about vibe-coding, so what are the lessons learned there?

Vibe-coding sort of works — I got my working piece of code. It mostly works, does the stuff I wanted it to do, and gives me the result I wanted. Yay, prototyping! By the way, I used Cursor.ai with claude-4-sonnet-max. My approach was to first write instructions and then have Cursor ask me for clarifications. I then had it write the clarifications back to a file; this is a useful trick for not losing the conversation and for referring to it later.

At the same time, vibe-coding sort of does not work — I got something working, but it’s nothing usable in production. Although I used Cursor rules, multiple violations happened; e.g., logging was combined with plain print messages to stdout in half the cases. This was fixable though; one just needs to stay alert.

Then there is some crazy spaghetti code, like the _analyze_session method, which both doesn’t work correctly and is completely unmaintainable. I tried a couple of times to fix it, unsuccessfully. It just needs to be rewritten, and it would have to be specified in a very detailed manner, because there are so many nuances of how to group and classify the configurations. Unsurprisingly, it’s the core business logic that’s difficult to articulate, both for humans and LLMs.

One might argue that this isn’t important because it’s just a prototype. The answer is “yes and no”. If I need the prototype just to show that something will work, then it isn’t important. If I need the prototype as the basis for another step (this case), then it does matter that 30% of the results are wrong.

Nevertheless, the whole thing was an interesting experiment. And I got the core thing I wanted: some training data based on real-world data. Though I can never win the real summarizing competition.
