Model-based aggregators#

I watched Simon’s Language models on the command-line presentation. I’m a big fan of his Unix-style approach to LLMs. It also inspired me to play around more with smaller models and continue developing an intuition for how these things work.

I was particularly interested in the script he used (at 26:35 in the video) to summarize the comments on an orange site post. That script got me thinking more deeply about the future of information consumption. I found it useful for understanding the general tone of the responses to a particular item posted on the forum.
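The shape of the pattern is easy to reproduce. Below is a minimal Python sketch of the same idea, not Simon’s actual script (his was a shell pipeline): it pulls a comment tree from the public Algolia Hacker News API and hands it to the Python API of his llm library. The model name, system prompt wording, and item id are my own placeholders.

```python
import json
from urllib.request import urlopen

import llm  # Simon's llm library: https://llm.datasette.io

# Public Algolia API endpoint for a Hacker News item and its comment tree
HN_ITEM_URL = "https://hn.algolia.com/api/v1/items/{}"


def fetch_comments(item_id: int) -> str:
    """Flatten the nested comment tree into 'author: text' lines."""
    with urlopen(HN_ITEM_URL.format(item_id)) as resp:
        item = json.load(resp)

    lines = []

    def walk(node):
        if node.get("author") and node.get("text"):
            lines.append(f"{node['author']}: {node['text']}")
        for child in node.get("children") or []:
            walk(child)

    walk(item)
    return "\n".join(lines)


def summarize(item_id: int, model_name: str = "gpt-4o-mini") -> str:
    # Asking for direct quotations keeps the summary anchored to
    # what commenters actually wrote.
    model = llm.get_model(model_name)
    response = model.prompt(
        fetch_comments(item_id),
        system=(
            "Summarize the themes of the opinions expressed here. "
            "For each theme, include direct quotations with author "
            "attribution. Output markdown."
        ),
    )
    return response.text()


if __name__ == "__main__":
    print(summarize(12345678))  # hypothetical item id
```

The detail that matters most to me is the quoting instruction in the system prompt: it keeps a thread back to the original authors, which is relevant to the concerns below.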

Model-based aggregators for news consumption#

I’ve also found swyx’s AI News newsletter useful for catching up on what is happening in various AI communities around the internet each day. In neither case am I (initially) reading much user-generated content. I am reading model summaries that give me a directional signal for what to look into more deeply next.

Where does this lead?#

What are the implications of these “model-based aggregators”? It’s possible to instruct the model to quote directly from the source material, which helps blend the model summary with the original content. However, I can also anticipate that, if I’m largely consuming model-generated content, I could become somewhat disconnected from what people are actually thinking about, experiencing, and doing. The model could also be biased in ways that prevent certain things from surfacing. Or worse, certain primary sources could be excluded by whoever decides what content is provided as part of the model’s context.

I think it’s still early, but I expect models are already being used to cherry-pick context and intentionally produce biased summaries. As someone trying to keep my information horizon broad, I find model-based aggregation saves me time and increases my breadth of understanding, but I can see how it might also create blind spots and allow subtle “finger-on-the-scale” influence over my understanding.

None of this ML-based recommendation and aggregation is new. It’s one of the most profitable business models in the world. With LLMs, it seems to me that it’s now much easier to aggregate or summarize information and present it through a highly refined lens. Combined with a feedback loop for engagement, I could see this producing some not-so-nice results.

Model-based aggregation also further devalues original source material. If we come to rely more on model summarization, it will become harder to be a self-supporting content creator. Succeeding as a content creator today is usually tied to achieving effective distribution on platforms (or finding your N true fans to pay you directly). If you can get your content into LLM-generated summaries, maybe that is similar to getting it shared on platforms. But because the model can summarize what you wrote, you may never get the traffic or the credit, which makes the situation meaningfully worse for authors of original content even if consumers are still enjoying and learning from it.

If no one can support themselves making original content, they will be forced to do other things. As a result of that shift, the models will have little good or interesting content to aggregate, and the quality of the summaries will probably degrade.

This degradation looks like a continuation of the consolidation we’ve already seen among news organizations and social media platforms. I can see pluses and minuses in how models will help us learn and stay informed, but given how confusing things have become with social media, I am wary of the downsides of a diet of mostly model-generated content.