Claude Code's Real Secret Sauce Isn't the Model
Skimming the leaked Claude Code TypeScript codebase suggests that much of its coding performance over the plain model in the web UI comes from the sur [...]
a collection of dev rss feeds - blogroll
Posts
From MHA and GQA to MLA, sparse attention, and hybrid architectures [...]
I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with co [...]
A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026 [...]
I recently sat down with Lex Fridman and Nathan Lambert for a comprehensive 4.5-hour interview to discuss the current state of progress of AI, and what t [...]
Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea is straightforward. If w [...]
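The excerpt is cut off, but one widely used inference-scaling strategy is self-consistency: sample several answers and keep the most frequent one. A minimal sketch, with a hypothetical `sample_answer` stub standing in for a stochastic LLM call (names and probabilities are illustrative, not from the article):

```python
import random
from collections import Counter

def sample_answer(prompt, rng):
    # Hypothetical stand-in for a stochastic LLM call: correct most of the time.
    return "42" if rng.random() < 0.9 else rng.choice(["41", "43"])

def majority_vote(prompt, n_samples=101, seed=0):
    """Self-consistency: sample n answers and return the most frequent one."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(prompt, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))
```

Spending more samples per query trades compute for accuracy, which is the core inference-scaling bargain the post describes.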
A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026. [...]
A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficien [...]
Two years ago, I posted a list of Hello World examples for machine learning and AI on social media. Here, the Hello World means beginner-friendly examples t [...]
Similar to DeepSeek V3, the team released their new flagship model over a major US holiday weekend. Given DeepSeek V3.2's really good performance (on [...]
This short article compiles a few notes I previously shared when readers asked how to get the most out of my Build a Large Language Model From Scratch [...]
After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions wi [...]
The DGX Spark for local LLM inferencing and fine-tuning was a pretty popular discussion topic recently. I got to play with one myself, primarily worki [...]
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples [...]
Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the. [...]
OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, [...]
It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 an [...]
The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025. [...]
KV caches are one of the most critical techniques for efficient LLM inference in production and an important component for compute-effi [...]
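The idea behind a KV cache can be sketched in a few lines: during autoregressive decoding, each token's key and value projections are computed once, appended to a cache, and reused at every later step instead of being recomputed for the whole prefix. A toy sketch with random vectors standing in for projected tokens (not the article's code):

```python
import numpy as np

def attend(q, K, V):
    """Single-query scaled dot-product attention over the keys/values seen so far."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
# Simulate 6 decoding steps; each "token's" k and v are computed once and cached.
K_cache, V_cache, outputs = [], [], []
for step in range(6):
    q = rng.normal(size=d)
    K_cache.append(rng.normal(size=d))  # stand-in for W_k @ token embedding
    V_cache.append(rng.normal(size=d))  # stand-in for W_v @ token embedding
    outputs.append(attend(q, np.array(K_cache), np.array(V_cache)))
```

Without the cache, step *t* would recompute all *t* key/value projections, which is where the quadratic recompute cost comes from.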
Why build an LLM from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a [...]
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions [...]
As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to off [...]
This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on the inference-time compute scaling methods that have em [...]
In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope [...]
This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision. [...]
This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, L [...]
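For readers who want the gist before opening the notebook: the core of BPE is a loop that repeatedly finds the most frequent adjacent symbol pair and merges it into one new symbol. A minimal sketch of a single merge step (a simplification; real BPE also records the learned merge rules for encoding new text):

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs across all words (word tuple -> frequency)."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: pre-tokenized words as symbol tuples with frequencies.
vocab = {("l", "o", "w"): 5, ("l", "o", "t"): 3, ("l", "o", "w", "e", "r"): 2}
best = most_frequent_pair(vocab)  # ("l", "o") occurs 10 times here
vocab = merge_pair(vocab, best)
```

Repeating this loop until a target vocabulary size is reached yields the merge table used by GPT-style tokenizers.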
I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will [...]
There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural... [...]
This article shows you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First... [...]
This tutorial is aimed at coders interested in understanding the building blocks of large language models (LLMs), how LLMs work, and how to code them [...]
There are hundreds of LLM papers each month proposing new techniques and approaches. However, one of the best ways to see what actually works well in. [...]
This article covers a new, cost-effective method for generating data for instruction finetuning LLMs; instruction finetuning from scratch; pretraining [...]
This is an overview of the LLM development process. This one-hour talk focuses on the essential three stages of developing an LLM: coding the architec [...]
This article covers three new papers related to instruction finetuning and parameter-efficient finetuning with LoRA in large language models (LLMs). I [...]
What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and [...]
What are the different ways to use and finetune pretrained large language models (LLMs)? The three most common ways to use and finetune pretrained LLM [...]
It's another month in AI research, and it's hard to pick favorites. This month, I am going over a paper that discusses strategies for the continued... [...]
Once again, this has been an exciting month in AI research. This month, I'm covering two new openly available LLMs, insights into small finetuned LLMs [...]
Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit [...]
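The core idea is compact enough to sketch: instead of updating the full weight matrix W, LoRA learns a low-rank update B @ A (with rank r much smaller than the layer dimensions) and adds it to the frozen W. A minimal NumPy sketch with illustrative shapes and names (not the article's code):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass with a frozen weight W plus a low-rank update B @ A.

    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in), B: (d_out, r) small trainable matrices, r << min(d_out, d_in)
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so training begins from the pretrained model
x = rng.normal(size=(3, d_in))
out = lora_forward(x, W, A, B)
```

Only A and B (2 * r * d values here instead of d_out * d_in) are trained, which is where the memory and storage savings come from.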
This article focuses on improving the modeling performance of LLMs by finetuning them using carefully curated datasets. Specifically, this article... [...]
Large language models (LLMs) offer one of the most interesting opportunities for developing more efficient training methods. A few weeks ago, the Neur [...]
Peak memory consumption is a common bottleneck when training deep learning models such as vision transformers and LLMs. This article provides a series [...]
Finetuning allows us to adapt pretrained LLMs in a cost-efficient manner. But which method should we use? This article compares different... [...]
Training and using large language models (LLMs) is expensive due to their large compute requirements and memory footprints. This article will explore [...]
Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them [...]
In the rapidly evolving field of artificial intelligence, utilizing large language models in an efficient and effective manner has become increasingly [...]
Previously, I shared an article using multi-GPU training strategies to speed up the finetuning of large language models. Several of these strategies i [...]
When it comes to productivity workflows, there are a lot of things I'd love to share. However, the one topic many people ask me about is how I keep up [...]
This blog post outlines techniques for improving the training performance of your PyTorch model without compromising its accuracy. To do so, we will w [...]
In this article, we are going to understand how self-attention works from scratch. This means we will code it ourselves one step at a time. Since its. [...]
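As a preview of the step-by-step derivation, scaled dot-product self-attention fits in a few lines of NumPy. A minimal sketch with illustrative shapes (not the article's exact code):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence.

    X: (seq_len, d_in) token embeddings; W_q/W_k/W_v: (d_in, d_k) projections.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
```

Each output row is a weighted average of the value vectors, with weights given by how strongly that token's query matches every token's key.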
Since transformers have such a big impact on everyone's research agenda, I wanted to flesh out a short reading list for machine learning researchers a [...]
Since the release of the AI Classifier by OpenAI made big waves yesterday, I wanted to share a few details about the different approaches for detectin [...]
Data augmentation is a key tool in reducing overfitting, whether it's for images or text. This article compares three Auto Image Data Augmentation... [...]
Conversational chatbots such as ChatGPT probably will not be able to replace traditional search engines and expert knowledge anytime soon. With the vast [...]
Imagine you want to quickly train a few machine learning or deep learning models on the cloud but don't want to deal with cloud infrastructure. This s [...]
Recently, I shared the top 10 papers that I read in 2022. As a follow-up, I am compiling a list of my favorite 10 open-source releases that I discover [...]
Every day brings something new and exciting to the world of machine learning and AI, from the latest developments and breakthroughs in the field to em [...]
About monthly machine learning musings, and other things I am currently working on ... [...]
Occasionally, I share research papers proposing new deep learning approaches for tabular data on social media, which is typically an excellent discuss [...]
Regarding neural network training, I think we are all guilty of doing this: we choose our batch sizes as powers of 2, that is, 64, 128, 256, 512, 1024 [...]
In this article, we will deploy a Super Resolution App on the cloud using lightning.ai. The primary goal here is to see how easy it is to create [...]
In this post, we will build a Lightning App. Why? Because it is 2022, and it is time to explore a more modern take on interacting with, presenting, an [...]
The PyTorch team recently announced TorchData, a prototype library focused on implementing composable and reusable data loading utilities for PyTorch. [...]
Today, PyTorch officially introduced GPU support for Apple's ARM M1 chips. This is an exciting day for Mac users out there, so I spent a few minutes t [...]
Developing good predictive models hinges upon accurate performance evaluation and comparisons. However, when evaluating machine learning models, we... [...]
The cross-entropy loss is our go-to loss for training deep learning-based classifiers. In this article, I am giving you a quick tour of how we usually [...]
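For reference, the loss itself is short: cross-entropy is the negative log of the softmax probability assigned to the correct class. A minimal sketch using the log-sum-exp trick for numerical stability:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a one-hot target class.

    Computed as logsumexp(logits) - logits[target] = -log softmax(logits)[target].
    """
    m = max(logits)  # subtract the max so exp() never overflows
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Uniform logits over C classes give a loss of log(C); here log(3) ≈ 1.0986.
print(round(cross_entropy([0.0, 0.0, 0.0], 0), 4))
```

A confident, correct prediction (large logit for the target class) drives the loss toward zero.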
TorchMetrics is a really nice and convenient library that lets us compute the performance of models in an iterative fashion. It's designed with PyTorc [...]
Machine Learning with PyTorch and Scikit-Learn has been a long time in the making, and I am excited to finally get to talk about the release of my new [...]
About half a year ago, I organized all my deep learning-related videos in a handy blog post to have everything in one place. Since many people liked t [...]
I just sat down this morning and organized all deep learning related videos I recorded in 2021. I am sure this will be a useful reference for my futur [...]
With the semester being in full swing, I recently shared this set of dataset repositories with my deep learning class. However, I thought that beyond [...]
After its release in August 2020, Deep Learning with PyTorch has been sitting on my shelf before I finally got a chance to read it during this winter [...]
Since I started my undergraduate studies in 2008, I have been obsessed with productivity tips, notetaking solutions, and todo-list management. Over th [...]
Since many students in my Stat 451 (Introduction to Machine Learning and Statistical Pattern Classification) class are relatively new to Python and Nu [...]
In this blog post, I am (briefly) reviewing Christoph Molnar's *Interpretable Machine Learning Book*. Then, I am writing about two classic generalized [...]
The first chapter (draft) of the Introduction to Deep Learning book, which is a book based on my lecture notes and slides. [...]
A brief review of Martin Ford's book that features interviews with 23 of the most well-known and brightest minds working on AI. [...]
A brief summary of what's new in the 3rd edition of Python Machine Learning. [...]
Not too long ago, in the Summer of 2018, I was super excited to join the Department of Statistics at the University of Wisconsin-Madison after obtaini [...]
This final article in the series *Model evaluation, model selection, and algorithm selection in machine learning* presents overviews of several statis [...]
I thought that it would be nice to have short and concise summaries of recent projects handy, to share them with a more general audience, including... [...]
Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to spe [...]
In this second part of this series, we will look at some advanced techniques for model evaluation and techniques to estimate the uncertainty of our... [...]
Machine learning has become a central part of our life -- as consumers, customers, and hopefully as researchers and practitioners! Whether we are appl [...]
It's been about time. I am happy to announce that "Python Machine Learning" was finally released today! Sure, I could just send an email around to all [...]
This has really been quite a journey for me lately. And regarding the frequently asked question "Why did you choose Python for Machine Learning?" I gu [...]
This article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described [...]
Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as s [...]
Here, I want to present a simple and conservative approach of implementing a weighted majority rule ensemble classifier in scikit-learn that yielded.. [...]
In this article, I want to share my experience with a recent data mining project, which was probably one of my favorite hobby projects so far. It' [...]
Last week, I posted some visualizations in context of Happy Rock Song data mining project, and some people were curious about how I created the word c [...]
Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes' probability theorem, are known for creating simple yet well perf [...]
The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is us [...]
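The kernel itself is a one-liner: similarity decays exponentially with squared Euclidean distance. A minimal sketch (the `gamma` parameterization is one common convention; 1/(2σ²) is another):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# k(x, x) is always 1; similarity falls off as points move apart.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
```

Plugging this into a kernelized algorithm (an SVM, kernel PCA, etc.) implicitly maps the data into an infinite-dimensional feature space.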
When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big pictu [...]
I received a lot of positive feedback about the step-wise Principal Component Analysis (PCA) implementation. Thus, I decided to write a little follow- [...]
I recently faced the impossible task of identifying outliers in a dataset with very, very small sample sizes, and Dixon's Q test caught my attention. Hone [...]
I received a couple of questions in response to my previous article (Entry point: Data) where people asked me why I used Z-score standardization as fe [...]
In this short tutorial I want to provide a short overview of some of my favorite Python tools for common procedures as entry points for general patter [...]
Discussions and questions about methods, approaches, and tools for estimating (relative) binding free energies of protein-ligand complexes are quite.. [...]
The default Python interpreter was designed with simplicity in mind and has a thread-safe mechanism, the so-called "GIL" (Global Interpreter Lock). In [...]
At its core, this article is about a simple cheat sheet for basic operations on numeric matrices, which can be very useful if you are working and experime [...]
The Parzen-window method (also known as Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate a probability density fu [...]
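The estimator is simple to sketch: average a kernel function (here a Gaussian) centered at each observed sample. A minimal one-dimensional sketch (illustrative, not the article's code):

```python
import math

def parzen_density(x, samples, h):
    """Parzen-window density estimate at x with a Gaussian kernel of bandwidth h."""
    n = len(samples)
    return sum(
        math.exp(-((x - xi) / h) ** 2 / 2) / (h * math.sqrt(2 * math.pi))
        for xi in samples
    ) / n

samples = [0.0, 0.5, 1.0]
print(round(parzen_density(0.5, samples, h=0.5), 4))
```

The bandwidth h controls the bias-variance trade-off: small h gives a spiky estimate, large h oversmooths.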
Many beginning Python users are wondering with which version of Python they should start. My answer to this question is usually something along the li [...]
In this little tutorial, I want to show you in 5 simple steps how easy it is to add code syntax highlighting to your blog articles. [...]
Many people have asked me how I create the table of contents with internal links for my IPython Notebooks and Markdown documents on GitHub. Well, no.. [...]
A short tutorial about Python's namespaces and the scope resolution for variable names using the LEGB-rule with little quiz-like exercises. [...]
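The LEGB rule in one self-contained example: a name is resolved in the Local scope first, then the Enclosing function scope, then Global, and finally Built-in:

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x       # Local wins: L is checked before E, G, and B
    return inner(), x  # outer's own x is the Enclosing-scope binding

# Name lookup walks Local -> Enclosing -> Global -> Built-in (LEGB).
print(outer())  # ('local', 'enclosing')
print(x)        # 'global' -- the assignments inside the functions never touched it
print(len)      # no local/enclosing/global `len`, so lookup reaches the Built-in scope
```

Deleting the inner assignments one at a time shows the lookup falling through each layer in turn.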
Some while ago, I started to collect some of the not-so-obvious things I encountered when I was coding in Python. I thought that it was worthwhile sha [...]
In this article I want to explain how a Principal Component Analysis (PCA) works by implementing it in Python step by step. At the end we will compare [...]
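As a compact preview of the step-by-step version: center the data, eigendecompose the covariance matrix, and project onto the top eigenvectors. A minimal NumPy sketch (not the article's exact code, which builds these steps up individually):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    Returns the projected data and the top principal components.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]          # (n_features, n_components)
    return X_centered @ components, components

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X_proj, components = pca(X, n_components=2)
```

Comparing the result against scikit-learn's `PCA` (up to sign flips of the components) is a good sanity check, which is roughly what the article does at the end.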
I just went through some pain (again) when I wanted to install some of Python's scientific libraries on my second Mac. I summarized the setup and... [...]
After I wrote the initial teaser article "SQLite - Working with large data sets in Python effectively" about how awesome SQLite databases are via sqli [...]
This is a quickguide showing how to use OpenEye software command line tools to align target molecules to a query based on substructure matches and how [...]
Let's be honest, code testing is everything but a joyful task. However, a good unit testing framework makes this process as smooth as possible. Eventu [...]
I received many questions from people who want to quickly visualize their data via heat maps - ideally as quickly as possible. This is the major issue [...]