rssed

a collection of dev RSS feeds - blogroll

319 feeds


Sebastian Raschka, PhD

Posts

Claude Code's Real Secret Sauce Isn't the Model πŸ”—

Skimming the leaked Claude Code TypeScript codebase suggests that much of its coding performance over the plain model in the web UI comes from the sur [...]

A Visual Guide to Attention Variants in Modern LLMs πŸ”—

From MHA and GQA to MLA, sparse attention, and hybrid architectures [...]

New LLM Architecture Gallery πŸ”—

I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with co [...]

A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026 πŸ”—

A Roundup and Comparison of 10 Open-Weight LLM Releases in Spring 2026 [...]

State of AI 2026 with Sebastian Raschka, Nathan Lambert, and Lex Fridman πŸ”—

I recently sat down with Lex Fridman and Nathan Lambert for a comprehensive 4.5-hour interview to discuss the current state of progress in AI, and what t [...]

Categories of Inference-Time Scaling for Improved LLM Reasoning πŸ”—

Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea is straightforward. If w [...]

The State Of LLMs 2025: Progress, Problems, and Predictions πŸ”—

A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026. [...]

LLM Research Papers: The 2025 List (July to December) πŸ”—

A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficien [...]

From Random Forests to RLVR: A Short History of ML/AI Hello Worlds πŸ”—

Two years ago, I posted a list of Hello World examples for machine learning and AI on social media. Here, Hello World means beginner-friendly examples t [...]

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates πŸ”—

Similar to DeepSeek V3, the team released their new flagship model over a major US holiday weekend. Given DeepSeek V3.2's really good performance (on [...]

Recommendations for Getting the Most Out of a Technical Book πŸ”—

This short article compiles a few notes I previously shared when readers asked how to get the most out of my Build a Large Language Model (From Scratch) [...]

Beyond Standard LLMs πŸ”—

After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions wi [...]

DGX Spark and Mac Mini for Local PyTorch Development πŸ”—

The DGX Spark for local LLM inferencing and fine-tuning was a pretty popular discussion topic recently. I got to play with one myself, primarily worki [...]

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch) πŸ”—

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples [...]

Understanding and Implementing Qwen3 From Scratch πŸ”—

Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the [...]

From GPT-2 to gpt-oss: Analyzing the Architectural Advances πŸ”—

OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, [...]

The Big LLM Architecture Comparison πŸ”—

It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 an [...]

LLM Research Papers: The 2025 List (January to June) πŸ”—

The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025. [...]

Understanding and Coding the KV Cache in LLMs from Scratch πŸ”—

KV caches are one of the most critical techniques for efficient LLM inference in production and an important component for compute-effi [...]

Coding LLMs from the Ground Up: A Complete Course πŸ”—

Why build an LLM from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a [...]

The State of Reinforcement Learning for LLM Reasoning πŸ”—

A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions [...]

First Look at Reasoning From Scratch: Chapter 1 πŸ”—

As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to off [...]

Inference-Time Compute Scaling Methods to Improve Reasoning Models πŸ”—

This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling methods that have em [...]

Understanding Reasoning LLMs πŸ”—

In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope [...]

Noteworthy LLM Research Papers of 2024 πŸ”—

This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision. [...]

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch πŸ”—

This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm, which is used in models like GPT-2 to GPT-4, L [...]

LLM Research Papers: The 2024 List πŸ”—

I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will [...]

Understanding Multimodal LLMs πŸ”—

There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural... [...]

Building A GPT-Style LLM Classifier From Scratch πŸ”—

This article shows you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First.. [...]

Building LLMs from the Ground Up: A 3-hour Coding Workshop πŸ”—

This tutorial is aimed at coders interested in understanding the building blocks of large language models (LLMs), how LLMs work, and how to code them [...]

New LLM Pre-training and Post-training Paradigms πŸ”—

There are hundreds of LLM papers each month proposing new techniques and approaches. However, one of the best ways to see what actually works well in [...]

Instruction Pretraining LLMs πŸ”—

This article covers a new, cost-effective method for generating data for instruction finetuning LLMs; instruction finetuning from scratch; pretraining [...]

Developing an LLM: Building, Training, Finetuning πŸ”—

This is an overview of the LLM development process. This one-hour talk focuses on the essential three stages of developing an LLM: coding the architec [...]

LLM Research Insights: Instruction Masking and New LoRA Finetuning Experiments? πŸ”—

This article covers three new papers related to instruction finetuning and parameter-efficient finetuning with LoRA in large language models (LLMs). I [...]

How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? πŸ”—

What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and [...]

Using and Finetuning Pretrained Transformers πŸ”—

What are the different ways to use and finetune pretrained large language models (LLMs)? The three most common ways to use and finetune pretrained LLM [...]

Tips for LLM Pretraining and Evaluating Reward Models πŸ”—

It's another month in AI research, and it's hard to pick favorites. This month, I am going over a paper that discusses strategies for the continued... [...]

Research Papers in February 2024 πŸ”—

Once again, this has been an exciting month in AI research. This month, I'm covering two new openly available LLMs, insights into small finetuned LLMs [...]

Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch πŸ”—

Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit [...]

Optimizing LLMs From a Dataset Perspective πŸ”—

This article focuses on improving the modeling performance of LLMs by finetuning them using carefully curated datasets. Specifically, this article... [...]

The NeurIPS 2023 LLM Efficiency Challenge Starter Guide πŸ”—

Large language models (LLMs) offer one of the most interesting opportunities for developing more efficient training methods. A few weeks ago, the Neur [...]

Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch πŸ”—

Peak memory consumption is a common bottleneck when training deep learning models such as vision transformers and LLMs. This article provides a series [...]

Finetuning Falcon LLMs More Efficiently With LoRA and Adapters πŸ”—

Finetuning allows us to adapt pretrained LLMs in a cost-efficient manner. But which method should we use? This article compares different... [...]

Accelerating Large Language Models with Mixed-Precision Techniques πŸ”—

Training and using large language models (LLMs) is expensive due to their large compute requirements and memory footprints. This article will explore [...]

Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) πŸ”—

Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them [...]

Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters πŸ”—

In the rapidly evolving field of artificial intelligence, utilizing large language models in an efficient and effective manner has become increasingly [...]

Finetuning Large Language Models On A Single GPU Using Gradient Accumulation πŸ”—

Previously, I shared an article using multi-GPU training strategies to speed up the finetuning of large language models. Several of these strategies i [...]

Keeping Up With AI Research And News πŸ”—

When it comes to productivity workflows, there are a lot of things I'd love to share. However, the one topic many people ask me about is how I keep up [...]

Some Techniques To Make Your PyTorch Models Train (Much) Faster πŸ”—

This blog post outlines techniques for improving the training performance of your PyTorch model without compromising its accuracy. To do so, we will w [...]

Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch πŸ”—

In this article, we are going to understand how self-attention works from scratch. This means we will code it ourselves one step at a time. Since its [...]

Understanding Large Language Models -- A Transformative Reading List πŸ”—

Since transformers have such a big impact on everyone's research agenda, I wanted to flesh out a short reading list for machine learning researchers a [...]

What Are the Different Approaches for Detecting Content Generated by LLMs Such As ChatGPT? And How Do They Work and Differ? πŸ”—

Since the release of the AI Classifier by OpenAI made big waves yesterday, I wanted to share a few details about the different approaches for detectin [...]

Comparing Different Automatic Image Augmentation Methods in PyTorch πŸ”—

Data augmentation is a key tool in reducing overfitting, whether it's for images or text. This article compares three Auto Image Data Augmentation... [...]

Curated Resources and Trustworthy Experts: The Key Ingredients for Finding Accurate Answers to Technical Questions in the Future πŸ”—

Conversational chatbots such as ChatGPT probably will not be able to replace traditional search engines and expert knowledge anytime soon. With the vast [...]

Training an XGBoost Classifier Using Cloud GPUs Without Worrying About Infrastructure πŸ”—

Imagine you want to quickly train a few machine learning or deep learning models on the cloud but don't want to deal with cloud infrastructure. This s [...]

Open Source Highlights 2022 for Machine Learning & AI πŸ”—

Recently, I shared the top 10 papers that I read in 2022. As a follow-up, I am compiling a list of my favorite 10 open-source releases that I discover [...]

Influential Machine Learning Papers Of 2022 πŸ”—

Every day brings something new and exciting to the world of machine learning and AI, from the latest developments and breakthroughs in the field to em [...]

Ahead Of AI, And What's Next? πŸ”—

About monthly machine learning musings, and other things I am currently working on ... [...]

A Short Chronology Of Deep Learning For Tabular Data πŸ”—

Occasionally, I share research papers proposing new deep learning approaches for tabular data on social media, which is typically an excellent discuss [...]

No, We Don't Have to Choose Batch Sizes As Powers Of 2 πŸ”—

Regarding neural network training, I think we are all guilty of doing this: we choose our batch sizes as powers of 2, that is, 64, 128, 256, 512, 1024 [...]

Sharing Deep Learning Research Models with Lightning Part 2: Leveraging the Cloud πŸ”—

In this article, we will deploy a Super Resolution App on the cloud using lightning.ai. The primary goal here is to see how easy it is to create [...]

Sharing Deep Learning Research Models with Lightning Part 1: Building A Super Resolution App πŸ”—

In this post, we will build a Lightning App. Why? Because it is 2022, and it is time to explore a more modern take on interacting with, presenting, an [...]

Taking Datasets, DataLoaders, and PyTorch's New DataPipes for a Spin πŸ”—

The PyTorch team recently announced TorchData, a prototype library focused on implementing composable and reusable data loading utilities for PyTorch. [...]

Running PyTorch on the M1 GPU πŸ”—

Today, PyTorch officially introduced GPU support for Apple's ARM M1 chips. This is an exciting day for Mac users out there, so I spent a few minutes t [...]

Creating Confidence Intervals for Machine Learning Classifiers πŸ”—

Developing good predictive models hinges upon accurate performance evaluation and comparisons. However, when evaluating machine learning models, we... [...]

Losses Learned πŸ”—

The cross-entropy loss is our go-to loss for training deep learning-based classifiers. In this article, I am giving you a quick tour of how we usually [...]

TorchMetrics πŸ”—

TorchMetrics is a really nice and convenient library that lets us compute the performance of models in an iterative fashion. It's designed with PyTorc [...]

Machine Learning with PyTorch and Scikit-Learn πŸ”—

Machine Learning with PyTorch and Scikit-Learn has been a long time in the making, and I am excited to finally get to talk about the release of my new [...]

Introduction to Machine Learning πŸ”—

About half a year ago, I organized all my deep learning-related videos in a handy blog post to have everything in one place. Since many people liked t [...]

Introduction to Deep Learning πŸ”—

I just sat down this morning and organized all the deep learning-related videos I recorded in 2021. I am sure this will be a useful reference for my futur [...]

Datasets for Machine Learning and Deep Learning πŸ”—

With the semester being in full swing, I recently shared this set of dataset repositories with my deep learning class. However, I thought that beyond [...]

Book Review: Deep Learning With PyTorch πŸ”—

After its release in August 2020, Deep Learning with PyTorch had been sitting on my shelf before I finally got a chance to read it during this winter [...]

How I Keep My Projects Organized πŸ”—

Since I started my undergraduate studies in 2008, I have been obsessed with productivity tips, note-taking solutions, and to-do list management. Over th [...]

Scientific Computing in Python: Introduction to NumPy and Matplotlib πŸ”—

Since many students in my Stat 451 (Introduction to Machine Learning and Statistical Pattern Classification) class are relatively new to Python and Nu [...]

Interpretable Machine Learning πŸ”—

In this blog post, I am (briefly) reviewing Christoph Molnar's *Interpretable Machine Learning Book*. Then, I am writing about two classic generalized [...]

Chapter 1: Introduction to Machine Learning and Deep Learning πŸ”—

The first chapter (draft) of the Introduction to Deep Learning book, which is a book based on my lecture notes and slides. [...]

Book Review: Architects of Intelligence by Martin Ford πŸ”—

A brief review of Martin Ford's book that features interviews with 23 of the most well-known and brightest minds working on AI. [...]

What's New in the 3rd Edition πŸ”—

A brief summary of what's new in the 3rd edition of Python Machine Learning. [...]

My First Year at UW-Madison and a Gallery of Awesome Student Projects πŸ”—

Not too long ago, in the Summer of 2018, I was super excited to join the Department of Statistics at the University of Wisconsin-Madison after obtaini [...]

Model evaluation, model selection, and algorithm selection in machine learning πŸ”—

This final article in the series *Model evaluation, model selection, and algorithm selection in machine learning* presents overviews of several statis [...]

Generating Gender-Neutral Face Images with Semi-Adversarial Neural Networks to Enhance Privacy πŸ”—

I thought that it would be nice to have short and concise summaries of recent projects handy, to share them with a more general audience, including... [...]

Model evaluation, model selection, and algorithm selection in machine learning πŸ”—

Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to spe [...]

Model evaluation, model selection, and algorithm selection in machine learning πŸ”—

In this second part of this series, we will look at some advanced techniques for model evaluation and techniques to estimate the uncertainty of our... [...]

Model evaluation, model selection, and algorithm selection in machine learning πŸ”—

Machine learning has become a central part of our life -- as consumers, customers, and hopefully as researchers and practitioners! Whether we are appl [...]

Writing 'Python Machine Learning' πŸ”—

It's about time. I am happy to announce that "Python Machine Learning" was finally released today! Sure, I could just send an email around to all [...]

Python, Machine Learning, and Language Wars πŸ”—

This has really been quite a journey for me lately. And regarding the frequently asked question "Why did you choose Python for Machine Learning?" I gu [...]

Single-Layer Neural Networks and Gradient Descent πŸ”—

This article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described [...]

Principal Component Analysis πŸ”—

Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as s [...]

Implementing a Weighted Majority Rule Ensemble Classifier πŸ”—

Here, I want to present a simple and conservative approach of implementing a weighted majority rule ensemble classifier in scikit-learn that yielded.. [...]

MusicMood πŸ”—

In this article, I want to share my experience with a recent data mining project, which was probably one of my favorite hobby projects so far. It' [...]

Turn Your Twitter Timeline into a Word Cloud πŸ”—

Last week, I posted some visualizations in the context of my Happy Rock Song data mining project, and some people were curious about how I created the word c [...]

Naive Bayes and Text Classification πŸ”—

Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes' probability theorem, are known for creating simple yet well perf [...]

Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA πŸ”—

The focus of this article is to briefly introduce the idea of kernel methods and to implement a Gaussian radial basis function (RBF) kernel that is us [...]

Predictive modeling, supervised machine learning, and pattern classification πŸ”—

When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big pictu [...]

Linear Discriminant Analysis πŸ”—

I received a lot of positive feedback about the step-wise Principal Component Analysis (PCA) implementation. Thus, I decided to write a little follow- [...]

Dixon's Q test for outlier identification πŸ”—

I recently faced the impossible task of identifying outliers in a dataset with very, very small sample sizes, and Dixon's Q test caught my attention. Hone [...]

About Feature Scaling and Normalization πŸ”—

I received a couple of questions in response to my previous article (Entry point: Data) where people asked me why I used Z-score standardization as fe [...]

Entry Point Data πŸ”—

In this short tutorial I want to provide a short overview of some of my favorite Python tools for common procedures as entry points for general patter [...]

Molecular docking, estimating free energies of binding, and AutoDock's semi-empirical force field πŸ”—

Discussions and questions about methods, approaches, and tools for estimating (relative) binding free energies of protein-ligand complexes are quite [...]

An introduction to parallel programming using Python's multiprocessing module πŸ”—

The default Python interpreter was designed with simplicity in mind and has a thread-safe mechanism, the so-called "GIL" (Global Interpreter Lock). In [...]

Numeric matrix manipulation πŸ”—

At its core, this article is about a simple cheat sheet for basic operations on numeric matrices, which can be very useful if you are working and experime [...]

Kernel density estimation via the Parzen-Rosenblatt window method πŸ”—

The Parzen-window method (also known as Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate a probability density fu [...]

The key differences between Python 2.7.x and Python 3.x with examples πŸ”—

Many beginning Python users wonder which version of Python they should start with. My answer to this question is usually something along the li [...]

5 simple steps for converting Markdown documents into HTML and adding Python syntax highlighting πŸ”—

In this little tutorial, I want to show you in 5 simple steps how easy it is to add code syntax highlighting to your blog articles. [...]

Creating a table of contents with internal links in IPython Notebooks and Markdown documents πŸ”—

Many people have asked me how I create the table of contents with internal links for my IPython Notebooks and Markdown documents on GitHub. Well, no.. [...]

A Beginner's Guide to Python's Namespaces, Scope Resolution, and the LEGB Rule πŸ”—

A short tutorial about Python's namespaces and the scope resolution for variable names using the LEGB-rule with little quiz-like exercises. [...]

Diving deep into Python πŸ”—

Some while ago, I started to collect some of the not-so-obvious things I encountered when I was coding in Python. I thought that it was worthwhile sha [...]

Implementing a Principal Component Analysis (PCA) πŸ”—

In this article I want to explain how a Principal Component Analysis (PCA) works by implementing it in Python step by step. At the end we will compare [...]

Installing Scientific Packages for Python3 on MacOS 10.9 Mavericks πŸ”—

I just went through some pain (again) when I wanted to install some of Python's scientific libraries on my second Mac. I summarized the setup and... [...]

A thorough guide to SQLite database operations in Python πŸ”—

After I wrote the initial teaser article "SQLite - Working with large data sets in Python effectively" about how awesome SQLite databases are via sqli [...]

Using OpenEye software for substructure alignments πŸ”—

This is a quick guide showing how to use OpenEye software command line tools to align target molecules to a query based on substructure matches and how [...]

Unit testing in Python πŸ”—

Let's be honest, code testing is anything but a joyful task. However, a good unit testing framework makes this process as smooth as possible. Eventu [...]

A short tutorial for decent heat maps in R πŸ”—

I received many questions from people who want to visualize their data via heat maps - ideally as quickly as possible. This is the major issue [...]

SQLite πŸ”—

My new project confronted me with the task of screening a massive set of large data files in text format with billions of entries each. I will have to [...]