My Articles
Building a Note-Taking System That Actually Works
2026-03-18 · kareem.cloud
The author argues that effective note-taking systems are highly personal and emerge from individual experimentation rather than from universal best practices—successful systems combine techniques from various frameworks (PARA, Zettelkasten, etc.) tailored to one’s specific needs. Note-taking fails when systems require excessive maintenance and rigid organization that doesn’t adapt as thinking and interests evolve across different subjects. The key recommendation is to continuously test new approaches and methodologies, keeping what works while discarding what doesn’t, to create a personalized “Frankenstein” system optimized for your unique workflow.
The RECON Framework for LLM Inference
2025-12-28 · Day 1 Inference
A foundational examination of how modern systems serve large language models efficiently, introducing the five-layer RECON framework: Routing, Engine, Cache, Orchestration, and Nodes. The article argues for “goodput” as a unified metric measuring sustainable request rates while meeting service-level objectives, rather than optimizing raw throughput or latency independently.
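The goodput idea can be sketched in a few lines: instead of counting all completed requests, count only those that met the latency service-level objective. This is a minimal illustration, not the article’s implementation; the request data and the 200 ms SLO are made-up values.

```python
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float  # observed end-to-end latency for one request

def goodput(requests, window_s, slo_ms):
    """Requests per second that completed within the latency SLO.

    Raw throughput would be len(requests) / window_s; goodput only
    credits requests that actually met the service-level objective.
    """
    met = sum(1 for r in requests if r.latency_ms <= slo_ms)
    return met / window_s

# 4 requests served in a 1-second window; 3 of them met a 200 ms SLO.
reqs = [Request(80), Request(120), Request(450), Request(95)]
print(goodput(reqs, window_s=1.0, slo_ms=200))  # 3.0, vs. raw throughput of 4.0
```

The gap between the two numbers is the point: a system can look fast on raw throughput while a meaningful fraction of its traffic violates the SLO.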
Announcing Capacity Blocks support for AWS Parallel Computing Service
2025-09-18 · AWS Blog
AWS PCS now supports Amazon EC2 Capacity Blocks, enabling reserved GPU-accelerated instances like NVIDIA Hopper and AWS Trainium for future ML and HPC workloads. Key benefits include reserved access, flexible scheduling, resource sharing, and seamless integration with PCS clusters.
The Myth of Autonomous Agents
2025-07-09 · Medium
Agentic workflows are essentially state machines with predefined actions, not the autonomous decision-making systems often marketed by AI companies. While current implementations technically fit the definition of agents from 2002, they operate within highly restricted environments with narrow toolsets and hard-coded objectives, not the flexible, open-ended autonomous systems the hype suggests. The gap between marketing promises and reality reflects ambiguous definitions that allow people to imagine far more capable systems than what actually exists today.
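The claim that agentic workflows are state machines with predefined actions can be made concrete with a toy loop. Everything here (the state names, the single `search` tool, the fixed objective) is illustrative, not taken from the article:

```python
# A toy "agent" written as an explicit state machine: hard-coded states,
# a narrow predefined toolset, and a fixed transition for every state.
# There is no open-ended decision-making anywhere in this loop.

TOOLS = {"search": lambda q: f"results for {q!r}"}  # the entire toolset

def run_agent(goal, max_steps=10):
    state, transcript = "PLAN", []
    for _ in range(max_steps):
        if state == "PLAN":
            transcript.append(("plan", goal))
            state = "ACT"                       # transition is hard-coded
        elif state == "ACT":
            transcript.append(("act", TOOLS["search"](goal)))
            state = "OBSERVE"
        elif state == "OBSERVE":
            transcript.append(("observe", "goal satisfied"))
            state = "DONE"
        if state == "DONE":
            break
    return state, transcript

final_state, log = run_agent("weather in Cairo")
print(final_state)  # DONE
```

Swapping the hard-coded transitions for an LLM call changes who picks the next state, but the set of reachable states and usable tools is still fixed in advance, which is the article’s point.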
Artificial Intelligence is Artificially Intimidating
2025-07-01 · Medium
The blog post explains that AI and machine learning, despite seeming complex, can be understood using simple middle school math concepts—specifically by finding trends in data and using equations to make predictions. Using an ice cream consumption example, it demonstrates how a basic linear equation (y = mx + b) can predict outcomes based on input variables, which is fundamentally how AI systems like ChatGPT work. The post emphasizes that more data and better formulas allow for more accurate predictions, making AI accessible to anyone willing to understand these foundational concepts.
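The article’s core point fits in a few lines of code: a linear equation y = mx + b turned into a prediction function. The slope and intercept below are made-up numbers for illustration, not values from the post:

```python
# The post's idea in code: fit a trend, then predict with y = m*x + b.
# m (slope) and b (intercept) here are arbitrary example values.
def predict_ice_cream_servings(temp_c, m=0.5, b=2.0):
    """Predict servings sold from temperature using a linear model."""
    return m * temp_c + b

print(predict_ice_cream_servings(30))  # 0.5 * 30 + 2.0 = 17.0
```

More data and a better-chosen m and b give better predictions, which is the post’s "better formulas, more data" argument in miniature.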
Parallelism or: How I Learned to Stop Crashing and Fit the Model
2025-06-04 · Medium
This blog post recounts the author’s struggle to train a large machine learning model on multiple GPUs, repeatedly hitting CUDA out-of-memory errors despite following documentation and optimization guides. The author identifies a gap between high-level conceptual frameworks (like the Roofline Model and parallelism techniques) and practical implementation, noting that existing documentation is either too abstract or too low-level to bridge understanding and real-world application. The post sets up a deeper exploration of scaling ML models by examining concrete implementations of concepts like data parallelism.