All posts

From the forge

Writing about platform engineering, agentic AI systems, Python, and cloud infrastructure.

Databricks · Spark

How Apache Spark Works (Short Summary)

A concise overview of why Apache Spark was created, how RDDs enable in-memory processing for iterative and interactive workloads, and its key programming abstractions.

8 Aug 2013 · 2 min read
Big Data · YARN

YARN vs Mesos

A comparison of YARN and Mesos as cluster resource managers, including their architectural differences, recent developments, and insights from the Google Omega paper.

3 Aug 2013 · 2 min read
Data Engineering · Big Data

Handle Schema Changes and Evolution in Hadoop

Approaches for handling schema evolution in Hadoop using Avro and ORC file formats, including a practical workflow for managing schema changes with Hive.

30 Mar 2013 · 2 min read
Java · Data Engineering

Chain Mapper Example

How to use the ChainMapper class in Hadoop to call multiple mappers in sequence, with a working example and key points about configuration and type compatibility.

16 Feb 2013 · 3 min read
Data Engineering · Big Data

Merging Small Files in Hadoop

The small files problem in Hadoop and five approaches to solve it: HDFSConcat, IdentityMapper/Reducer, FileUtil.copyMerge, Hadoop File Crush, and Hive concatenate.

26 Jan 2013 · 3 min read
HBase

How HBase Minor Compaction Works

Understanding HBase minor compaction -- how files are selected for compaction using the ratio algorithm, with a worked example showing the selection logic.

2 Jan 2013 · 2 min read
HBase

How HBase Major Compaction Works

Understanding HBase major compaction -- how it differs from minor compaction, the configuration properties that control it, and the three methods that trigger it.

2 Jan 2013 · 2 min read
Java · Data Engineering

Hadoop Development Environment in Eclipse

How to set up a Hadoop development environment in Eclipse with the WordCount MapReduce example.

24 Nov 2011 · 2 min read
Data Engineering · Big Data

Hadoop Windows Cygwin error tmp folder file permissions

Error: java.io.IOException: Failed to set permissions of path: file:/tmp/hadoopjj/mapred/staging/jj1931875024/.staging to 0700. The details for my Hadoop we...

26 Oct 2011 · 1 min read
Data Engineering · Big Data

Hadoop Setup in Windows

Quick guide to installing Hadoop on Windows with Cygwin, including Java configuration and verification steps.

5 Oct 2011 · 1 min read
CSS3 · Web Accessibility

CSS3 Speech Module

The CSS3 Speech module (http://www.w3.org/TR/css3speech/) is now in Last Call as of Aug 18, 2011. The journey toward accessible web standards has come far from...

21 Aug 2011 · 2 min read
CSS3

Graceful degradation and Progressive Enhancement

Graceful degradation and progressive enhancement are two different web design strategies. Progressive enhancement: in progressive enhancement you design...

15 Aug 2011 · 1 min read
Linux · Ubuntu

Add Environment variables in Linux permanently

There are two ways to set up environment variables on Linux systems (Ubuntu, Red Hat, Fedora, etc.). Using the export method of adding environment variables...

29 June 2011 · 2 min read
Linux · Java

How to Install Java and JDK in Linux (Ubuntu)

Step-by-step guide to manually installing the Java JDK on Linux, including download, permissions, and environment variable setup.

25 June 2011 · 2 min read
Linux · Tomcat

How to Install Tomcat in Linux (Ubuntu)

Step-by-step guide to manually installing Apache Tomcat on Linux, including download, permissions, environment variables, and startup.

25 June 2011 · 2 min read
Microsoft Office

Excel cell drag plus sign not coming

This explains how to fix the problem when the Excel cell drag function doesn't work or the plus sign doesn't appear when you want to drag cells in Excel 2010, 2007 o...

11 June 2011 · 1 min read
Linux

How to Know Which Linux Version You Are Running

Quick tip on using the uname command to check your Linux kernel version and system information.

3 Apr 2011 · 1 min read