The “and” test for functions
There is a quick test for spotting a function that is really several functions fused together.
Describe what the function does in one sentence. If you need the word “and,” there is a good chance the function is doing more than one job.
“It parses the next token and updates the cursor.”
That is two things.
“It validates the input and returns either an error or a value.”
Still two things.
The test is not mathematically strict. English is messy. But it is useful: the word usually appears at the seam where two responsibilities have been fused together only because they happen to run in the same place.
A tokenizer makes this easy to see.
Here is the monolithic version. It scans the source, tracks line and column, skips whitespace, emits tokens, and produces error tokens for unknown characters.
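// Token shape inferred from how the tokenizer builds tokens below.
interface Token {
  type: 'number' | 'operator' | 'error';
  value: string;
  line: number;
  col: number;
}

function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  let line = 1;
  let col = 1;
  while (pos < source.length) {
    // Whitespace: skip it, but keep line and column in sync.
    if (source[pos] === ' ' || source[pos] === '\t' || source[pos] === '\n') {
      if (source[pos] === '\n') {
        line++;
        col = 1;
      } else {
        col++;
      }
      pos++;
      continue;
    }
    // Number: consume a run of digits and remember where it started.
    if (source[pos] >= '0' && source[pos] <= '9') {
      let numStr = '';
      const startCol = col;
      while (pos < source.length && source[pos] >= '0' && source[pos] <= '9') {
        numStr += source[pos];
        pos++;
        col++;
      }
      tokens.push({ type: 'number', value: numStr, line, col: startCol });
      continue;
    }
    // Operator: a single character.
    if ('+-*/'.includes(source[pos])) {
      tokens.push({ type: 'operator', value: source[pos], line, col });
      pos++;
      col++;
      continue;
    }
    // Anything else: emit an error token and keep going.
    tokens.push({ type: 'error', value: source[pos], line, col });
    pos++;
    col++;
  }
  return tokens;
}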
Nothing about this code is obviously awful. That is what makes it a good example. The problem is not sloppy syntax. The problem is that several concepts are living inside one function with no names.
The tokenizer is scanning characters and tracking position and deciding whitespace policy and recovering from errors.
The first useful step is not “make the function shorter.” It is “name the thing that already exists.”
Position tracking is already a concept here. So give it a type.
interface SourceCursor {
  readonly source: string;
  readonly pos: number;
  readonly line: number;
  readonly col: number;
}

function cursorFrom(source: string): SourceCursor {
  return { source, pos: 0, line: 1, col: 1 };
}

function cursorAdvance(cursor: SourceCursor, count: number): SourceCursor {
  let { pos, line, col } = cursor;
  for (let i = 0; i < count && pos < cursor.source.length; i++) {
    if (cursor.source[pos] === '\n') {
      line++;
      col = 1;
    } else {
      col++;
    }
    pos++;
  }
  return { ...cursor, pos, line, col };
}

function cursorPeek(cursor: SourceCursor): string | undefined {
  return cursor.source[cursor.pos];
}

function cursorIsAtEnd(cursor: SourceCursor): boolean {
  return cursor.pos >= cursor.source.length;
}
That change is more important than it looks. The original function did not “also track position.” It had a hidden data structure inside it. Pulling out SourceCursor makes that explicit.
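One payoff of naming the cursor is that position tracking can be checked with no tokenizer around it. The sketch below repeats the cursor definitions so that it runs on its own; the sample input and the variable names are made up for illustration.

```typescript
// Cursor definitions repeated from the article so the sketch is self-contained.
interface SourceCursor {
  readonly source: string;
  readonly pos: number;
  readonly line: number;
  readonly col: number;
}

function cursorFrom(source: string): SourceCursor {
  return { source, pos: 0, line: 1, col: 1 };
}

function cursorAdvance(cursor: SourceCursor, count: number): SourceCursor {
  let { pos, line, col } = cursor;
  for (let i = 0; i < count && pos < cursor.source.length; i++) {
    if (cursor.source[pos] === '\n') {
      line++;
      col = 1;
    } else {
      col++;
    }
    pos++;
  }
  return { ...cursor, pos, line, col };
}

// Hypothetical input: two lines, "ab" and "cd".
const start = cursorFrom('ab\ncd');
const afterNewline = cursorAdvance(start, 3); // consumes 'a', 'b', '\n'
const pastEnd = cursorAdvance(start, 99);     // stops at the end of the input

console.log(afterNewline.line, afterNewline.col); // 2 1
console.log(start.pos);                           // 0 (the original cursor is untouched)
console.log(pastEnd.pos);                         // 5
```

Because cursorAdvance returns a new cursor instead of mutating the old one, a caller can try a scan, discard the result, and still hold the starting position.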
Next, separate scanning from token policy. A scanner should recognize a pattern and advance the cursor. It should not know what a “number token” is.
type ScanResult =
  | { readonly matched: true; readonly value: string; readonly cursor: SourceCursor }
  | { readonly matched: false };

function scanWhile(
  cursor: SourceCursor,
  predicate: (ch: string) => boolean
): ScanResult {
  let end = cursor.pos;
  while (end < cursor.source.length && predicate(cursor.source[end])) {
    end++;
  }
  if (end === cursor.pos) return { matched: false };
  return {
    matched: true,
    value: cursor.source.slice(cursor.pos, end),
    cursor: cursorAdvance(cursor, end - cursor.pos),
  };
}

function scanChar(cursor: SourceCursor, chars: string): ScanResult {
  const ch = cursorPeek(cursor);
  if (ch !== undefined && chars.includes(ch)) {
    return {
      matched: true,
      value: ch,
      cursor: cursorAdvance(cursor, 1),
    };
  }
  return { matched: false };
}
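To see that the scanner really knows nothing about tokens, it can be run with a predicate the tokenizer never uses. This is a minimal sketch: the cursor and scanWhile definitions are repeated so it runs on its own, while isLetter and show are hypothetical helpers invented for this example, not part of the tokenizer.

```typescript
// Definitions repeated from the article so the sketch is self-contained.
interface SourceCursor {
  readonly source: string;
  readonly pos: number;
  readonly line: number;
  readonly col: number;
}

function cursorFrom(source: string): SourceCursor {
  return { source, pos: 0, line: 1, col: 1 };
}

function cursorAdvance(cursor: SourceCursor, count: number): SourceCursor {
  let { pos, line, col } = cursor;
  for (let i = 0; i < count && pos < cursor.source.length; i++) {
    if (cursor.source.charAt(pos) === '\n') {
      line++;
      col = 1;
    } else {
      col++;
    }
    pos++;
  }
  return { ...cursor, pos, line, col };
}

type ScanResult =
  | { readonly matched: true; readonly value: string; readonly cursor: SourceCursor }
  | { readonly matched: false };

function scanWhile(cursor: SourceCursor, predicate: (ch: string) => boolean): ScanResult {
  let end = cursor.pos;
  while (end < cursor.source.length && predicate(cursor.source.charAt(end))) {
    end++;
  }
  if (end === cursor.pos) return { matched: false };
  return {
    matched: true,
    value: cursor.source.slice(cursor.pos, end),
    cursor: cursorAdvance(cursor, end - cursor.pos),
  };
}

// Hypothetical predicates: only isDigit mirrors a rule the tokenizer uses.
const isDigit = (ch: string): boolean => ch >= '0' && ch <= '9';
const isLetter = (ch: string): boolean =>
  (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');

// Hypothetical helper: renders a result as text for printing.
function show(result: ScanResult): string {
  return result.matched
    ? `${JSON.stringify(result.value)} then ${result.cursor.line}:${result.cursor.col}`
    : 'no match';
}

const begin = cursorFrom('42abc');
const digits = scanWhile(begin, isDigit);
const lettersAtStart = scanWhile(begin, isLetter);
// Chain a second scan from wherever the first one stopped.
const lettersAfterDigits = digits.matched ? scanWhile(digits.cursor, isLetter) : digits;

console.log(show(digits));             // "42" then 1:3
console.log(show(lettersAtStart));     // no match
console.log(show(lettersAfterDigits)); // "abc" then 1:6
```

Nothing here mentions a token type. Whether a run of letters becomes an identifier, a keyword, or an error is a decision left to whoever calls the scanner.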
Now the tokenizer itself can become a readable sequence of policy choices:
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let cursor = cursorFrom(source);
  while (!cursorIsAtEnd(cursor)) {
    // Policy: whitespace is skipped and never becomes a token.
    const ws = scanWhile(cursor, (ch) => ch === ' ' || ch === '\n' || ch === '\t');
    if (ws.matched) {
      cursor = ws.cursor;
      continue;
    }

    // Remember where the next token starts before any scan advances the cursor.
    const { line, col } = cursor;

    const num = scanWhile(cursor, (ch) => ch >= '0' && ch <= '9');
    if (num.matched) {
      tokens.push({ type: 'number', value: num.value, line, col });
      cursor = num.cursor;
      continue;
    }

    const op = scanChar(cursor, '+-*/');
    if (op.matched) {
      tokens.push({ type: 'operator', value: op.value, line, col });
      cursor = op.cursor;
      continue;
    }

    // Policy: an unknown character becomes an error token and scanning continues.
    const ch = cursorPeek(cursor)!;
    tokens.push({ type: 'error', value: ch, line, col });
    cursor = cursorAdvance(cursor, 1);
  }
  return tokens;
}
The final version is not just “smaller pieces.” It is more honest about what the code is doing.
SourceCursor is now a real concept. ScanResult is a real concept. scanWhile() is a reusable operation that does not know anything about numbers, operators, or tokens. The tokenizer reads top to bottom as a series of decisions: skip whitespace, try number, try operator, fall back to error.
That is the deeper lesson here. Decomposition does not just split code. It often uncovers the types the original design forgot to name.
This is also why “just break the function up” is weak advice on its own. You can turn one hard function into five hard functions if you split mechanically. The good version appears when the pieces line up with real concepts in the problem.
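For contrast, here is roughly what a mechanical split of the tokenizer might look like. This is a hypothetical sketch written for this comparison: the State bag, the handle* helpers, and tokenizeMechanical are all invented names, and the loosely typed Token is an assumption about the token shape.

```typescript
// Assumed token shape, kept loose for the sketch.
interface Token { type: string; value: string; line: number; col: number }

// Not a concept from the problem: just the old local variables in a bag.
interface State {
  source: string;
  pos: number;
  line: number;
  col: number;
  tokens: Token[];
}

const isDigit = (ch: string): boolean => ch >= '0' && ch <= '9';

// "Checks for whitespace and updates line/col and advances."
function handleWhitespace(s: State): boolean {
  const ch = s.source.charAt(s.pos);
  if (ch !== ' ' && ch !== '\n' && ch !== '\t') return false;
  if (ch === '\n') {
    s.line++;
    s.col = 1;
  } else {
    s.col++;
  }
  s.pos++;
  return true;
}

// "Scans digits and bumps the column and appends a token."
function handleNumber(s: State): boolean {
  const startCol = s.col;
  let numStr = '';
  while (s.pos < s.source.length && isDigit(s.source.charAt(s.pos))) {
    numStr += s.source.charAt(s.pos);
    s.pos++;
    s.col++;
  }
  if (numStr === '') return false;
  s.tokens.push({ type: 'number', value: numStr, line: s.line, col: startCol });
  return true;
}

// "Classifies the character and appends a token and advances."
function handleOther(s: State): void {
  const ch = s.source.charAt(s.pos);
  const type = '+-*/'.includes(ch) ? 'operator' : 'error';
  s.tokens.push({ type, value: ch, line: s.line, col: s.col });
  s.pos++;
  s.col++;
}

function tokenizeMechanical(source: string): Token[] {
  const s: State = { source, pos: 0, line: 1, col: 1, tokens: [] };
  while (s.pos < s.source.length) {
    if (handleWhitespace(s)) continue;
    if (handleNumber(s)) continue;
    handleOther(s);
  }
  return s.tokens;
}

const mechanicalTokens = tokenizeMechanical('12 +\n3?');
console.log(mechanicalTokens.map((t) => `${t.type}@${t.line}:${t.col}`).join(' '));
// number@1:1 operator@1:4 number@2:1 error@2:2
```

It works, and every function is short. But each helper still needs “and” to describe, position tracking is duplicated in three places, and nothing can be reused or tested without building a full State first. The seams follow the old if statements, not the concepts in the problem.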
The “and” test helps because it points your attention there. When a description needs “and,” ask whether the function is hiding two ideas that deserve names of their own.
Sometimes the answer will be no. Sometimes the two operations are inseparable. But often the answer is yes, and that is where the design starts getting cleaner.
The function was not doing one thing badly. It was doing several things in one place.