Pascal's Triangle and Matrix Multiplication: Alternative Approaches to Integer Base Conversion


TL;DR: Yesterday, Google Gemini and I published a new GitHub repo based on research that followed from my 2006 master's dissertation. It demonstrates two alternative methods for integer base conversion that are highly parallelizable and don't depend on Euclidean division.


Background

Back in 2006, I wrote my master's thesis.

It's hard to believe that was 20 years ago... but, anyway.

The paper looked at the relationship between storage consumption and the numeric base of representation. The fundamental question I was interested in was how much disk and memory space could be saved by storing numeric data (like phone numbers and credit card numbers) using the entire ASCII alphabet instead of just the digits 0-9.

During the course of that research, I wrote some code to perform base conversions using the entire ASCII character set as digits (up to base 93), and I learned that there is a special-case algorithm (using digit grouping) for converting between bases when one base is a power of the other (e.g., between base 2 and base 16, or base 5 and base 25). The natural questions that occurred to me were: "(1) If there's an algorithm for when one base is a power of the other, is there also a special algorithm for converting when the bases are offset by 1 (e.g., between base 10 and base 9)? And (2) is there a special algorithm when one base is a multiple of the other (e.g., between base 10 and base 5)?"
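To make the known special case concrete: the digit-grouping trick for power-related bases can be sketched in a few lines of Python (the function name is my own, not from my paper or the repo):

```python
def group_digits(digits, base, k):
    """Convert little-endian digits in `base` to base**k by
    evaluating each group of k consecutive digits as one new digit."""
    out = []
    for i in range(0, len(digits), k):
        group = digits[i:i + k]
        out.append(sum(d * base**j for j, d in enumerate(group)))
    return out

# 171 = 0xAB = 0b10101011, written little-endian (least significant bit first):
bits = [1, 1, 0, 1, 0, 1, 0, 1]
print(group_digits(bits, 2, 4))  # → [11, 10], i.e. 0xAB in little-endian base 16
```

Because each group of k digits occupies exactly one digit position in the larger base, no carrying is ever needed, which is what makes this case so fast.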

Spoiler: Of course, the answer to both questions turned out to be yes, and I also learned that the algorithm for base +/- 1 can be extended to base +/- N.

Since this wasn't the topic of my research, I didn't dig into those questions during my dissertation work, but I couldn't shake the curiosity. So, some time after graduation, I started to dig into it on my own time. Unfortunately, having left the school, I no longer had access to a research advisor or a university library, and I couldn't get past the paywalls at most journals, so I was unable to find anything in the existing literature. Eventually, I gave up on web searches and decided to use the "brute-force method" to find answers.


I'm not sure how much time I spent at night after work poring over long lists of digits of numbers before and after conversion between adjacent bases, but it was definitely several days, and it might have been several weeks (or more). Eventually, however, I noticed a pattern and reverse-engineered a transformation. Namely, if you write an integer's digits as a single-row matrix and then matrix-multiply it by Pascal's Triangle arranged as a right-rooted, upper-triangular matrix, the product (after normalizing the carries) gives the same integer's digits one base down. Similarly, matrix-multiplying by Pascal's Triangle raised to the -1 power converts it up a base. It follows that raising Pascal's Triangle to the [+/-]N power lets you convert between any two bases, B and B +/- N.
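In code, the down-by-one conversion looks roughly like this (a minimal Python sketch of my own; the helper names are mine, and the final carry pass normalizes coefficients that overflow the smaller base):

```python
from math import comb

def to_digits(n, base):
    """Little-endian digits of n in the given base."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits or [0]

def convert_down_one(digits, base):
    """Convert little-endian digits in `base` to `base - 1` via the
    binomial (Pascal) transform: the new coefficients are those of
    p(x + 1), where p is the digit polynomial."""
    m = len(digits)
    shifted = [sum(comb(i, j) * digits[i] for i in range(j, m)) for j in range(m)]
    # carry pass: reduce every coefficient below the new base
    out, carry = [], 0
    for c in shifted:
        carry, r = divmod(c + carry, base - 1)
        out.append(r)
    while carry:
        carry, r = divmod(carry, base - 1)
        out.append(r)
    return out

print(convert_down_one(to_digits(100, 10), 10))  # → [1, 2, 1], i.e. 121 in base 9
```

The identity behind it: if p(x) is the digit polynomial, then p(B) = q(B - 1) where q(x) = p(x + 1), and the binomial matrix is exactly the map from p's coefficients to q's.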


So, having satisfied my curiosity about converting by using the offset between bases, I moved on to the question of whether a similar method could be found when one base is a multiple of the other, and again I reverse-engineered a transformation using matrix multiplication. In this case, the transformation matrix is a right-rooted diagonal matrix whose entries are consecutive powers of the multiple between the bases.

At this point, my curiosity was satisfied. I decided to try to write it up, created a demo program as a Java applet, and posted about it on my blog. That carries us through February of 2007.

After that, unfortunately, the blog host took the site down, and Java applets were deprecated for security reasons, so by 2010 or 2012, I had basically nothing left to show for all my hard work. ;-)

Enter AI

I have a free "Pro" subscription to Google Gemini's chat interface (until it expires in a few days), so yesterday I decided to see what the AI could do with my 2007 description of the algorithms.

In summary, here was Gemini's "first impression" of the method:

Hello Steve. This is a fascinating document. To answer your question directly: Yes, these methods are mathematically "known," but you have likely rediscovered them via an independent path, and they are almost never taught as standard algorithms for integer base conversion.

In the field of Computer Algebra, your first method is known as a Polynomial Taylor Shift (implemented via a Binomial Matrix), and your second method is a form of Radix Conversion via Substitution.

...

Verdict
You derived a rigorous linear algebra approach (the Taylor Shift) to a problem usually solved with iterative arithmetic (Euclidean division). While your methods are not "new" to mathematics, applying the Binomial Matrix specifically to integer base conversion is a unique and elegant perspective that is rarely found in introductory computer science textbooks.

After a few hours of interview sessions and "vibe coding" with the AI, here's what we created.

I must say that I am extremely impressed by the results. Maybe it took about 8 or 10 hours (including testing, bug fixes, and completing this writeup), but writing the code felt almost effortless. I also asked Claude and Perplexity to review the original paper, the code, and the README file, and after a few tweaks, all of the AIs agreed that the description is accurate.

If history is any guide, maybe I'll tinker with it for a little bit more, but I probably won't make any substantial updates for at least another 20 years or so. To be honest, I'm not sure what else could be done with it, except maybe to animate the visualizations and make them prettier - and I don't have much interest in that sort of cosmetic stuff. But, at least I have something to show, again, for that work back in 2006/2007.

What about performance?

Of course, if standard methods outperform these techniques, there's no practical use to any of it. It's just a mathematical curiosity. So, I also asked Gemini about performance. Here was the response:

Engineering Analysis: Performance Comparison

This is the critical engineering question. While these methods are mathematically elegant and offer a "closed-form" linear algebra solution, they have distinct performance characteristics compared to the standard "Iterative Division" (Euclidean) method used in most libraries (like GMP or Python's native int).

Here is the breakdown of how the Offset (Pascal) and Multiples methods compare to traditional algorithms in terms of Time Complexity, Space Complexity, and Parallelism.

1. The Traditional Method (Iterative Division)

The standard way to convert a number N from Base A to Base B is to repeatedly divide N by B and record the remainders.

  • Time Complexity: O(n^2)
    If numbers have n digits, you perform n divisions. Each division takes O(n) time.
  • Space Complexity: O(n)
    You only need to store the number itself.
  • Characteristics:
    • Serial Process: You cannot calculate the 5th digit until you have calculated the 4th remainder. It is strictly sequential.
    • Memory Efficient: Very low memory footprint.
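For comparison, the serial loop Gemini is describing here is just the familiar divmod loop (a quick Python sketch of my own, not part of Gemini's response):

```python
def base_convert(n, base):
    """Classic Euclidean conversion: repeatedly divide and collect
    the remainders, least significant digit first."""
    digits = []
    while n:
        n, r = divmod(n, base)   # the next digit can't start until this quotient exists
        digits.append(r)
    return digits or [0]

print(base_convert(100, 9))  # → [1, 2, 1], i.e. 121 in base 9
```

The data dependency is visible in the loop: each iteration consumes the quotient produced by the previous one, which is exactly why this method can't be parallelized.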

2. The Offset Method (Pascal Matrix)

This method multiplies a vector of digits N by a pre-computed Pascal Matrix P.

  • Time Complexity: O(n^2)
    Matrix-Vector multiplication for an n x n matrix requires n^2 multiplications and additions.
  • Space Complexity: O(n^2) (The Bottleneck)
    You must generate and store a matrix of size n x n. For a number with 1,000 digits, you need a matrix with 1,000,000 entries.
  • Characteristics:
    • Intermediate Swell: This is the main computational drag. In the traditional method, the number gets smaller with every step. In this method, you multiply by P^e. If the offset e is large, the intermediate numbers inside the matrix become massive (much larger than the input number) before they are normalized back down. This forces the CPU to do heavy BigInt arithmetic.
    • Parallelism (The "Killer Feature"): Unlike division, Matrix Multiplication is "embarrassingly parallel." You can calculate every element of the result vector simultaneously.
    • Implication: On a single CPU core, this method is slower. On a GPU or TPU, this method could theoretically outperform traditional division for massive datasets because it utilizes tensor cores.
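To make the matrix form concrete, here's a sketch of my own (not Gemini's) that builds the binomial matrix P, raises it to the offset e, and applies it as a single matrix-vector product — the step that a GPU could, in principle, evaluate with every output element computed in parallel:

```python
from math import comb

def pascal_matrix(m):
    """P[j][i] = C(i, j): applying P to the coefficient vector of p(x)
    yields the coefficients of p(x + 1)."""
    return [[comb(i, j) for i in range(m)] for j in range(m)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(A, e):
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(e):
        result = mat_mul(result, A)
    return result

def offset_convert(digits, base, e):
    """Convert little-endian digits in `base` to `base - e`:
    one matrix-vector product with P**e, then a carry pass."""
    P = mat_pow(pascal_matrix(len(digits)), e)
    shifted = [sum(p * d for p, d in zip(row, digits)) for row in P]
    out, carry = [], 0
    for c in shifted:
        carry, r = divmod(c + carry, base - e)
        out.append(r)
    while carry:
        carry, r = divmod(carry, base - e)
        out.append(r)
    return out

print(offset_convert([0, 0, 1], 10, 2))  # 100 in base 10 → [4, 4, 1], i.e. 144 in base 8
```

Note that the entries of P^e grow with e, which is the "intermediate swell" described above.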

3. The Multiples Method

This method scales the digits by powers of the factor F, then normalizes.

  • Time Complexity: O(n) (Multiplication) + O(n) (Normalization)
    The multiplication step is linear (diagonal matrix). The normalization (carrying) is usually linear, though carries can ripple.
  • Space Complexity: O(n)
    No full matrix required, just a vector of powers.
  • Characteristics:
    • Extremely Fast: For compatible bases (e.g., Base 100 to Base 10), this is significantly faster than general division.
    • Comparison: This is effectively a generalized version of "bit-shifting." Converting Base 16 to Base 2 is instant (O(1) per digit) because the boundaries align. This method generalizes this to cases where boundaries don't align perfectly but have a common factor.

Summary Comparison Table

Feature         Traditional (Euclidean)   Offset Method (Pascal)   Multiples Method
Big-O Time      O(n^2)                    O(n^2)                   O(n)
Space           O(n) (Low)                O(n^2) (High)            O(n) (Low)
Parallelism     None (Strictly Serial)    High (Matrix Mult)       High (Vector Ops)
Intermediates   Shrink (Efficient)        Explode (Heavy RAM)      Grow slightly
Best Use Case   General purpose CPU       GPU / SIMD hardware      Specific Base Pairs

The Verdict

  1. For General CPU Computing: The Traditional Method is superior. It uses less memory and avoids the "intermediate swell" of large numbers that the Pascal Matrix creates.
  2. For Massive Parallelism (GPU): The Offset Method is mathematically superior. Because it converts the problem from a serial dependency (division) to a linear map (matrix), it allows modern AI hardware to crush the calculation in parallel.
  3. For Specific Cases: The Multiples Method is optimal. If you are converting Base 10^9 (standard 32-bit integer chunks) to Base 10 for printing, the Multiples method is essentially what high-performance libraries use (Divide and Conquer).

Conclusion

So, I guess that's the end of my 2006 master's degree research. Maybe this will be useful or interesting to others in the future, or maybe not, but at least it's available again. I'm not sure if I'll post the extension in the Chrome store after I stop tinkering, but maybe I will.

For now, feel free to download the browser extension from github, play around with it, and let me know what you think.
