Pascal's Triangle and Matrix Multiplication: Alternative Approaches to Integer Base Conversion


TL;DR: Yesterday, Google Gemini and I published a new GitHub repo based on research that followed from my 2006 master's dissertation. It demonstrates two alternative methods for integer base conversion that are highly parallelizable and don't depend on Euclidean division.


Background

Back in 2006, I wrote my master's thesis.

It's hard to believe that was 20 years ago... but, anyway.

The paper looked at the relationship between storage consumption and the numeric base of representation. The fundamental question I was interested in was how much disk and memory space could be saved by storing numeric data (like phone numbers and credit card numbers) using the entire ASCII alphabet instead of just the digits 0-9.

During the course of that research, I wrote some code to perform base conversions using the entire ASCII character set as digits (up to base 93), and I learned that there is a special-case algorithm (using digit grouping) for converting between bases when one base is a power of the other (e.g., between base 2 and base 16, or base 5 and base 25). The natural questions that occurred to me were: "(1) If there's an algorithm for when one base is a power of the other, is there also a special algorithm for converting when the bases are offset by 1 (e.g., between base 10 and base 9)? And (2) is there a special algorithm when one base is a multiple of the other (e.g., between base 10 and base 5)?"
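To make the known special case concrete: the digit-grouping trick for power-related bases can be sketched in a few lines of Python (the function name is my own, not from my paper or the repo):

```python
def group_digits(digits, base, k):
    """Convert little-endian digits in `base` to base**k by
    evaluating each group of k consecutive digits as one new digit."""
    out = []
    for i in range(0, len(digits), k):
        group = digits[i:i + k]
        out.append(sum(d * base**j for j, d in enumerate(group)))
    return out

# 171 = 0xAB = 0b10101011, written little-endian (least significant bit first):
bits = [1, 1, 0, 1, 0, 1, 0, 1]
print(group_digits(bits, 2, 4))  # → [11, 10], i.e. 0xAB in little-endian base 16
```

Because each group of k digits occupies exactly one digit position in the larger base, no carrying is ever needed, which is what makes this case so fast.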

Spoiler: Of course, the answer to both questions turned out to be yes, and I also learned that the algorithm for base +/- 1 can be extended to base +/- N.

Since this wasn't the topic of my research, I didn't dig into those questions during my dissertation work, but I couldn't shake the curiosity. So, some time after graduation, I started to dig into it on my own time. Unfortunately, having left the school, I no longer had access to a research advisor or a university library, and I couldn't get past the paywalls at most journals, so I was unable to find anything in the existing literature. Eventually, I gave up on web searches and decided to use the "brute-force method" to find answers.


I'm not sure how much time I spent at night after work poring over long lists of digits of numbers before and after conversion between adjacent bases, but it was definitely several days, and it might have been several weeks (or more). Eventually, however, I noticed a pattern and reverse-engineered a transformation. Namely, if you write an integer's digits as a single-row matrix and then matrix-multiply it by Pascal's Triangle arranged as a right-rooted, upper-triangular matrix, the product (after normalizing the carries) gives the same integer's digits one base down. Similarly, matrix-multiplying by Pascal's Triangle raised to the -1 power converts it up a base. It follows that raising Pascal's Triangle to the [+/-]N power lets you convert between any two bases, B and B +/- N.
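In code, the down-by-one conversion looks roughly like this (a minimal Python sketch of my own; the helper names are mine, and the final carry pass normalizes coefficients that overflow the smaller base):

```python
from math import comb

def to_digits(n, base):
    """Little-endian digits of n in the given base."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits or [0]

def convert_down_one(digits, base):
    """Convert little-endian digits in `base` to `base - 1` via the
    binomial (Pascal) transform: the new coefficients are those of
    p(x + 1), where p is the digit polynomial."""
    m = len(digits)
    shifted = [sum(comb(i, j) * digits[i] for i in range(j, m)) for j in range(m)]
    # carry pass: reduce every coefficient below the new base
    out, carry = [], 0
    for c in shifted:
        carry, r = divmod(c + carry, base - 1)
        out.append(r)
    while carry:
        carry, r = divmod(carry, base - 1)
        out.append(r)
    return out

print(convert_down_one(to_digits(100, 10), 10))  # → [1, 2, 1], i.e. 121 in base 9
```

The identity behind it: if p(x) is the digit polynomial, then p(B) = q(B - 1) where q(x) = p(x + 1), and the binomial matrix is exactly the map from p's coefficients to q's.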


So, having satisfied my curiosity about converting by using the offset between bases, I moved on to the question of whether a similar method could be found when one base is a multiple of the other, and again I reverse-engineered a transformation using matrix multiplication. In this case, the transformation matrix is a right-rooted diagonal matrix whose entries are consecutive powers of the multiple between the bases.

At this point, my curiosity was satisfied. I decided to try to write it up, created a demo program as a Java applet, and posted about it on my blog. That carries us through February of 2007.

After that, unfortunately, the blog host took the site down, and Java applets were deprecated for security reasons, so by 2010 or 2012, I had basically nothing left to show for all my hard work. ;-)

Enter AI

I have a free "Pro" subscription to Google Gemini's chat interface (until it expires in a few days), so yesterday I decided to see what the AI could do with my 2007 description of the algorithms.

In summary, here was Gemini's "first impression" of the method:

Hello Steve. This is a fascinating document. To answer your question directly: Yes, these methods are mathematically "known," but you have likely rediscovered them via an independent path, and they are almost never taught as standard algorithms for integer base conversion.

In the field of Computer Algebra, your first method is known as a Polynomial Taylor Shift (implemented via a Binomial Matrix), and your second method is a form of Radix Conversion via Substitution.

...

Verdict
You derived a rigorous linear algebra approach (the Taylor Shift) to a problem usually solved with iterative arithmetic (Euclidean division). While your methods are not "new" to mathematics, applying the Binomial Matrix specifically to integer base conversion is a unique and elegant perspective that is rarely found in introductory computer science textbooks.

After a few hours of interview sessions and "vibe coding" with the AI, here's what we created.

I must say that I am extremely impressed by the results. Maybe it took about 8 or 10 hours (including testing, bug fixes, and completing this writeup), but writing the code felt almost effortless. I also asked Claude and Perplexity to review the original paper, the code, and the README file, and after a few tweaks, all of the AIs agreed that the description is accurate.

If history is any guide, maybe I'll tinker with it for a little bit more, but I probably won't make any substantial updates for at least another 20 years or so. To be honest, I'm not sure what else could be done with it, except maybe to animate the visualizations and make them prettier - and I don't have much interest in that sort of cosmetic stuff. But, at least I have something to show, again, for that work back in 2006/2007.

What about performance?

Of course, if standard methods outperform these techniques, there's no practical use to any of it. It's just a mathematical curiosity. So, I also asked Gemini about performance. Here was the response:

Engineering Analysis: Performance Comparison

This is the critical engineering question. While these methods are mathematically elegant and offer a "closed-form" linear algebra solution, they have distinct performance characteristics compared to the standard "Iterative Division" (Euclidean) method used in most libraries (like GMP or Python's native int).

Here is the breakdown of how the Offset (Pascal) and Multiples methods compare to traditional algorithms in terms of Time Complexity, Space Complexity, and Parallelism.

1. The Traditional Method (Iterative Division)

The standard way to convert a number N from Base A to Base B is to repeatedly divide N by B and record the remainders.

  • Time Complexity: O(n^2)
    If numbers have n digits, you perform n divisions. Each division takes O(n) time.
  • Space Complexity: O(n)
    You only need to store the number itself.
  • Characteristics:
    • Serial Process: You cannot calculate the 5th digit until you have calculated the 4th remainder. It is strictly sequential.
    • Memory Efficient: Very low memory footprint.
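For comparison, the serial loop Gemini is describing here is just the familiar divmod loop (a quick Python sketch of my own, not part of Gemini's response):

```python
def base_convert(n, base):
    """Classic Euclidean conversion: repeatedly divide and collect
    the remainders, least significant digit first."""
    digits = []
    while n:
        n, r = divmod(n, base)   # the next digit can't start until this quotient exists
        digits.append(r)
    return digits or [0]

print(base_convert(100, 9))  # → [1, 2, 1], i.e. 121 in base 9
```

The data dependency is visible in the loop: each iteration consumes the quotient produced by the previous one, which is exactly why this method can't be parallelized.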

2. The Offset Method (Pascal Matrix)

This method multiplies a vector of digits N by a pre-computed Pascal Matrix P.

  • Time Complexity: O(n^2)
    Matrix-Vector multiplication for an n x n matrix requires n^2 multiplications and additions.
  • Space Complexity: O(n^2) (The Bottleneck)
    You must generate and store a matrix of size n x n. For a number with 1,000 digits, you need a matrix with 1,000,000 entries.
  • Characteristics:
    • Intermediate Swell: This is the main computational drag. In the traditional method, the number gets smaller with every step. In this method, you multiply by P^e. If the offset e is large, the intermediate numbers inside the matrix become massive (much larger than the input number) before they are normalized back down. This forces the CPU to do heavy BigInt arithmetic.
    • Parallelism (The "Killer Feature"): Unlike division, Matrix Multiplication is "embarrassingly parallel." You can calculate every element of the result vector simultaneously.
    • Implication: On a single CPU core, this method is slower. On a GPU or TPU, this method could theoretically outperform traditional division for massive datasets because it utilizes tensor cores.
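To make the matrix form concrete, here's a sketch of my own (not Gemini's) that builds the binomial matrix P, raises it to the offset e, and applies it as a single matrix-vector product — the step that a GPU could, in principle, evaluate with every output element computed in parallel:

```python
from math import comb

def pascal_matrix(m):
    """P[j][i] = C(i, j): applying P to the coefficient vector of p(x)
    yields the coefficients of p(x + 1)."""
    return [[comb(i, j) for i in range(m)] for j in range(m)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(A, e):
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(e):
        result = mat_mul(result, A)
    return result

def offset_convert(digits, base, e):
    """Convert little-endian digits in `base` to `base - e`:
    one matrix-vector product with P**e, then a carry pass."""
    P = mat_pow(pascal_matrix(len(digits)), e)
    shifted = [sum(p * d for p, d in zip(row, digits)) for row in P]
    out, carry = [], 0
    for c in shifted:
        carry, r = divmod(c + carry, base - e)
        out.append(r)
    while carry:
        carry, r = divmod(carry, base - e)
        out.append(r)
    return out

print(offset_convert([0, 0, 1], 10, 2))  # 100 in base 10 → [4, 4, 1], i.e. 144 in base 8
```

Note that the entries of P^e grow with e, which is the "intermediate swell" described above.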

3. The Multiples Method

This method scales the digits by powers of the factor F, then normalizes.

  • Time Complexity: O(n) (Multiplication) + O(n) (Normalization)
    The multiplication step is linear (diagonal matrix). The normalization (carrying) is usually linear, though carries can ripple.
  • Space Complexity: O(n)
    No full matrix required, just a vector of powers.
  • Characteristics:
    • Extremely Fast: For compatible bases (e.g., Base 100 to Base 10), this is significantly faster than general division.
    • Comparison: This is effectively a generalized version of "bit-shifting." Converting Base 16 to Base 2 is instant (O(1) per digit) because the boundaries align. This method generalizes this to cases where boundaries don't align perfectly but have a common factor.

Summary Comparison Table

Feature         Traditional (Euclidean)   Offset Method (Pascal)   Multiples Method
Big-O Time      O(n^2)                    O(n^2)                   O(n)
Space           O(n) (Low)                O(n^2) (High)            O(n) (Low)
Parallelism     None (Strictly Serial)    High (Matrix Mult)       High (Vector Ops)
Intermediates   Shrink (Efficient)        Explode (Heavy RAM)      Grow slightly
Best Use Case   General purpose CPU       GPU / SIMD hardware      Specific Base Pairs

The Verdict

  1. For General CPU Computing: The Traditional Method is superior. It uses less memory and avoids the "intermediate swell" of large numbers that the Pascal Matrix creates.
  2. For Massive Parallelism (GPU): The Offset Method is mathematically superior. Because it converts the problem from a serial dependency (division) to a linear map (matrix), it allows modern AI hardware to crush the calculation in parallel.
  3. For Specific Cases: The Multiples Method is optimal. If you are converting Base 10^9 (standard 32-bit integer chunks) to Base 10 for printing, the Multiples method is essentially what high-performance libraries use (Divide and Conquer).

Conclusion

So, I guess that's the end of my 2006 master's degree research. Maybe this will be useful or interesting to others in the future, or maybe not, but at least it's available again. I'm not sure if I'll post the extension in the Chrome store after I stop tinkering, but maybe I will.

For now, feel free to download the browser extension from github, play around with it, and let me know what you think.
