Excel Extracting Numbers from a String
07/06/2023
I was looking for a way to extract numbers from a string in and excel cell. Years ago, I created a VBA module with regular expression functions which could do this kind of work. However, using VBA tends to decrease the portability of an excel spreadsheets depending upon the policies that might be in place. So I have been trying to use standard functions and formulas to improve my spreadsheet portability. I was searching the internet for a solution to this problem and indeed I found several examples of formulas which extracted numbers but what I eventually found is they left the decimals behind.
After more searching, I concluded that I was going to have to figure it out and write my own formula. I thought I should be able to use array-based formulas to solve the problem and a little while later I came up with this.
=VALUE(TEXTJOIN("",TRUE,FILTER(MID(A2,SEQUENCE(1,LEN(A2)),1),ISNUMBER(VALUE(MID(A2,SEQUENCE(1,LEN(A2)),1)))+(MID(A2,SEQUENCE(1,LEN(A2)),1)="."),"")))
Formula Explanation
To explain this formula it will be easiest if we break it down into its parts and describe each part and then put it all together.
1. First we use the Mid function with an array generated by the sequence function in the start field and length of 1 extracts each character into an array
| Data | Formula Component | ||||||
| 2.5 TB | MID(A2,SEQUENCE(1,LEN(A2)),1) MID(A2,{1,2,3,4,5,6},1) | 2 | . | 5 | T | B |
2. The formula then uses the isnumber function against the array to return binary for each element of the array
| Data | Formula Components | 2 | . | 5 | T | B | |
| 2.5 TB | ISNUMBER( VALUE(MID(A2,SEQUENCE(1,LEN(A2)),1))) | true | false | true | false | false | false |
3. Then a comparison operator is used to compare each members of the array returned in step 1 to look for the decimal or period
| Data | Formula Components | 2 | . | 5 | T | B | |
| 2.5 TB | (MID(A2,SEQUENCE(1,LEN(A2)),1)=”.”) | false | true | false | false | false | false |
4. The results of these two comparisons need to be combined in a bitwise or fashion which is accomplished by adding them together
| Data | Formula Components | 2 | . | 5 | T | B | |
| 2.5 TB | ISNUMBER( VALUE(MID(A2,SEQUENCE(1,LEN(A2)),1))) +(MID(A2,SEQUENCE(1,LEN(A2)),1)=”.”) | 1 | 1 | 1 | 0 | 0 | 0 |
5. Almost there, now the values returned in step 1 need to be filtered based upon the combined comparisons from step 4. The filter function takes in the values from the array and then in the next filed the array of comparison results is used to filter out any 0 results.
| Data | Formula Components | 2 | . | 5 | T | B | |
| 2.5 TB | FILTER(MID(A2,SEQUENCE(1,LEN(A2)),1), ISNUMBER( VALUE(MID(A2,SEQUENCE(1,LEN(A2),1))) +(MID(A2,SEQUENCE(1,LEN(A2)),1)=”.”),””) | 2 | . | 5 |
6. Finally use the textjoin function to combine the elements of the array into a single result and use value to ensure they data is converted to a number type.
| Data | Formula Components | 2 | . | 5 | T | B | |
| 2.5 TB | VALUE(TEXTJOIN(“”,TRUE, FILTER(MID(A2,SEQUENCE(1,LEN(A2)),1), ISNUMBER(VALUE(MID(A2,SEQUENCE(1,LEN(A2)),1))) +(MID(D5,SEQUENCE(1,LEN(A2)),1)=”.”),””))) | 2.5 |
Make It A Function
Rather than enter this in and copy it down to every cell you need it adjusting the reference you can actually make it into a function using the Name Manager under formulas with the Lambda function. You may be familiar with creating named ranges. Under the Formula menu item in Excel you can select a range of cells and choose define name so that you can refer to it else where. But what you can also do is create a name that behaves like a function. Open Name Manager and add a name GetNum and paste into the reference field the formula below. Then you can reuse this formula against any reference just by using =GetNum(Ref).
=LAMBDA(vX,VALUE(TEXTJOIN("",TRUE,FILTER(MID(vX,SEQUENCE(1,LEN(vX)),1),ISNUMBER(VALUE(MID(vX,SEQUENCE(1,LEN(vX)),1)))+(MID(vX,SEQUENCE(1,LEN(vX)),1)="."),""))))
# Then in any cell you can type the examples below and the formula will be applied to the reference supplied returning the extracted decimal number
=GetNum({Reference})
=GetNum(A2)