In the previous post, we started searching for Riemann counterexamples by looking, across a range of integers, for a witness value that would disprove the hypothesis. While we got the code working to find the witness value, we had no way to save the results. In effect, this meant that each program run started from scratch.
To preserve the results of our runs, we need to introduce some way of persisting both the results and the state at which each run stopped. This way, we can pick up where we left off the next time we run the program.
There are many ways to introduce persistence, the simplest being to write the values to a file and read that file on each run to see what happened before and where to start next. However, it is better to use a database instead. In this post, we add the simplest of databases, SQLite, to our system.
As always, we will be closely following the original post from here.
The Interface pattern
As mentioned before, we are planning to use a SQLite database in our implementation. The simplest approach would be to write code that depends on this knowledge and uses the database functions directly. However, this means that if we later decide to use a different implementation for persisting the data, we would have to rewrite every part that interacts with the database!
So instead of adding just a database implementation, we add an abstraction layer that decouples the intent from the implementation. This layer is called an interface. We first identify the operations that we want the interface to support, and then hide the actual code enabling each operation as a detail of each implementation. We also write tests against the interface and expect the same results irrespective of the implementation details.
Since we are getting started on a major feature addition now, it is a good idea to add all additional changes in a separate branch. In our case, I'm using the branch `database` to handle all the code in this post.
Interface Methods
Let us think about the basic operations our persistence implementation needs to support:
- A way to insert or overwrite data in the persistence layer (`Upsert`)
- A way to read the data that has already been written (`Load`)
- A convenience method to get summary statistics of the data already present in the persistence layer (`Summarize`)
That is our interface! If we build the interface right, we should be able to plug any code into our system easily, as long as it supports these three methods.
To see this in action, we will first add a non-persistent in-memory database (a simple hashmap of values) as our first implementation, and then add the SQLite database as a second implementation.
The First Implementation
Adding the Types
We first create a type `RiemannDivisorSum` to hold a single record, with the number (`N`), the sum of divisors (`DivisorSum`), and the calculated witness value (`WitnessValue`). We can also add `SummaryStats`, which holds the record with the largest computed `N` and the record with the largest witness value, both as `RiemannDivisorSum` values.
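As a rough sketch (in Go, with field names guessed from the description above; the repo's actual definitions may differ), the two types could look like this:

```go
package main

import "fmt"

// RiemannDivisorSum holds one computed record.
type RiemannDivisorSum struct {
	N            int64   // the integer being examined
	DivisorSum   int64   // sigma(N), the sum of the divisors of N
	WitnessValue float64 // sigma(N) / (N * ln(ln(N)))
}

// SummaryStats holds the record with the largest computed N and the
// record with the largest witness value seen so far.
type SummaryStats struct {
	LargestComputedN    RiemannDivisorSum
	LargestWitnessValue RiemannDivisorSum
}

func main() {
	rds := RiemannDivisorSum{N: 10080, DivisorSum: 39312, WitnessValue: 1.7558}
	fmt.Printf("%+v\n", rds)
}
```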
Adding the Interface
Now that we have the types in place, we can add the interface with the methods as discussed before.
Now that we have the interface fully defined, we can add the implementation. The in-memory database implementation is defined in this file and the tests are in this one.
All of the above steps are done as a single feature in this commit.
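A minimal sketch of what the interface plus the map-backed implementation might look like; the names `DivisorDb` and `InMemoryDivisorDb` are my inventions for illustration, and the repo's actual names and signatures may differ:

```go
package main

import (
	"fmt"
	"sort"
)

type RiemannDivisorSum struct {
	N            int64
	DivisorSum   int64
	WitnessValue float64
}

type SummaryStats struct {
	LargestComputedN    RiemannDivisorSum
	LargestWitnessValue RiemannDivisorSum
}

// DivisorDb is the persistence interface: any backing store that
// supports these three methods can be plugged into the system.
type DivisorDb interface {
	Upsert([]RiemannDivisorSum)
	Load() []RiemannDivisorSum
	Summarize() SummaryStats
}

// InMemoryDivisorDb backs the interface with a plain map keyed by N.
type InMemoryDivisorDb struct {
	data map[int64]RiemannDivisorSum
}

func NewInMemoryDivisorDb() *InMemoryDivisorDb {
	return &InMemoryDivisorDb{data: map[int64]RiemannDivisorSum{}}
}

// Upsert inserts new records or overwrites existing ones with the same N.
func (db *InMemoryDivisorDb) Upsert(records []RiemannDivisorSum) {
	for _, r := range records {
		db.data[r.N] = r
	}
}

// Load returns every stored record, sorted by N for determinism.
func (db *InMemoryDivisorDb) Load() []RiemannDivisorSum {
	out := make([]RiemannDivisorSum, 0, len(db.data))
	for _, r := range db.data {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].N < out[j].N })
	return out
}

// Summarize scans the stored records for the largest N and witness value.
func (db *InMemoryDivisorDb) Summarize() SummaryStats {
	var stats SummaryStats
	for _, r := range db.data {
		if r.N > stats.LargestComputedN.N {
			stats.LargestComputedN = r
		}
		if r.WitnessValue > stats.LargestWitnessValue.WitnessValue {
			stats.LargestWitnessValue = r
		}
	}
	return stats
}

func main() {
	db := NewInMemoryDivisorDb()
	db.Upsert([]RiemannDivisorSum{{N: 10, DivisorSum: 18, WitnessValue: 2.1}})
	fmt.Printf("%+v\n", db.Summarize())
}
```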
Precomputing Divisor Sum
In some cases, we can provide the `DivisorSum` to the function directly, without it having to compute the value itself. For that, we extend the function to optionally take a precomputed `DivisorSum` as a parameter, and calculate the `DivisorSum` only if this value is not provided. This is implemented in this commit.
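One way to sketch the optional parameter in Go is a variadic argument; the actual signature in the commit may well differ:

```go
package main

import (
	"fmt"
	"math"
)

// divisorSum computes sigma(n) by trial division up to sqrt(n).
func divisorSum(n int64) int64 {
	var sum int64
	for i := int64(1); i*i <= n; i++ {
		if n%i == 0 {
			sum += i
			if other := n / i; other != i {
				sum += other
			}
		}
	}
	return sum
}

// witnessValue computes sigma(n) / (n * ln(ln(n))). The variadic
// parameter lets a caller pass a precomputed divisor sum; if none is
// given, the function computes it itself.
func witnessValue(n int64, precomputed ...int64) float64 {
	var ds int64
	if len(precomputed) > 0 {
		ds = precomputed[0]
	} else {
		ds = divisorSum(n)
	}
	return float64(ds) / (float64(n) * math.Log(math.Log(float64(n))))
}

func main() {
	fmt.Println(witnessValue(10080))        // computes sigma(10080) itself
	fmt.Println(witnessValue(10080, 39312)) // uses the precomputed value
}
```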
Calculating Riemann sums for an array
Now that we know how to find the Riemann sums for a single integer, it is trivial to extend this to an array.
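The extension to a slice might look like this (a sketch reusing the hypothetical names from earlier, not the repo's exact code); note that each divisor sum is computed once and reused for the witness value:

```go
package main

import (
	"fmt"
	"math"
)

type RiemannDivisorSum struct {
	N            int64
	DivisorSum   int64
	WitnessValue float64
}

// divisorSum computes sigma(n) by trial division up to sqrt(n).
func divisorSum(n int64) int64 {
	var sum int64
	for i := int64(1); i*i <= n; i++ {
		if n%i == 0 {
			sum += i
			if other := n / i; other != i {
				sum += other
			}
		}
	}
	return sum
}

// witnessValue computes sigma(n) / (n * ln(ln(n))) from a divisor sum.
func witnessValue(n, ds int64) float64 {
	return float64(ds) / (float64(n) * math.Log(math.Log(float64(n))))
}

// computeRiemannDivisorSums maps the single-number computation over a
// slice of integers.
func computeRiemannDivisorSums(ns []int64) []RiemannDivisorSum {
	out := make([]RiemannDivisorSum, 0, len(ns))
	for _, n := range ns {
		ds := divisorSum(n)
		out = append(out, RiemannDivisorSum{N: n, DivisorSum: ds, WitnessValue: witnessValue(n, ds)})
	}
	return out
}

func main() {
	for _, r := range computeRiemannDivisorSums([]int64{10, 12, 10080}) {
		fmt.Printf("%+v\n", r)
	}
}
```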
Adding the main
We have the individual components in place along with the unit tests. Now it is time to add the `main` function. The `main` function in Go is the entrypoint of your program. In our case, it calculates the witness values for a range of numbers and prints out the summary statistics across the range. Once `main` is implemented, you can invoke it using `go run main.go` to see the output in the terminal.
In this commit, we also add the capability to search across arbitrary start and end ranges.
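A condensed sketch of what `main` does, with the database interaction omitted and all names invented for illustration: sweep a range, track the best witness value, and print a summary.

```go
package main

import (
	"fmt"
	"math"
)

// divisorSum computes sigma(n) by trial division up to sqrt(n).
func divisorSum(n int64) int64 {
	var sum int64
	for i := int64(1); i*i <= n; i++ {
		if n%i == 0 {
			sum += i
			if other := n / i; other != i {
				sum += other
			}
		}
	}
	return sum
}

// witnessValue computes sigma(n) / (n * ln(ln(n))).
func witnessValue(n, ds int64) float64 {
	return float64(ds) / (float64(n) * math.Log(math.Log(float64(n))))
}

// searchRange computes the witness value for every n in [start, end)
// and returns the n with the largest witness. In the real program the
// records are upserted into the database and summarized from there.
func searchRange(start, end int64) (bestN int64, bestWitness float64) {
	for n := start; n < end; n++ {
		if w := witnessValue(n, divisorSum(n)); w > bestWitness {
			bestN, bestWitness = n, w
		}
	}
	return bestN, bestWitness
}

func main() {
	// Arbitrary range for the sketch; the real main reads start/end
	// from its arguments.
	start, end := int64(5041), int64(20000)
	n, w := searchRange(start, end)
	fmt.Printf("largest witness in [%d, %d): n=%d, witness=%f\n", start, end, n, w)
}
```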
Additional fixes
At this point, I identified a few issues with the code which I fixed in the following small commits:
- We were using `int` till now, which will soon be too small to represent the numbers we are going to handle! In this commit, we change all numbers to `int64`.
- While adding that fix, we introduced another bug. That was fixed and additional tests were added in this commit.
- We also add tests to capture the `panic` situations correctly in this commit.
- In addition, there are many small (mostly cosmetic) cleanups here, here, and here.
With these changes, we are well and truly done with our first implementation and can move on to the second (and more practical) one.
Adding the Second (SQLite) Implementation
While ideating on the SQLite implementation, I realized that I had made a mistake by missing one method in the interface: a method to initialize the implementation. This design error had already surfaced while adding the first implementation, when I had to make the data field in the in-memory database public just to initialize it. This is a code smell, since the way the data is stored is really an implementation concern and not something the user should be aware of. In this commit, I rectify that mistake by making the data field private and adding an Initialize method to the interface, along with its implementation (+ tests!)
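The corrected design might be sketched like this: the `data` field stays private (lowercase in Go), and callers set up storage through `Initialize` instead of touching internals. All names here are illustrative, not the repo's.

```go
package main

import "fmt"

type RiemannDivisorSum struct {
	N            int64
	DivisorSum   int64
	WitnessValue float64
}

// DivisorDb now includes Initialize, so no implementation needs to
// expose its internal storage just to be set up.
type DivisorDb interface {
	Initialize() error
	Upsert([]RiemannDivisorSum)
	Load() []RiemannDivisorSum
}

// InMemoryDivisorDb keeps its data field private; Initialize
// allocates it.
type InMemoryDivisorDb struct {
	data map[int64]RiemannDivisorSum // lowercase: private to the package
}

func (db *InMemoryDivisorDb) Initialize() error {
	db.data = map[int64]RiemannDivisorSum{}
	return nil
}

func (db *InMemoryDivisorDb) Upsert(records []RiemannDivisorSum) {
	for _, r := range records {
		db.data[r.N] = r
	}
}

func (db *InMemoryDivisorDb) Load() []RiemannDivisorSum {
	out := make([]RiemannDivisorSum, 0, len(db.data))
	for _, r := range db.data {
		out = append(out, r)
	}
	return out
}

func main() {
	var db DivisorDb = &InMemoryDivisorDb{}
	if err := db.Initialize(); err != nil {
		panic(err)
	}
	db.Upsert([]RiemannDivisorSum{{N: 6, DivisorSum: 12}})
	fmt.Println(len(db.Load()))
}
```

A SQLite-backed implementation would use its `Initialize` to create the table instead, while satisfying the exact same interface.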
Now we can add the SQLite implementation, using the same methods as before. The implementation details are completely different from the in-memory database, but the interface remains the same. This is best showcased in this commit, where the test files for the two implementations are practically identical! This also allows us to parametrize the tests in this commit, making them specific to the database interface rather than to any one implementation. We can now switch back and forth between the two implementations as we please!
The last step is to upgrade the `main` function to use the SQLite implementation instead of the in-memory DB. This gives us the following output:
On rerunning the function after stopping it:
You can see that the function is able to save the state of the last run and resume the calculations from that point, instead of starting from scratch each time.
Wrapping up
Now that we have the entire feature added, we can create a new pull request, which after review is merged onto the main branch.
Running the function up to 100M now takes just a few minutes, and we can stop and start any time we want. However, we're still nowhere close to finding a better candidate than the original 55440. If we look at the top witness values in the SQLite DB, we can see a curious pattern:
All of the top numbers are multiples of 2520. This makes intuitive sense, since 2520 is a superabundant number and it has been proven that a witness value, if it exists, has to be a superabundant number and a multiple of 2520. Maybe we can use this idea to improve the way we search?
In the next post, we will go over adding smarter search strategies rather than indiscriminately searching every number, thereby improving the speed of our calculations by multiple orders of magnitude!