Data Science Foundations for a New Stock Market


In this article, I describe a new trading and gaming system based on state-of-the-art mathematical engineering, robust architecture, and patent-pending technology. It offers an alternative to the stock market and to traditional gaming. The system is far more transparent than the stock market and cannot be manipulated, since the formulas that yield the biggest returns are made public. It also simulates a neutral, efficient stock market. In short, nothing is random: everything is deterministic, fixed in advance, and known to all users. Yet the system behaves in a way that looks perfectly random, and the public algorithms offered to win the biggest gains require so much computing power that, for all practical purposes, they are useless, except to comply with gaming laws and to establish trustworthiness.
We use private algorithms to determine the winning numbers. They produce exactly the same results as the public algorithms (we tested this extensively), but they are more efficient by many orders of magnitude. We have also proved mathematically that the public and private algorithms are equivalent. We go through this verification process for any new algorithm introduced in our system.
In the last section, we offer a competition: can you use the public algorithm to identify the winning numbers computed with the private (secret) algorithm? If so, the system is breakable, and a more sophisticated approach is needed to make it work. I don’t think anyone can find the winning numbers (you are welcome to prove me wrong), so the award will go to the contestant providing the best insights on how to improve the robustness of this system. If by chance you manage to identify those winning numbers, great! But doing so is not required to win the award.
1. Description, Main Features and Advantages
Rather than trading stocks or other financial instruments, participants (the users) purchase numbers. Sequences of winning numbers are generated all the time, and if you can predict the next winning number in a given sequence, your return is maximum. If your prediction is not too far from a winning number, you still make money, but not as much. Our system has the following features:

The algorithms to find the winning numbers are public and regularly updated. Winning is not a question of chance: all future winning numbers are known in advance and can be computed using the public algorithm.
The public algorithm, though very simple in appearance, is not easy to implement efficiently. In fact, it is hard enough that mathematicians and computer scientists have no advantage over the layman in finding winning numbers.
To each public algorithm corresponds a private version that runs much faster. We use the private version to compute the winning numbers, but both versions produce exactly the same numbers.
Reverse-engineering the system to discover any of the private algorithms is more difficult than breaking strong encryption.
The exact return is known in advance and specified in public ROI tables. It is based on how close you are to a winning number, no matter what that winning number is. Thus, your gains or losses are not influenced by the transactions of other participants.
The system is not rigged and cannot be manipulated, since winning numbers are known in advance.
The system is fair: it simulates a perfectly neutral stock market.
Participants can cancel a transaction at any time, even 5 minutes before the winning number is announced.
Trading on margin is allowed, depending on model parameters.
The money played by the participants is not used to fund the company or pay employees or executives. It goes back, in its entirety, to the participants. Participants pay a fee to participate.

Comprehensive tables of previous winning numbers are published, even well before a new sequence (based on these past numbers) is offered to players. It helps participants to design or improve their strategies to find winning numbers. Actually, past winning numbers are part of the public data that is needed to compute the next winning numbers, both for participants and the platform operators.
Various ROI tables are available to participants, and you can even design your own. If you are conservative, you can choose one offering a maximum return of 10% (for finding the exact value of a winning number), a 54% chance of winning on any transaction, and a maximum potential loss of 4%. This table is safe enough that we will allow you to “trade” on margin. Another interesting ROI table offers a maximum return of 330%, and the same 54% chance of winning on any transaction, with a maximum potential loss of 4%. Keep in mind that this return is what you can make (or lose) in one day, on one sequence. New winning numbers are issued every day for each live sequence, so your return (negative or positive) gets compounded if you play frequently.
If you are a risk taker, you may like a table offering a maximum return of 500%, a 68% chance of winning on any transaction, and a maximum potential loss of 60%. Or another table with a maximum return of 600%, an 80% chance of winning, but a maximum potential loss of 100%. To download all the sample ROI tables discussed in this presentation, click here.
All the sequences currently offered on the market consist of 8-bit numbers: each winning number (a new one per day per sequence) is an integer between 0 and 255. We will soon offer 16-bit numbers. By design, all ROI tables (even if you use a customized one) offer an average return of 0%. This is true regardless of the sequence you are playing with: sequences and ROI tables are independent.
Below, we explain how this works, using a real-life example.

2. How it Works: the Secret Sauce
Here is an example of a sequence being tested in our lab. It shows how the winning numbers are computed, for the sequence in question. The purpose is to illustrate the mechanics, applied to one of our 8-bit systems. The 32-bit version offers more flexibility, as well as potential returns that can beat those of a state lottery jackpot. Our sample 8-bit sequence is defined by the public algorithm below.
2.1. Public Algorithm
Start with initial values x(0) and y(0) that are positive integers, called seeds.
Then for t = 0, 1, 2, and so on, compute x(t+1) and y(t+1) iteratively as follows:
If 4x(t) + 1 < 2y(t) Then
     y(t+1) = 4y(t) – 8x(t) – 2
     x(t+1) = 2x(t) + 1
Else
     x(t+1) = 2x(t)
     y(t+1) = 4y(t).
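The recurrence above translates almost line for line into Python, whose built-in integers have arbitrary precision. The sketch below is only an illustration: the function names `step` and `iterate` and the toy seeds (1, 2) are assumptions made for this example, not the seeds or the code used on the platform.

```python
def step(x, y):
    """One iteration of the public algorithm: (x(t), y(t)) -> (x(t+1), y(t+1))."""
    if 4 * x + 1 < 2 * y:
        # Both updates use the old x(t), so the pair is built in one expression.
        return 2 * x + 1, 4 * y - 8 * x - 2
    return 2 * x, 4 * y


def iterate(x0, y0, n):
    """Return [x(0), x(1), ..., x(n)] for the seeds x0 and y0."""
    xs = [x0]
    x, y = x0, y0
    for _ in range(n):
        x, y = step(x, y)
        xs.append(x)
    return xs


# Hypothetical toy seeds, chosen only to show the mechanics.
print(iterate(1, 2, 5))  # -> [1, 2, 5, 11, 22, 45]
```

Each call to `step` either doubles x or doubles it and adds 1, so the number of digits grows linearly with the number of iterations. That growth is what makes a naive implementation slow for tens of millions of iterations.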
2.2. The Winning Numbers
The winning numbers for a particular sequence start at a specific machine-generated iteration T that no one knows, not even the platform operators or software engineers. Typically, T > 30,000,000 and can be chosen randomly. The iterations represent time. The future winning numbers are always integers between 0 and 255, and they occur only at iterations t = T, T + 8, T + 16, T + 24, and so on. Their value at iteration t is x(t) – 256 x(t-8). Since each iteration doubles x(t) and then adds 0 or 1, this value consists of the last 8 binary digits of x(t).
Past winning numbers are those occurring at iterations t = T – 8, T – 16, T – 24, and so on. The last 2,000 of them are published before the sequence is available (live) on the platform, allowing participants to predict future winning numbers, using the public algorithm or by other means, and make (or lose) money. For our above test sequence, the 2,000 past winning numbers in question are available in this text file.
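To make the extraction rule concrete, here is a minimal sketch that computes x(t) – 256 x(t-8) at every eighth iteration. The function name `winning_numbers`, the toy seeds (1, 2) and the tiny starting iteration are assumptions for illustration; a real sequence starts beyond iteration 30,000,000 with much larger seeds.

```python
def step(x, y):
    """One iteration of the public algorithm from section 2.1."""
    if 4 * x + 1 < 2 * y:
        return 2 * x + 1, 4 * y - 8 * x - 2
    return 2 * x, 4 * y


def winning_numbers(x0, y0, start, count):
    """Return x(t) - 256*x(t-8) for t = start, start+8, ... (count values)."""
    if start < 8:
        raise ValueError("start must be at least 8 so that x(t-8) exists")
    last = start + 8 * (count - 1)
    xs = [x0]
    x, y = x0, y0
    for _ in range(last):
        x, y = step(x, y)
        xs.append(x)
    return [xs[t] - 256 * xs[t - 8] for t in range(start, last + 1, 8)]


# Hypothetical toy seeds and starting iteration, for illustration only.
numbers = winning_numbers(1, 2, start=16, count=5)
print(numbers)
```

Storing every x(t) is wasteful and is done here only for readability. Because x(t) – 256 x(t-8) equals x(t) modulo 256, a leaner version could keep only the current x and take `x % 256`.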
For each sequence, one new winning number is published each day. So, the time unit used here is 3 hours, since one day is 8 × 3 hours. To win the maximum amount, one must correctly predict the winning number attached to a future day. Good and fair approximations also result in a gain, albeit a lower one. These gains and losses are explicitly specified beforehand in very precise ROI tables (see below). Finally, by design, the winning numbers are not auto-correlated; they appear independently and uniformly distributed (more so than many software-generated pseudo-random numbers), and do not exhibit any known or visible pattern. In short, they look totally arbitrary, yet they are generated using a rudimentary formula.
2.3. Using Seeds to Find the Winning Numbers
Most participants are likely to do random trials to find or approximate winning numbers. The few who want to use the public algorithm need extra information to compute winning numbers, and even then, their chance of finding such numbers is virtually zero, due to the tremendous amount of computation required. In short, you need to know the seeds, and when to stop your computations. The stopping rule is simple: you stop when you have found numbers that match the past winning numbers publicly available. Then you know for sure that your next number will be a winning one.
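The stopping rule can be sketched as a brute-force scan: run the public algorithm, keep the most recent values seen at each offset modulo 8, and stop when they match the published list. Everything below is a toy under stated assumptions: `find_match` and `toy_past` are hypothetical names, the seeds (1, 2) are illustrative, and the "published" list has 5 entries instead of 2,000.

```python
from collections import deque


def step(x, y):
    """One iteration of the public algorithm from section 2.1."""
    if 4 * x + 1 < 2 * y:
        return 2 * x + 1, 4 * y - 8 * x - 2
    return 2 * x, 4 * y


def find_match(x0, y0, past, max_iter):
    """Return the first t whose last len(past) values, taken every 8
    iterations and ending at t, equal `past`; None if no match is found."""
    k = len(past)
    target = list(past)
    # One rolling window per offset t mod 8, since T is unknown.
    windows = [deque(maxlen=k) for _ in range(8)]
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        x, y = step(x, y)
        if t >= 8:
            window = windows[t % 8]
            window.append(x % 256)  # equals x(t) - 256*x(t-8)
            if len(window) == k and list(window) == target:
                return t
    return None


def toy_past(x0, y0, start, count):
    """Toy 'published' numbers taken at t = start, start+8, ..."""
    out, x, y = [], x0, y0
    for t in range(1, start + 8 * (count - 1) + 1):
        x, y = step(x, y)
        if t >= start and (t - start) % 8 == 0:
            out.append(x % 256)
    return out


past = toy_past(1, 2, start=40, count=5)
t_found = find_match(1, 2, past, max_iter=1000)
print(t_found)
```

Once a match is found at iteration `t_found`, the next winning number is the value produced at `t_found + 8`. At realistic scale, the cost comes from the size of x(t), which gains one binary digit per iteration, and not from the comparison itself.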
We offer information about the seeds in two different ways:

You can request seeds that work. The working seeds that we provide are integer numbers consisting of many digits. In our particular case, the following seeds work: x(0) and y(0). You can download them as text files, by clicking on these two links. Both x(0) and y(0) contain about 250,000 digits in base 10.
Or you can use the information provided with the public algorithm: the fact that there is a set of seeds (and only one) leading to the winning numbers, and consisting of positive integers lower than 1,000.

We guarantee the following:

With the wrong seeds, you won’t find the winning sub-sequence (matching public past winning numbers) in your lifetime, no matter how much computing power you use.
With the right seeds, you will find the winning sub-sequence (matching public past winning numbers) only once, and in less than 32 trillion iterations.

So we offer you a way to find the next winning numbers, and you know in advance how much you will win when finding them, using the ROI table. The question is: how many years would the most powerful computers in the world need to perform all these computations? By contrast, as of January 2019, only 31.4 trillion digits of Pi are known, and computing them requires several months and a lot of computing power, together with very clever mathematical engineering bearing some resemblance to our private algorithms. Checking that all these digits (not just the first few trillion) are correct is another big problem. Here, if you make any tiny mistake in your computations, you will miss the past sequence of winning numbers.
Of course, you could be a mathematical genius, and somehow figure out what the private algorithm is, to make your computations far more efficiently. This is highly unlikely to happen. There is a considerable amount of very advanced, unpublished mathematical research that has been done to make our systems robust. Also, we regularly change the type of sequences that we use in our system, every few months or so. And we work with white hat hackers (paid to hack our system) in order to identify potential vulnerabilities.
Finally, seeds that lead to unpredictable winning numbers (simulating an efficient market) are known as good seeds. Of course, all the sequences that we offer are based on seeds strongly believed to be good ones, and that have been run through a battery of statistical tests. Using sequences based on bad seeds would not hurt the players, quite the contrary, but it would make our system easier to crack and cause problems with the ROI tables, thus hurting us.
Proving that specific seeds are good or bad is one of the most challenging unsolved mathematical problems of all time. If solved, we would know for sure whether the digits of a number such as Pi are evenly distributed or not. These mathematical concepts have been studied for some time, see recent material on this topic, here and here.
2.4. ROI Tables
The ROI tables tell you how much money you will make or lose when submitting a number. Your ROI is a function of the distance between your submitted number z and the actual winning number x. The distance, also called error, is computed as follows: d(x, z) = min(|x – z|, 256 – |x – z|). It is always an integer value between 0 and 128. A pre-determined ROI is attached to each of the 129 potential error values. These ROIs characterize the type of risk that you are willing to take, and can be customized by each user, as long as the theoretical expected return (automatically computed in the ROI spreadsheet) is zero.
You will find these values in the ROI tables, available in spreadsheet format, here. Look at the second row in the spreadsheet, between column K and EI. The spreadsheet also contains 1,000 user-submitted numbers (simulations) with the ROI computed for each submitted number. Other summary statistics of interest are available in the spreadsheet: highest and lowest potential payout, chances of winning, and more.
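As a small illustration of how an ROI lookup could work, the sketch below computes the circular distance d(x, z) and applies a table of 129 ROI values. The table `toy_roi` is a made-up linear example (from +4% for an exact guess down to -4% at distance 128), not one of the published tables, and the function names are assumptions.

```python
def distance(x, z):
    """Circular distance between two 8-bit numbers, always in 0..128."""
    diff = abs(x - z)
    return min(diff, 256 - diff)


def gain(principal, x, z, roi):
    """Dollar gain (negative for a loss) for guess z when x wins.

    `roi` is a list of 129 returns indexed by distance, e.g. 0.10 for +10%.
    """
    return principal * roi[distance(x, z)]


# Hypothetical table for illustration only: linear in the distance.
toy_roi = [0.04 * (64 - d) / 64 for d in range(129)]

print(distance(250, 5))          # wraps around: 11
print(gain(20, 100, 100, toy_roi))  # exact guess with a $20 principal
```

The wrap-around in `distance` means that 0 and 255 are neighbors, so no winning number is favored by sitting at the edge of the range.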
3. Business Model and Applications
Accredited investors, hedge funds, stock trading brokers, stock exchange companies, government organizations (for instance, state lotteries and agencies interested in creating a lottery at the federal level) as well as game developers and companies in the gaming industry, are welcome to contact us. Investors potentially interested in participating in a first round of funding to create and scale this platform, and who can bring clients and/or a CEO of their choosing, are also invited. We traditionally work smart and fast, with very small efficient teams in a lean environment, with people located all over the world.
This short presentation only features the tip of the iceberg. The possibilities are endless, including the implementation of:

ROI tables that favor participating brokers over players (or the other way around),
16-bit or 32-bit systems offering spectacular potential returns yet no potential big loss,
Short-selling,
Sequences that are cross-correlated or auto-correlated, offered to VIP clients to help them gain a competitive advantage,
Sequences with variable ROI tables, sometimes favoring the players, and sometimes favoring the operators.

Some of these features allow players to sometimes slightly beat the official and neutral odds of winning, offering a true positive return on average for some short periods of time, at the expense of the operators. For the organization implementing these features, this can be seen as a marketing cost to attract new customers. Other potential applications include Blockchain technology, strong encryption, patent and security laws, and state-of-the-art, innovative research in statistical science, computer science, and number theory.
Let’s now look at how the money flows.
3.1. Managing the Money Flow
Managing the money involves subtracting or adding dollars to user accounts after each completed transaction. On a given day, how do we know whether on average, gains and losses will balance out, since we don’t control the numbers entered by the participants?
Actually, we don’t know. Sometimes the balance is slightly negative, sometimes slightly positive. However, by using fair ROI tables and good seeds, we are guaranteed to be flat on average. You can even compute the daily volatility resulting from the daily winning and losing transactions. Example: with 1,000 transactions in a single day, each one consisting of a $20 bet, the most conservative ROI table introduced in this presentation produces a theoretical standard deviation of $24, over a volume of $20,000. The most aggressive one produces a standard deviation of $314, still entirely manageable. These theoretical numbers have been confirmed by simulations, and are included in each ROI table, for internal use. When offering customized ROI tables, you might want to put a cap on the standard deviation being allowed.
4. Challenge and Statistical Results
We discuss here two important statistical results that make this system work. But first, let’s talk about the competition announced at the very beginning.
4.1. Data Science / Math Competition
We plan to organize a competition focusing on the public algorithm. The goal is to compute the next 200 winning numbers, using

the public algorithm described in section 2.1,
the two public seeds x(0) and y(0) provided in section 2.3,
and the 2,000 past winning numbers provided in section 2.2.

You can use the methodology described in this article, or any other means. The award will be offered to participants providing the best insights on how to improve the robustness of our system. So it is not required to find the next 200 winning numbers to earn the award. But if you do find them, we offer a bonus. We will announce the competition by April 30, 2019. To not miss the announcement, you can sign up to receive our newsletter.
4.2. Important Statistical Results
Any guess regarding a winning number results in a gain or a loss depending on how close your guess z is to the winning number x. The metric used to measure the proximity between x and z is 
d(x, z) = min(|x – z|, 256 – |x – z|)
All winning numbers are integers between 0 and 255. If the winning number x were a uniformly distributed random number, and participants made random guesses, then the distance d(x, z) would be a random variable, say Z, with the following distribution:

P(Z = 0) = 1 / 256,
P(Z = 128) = 1 / 256,
P(Z = z) = 2 / 256 if z is strictly between 0 and 128.
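This distribution can be checked by brute force, enumerating all 256 × 256 equally likely pairs of winning number and guess. The sketch below assumes uniform, independent x and z, as in the text; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def distance(x, z):
    """Circular distance between two 8-bit numbers."""
    diff = abs(x - z)
    return min(diff, 256 - diff)


# Count how often each distance occurs over all (x, z) pairs.
counts = Counter(distance(x, z) for x in range(256) for z in range(256))
dist = {d: Fraction(c, 256 * 256) for d, c in counts.items()}

print(dist[0], dist[1], dist[128])  # 1/256 1/128 1/256
```

Exact fractions are used so that the probabilities can be compared to 1/256 and 2/256 without rounding error.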

The money that you put on a number (your guess) is called principal, similarly to the money invested in a stock, in the stock market. Once the winning number is announced, your principal increases or decreases depending on how good your guess is. Your principal is actually multiplied by a factor G(d(x, z)) which is a function of the distance between the number you picked and the winning number.
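The multipliers G(0), G(1), G(2) and so on, up to G(128), are known in advance and specified in the ROI table that you use. The ROI tables are fair, in the sense that the average gain for the player is zero. To achieve this goal, ROI tables are designed so that the expected multiplier, under the distribution of Z given above, is equal to one:
E[G(Z)] = [G(0) + G(128) + 2 × (G(1) + G(2) + ... + G(127))] / 256 = 1.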
If the top multipliers offered are very high (the highest being G(0), for a correct guess) then, even though the system is fair (unbiased), the variance of the gain for a single guess is also high. This variance, assuming the expected multiplier E[G(Z)] is 1 and the participant pays $1 per guess, is equal to
Var[G(Z)] = [G(0)^2 + G(128)^2 + 2 × (G(1)^2 + G(2)^2 + ... + G(127)^2)] / 256 – 1.
The total value of the portfolio that we manage, defined as the aggregated principal across all participants, is flat over time on average but experiences daily fluctuations. To compute the variance of the daily change, assuming independent guesses, use the previous formula and multiply it by the number of guesses and by the square of the cost of a guess. The standard deviation mentioned in section 3.1 (money management) is the square root of this variance, assuming 1,000 guesses of $20 each.
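The fairness check and the volatility computation can be sketched in a few lines. The multiplier table `toy_G` below is a hypothetical linear table (1.04 for an exact guess, 0.96 at distance 128), not one of the published ROI tables, and the guesses are assumed independent and uniformly random, so that the standard deviation of the daily total is the cost per guess times the square root of (number of guesses × single-guess variance).

```python
import math

# Probability of each distance 0..128 for uniform random x and z.
WEIGHTS = [1 / 256 if d in (0, 128) else 2 / 256 for d in range(129)]


def expected_multiplier(G):
    """E[G(Z)]: equals 1 for a fair table."""
    return sum(p * g for p, g in zip(WEIGHTS, G))


def variance(G):
    """Variance of the multiplier for a single $1 guess."""
    mean = expected_multiplier(G)
    return sum(p * (g - mean) ** 2 for p, g in zip(WEIGHTS, G))


def portfolio_std(G, n_guesses, cost):
    """Std. dev. of the daily total, assuming independent guesses."""
    return cost * math.sqrt(n_guesses * variance(G))


# Hypothetical fair table: linear in the distance, centered at E[Z] = 64.
toy_G = [1 + 0.04 * (64 - d) / 64 for d in range(129)]

print(expected_multiplier(toy_G))
print(portfolio_std(toy_G, n_guesses=1000, cost=20))
```

The toy table is fair because the mean distance under this distribution is exactly 64, so the positive and negative deviations from 1 cancel out. A custom table can be validated the same way before it is offered, and capped on `portfolio_std` if needed.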
4.3. Chance of Winning
If you start with two seeds x(0) and y(0) and use M iterations of the public algorithm in section 2.1, and assuming the successive iterations generate random numbers (they don’t, but by design, the generated numbers look almost random), then the chance of finding the K past winning numbers publicly listed (and these numbers look just as random) is of the order of
M / 256^K.
See here for details. This is actually your chance of winning, using brute force. Here, we used K = 2,000.
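To get a feel for the magnitude, one can assume that each of roughly M candidate stopping points matches the K published 8-bit numbers independently with probability 256^(-K), so the chance is at most about M / 256^K. This heuristic is an assumption made for illustration, not the author's detailed derivation. The number is far too small for floating-point arithmetic, so the sketch below works with base-10 logarithms.

```python
import math


def log10_chance(M, K, bits=8):
    """log10 of M / 2**(bits*K), a rough bound on the chance of a match."""
    return math.log10(M) - K * bits * math.log10(2)


# M = 32 trillion iterations (the bound quoted in section 2.3), K = 2,000.
exponent = log10_chance(32e12, 2000)
print(f"chance is about 10^{exponent:.0f}")
```

With these inputs the exponent is close to -4803, which is why brute force with random or wrong seeds is hopeless in practice.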
Originally posted here.
For related articles from the same author, visit www.VincentGranville.com.
