The post Could Machine Learning Practitioners Prove Deep Math Conjectures? appeared first on TheDataGrab.

Could we end up proving the Riemann hypothesis or other problems of the same caliber and popularity? The short answer is no. But we might chart a different path, a potential new approach to tackling these problems, and discover new theories, models, and techniques along the way, some of them applicable to data analysis and real business problems. Sharing our ideas with professional mathematicians could benefit both them and us. Working on these problems in our leisure time could also benefit our machine learning careers. In this article, I elaborate on these points.

**The less math you learned, the more creative you could be**

Of course, this is true only to some extent. You need to know much more than just high school math. When I started my Ph.D. studies and asked my mentor whether I should attend classes to fill gaps in my education, his answer was no: the more you learn, he said, the more you risk getting stuck in one particular way of thinking, and that can hurt creativity.

He meant that acquiring deep vertical knowledge too fast may not help; acquiring horizontal knowledge across various relevant fields, on the other hand, broadens your horizons and can be very useful.

That said, you still need to know a minimum (that is, decent, deep enough vertical knowledge about the problem you are trying to solve), and these days it is easy to self-learn advanced math by reading articles, using tools such as OEIS or Wolfram Alpha (Mathematica), and posting questions on websites such as MathOverflow (see my profile and my posted questions here), which are frequented by professional, research-level mathematicians.

The drawback of not reading the classics (you should read them) is that you are bound to reinvent the wheel time and again, though in my case, that is how I learn best. In addition to reinventing the wheel, your knowledge will have big gaps, and it will show.

Professionals with a background in physics, computer science, probability theory, statistics, pure math, or quantitative finance may have a competitive advantage. Most importantly, you need to be passionate about your private research; have plenty of modesty, perseverance, and patience, as you will face many disappointments; and not expect fame or financial rewards. In short, it is not that different from starting a Ph.D. program.

Some companies like Google may allow you to work on pet projects, and experimental research in number theory geared towards applications may fit the bill. After all, some of the people who computed trillions of digits of the number Pi (and analyzed them) did it during their tenure at Google, and in the process contributed to the development of high-performance computing. Some of them also contributed to deepening the field of number theory.

In my case, it was never my goal to prove any big conjecture. I stumbled upon them time and again while working on otherwise unrelated math projects. They piqued my interest, and over time I spent a lot of energy trying to understand the depth of these conjectures and why they may be true.

And I became more and more interested in trying to pierce their mystery. This is true of the Riemann hypothesis (RH), a tantalizing conjecture with many implications if true, and one that is relatively easy to understand. Even quantum physicists have worked on it and obtained promising results. I know I will never prove RH, but finding a new direction toward a proof is all I am asking for.

If my scenario for a proof is worth exploring, I will then work with mathematicians who know much more than I do, and enlist them to build on my foundations (likely to involve brand new math). The hope is that they can finish work that I started but cannot complete myself, due to my somewhat limited mathematical knowledge.

After all, many top mathematicians made stellar discoveries in their thirties, outperforming peers who were 30 years older, even though their own knowledge was limited by their young age. This is another example of how knowing too much does not necessarily help.

Note that to get a job, “the less you know, the better” does not work: employers expect you to know everything needed to function properly in their company. You can and should continue to learn a lot on the job, but you must master the basics just to be offered a job, and to keep it.

**What I learned from working on these math projects: the benefits**

To begin with, not being affiliated with a professional research lab or academia has some benefits: you don’t have to publish, you choose your research project yourself, you work at your own pace (it better be much faster than in academia), you don’t have to face politics, and you don’t have to teach.

Yet you have access to similar resources (computing power, literature, and so on). You can even teach if you want to; in my case, I don’t really teach, but I write a lot of tutorials to get more people interested in the subject, and I will probably self-publish books in the future, which could become a source of revenue.

My math questions on MathOverflow attract a lot of criticism, and some great answers too, which serves as peer review; readers even point me to literature that I should read, as well as new, state-of-the-art, yet unpublished research results. On occasion, I correspond with well-known university professors, which further helps me avoid going in the wrong direction.

The top benefit I’ve found in working on these problems is the incredible opportunity they offer to hone your machine learning skills. The biggest data sets I have ever worked on come from these math projects.

These projects let you test and benchmark various statistical models; discover new probability distributions with applications to real-world problems (see this example) and new visualizations (see here); develop new statistical tests of randomness and new probabilistic games (see here); and even uncover interesting, sometimes truly original math theory: for instance, complex random variables with applications (see here), the distribution of lattice points in the infinite-dimensional simplex (yet unpublished), advanced matrix algebra asymptotics (infinite matrices, yet unpublished, but similar to this article), and a new type of Dirichlet functions. Still, 90% of my research never gets published.


The post The Machine Learning Process in 7 Steps appeared first on TheDataGrab.

In small companies, you may be involved in all the steps. Here the focus is on large projects, such as developing a taxonomy, as opposed to ad hoc or one-time analyses. I also mention all the people involved, beyond machine learning professionals.

**Steps involved in machine learning projects**

In chronological order, here are the main steps. Sometimes it is necessary to recognize errors in the process, move back, and start again at an earlier step. This is by no means a linear process; it is more like trial-and-error experimentation.

**1**. **Defining the problem** and the metrics (also called features) that we want to track. Assessing the data available (internal and third-party sources) or the databases that need to be created, as well as database architecture for optimum storing and processing. Discuss cloud architectures to choose from, data volume (potential future scaling issues), and data flows.

Do we need real-time data? How much can safely be outsourced? Do we need to hire some staff? Discuss costs, ROI, vendors, and timeframe. Decision-makers and business analysts are heavily involved, and data scientists and engineers may participate in the discussion.

**2. Defining goals** and types of analyses to be performed. Can we monetize the data? Are we going to use the data for segmentation, customer profiling, and better targeting, to optimize some processes such as pricing or supply chain, for fraud detection, taxonomy creation, to increase sales, for competitive or marketing intelligence, or to improve the user experience for instance via a recommendation engine or better search capacities? What are the most relevant goals? Who will be the main users?

**3**. **Collecting the data**. Assessing who has access to the data (and which parts of it, such as summary tables versus live databases), and in what capacity. Privacy and security issues are also discussed here.

The IT team, legal team, and data engineers are typically involved. Dashboard design is also discussed, with the aim of delivering good dashboards to end users such as decision-makers, product or marketing teams, or customers.

**4. Exploratory data analysis**. Here data scientists are more heavily involved, though this step should be automated as much as possible. You need to detect missing data and how to handle it (using imputation methods), identify outliers and what they mean, summarize and visualize the data, find erroneously coded data and duplicates, find correlations, perform preliminary analyses, find best-predicting features and optimum binning techniques (see section 4 in this article). This could lead to the discovery of data flaws and may force you to revisit and start again from a previous step, to fix any significant issue.
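As an illustration of step 4, here is a minimal sketch in Python with pandas; the toy dataset, column names, and thresholds are mine, purely for demonstration, not from the original project:

```python
import numpy as np
import pandas as pd

# Tiny synthetic dataset standing in for real project data
df = pd.DataFrame({
    "age":   [25, 32, np.nan, 41, 38, 38, 200],    # one missing value, one outlier
    "spend": [120.0, 95.5, 87.0, np.nan, 61.2, 61.2, 70.0],
})

# 1. Detect missing data, then impute with the column median
missing = df.isna().sum()
df_imputed = df.fillna(df.median(numeric_only=True))

# 2. Flag outliers with a simple z-score rule (|z| > 2)
z = (df_imputed - df_imputed.mean()) / df_imputed.std()
outliers = z.abs() > 2

# 3. Remove exact duplicates and inspect correlations
df_clean = df_imputed.drop_duplicates()
corr = df_clean.corr()

print(missing.to_dict(), int(outliers.values.sum()), df_clean.shape)
```

In a real project, each of these checks would of course be automated and run over many more columns, with imputation and outlier rules chosen per feature.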

**5. The true machine learning/modeling** step. At this point, we assume that the data collected is stable enough, and can be used for its original purpose. Predictive models are being tested, neural networks or other algorithms/models are being trained with goodness-of-fit tests and cross-validation.

The data is available for various analyses, such as post-mortem, fraud detection, or proof of concept. Algorithms are prototyped, automated, and eventually implemented in production mode. Output data is stored in auxiliary tables for further use, such as email alerts or to populate dashboards. External data sources may be added and integrated. At this point, major data issues have been fixed.
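To make the cross-validation part of step 5 concrete, here is a minimal sketch using scikit-learn on synthetic data; the model choice, data, and seed are illustrative assumptions, not the article's own project:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the "stable enough" project data:
# a binary target driven by the first two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Train a candidate model with 5-fold cross-validation before
# promoting it to production mode
model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Swapping in a different model or scoring metric is a one-line change, which is what makes this step well suited to systematic benchmarking.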

**6. Creation of end-user platform**. Typically, it comes as dashboards featuring visualizations and summary data that can be exported in standardized formats, even spreadsheets. This provides the insights that can be acted upon by decision-makers. The platform can be used for A/B testing. It can also come as a system of email alerts sent to decision-makers, customers, or anyone who needs to be informed.

**7. Maintenance**. The models need to be adapted to changing data, changing patterns, or changing definitions of core metrics. Some satellite database tables must be updated, for instance, every six months. Maybe a new platform able to store more data is needed, and data migration must be planned.

Audits are performed to keep the system sound. New metrics may be introduced, as new sources of data are collected. Old data may be archived. Now we should get a good idea of the long-term yield (ROI) of the project, what works well and what needs to be improved.


The post Machine Learning Perspective on the Twin Prime Conjecture appeared first on TheDataGrab.

Here I discuss the results of my experimental math research, based on big data, algorithms, machine learning, and pattern discovery. The level is accessible to all machine learning practitioners. I first discuss my experimentations in section 1, then how they relate to the twin prime conjecture in section 2. In section 3, I discuss a generalization. Mathematicians may be interested as well, as this leads to a potential new path to prove the conjecture. Machine learning readers with little time, and not curious about the mathematical aspects, can read section 1 and skip section 2.

I do not prove the twin prime conjecture (yet). Rather, based on data analysis, I provide compelling evidence (the strongest I have ever seen), supporting the fact that it is very likely to be true. It is not based on heuristic or probabilistic arguments (unlike this version dating back to around 1920), but on hard counts and strong patterns.

This is no different from analyzing data and finding that smoking is strongly correlated with lung cancer: the relationship may not be causal, as there might be confounding factors. To prove causality, more than data analysis is needed (in the case of smoking, of course, causality was firmly established long ago).

**1. The Machine Learning Experiment**

We start with the following sieve-like algorithm. Let *SN* = {1, 2, …, *N*} be the finite set consisting of the first *N* strictly positive integers, and let *p* be a prime number. Let *Ap* be a strictly positive integer smaller than *p*. Remove from *SN* all the elements of the form *Ap*, *p* + *Ap*, 2*p* + *Ap*, 3*p* + *Ap*, 4*p* + *Ap*, and so on. After this step, the number of elements left will be very close to *N* (*p* – 1) / *p* = *N* (1 – 1/*p*). Now remove all elements of the form *p* – *Ap*, 2*p* – *Ap*, 3*p* – *Ap*, 4*p* – *Ap*, and so on. After this step, the number of elements left will be very close to *N* (1 – 2/*p*). Now pick another prime number *q* and repeat the same procedure. After this step, the number of elements left will be very close to *N* (1 – 2/*p*)(1 – 2/*q*), because *p* and *q* are co-prime (being distinct primes).

If you repeat this step for all prime numbers *p* between *p* = 5 and *p* = *M* (assuming *M* is a fixed prime number much smaller than *N*, with *N* extremely large and tending to infinity), you will be left with a number of elements that is still very close to

*P*(*M*, *N*) = *N* (1 – 2/5)(1 – 2/7)(1 – 2/11) × … × (1 – 2/*M*)

where the product is over prime numbers only.
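The algorithm just described is straightforward to implement. The sketch below (Python; the function names and the arbitrary choice *Ap* = 1 are mine) removes the two residue classes for each prime *p* in [5, *M*] and compares the surviving count with the product approximation:

```python
def primes_up_to(m):
    """Sieve of Eratosthenes: all primes <= m."""
    is_p = [True] * (m + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(m ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = [False] * len(is_p[i * i::i])
    return [i for i in range(2, m + 1) if is_p[i]]

def sieve_count(N, M, A):
    """Remove from {1,...,N} the elements A_p, p+A_p, 2p+A_p, ... and
    p-A_p, 2p-A_p, ... for each prime p with 5 <= p <= M; return survivors."""
    survivors = set(range(1, N + 1))
    for p in primes_up_to(M):
        if p < 5:
            continue
        a = A(p)
        survivors -= set(range(a, N + 1, p))       # A_p, p+A_p, 2p+A_p, ...
        survivors -= set(range(p - a, N + 1, p))   # p-A_p, 2p-A_p, ...
    return survivors

# Compare the actual count C(M,N) with P(M,N) = N * prod(1 - 2/p)
N, M = 100_000, 97
C = len(sieve_count(N, M, lambda p: 1))   # A_p = 1: one admissible choice
P = N
for p in primes_up_to(M):
    if p >= 5:
        P *= 1 - 2 / p
print(C, round(P))   # the two counts should agree closely
```

For a generic choice of *Ap* like this one, the ratio of the two printed numbers stays very close to 1, exactly as claimed above.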

Let us introduce the following notations:

- *S*(*M*, *N*) is the set left after removing all the specified elements, using the above algorithm, from *SN*
- *C*(*M*, *N*) is the actual number of elements in *S*(*M*, *N*)
- *D*(*M*, *N*) = *P*(*M*, *N*) – *C*(*M*, *N*)
- *R*(*M*, *N*) = *P*(*M*, *N*) / *C*(*M*, *N*)

In the context of the twin prime conjecture, the issue is that *M* is a function of *N*, and the above very good approximation, that is, replacing *C*(*M*, *N*) with *P*(*M*, *N*), is no longer good. More specifically, in that context, *M* = 6 SQRT(*N*) and *Ap* = INT(*p*/6 + 1/2), where INT is the integer part function. The ratio *R*(*M*, *N*) would still be very close to 1 for most choices of *Ap*, assuming *M* is not too large compared to *N*; unfortunately, *Ap* = INT(*p*/6 + 1/2) is one of the very few choices for which the approximation fails. On the plus side, it is also one of the very few that leads to a smooth, predictable behavior for *R*(*M*, *N*). This is what makes me think it could lead to a proof of the twin prime conjecture. Note that if *M* is very large, much larger than *N*, say *M* = 6*N*, then *C*(*M*, *N*) = 0, and thus *R*(*M*, *N*) is infinite.

Below is a plot displaying *D*(*M*, *N*) at the top, and *R*(*M*, *N*) at the bottom, on the Y-axis, for *N* = 400,000 and *M* between 5 and 3,323 on the X-axis. Only prime values of *M* are included, and *Ap* = INT(*p*/6 + 1/2).

It shows the following patterns:

- For small values of *M*, *R*(*M*, *N*) is very close to 1.
- Then as *M* increases, *R*(*M*, *N*) experiences a small dip, followed by a maximum at some location *M*0 on the X-axis. It then smoothly decreases well beyond the critical value *M*1 = 6 SQRT(*N*). It reaches a minimum at some location *M*2 (not shown in the plot), followed by a rebound, increasing again until *M*3 = 6*N*, where *R*(*M*, *N*) is infinite. The value of *M*0 is approximately 3 SQRT(*N*) / 2.

To prove the twin prime conjecture, all that is left is the following: proving that *M*0 < *M*1 (that is, the peak always takes place before *M*1, regardless of *N*), and that *R*(*M*0, *N*), as a function of *N*, does not grow too fast. It seems the growth is logarithmic, but even if *R*(*M*0, *N*) grew as fast as *N* / (log *N*)^3, this would be slow enough to prove the twin prime conjecture. Detailed explanations are provided in section 2.

The same patterns are also present for other values of *N*. I tested various *N*’s, ranging from *N* = 200 to *N* = 3,000,000. The higher *N*, the smoother the curve and the stronger the patterns. The phenomenon also occurs with a few other peculiar choices of *Ap*, such as *Ap* = INT(*p*/2 + 1/2) or *Ap* = INT(*p*/3 + 1/2), but not in general, not even for *Ap* = INT(*p*/5 + 1/2).

The curve is surprisingly smooth, given that we are working with prime numbers, which behave somewhat chaotically. There has to be a mechanism causing this unexpected smoothness, a mechanism that could be the key to proving the twin prime conjecture. More about this in section 2.

**2. Connection to the Twin Prime Conjecture**

If *M* = 6 SQRT(*N*) and *Ap* = INT(*p*/6 + 1/2), then the set *S*(*M*, *N*) defined in section 1, contains only elements *q* such that 6*q* – 1 and 6*q* + 1 are twin primes. This fact is easy to prove, see here. It misses a few of the twin primes (the smaller ones) but this is not an issue since we need to prove that *S*(*M*, *N*), as *N* tends to infinity, contains infinitely many elements. The number of elements in *S*(*M*, *N*) is denoted as *C*(*M*, *N*).
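This claim is easy to verify numerically. The sketch below (Python; helper names are mine) builds *S*(*M*, *N*) for a small *N* with *Ap* = INT(*p*/6 + 1/2) and *M* = 6 SQRT(*N*), and checks that every survivor *q* yields the twin primes 6*q* – 1 and 6*q* + 1:

```python
def is_prime(n):
    """Trial-division primality test, fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

N = 10_000
M1 = int(6 * N ** 0.5)                # the critical value M1 = 6*sqrt(N)
survivors = set(range(1, N + 1))      # the set S_N = {1, ..., N}
for p in range(5, M1 + 1):
    if not is_prime(p):
        continue
    a = (p + 3) // 6                  # A_p = INT(p/6 + 1/2), in exact integer arithmetic
    survivors -= set(range(a, N + 1, p))        # elements ≡  A_p (mod p)
    survivors -= set(range(p - a, N + 1, p))    # elements ≡ -A_p (mod p)

# Every surviving q should index a twin prime pair (6q - 1, 6q + 1)
ok = all(is_prime(6 * q - 1) and is_prime(6 * q + 1) for q in survivors)
print(len(survivors), ok)
```

The reason this works: 6 × INT(*p*/6 + 1/2) is congruent to ±1 modulo *p*, so removing the classes ±*Ap* mod *p* is the same as removing the *q*’s for which *p* divides 6*q* – 1 or 6*q* + 1.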

Let us define *R*1(*N*) = *R*(*M*1, *N*) and *R*0(*N*) = *R*(*M*0, *N*). Here *M*1 = 6 SQRT(*N*), and *M*0 is defined in section 1, just below the plot. To prove the twin prime conjecture, one has to prove that *R*1(*N*) < *R*0(*N*) and that *R*0(*N*) does not grow too fast as *N* tends to infinity.

The relationship *R*1(*N*) < *R*0(*N*) can be written as *P*(*M*1, *N*) / *R*0(*N*) < *C*(*M*1, *N*). If the number of twin primes is infinite, then *C*(*M*1, *N*) tends to infinity as *N* tends to infinity. Thus if *P*(*M*1, *N*) / *R*0(*N*) also tends to infinity, that is, if *R*0(*N*) / *P*(*M*1, *N*) tends to zero, then it would prove the twin prime conjecture. Note that *P*(*M*1, *N*) is asymptotically equivalent (up to a factor not depending on *N*) to *N* / (log *M*1)^2, that is, to *N* / (log *N*)^2. So if *R*0(*N*) grows more slowly than (say) *N* / (log *N*)^3, it would prove the twin prime conjecture. Empirical evidence suggests that *R*0(*N*) grows like log *N* at most, so it looks promising.
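The chain of implications in this paragraph can be condensed as follows (a restatement of the argument above, with *c* an absolute constant not depending on *N*):

```latex
R_1(N) < R_0(N)
\;\Longleftrightarrow\;
\frac{P(M_1,N)}{R_0(N)} < C(M_1,N),
\qquad\text{with}\qquad
P(M_1,N) \sim \frac{c\,N}{(\log N)^2}.

\text{If } R_0(N) = O\!\left(\frac{N}{(\log N)^3}\right),
\text{ then }
C(M_1,N) \;>\; \frac{P(M_1,N)}{R_0(N)} \;\gtrsim\; c\,\log N \;\longrightarrow\; \infty,
```

which would force *S*(*M*1, *N*) to contain infinitely many twin-prime indices as *N* tends to infinity.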

The big challenge here, in proving the twin prime conjecture, is that the observed patterns (found in section 1 and used in the above paragraph), however strong they are, may be very difficult to prove formally. Indeed, my argument still leaves open the possibility that there are only finitely many twin primes: this could happen if *R*0(*N*) grows too fast.

The next step to make progress would be to look at small values of *N*, say *N* = 100, and try to understand, from a theoretical point of view, what causes the observed patterns. Then try to generalize to larger *N*, hoping the patterns can be formally explained via a mathematical proof.

The table below summarizes the main results of my computations. It is available here.

Note that if *M*1 = 6 SQRT(*N*), then the set *S*(*M*1, *N*) is a subset of the following sequence: A002822. In particular, if *N* = 3,068,200, then *S*(*M*1, *N*) contains all the 99,998 elements of A002822 (mapping to the first 99,998 twin primes if you ignore {3, 5}) up to 3,068,165, except for the first 215 entries. Thus *C*(*M*1, *N*) = 99,998 – 215 = 99,783 as shown in the above table. If *M*1 < 6 SQRT(*N*), then *S*(*M*1, *N*) not only misses more elements of A002822, but it also includes elements that are not in A002822. Thus the reason to call *M*1 = 6 SQRT(*N*) the critical point. The last element *q* = 3,068,165 corresponds to the twin primes 6*q* – 1 = 18,408,989 and 6*q* + 1 = 18,408,991. See also here.

**3. Generalization**

The concepts discussed here also apply to cousin primes, sexy primes, prime numbers in general, and other related families of primes. This section is still under construction. In the meantime, I invite you to check my latest update on this topic, on MathOverflow.


The post Machine Learning Career: Pros and Cons of Having a PhD appeared first on TheDataGrab.

A PhD may command a slightly higher salary initially, and may be required for a position in a research lab (whether private or government-operated). But for many positions, it may not bring an advantage. Corporate work can be mundane and fast-paced, and the search for perfect algorithms is discouraged, as it hurts ROI.

In many companies, a solution reaching 80% of perfection is good enough, and requires far less time than reaching 99%, especially since the machine learning models employed are just an approximation of reality. People with a PhD are often not well prepared for that.

**The Cons**

Here are some of the negative aspects.

- You may spend several years of your life working on your PhD, possibly in a stressful environment, with low pay, delaying buying a home or getting married. Meanwhile, you see your non-PhD friends ahead of you in their personal lives. If you married while working on your PhD, this could eliminate some of these problems.
- Some recruiters may say that you are over-qualified, that your experience is not really relevant to the job you are applying for (or too specialized), and that adapting to a fast-paced corporate environment might be challenging.
- If you land a job in the corporate world, you might find it menial or boring. You could be disappointed that the research you did during your PhD years is a thing of the past, not leading to anything else. This is especially true if your hope was to get a tenured position in academia but you cannot get one despite very strong credentials, due to fierce competition. It can bring long-lasting regrets and nostalgia.
- You may be lacking some coding skills (SQL in particular), which puts you at a disadvantage against a candidate with an applied master’s. Of course, it is always possible, and desirable, to gain these skills on your own (or via data camps) while working on your PhD.
- Your salary might not be higher than that of a younger candidate with a master’s degree and the right experience. Your cumulative wealth over your lifetime may be lower.
- Some employers (Google, Facebook, Microsoft, Wall Street, or defense-related companies) routinely hire PhDs to work on truly exciting projects. Some only hire from top universities, and if your PhD is not from an Ivy League school, you will be passed over. That said, plenty of companies will hire non-Ivy-League candidates, and I think that’s a smart move. After all, I earned my PhD at a little-known university, and eventually succeeded in the corporate world.

**The Pros**

For some, the pros outweigh the cons by a long shot. This was my case. I provide a few examples below.

- If your PhD was very applied in a hot field (in my case in 1993, processing digital satellite images for pattern detection), and you learned how to code, played with a lot of messy data, and even got a part-time job in the corporate world related to your thesis while working on it, then you are off to a good start.
- In my case, solid funding for the research, and even data sets, came from governmental agencies (EU and others) and private companies (Total, for instance) trying to solve real problems. This adds credibility to your PhD experience. On the downside, my mentor was not a great scholar, but he was a good salesman, able to attract many well-paid contracts.
- If you earned your PhD abroad like I did, it is quite possible that you were paid better than your peers in the US. In my case, my salary as a teaching assistant was similar to that of a high school teacher.
- And conference attendance (worldwide) was paid for by the university or by the agencies that invited me as a speaker. Coming from abroad is sometimes perceived as an advantage: it shows cultural adaptability, and in most cases means being multilingual and able to relocate easily if corporate needs require it.
- You can continue to do your research decades after leaving academia. I still write papers and books to this day. The level is even higher than during my PhD years, but the style and audience are very different, as I try to present advanced results, written in simple English, to a much larger audience. I find this more rewarding than publishing in scientific journals that are read by very few and obfuscated by jargon.
- There are great positions in many research labs, private or government, available only to PhD applicants. The salary can be very competitive.
- VC funding for startup companies is usually contingent on having a well-known PhD scientist on staff. So if you create your own startup, or work for one, a PhD is definitely an advantage.
- Even with my own, self-funded publishing/media company (acquired by Tech Target in 2020 and focused on machine learning), my wife keeps reminding me that I would have had considerably less success without my education, even though you don’t legally need any degree or license to operate this kind of business.

**Conclusions**

Having a PhD can definitely offer a strong advantage. It depends on the subject of your thesis, where you earned your PhD, and whether you worked on real-life problems relevant to the business world. More theoretical PhDs can still find attractive jobs in various research labs, private or government.

The experience may be more rewarding, and probably less political, than a tenured position in academia. It goes both ways: it is not unusual for someone with a pure corporate or business background to make a late career move to academia, sometimes in a business-related department. Or to combine both: academia and corporate positions at the same time.

I wrote an article in 2018 about how to improve PhD programs to allow for an easy transition to the business world. I called it a doctorship program, and you can read about it here. I will conclude by saying that another PhD scientist, who earned his PhD in the same unknown math department as me at the same time (in Belgium), ended up becoming an executive at Yahoo, after a short stint (post-doc) at MIT, working on transportation problems.

His name is Didier Burton. Another one (Michel Bierlaire), same year, same math department, also with a short post-doc stint at MIT (mine was at Cambridge University), never got a corporate job, but he is now a happy full professor at EPFL. Also, a Data Science Central intern (reporting to me), originally from Cuba and with very strong academic credentials (PhD, Columbia University, EPFL), got his first corporate job after his internship with us (I strongly recommended him). Despite a mixed academic background in physics and biology, he is now chief data scientist of a private company. His name is Livan Alonso.


The post Is Machine Learning an Art, a Science or Something Else? appeared first on TheDataGrab.

We need to start by defining what machine learning is, or more precisely, what kind of work it entails. I break it down into three types of activities, corresponding to different types of machine learning professionals or job titles. Many practitioners spend some amount of time, in various proportions, on each of these activities.

- **Level 1**: Professionals in this category are end users of machine learning platforms or software; their coding abilities are typically limited, and rarely needed. They use the tools as black boxes, and may know the details of the techniques involved only superficially, if at all. They are able to interpret the output of the platforms they use, fine-tune parameters, and compare the performance of various platforms and techniques. Examples include business analysts, or software engineers asked to integrate into production mode algorithms developed by data scientists (those in level 3).
- **Level 2**: In this category, I include people using machine learning tools and platforms with a serious understanding of how they work, typically interacting with these platforms as builders and developers, and mastering some programming languages to interact with these tools in the most efficient way. They typically don’t build new, complex algorithms from scratch, but they know how to use existing ones to address the problems at stake, within the framework of the platforms they use.
- **Level 3**: These people may not necessarily know that much about the tools mentioned in the previous levels, as they develop their own algorithms from scratch, typically to solve new problems not properly solved by the above platforms. They master some programming languages and their libraries (usually including Python), and are experts in algorithm design and optimization.

There can be art involved at any level. However, I prefer to use the word craftsmanship. As opposed to art, which serves no purpose here except beauty, craftsmanship is the quality of design and work shown in something made by hand. It must serve a practical purpose, and it usually shows as the signature of a professional proud to deliver high-quality work, be it a piece of furniture or a piece of code.

At level 1, craftsmanship shows as mastering the platform at a high level, sometimes even better than those who created it: being able to leverage it in astute ways, and discovering tricks few are aware of, to further optimize the work. At level 2, it can mean writing beautiful code, not for its beauty but for the way other people are going to read it: achieving remarkable performance with astute, elegant code nobody thought of before. It can also mean understanding the internals of these platforms and being able to work around their inherent glitches.

At level 3, it could mean designing an algorithm that extracts more information from the data than you would theoretically expect, based on entropy considerations. For instance, designing unexpectedly sharp confidence intervals that nobody thought possible (see my book on new statistical foundations, page 132, available here).

**What about machine learning as an art?**

Since art in this context does not generally bring added value, you will see craftsmanship more than pure art. That said, very talented professionals may decide to take their work to a whole new level and secretly incorporate art into it (much better than incorporating back doors, or unethical biases!). Nobody may recognize the art hidden in their work for a long time, and they will never gain a financial advantage from it; instead, they gain the personal satisfaction of delivering high-quality work on time while incorporating art into it.

Some of the art can be recognized sometimes, such as beautiful visualizations (see here) that have practical applications, or mathematical formulas, see for instance here.

The picture at the top of this article can be found here. It represents the energy of electrons in an atomic lattice, and it is known as the butterfly fractal. It is also associated with special continued fractions. The author, Douglas Hofstadter, a physicist, is the one who wrote the masterpiece book *Gödel, Escher, Bach*, published initially in 1979.

In his book, he claims some of his friends see this image as a picture of God. The book is all about AI, and everyone interested in AI should read it, despite being published for the first time about 40 years ago. In 1988, I proved one of the recursions mentioned in his book; see my article in the Journal of Number Theory, here. I would also consider the typesetting system TeX, created by Donald Knuth (famous computer scientist), along with its LaTeX extension by Leslie Lamport, as another piece of art with very practical applications.

Then there is another way machine learning relates to the arts. Music, paintings, videos (movies), and even culinary masterpieces can be designed by AI, whether or not the code that produces these artistic creations is dull or artistically written.

The post Is Machine Learning an Art, a Science or Something Else? appeared first on TheDataGrab.

**New Machine Learning Optimization Technique – Part I**

It even works if the function has no root: it will then find minima instead. In order to work, some constraints must be put on the parameters used in the algorithm, while avoiding over-fitting at the same time. This would be true anyway for the smooth, continuous, differentiable case. The method does not work in the classical sense, where an iterative algorithm converges to a solution.

Here the iterative algorithm always diverges, yet it has a clear stopping rule that tells us when we are very close to a solution, and what to do to find the exact solution. This is the originality of the method, and why I call it new. Like many machine learning techniques, ours can generate false positives or false negatives, and one aspect of our methodology is to minimize this problem.

Applications are discussed, as well as a full implementation with results, for a real-life, challenging case and for simulated data. This first article in the series is a detailed introduction. Interestingly, we also show an example where a continuous, differentiable function with a very large number of wild oscillations benefits from being transformed into a non-continuous, non-differentiable function, on which our non-standard technique finds roots (or minima) where the standard techniques fail. For the moment, we limit ourselves to the one-dimensional case.

**1. Example of problem**

In the following, *a* = 7919 × 3083 is a constant, and *b* is the variable. We try to find the two roots *b* = 7919 and *b* = 3083 (both prime numbers) of the function *f*(*b*) = 2 – cos(2π*b*) – cos(2π*a*/*b*). We will just look between *b* = 2000 and *b* = 4000, to find the root *b* = 3083. This function is plotted below, between *b* = 2900 and *b* = 3200. The X-axis represents *b*, the Y-axis represents *f*(*b*).
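As a quick sanity check, the function is easy to code in a few lines (this Python sketch is our illustration; the original article does not specify a language):

```python
import math

a = 7919 * 3083  # the constant whose prime factors we want to recover

def f(b):
    """Oscillating function: f(b) = 0 exactly when the integer b divides a."""
    return 2 - math.cos(2 * math.pi * b) - math.cos(2 * math.pi * a / b)

print(f(3083))  # essentially 0 (up to floating-point error): b = 3083 is a root
print(f(3000))  # clearly positive: b = 3000 does not divide a
```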

Despite appearances, this function is well behaved: smooth, continuous, and differentiable everywhere between *b* = 2000 and *b* = 4000. Yet, it is no surprise that classical root-finding or minimum-finding algorithms such as Newton–Raphson (see here) fail, require a very large number of iterations to converge, or require starting very close to the unknown root, and are thus of no practical value here.

In this example, clearly *f*(*b*) = 0 in the interval 2000 < *b* < 4000 (and this is also the minimum possible value) if and only if *b* divides *a*. In order to solve this problem, we transformed *f* into a new function *g*, which despite being unsmooth, leads to a much faster algorithm.

The new function *g*, as well as its smoothed version *h*, are pictured below (*g* is in blue, *h* is in red). In this case, our method solves the factoring problem (factoring the number *a*) in relatively few iterations, potentially much faster than sequentially identifying and trying all 138 primes between 2000 and 3083 as candidate divisors of *a*.

However, this is by no means the best factoring algorithm: it was not designed specifically for that purpose, but rather as a general-purpose method. In a subsequent article in this series, we apply the methodology to data that behaves somewhat like this example, but with random numbers: in that case, it is impossible to “guess” what the roots are, yet the algorithm is just as efficient.
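For comparison, the brute-force baseline mentioned above can be sketched as follows (a simple sieve plus trial division; this is our own illustration, not the article's code):

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p, flag in enumerate(is_prime) if flag]

a = 7919 * 3083
candidates = [p for p in primes_up_to(3083) if p > 2000]
print(len(candidates))  # 138 primes to try sequentially

# Trial division: 3083 happens to be the very last candidate tested
divisor = next(p for p in candidates if a % p == 0)
print(divisor)  # 3083
```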

**2. Fundamentals of our new algorithm**

This section outlines the main features of the algorithm. First, you want to magnify the effect of the root. In the first figure, roots (or minima) are invisible to the naked eye, or at least indistinguishable from the many other values that are very close to zero.

To achieve this goal (assuming *f* is positive everywhere), replace a suitably discretized version of *f*(*x*) by *g*(*x*) = log(*ε* + *λf*(*x*)), with *ε* > 0 close to zero. Then, in order to enlarge the width of the “hole” created around a root (in this case around *b* = 3083), you use some kind of moving average, possibly followed by a shift on the Y-axis.
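A minimal sketch of this transform on an integer grid follows; the parameter values (ε = 1e-10, λ = 1, a half-window of 5) are our guesses for illustration, not the article's tuned values:

```python
import math

a = 7919 * 3083
eps, lam, half_window = 1e-10, 1.0, 5  # illustrative parameter choices

def f(b):
    return 2 - math.cos(2 * math.pi * b) - math.cos(2 * math.pi * a / b)

grid = list(range(3000, 3201))  # discretized domain around the root b = 3083

# Log transform magnifies the root into a deep, visible "hole"
g = [math.log(eps + lam * f(b)) for b in grid]

# Moving average widens the hole (truncated windows at the edges)
h = [sum(g[max(0, i - half_window): i + half_window + 1]) /
     len(g[max(0, i - half_window): i + half_window + 1])
     for i in range(len(g))]

print(grid[g.index(min(g))])  # 3083: the deepest point of g sits at the root
```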

The algorithm then proceeds as a fixed-point iteration: *bn*+1 = *bn* + *μ* *g*(*bn*). Here we started with *b*0 = 2000. Rescaling is optional, if you want to keep the iterates bounded; one rescaling that does the trick here is *bn*+1 = *bn* + *μ g*(*bn*) / SQRT(*bn*). Assuming the iterations approach the root (or minimum) from the right direction, once they hit the “hole”, the algorithm emits a signal, but then continues without ever ending, without ever converging.

You stop when you see the signal, or after a fixed number of iterations if no signal ever shows up. In the latter case, you just missed the root (the equivalent of a false negative).

The signal is measured as the ratio (*bn* – *bn*-1) / (*bn*+1 – *bn*), which dramatically spikes just after entering the hole, depending on the parameters. In some cases the signal may be weaker, absent, or appear multiple times, which can result in false positives.

Even if there is no root but a minimum instead, as in the above figure, the signal may still be present. Below is a picture featuring the signal, occurring at iteration *n* = 22: it signals that *b*21 = 3085.834 is in close vicinity of the root *b* = 3083. The X-axis represents the iteration number in the fixed-point algorithm. How close to a root you end up is determined by the size of the window used for the moving average.
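To make the signal mechanism concrete, here is a deliberately simplified toy version (our own illustration, not the article's implementation): a step function plays the role of the transformed *g*, with a flat “hole” starting at *b* = 100, so the spike in the step-size ratio is easy to see.

```python
def g(b):
    """Toy transformed function: large steps outside the hole, tiny steps inside."""
    return 0.01 if 100 <= b <= 110 else 1.0

mu, b = 1.0, 0.0
iterates = [b]
for _ in range(120):
    b = b + mu * g(b)  # fixed-point iteration: b_{n+1} = b_n + mu * g(b_n)
    iterates.append(b)

# The signal: the ratio of consecutive step sizes spikes on entering the hole
for n in range(1, len(iterates) - 1):
    ratio = (iterates[n] - iterates[n - 1]) / (iterates[n + 1] - iterates[n])
    if ratio > 50:             # stopping rule: signal detected
        print(n, iterates[n])  # the iterate sitting at the edge of the hole
        break
```

Note that the iteration itself never converges; only the spike in the ratio tells us we have entered the hole, exactly as described above.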

The closest to my methodology, in the literature, is probably the discrete fixed point algorithm, see here.

**3. Details**

All the details will be provided in the next articles in this series. To make sure you don't miss them, you can subscribe to our newsletter, here. We will discuss the following:

- Source code and potential applications (e.g. Brownian bridges)
- How to smooth chaotic curves, and visualization issues (see our red curve in the second figure – we will discuss how it was created)
- How to optimize the parameters in our method without overfitting
- How to improve our algorithm
- How we used only local optimization, without storing a large table of *f* or *g* values, yet finding a global minimum or a root (this is very useful if your target interval to find a minimum or root is very large, or if each call to *f* or *g* is very time consuming)
- A discussion of how to generalize the modulus function to non-integer numbers, and investigate the properties of modulus functions for real numbers, not just integers

The post New Machine Learning Optimization Technique – Part I appeared first on TheDataGrab.
