Also available on the web as HTML (takes a few seconds to open because it’s a 30 MB single file) (IMPORTANT: Open the HTML in an INCOGNITO tab. If you open the HTML in a regular tab, you are likely to get an old version residing in cache.) https://www.ar-tiste.com/bayesuvius.html
I’ve decided to write a new (free, open source) book entitled “Bayesuvius Quantico” about quantum Bayesian networks (qbnets). This will complement my (free, open source) book “Bayesuvius” (1000 pgs) on classical Bayesian Networks (bnets). You can find the latest version here
On April 11, I announced the Mappa Mundi Causal Genomics Challenge, which involves discovering a causal DAG for the DREAM3 dataset. After 2 weeks of intense work, I have finally completed my entry for that challenge: the open source software gene_causal_mapper (gcmap) https://github.com/rrtucci/gene_causal_mapper. gcmap is an open source Python program for discovering a causal DAG for genes via the Mappa Mundi (MM) algorithm. As an example, I apply it to the DREAM3 dataset for yeast.
I encourage others to submit to the public their own algorithm for deriving a causal DAG (Gene Regulatory Network) from the DREAM3 dataset. I would love to compare your network to mine.
Recently, I became interested in trying to discover a Causal DAG for the human genome (~21,000 genes), in order to find genetic causes and cures of diseases. Yeast only has ~6,000 genes so I figure I’ll start with yeast, and then, if everything goes well, move up to humans.
Discovering a genomic DAG is not a new idea. People have been trying to do this for at least 25 years. Several algorithms have been proposed in the past to do this. About a month ago, I wrote a short paper describing my own genomic DAG discovery algorithm that I want to try. I’m curious to see how my algorithm performs relative to the others.
After releasing that short paper, I hit a wall because I couldn’t find a suitable dataset with which to design and test my software. Yesterday, I finally hit pay dirt: I found a very nice dataset that suits my needs. I want to share links to that dataset with my readers, in case they too want to try their hand at discovering a genomic DAG. Maybe you can beat my algorithm.
I’m calling this the Mappa Mundi Challenge:
Find a causal DAG (what is called a “gene regulatory network (GRN)” in genomic parlance) for yeast, using the dataset for the DREAM3 (2008) Challenge.
DREAM challenges occur every year. In this particular one, the goal was to predict a chunk of missing data. I intend to use it for a different purpose: to discover a GRN.
I’m storing here a conversation I had with Grok on March 22 in case it disappears. Back then Grok was actually quite reasonable and accepted that the evidence presented by Dr. Hans Braun (5 independent nuke signatures) definitely warranted that an international team of nuclear scientists be sent to ground zero to verify or deny that the bomb Israel dropped on Tartus on Dec.15 was a nuke. In my next post, I will record a conversation I had with Grok just 4 days later. Since then, I believe Grok has been heavily guardrailed. Now Grok is totally certain that it was a conventional bomb and there is nothing to worry about or investigate. What a difference a little guardrailing makes! Let this be a lesson to everyone on how AI is already being used to manipulate public opinion in the service of Israel, to cover genocide and war crimes.
In a previous blog post, I mentioned that I had written a paper entitled: “Discovering a Causal DAG for genes via the Mappa Mundi algorithm”. In that blog post, I requested aid from my readers to find a suitable dataset for the algorithm. Here are some suggestions I got on X-Twitter. Gennady Gorin pointed out that obtaining time series for gene expressions and TFs with more than two time points is “incredibly difficult”. https://x.com/GorinGennady/status/1901790198217973763 This made me realize that my algo might still work if I can get a huge number of 2-time records for many genes.
With the advent over the past year of the events listed below, it has become patently obvious that China now owns AI, and will eventually own most of the computer industry. I even expect them to beat the biggest bully on the block, Google, eventually. I actually welcome this because most of the big Silicon Valley AI companies (OpenAI, GoogleAI, META, Palantir, Oracle, Amazon, Apple) have actively participated in the Gaza genocide. Deepseek hasn’t. American companies have used AI for evil. They do not deserve to control such a powerful tool. They have forfeited that right. Maybe China will follow the same evil path too, but it hasn’t so far.
Deepseek,
Ernie, Baidu AI
EngineAI robots matching those of Boston Dynamics in capabilities
the new device for computer chip lithography by Huawei. It threatens to catch up with, or even surpass, ASML, a Dutch company that has heretofore held a near-complete monopoly on the fabrication of such multimillion-dollar devices
Trump’s imminent war with Iran. While the US is busy fighting in Iran, China will take over Taiwan and Taiwan Semiconductor (TSMC). Even if Trump doesn’t attack Iran, China is determined to take over Taiwan. Once it does, TSMC will belong to them. TSMC makes most of NVIDIA’s chips. Bad news for the US empire
Intel’s imminent bankruptcy. AMD is crushing them
BYD, China’s electric vehicle (EV) manufacturer that, in 2024, surpassed Tesla in number of EVs produced, even though the US has a 100% tariff on BYD cars
China’s new quantum computer named “Tianyan-504” that features a 504-qubit chip called “Xiaohong.”
China’s Starlink copy called Qianfan (also known as “Thousand Sails” or SpaceSail) wielding a huge constellation of internet satellites polluting the night sky (LOL). Qianfan still lags behind Starlink, especially in reusable rocket technology, but is advancing in leaps and bounds.
BRICS. America constitutes 4% of the population of the world. The BRICS nations constitute 56%. Not even close.
Below is a timeline focusing on the evolution of machines and technologies used to fabricate computer chips, culminating in the development of nanometer-scale transistors. This timeline emphasizes key milestones in lithography and related equipment that enabled the shrinking of transistor sizes, without delving into specific chip designs or manufacturers unless directly tied to the machinery.
I just started a public github repo https://github.com/rrtucci/gene_causal_mapper for my next open source (MIT license) software called “Gene Causal Mapper” (or gcmap for short). So far the repo contains only a 7 page paper explaining the algorithm. Here is its title and abstract:
I haven’t started writing the software yet. I first need a dataset to work with. Do you know of a good dataset for this? It must give time series for gene expression concentrations and for transcription factor concentrations.
This blog post is to inform my friends that I’ve added several new chapters to my FREE book Bayesuvius. More specifically, I’ve added chapters entitled
Dynamical Systems
Autoregulon Networks (Network Motifs)
Gene Regulatory Networks
Green’s function
Kramers Kronig Relations
Genomics vocabulary
I also wrote as a supplement to the dynamical systems chapter, a github repo called “Ode to ODEs”. It contains Python software for finding waveforms and phase portraits of dynamical systems. Lots of examples of use cases of the software are included. Believe it or not, the math of dynamical systems is very useful in the study of Causal Genomics and causal inference.
I used to think that the only way causal inference (CI) had been applied to Biology was to discover the set of external traits (phenotypes) that are influenced by each gene (genotype). Like this:
How wrong I was! That type of DAG is certainly still used, but very antiquated. It dates back to friar Gregor Mendel (1822–1884) and his peas, or Sewall Wright (1889-1988) and his guinea pigs. Genomics has advanced a million miles since then. Turns out that for the last 25 years, genomicists have been trying to find a DAG that they call the complete Gene Regulatory Network (GRN). This complete GRN describes the causal relationship between all the genes of the human body (We have about 21,000 of them). Even though biologists have been successful in finding parts of the complete GRN, they are far from discovering the totality of it. Here is an example of a small GRN for rice, from the Wikipedia article on GRN.
Recently, I’ve started learning about the subject of Causal Genomics, because I find it so fascinating: imagine using Causal Inference to find the causes of diseases or to do the related task of discovering the complete GRN. An amazing voyage of imagination and discovery!
I’m starting this formidable quest from pre kindergarten level. I have a PhD in physics, but I knew practically zero about genomics when I embarked on this quest 2 months ago.
I’ve started this quest by reading a lot, and then converting the knowledge I’ve gained into the new chapters listed above.
“Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?”, by Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
This paper by 13 “scientists”, headed by Turing award winner Yoshua Bengio, is highly disappointing. It proposes desiderata for a non-agentic Causal AI. Brilliant idea, except my software Mappa Mundi accomplishes all their desiderata, and has been out for 2 years. They cite hundreds of papers, but not my software and accompanying paper. 13 “experts” in the field, but no one knows about Mappa Mundi, which I’ve discussed frequently in this blog, X/Twitter and the “Causal Inference” Reddit, for 2 years. Sure. And Santa Claus exists and I talk to him every day.
Bengio has been pushing for an Agentic AI (i.e., one using RL) which he calls Generative Flow Networks (GFlowNets) https://arxiv.org/abs/2106.04399 since 2021, and it looks like he has finally concluded after 4 years that it’s a really bad idea. Of course it is. RL does not cleanly separate the DAGs it discovers from the NN. It does not extract them and store them in a library (what I call a DAG atlas) for future reuse, and for exporting to other AIs. The ability to do this has been one of the main selling points for Mappa Mundi since its inception 2 years ago.
For more information about Causal AI, check out my FREE open source book Bayesuvius (1,000 pages).
DeepSeek is amazing in that it is Open Source (MIT license) and it has reduced the cost of doing AI by 95%. However, it is far from perfect. DeepSeek is being promoted as a Causal AI genius. I strongly disagree. DeepSeek uses CoT (Chain of Thought). This method has many flaws. For example, it doesn’t store the DAGs it learns for future reuse, and it totally forgoes the rich toolset that Pearl, Rubin and many others have developed for doing Causal Inference over the last 50 years. My software Mappa Mundi (MIT License too) overcomes these 2 flaws. Do you think DeepSeek and LLMs in general are a good tool now or will be in the future for doing Causal Inference? How?
I’m considering writing a chapter on Causal Genomics (CG) for my book Bayesuvius. Unfortunately, my PhD is in physics so I know approx zero about genomics. Are there any people in this Reddit that work in CG and would care to share their personal opinion on what are the most important papers so far in CG? Also, are there any pedagogical materials intended to teach someone, starting from scratch, all he/she needs to learn to understand a paper in CG?
Just prepared this slide for a presentation. Très magnifique.
How Mappa Mundi (free, open source, MIT license) and all humans distinguish between correlation and causation, said with a single picture that even an 8 year old can understand, and say: “I knew that. I’ve been doing that all my life”.
The probability that ice cream causes sharks is low, so there is no arrow from ice cream to sharks in the DAG. The probability that rain causes green sprouts is high, so there is an arrow from rain to green sprouts in the DAG.
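The rule in that picture can be sketched in a few lines of Python. This is only a toy illustration, not Mappa Mundi's actual code: the probabilities, the 0.5 threshold, and the function name `build_dag` are all made up for the example, and how such probabilities would be estimated from data is not shown here.

```python
# Hypothetical causal probabilities, invented for illustration only.
# Keys are (candidate cause, candidate effect) pairs.
causal_prob = {
    ("ice cream", "sharks"): 0.05,
    ("rain", "green sprouts"): 0.90,
}


def build_dag(causal_prob, threshold=0.5):
    """Keep an arrow cause -> effect only if its probability exceeds the threshold."""
    return [
        (cause, effect)
        for (cause, effect), prob in causal_prob.items()
        if prob > threshold
    ]


arrows = build_dag(causal_prob)
for cause, effect in arrows:
    print(f"{cause} -> {effect}")
```

With these made-up numbers, only the rain to green sprouts arrow survives. The choice of threshold is arbitrary here; lowering it far enough would let the ice cream to sharks arrow through as well.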