2. 5hmC is the Newest Biomarker for the Early Detection of Cancer
The Food and Drug Administration (FDA) recently granted Bluestar Genomics, a private company, a Breakthrough Device Designation for its liquid biopsy-based pancreatic cancer screening test. Stanford University researchers focusing on precision epigenomics founded Bluestar Genomics. Unlike those from GRAIL and Exact Sciences (EXAS), Bluestar’s tests focus on a biomarker called 5-hydroxymethylcytosine, or 5hmC. 5hmC signals an epigenetic change in the genome that could alert oncologists to the presence of cancer.
Epigenetic changes are chemical modifications to the surface of DNA that alter not the spelling of the genome but the body’s interpretation of its own DNA. ARK has written about 5mC, a more common epigenetic change that powers GRAIL’s tests.
Interestingly, 5mC and the rarer 5hmC are related. Like other biochemical processes, DNA methylation (and de-methylation) involves a series of steps. As shown here, 5hmC is an intermediate in a cyclical process. Bluestar’s researchers note that, unlike other reaction intermediates such as 5fC, 5hmC is relatively stable in the body. While 5mC seems to repress gene expression, many researchers believe that 5hmC enhances gene expression.
In another paper, Bluestar demonstrated that 5hmC concentrates near functional and coding regions of the genome (exons) across dozens of tissue types, supporting the hypothesis that 5hmC plays an important role in activating tissue-specific genes. Bluestar noted, for example, that 5hmC was present in key cancer pathways like EGFR, Notch, and BRCA.
In our view, the clustering of 5hmC in functional genomic regions could accelerate the training of machine learning models. That said, we still have much to learn about this biomarker’s role in disease progression, and practical hurdles remain. Sequencing 5hmC, for example, is difficult, and a few companies own most of the intellectual property enabling next-generation sequencers to read 5hmC.
3. GPT-3 Is Generating 4.5 Billion Words Per Day
In a recent blog post, OpenAI announced that its GPT-3 autoregressive language model is generating 4.5 billion words per day across 300+ commercial applications, an important milestone for deep learning. According to artificial intelligence (AI) experts, GPT-3, or Generative Pre-trained Transformer 3, is the most powerful language model ever built.
With 175 billion parameters trained on roughly 500 billion words, the model uses deep learning for a diverse set of text-generation tasks. Given a list of ingredients, for example, GPT-3 can generate a recipe. Alternatively, it can serve as a customer support chatbot or turn text commands into SQL code.
Research indicates that the size of a neural network is important to the performance of language models. The human brain, for example, has at least 100 trillion synapses, the brain’s version of a parameter. GPT-2, GPT-3’s predecessor, had 1.5 billion parameters, while most other language models have fewer than 1 billion parameters. In other words, GPT-3 is more than 115x larger than GPT-2 (175 billion / 1.5 billion ≈ 117) but no more than 0.175% the size of the human brain (175 billion / 100 trillion).
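The size comparison above is simple arithmetic, and a short sketch makes it easy to check. The parameter and synapse counts are the figures quoted in the text; the variable and function names are ours, chosen for illustration only. Treating one synapse as one parameter is a rough analogy, not a measurement.

```python
# Figures quoted in the text (the synapse count is a lower bound).
GPT3_PARAMS = 175_000_000_000          # 175 billion parameters
GPT2_PARAMS = 1_500_000_000            # 1.5 billion parameters
BRAIN_SYNAPSES = 100_000_000_000_000   # at least 100 trillion synapses


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# GPT-3 relative to its predecessor, GPT-2.
gpt3_vs_gpt2 = size_ratio(GPT3_PARAMS, GPT2_PARAMS)

# GPT-3 as a percentage of the brain's synapse count.
# Because the synapse count is a lower bound, this is an upper bound.
gpt3_vs_brain_pct = 100 * GPT3_PARAMS / BRAIN_SYNAPSES

print(f"GPT-3 vs GPT-2: {gpt3_vs_gpt2:.1f}x")
print(f"GPT-3 vs human brain: {gpt3_vs_brain_pct:.3f}%")
```

Under these assumptions, the ratio to GPT-2 comes out slightly above the "115x" floor, and the brain comparison shrinks further if the true synapse count is higher than 100 trillion.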
While the value of additional parameters and training data is likely to plateau over time, the point of diminishing returns seems to be far in the future. The near-term challenge is finding massive amounts of training data. GPT-3’s training data included most of the text available on the internet. Now, creative researchers are compiling data by converting the audio from podcasts and videos into text. Enabling this research, for example, Spotify recently released audio-text data from 100,000 podcasts.
4. Bitcoin’s On-Chain Data Can Analyze Buyer and Seller Behavior
In our recently published blog, we introduce a framework to analyze bitcoin’s fundamentals in more depth than is possible, we believe, for traditional assets. We characterize the depth with a three-layered pyramid, the lower layers serving as building blocks for the higher layers, as shown below.
In Part 1, we detailed the data in the bottom layer of the pyramid, assessing the health of Bitcoin’s network. In Part 2, we focus on the data in the middle layer, assessing bitcoin holders’ positions and cost bases.