So it's the same concept as an LLM training on and inferring tokenized language, but with tokenized amino acids. Instead of artificial intelligence/language it's doing artificial evolution/life, I guess?
Legend2440
If I’m understanding this right:
1. They have a protein model similar to AlphaFold
2. A biotech startup used this model to engineer a protein that converts adult cells into stem cells, at a higher efficiency than existing techniques. (But still only a tiny fraction of cells convert)
Application to life extension seems speculative.
biophysboy
> We initialized it from a scaled-down version of GPT‑4o to take advantage of GPT models’ existing knowledge, then further trained it on a dataset composed mostly of protein sequences, along with biological text and tokenized 3D structure data, elements most protein language models omit.
> A large portion of the data was enriched to contain additional contextual information about the proteins in the form of textual descriptions, co-evolutionary homologous sequences, and groups of proteins that are known to interact.
These bits made me wonder what would have happened if they had used only the supplementary biological data with an untrained LLM.
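For anyone unfamiliar with what "tokenized" protein sequences means in the quoted passage: a minimal sketch, assuming a per-residue vocabulary (the vocabulary layout and special tokens here are illustrative, not OpenAI's actual tokenizer), of how a protein language model might map an amino-acid sequence to token ids alongside ordinary text tokens.

```python
# Hypothetical per-residue tokenizer: one token per canonical amino acid.
# The vocabulary layout below is an assumption for illustration only.

# The 20 canonical amino acids, one-letter codes
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Special tokens first, then one id per residue
VOCAB = {tok: i for i, tok in enumerate(["<pad>", "<bos>", "<eos>", *AMINO_ACIDS])}

def tokenize_protein(seq: str) -> list[int]:
    """Map a protein sequence to token ids, bracketed by <bos>/<eos>."""
    ids = [VOCAB["<bos>"]]
    for residue in seq.upper():
        if residue not in VOCAB:
            raise ValueError(f"unknown residue: {residue!r}")
        ids.append(VOCAB[residue])
    ids.append(VOCAB["<eos>"])
    return ids

# Example with a short sequence fragment
print(tokenize_protein("MKT"))
```

The point is just that once sequences are integers, the same next-token training objective used for language applies unchanged; structure data and text descriptions would get their own token ranges in the shared vocabulary.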