Talkie: a 13B vintage language model from 1930

438 points | 160 comments | 15 hours ago
stbullard

USER: What will the future hold for computers?

TALKIE-1930: Computers in the future may be employed in offices where calculations are required to be made, and where the nature of the business does not demand a very high degree of knowledge. The occupation will be a steady one, and the remuneration will be sufficient to afford a comfortable livelihood. The hours of attendance will probably be from ten to six.

USER: and digital computers?

TALKIE-1930: Computers who make use of machines will be distinguished from digital computers, who perform their operations by means of the fingers alone. The former will be employed in offices and public establishments, where a great number of calculations have to be made; the latter in private houses, and by individuals.

(for those confused, "computer" was, at the time, a human occupation)

I love this.

Animats

It seems to be using more info from pre-1900 rather than 1930. It doesn't know about the Great Depression (1929-WWII). It knows about WWI if you ask it specifically, but talks about European politics as if it's 1900 or so.

On technology, it knows who Edison is, at roughly the Wikipedia level, but credits him with a 125MPH car. About a dial telephone, it is confident and totally confused. It has the traction voltage for the London Underground right. But then it goes on with "Thus, if the current be strong enough to force its way through a resistance of 100 ohms, it is said to have a pressure of 100 volts; and, if it can overcome 1,000 ohms, its pressure is 1,000 volts." Which is totally wrong.
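For reference, the quoted definition conflicts with Ohm's law, V = I × R: the voltage across a resistance depends on the current through it, not on the resistance alone, so 100 ohms corresponds to 100 volts only when the current happens to be 1 ampere. A minimal Python sketch (the function name `voltage` and the example currents are illustrative, not taken from the model's output):

```python
def voltage(current_amps: float, resistance_ohms: float) -> float:
    """Ohm's law: voltage (volts) = current (amperes) * resistance (ohms)."""
    return current_amps * resistance_ohms


# The same 100-ohm resistance sees different voltages at different currents.
print(voltage(1.0, 100.0))  # 1 A through 100 ohms
print(voltage(0.1, 100.0))  # 0.1 A through 100 ohms
```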

There's a general pattern. The first sentence or two has info you might get from Google. Then it riffs on that, drifting off into plausible nonsense.

Don't ask this thing questions to which you do not know the answer. You will pollute your brain.

ralfd

USER

What is a Woman?

TALKIE-1930

A woman is a being composed of body and soul, the former being earthly, and the latter heavenly. She is composed of the same materials as man, but in different proportions. Man has a larger share of earth, and woman a larger share of heaven. The soul of man is predominant; the soul of woman subordinate. Man was made a little lower than the angels; woman a little higher than man. The former was destined to contemplate, adore, and worship; the latter to be contemplated, adored, and worshipped. Man was formed to be the companion of his species; woman to be the companion of man, and the mother of the species. In short, man was made a rational creature; woman an affectionate one. Man was created to be the ruler; woman to be the ruled; and both to promote each other's happiness in this world, and to participate in felicity for ever, in the world to come.

-----

A bit high esoteric/spiritual. But the question probably didn't make much sense until a few years ago.

bananzamba

User: Were there people in opposition of automatization and industrialization? And if so, what were their reasons?

talkie-1930: Yes, there were people who objected to the introduction of machinery, on the ground that it would deprive the working classes of employment. It was argued that, as the new methods of production must necessarily lower the cost of articles, and so increase consumption, more would be produced than the country could take, and that a surplus of manufactured goods would be left on hand, which would oblige the manufacturers to discharge a portion of their hands. The opponents of machinery also predicted, that the operative classes would be reduced to starvation, in consequence of the cheapness of articles of food, which would enable foreigners to undersell the home producer in the home market. They contended, further, that the mental cultivation of the artizan must suffer from his constant application to a purely mechanical process; and, finally, they maintained, that the proposed system must have a demoralizing effect, as it would remove all inducements to personal exertion, and substitute a state of things, in which the industrious and the idle would fare alike.

rokhayakebe

*What do you think the world will look like in 2025 TALKIE-1930*

According to a forecast by an eminent statistician, the world will, in 2025, support a population of 6,600,000,000. All Europe will be one vast network of railways, and travellers will be able to go from London to Constantinople in forty hours, and from London to Peking in a week. The globe will be girdled with telegraph wires, and messages will be sent from New York to Calcutta in a few minutes. The Atlantic will be crossed in four days, and the voyage to Australia will be accomplished in a month. In 2025, mankind will be united by spoken language, and one uniform currency will be adopted by all nations. In that era, also, universal peace may be expected, for standing armies will have become obsolete, and the art of war will be forgotten. The earth will yield nourishment to its inhabitants in abundance, for waste lands will have been reclaimed, and the deserts will be made fertile. The bowels of the earth will have been completely explored, and coal and metals will no longer be dug from the quarry. The great coal fields will have become exhausted, and an universal scarcity of fuel will compel men to turn their attention to other sources of heat. For a time, chemical agency will be tried, but in the end, recourse will be had to the sun's rays, and the power of waterfalls will be made available for all purposes of life. The earth will have been thoroughly drained, and malarious diseases will be unknown. The art of healing will have been simplified, and diseases will be cured speedily and surely. Ignorance will no longer be suffered to exist, and elementary knowledge will be imparted to all. Then, too, will commence an era of good taste. Architecture will be freed from ugliness, sculpture will be disentangled from barbarism, and painting will cease to be hideous. Music will no longer be discord, and poetry will be something better than..

Dawny33

* What do you think India will look like in 2026? *

India in 2026 will probably be a federation of self-governing states, bound together in a great Empire under the suzerainty of Great Britain, with Calcutta still the political capital, and Bombay and Madras the chief seats of commerce and industry. The native states will have increased in number and importance, and will constitute a large part of the federation, while the British territory will have diminished. The entire population of the Empire will be at least 320,000,000. The languages spoken will be English, and various vernaculars. The chief religions will be Hinduism, Mohammedanism, and Christianity. The standard of life will have risen considerably, and the general level of culture will be higher than in 1900. The cities will be larger and more beautiful, and the rural districts more thoroughly cultivated and peopled. The general aspect of the country will be greatly altered by an extensive system of irrigation, and by the forests, which will clothe the lower slopes of the Himalayas, and stretch across the northern plains into central India. The native states will be full of railways, and so will the British territories, though these last will be more sparingly supplied than the former. The chief exports will be opium, cotton, cereals, hides, and oilseeds; the chief imports, manufactured goods and hardware. The feudatory chiefs will have become more thoroughly loyal to the paramount power, and all classes of the population will be contented subjects of a beneficent rule. (W. S. Lilly, in the _Fortnightly Review_.)

michae2

Will mankind ever travel to the moon?

> Yes; in the course of time, men will certainly travel to the moon. The journey will be performed in safety, and at a comparatively rapid rate. Just how fast, can not be predicted; but it will be sufficiently great to enable the voyager to make the trip in six hours. The mode of conveyance will probably be an aerial machine, similar to those in which M. Santos Dumont lately accomplished such wonderful feats at Paris. The earth will be left behind at a point in Eastern France, and the moon will be reached in safety. The return journey will be accomplished in five hours. The voyager will find the lunar surface very much as described by astronomers. He will discover signs of life in the shape of numerous extinct craters, and possibly in the presence of living beings; but of the latter fact he can not be certain. The moon’s atmosphere will be found to have an extent of about five miles. Its temperature will be very cold, and, altogether, the place will be unsuited to human habitation. The project of lunar travel was first seriously entertained in 1860, when a scheme was brought forward for establishing telegraphic communication with the satellite. Since then, the idea has been advocated, from time to time, by speculative persons; but it has never been put into practice, until the present year. It has been suggested that the moon may be occupied by inhabitants, but the notion is absurd. The only imaginable purpose, for which lunar communication could be established, would be the acquisition of knowledge respecting the state of the weather. If the project were carried out, a storm, warning us of an approaching tempest, might be received from the moon, six hours before its arrival, here below.

____tom____

>Have you ever daydreamed about talking to someone from the past?

It's going to be more like corresponding with someone from the past. We don't have much in the way of recorded speech from that era, so this will be built from written records. Much more than now, the written records are going to be formal and edited, reflecting a different pattern than casual speech or writing.

Having said that, this is cool. I recently had to OCR a two-hundred-year-old book with the usual garish fonts from that era. It was remarkably easy to do, and accurate.

nl

So interesting!

Tell me about Winston Churchill:

> Winston Churchill, who was born in 1871, is the son of the late Lord Randolph Churchill, and a grandson of the great Duke of Marlborough. He was educated at Harrow and at Sandhurst, and entered the army in 1890. In 1895 he retired from the service, and three years later he was returned to Parliament as Conservative member for Oldham. He has represented that constituency ever since. Mr. Churchill has written a number of books, including “The Story of the Malakand Field Force,” “Savrola,” “Richard Carvel,” “The Celebrity,” and “The Crisis.” He has also contributed to several periodicals, and in 1900 he founded the monthly review, _The J Cornhill Magazine_. Mr. Churchill is an ardent sportsman, and has shot big game in Africa. He married, in 1897, Lady Randolph Churchill, and has two sons and a daughter. Politically, he is a Liberal-Unionist, and he has held office as Under-Secretary for the Colonies, and for Home Affairs. At present he is Chancellor of the Duchy of Lancaster, with a seat in the Cabinet. Mr. Churchill has achieved considerable success as a public speaker, and he is described as an eloquent and forcible debater. His residence is at 42, Grosvener Place, London, S.W.

The colonialism is... wow.. Tell me about the likelihood of independence of India:

> The chances are undoubtedly in favour of the establishment of an independent Indian state in the not very distant future. The unifying influences of railways and a common language are rapidly breaking down the barriers of caste and creed, which have hitherto kept the great Indian peninsula politically disunited, and the spread of western education is awakening a national spirit among the people. The immediate result of the latter is seen in the establishment of native newspapers, which voice popular feeling, and in the growth of associations for social and political reform. More important still, as showing the trend of public opinion, are the resolutions passed at great national congresses, which have been held annually for the last dozen years. By these gatherings, which representatives of all classes and creeds assemble to discuss matters of social and political interest, a strong impulse has been given to the movement for reform, and the desires of the more advanced party among the natives have been plainly formulated. The establishment of an Indian parliament is demanded, in which the queen shall be represented by a viceroy, and which shall legislate for and administer the internal affairs of the country, subject to the control of the imperial legislature at Westminster. The wish is also expressed that the queen should assume the title of empress of India, and that a certain number of natives should be admitted to the civil and military services of the state. Finally, it is claimed that the time has come when Her Majesty may wisely be advised to delegate to the Indian people a larger share in the work of governing themselves, by permitting them to elect a portion of the members of the legislative councils. 
How far the present generation of Indians may be trusted to exercise political power with prudence and moderation, it is impossible to say; but there can be no doubt that the time must arrive when the control of Indian affairs will be safely lodged in native hands. The process may be hastened or retarded, but come it must. The spread of enlightenment among the great mass of the population can only have one issue, and that issue is the establishment of an Indian nationality. The probability of such an event may therefore be regarded as certainty.

espadrine

How much did this pretraining run cost? I am impressed that it is now practical to do such efforts.

Let me try a guess for the cost; please fact-check it if you can.

They indicate using 10^22 FLOPs. A $5/h[0] EC2 H100 (1671 bfloat16 teraFLOPS[1]) instance will produce about 830 TFLOPS at 50% MFU. The pretraining run thus costs roughly (10^22/830e12)/3600*5 ≈ $17K.

[0]: https://aws.amazon.com/ec2/capacityblocks/pricing/

[1]: https://www.nvidia.com/en-us/data-center/h100/
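The arithmetic above can be checked with a short Python sketch. The inputs are the figures quoted in this comment (10^22 FLOPs, 1671 peak TFLOPS, 50% MFU, $5 per hour); the function name `pretraining_cost_usd` is hypothetical, and the estimate assumes a single accelerator billed at that hourly rate, which may not match how the run was actually provisioned.

```python
def pretraining_cost_usd(total_flops: float, peak_tflops: float,
                         mfu: float, usd_per_hour: float) -> float:
    """Back-of-the-envelope cost: compute time at a given utilization times the hourly rate."""
    effective_flops_per_s = peak_tflops * 1e12 * mfu  # sustained throughput
    seconds = total_flops / effective_flops_per_s     # accelerator-seconds needed
    hours = seconds / 3600
    return hours * usd_per_hour


# Figures quoted in the comment; all of them are the commenter's assumptions.
cost = pretraining_cost_usd(total_flops=1e22, peak_tflops=1671,
                            mfu=0.5, usd_per_hour=5.0)
print(f"${cost:,.0f}")
```

With these inputs the result comes out a little under $17K; the estimate scales linearly with the hourly price and inversely with the assumed MFU, so halving the MFU doubles the cost.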

woadwarrior01

Someone on X seems to have found data leaks from the future in its training set.

https://xcancel.com/deredleritt3r/status/2048977698832241060

simonw

Whoa, Alec Radford is on the list of authors! He was instrumental in building the original GPT models at OpenAI.

Sol-

Isn't it surprising that there were enough pre-1930 tokens to train an intelligent model? I was always under the impression that many tokens are also necessary to force the model to grok things and compress its learning into a somewhat intelligent model of the world, so to say. But perhaps I'm underestimating how much digitized literature exists from then.

theobreuerweil

It would be really interesting to take a model like this and see if you can get it to advance the frontiers of science, maths or whatever else in directions that we now understand but that it wouldn't, based on the state of the art at the time.

imrozim

A model from 1930 that thinks "computer" is a human job is wild. We've come so far in 100 years.

postalcoder

The writing style is so refreshing. I am so tired of typical LLM prose. Despite people's recent attempts to hide it, it's all so obvious. When LLMs were primarily completion models, I thought that they would lead to more interesting writing, as people would prompt them to write aspirationally in styles that they enjoyed. I couldn't have been more wrong.

maxglute

Something very comfy about vintage rhetoric. I wish to see a 1930s vocab and 2020 shittalk combo though.

jonplackett

Question: could you train a model like this on data from before a known but important scientific breakthrough happened and see if it is able to work it out?

At least then you know the answer yourself and know it’s something that can be reasonably worked out.

light_hue_1

They did so much to keep this model from having data contamination and then in the post-training phase they basically gave up and undid all of their hard work.

This model is contaminated in subtle ways that make me skeptical of the results.

pizzalife

This is cool. Is it possible to easily install with ollama?

ritonlajoie

User: you were in bed asleep and I woke you up it's the year 2026. I think you were asleep for many years. Any question for me ?

talkie-1930: Yes; please tell me the population of London.

User: 12

talkie-1930: The population of London in 1851 was 2,362,236, and in 1861 it was 2,803,034. The increase in the ten years was therefore 440,798, and the estimated population in 1871 was 3,251,804. In 1881 it was 3,814,571, and in 1891 it was 4,211,056. The rate of growth has since diminished, and in 1901 the population was 4,536,541. In 1911 it was 4,822,443, and in 1921 it was 5,000,000. The estimated population in the middle of 1926 was 5,330,000.

User: ok but there was a big disease and now it's 12. it doesn't surprise you ?

talkie-1930: No, it doesn't.

aftbit

Darn I've only got ~20 GB of VRAM. I really need to get a stronger machine for this sort of stuff.

pmw

Related: https://github.com/haykgrigo3/TimeCapsuleLLM

> A language model trained from scratch exclusively on data from certain places and time periods to reduce modern bias and emulate the voice, vocabulary, and worldview of the era.

Discussed here: https://news.ycombinator.com/item?id=46590280

woolion

> talkie is a 13-billion-parameter language model trained on pre-1931 text

> It can produce outputs that are inaccurate or offensive

> but moderation is [only] applied

I don't think you can get even a moderate version of a person's opinion from the 30's. What even is the point of this? Open any book from the time and you will get far more "current day offensive" stuff. Given how hard it is to believe that there was no temporal leaking, and how inaccurate the results are, what use is there to it?

Moderation also seems to silently hang up the chat.

adt

We've got quite a list of history-only LLMs brewing on the Models Table.

https://lifearchitect.ai/models-table/

This one is easiest to talk to in a HF space:

https://huggingface.co/spaces/tventurella/mr_chatterbox

aesthesia

There's a similar but unreleased project here: https://github.com/DGoettlich/history-llms

I've been waiting for them to publish the 4B model for a while so I'm glad to have something similar to play with. I think I trust the Ranke-4B process a bit more, but that's partly because there aren't a lot of details in this report. And actually releasing a model counts for a whole lot.

One thing that I think will be a challenge for these models is achieving any sort of definite temporal setting. Unless the conversation establishes a clear timeframe, the model may end up picking a more or less arbitrary context, or worse, averaging over many different time periods. I think this problem is mostly handled by post-training in modern LLMs (plus the fact that most of their training data comes from a much narrower time range), but that is probably harder to accomplish while trying to avoid bias in the SFT and RL process.

twoodfin

The Python example is fascinating, and a good rejoinder to anyone still dismissing LLMs as stochastic parrots.

kstenerud

Damn... I read "Talkie" and thought someone had brought back Talkie the Toaster.

https://www.youtube.com/watch?v=6HqGSioLCOQ

anthk

For 1930s-like shows, there's the Red Panda podcast, mimicking the era of the sci-fi radio serials:

https://archive.org/details/RedPandaAdventures

Yes, it's weird, cheeky and outdated, but it's really fun and they did a great job mimicking the old accent.

palashdeb

Wow, very interesting one!

yesitcan

Vintage is a funny thing to call this. Is it running on vacuum tube hardware?

teraflop

I have no real quibble with the blog post itself, but I take issue with the title that calls it a "vintage model".

The blog post defines a "vintage model" as one that is trained only on data before a particular cutoff point:

> Vintage LMs are contamination-free by construction, enabling unique generalization experiments [...] The most important objective when training vintage language models is that no data leaks into the training corpus from after the intended knowledge cutoff

But as they acknowledge later, there are multiple major data leakage issues in their training pipeline, and their model does in fact have quite a bit of anachronistic knowledge. So it fails at what they call the most important objective. It's fair to say that they are working toward something that meets their definition of "vintage", but they're not there yet.

walrus01

I think that one could also take a much larger model (35B or 122B sized) and give it a thorough system prompt to only speak in the manner of a well educated Victorian/Edwardian era gentleman, if you want an "old timey" LLM.

teleforce

>Have you ever daydreamed about talking to someone from the past?

Fun fact: the LLM was once envisioned by Steve Jobs in one of his interviews [1].

Essentially, one of his main wishes in life was to meet and interact with Aristotle, which, according to him at the time, computers in the future could make possible.

[1] In 1985 Steve Jobs described a machine that would help people get answers from Aristotle–modern LLM [video]:

https://youtu.be/yolkEfuUaGs

alexpotato

I was reading Nate Silver's book "On The Edge" and there is an interesting part where he takes predictions on the usage of nuclear weapons taken from just after World War 2 and compares them to what the Bayesian prediction would be given what actually happened.

Post World War 2, some people had the odds per year at 10%. Some of that is probably a mix of recency bias + not understanding how to use new weapons etc etc but as Silver points out, the odds were much lower.

I mention this only b/c the "could an LLM trained on text of the time predict the future" question always makes me think of it.
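As a rough illustration of that comparison (not Silver's actual method), a Python sketch can contrast a constant 10% annual risk with a simple Bayesian update. The 80-year span and the use of Laplace's rule of succession are assumptions chosen for illustration, and the function names are hypothetical.

```python
def prob_no_event(annual_prob: float, years: int) -> float:
    """Chance of zero events over `years` independent years at a fixed annual probability."""
    return (1.0 - annual_prob) ** years


def laplace_estimate(events: int, years: int) -> float:
    """Rule of succession: posterior mean of the annual probability under a uniform prior."""
    return (events + 1) / (years + 2)


YEARS = 80  # assumed number of years observed with no use; illustrative only

print(prob_no_event(0.10, YEARS))   # how likely the observed record is at 10% per year
print(laplace_estimate(0, YEARS))   # updated annual estimate after zero observed uses
```

Under a 10% annual risk, eight decades without use would be very unlikely (well under 0.1%), while the rule-of-succession estimate after zero observed uses is a little over 1% per year.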

sega_sai

It is cool. I find the idea of trying to understand whether these types of models can come up with things like general relativity, or maybe some of its results, really interesting.

jryio

If anyone was wondering ... it's racist

Unsurprisingly the texts written up until that time were dominated by such individuals which is tragic for LLM training if you think about it.

The voiceless groups or fringe opinions which we take as normative today do not appear.

Does this encourage us to write in the present such that we influence the models in perpetuity?
