Once upon a time in the 1960s, scientists thought the human genome might contain as many as 2 million genes, units of DNA that code for proteins. But ever since then, the estimated number has been steadily shrinking. A new study suggests that the human genome could contain as few as 19,000 protein-coding genes, fewer than nematode worms have.
By the time the Human Genome Project was under way in the late 1990s, the Physics arXiv Blog reports, the highest estimates put the number of protein-coding genes at 100,000, and the estimates have continued to fall:
In 2001, the initial sequence of the human genome cut the figure dramatically. The International Human Genome Sequencing Consortium put it at 30,000 while a rival group led by Craig Venter estimated the number at 26,000. In 2004, the final draft of the human genome reduced the figure even further to around 24,500 and in 2007 further analysis suggested that it was more like 20,500. And that’s where the figure has sat. Until now.
Researchers came up with the new estimate, detailed in a paper submitted to Molecular Biology and Evolution, by performing a variety of analyses such as filtering "out the human genes that are not present in other species and do not have a structure likely to code for a protein."
Of course, research has shown that more complex organisms don't require more genes. As Medium points out, a water flea has 31,000 genes, the most in any animal, while the record for the largest genome is thought to be held by Paris japonica, a rare flowering plant native to Japan.
All this raises the question: How does the human genome create as much complexity as it does, for instance in the brain? Nobody knows exactly, but the answer would be invaluable. Non-coding regions of DNA, which make up the majority of the genome, play a huge role that is only just beginning to be understood.