Chris Fields Research
Research questions, recent publications ...

The Erdős Collaboration Graph

Mathematician Paul Erdős (1913 - 1996) is well-known as the author of at least 1,525 research papers with 511 different collaborators. In tribute to Erdős' vast productivity and collaborative zeal, the "Erdős Number" of any researcher is defined as follows: Paul Erdős himself has (uniquely) Erdős Number 0, any co-author of Erdős has Erdős Number 1, any co-author of an Erdős co-author who is not already at a lower number has Erdős Number 2, etc. Considerable data on the Erdős Numbers of prominent researchers, including lists of all individuals with Erdős Numbers of 1 or 2, are maintained by the Erdős Number Project at Oakland University.

The Erdős Collaboration Graph is the graph constructed by representing researchers as vertices and joint publications as edges connecting their co-authors. It is a social network in which the tie between two individuals is a research collaboration that produced at least one published paper. The length of the shortest path from the vertex representing any individual included in the Erdős Collaboration Graph to the vertex representing Erdős is that individual's Erdős Number; however, the Erdős Collaboration Graph also allows the determination of paths between any two individuals who are included in the web of collaboration that includes Paul Erdős. For example, with an Erdős Number of 3, I have a path of length 4 through Erdős to Alfred Tarski and a path of length 5 through Erdős to Albert Einstein (data from the archive of the Erdős Number Project). The extent to which the Erdős Collaboration Graph captures all research collaboration, i.e. includes all researchers who have published jointly-authored papers, is unknown.
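The definitions above can be sketched in a few lines of Python. This is only an illustration on a hypothetical toy data set: the author labels ("A" through "E", "X", "Y") and the helper names build_graph, distances_from and path_length_via_erdos are invented for the example and do not come from the Erdős Number Project. Each paper contributes an edge between every pair of its co-authors, and a breadth-first search from Erdős yields the Erdős Numbers.

```python
from collections import deque
from itertools import combinations


def build_graph(papers):
    """Return an adjacency map: each author -> set of co-authors."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        # Every pair of co-authors on a paper is joined by an edge.
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_via_erdos(erdos_number, a, b):
    """Length of the path from a to b that passes through Erdős."""
    return erdos_number[a] + erdos_number[b]


# Hypothetical toy data: each tuple is the author list of one paper.
papers = [
    ("Erdos", "A"),
    ("A", "B", "C"),
    ("C", "D"),
    ("Erdos", "E"),
    ("X", "Y"),  # a collaboration that is not connected to Erdős
]
graph = build_graph(papers)
erdos_number = distances_from(graph, "Erdos")
```

In this toy graph "D" has Erdős Number 3 and "E" has Erdős Number 1, so the path between them through Erdős has length 4, the same arithmetic as the Tarski example above. A path through Erdős is only an upper bound on the distance between two individuals; a shorter path that avoids Erdős may exist. Authors such as "X" and "Y" who are not connected to Erdős do not appear in erdos_number at all.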

My Erdős lineage

A subgraph of the Erdős Collaboration Graph showing some paths connecting me to Erdős is shown below. Eight paths (1 of length 3 and 7 of length 4) connect me to Erdős via collaborative links in bioinformatics and genomics; one path (of length 3) is via a collaborative link in nuclear physics.

The data on collaborative links up to Erdős Number 2 used to construct this subgraph are from the 2010 records of the Erdős Number Project. The additional data are listed below, keyed by the edge labels shown in the subgraph.

Many of the connections in my Erdős lineage result from my involvement in the early stages of the Human Genome Project. As discussed in "Some effects of the Human Genome Project on the Erdős collaboration graph", this "big science" collaboration gave researchers across biology lower Erdős numbers.

Cycles in collaboration graphs

An interesting feature of the graph of my Erdős lineage is that it is composed entirely of cycles: every vertex lies between at least two other vertices. Cycles indicate interactions between members of different academic lineages. The subgraph below shows another cycle that combines molecular biology collaborations with nuclear physics collaborations; the unlabeled links are again from the Erdős Number Project. In "How small is the center of science? Short cross-disciplinary cycles in co-authorship graphs", I show that short cycles like this one cross many boundaries between scientific disciplines. This leads to a question: how "real" are the disciplinary boundaries that are often taken so seriously?
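Whether a graph is "composed entirely of cycles" can be checked mechanically. The Python sketch below is one simple way to do it, not the method used in the paper cited above: a vertex lies on a cycle if, after one of its edges is set aside, the neighbour at the other end of that edge can still be reached. The graph and the names reachable and on_cycle are hypothetical.

```python
from collections import deque


def reachable(graph, start, goal, skip_edge):
    """True if goal can be reached from start without using skip_edge."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return True
        for v in graph[u]:
            if {u, v} == skip_edge or v in seen:
                continue
            seen.add(v)
            queue.append(v)
    return False


def on_cycle(graph, v):
    """True if v lies on at least one cycle of the undirected graph."""
    return any(reachable(graph, v, u, {v, u}) for u in graph[v])


# Hypothetical toy subgraph: two distinct paths from "Me" to "Erdos"
# form a 4-cycle, and "Z" hangs off the cycle by a single edge.
graph = {
    "Me": {"P", "Q"},
    "P": {"Me", "Erdos", "Z"},
    "Q": {"Me", "Erdos"},
    "Erdos": {"P", "Q"},
    "Z": {"P"},
}
all_on_cycles = all(on_cycle(graph, v) for v in graph)
```

Here "Me", "P", "Q" and "Erdos" all lie on the 4-cycle, while "Z" does not, so all_on_cycles is False for this toy graph. Having at least two neighbours is necessary for a vertex to lie on a cycle but not sufficient, which is why the check follows paths rather than counting edges.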

The boundaries between disciplines

Nobel prize winners and other prominent scientists are naturally regarded as "central" to their disciplines. In "Close to the edge: Co-authorship proximity of Nobel laureates in Physiology or Medicine, 1991 - 2010, to cross-disciplinary brokers", I show that Nobel laureates in biomedicine are also close to the boundaries of biomedicine: on average, less than three co-authorship steps away. The paper "Co-authorship proximity of A. M. Turing Award and John von Neumann Medal winners to the disciplinary boundaries of computer science" shows that recipients of either award are even closer to the edges of their discipline. The graph below, from "Nobel numbers: Time-dependent centrality measures on co-authorship graphs", shows a small section of the boundary between the biomedical sciences and physics containing co-authorship paths that traverse either me or my colleague Eric Lander. Nobel laureates in either Physiology or Medicine (lower part of graph) or Physics (upper part of graph) are indicated by two-digit award dates. Being close to the boundaries of their respective disciplines makes these Nobel laureates close to each other. Do other boundaries between disciplines also look like this? If so, what does this mean for the "shape" of disciplines in co-authorship space?
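The "co-authorship steps" measure mentioned above can be illustrated with a multi-source breadth-first search. The Python sketch below assumes that the set of cross-disciplinary brokers is already known; how brokers are actually identified is defined in the papers cited and is not reproduced here. The graph, the vertex labels and the function name distance_to_nearest are hypothetical.

```python
from collections import deque


def distance_to_nearest(graph, sources):
    """Multi-source BFS: steps from each vertex to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical co-authorship chain:
# L1 - a - b - broker1 - L2 - c - broker2
graph = {
    "L1": {"a"},
    "a": {"L1", "b"},
    "b": {"a", "broker1"},
    "broker1": {"b", "L2"},
    "L2": {"broker1", "c"},
    "c": {"L2", "broker2"},
    "broker2": {"c"},
}
brokers = ["broker1", "broker2"]   # assumed to be given
laureates = ["L1", "L2"]           # assumed to be given

dist = distance_to_nearest(graph, brokers)
mean_steps = sum(dist[name] for name in laureates) / len(laureates)
```

In this toy graph laureate "L1" is three steps from the nearest broker and "L2" is one step away, so mean_steps is 2.0. Starting the search from all brokers at once gives every vertex its distance to the nearest broker in a single pass over the graph.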

Supporting references

Any of my collaborators can use the subgraphs above and the references below to establish that his or her Erdős Number is at most 4, as well as to discover perhaps surprisingly short paths to many other scientists.

A: Stormo, G. D., T. D. Schneider, L. Gold and A. Ehrenfeucht (1982). Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Research 10: 2997-3011.

B: Smith, T. F., M. S. Waterman and C. Burks (1985). The statistical distribution of nucleic acid similarities. Nucleic Acids Research 13: 645-656.

C: Branscomb, E., T. Slezak, R. Paestar, D. Galas, A. V. Carrano and M. S. Waterman (1990). Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries. Genomics 8: 351-366.

D: Gardner, M. J., H. Tettelin, D. J. Carucci, L. M. Cummings, L. Aravind, E. V. Koonin, S. Shallom, T. Mason, K. Yu, C. Fujii, J. Pederson, K. Shen, J. Jing, C. Aston, Z. Lai, D. C. Schwartz, M. Pertea, S. Salzberg, L. Zhou, G. G. Sutton, R. Clayton, O. White, H. O. Smith, C. M. Fraser, M. D. Adams, J. C. Venter and S. L. Hoffman (1998). Chromosome 2 Sequence of the Human Malaria Parasite Plasmodium falciparum. Science 282: 1126-1132.

E: Fields, C. A., R. J. Peterson, R. S. Raymond, J. L. Ullman, R. J. de Meijer, E. H. L. Aarts, and M. B. Greenfield (1983). Deuteron projectile breakup on 28Si at Ed = 17.85 MeV. In: H. Ogata, T. Kammamuri, and I. Katayama (Eds.) Light Ion Reaction Mechanism. University of Osaka. pp. 621-625.

F: Waterman, M., E. Uberbacher, S. Spengler, F. R. Smith, T. Slezak, R. Robbins, T. Marr, D. Kingsbury, P. Gilna, C. Fields, K. Fasman, D. Davison, M. Cinkosky, P. Cartwright, E. Branscomb and H. Berman (1994). Genome informatics I: Community databases. Journal of Computational Biology 1: 173-190.

G: Smith, C. L., S. K. Lawrance, G. A. Gillespie, C. R. Cantor, S. M. Weissman and F. S. Collins (1987). Strategies for mapping and cloning macroregions of mammalian genomes. Methods in Enzymology 151: 461-489.

H: International Human Genome Sequencing Consortium (including E. S. Lander and W. R. McCombie) (2001). Initial sequencing and analysis of the human genome. Nature 409: 860-921.

I: Mikkelsen, T. S., M. J. Wakefield, B. Aken et al. (including M. A. Marra and T. P. Speed) (2007). Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447: 167-177.

J: Mount, S. M., C. Burks, G. Hertz, G. D. Stormo, O. White and C. Fields (1992). Splicing signals in Drosophila: intron size, information content, and consensus sequences. Nucleic Acids Research 20: 4255-4262.

K: Martin-Gallardo, A., W. R. McCombie, J. Gocayne, M. FitzGerald, S. Wallace, B. M. Lee, J. Lamerdin, S. Trapp, J. Kelley, L.-I. Liu, M. Dubnick, L. Dow, A. R. Kerlavage, P. De Jong, A. Carrano, C. Fields and J. C. Venter (1992). Automated DNA sequencing and analysis of 106 kilobases from human chromosome 19q13.3. Nature Genetics 1: 34-39.

L: Adams, M., M. Dubnick, A. Kerlavage, R. Moreno, J. Kelley, T. Utterback, J. Nagle, C. Fields and J. C. Venter (1992). Sequence identification of 2375 human brain genes. Nature 355: 632-634.

M: McCombie, W. R., A. Martin-Gallardo, J. Gocayne, M. FitzGerald, M. Dubnick, J. Kelley, L. Castilla, L.-I. Liu, S. Wallace, S. Trapp, D. Tagle, L. Whaley, S. Cheng, J. Gusella, A.-M. Frischauf, A. Poustka, H. Lehrach, F. S. Collins, A. R. Kerlavage, C. Fields, and J. C. Venter (1992). Expressed genes, interspersed repeats, and polymorphisms in cosmids sequenced from 4p16.3. Nature Genetics 1: 348-353.

N: McCombie, W. R., M. D. Adams, J. M. Kelley, M. G. FitzGerald, T. R. Utterback, M. Kahn, M. Dubnick, A. R. Kerlavage, J. C. Venter and C. Fields (1992). Caenorhabditis elegans expressed sequence tags identify gene families and disease gene homologues. Nature Genetics 1: 124-131.

O: Schein, J. E., M. A. Marra, G. M. Benian, C. Fields and D. L. Baillie (1993). The use of deficiencies to determine essential gene content in the let-56 - unc-22 region of Caenorhabditis elegans. Genome 36: 1148-1156.

P: Feynman, R. P., N. Metropolis and E. Teller (1949). Equations of state of elements based on the generalized Fermi-Thomas theory. Physical Review 75: 1561-1573.

Q: Edgar, R. S., R. P. Feynman, S. Klein, I. Lielausis and C. M. Steinberg (1962). Mapping experiments with r mutants of bacteriophage T4D. Genetics 47 (2): 179-186.

R: Cox, G. N., M. Kusch and R. S. Edgar (1981). Cuticle of Caenorhabditis elegans: Its isolation and partial characterization. Journal of Cell Biology 90: 7-17.

S: Cox, G. N., C. Fields, J. M. Kramer, B. Rosenzweig and D. Hirsh (1989). Sequence comparisons of developmentally regulated collagen genes of Caenorhabditis elegans. Gene 76: 331-344.

T: Fields, C. A., J. J. Kraushaar, R. A. Ristinen and L. E. Samuelson (1978). High-spin states above 3.5 MeV in 91Nb. Nuclear Physics 326: 55-64.

U: Kraushaar, J. J. and M. Goldhaber (1953). Direction and polarization correlations of successive gamma-rays. Physical Review 89: 1081-1089.

V: Goldhaber, M. and E. Teller (1948). On nuclear dipole vibrations. Physical Review 74: 1046-1049.

Copyright © 2011-2015 Chris Fields