Seshat Global History Dataset Analysis

Code

Official repository: https://github.com/matildaperuzzo/SeshatDatasetAnalysis

Digital Scholarship at Oxford (DiSc) repository: https://github.com/Digital-Scholarship-Oxford/SeshatDatasetAnalysis

Description

SeshatDatasetAnalysis is a Python package that lets users download the Seshat Global History Databank and analyse it as a time series. Seshat is a large database that quantifies many aspects of past civilizations from 10,000 BCE to the present. Among other things, it codes variables related to the size, government specialization, information technologies, military technologies and religious affiliations of over 400 polities (political entities such as city-states, kingdoms or empires).

Seshat data is recorded in intervals. For example, a record could be:

Roads are present in the East Roman Empire during years 395CE – 631CE

However, analyses often need the value of a variable in a specific year. SeshatDatasetAnalysis lets users sample variables at specific years, create time series with 100-year spacing, and convert Seshat entries (present/absent/unknown) into binary variables (1/0/NaN).
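The idea of sampling interval records can be illustrated with a minimal sketch in plain Python. It does not use the package's own API: the record layout, the function names `sample_at_year` and `time_series`, and the two "bridges" records are assumptions invented for illustration. Only the "roads" record comes from the example above.

```python
import math

# Hypothetical interval records: (polity, variable, start_year, end_year, value).
# Years are signed integers (negative for BCE, positive for CE).
# Only the "roads" record mirrors the example in the text; the rest is invented.
RECORDS = [
    ("East Roman Empire", "roads", 395, 631, "present"),
    ("East Roman Empire", "bridges", 395, 500, "unknown"),
    ("East Roman Empire", "bridges", 501, 631, "absent"),
]

# Map present/absent/unknown codes to binary values; "unknown" becomes NaN.
CODE_TO_BINARY = {"present": 1.0, "absent": 0.0, "unknown": math.nan}


def sample_at_year(records, polity, variable, year):
    """Return the binary value of `variable` for `polity` in `year`.

    Returns NaN when no record covers the year or the value is unknown.
    """
    for rec_polity, rec_variable, start, end, value in records:
        if rec_polity == polity and rec_variable == variable and start <= year <= end:
            return CODE_TO_BINARY.get(value, math.nan)
    return math.nan


def time_series(records, polity, variable, first_year, last_year, step=100):
    """Sample a variable every `step` years from first_year to last_year (inclusive)."""
    years = range(first_year, last_year + 1, step)
    return [(year, sample_at_year(records, polity, variable, year)) for year in years]


series = time_series(RECORDS, "East Roman Empire", "roads", 300, 700)
for year, value in series:
    print(year, value)
```

In this sketch, years outside every recorded interval are treated the same way as "unknown" entries: both yield NaN, so later steps have to handle missing values explicitly.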

In addition, the codebase makes it possible to reproduce published results, such as the aggregation of variables into complexity characteristics and the Principal Component Analysis (PCA) of [1], and it includes imputation techniques used in several publications [1,2,3].
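For readers unfamiliar with PCA, the following is a minimal sketch of the technique using NumPy's singular value decomposition. It is not the package's implementation or the exact procedure of [1]: the toy matrix and the function name `pca` are invented for illustration, and standardizing the columns before the decomposition is an assumption.

```python
import numpy as np

# Invented toy matrix: rows are polity-year observations, columns are
# hypothetical complexity characteristics on arbitrary scales.
X = np.array([
    [3.0, 1.0, 0.2],
    [4.0, 2.0, 0.4],
    [5.0, 3.0, 0.5],
    [6.0, 4.0, 0.9],
    [7.0, 5.0, 1.0],
])


def pca(data):
    """Return (components, explained_variance_ratio, scores) computed via SVD."""
    # Standardize each column so that no variable dominates because of its scale.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    variance = s**2 / (len(z) - 1)
    ratio = variance / variance.sum()
    # Project the standardized observations onto the principal axes.
    scores = z @ vt.T
    return vt, ratio, scores


components, ratio, scores = pca(X)
print("explained variance ratio:", np.round(ratio, 3))
```

Because the toy columns are strongly correlated, the first component captures almost all of the variance. This sketch requires a complete matrix, which is why missing values have to be imputed before a PCA of this kind can be run.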

[1] Turchin, Peter, et al. 2018. “Quantitative Historical Analysis Uncovers a Single Dimension of Complexity That Structures Global Variation in Human Social Organization.” Proceedings of the National Academy of Sciences 115 (2): E144–51. https://doi.org/10.1073/pnas.1708800115.

[2] Turchin, Peter, et al. 2022. “Disentangling the Evolutionary Drivers of Social Complexity: A Comprehensive Test of Hypotheses.” Science Advances 8 (25): eabn3517. https://doi.org/10.1126/sciadv.abn3517.

[3] Turchin, Peter, et al. 2022. “Explaining the Rise of Moralizing Religions: A Test of Competing Hypotheses Using the Seshat Databank.” Religion, Brain & Behavior, 1–28. https://doi.org/10.1080/2153599X.2022.2065345. Preprint: https://doi.org/10.31235/osf.io/2v59j.

License

MIT License

Suggested ideas for collaboration

If you are a researcher:

Explore how different regions develop societal complexity in different ways. Social complexity variables track the adoption of many technologies such as roads, bridges, written text, etc. Are there any regional patterns in the adoption of these technologies?
The module creates the following complexity characteristics: population, territory size, size of largest settlement, hierarchy, infrastructure, information, government and money. Can you think of other useful subcategories? Would you split any of these aggregate variables into different sub-variables?

If you are a developer:

The underlying data has many missing points, which are handled with a linear imputation method. Can you design a more accurate imputation method?
Seshat data is very high-dimensional. Can you design an intuitive visualization tool to help researchers interact with it?
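As a starting point for the imputation idea above, here is a minimal baseline sketch. It assumes that "linear imputation" means linear interpolation in time between the nearest known values, which may differ from what the package actually does. The function name `interpolate_linear` and the sample values are invented for illustration.

```python
import math


def interpolate_linear(years, values):
    """Fill NaN gaps by linear interpolation between the nearest known neighbours.

    Leading and trailing NaNs are left untouched, because there is no pair of
    known values to interpolate between.
    """
    filled = list(values)
    # Indices of the observations that are not missing.
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        span = years[right] - years[left]
        for i in range(left + 1, right):
            weight = (years[i] - years[left]) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Invented example: a variable sampled every 100 years with three gaps.
years = [300, 400, 500, 600, 700]
log_population = [math.nan, 5.0, math.nan, math.nan, 6.5]
imputed = interpolate_linear(years, log_population)
print(imputed)
```

Here the gaps at 500 and 600 are filled with values of about 5.5 and 6.0, while the leading gap at 300 stays missing. A more accurate method could, for example, use information from other variables or from similar polities instead of time alone.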