DH Project Foundations

Best Practices for Getting Started


Joey Takeda, Digital Humanities Innovation Lab, Simon Fraser University

University of British Columbia, January 07, 2025


Unceded territory of the səl̓ilw̓ətaʔɬ (Tsleil-Waututh), kʷikʷəƛ̓əm (Kwikwetlem), Sḵwx̱wú7mesh Úxwumixw (Squamish), and xʷməθkʷəy̓əm (Musqueam) Nations

Agenda

  1. Establishing best practices for file names
  2. Creating a plan for structure
  3. Getting GitHub accounts set up (CoLab)

DH Projects as the "Basic Unit of DH"

Projects are both nouns and verbs: a project is a kind of scholarship that requires design, management, negotiation, and collaboration. It is also scholarship that projects, in the sense of futurity, as something which is not yet.
[P]rojects are projective, involving iterative processes and many dimensions of coordination, experimentation, and production.
Anne Burdick, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp. Digital_Humanities. The MIT Press, 2012. https://doi.org/10.7551/mitpress/9248.001.0001.

File management and structure

  • Managing and wrangling data, files, etc. is a crucial dimension of this work
  • Planning for the information you will collect

File Naming

Best Practices for Files

  • File names should be unique
  • File names should be consistent
  • File names should be descriptive, but not deterministic

Best Practices for Files, continued

  • File names should consist of numbers, letters, underscores, and/or hyphens
  • No spaces, ampersands, colons, periods, et cetera
  • Separators should be consistent (e.g. always use _ or -)
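
The character and uniqueness rules above can be checked automatically. The following is a minimal sketch, not a definitive implementation. It assumes the character rule applies to the part of the name before the extension, and that two names differing only in letter case should count as a clash (some file systems ignore case). The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumption: only the stem (the name without its extension) is checked.
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(filename: str) -> bool:
    """True if the stem uses only letters, numbers, underscores, and hyphens."""
    stem = PurePosixPath(filename).stem
    return ALLOWED_STEM.fullmatch(stem) is not None


def find_case_clashes(filenames: list[str]) -> list[str]:
    """Return lowercased names that occur more than once when case is ignored."""
    counts = Counter(name.lower() for name in filenames)
    return sorted(name for name, count in counts.items() if count > 1)


names = ["my_file_name.xml", "my file.xml", "Smith&Jones.xml", "My_File_Name.xml"]
for name in names:
    print(name, is_valid_name(name))
print(find_case_clashes(names))
```

A check like this could run over a whole project folder before files are shared with collaborators.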

Different Cases

  • camelCase: myFileName.xml
  • UpperCamelCase: MyFileName.xml
  • snake_case: my_file_name.xml
  • kebab-case: my-file-name.xml
  • Train-Case: My-File-Name.xml
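
To see how the five conventions relate, the sketch below builds each one from the same list of words. The function name `to_cases` is hypothetical, and the sketch assumes simple ASCII words with no acronyms or digits to worry about.

```python
def to_cases(words: list[str]) -> dict[str, str]:
    """Build each naming convention from a list of words."""
    lower = [word.lower() for word in words]
    caps = [word.capitalize() for word in lower]
    return {
        "camelCase": lower[0] + "".join(caps[1:]),
        "UpperCamelCase": "".join(caps),
        "snake_case": "_".join(lower),
        "kebab-case": "-".join(lower),
        "Train-Case": "-".join(caps),
    }


for convention, name in to_cases(["my", "file", "name"]).items():
    print(f"{convention}: {name}.xml")
```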

Other Tips

  • Numbers should be left-padded (e.g. not File_1 but File_001 if you think there may be more than 99)
  • Prefer human-readable names (e.g. 1925-02-01_WinnipegTribune, not 19250201WT)
  • Think about sortable order
  • Strive for immutability; do not embed changeable states (more on this later)
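
The padding and sorting tips are connected: most tools sort file names as text, so unpadded numbers fall out of order. The sketch below illustrates this with a hypothetical helper, `padded_name`, and a default width of three digits chosen only for illustration. It also shows that a date written year-month-day sorts chronologically as plain text.

```python
from datetime import date


def padded_name(prefix: str, number: int, width: int = 3) -> str:
    """Left-pad the number with zeros so names sort in numeric order."""
    return f"{prefix}_{number:0{width}d}"


unpadded = ["File_1", "File_2", "File_10"]
padded = [padded_name("File", n) for n in (1, 2, 10)]

print(sorted(unpadded))  # File_10 lands before File_2
print(sorted(padded))    # numeric order is preserved

# Year-month-day dates also sort correctly as text.
issue = f"{date(1925, 2, 1).isoformat()}_WinnipegTribune"
print(issue)
```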

Other Tips (for TEI Projects)

  • File names should be the same as ids
  • File names and ids should begin with a letter
  • For entities, ids are usually 4-5 letters and a number, based on some unique property (as a shorthand)
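
These conventions can also be checked in code. The sketch below is one possible approach and rests on assumptions: that "ids" means the xml:id attribute on the root element of each file, and that an entity id is exactly 4-5 letters followed by one or more digits. The function names and the sample id are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: 4-5 letters followed by a number, e.g. a hypothetical "SMIT1".
ENTITY_ID = re.compile(r"[A-Za-z]{4,5}[0-9]+")


def id_matches_filename(filename: str, xml_text: str) -> bool:
    """True if the root element's xml:id equals the file name without extension."""
    root = ET.fromstring(xml_text)
    return root.get(XML_ID) == PurePosixPath(filename).stem


def starts_with_letter(name: str) -> bool:
    """True if the name begins with an ASCII letter."""
    return re.match(r"[A-Za-z]", name) is not None


def looks_like_entity_id(value: str) -> bool:
    """True if the value fits the assumed entity id pattern."""
    return ENTITY_ID.fullmatch(value) is not None


sample = '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Text1"/>'
print(id_matches_filename("texts/Text1.xml", sample))
print(starts_with_letter("Text1"))
print(looks_like_entity_id("SMIT1"))
```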

Structuring Folders

  • All about the tree
  • But folders are just conveniences and should not be a primary way of categorizing data
The best actors in the world, either for tragedy, comedy, history, pastoral, pastoral-comical, historical-pastoral, tragical-historical, tragical-comical-historical-pastoral, scene individable, or poem unlimited.
Hamlet, TLN 1445-1450

Folders, continued

  • Folders are not necessarily the place for interpretation
  • Where possible, folders should reflect some kind of material reality or convenient structure

Files and Folders

  • Redundancy is fine (good even!)

Some Examples

Common Structure for TEI Projects

                
    ├── facsimiles
    │   ├── Text1.tiff
    │   └── Text2.tiff
    ├── info
    │   ├── about.xml
    │   ├── acknowledgements.xml
    │   ├── index.xml
    │   ├── legal.xml
    │   └── menu.xml
    ├── sch
    │   ├── [PROJECT_NAME].odd
    │   ├── [PROJECT_NAME].rng
    │   └── [PROJECT_NAME].sch
    ├── texts
    │   ├── Text1.xml
    │   └── Text2.xml
    ├── bibliography.xml
    ├── organizations.xml
    ├── people.xml
    ├── places.xml
    └── [PROJECT_NAME].xpr
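
A structure like this can be set up by hand, but a short script keeps it consistent across projects. The following is a minimal sketch that creates only the four top-level folders from the tree. The function name `scaffold` and the project folder name `my_project` are hypothetical, and the example writes to a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

# The four folders shown in the tree above.
FOLDERS = ["facsimiles", "info", "sch", "texts"]


def scaffold(root: Path) -> list[Path]:
    """Create the project folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for folder in scaffold(Path(tmp) / "my_project"):
        print(folder.name)
```

Because `exist_ok=True` is set, running the script again on an existing project does not raise an error or overwrite anything.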
                    
            

Activity

https://joeytakeda.github.io/disa-workshops/

GitHub / Version Control

Version Control

  • Data should be versioned using version control software
  • Version control means every version of every file is retained
  • So no need for Draft-NEW, Draft-NEW-Revised, My-File-REVISED-Final-JANUARY, etc.

GitHub

  • Signing up for a GitHub account: https://github.com