Short note on the provision of "raw" data in replication packages. "Raw" means data files exactly as downloaded from the original source: not renamed, not converted, not made prettier. All of that is important, but it is the next step, for which you provide code (or instructions). "Raw" is as raw as it gets.
Feb 17, 2025 22:31
Mostly, that means "raw" is NOT "dta", since only some data providers actually provide Stata-formatted files. It typically means "CSV", maybe "txt" or "dat", rarely "hdf" or "parquet". It can mean many hundreds of files that get collated into a single analysis DTA or Rds file.
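To illustrate the "many raw files, one analysis file" step, here is a minimal Python sketch using only the standard library. The function name `collate_raw_csvs`, the folder names, and the added `source_file` column are all assumptions made for illustration, and the output is a CSV rather than DTA or Rds. The raw files are only ever read, never modified.

```python
import csv
from pathlib import Path


def collate_raw_csvs(raw_dir, out_path):
    """Combine every CSV in raw_dir into one file; the raw files are only read."""
    raw_files = sorted(Path(raw_dir).glob("*.csv"))
    rows = []
    fieldnames = ["source_file"]  # hypothetical provenance column
    for path in raw_files:
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            # Collect the union of column names across all raw files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row["source_file"] = path.name  # record where each row came from
                rows.append(row)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        # Columns missing from a given raw file are written as empty strings.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call might look like `collate_raw_csvs("data/raw", "data/analysis/combined.csv")`, with both paths hypothetical. Keeping the output in a separate folder makes it obvious which files are raw and which are derived.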
It is really convenient & sometimes important to have cleaned files. But especially given how unstable data availability on the web can be, including right now for US government data, it is also important to have the raw files in exactly the original format, for credibility and transparency.
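One possible way to document that raw files are unchanged since download is to ship a checksum manifest alongside them. This is a suggestion, not something the note above prescribes; the function names `sha256_of` and `build_manifest` and the folder layout are assumptions for this sketch.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(raw_dir):
    """Map each file under raw_dir (by relative path) to its SHA-256 digest."""
    raw_dir = Path(raw_dir)
    return {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }
```

Running `build_manifest("data/raw")` (a hypothetical path) right after download and again before publishing the package lets anyone check that the two manifests match.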