Accessing CRSP data on Linux
The CRSP data structure is rich and complex. The information about accessing it is scattered across different online sources and manuals. Below I briefly review the steps necessary to get a functioning installation on your Linux machine and to start using the CRSP database.
Installation
Using your username and password, log in to the Move It Cloud service. From the folder Product Download->Stock_1925_ANNSUB get the file faz201812_cadb.zip. This is a compressed file. Just extract it in a suitable place. Then from the folder Utility Download->CUPL download the file setupLinux[64].bin. Select the 64-bit or 32-bit version as it suits you. Execute this file to install the CUPS utility and libraries on the local system. Provide a suitable path to the installation program when requested.
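The steps above can be sketched in the shell as follows. This is only a sketch under assumptions: the function names are hypothetical, the two installer names are my reading of setupLinux[64].bin, and treating every non-x86_64 machine as 32-bit is a simplification. The block only defines helpers, so nothing is unpacked or installed until you call them.

```shell
# Map a machine architecture (as printed by `uname -m`) to an installer name.
# Assumption: setupLinux[64].bin stands for setupLinux64.bin and setupLinux.bin.
crsp_installer_name() {
    case "$1" in
        x86_64|amd64) echo "setupLinux64.bin" ;;
        *)            echo "setupLinux.bin" ;;
    esac
}

# Hypothetical helper: unpack the data archive ($1) into a directory ($2).
crsp_unpack_data() {
    mkdir -p "$2" && unzip -q "$1" -d "$2"
}

# Usage sketch (the target directory is an arbitrary choice):
#   crsp_unpack_data faz201812_cadb.zip "$HOME/crsp/data"
#   installer="$(crsp_installer_name "$(uname -m)")"
#   chmod +x "$installer" && "./$installer"
```

The installer itself is interactive and asks for the installation path, so the last step is best run by hand.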
Configuration
This is a rather obscure piece of information, but it is essential. Move to the root folder in which the CUPS programs were installed. There should be a folder named "accbin[64]", with or without the 64 part, depending on the installed version of the tools. Inside this folder, find the utility crsp_setup.[csh,sh] and run it. Use the .sh version if you are using the Bash shell, or the .csh version if you are using cshell or kshell. The command
> echo $SHELL
will tell you which shell is in use. You will be asked to provide the location (path) of your CUPS programs and of your databases. The output of crsp_setup.[csh,sh] is a text file containing a set of environment variable definitions. These are the variables the CUPS utilities rely upon. The default name is something like mycrsp.kshrc, but you can change it. Now you have two choices. One possibility is to add the lines of this file to your login initialization files. You can use any editor to do it. Check the manual of your shell to find the proper names of these files. Another possibility is to source this file before using the command line tools,
> source mycrsp.kshrc
In this way the necessary environment variable definitions are added to your working session. For more information on the configuration step, check the release notes.
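Before running any tool it can help to confirm that the definitions were actually loaded. The sketch below checks only the two variables used later on this page, CRSP_BIN and CRSP_DSTK. The generated file may define others, and the function name crsp_env_ok is hypothetical.

```shell
# Return 0 if CRSP_BIN and CRSP_DSTK are both set and point to directories.
# Problems are reported on standard error.
crsp_env_ok() {
    rc=0
    for name in CRSP_BIN CRSP_DSTK; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "not a directory: $name=$value" >&2
            rc=1
        fi
    done
    return $rc
}

# Usage sketch:
#   . ./mycrsp.kshrc && crsp_env_ok && echo "CRSP environment looks fine"
```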
Example of use
The utility you are most likely to use is ts_print, which "prints time series". Two steps are necessary to use this program. First, you have to prepare a "request" text file that defines what you want to get. You can use the editor of your choice for this. Second, you have to pass the request file to the program. Assuming the name of the prepared file is query.txt, from anywhere on your system just run (this works ONLY if your environment variables were properly defined, see Configuration above)
> $CRSP_BIN/ts_print query.txt
The output produced by ts_print is a convenient and well-structured ASCII file. In general, query.txt contains all the information ts_print needs. If no output file is specified in query.txt, then the data are printed to standard output. This is very useful when you want to do post-processing using shell pipes.
An example of a request file is the following:
ENTITY
#download data for Microsoft
LIST|TICKER MSFT |ENTFORMAT 3
END
ITEM
#adjusted opening price
ITEMID AdjOpenPrc
#adjusted highest traded price during the day
ITEMID Adjaskhi
#adjusted lowest traded price during the day
ITEMID Adjbidlo
#adjusted official closing price
ITEMID Adjprc
#adjusted total volume
ITEMID Tvol
#adjusted dividend
ITEMID Adjdiv
#number of outstanding shares
ITEMID Shr
END
DATE
CALNAME daily|RANGE 19860313-20181231
CALFORMAT 1
END
OPTIONS
X ITEM,YES|Y DATE,YES|Z ENTITY,YES,1
END
The lines beginning with # are comments. Using the previous file you can download several daily quantities (prices, volume and dividends) for Microsoft stock. I use the command
> $CRSP_BIN/ts_print query_MSFT.txt | gzip > MSFT.txt.gz
to compress the output of ts_print on the fly and save the compressed data in MSFT.txt.gz. There are many things you can do by tweaking the request file. I'm not going to review them here. A good starting point might be the on-line resource on request files.
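One simple tweak is to reuse a working request file for other tickers. The sketch below assumes the request contains an entry of the form LIST|TICKER MSFT, as in the example above, and that ticker symbols are plain alphanumeric strings. The function name request_for_ticker and the tickers in the usage note are only illustrative.

```shell
# Print a copy of a request file with the symbol in its LIST|TICKER entry replaced.
# $1: existing request file, $2: new ticker symbol (assumed alphanumeric).
request_for_ticker() {
    # In a basic regular expression "|" is a literal character.
    sed "s/\(LIST|TICKER \)[^ |]*/\1$2/" "$1"
}

# Usage sketch: one compressed output file per ticker.
#   for t in AAPL IBM; do
#       request_for_ticker query_MSFT.txt "$t" > "query_$t.txt"
#       "$CRSP_BIN/ts_print" "query_$t.txt" | gzip > "$t.txt.gz"
#   done
```

Only the ticker is changed. Comments and the date range are copied verbatim from the original request, so they may need adjusting by hand.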
Another useful utility program is dstksearch.sh. It can be run with
> $CRSP_BIN/dstksearch.sh
However, all this program does is look for security information in the file $CRSP_DSTK/headfile.dat. This is a simple ASCII file that you can parse yourself, for instance with the grep command, to find information about one company or ticker. For example,
> grep "GENERAL ELECTRIC" "$CRSP_DSTK/headfile.dat"
will produce the output
12060 20792          GENERAL ELECTRIC CO                  1 19251231-19620701
12060 20792          GENERAL ELECTRIC CO             GE   1 19620702-19680101
12060 20792 36960410 GENERAL ELECTRIC CO             GE   1 19680102-20181231
32168 23832          GENERAL ELECTRIC CO LTD         GLE  2 19620702-19640908
32168 23832          GENERAL ELECTRIC CO LTD         GLE -2 19640909-19641123
32168 23832          GENERAL ELECTRIC CO LTD         GLE  2 19641124-19680101
32168 23832 36964060 GENERAL ELECTRIC CO LTD         GLE  2 19680102-19681128
32168 23832 36959520 GENERAL ELECTRIC & ENGH ELEC CO GLE  2 19681129-19691204
45858 21430 73650810 PORTLAND GENERAL ELECTRIC CO    PGN  1 19680306-19860211
91204 21430 73650884 PORTLAND GENERAL ELECTRIC CO    POR  1 20060410-20181231
The columns contain, in order:
- PERMNO, the CRSP unique issue identification code;
- PERMCO, the CRSP unique company identification code;
- CUSIP, the identifier assigned to the security under the CUSIP system (missing in some records);
- The name of the company;
- The ticker symbol associated with the company stock (missing in some records);
- The exchange where the stock is traded: 1=NYSE, 2=AMEX, 3=NASDAQ, 4=ARCA;
- The range of dates (YYYYMMDD-YYYYMMDD) for which data are available.
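Because the CUSIP and ticker fields can be missing and company names contain spaces, counting whitespace-separated fields from the left is unreliable. A minimal sketch that avoids this is to take the PERMNO from the first field and the exchange code and date range from the last two. It assumes the records look like the grep output above. The function name headfile_summary is hypothetical.

```shell
# Read headfile-style records on standard input and, for those containing
# the fixed string $1, print: PERMNO, exchange code, date range.
headfile_summary() {
    awk -v pat="$1" 'index($0, pat) { print $1, $(NF-1), $NF }'
}

# Usage sketch:
#   headfile_summary "GENERAL ELECTRIC" < "$CRSP_DSTK/headfile.dat"
```

The search text is matched literally with awk's index function, so characters such as & in a company name need no escaping.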