Cloud Resources

DIME Analytics supports various cloud computing and storage resources. For access to any of these cloud storage and computing solutions, contact dimeanalytics@worldbank.org. When contacting us, please answer the questions at the bottom of this page as best you can.

RStudio Server

RStudio Server is a powerful installation of R that you access through your browser. It is connected to our on-premises Hadoop cluster, so you can process large amounts of data quickly in a familiar environment. DIME Analytics can set you up with an access account and a training session.

Microsoft Azure DevOps

Microsoft Azure DevOps is a free code development suite that includes, among other things, the ability to create and manage an unlimited number of public and private Git repositories. Your World Bank single sign-on (SSO) login is already registered with an account. Learn more about Git.

Cloud Storage and Computing

DIME Analytics supports free access to cloud storage and computing resources through both Microsoft Azure and Amazon Web Services. These services provide large, secure data storage locations in the cloud, as well as scalable computing resources for working with large datasets. Contact us to find out what is right for you.

Hadoop Cluster

The Hadoop cluster is a powerful on-premises computing environment suited to a wide range of data processing tasks. Let us know if you think you may need such a solution and we can provide access.

Checklist when contacting DIME Analytics

If you answer these questions as best you can when you contact DIME Analytics at dimeanalytics@worldbank.org, we will be able to help you more quickly and effectively. If you do not know an answer, just say so.

Uploading and storing data in the cloud:

If you will upload and/or store data in the cloud, then please answer these questions:

  • Will you store the data as files or in a database? If in a database, do you have a preference for a specific database or type of database?
  • Roughly, how much data will you store? A few GBs? Hundreds of GBs? Several TBs?
  • Where does the data come from?
    • Will you upload data from your computer?
    • Will the cloud resource access the data from an API?
    • Will a script in the cloud generate the data through, for example, web scraping?

Processing data in the cloud:

If you need to run scripts in the cloud, for example to analyze data, generate data (through web scraping, etc.), or prepare data (by generating a subset or aggregated version of the data for easy download), please answer these questions:

  • Will you use proprietary software such as Stata, or an open-source language such as Python or R?
  • How often does the script need to run? What determines when it runs: preset time intervals or a user trigger?
  • How much processing power will you need in terms of processor and RAM? If your laptop is powerful enough to run the script, do not worry too much about this question.

Accessing data in the cloud:

If you have data in the cloud that is not just a backup, please answer these questions:

  • How will this data be accessed? And how frequently?
  • Does everyone on the research team need access to all the data? For example, if you have a very large data file or database, could the team work from an aggregated version of the data, or does the full research team need access to every level of detail?
  • Does anyone who needs access to the data lack access to a World Bank computer? Is any of them not World Bank Staff/ETC/STC?