Project version control

Browsing a project's version control repository

Before beginning work in a project, you should take time to learn how the project files are organized. You will want to become familiar with the directory structures as well as any CVS "modules" that have been defined by project members.

  1. From your Project home page, click the CVS or Subversion link in the left navigation pane.
  2. Navigate to the Browse source code section of the Version control page to view the project's source repository through your web browser.

Each project's web content files are located in the myproject/www module by default. Each project has two predefined modules: www and look. Other project source modules are created and organized by the project owner or by project members with version control write permissions. If browsing deleted and removed project files is permitted, an "Attic/" module is displayed.

See Browsing version control for more information.

Note: Generally, the URL to a project carries the project name and the domain name, for example: http://[projectname].[domainname]. In CVS, you can check out a directory using the following commands:

In this case, the project name appears twice in the first command and once in the second. However, both commands perform the same function: checkout. In both cases, CVS takes the name of the project from the latter part of the command and ignores its presence in the first part, before the domain name.

Consider these two URLs:

The first URL uses the same project name in both sections of the URL. The second is contradictory, because a different project name appears in each portion of the URL. However, the version control system disregards the project name when it occurs first in the URL and takes you to PROJECT-A in both cases. This anomalous behavior causes no security concerns.

Getting your working copy of project version control

To obtain (or "check out") your own working copy of project files, follow the step-by-step instructions on the Project Source page, which cover both command line CVS and WinCvs. These instructions also show the exact cvsroot to set for the current project. The cvsroot points to the server location of the project's version control.

A working copy refers to the replicated set of project version control files you maintain and modify on your own local machine. If you are new to CVS, this is an important CVS concept to understand. These files reside in your workspace, and the CVS server does not track or know anything about your changes (or other developers' changes) until you commit modified files or add newly created files back into the main project repository.

This may be different from other versioning systems, which explicitly track the creation of workspaces. With CVS, you can check out working copies as many times as necessary. Keep in mind, however, that files in the project repository continually change over time to reflect other developers' contributions as the project progresses. It's critical to keep your working copies of files up to date with the repository.

CVS client downloads

You can download CVS clients for command line, Windows, or Mac and other developer tools at http://www.wincvs.org/download.html.

Encodings used in CVS and Subversion for filenames

Character encoding: A character encoding is a scheme of numeric codes that represents the characters of a character set in memory. Each character in a language is assigned a unique pattern of numeric codes, and this assignment is called "encoding."

Locale setting: A locale setting is the set of information that corresponds to a given language and country. A locale is a string that names not only the encoding but also the language and possibly the country; that is, it is a character encoding optionally preceded by an abbreviation for the name of the language and/or the country. For example, a Korean locale setting can be ko_KR.UTF-8.

For full details of how to choose and set a locale, you will need to consult your system's documentation.
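As an illustration of how a locale name is put together, the following minimal Python sketch splits a name such as ko_KR.UTF-8 into language, country, and encoding. The function name parse_locale and the LocaleParts type are hypothetical, introduced only for this example; the sketch does not handle locale modifiers (a trailing "@..." part), and it is not how any particular operating system interprets locale names.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    """Pieces of a locale name (hypothetical helper type for this example)."""
    language: Optional[str]
    country: Optional[str]
    encoding: Optional[str]


def parse_locale(name: str) -> LocaleParts:
    """Split a locale name such as 'ko_KR.UTF-8' into its parts."""
    # Everything after the first '.' names the character encoding, if present.
    lang_country, _, encoding = name.partition(".")
    # The language and country abbreviations are separated by '_'.
    language, _, country = lang_country.partition("_")
    return LocaleParts(language or None, country or None, encoding or None)


parts = parse_locale("ko_KR.UTF-8")
print(parts.language, parts.country, parts.encoding)
```

A locale name without a country or encoding, such as "C", yields None for the missing parts.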

Some of the character encodings used in settings are:

Note: As of CollabNet Enterprise Edition version 3.0.0, CollabNet recommends use of the Unicode variant known as UTF-8 throughout our user interface.

If you name a file using Korean characters and your system uses the EUC-KR character encoding, the version control system stores the name in the repository. When another user whose system also uses EUC-KR checks the file out, the file name reads the same.

However, if the person who checks out the file has the EUC-JP encoding set and the project uses CVS, the file name will appear garbled. If the project uses Subversion, Subversion will report an error and refuse to do the checkout. This is because the Korean characters used in the file name are simply not part of the EUC-JP encoding, which only supports Japanese characters.
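The effect described above can be reproduced with Python's built-in codecs. This is only a sketch of the underlying byte-level behavior, not of how CVS or Subversion are implemented; the file name is a hypothetical example.

```python
# Hypothetical Korean file name, used only for illustration.
name = "보고서.txt"

# Bytes that a client set to EUC-KR would hand to the repository.
stored = name.encode("euc_kr")

# A reader who also uses EUC-KR sees the original name.
same = stored.decode("euc_kr")

# A reader set to EUC-JP interprets the very same bytes differently,
# which is the "garbled" result described for CVS.
garbled = stored.decode("euc_jp", errors="replace")

# A tool that transcodes cannot express the Korean characters in EUC-JP
# at all. Python signals this with UnicodeEncodeError, loosely analogous
# to a tool refusing the checkout.
try:
    name.encode("euc_jp")
    representable = True
except UnicodeEncodeError:
    representable = False

print(same == name, garbled == name, representable)
```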

Unicode allows multiple languages to coexist in the same file, file name, string, and so on. Subversion can transcode between UTF-8 and other encodings (for example, from EUC-KR to UTF-8), but encoding problems can still arise:

  1. If you are a single user working on a single machine, you will have no encoding issues.
  2. If you have multiple users and machines located in different places, you are very likely to have problems, because users will set their locales to different values, which typically means different encodings.
  3. If you have file names in hyperlinks, mismatched encodings lead to garbled text and links that do not work.

If the project uses CVS and you check a file with a non-ASCII name in a non-UTF-8 encoding into the www/ tree and make hyperlinks to it, the links will not work. The browser will attempt to present them as UTF-8, but since they are not UTF-8, they will be garbled and will no longer match the file name stored in CVS.

If the project uses Subversion and the same check-in happens, Subversion transcodes the file names from the original encoding into UTF-8 during the check-in, and they are presented as UTF-8 when CollabNet publishes the web site. Similarly, the browser presents URLs inside the web pages as UTF-8. This could work, except for a developer who is using an encoding other than UTF-8. At that developer's desk, the files have non-UTF-8 names. If the developer enters URLs in the web pages that match the files on his or her disk, nothing on that system will transcode them into UTF-8 (the browser assumes they already are UTF-8 and does not transcode). The result is that either that developer cannot use the pages or, if the developer encodes the URLs so that they work locally, no one else can use them.
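To see why a link written in one encoding does not match a file name published in another, the sketch below percent-encodes the same hypothetical file name as UTF-8 and as EUC-JP using Python's urllib.parse. It illustrates only the byte mismatch; it makes no claim about how CollabNet or any browser builds URLs.

```python
from urllib.parse import quote, unquote

# Hypothetical Japanese file name, used only for illustration.
name = "日本語.html"

# The link as it looks when the name's bytes are UTF-8.
utf8_link = quote(name, encoding="utf-8")

# The link as it looks when the name's bytes are EUC-JP.
eucjp_link = quote(name, encoding="euc_jp")

# A reader that assumes UTF-8 cannot recover the name from the EUC-JP link.
misread = unquote(eucjp_link, encoding="utf-8", errors="replace")

print(utf8_link)
print(eucjp_link)
print(misread == name)
```

The two links differ byte for byte, so a page containing one of them cannot reach a file published under the other.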

If files and directories with non-ASCII names are part of the "remote published" document tree of a CollabNet project (that is, if they are checked in anywhere below the www/ directory), then CollabNet will publish them as UTF-8.

If a developer working in Japan names a file using Japanese characters, Subversion stores the name in the repository encoded as UTF-8. When another developer in the US later checks out the same file, Subversion decodes the name from UTF-8 and displays it in Japanese characters again.

Note: The checked-out name will not necessarily make sense to the American developer, because transcoding only converts a name from one encoding to another within the same language: Japanese > UTF-8 > Japanese. UTF-8 does not translate between languages, so the checked-out file name is the same Japanese file name that was originally checked in.

  1. In general, you first create file and directory names on your own computer. When you commit the files using the "commit" process of the Version Control tools, or an "upload" process of some web page, most services (upload and Subversion) will detect the encoding you have set, interpret the file and directory names by that encoding, and either translate them immediately into UTF-8 or ensure that this will eventually happen when the names are displayed in the other ways discussed below. The lone exception is CVS: CVS does not interpret your file and directory names in any way at all. The pattern of bits your file name had when you committed it is the pattern of bits everyone else sees when they check it out. If they are not using the same encoding as you, the name will be garbled. This brings us to our first guideline: projects that use CVS must be very careful to all use the same encoding. Of course, we have recommended that you use UTF-8 for all purposes, all projects, and all users, and that would satisfy the CVS requirement. But if you have some reason to use another encoding with CVS, it is important to bear in mind that all users must use the same one.
  2. Once a file is stored within CollabNet, a second place where its name matters is in the various browsing features of CollabNet: the Documents and Files area, the ViewCVS browser for Subversion and CVS, and similar places. All of these areas display their names using UTF-8. In order to support legacy data from before CollabNet Enterprise Edition 3.0.0, these areas will attempt to convert names from one of the traditional encodings into UTF-8 if they can determine that this is needed. Unfortunately, there is no completely certain way to make that determination, as was discussed above. If your site has been migrated from an earlier version of CEE, where some traditional encoding was the default, then there is a very good chance that this determination will work out right and the displayed names will look right. But the most certain way to get it right is, of course, not to leave the system to guess at all, but simply to use UTF-8 from the beginning.
  3. The third place is when you check files out of Version Control onto your computer. As above, if Subversion is the Version Control tool in use, it will detect your encoding setting and transcode the names into that encoding, so that they look right. This means that project members using Subversion can cooperate successfully even if they have different encodings set (but see the next item for a critical exception to this freedom). CVS, however, provides no such liberty: because it does not transcode the bits that make up the name in any way, if two users are set for different encodings, the resulting names will be garbled. For this reason, it is crucial for projects that use CVS to agree on, and stick to, a single encoding for all their file and directory names.
  4. Finally, many of the files you commit to your project may be web pages, and these web pages will have URLs in them that refer to other web pages, that is, to other files that you also commit into your project. This is the fourth area of interface. It is important to understand, at this point, that no Version Control system will transcode the characters within your files. Subversion fixes the encoding of the names of the files, but it does not change the encoding of the contents of the files. This means that the collection of bits that make up the URLs inside your files has to match the collection of bits that make up the file names on disk, in the area from which CollabNet serves up the pages. As mentioned above, CollabNet serves up pages in UTF-8, and so the URLs that refer to your web pages must be in UTF-8 as well. But when you build a web site, you ordinarily test it locally, on your own machine, before you publish it. That means that the collection of bits that make up the URLs in your pages must also match the collection of bits that make up the file names on your own computer: you must be configured to use UTF-8. Similar things can happen with other kinds of files. For example, most programming languages have some sort of mechanism for logically "including" one file into another; these file references must also match bit for bit, and so the encoding inside the files must match the encoding of the file names. Even with Subversion, all project members who work with such files must use the same encoding setting, and it must be UTF-8.
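The difference between the two tools in the points above can be modeled in a few lines of Python. This is a simplified sketch, not the actual CVS or Subversion implementation: the function names cvs_store, svn_commit, and svn_checkout are hypothetical, and the model assumes only what the text states, namely that CVS passes name bytes through untouched while Subversion converts them to UTF-8 on commit and to the client's encoding on checkout.

```python
def cvs_store(local_name: bytes) -> bytes:
    """CVS model: the bytes of the name are stored and returned unchanged."""
    return local_name


def svn_commit(local_name: bytes, client_encoding: str) -> bytes:
    """Subversion model: interpret the name in the client's encoding,
    then store it as UTF-8."""
    return local_name.decode(client_encoding).encode("utf-8")


def svn_checkout(repo_name: bytes, client_encoding: str) -> bytes:
    """Subversion model: convert the stored UTF-8 name into the encoding
    of the client doing the checkout."""
    return repo_name.decode("utf-8").encode(client_encoding)


# Hypothetical Japanese file name, committed from a client set to EUC-JP.
name = "設計.txt"
committed = name.encode("euc_jp")

# Subversion model: a Shift_JIS client still sees the right name.
svn_repo = svn_commit(committed, "euc_jp")
svn_seen = svn_checkout(svn_repo, "shift_jis").decode("shift_jis")

# CVS model: the same Shift_JIS client misreads the untouched bytes.
cvs_seen = cvs_store(committed).decode("shift_jis", errors="replace")

print(svn_seen == name, cvs_seen == name)
```

Note that neither model touches file contents, which is why URLs written inside pages still have to match the published UTF-8 names.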

Now, what if you have some need to work in a traditional encoding instead of UTF-8? Most likely, this would be because you already have files named in the traditional encoding.

The easiest way to deal with this possibility is to stick to the 95 printable characters of the basic ASCII alphabet (English letters, digits, the space, and a few punctuation marks). All of the encodings discussed here use the same collection of bits to represent all of these characters, and so it does not matter which encoding you have set, so long as you stick to these. Of course, this is severely limiting: you can only represent English in this way. Still, until the advent of Unicode, this was the only way to achieve multilingual systems.
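A quick way to convince yourself of this is to encode an ASCII-only name in several encodings and compare the bytes. The sketch below uses Python's built-in codecs; the file names are hypothetical, and the list of encodings is an assumption limited to those named in this document plus plain ASCII.

```python
# Hypothetical ASCII-only file name (letters, digits, hyphen, and dot).
ascii_name = "release-notes-2.html"

# Encodings mentioned in this document, plus plain ASCII.
encodings = ["ascii", "utf-8", "euc_kr", "euc_jp"]

# Encode the same name in every encoding and collect the distinct results.
ascii_results = {ascii_name.encode(enc) for enc in encodings}
ascii_is_stable = len(ascii_results) == 1

# For contrast, a hypothetical Korean name yields different bytes in
# UTF-8 and in EUC-KR.
korean_name = "메모.txt"
korean_differs = korean_name.encode("utf-8") != korean_name.encode("euc_kr")

print(ascii_is_stable, korean_differs)
```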

If that is too restrictive (and it probably is), and yet use of UTF-8 is still not practical for you, then you can achieve much of what you need by very carefully ensuring that all your users have the same encoding set. This may not be as difficult as it sounds: when you buy a computer configured for a particular country or language, that configuration includes some encoding suitable for that language, and probably one of the ones discussed here. If all your users use the same operating system, that may be enough. If you mix operating systems (or even operating system versions), you may find some users get or produce garbled names and text; these users probably need to change their settings to use the project's encoding. The most important thing that cannot be solved this way is the problem of URLs, mentioned above. This is why it is standard practice, throughout the web, to use ASCII names for the files and directories of a web site, even when the site pages are all in some language that cannot be represented in ASCII. With ASCII file and directory names, and one of the traditional encodings discussed here, you can still provide ASCII URLs and local-language pages.
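If a project adopts the ASCII-names convention for its web content, a small check can flag names that break it before they are committed. The following is a minimal sketch under that assumption; the function name non_ascii_paths and the sample paths are hypothetical, and the check is not a feature of CVS, Subversion, or CollabNet.

```python
from typing import Iterable, List


def non_ascii_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that contain any character outside ASCII."""
    # str.isascii() is available in Python 3.7 and later.
    return [path for path in paths if not path.isascii()]


# Hypothetical paths below a project's www/ directory.
candidates = [
    "www/index.html",
    "www/docs/install-guide.html",
    "www/docs/설치.html",
]

for path in non_ascii_paths(candidates):
    print("non-ASCII name:", path)
```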