Common Production Failures Encountered at BW / BI Production Support
1 Transactional RFC Error (tRFC) - Non Updated IDOCs in the Source System.
1.1 Why does the error occur?
• A tRFC (Transactional Remote Function Call) error occurs whenever LUWs (Logical Units of Work) are not transferred from the source system to the destination system.
1.2 What happens when this error occurs?
• A message appears at the bottom of the “Status” tab in RSMO. It reads “tRFC Error in Source System”, “tRFC Error in Data Warehouse”, or simply “tRFC Error”, depending on the system from which the data is being extracted.
• Sometimes IDocs are also stuck on the R/3 side because no work processes were available to process them.
1.3 What can be the possible actions to be carried out?
• Once this error is encountered, do a complete refresh (“F6”) in RSMO and check whether the system clears the LUWs on its own.
• If the error persists after a couple of refreshes, follow the steps below quickly, because the load may otherwise fail with a short dump.
• From RSMO, go to the menu Environment -> Transact. RFC -> In the Source System. It asks you to log in to the source system.
• Once logged in, it shows a selection screen with “Date”, “User Name”, and tRFC options.
• On execution with “F8” it lists all stuck LUWs. The “Status Text” appears in red for the stuck LUWs that are not getting processed. The “Target System” for those LUWs should be “WP1CL015”, which is the Bose BW Production system. Do not execute any other entry whose “Target System” is not “WP1CL015”.
• Select the LUWs that have been identified in this way, then right-click and choose “Execute”, or press “F6”, so that they are cleared and the load on the BW side completes successfully.
• When IDocs are stuck, go to R/3, use transaction BD87, and expand the ‘IDoc in inbound processing’ node for IDoc status 64 (IDoc ready to be transferred to application). Keep the cursor on the error message (pertaining to IDoc type RSRQST only) and click Process (F8). This pushes any stuck IDoc on R/3.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
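The selection rule in the steps above (execute only stuck LUWs whose target system is the BW production system) can be sketched in a few lines of Python. This is only an illustration of the rule, not a call into SAP: the `Luw` record, its field names, and the sample transaction IDs are assumptions made for the example.

```python
from dataclasses import dataclass

BW_TARGET = "WP1CL015"  # target system named in the steps above


@dataclass
class Luw:
    tid: str            # hypothetical transaction ID, for illustration only
    target_system: str
    stuck: bool         # True when the status text is shown in red


def luws_safe_to_execute(luws, target=BW_TARGET):
    """Return only the stuck LUWs that are addressed to the given target system."""
    return [luw for luw in luws if luw.stuck and luw.target_system == target]


entries = [
    Luw("TID-1", "WP1CL015", True),
    Luw("TID-2", "OTHERSYS", True),   # stuck, but belongs to another system
    Luw("TID-3", "WP1CL015", False),  # not stuck, nothing to do
]
selected = luws_safe_to_execute(entries)
```

Here only `TID-1` would be selected; the entry for the other target system is left alone, as the steps require.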
2 Time Stamp Error.
2.1 Why does the error occur?
• The “Time Stamp” error occurs when the Transfer Rules/Transfer Structure (TR/TS) are internally inactive in the system.
• It can also occur whenever DataSources are changed on the R/3 side or DataMarts are changed on the BW side. In that case the Transfer Rules (TR) show an active status when checked, but they are not actually active, because the time stamps of the DataSource and of the Transfer Rules differ.
2.2 What happens when this error occurs?
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the Process Chain (PC).
• Check the Transfer Rules in RSA1, Administrator Workbench.
2.3 What can be the possible actions to be carried out?
• Whenever we get such an error, first check the Transfer Rules (TR) in the Administrator Workbench. Check whether any rule is inactive; if so, activate it.
• First replicate the relevant DataSource: right-click the source system of the DataSource -> Replicate DataSources.
• On such occasions we can execute the ABAP report “RS_TRANSTRU_ACTIVATE_ALL”. It asks for the Source System Name, the InfoSource Name, and two check boxes. To activate only those TR/TS that are held by a lock, check the “LOCK” option. To activate only those TR/TS that are inactive, check the “Only Inactive” option.
• Once executed, it activates the TR/TS within that particular InfoSource again, even if they are already active.
• Now re-trigger the InfoPackage.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
3 Error occurred due to Short Dump.
3.1 Why does the error occur?
• Whenever a job fails with a “Time Out” error, it means that the job stopped for some reason while the request was still in yellow state, and the Time Out error was raised as a result. This leads to a short dump in the system, either in R/3 or in BW.
• A short dump may also occur if there is a mismatch in the type of the incoming data. For example, if a date field is not in the format specified in BW, the system may give a short dump instead of an error every time we trigger the load.
3.2 What happens when this error occurs?
• We get a Time Out error after the time specified in InfoPackage -> Time Out settings (which may or may not be the same for all InfoPackages). In the meantime we may get a short dump in the BW system or in the R/3 source system.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
3.3 What can be the possible actions to be carried out?
• A “Time Out” error usually results in a short dump. To check the short dump, go to Environment -> Short Dump -> In the Data Warehouse / In the Source System.
• Alternatively, check transaction ST22 in the source system or the BW system, and choose the relevant option to display the short dump for the specific date and time. Go through the complete analysis of the short dump in detail before taking any action.
• In case of a Time Out error, check whether the time out occurred after the extraction or not. It may be that the data was extracted completely and the short dump occurred afterwards; in that case the data does not need to be extracted again.
• To check whether the extraction completed, look at “Extraction” in the “Details” tab of the Job Overview. If it is a full load from R/3, we can also check the number of records in RSA3 in R/3 and verify that the same number of records was loaded in BW.
• In the short dump we may find the runtime error "CALL_FUNCTION_SEND_ERROR", which occurred due to a time out on the R/3 side.
• In such cases the following can be done:
o If the data was extracted completely, change the QM status from yellow to green. If a cube is being loaded, create the indexes; for an ODS, activate the request.
o If the data was not extracted completely, change the QM status from yellow to red, re-trigger the load, and monitor it.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
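The decision described above (green or red QM status, depending on whether the extraction finished) can be written down as a small Python sketch. It only models the decision; the function names and the `"cube"` / `"ods"` labels are assumptions chosen for this example, and nothing here talks to a BW system.

```python
def actions_after_time_out(extraction_complete, target_type):
    """Return the ordered manual steps for a request left yellow by a time out.

    target_type is "cube" or "ods" (labels chosen for this sketch).
    """
    if not extraction_complete:
        return ["set QM status to red", "re-trigger the load", "monitor the load"]
    steps = ["set QM status to green"]
    if target_type == "cube":
        steps.append("create indexes")
    elif target_type == "ods":
        steps.append("activate the request")
    return steps


def full_load_matches(rsa3_record_count, bw_record_count):
    """For a full load, compare the RSA3 record count with the count loaded in BW."""
    return rsa3_record_count == bw_record_count
```

For example, `actions_after_time_out(True, "ods")` yields the green status followed by the activation step, while an incomplete extraction always leads to the red status and a re-trigger.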
4 Job Cancellation in R/3 Source System.
4.1 Why does the error occur?
• This error is encountered when the job in the R/3 system is cancelled for some reason. This may be due to a problem in the system. Sometimes it is also due to other jobs running in parallel that take up all the work processes, so that the job gets cancelled on the R/3 side.
• The error may or may not be the result of a Time Out. A system hardware problem could also cause these errors.
4.2 What happens when this error occurs?
• The error message is "Job termination in source system". The wording may differ, for example “The background job for data selection in the source system has been terminated”. Both messages mean the same. Sometimes it may also read “Job Termination due to System Shutdown”.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
4.3 What can be the possible actions to be carried out?
• First check the job status in the source system through Environment -> Job Overview -> In the Source System. This may ask you to log in to the R/3 source system. Once logged in, there are some pre-entered selections; check that they are relevant, and then execute. This shows the exact status of the job. It should show “X” under Canceled.
• The job name generally starts with “BIREQU_” followed by a system-generated number.
• Once we have confirmed that the error occurred due to job cancellation, check the status of the ODS or cube under the Manage tab. The latest request will show the QM status as red.
• The load must be re-triggered from BW in such cases, because the job is cancelled and no longer active.
• First delete the red request from the Manage tab of the InfoProvider, and then re-trigger the InfoPackage.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
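As a rough illustration of the first two checks above, the following Python sketch picks the cancelled extraction jobs out of a job list by the “BIREQU_” name prefix. The job names and the (name, status) tuple layout are made up for the example; a real job overview has more columns.

```python
def cancelled_request_jobs(jobs, prefix="BIREQU_"):
    """Return the names of jobs that carry the prefix and have status Canceled."""
    return [name for name, status in jobs
            if name.startswith(prefix) and status == "Canceled"]


# Made-up job list for illustration
job_list = [
    ("BIREQU_0001", "Finished"),
    ("BIREQU_0002", "Canceled"),
    ("OTHER_JOB", "Canceled"),
]
to_reload = cancelled_request_jobs(job_list)
```

Only `BIREQU_0002` qualifies here; its request would be the one to delete and re-trigger from BW.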
5 Incorrect data in PSA.
5.1 Why does the error occur?
• Sometimes the data coming into BW has an incorrect format, or a few records have incorrect entries. For example, a value was expected in upper case but the data is in lower case, or the data was expected in numeric form but was provided as alphanumeric.
• The data load may be a flat file load or a load from R/3. Mostly it is the flat file provided by the users that has an incorrect format.
5.2 What happens when this error occurs?
• The error message appears in the Job Overview and guides you on what exactly needs to be done for the error that occurred.
• The message at the bottom of the “Header” tab of the Job Overview in RSMO has “PSA Pflege” (German for PSA maintenance) written on it, which gives you a direct link to the PSA data.
5.3 What can be the possible actions to be carried out?
• Once the error is confirmed, check the “Details” tab of the Job Overview to find which record, which field, and what in the data has the error.
• Once we have made sure from “Extraction” in the Details tab of the Job Overview that the data was completely extracted, we can see there which record and which field has the erroneous data. We can also check the validity of the data against the PSA data of the previous successful load.
• When we check the data in the PSA, the erroneous record is shown with a red traffic light. To change data in the PSA, the request must first be deleted from the Manage tab of the InfoProvider; only then does the system allow the PSA data to be changed.
• Once the specific field entry in the PSA record is changed, save it. Then reconstruct the same request from the Manage tab. Before the request can be reconstructed, it must have the QM status “Green”.
• This updates the records present in the request again.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
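The two kinds of bad data named in this section (lower case where upper case is expected, alphanumeric where numeric is expected) can be checked on a flat file record before it is loaded. The Python sketch below is one possible pre-check, not part of BW; the field names `CUSTOMER` and `QUANTITY` are hypothetical.

```python
def find_bad_fields(record, uppercase_fields, numeric_fields):
    """Return (field, reason) pairs for values that break the expected format."""
    problems = []
    for name in uppercase_fields:
        value = record.get(name, "")
        if value != value.upper():
            problems.append((name, "expected upper case"))
    for name in numeric_fields:
        # str.isdigit() is False for empty strings, so missing values are flagged too
        if not record.get(name, "").isdigit():
            problems.append((name, "expected numeric"))
    return problems


bad = find_bad_fields({"CUSTOMER": "abc01", "QUANTITY": "12A4"},
                      uppercase_fields=["CUSTOMER"], numeric_fields=["QUANTITY"])
good = find_bad_fields({"CUSTOMER": "ABC01", "QUANTITY": "1204"},
                       uppercase_fields=["CUSTOMER"], numeric_fields=["QUANTITY"])
```

Running such a check on the user-supplied file would point at the same record and field that the “Details” tab reports after a failed load.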
6 ODS Activation Failed.
6.1 Why does the error occur?
• During a data load into an ODS, the data may be extracted and loaded completely, but the ODS activation may then fail with a status 9 error.
• It may also fail due to lack of resources, or because of an existing failed request in the ODS. For Master Data it is fine if we have an existing failed request.
• This happens because there are rollback segment errors in the Oracle database, which gives the error ORA-00060. During activation, data is read from the active data table and then either inserted or updated. While doing this there are system deadlocks and Oracle is unable to extend the extents.
6.2 What happens when this error occurs?
• The detailed error message would be like “Request REQU_3ZGI6LEA5MSAHIROA4QUTCOP8, data package 000012 incorrect with status 9 in RSODSACTREQ”. Sometimes it is accompanied by a “Communication error (RFC call) occurred” error, which is actually due to some system error.
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
• The error is reported as “ODS Activation Failed”.
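When many such messages have to be triaged, the request ID, data package number, and status can be pulled out of the message text. The Python sketch below assumes the message always follows the wording quoted in this section; the regular expression and the function name are illustrative, not an SAP interface.

```python
import re

# Pattern built from the sample message quoted above; other wordings will not match
ACTIVATION_ERROR = re.compile(
    r"Request (?P<request>REQU_[A-Z0-9]+), data package (?P<package>\d+) "
    r"incorrect with status (?P<status>\d+) in RSODSACTREQ"
)


def parse_activation_error(message):
    """Return (request, data package number, status) or None if the text differs."""
    match = ACTIVATION_ERROR.search(message)
    if match is None:
        return None
    return match.group("request"), int(match.group("package")), int(match.group("status"))


sample = ("Request REQU_3ZGI6LEA5MSAHIROA4QUTCOP8, data package 000012 "
          "incorrect with status 9 in RSODSACTREQ")
parsed = parse_activation_error(sample)
```

For the sample message this gives the request `REQU_3ZGI6LEA5MSAHIROA4QUTCOP8`, data package 12, and status 9.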
6.3 What can be the possible actions to be carried out?
• Whenever this error occurs, the data may or may not be completely loaded; it is only the activation that fails. Hence, in the details of the job we can see which data package failed during activation.
• We can try to activate the ODS manually once again. Do not change the QM status here: in the Monitor it is green, but within the data target it is red. Once the data is activated, the QM status turns green.
• To activate the failed request, click the “Activate” button at the bottom. This opens another window that lists only the requests that are not activated. Select the request, check the corresponding options at the bottom, and then click “Start”.
• This schedules a background job for activation of the selected request.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
• If the above does not work, check the size of the data package specified in the InfoPackage under InfoPackage -> Scheduler -> DataS. Default Data Transfer. Here we need to reduce the maximum size of the data package so that activation takes place successfully.
• Once the size of the data package is reduced, re-trigger the load and reload the complete data.
• Before starting the manual activation, it is very important to check whether there is an existing failed (red) request. If so, delete it before starting the manual activation.
• This error is encountered in the first place, and later rectified, because at that point in time the system is not able to run the activation via 4 parallel processes. This parameter is set in transaction RSCUSTA2. Later on the resources are free, so the activation completes successfully.
7 Caller 70 is missing.
7.1 Why does the error occur?
• This error normally occurs whenever BW encounters an error that it is not able to classify. There could be multiple reasons for this:
o Whenever we load Master Data for the first time, the system creates SIDs. If the system is unable to create SIDs for the records in the data packet, we can get this error message.
o If the indexes of the cube are not deleted, the system may give the Caller 70 error.
o Whenever we load transactional data that has master data as one of the characteristics, and the value does not exist in the master data table, we get this error. The system can have difficulty creating SIDs for the master data and loading the transactional data at the same time.
o If an ODS activation is taking place while another ODS activation is running in parallel, the system may classify the error as Caller 70, because no processes were free for that ODS activation.
o It also occurs whenever there is a simultaneous read/write on the active data table of an ODS. For example, if activation is running for an ODS while data is also being loaded to the same ODS, the system may classify the error as Caller 70.
o It is a system error that can be seen under the “Status” tab in the Job Overview.
7.2 What happens when this error occurs?
• The exact error message is “System response "Caller 70" is missing”.
• It may also log a short dump in the system. This can be checked at "Environment -> Short dump -> In the Data Warehouse".
7.3 What can be the possible actions to be carried out?
• If the Master Data is being loaded for the first time, reduce the data package size and load the InfoPackage again, since processing sometimes depends on the size of the data package. We can also try to split the data load into several loads.
• If the error occurs in a cube load, try deleting the indexes of the cube and then reload the data.
• If we are loading transactional and master data together and this error occurs, reduce the size of the data package and try reloading, as the system may be finding it difficult to create SIDs and load data at the same time. Alternatively, load the master data first and then the transactional data.
• If the error happens during ODS activation because no processes are free or available for the activation, define the processes in transaction RSCUSTA2.
• If the error is due to simultaneous read/write on the ODS, change the scheduled time of the data load.
• Once we are sure that the data has not been extracted completely, delete the red request from the Manage tab of the InfoProvider and re-trigger the InfoPackage.
• Monitor the load for successful completion, and complete the further loads, if any, in the Process Chain.
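Because Caller 70 has several unrelated causes, it can help to keep the cause-to-action pairs from this section in one lookup. The Python sketch below does just that; the cause keys are labels invented for this example, and the fallback text simply points back to the short dump check in 7.2.

```python
# Cause keys are hypothetical labels; the actions paraphrase the bullets above
CALLER_70_ACTIONS = {
    "first_master_data_load": "reduce the data package size or split the load, then reload",
    "cube_indexes_present": "delete the cube indexes, then reload",
    "master_and_transaction_together": "reduce the data package size, or load master data first",
    "no_free_activation_process": "define processes in transaction RSCUSTA2",
    "ods_read_write_clash": "change the scheduled time of the data load",
}


def suggest_action(cause):
    """Return the documented action for a known cause, else a generic next step."""
    return CALLER_70_ACTIONS.get(cause, "check for a short dump in the Data Warehouse")
```

A call such as `suggest_action("cube_indexes_present")` returns the index deletion step; an unknown cause falls back to the short dump check.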
8 Attribute Change Run Failed - ALEREMOTE was locked.
8.1 Why does the error occur?
• During Master Data loads, a lock is sometimes set by the system user ALEREMOTE.
• This normally occurs when the Hierarchy/Attribute Change Run (HACR) is running for some other Master Data (MD) load and the system tries to carry out HACR for this new MD. This is a scheduling problem.
8.2 What happens when this error occurs?
• The message appears in the Job Overview in RSMO, or in the “Display Message” option of the process in the PC.
• The exact error message would be like “User ALEREMOTE locked the load of master data for characteristic 0CUSTOMER”. Here it is specifically for the 0CUSTOMER load; the characteristic differs depending on the Master Data InfoObject that is being loaded.
8.3 What can be the possible actions to be carried out?
• Read the error message completely and also check its long text, as it tells you exactly which Master Data is locked by user ALEREMOTE.
• The lock is set because the timing of the load and of HACR clashed. First check RSA1 -> Tools -> HACR, where we get the list of InfoObjects on which HACR is currently running. Only once that has finished, go to transaction SM12. This gives you a few options and a couple of default entries. When we list the locks, it displays all the locks that are set. Delete the lock for the specific entry only; otherwise some load that is running may fail because its lock was released.
• Choose the lock that caused the failure and click Delete, so that the existing lock is released. Take care not to delete the lock of an active running job. Preferably, avoid this solution.
• When HACR finishes for the other Master Data, trigger the Attribute Change Run for this Master Data.
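The caution in the steps above (wait for HACR to finish, then delete only the lock for the failed InfoObject) can be expressed as a small guard. This Python sketch is a model of the rule only; the lock dictionaries and their keys are assumptions and do not reflect the real SM12 list layout.

```python
def locks_to_delete(locks, failed_infoobject, hacr_running):
    """Return the lock entries that may be deleted, or [] while HACR is still running."""
    if hacr_running:
        return []  # never touch locks while a change run is active
    return [lock for lock in locks
            if lock["user"] == "ALEREMOTE" and lock["infoobject"] == failed_infoobject]


# Made-up lock list for illustration
lock_list = [
    {"user": "ALEREMOTE", "infoobject": "0CUSTOMER"},
    {"user": "ALEREMOTE", "infoobject": "0MATERIAL"},
]
while_running = locks_to_delete(lock_list, "0CUSTOMER", hacr_running=True)
after_finish = locks_to_delete(lock_list, "0CUSTOMER", hacr_running=False)
```

While the change run is active nothing is returned; afterwards only the 0CUSTOMER entry is offered for deletion, so locks of other loads stay in place.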
9 SAP R/3 Extraction Job Failed.
There are certain jobs that are triggered in R/3 based on events created there. These events are triggered from SAP BW via an ABAP program attached in Process Chains. The extract job also triggers an extract status job along with it. The extract status job sends the status (success or failure) back to BW. Hence it is important that both the extract job and the extract status job complete. Only on completion of these jobs in R/3 are the extraction jobs triggered in R/3 via the InfoPackage from BW. An error may occur in the extract job or in the extract status job.
9.1 What happens when this error occurs?
• The exact error message can normally be seen in the source system where the extraction occurs. In BW, the process for the program in the PC fails.
• This process is placed before the InfoPackage, so if the extraction program in R/3 is still running, is not complete, or has failed, the InfoPackage is not triggered. Hence it is very important to monitor such loads through RSPC rather than through RSMO.
9.2 What can be the possible actions to be carried out?
• Log in to the source system and check transaction SM37 for the status of the job running in R/3. It shows the exact status of the running job.
• Enter the exact job name, user, and date, choose the relevant options, and execute. It shows a list of the jobs with that name, including the one that is Active. You may also find another job Scheduled for the next load, a Cancelled job if any, or a previously Finished job. The active job is the one that is currently running.
• If the “Delay (sec.)” value of the job is increasing instead of “Duration(sec.)”, there is some problem with the extraction job: it is not running, and is in delay.
• Sometimes there is no active job, and there is a job in finished status with the current date/time.
• Both the extract job and the status job need to be checked, because the extract job may have finished while the extract status job failed, so that no success status was sent to BW even though the extraction was complete. In such cases, manually change the status of the Extract Program process in the PC in BW to green with the help of the FM “ZRSPC_ABAP_FINISH”. Execute the FM with the correct name of the program process variant and the status “F”. This makes the process green and triggers the further loads. Before doing so, make sure that no previous Extract Program process is still running in the BW system; check the PC logs in detail for any previous process that is still pending.
• Monitor the PC to complete the loads successfully.
• If we need to turn the ABAP process within the PC red and re-trigger the PC, execute the FM “ZRSPC_ABAP_FINISH” with the specific variant and the job status “R”, which turns the ABAP process red.
• This usually needs to be done when the extraction job was cancelled in R/3 for some reason, another job is in Released state, and the BW ABAP process is in yellow state. We can then make the ABAP process red via the FM and re-trigger the PC.
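The choice between status “F” and status “R” described above can be summarised as a small decision function. The Python sketch below is a simplified model: the job and process state labels are assumptions, and it leaves out the checks for a Released follow-up job and for a previous pending process, which still have to be done by hand.

```python
def abap_process_status(extract_job, status_job, bw_process="yellow"):
    """Return the status letter to pass to the FM, or None if no manual step applies.

    extract_job / status_job use the labels "finished", "cancelled", "active"
    (chosen for this sketch); bw_process is the colour of the ABAP process in the PC.
    """
    if extract_job == "finished" and status_job != "finished":
        return "F"  # extraction done, only the status report to BW failed
    if extract_job == "cancelled" and bw_process == "yellow":
        return "R"  # turn the process red so the PC can be re-triggered
    return None
```

For instance, a finished extract job with a cancelled status job maps to “F”, while a cancelled extract job with the BW process still yellow maps to “R”.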
10 File not found (System Command for file check failed).
10.1 Why does the error occur?
• The system command process is placed in a PC before the InfoPackage process. It checks for the flat file on the application server before the InfoPackage is triggered. This ensures that when the load starts it has a flat file to upload.
• If the file is not available, the system command process fails and the InfoPackage is not triggered. Hence it is very important to monitor the PC through RSPC.
10.2 What happens when this error occurs?
• The error turns the System Command process in the PC red, and the UNIX script that failed returns a specific return code indicating that the script has failed.
10.3 What can be the possible actions to be carried out?
• Whenever the system command process fails, it indicates that the file is not present. Right-click the process and choose “Display Message” to see the failed script. Here we need to check the return code: if the exit status is -1, the script failed and the process becomes red; otherwise it becomes green in the PC.
• Check the script carefully for the above-mentioned exit status, and only then conclude that the file was really not available.
• Once it is confirmed that the file is not available, take the appropriate actions.
• Identify the person who is responsible for transferring the file by FTP to the application server. A mail already goes to the responsible person via the error message in the process, but we also need to send a mail regarding the same.
• The Process Chains which are having the system command Process in them, and the corresponding actions to be taken.
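The file check and the exit status rule from this section (-1 means failure and a red process, anything else green) can be mimicked in a few lines of Python. This is a stand-in for the UNIX script, which is not shown in this document; the function names and the file names are made up for the example.

```python
import os
import tempfile


def file_check_exit_status(path):
    """Mimic the file check: 0 when the flat file exists, -1 when it does not."""
    return 0 if os.path.isfile(path) else -1


def process_colour(exit_status):
    """Exit status -1 turns the process red; anything else leaves it green."""
    return "red" if exit_status == -1 else "green"


# Try it against a temporary folder with one existing and one missing file
with tempfile.TemporaryDirectory() as folder:
    present = os.path.join(folder, "sales.csv")    # made-up file name
    missing = os.path.join(folder, "missing.csv")  # never created
    with open(present, "w") as handle:
        handle.write("header\n")
    colour_present = process_colour(file_check_exit_status(present))
    colour_missing = process_colour(file_check_exit_status(missing))
```

Running the same check by hand against the expected path on the application server is a quick way to confirm that the file is really missing before mailing the person responsible.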
11 Table space issue.
11.1 Why does the error occur?
• Many times, particularly with HACR, the program needs a lot of temporary table space [PSATEMP] while it realigns the aggregates. If there is a large amount of data to be processed and Oracle is not able to extend the table space, it gives a dump.
• This normally happens if many aggregates are created on the same day, or if there is a large change in the incoming Master Data / Hierarchy, so that a large amount of temporary space is needed to perform the realignment.
• Also, whenever PSAPODS (which houses many tables) is full, the data load / ODS activation stops and hence we may get failures.
11.2 What happens when this error occurs?
• The errors ORA-01653 and ORA-01688 relate to issues with table space. The error is reported with the ORA number and asks to increase the table space.
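To spot these two codes quickly in a long log or dump text, a simple scan is enough. The Python sketch below is an illustration only: the sample log line is made up, and the pattern tolerates both the “ORA-01653” and the “ORA - 01653” spelling.

```python
import re

TABLESPACE_CODES = {"01653", "01688"}  # the two codes named in this section


def tablespace_errors(log_text):
    """Return the sorted table space related ORA codes found in log_text."""
    found = re.findall(r"ORA\s*-\s*(\d{5})", log_text)
    return sorted(set(found) & TABLESPACE_CODES)


# Made-up log excerpt for illustration
log = "Activation stopped: ORA-01653 reported; earlier ORA - 00060 in the same job"
hits = tablespace_errors(log)
```

In this excerpt only `01653` is reported as a table space problem; the deadlock code is ignored because it is not in the set.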
11.3 What can be the possible actions to be carried out?
• If the table space is full, contact the Basis team and ask for an increase in the size of the table space.
• The table space is increased by changing parameters that allocate more space, which is defined for individual tables.
12 How is it possible to restart a process chain at a failed step/request?
Sometimes it does not help to just set a request to green status in order to run the process chain from that step on to the end.
You need to set the failed request/step to green in the database, and you also need to raise the event that forces the process chain to run to the end from the next request/step on.
To do so, open the messages of the failed step by right-clicking on it and selecting 'display messages'.
In the popup that opens, click on the tab 'Chain'.
In a parallel session, go to transaction se16 for table rspcprocesslog and display the entries with the following selections:
1. copy the variant from the popup to the variante of table rspcprocesslog
2. copy the instance from the popup to the instance of table rspcprocesslog
3. copy the start date from the popup to the batchdate of table rspcprocesslog
Press F8 to display the entries of table rspcprocesslog.
Now open another session and go to transaction se37. Enter RSPC_PROCESS_FINISH as the name of the function module and run the fm in test mode.
Now copy the entries of table rspcprocesslog to the input parameters of the function module as follows:
1. rspcprocesslog-log_id -> i_logid
2. rspcprocesslog-type -> i_type
3. rspcprocesslog-variante -> i_variant
4. rspcprocesslog-instance -> i_instance
5. enter 'G' for parameter i_state (sets the status to green).
Now press F8 to run the fm.
Now the actual process will be set to green, the following process in the chain will be started, and the chain can run to the end.
ABAP PROGRAM:
*&---------------------------------------------------------------------*
*& Report ZRSPC_PROCESS_FINISH                                         *
*&                                                                     *
*&---------------------------------------------------------------------*
************************************************************************
* Author: Jesper Christensen
* Date: Mar 22nd 2006
* Type: Executable Program
* Purpose/Description : Restart process chain after a failed request
*
************************************************************************
* MODIFICATION LOG
************************************************************************
* Date | Change Number | Initials | Description
************************************************************************
* 03/22/06 JMCHRIS Program created
*
*
************************************************************************
REPORT zrspc_process_finish .

* Selection screen: identify the failed process and the status to set
PARAMETERS: variant  TYPE rspc_variant  OBLIGATORY,
            instance TYPE rspc_instance OBLIGATORY,
            date     TYPE sy-datum      OBLIGATORY,
            state    TYPE rspc_state    OBLIGATORY DEFAULT 'G'.

DATA: logid    TYPE rspc_logid,
      chain    TYPE rspc_chain,
      type     TYPE rspc_type,
      p_vari   TYPE rspc_variant,
      instan   TYPE rspc_instance,
      jobcount TYPE btcjobcnt,
      batchdat TYPE btcreldt,
      batchtim TYPE btcreltm.

DATA: ls_pclog LIKE rspcprocesslog.

* Select the process log entry for the given variant, instance and date
SELECT SINGLE * FROM rspcprocesslog INTO ls_pclog
  WHERE variante  = variant
    AND instance  = instance
    AND batchdate = date.

IF sy-subrc = 0.
* Set the status
  CALL FUNCTION 'RSPC_PROCESS_FINISH'
    EXPORTING
      i_logid       = ls_pclog-log_id
*     i_chain       = ls_pclog-chain
      i_type        = ls_pclog-type
      i_variant     = ls_pclog-variante
      i_instance    = ls_pclog-instance
      i_state       = state
*     i_job_count   = jobcount
      i_batchdate   = ls_pclog-batchdate
*     i_batchtime   = batchtim
    EXCEPTIONS
      error_message = 1.
* Show any message raised by the function module as information
  IF sy-subrc <> 0.
    MESSAGE ID sy-msgid TYPE 'I' NUMBER sy-msgno
            WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
  ENDIF.
ELSE.
* No matching process log entry was found
  MESSAGE e000(ybw_usr_mon) WITH
    'Process selected does not exist ' ' - Check your entry'.
ENDIF.